As Vlad suggests, the number of topics is a hyper-parameter, and you can
optimize the value using cross-validation, though I think there are other
hyper-parameter estimation methods in sklearn. There are also many
other closely related projects which could wrap your NMF and report back the
id
With respect to Gaussian processes, there are some good packages in
Python already (https://github.com/SheffieldML/GPy,
https://github.com/dfm/george, probably others). In particular, GPy
requires no dependencies beyond those already required by sklearn.
Maybe a reasonable
I'll be there for the conference and workshops.
On Tue, Nov 18, 2014 at 2:36 PM, Kyle Kastner wrote:
> I will be there for everything - glad to meet up before, during, and after!
> Be warned it already started snowing here and is pretty cold... feels like
> -10 C today according to weather.com.
>
Excellent first post Hamzeh, well done. Looking forward to reading
more as the GSOC progresses.
Lee.
On Tue, May 20, 2014 at 10:40 AM, Olivier Grisel
wrote:
> To all GSOC students,
>
> Hamzeh recently published a first blog post about his GSOC:
>
> http://hamzehgsoc.blogspot.fr/2014/05/sparse-
er = make_scorer(v_measure_score, labels_pred=kmeans.predict)
> does what you think it does.
>
> You should stick to
> v_measure_scorer = make_scorer(v_measure_score)
>
>
>
> On 15 May 2014 22:11, Lee Zamparo wrote:
>>
>> Seems the estimator.fit method needs the
lt this morning.
Thanks for all your help,
L.
On Wed, May 14, 2014 at 11:12 AM, Lee Zamparo wrote:
> Combining the helpful suggestions of Andy & Joel I'm trying the following:
>
> # Make a scoring function for the pipeline
> v_measure_scorer =
> make_scorer(v_measure_sc
>> Then construct the GridSearchCV as:
>>
>> estimator = GridSearchCV(pipe, dict(kpca__gamma=gammas),
>> scoring=score_clusters)
>>
>> It seems like there should be more predefined scorers available for
>> clustering...
>>
>> Cheers,
>>
Hi,
I'm trying to use GridSearchCV and Pipeline to tune the gamma
parameter of kernel PCA. I'd like to use kernel PCA to transform the
data, followed by kmeans to cluster the data, followed by v-measure to
measure the goodness of fit of the clustering.
Here's the relevant snippet of my script
--
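The original snippet is cut off in this excerpt. As a rough sketch of the setup described above (kernel PCA, then k-means, scored with v-measure inside a grid search over gamma), something like the following might work. The toy data from make_blobs, the gamma grid, the cluster count, and the helper name v_measure_scoring are all assumptions for illustration, not taken from the original script. The import from sklearn.model_selection reflects current scikit-learn releases rather than the 2014 layout.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import KernelPCA
from sklearn.metrics import v_measure_score
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy labelled data (assumption): v-measure needs ground-truth labels.
X, y = make_blobs(n_samples=150, centers=3, random_state=0)

# Kernel PCA transforms the data, k-means clusters the transformed data.
pipe = Pipeline([
    ("kpca", KernelPCA(n_components=2, kernel="rbf")),
    ("kmeans", KMeans(n_clusters=3, n_init=10, random_state=0)),
])


def v_measure_scoring(estimator, X, y):
    """Hypothetical scorer: v-measure of the predicted cluster labels."""
    return v_measure_score(y, estimator.predict(X))


# Illustrative grid for the RBF kernel width.
gammas = [0.01, 0.1, 1.0]
search = GridSearchCV(
    pipe,
    {"kpca__gamma": gammas},
    scoring=v_measure_scoring,
    cv=3,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

A scorer with the signature (estimator, X, y) is used here so that the cluster labels come from the fitted pipeline's own predict method on each held-out fold.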
Hi Vijay,
You would need to impute your missing values first to use the
implementation of PCA in scikit-learn. Alternatively, you could roll
your own (or find a package somewhere) for a Probabilistic PCA that
*can* handle missing values in the data.
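The impute-first route can be sketched as below. This is only a minimal illustration on random data: it assumes a current scikit-learn release where SimpleImputer lives in sklearn.impute (at the time of this thread the imputer was in sklearn.preprocessing), and mean imputation is just one possible strategy.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline

# Synthetic data (assumption) with roughly 10% of the entries missing.
rng = np.random.RandomState(0)
X = rng.normal(size=(50, 4))
X[rng.rand(50, 4) < 0.1] = np.nan

# Fill each missing entry with its column mean, then project with PCA.
impute_then_pca = make_pipeline(
    SimpleImputer(strategy="mean"),
    PCA(n_components=2),
)
X_reduced = impute_then_pca.fit_transform(X)
print(X_reduced.shape)
```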
Hope this helps,
Lee.
On Fri, Mar 7, 2014 at
Thanks for the pointers Peter. I'm doing an unrelated project on
covariate shift, and this will be really useful.
Lee.
On Mon, Aug 19, 2013 at 12:46 PM, Peter Prettenhofer
wrote:
> Hi Yogesh,
>
> the work by John Blitzer that I mentioned used the second approach -- it's
> described here:
>
> Bli
Congrats Robert!
On Sun, Apr 28, 2013 at 7:56 AM, Robert Layton wrote:
> I just received some good news. My talk "scikit-learn, machine learning
> and cybercrime attribution" has been accepted!
>
> I'll be presenting between the 5th and 7th of July. For those that missed
> the previous emails,
AFAIK, you might not want all the missing values to be imputed at once,
especially if the dimensions of X are large. Maybe something like:
X_transformed = estimator.fit_transform(X)  # X contains missing values
X_subset = estimator.inverse_transform(X_transformed, row_subset)  # impute only a subset
+1 for the table of algorithms as discussed in a previous thread
+1 also for the top level menu items, for a flatter hierarchy.
On Mon, Mar 4, 2013 at 5:03 PM, Andreas Mueller wrote:
> On 03/04/2013 10:59 PM, Robert Layton wrote:
> > Sounds like some great changes.
> >
> > For the algorithm
Olivier Grisel
wrote:
> Le 2 avril 2012 18:06, Lee Zamparo a écrit :
>>
>> Regarding the suggested additions, I'm interested in Olivier's
>> suggestion of Power Iteration Clustering, and seeing how it fares
>> against kernel K-means as well as the convex exem
Hi everyone,
Thanks for all your comments on my proposal. I apologize for not
responding earlier, and I'll try to address each of your concerns or
comments in this mail.
@Olivier: my GitHub account is lzamparo. I don't have any prior
Cython development experience, but I do have some exposure t
Hello everyone,
I'm a prospective applicant to GSoC 2012, and am drafting a proposal.
I would really appreciate it if you could spare some time to give me
feedback. My proposal is centred around sklearn.cluster, so I would
like to ask Andreas Mueller, Olivier Grisel or Lars Buitinck if they
would con