Re: [Scikit-learn-general] Topic extraction

2015-04-29 Thread Lee Zamparo
As Vlad suggests, the number of topics is a hyper-parameter, and you can optimize the value using cross-validation.  There are also other hyper-parameter estimation methods in sklearn, I think.  There are also many other closely related projects which could wrap your NMF and report back the id

Re: [Scikit-learn-general] GSoC2015 topics

2015-02-05 Thread Lee Zamparo
With respect to Gaussian processes, there are some good packages in Python already (https://github.com/SheffieldML/GPy, https://github.com/dfm/george, probably others). In particular, GPy does not require any dependencies beyond those already required by sklearn. Maybe a reasonable

Re: [Scikit-learn-general] NIPS

2014-11-18 Thread Lee Zamparo
I'll be there for the conference and workshops. On Tue, Nov 18, 2014 at 2:36 PM, Kyle Kastner wrote: > I will be there for everything - glad to meet up before, during, and after! > Be warned it already started snowing here and is pretty cold... feels like > -10 C today according to weather.com. >

Re: [Scikit-learn-general] [GSOC] blogging progress

2014-05-20 Thread Lee Zamparo
Excellent first post Hamzeh, well done. Looking forward to reading more as the GSOC progresses. Lee. On Tue, May 20, 2014 at 10:40 AM, Olivier Grisel wrote: > To all GSOC students, > > Hamzeh recently published a first blog post about his GSOC: > > http://hamzehgsoc.blogspot.fr/2014/05/sparse-

Re: [Scikit-learn-general] GridSearchCV + Pipeline with v_measure

2014-05-15 Thread Lee Zamparo
er = make_scorer(v_measure_score, labels_pred=kmeans.predict) > does what you think it does. > > You should stick to > v_measure_scorer = make_scorer(v_measure_score) > > > > On 15 May 2014 22:11, Lee Zamparo wrote: >> >> Seems the estimator.fit method needs the

Re: [Scikit-learn-general] GridSearchCV + Pipeline with v_measure

2014-05-15 Thread Lee Zamparo
lt this morning. Thanks for all your help, L. On Wed, May 14, 2014 at 11:12 AM, Lee Zamparo wrote: > Combining the helpful suggestions of Andy & Joel I'm trying the following: > > # Make a scoring function for the pipeline > v_measure_scorer = > make_scorer(v_measure_sc

Re: [Scikit-learn-general] GridSearchCV + Pipeline with v_measure

2014-05-14 Thread Lee Zamparo
> >> Then construct the GridSearchCV as: >> >> estimator = GridSearchCV(pipe, dict(kpca__gamma=gammas), >> scoring=score_clusters) >> >> It seems like there should be more predefined scorers available for >> clustering... >> >> Cheers, >>

[Scikit-learn-general] GridSearchCV + Pipeline with v_measure

2014-05-13 Thread Lee Zamparo
Hi, I'm trying to use GridSearchCV and Pipeline to tune the gamma parameter of kernel PCA. I'd like to use kernel PCA to transform the data, followed by kmeans to cluster the data, followed by v-measure to measure the goodness of fit of the clustering. Here's the relevant snippet of my script --
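The setup described in this thread can be sketched roughly as follows. This is a minimal illustration and not the script from the original message: the toy data from `make_blobs`, the gamma grid, the number of clusters, and `cv=3` are all assumptions chosen for the example, and the import path `sklearn.model_selection` assumes a recent scikit-learn release (in 2014 `GridSearchCV` lived in `sklearn.grid_search`). The scorer follows the advice quoted later in the thread, `make_scorer(v_measure_score)`, which compares the true labels against the pipeline's `predict` output.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import KernelPCA
from sklearn.metrics import make_scorer, v_measure_score
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy labelled data (assumption: stands in for the poster's real dataset).
X, y = make_blobs(n_samples=150, centers=3, random_state=0)

# Kernel PCA transforms the data, then k-means clusters the projection.
pipe = Pipeline([
    ("kpca", KernelPCA(n_components=2, kernel="rbf")),
    ("kmeans", KMeans(n_clusters=3, n_init=10, random_state=0)),
])

# v-measure compares true labels with the cluster ids from pipe.predict(X).
v_measure_scorer = make_scorer(v_measure_score)

# Hypothetical grid of gamma values for the RBF kernel.
gammas = [0.001, 0.01, 0.1, 1.0]
search = GridSearchCV(pipe, {"kpca__gamma": gammas},
                      scoring=v_measure_scorer, cv=3)

# y is ignored by KernelPCA and KMeans when fitting; only the scorer uses it.
search.fit(X, y)
best_gamma = search.best_params_["kpca__gamma"]
print(best_gamma, search.best_score_)
```

Because v-measure is invariant to permutations of the cluster ids, the arbitrary numbering that k-means assigns to clusters does not affect the score.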

Re: [Scikit-learn-general] PCA with missing data

2014-03-07 Thread Lee Zamparo
Hi Vijay, You would need to impute your missing values first to use the implementation of PCA in scikit-learn. Alternatively, you could roll your own (or find a package somewhere) for a Probabilistic PCA that *can* handle missing values in the data. Hope this helps, Lee. On Fri, Mar 7, 2014 at
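The "impute first, then PCA" route suggested here might look like the sketch below. It is only an illustration under stated assumptions: the random data, the 10% missing rate, and mean imputation are choices made for the example, and `SimpleImputer` from `sklearn.impute` assumes a recent scikit-learn release (at the time of this thread the equivalent class was `sklearn.preprocessing.Imputer`). It does not cover the Probabilistic PCA alternative mentioned in the message.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline

# Synthetic data with roughly 10% of entries knocked out (assumption).
rng = np.random.RandomState(0)
X = rng.normal(size=(100, 5))
mask = rng.rand(100, 5) < 0.1
X_missing = X.copy()
X_missing[mask] = np.nan

# Fill each missing entry with its column mean, then project with PCA.
model = make_pipeline(SimpleImputer(strategy="mean"), PCA(n_components=2))
X_reduced = model.fit_transform(X_missing)
print(X_reduced.shape)
```

Mean imputation is the simplest choice and shrinks the variance of the imputed columns, so the resulting components should be read with that in mind.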

Re: [Scikit-learn-general] Tackling Dataset bias

2013-08-19 Thread Lee Zamparo
Thanks for the pointers Peter. I'm doing an unrelated project on covariate shift, and this will be really useful. Lee. On Mon, Aug 19, 2013 at 12:46 PM, Peter Prettenhofer wrote: > Hi Yogesh, > > the work by John Blitzer that I mentioned used the second approach -- it's > described here: > > Bli

Re: [Scikit-learn-general] My talk has been accepted at PyCon AU!

2013-04-28 Thread Lee Zamparo
Congrats Robert! On Sun, Apr 28, 2013 at 7:56 AM, Robert Layton wrote: > I just received some good news. My talk "scikit-learn, machine learning > and cybercrime attribution" has been accepted! > > I'll be presenting between the 5th and 7th of July. For those that missed > the previous emails,

Re: [Scikit-learn-general] Participation in GSoC 2013

2013-03-26 Thread Lee Zamparo
AFAIK, you might not want all the missing values to be imputed at once, especially if the dimensions of X are large. Maybe something like: X_transformed = estimator.fit_transform(X) # X contains missing values X_subset = estimator.inverse_transform(X_transformed,row_subset) # impute only a subset

Re: [Scikit-learn-general] Flat is better than nested: Website edition

2013-03-04 Thread Lee Zamparo
+1 for the table of algorithms, as discussed in a previous thread. +1 also for the top level menu items, for a flatter hierarchy. On Mon, Mar 4, 2013 at 5:03 PM, Andreas Mueller wrote: > On 03/04/2013 10:59 PM, Robert Layton wrote: > > Sounds like some great changes. > > > > For the algorithm

Re: [Scikit-learn-general] GSoC 2012 pre-application

2012-04-05 Thread Lee Zamparo
Olivier Grisel wrote: > Le 2 avril 2012 18:06, Lee Zamparo a écrit : >> >> Regarding the suggested additions, I'm interested in Olivier's >> suggestion of Power Iteration Clustering, and seeing how it fares >> against kernel K-means as well as the convex exem

Re: [Scikit-learn-general] GSoC 2012 pre-application

2012-04-02 Thread Lee Zamparo
Hi everyone, Thanks for all your comments on my proposal. I apologize for not responding earlier, and I'll try to address each of your concerns or comments in this mail. @Olivier: my GitHub account is lzamparo. I don't have any prior Cython development experience, but I do have some exposure t

[Scikit-learn-general] GSoC 2012 pre-application

2012-03-29 Thread Lee Zamparo
Hello everyone, I'm a prospective applicant to GSoC 2012, and am drafting a proposal. I would really appreciate if you could spare some time to give me feedback. My proposal is centred around sklearn.cluster, so I would like to ask Andreas Muller, Olivier Grisel or Lars Buitinck if they would con