Re: [Scikit-learn-general] GSOC 2012

2012-03-07 Thread David Warde-Farley
On 2012-03-07, at 8:45 AM, Olivier Grisel wrote: > I think what DWF calls sparse coding is the LASSO implemented with > coordinate descent (sparse coding with a fixed dictionary). Indeed, the encoding step rather than the dictionary learning step. When benchmarking unsupervised feature learning m
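For readers unfamiliar with the encoding step mentioned above, here is a minimal sketch of sparse coding with a fixed dictionary, posed as a LASSO problem and solved by cyclic coordinate descent. It is a pure-Python illustration only, not the scikit-learn implementation; the function names (soft_threshold, sparse_encode_cd), the objective scaling 0.5 * ||x - sum_j w_j d_j||^2 + alpha * ||w||_1, and the fixed iteration count are all assumptions made for this example.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (proximal operator of the L1 norm)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sparse_encode_cd(x, atoms, alpha, n_iter=100):
    """Encode signal x over fixed dictionary atoms with an L1 penalty.

    x     : list of floats (the signal)
    atoms : list of dictionary atoms, each a list with the same length as x
    alpha : L1 penalty weight (hypothetical parameter name)
    """
    codes = [0.0] * len(atoms)
    residual = list(x)  # residual = x - sum_j codes[j] * atoms[j]
    for _ in range(n_iter):
        for j, atom in enumerate(atoms):
            norm_sq = sum(a * a for a in atom)
            if norm_sq == 0.0:
                continue  # an all-zero atom cannot explain anything
            # Correlation of atom j with the residual, with atom j's own
            # current contribution added back in.
            rho = sum(a * r for a, r in zip(atom, residual)) + norm_sq * codes[j]
            new_code = soft_threshold(rho, alpha) / norm_sq
            delta = new_code - codes[j]
            if delta != 0.0:
                residual = [r - delta * a for r, a in zip(residual, atom)]
                codes[j] = new_code
    return codes


if __name__ == '__main__':
    identity_atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(sparse_encode_cd([3.0, -0.5, 1.0], identity_atoms, alpha=1.0))
```

With an orthonormal dictionary such as the identity used in the example, each code is just the soft-thresholded correlation, which makes the sketch easy to check by hand.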

Re: [Scikit-learn-general] Interpretation of LogisticRegression coefficients in multiclass case

2012-03-07 Thread Alexandre Gramfort
> I guess Alex and I can mentor. approved Alex

Re: [Scikit-learn-general] Interpretation of LogisticRegression coefficients in multiclass case

2012-03-07 Thread Gael Varoquaux
On Wed, Mar 07, 2012 at 03:11:09PM +0100, Olivier Grisel wrote: > > Obviously doing this right is quite a lot of work. I think that my group > > could invest some efforts in this direction. We were starting to discuss > > this a bit. > Multinomial logistic regression with a tweaked coordinate des

Re: [Scikit-learn-general] Interpretation of LogisticRegression coefficients in multiclass case

2012-03-07 Thread Olivier Grisel
2012/3/6 Gael Varoquaux : > Yes, I was about to answer the same thing: SGD is great when n_samples >> n_features, but the situation n_samples << n_features also exists. > > In such situation, I believe that a cyclic coordinate descent with a > clever way of choosing the coordinates is the fastest

Re: [Scikit-learn-general] GSOC 2012

2012-03-07 Thread Alexandre Gramfort
> We could indeed merge both proposals in a single GSoC proposal, but I > would like to keep them as 2 separate steps with the two examples: > group lens movie recommendation and out-of-core NMF for topic modeling > on wikipedia text. > > We could also work on making the MiniBatchSparseDirectionary

Re: [Scikit-learn-general] GSOC 2012

2012-03-07 Thread Olivier Grisel
2012/3/6 Mathieu Blondel : > Even if they would be useful, I'd rather avoid projects like > "maintenance" or "speed things up". I think projects with a > well-identified goal are more likely to be accepted by the PSF. > > I like Olivier's proposals for SGD-based low-rank and non-negative > matrix f

Re: [Scikit-learn-general] GSOC 2012

2012-03-07 Thread Olivier Grisel
2012/3/6 Alexandre Gramfort : >> a) sparse coding is about 2 orders of magnitude slower than competing >>   implementations right now, making it kind of useless except in toy >>   1996-sized situations (I'm supposed to find a way to benchmark >>   this for Alex, but I can tell you that the situatio

Re: [Scikit-learn-general] Interpretation of LogisticRegression coefficients in multiclass case

2012-03-07 Thread Alexandre Gramfort
> I love that :) > Then I can finally put my MLP code somewhere ;) give it a start then. the convention should be that the gist contains 1 file with an "if __name__ == '__main__':" that contains an example that people can try. It can/should also contain test functions whose names start with test_ Alex
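A minimal sketch of a single-file gist following the convention described above: one file, test functions whose names start with test_, and a runnable example under the "if __name__ == '__main__':" guard. The moving_average function is a hypothetical stand-in for whatever code the gist would actually share; it is not taken from the thread.

```python
"""Hypothetical single-file gist laid out per the convention in this thread."""


def moving_average(values, window):
    """Return the simple moving average of values over a sliding window."""
    if window <= 0:
        raise ValueError("window must be a positive integer")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def test_moving_average_basic():
    # Test functions start with test_ so a test runner can discover them.
    assert moving_average([1, 2, 3, 4], 2) == [1.5, 2.5, 3.5]


def test_moving_average_window_too_large():
    # A window longer than the input yields no averages.
    assert moving_average([1, 2], 3) == []


if __name__ == '__main__':
    # Example that people can try by running the file directly.
    print(moving_average([1.0, 2.0, 3.0, 4.0], window=2))
```

Running the file directly executes the example, while a test runner that collects test_-prefixed functions (nose or pytest, for instance) can import the same file without triggering it.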

Re: [Scikit-learn-general] Interpretation of LogisticRegression coefficients in multiclass case

2012-03-07 Thread Andreas
On 03/07/2012 09:16 AM, Alexandre Gramfort wrote: >> For n_features > n_samples, I believe that coordinate descent is >> faster in the dual. A primal coordinate descent needs to optimize one >> w_i at a time. Therefore, if your data is high dimensional it can take >> time. Liblinear implements shri

Re: [Scikit-learn-general] HMM Documentation and Development

2012-03-07 Thread Gael Varoquaux
Hi David, There is a pull request that greatly improves the HMM implementation and documentation: https://github.com/scikit-learn/scikit-learn/pull/538 It should be merged any time now. Actually, I almost feel like merging it _now_ as it does improve the situation quite a lot. Additional help on docume

[Scikit-learn-general] HMM Documentation and Development

2012-03-07 Thread David Kadish
Hi, As part of my master's project, I'm going to be working pretty heavily with numpy/scipy and machine learning techniques like SVM and HMM. I noticed that the current HMM implementation lacks docs and has some stability issues and might be removed shortly. I'm interested in trying to revive it, b

Re: [Scikit-learn-general] Interpretation of LogisticRegression coefficients in multiclass case

2012-03-07 Thread Alexandre Gramfort
> For n_features > n_samples, I believe that coordinate descent is > faster in the dual. A primal coordinate descent needs to optimize one > w_i at a time. Therefore, if your data is high dimensional it can take > time. Liblinear implements shrinking to avoid revisiting some > coordinates. Maybe th