Re: [Scikit-learn-general] Multi-class sparse data

2012-09-05 Thread Olivier Grisel
2012/9/5 Ark : > >> How large (in bytes and in which format)? What are n_samples, >> n_features and n_classes? >> > > Input data is in the form of paragraphs from English literature > n_samples=1, n_features=100,000, n_classes=max 100[still collecting data] And how large in bytes? It seems th

Re: [Scikit-learn-general] Multi-class sparse data

2012-09-05 Thread Ark
Ark writes: > > > > How large (in bytes and in which format)? What are n_samples, > > n_features and n_classes? > > > > Input data is in the form of paragraphs from English literature So, raw data -> CountVectorizer -> test, train set -> sgd.fit -> predict is the flow. > n_samples=1,
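
A minimal sketch of the flow described above (raw text -> CountVectorizer -> train/test split -> sgd.fit -> predict), assuming a recent scikit-learn in which train_test_split lives in sklearn.model_selection. The toy paragraphs, the labels, and every parameter value are hypothetical stand-ins, not the poster's data or settings.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for "paragraphs from English literature".
paragraphs = [
    "the whale swam beside the ship",
    "the captain sailed the ship to sea",
    "the sailors watched the whale at sea",
    "the ship followed the whale north",
    "the detective examined the clue",
    "the detective questioned the suspect",
    "the suspect hid the clue",
    "the inspector arrested the suspect",
    "the lovers met in the garden",
    "the lovers danced in the garden",
    "she read his letter in the garden",
    "he wrote a letter to his love",
]
labels = ["sea"] * 4 + ["crime"] * 4 + ["romance"] * 4

# Step 1: raw text -> sparse count matrix (one row per paragraph).
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(paragraphs)

# Step 2: split into train and test sets, following the order given in the post.
# stratify keeps every class present on both sides of this tiny split.
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, stratify=labels, random_state=0
)

# Step 3: fit a linear model with SGD; it accepts the sparse matrix directly
# and handles multiple classes with a one-vs-rest scheme.
clf = SGDClassifier(random_state=0)
clf.fit(X_train, y_train)

# Step 4: predict labels for the held-out paragraphs.
predicted = clf.predict(X_test)
print(list(zip(y_test, predicted)))
```

The matrix stays sparse from the vectorizer through fit and predict, which is what keeps memory use manageable when n_features is large.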

Re: [Scikit-learn-general] ANN: scikit-learn 0.12

2012-09-05 Thread Yaroslav Halchenko
It might well have been a bug in numpy 1.7.0b1 (in my case I had 6 GB available). On Wed, 05 Sep 2012, Andreas Mueller wrote: > On 09/05/2012 09:21 PM, Jake Vanderplas wrote: > > I ran into this problem a few weeks ago on the clustering example - I > > figured it was just due to my under-powered netboo

Re: [Scikit-learn-general] Multi-class sparse data

2012-09-05 Thread Ark
> How large (in bytes and in which format)? What are n_samples, > n_features and n_classes? > Input data is in the form of paragraphs from English literature n_samples=1, n_features=100,000, n_classes=max 100[still collecting data]

Re: [Scikit-learn-general] ANN: scikit-learn 0.12

2012-09-05 Thread Andreas Mueller
On 09/05/2012 09:21 PM, Jake Vanderplas wrote: > I ran into this problem a few weeks ago on the clustering example - I > figured it was just due to my under-powered netbook. If you reduce > n_samples in plot_cluster_comparison.py (from 1500 to, say, 500), it > should run without a problem. Perhap

Re: [Scikit-learn-general] Multi-class sparse data

2012-09-05 Thread Olivier Grisel
2012/9/5 Ark : > What would be the best approach to classify a large dataset with sparse > features, into multiple categories. How large (in bytes and in which format)? What are n_samples, n_features and n_classes? > I referred to the multiclass page in the > sklearn documentation, but was no

[Scikit-learn-general] Multi-class sparse data

2012-09-05 Thread Ark
What would be the best approach to classify a large dataset with sparse features into multiple categories? I referred to the multiclass page in the sklearn documentation, but was not sure which one to use for multiclass probabilities [top n probabilities would be nice]. I tried usin
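
A hedged sketch of the "top n probabilities" idea from the question. The snippet above is truncated, so which estimator the thread settled on is not known here; MultinomialNB is swapped in purely as an assumption because it accepts sparse count features and exposes predict_proba. The helper name top_n_classes and the toy data are hypothetical.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB


def top_n_classes(clf, X, n):
    """Return, per row of X, the n most probable (label, probability) pairs."""
    proba = clf.predict_proba(X)  # shape (n_samples, n_classes)
    # Sort each row by probability, reverse to descending, keep n columns.
    order = np.argsort(proba, axis=1)[:, ::-1][:, :n]
    return [
        [(str(clf.classes_[j]), float(proba[i, j])) for j in row]
        for i, row in enumerate(order)
    ]


# Hypothetical toy corpus with three categories.
train_texts = [
    "the whale swam beside the ship",
    "the captain sailed the ship to sea",
    "the detective examined the clue",
    "the detective questioned the suspect",
    "the lovers met in the garden",
    "the lovers danced in the garden",
]
train_labels = ["sea", "sea", "crime", "crime", "romance", "romance"]

vectorizer = CountVectorizer()
clf = MultinomialNB()
clf.fit(vectorizer.fit_transform(train_texts), train_labels)

query = vectorizer.transform(["the whale followed the ship"])
results = top_n_classes(clf, query, n=2)
print(results)
```

Any classifier with a predict_proba method could be passed to the helper in place of MultinomialNB; only the sorting step is specific to the top-n request.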

Re: [Scikit-learn-general] ANN: scikit-learn 0.12

2012-09-05 Thread Jake Vanderplas
I ran into this problem a few weeks ago on the clustering example - I figured it was just due to my under-powered netbook. If you reduce n_samples in plot_cluster_comparison.py (from 1500 to, say, 500), it should run without a problem. Perhaps we should think about doing that in master, so th

Re: [Scikit-learn-general] ANN: scikit-learn 0.12

2012-09-05 Thread Yaroslav Halchenko
Indeed, it seems to build fine with numpy 1.6.2... so watch out -- if someone has time to look into it, now would be a good time to raise any concerns regarding the upcoming numpy 1.7 release. Cheers On Wed, 05 Sep 2012, Yaroslav Halchenko wrote: > has anyone run into a similar situation that d

Re: [Scikit-learn-general] ANN: scikit-learn 0.12

2012-09-05 Thread Yaroslav Halchenko
Has anyone run into a similar situation in which the documentation fails to build because running the examples during the Sphinx build requires too much RAM (or how many GB should normally be present? ;))? In my case I got: [ 2489.091989] Out of memory: Kill process 17211 (sphinx-build) score 792 or sacrifice

Re: [Scikit-learn-general] ANN: scikit-learn 0.12

2012-09-05 Thread Andreas Müller
Sorry, forgot to tag the final version. Will be fixed in a minute. - Original Message - From: "0.12 release of scikit-learn" To: [email protected] Sent: Wednesday, 5 September 2012 18:10:03 Subject: Re: [Scikit-learn-general] ANN: scikit-learn 0.12 tag me !

Re: [Scikit-learn-general] ANN: scikit-learn 0.12

2012-09-05 Thread 0.12 release of scikit-learn
tag me ! push me ! -- scikit-learn

Re: [Scikit-learn-general] Multiple Output Regression Trees on Sklearn

2012-09-05 Thread Andreas Müller
I think this is correct, the covariance is not accounted for. This is "the right thing to do" if you optimize for Hamming loss. - Original Message - From: "Flavio Vinicius" To: [email protected] Sent: Wednesday, 5 September 2012 17:18:15 Subject: Re: [Scikit

Re: [Scikit-learn-general] Multiple Output Regression Trees on Sklearn

2012-09-05 Thread Flavio Vinicius
You mention that: "In our case, when computing the impurity score with respect to a potential split, we simply average the impurity scores with respect to each output." So what you are saying is that you do not account for the covariance of outputs directly. This is somewhat accounted for when aver
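
The averaging described in the quoted sentence can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the scikit-learn implementation: variance is assumed as the per-output impurity (the usual choice for regression), the children of a split are assumed to be combined by sample-weighted average as in standard CART, and all function names are hypothetical.

```python
from statistics import pvariance


def node_impurity(rows):
    """Average, over outputs, of the per-output variance within one node.

    rows is a list of target tuples, one tuple per sample, e.g. (y1, y2).
    Each output is scored on its own, so covariance between outputs
    never enters the computation.
    """
    per_output = [pvariance(column) for column in zip(*rows)]
    return sum(per_output) / len(per_output)


def split_impurity(left_rows, right_rows):
    """Sample-weighted impurity of a candidate split (assumed CART-style)."""
    n_left, n_right = len(left_rows), len(right_rows)
    total = n_left + n_right
    return (
        n_left * node_impurity(left_rows) + n_right * node_impurity(right_rows)
    ) / total


# Two samples per child, two outputs per sample (made-up numbers).
left = [(1.0, 10.0), (3.0, 10.0)]   # variances 1.0 and 0.0 -> average 0.5
right = [(5.0, 0.0), (5.0, 4.0)]    # variances 0.0 and 4.0 -> average 2.0

print(node_impurity(left), node_impurity(right), split_impurity(left, right))
```

Because each column is scored independently before averaging, two outputs that move together and two that move in opposite directions produce the same score, which is the point raised in the message above.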

Re: [Scikit-learn-general] Multiple Output Regression Trees on Sklearn

2012-09-05 Thread Gilles Louppe
Hi Flavio, This is similar to [1, section 2.2.2 § "Learning"]. You can also find a complete description in our user guide [2]. [1]: http://www.montefiore.ulg.ac.be/services/stochastic/pubs/2009/DMWG09/dumont-visapp09-shortpaper.pdf [2]: http://scikit-learn.org/dev/modules/tree.html#multi-output-

[Scikit-learn-general] Multiple Output Regression Trees on Sklearn

2012-09-05 Thread Flavio Vinicius
Hello all, I just read the release announcement, congratulations! One new feature that caught my attention was: Regression Trees/Forests which support multiple outputs. Can someone point out any reference (papers) on which this implementation was based? For a while in the past I experimented with the Multivar

Re: [Scikit-learn-general] ANN: scikit-learn 0.12

2012-09-05 Thread Gilles Louppe
Congratulations! Thanks everyone for the good work! On Wednesday, 5 September 2012, bthirion wrote: > Congrats ! And again, thanks, Andy, > > B > > On 09/05/2012 12:38 AM, Andreas Mueller wrote: > > Dear fellow Pythonistas. > I am pleased to announce the release of scikit-learn 0.12. > This relea

Re: [Scikit-learn-general] ANN: scikit-learn 0.12

2012-09-05 Thread Peter Prettenhofer
2012/9/5 Andreas Mueller : > On 09/05/2012 09:15 AM, Peter Prettenhofer wrote: >> 2012/9/5 Peter Prettenhofer : >>> 2012/9/5 Andreas Mueller : On 09/05/2012 08:48 AM, Peter Prettenhofer wrote: > 2012/9/5 Lars Buitinck : >> 2012/9/5 Andreas Mueller : >>> I am pleased to announce the

Re: [Scikit-learn-general] ANN: scikit-learn 0.12

2012-09-05 Thread Andreas Mueller
On 09/05/2012 09:15 AM, Peter Prettenhofer wrote: > 2012/9/5 Peter Prettenhofer : >> 2012/9/5 Andreas Mueller : >>> On 09/05/2012 08:48 AM, Peter Prettenhofer wrote: 2012/9/5 Lars Buitinck : > 2012/9/5 Andreas Mueller : >> I am pleased to announce the release of scikit-learn 0.12.

Re: [Scikit-learn-general] ANN: scikit-learn 0.12

2012-09-05 Thread Andreas Mueller
On 09/05/2012 08:58 AM, Peter Prettenhofer wrote: > 2012/9/5 Andreas Mueller : >> On 09/05/2012 08:48 AM, Peter Prettenhofer wrote: >>> 2012/9/5 Lars Buitinck : 2012/9/5 Andreas Mueller : > I am pleased to announce the release of scikit-learn 0.12. > This release adds several new featu

Re: [Scikit-learn-general] ANN: scikit-learn 0.12

2012-09-05 Thread Peter Prettenhofer
2012/9/5 Peter Prettenhofer : > 2012/9/5 Andreas Mueller : >> On 09/05/2012 08:48 AM, Peter Prettenhofer wrote: >>> 2012/9/5 Lars Buitinck : 2012/9/5 Andreas Mueller : > I am pleased to announce the release of scikit-learn 0.12. > This release adds several new features, for example >>>

Re: [Scikit-learn-general] ANN: scikit-learn 0.12

2012-09-05 Thread Peter Prettenhofer
2012/9/5 Andreas Mueller : > On 09/05/2012 08:48 AM, Peter Prettenhofer wrote: >> 2012/9/5 Lars Buitinck : >>> 2012/9/5 Andreas Mueller : I am pleased to announce the release of scikit-learn 0.12. This release adds several new features, for example multidimensional scaling (MDS), mul

Re: [Scikit-learn-general] ANN: scikit-learn 0.12

2012-09-05 Thread Andreas Mueller
On 09/05/2012 08:48 AM, Peter Prettenhofer wrote: > 2012/9/5 Lars Buitinck : >> 2012/9/5 Andreas Mueller : >>> I am pleased to announce the release of scikit-learn 0.12. >>> This release adds several new features, for example >>> multidimensional scaling (MDS), multi-task Lasso >>> and multi-output

Re: [Scikit-learn-general] ANN: scikit-learn 0.12

2012-09-05 Thread Peter Prettenhofer
2012/9/5 Lars Buitinck : > 2012/9/5 Andreas Mueller : >> I am pleased to announce the release of scikit-learn 0.12. >> This release adds several new features, for example >> multidimensional scaling (MDS), multi-task Lasso >> and multi-output decision and regression forests. > > Thanks for all the

Re: [Scikit-learn-general] ANN: scikit-learn 0.12

2012-09-05 Thread Matthieu Brucher
2012/9/5 Nelle Varoquaux > > > On 5 September 2012 09:28, Matthieu Brucher wrote: > >> >> >> 2012/9/5 Nelle Varoquaux >> >>> >>> >>> On 5 September 2012 08:08, Matthieu Brucher >>> wrote: >>> Excellent work! I have a question on MDS. Is it the classic MDS or something else? (Aski

Re: [Scikit-learn-general] ANN: scikit-learn 0.12

2012-09-05 Thread Nelle Varoquaux
On 5 September 2012 09:28, Matthieu Brucher wrote: > > > 2012/9/5 Nelle Varoquaux > >> >> >> On 5 September 2012 08:08, Matthieu Brucher >> wrote: >> >>> Excellent work! >>> I have a question on MDS. Is it the classic MDS or something else? >>> (Asking the question as PCA is the classic MDS). It

Re: [Scikit-learn-general] ANN: scikit-learn 0.12

2012-09-05 Thread Matthieu Brucher
2012/9/5 Nelle Varoquaux > > > On 5 September 2012 08:08, Matthieu Brucher wrote: > >> Excellent work! >> I have a question on MDS. Is it the classic MDS or something else? >> (Asking the question as PCA is the classic MDS). It seems to be when the >> distance matrix is Euclidean? >> > > It is in

Re: [Scikit-learn-general] ANN: scikit-learn 0.12

2012-09-05 Thread Alexandre Gramfort
congrats sklearners and Andy for pulling this off ! Alex On Wed, Sep 5, 2012 at 7:50 AM, bthirion wrote: > Congrats ! And again, thanks, Andy, > > B > > > On 09/05/2012 12:38 AM, Andreas Mueller wrote: > > Dear fellow Pythonistas. > I am pleased to announce the release of scikit-learn 0.12. > Th

Re: [Scikit-learn-general] ANN: scikit-learn 0.12

2012-09-05 Thread Nelle Varoquaux
On 5 September 2012 08:08, Matthieu Brucher wrote: > Excellent work! > I have a question on MDS. Is it the classic MDS or something else? (Asking > the question as PCA is the classic MDS). It seems to be when the distance > matrix is Euclidean? > It is indeed the classical MDS. When the whole sim

Re: [Scikit-learn-general] ANN: scikit-learn 0.12

2012-09-05 Thread Matthieu Brucher
After checking the code, the metric MDS used here is the classic MDS (the stress function is the sum of squared discrepancies between original distances and computed distances). I didn't check the speed yet (currently on the road), but the implementation may benefit from using PCA directly (just like I
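
The stress function mentioned above can be written out as a small sketch. It assumes a raw (unnormalized) stress summed once over each pair i < j; whether the library counts each pair once or twice, or applies a normalization, is not stated in the message. The function name raw_stress and the example points are hypothetical.

```python
import math
from itertools import combinations


def raw_stress(distances, embedding):
    """Sum of squared discrepancies between target and embedded distances.

    distances is a symmetric matrix of original pairwise distances;
    embedding is a list of coordinate tuples, one per point.
    """
    total = 0.0
    for i, j in combinations(range(len(embedding)), 2):
        discrepancy = distances[i][j] - math.dist(embedding[i], embedding[j])
        total += discrepancy ** 2
    return total


# Pairwise distances of a 3-4-5 right triangle.
original_distances = [
    [0.0, 3.0, 4.0],
    [3.0, 0.0, 5.0],
    [4.0, 5.0, 0.0],
]

exact_2d = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]   # reproduces every distance
flattened_1d = [(0.0,), (3.0,), (4.0,)]           # distorts one pair (5 becomes 1)

print(raw_stress(original_distances, exact_2d))      # 0.0
print(raw_stress(original_distances, flattened_1d))  # 16.0
```

A two-dimensional embedding reproduces the triangle exactly, so its stress is zero; squeezing the points onto a line forces one pair to the wrong distance and the stress rises accordingly.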

Re: [Scikit-learn-general] ANN: scikit-learn 0.12

2012-09-05 Thread Andreas Mueller
Hey guys! Thanks for all the acknowledgement. See you on github ;) Andy