Re: [Scikit-learn-general] parallel GMM

2014-07-01 Thread Valerio Maggio
Hi Sturla and Yuan. Yesterday I looked into this and I would like to share with you my two cents. Yuan Luo wrote: > Hi, > Does anyone know how I can parallelize GMM fitting on a moderately > big matrix (say, 390,000 x 400) with 200 components? Actually, with scikit you can't do this ou

Re: [Scikit-learn-general] parallel GMM

2014-07-01 Thread Sturla Molden
Yuan Luo wrote: > Hi, > Does anyone know how I can parallelize GMM fitting on a moderately > big matrix (say, 390,000 x 400) with 200 components? I am not sure about the GMM code in scikit-learn, but the EM algorithm for GMMs is very easy to vectorize. There are several ways to do this: 1.

Re: [Scikit-learn-general] Extending TfIdf Vectorizer to use given idf set

2014-07-01 Thread Lars Buitinck
2014-07-01 23:58 GMT+02:00 Michael Eickenberg : > (the 4th one is typically a kwarg it didn't care about) Ah:
    from elasticsearch import Elasticsearch
    es = Elasticsearch()
    hits = [es.termvector('20news', 'post', i, fields=['text'])
            for i in range(1, 4)]
does the trick, and getting the number of d

Re: [Scikit-learn-general] [ANN] scikit-learn 0.15.0b2 released

2014-07-01 Thread Valerio Maggio
On 01 Jul 2014, at 22:58, Valerio Maggio wrote: > > On 01 Jul 2014, at 22:55, Gael Varoquaux wrote: > >> Thank you Olivier, you rock! > > +1 :) > Thanks a lot Oliver! Olivier! (I just realised it...) Apologies for munging your name... auto-correction's fault !-) Valerio

Re: [Scikit-learn-general] Extending TfIdf Vectorizer to use given idf set

2014-07-01 Thread Michael Eickenberg
(the 4th one is typically a kwarg it didn't care about) On Tuesday, July 1, 2014, Lars Buitinck wrote: > 2014-07-01 23:44 GMT+02:00 Joel Nothman >: > > Calculating TfIdf really isn't that hard. It's much easier for you to do > so > > while transforming that into DictVectorizer input than for th

Re: [Scikit-learn-general] Extending TfIdf Vectorizer to use given idf set

2014-07-01 Thread Lars Buitinck
2014-07-01 23:44 GMT+02:00 Joel Nothman : > Calculating TfIdf really isn't that hard. It's much easier for you to do so > while transforming that into DictVectorizer input than for the library to be > everything to everyone. Indeed. I just indexed 20news in ES, then did $ curl -XGET 'http://loca

Re: [Scikit-learn-general] Extending TfIdf Vectorizer to use given idf set

2014-07-01 Thread Joel Nothman
Calculating TfIdf really isn't that hard. It's much easier for you to do so while transforming that into DictVectorizer input than for the library to be everything to everyone. On 1 July 2014 17:37, Geetu Ambwani wrote: > The term vector output from ElasticSearch is like so: (solr is also > sim
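A minimal sketch of the point above, assuming each document arrives as a term-vector response shaped like the ElasticSearch output quoted in this thread. The `terms`, `term_freq` and `doc_freq` keys, the helper name `tfidf_from_term_vector`, the toy response, and the smoothed idf formula are all assumptions for illustration (per-term document frequencies are only present when term statistics are requested), not something the posters specified:

```python
import math


def tfidf_from_term_vector(response, field="text"):
    """Turn one term-vector response into a {term: tf-idf weight} dict.

    Hypothetical helper: assumes the response carries per-term
    "term_freq" and "doc_freq" plus a field-level "doc_count".
    """
    field_data = response["term_vectors"][field]
    n_docs = field_data["field_statistics"]["doc_count"]
    weights = {}
    for term, stats in field_data["terms"].items():
        tf = stats["term_freq"]
        df = stats["doc_freq"]
        # Smoothed idf (an assumed choice): ln((1 + n) / (1 + df)) + 1
        idf = math.log((1.0 + n_docs) / (1.0 + df)) + 1.0
        weights[term] = tf * idf
    return weights


# Toy response with made-up counts, mimicking the quoted structure.
response = {
    "_id": "1",
    "term_vectors": {
        "text": {
            "field_statistics": {"doc_count": 2},
            "terms": {
                "twitter": {"term_freq": 2, "doc_freq": 2},
                "test": {"term_freq": 1, "doc_freq": 1},
            },
        }
    },
}

weights = tfidf_from_term_vector(response)
```

A list of such dicts, one per document, is the kind of input `DictVectorizer().fit_transform(...)` accepts. Since the values are already weighted in this sketch, a `TfidfTransformer` step after it would reweight them a second time.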

Re: [Scikit-learn-general] Extending TfIdf Vectorizer to use given idf set

2014-07-01 Thread Geetu Ambwani
The term vector output from ElasticSearch is like so: (solr is also similar) { "_id": "1", "_index": "twitter", "_type": "tweet", "_version": 1, "found": true, "term_vectors": { "text": { "field_statistics": { "doc_count": 2,

Re: [Scikit-learn-general] Extending TfIdf Vectorizer to use given idf set

2014-07-01 Thread Joel Nothman
Pulling the IDF out of Lucene is a little bit trickier, but otherwise DictVectorizer pipelined with TfidfTransformer should be able to do this. On 1 July 2014 16:40, Lars Buitinck wrote: > 2014-07-01 21:03 GMT+02:00 Geetu Ambwani : > > I imagine this transformer would be useful to others who us

Re: [Scikit-learn-general] [ANN] scikit-learn 0.15.0b2 released

2014-07-01 Thread Valerio Maggio
On 01 Jul 2014, at 22:55, Gael Varoquaux wrote: > Thank you Olivier, you rock! +1 :) Thanks a lot Oliver! Valerio -- Open source business process management suite built on Java and Eclipse Turn processes into business

Re: [Scikit-learn-general] [ANN] scikit-learn 0.15.0b2 released

2014-07-01 Thread Gael Varoquaux
Thank you Olivier, you rock! Gaël On Tue, Jul 01, 2014 at 10:51:42PM +0200, Olivier Grisel wrote: > Hi all, > I have finally cut a new beta release, namely: 0.15.0b2. The source > tarball and binary wheels for OSX and Win32 are available on PyPI: > https://pypi.python.org/pypi/scikit-learn/

[Scikit-learn-general] [ANN] scikit-learn 0.15.0b2 released

2014-07-01 Thread Olivier Grisel
Hi all, I have finally cut a new beta release, namely: 0.15.0b2. The source tarball and binary wheels for OSX and Win32 are available on PyPI: https://pypi.python.org/pypi/scikit-learn/0.15.0b2 You can install / upgrade with: pip install scikit-learn==0.15.0b2 As usual you need numpy a

Re: [Scikit-learn-general] Extending TfIdf Vectorizer to use given idf set

2014-07-01 Thread Lars Buitinck
2014-07-01 21:03 GMT+02:00 Geetu Ambwani : > I imagine this transformer would be useful to others who use lucene for text > analysis and already have access to term vectors and have the partial > pipeline but might still want access to the various weighting schemes > available in TfidfVectorizer (e

[Scikit-learn-general] parallel GMM

2014-07-01 Thread Yuan Luo
Hi, Does anyone know how I can parallelize GMM fitting on a moderately big matrix (say, 390,000 x 400) with 200 components? Best, Yuan

[Scikit-learn-general] Extending TfIdf Vectorizer to use given idf set

2014-07-01 Thread Geetu Ambwani
Hi All I have been working on a news classification project using documents indexed in ElasticSearch as my training set. So my documents are analyzed using Lucene analyzers and I have access to the term vectors. ( http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-termvecto

Re: [Scikit-learn-general] [newb] online training?

2014-07-01 Thread Olivier Grisel
2014-07-01 20:22 GMT+02:00 Neal Becker : > Olivier Grisel wrote: > >> Some models have a partial_fit method for incremental and out-of-core >> learning. >> >> See for instance the documentation of the development version (that >> matches version 0.15.0b1 or the master branch): >> >> http://scikit-lear

Re: [Scikit-learn-general] [newb] online training?

2014-07-01 Thread Neal Becker
Olivier Grisel wrote: > Some models have a partial_fit method for incremental and out-of-core > learning. > > See for instance the documentation of the development version (that > matches version 0.15.0b1 or the master branch): > > http://scikit-learn.org/dev/auto_examples/applications/plot_out_of_

Re: [Scikit-learn-general] [newb] online training?

2014-07-01 Thread Olivier Grisel
Some models have a partial_fit method for incremental and out-of-core learning. See for instance the documentation of the development version (that matches version 0.15.0b1 or the master branch): http://scikit-learn.org/dev/auto_examples/applications/plot_out_of_core_classification.html -- Olivier

Re: [Scikit-learn-general] [newb] online training?

2014-07-01 Thread Gael Varoquaux
> I wonder if scikit-learn could be used in a similar manner? Yes. Some models support a 'partial_fit' method that does what you want. Typically for linear models, it would be the SGDClassifier. G
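A rough sketch of what that could look like for the setup described in the original question (100 trials, 2 training vectors per trial). Because the question is about regression, this uses `SGDRegressor` rather than the `SGDClassifier` named above; both expose `partial_fit`. The synthetic data, the three-feature shape and the `true_w` weights are made up for illustration:

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.RandomState(0)
true_w = np.array([1.5, -2.0, 0.5])  # hypothetical target weights

model = SGDRegressor(random_state=0)

# 100 trials, 2 freshly generated training vectors per trial.
for trial in range(100):
    X_batch = rng.normal(size=(2, 3))
    y_batch = X_batch.dot(true_w)  # noise-free targets, for simplicity
    # Each call updates the existing weights instead of refitting.
    model.partial_fit(X_batch, y_batch)

# The model can be evaluated at any point between updates.
X_new = rng.normal(size=(5, 3))
predictions = model.predict(X_new)
```

With so few samples and the default learning-rate schedule the weights will only have moved part of the way towards `true_w`. In practice the inputs would usually be scaled and the trials continued until the error stops improving.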

[Scikit-learn-general] [newb] online training?

2014-07-01 Thread Neal Becker
I am working on a regression problem. Currently I'm using pybrain with a classic neural net approach. I iterate over some number (100) of trials. For each trial, I generate some number (2) of training vectors. The training is "online", in the sense that I feed 2 vectors, evaluate the ac

Re: [Scikit-learn-general] Help for learning/contributing

2014-07-01 Thread Ignacio Rossi
Oh, didn't see that, thanks! 2014-07-01 6:14 GMT-03:00 Olivier Grisel : > This PR has been made obsolete by another that was already merged (see > the comments). > > -- > Olivier

Re: [Scikit-learn-general] Multi-class AND multilabel learning/prediction

2014-07-01 Thread Arnaud Joly
Hi, Can you describe your problem? Do you mean multi-output multi-class? Best, Arnaud On 01 Jul 2014, at 11:13, Gundala Viswanath wrote: > According to this documentation here: > http://scikit-learn.org/stable/modules/multiclass.html > > The API listed there does EITHER multi-class OR multi-l

Re: [Scikit-learn-general] Help for learning/contributing

2014-07-01 Thread Olivier Grisel
This PR has been made obsolete by another that was already merged (see the comments). -- Olivier

[Scikit-learn-general] Multi-class AND multilabel learning/prediction

2014-07-01 Thread Gundala Viswanath
According to this documentation here: http://scikit-learn.org/stable/modules/multiclass.html The API listed there does EITHER multi-class OR multi-label. Is there a way I can construct a learning/prediction scheme that is BOTH multi-class AND multi-label? - G. V.

Re: [Scikit-learn-general] Help for learning/contributing

2014-07-01 Thread Ignacio Rossi
Hi Joel, I sent this email on Friday, but it got stuck in some revision queue because of the attachment size, so I'm repeating it with a link :P https://github.com/pignacio/scikit-learn/blob/loo_is_bad_doc/doc/images/cross_validation_comparison.svg In https://github.com/scikit-learn/scikit-lea