Re: [scikit-learn] HashingVectorizer slow in version 0.18

2016-10-11 Thread Andreas Mueller
Please open an issue on the issue tracker: https://github.com/scikit-learn/scikit-learn/issues
On 10/11/2016 08:19 AM, Gabriel Trautmann wrote:
> Thank you for your response, have Windows 7 Enterprise 64 bit / Intel Xeon E5 2640 CPU, same problem on two similar machines python-3.5.2-amd64.exe -

Re: [scikit-learn] HashingVectorizer slow in version 0.18

2016-10-11 Thread Mathieu Blondel
On Tue, Oct 11, 2016 at 10:49 PM, Gael Varoquaux <gael.varoqu...@normalesup.org> wrote:
> Could it be a case of compilation: it seems to me that we are compiling MKL vs non MKL builds.
The hashing vectorizer is written in Cython and doesn't use BLAS, though.
Mathieu

Re: [scikit-learn] HashingVectorizer slow in version 0.18

2016-10-11 Thread Gael Varoquaux
Could it be a case of compilation: it seems to me that we are compiling MKL vs non MKL builds.

Re: [scikit-learn] HashingVectorizer slow in version 0.18

2016-10-11 Thread Piotr Bialecki
I just tested it on my Ubuntu machine and could not see any performance issues (5.68 seconds in scikit-learn 0.17 vs. 6.67 seconds in scikit-learn 0.18). However, on another Windows 10 machine I could indeed see this issue:
scikit-learn 0.17.1. Numpy 1.11.1. Python 2.7.12 AMD64
Vectorizing 20new

Re: [scikit-learn] HashingVectorizer slow in version 0.18

2016-10-11 Thread Olivier Grisel
That's really weird. I don't have a Windows machine handy at the moment. It would be nice if someone else could confirm. Could you please run the Python profiler on this to see where the time is spent on the slow setup? -- Olivier
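For readers who want to follow the profiling suggestion, here is a minimal sketch of one way to do it with the standard-library `cProfile` and `pstats` modules. It is not the script used in this thread. The helper `profile_call` and the stand-in workload `toy_hash_documents` are hypothetical names introduced only for illustration; the stand-in is pure Python so the sketch runs without scikit-learn installed.

```python
import cProfile
import io
import pstats
import zlib


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile; return (result, text report of the top entries)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the most expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


def toy_hash_documents(documents, n_features=2 ** 20):
    """Hypothetical stand-in workload: hash each token into a bucket count.

    This is NOT how HashingVectorizer is implemented; it only gives the
    profiler something to measure.
    """
    rows = []
    for doc in documents:
        counts = {}
        for token in doc.lower().split():
            index = zlib.crc32(token.encode("utf-8")) % n_features
            counts[index] = counts.get(index, 0) + 1
        rows.append(counts)
    return rows


if __name__ == "__main__":
    docs = ["the quick brown fox", "jumps over the lazy dog"] * 1000
    rows, report = profile_call(toy_hash_documents, docs)
    print(report)
```

To profile the real case, one could pass `HashingVectorizer().transform` and the list of documents to `profile_call` instead of the stand-in, then compare the reports from the fast and slow setups.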

Re: [scikit-learn] HashingVectorizer slow in version 0.18

2016-10-11 Thread Gabriel Trautmann
Thank you for your response. I have Windows 7 Enterprise 64 bit / Intel Xeon E5 2640 CPU, same problem on two similar machines.
python-3.5.2-amd64.exe - fresh installation
numpy-1.11.2+mkl-cp35-cp35m-win_amd64.whl - from Christoph Gohlke
scipy-0.18.1-cp35-cp35m-win_amd64.whl
pip install scikit-lean

Re: [scikit-learn] HashingVectorizer slow in version 0.18

2016-10-11 Thread Olivier Grisel
I cannot reproduce such a degradation on my machine:
(sklearn-0.17)ogrisel@is146148:~/code/scikit-learn$ python ~/tmp/bench_vectorizer.py
scikit-learn 0.17.1. Numpy 1.11.2. Python 3.5.0 x86_64
Vectorizing 20newsgroup 11314 documents
Vectorization completed in 4.033604383468628 seconds, resulting

[scikit-learn] HashingVectorizer slow in version 0.18

2016-10-11 Thread Gabriel Trautmann
Hi,
After upgrading to scikit-learn 0.18, HashingVectorizer is about 10 times slower.
Before:
scikit-learn 0.17. Numpy 1.11.2. Python 3.5.2 AMD64
Vectorizing 20newsgroup 11314 documents
Vectorization completed in 4.594092130661011 seconds, resulting shape (11314, 1048576)
After upgrade:
scik