[Scikit-learn-general] Negative tf-idf weight?

2011-12-23 Thread xinfan meng
Hi. In the current implementation of TfidfTransformer, it is possible to have a negative tf-idf weight. In the IR area, that probably makes sense, but for a text classification task I would never expect a negative value. For example, if I use the tf-idf weighted matrix in Naive Bayes, the c
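The snippet above does not say which idf formula is in use, so the following is only a sketch of how a negative weight can arise. It assumes a smoothed variant, idf = log(n_docs / (1 + doc_freq)), and compares it with the textbook form log(n_docs / doc_freq). Both function names are hypothetical and are not part of the scikit-learn API.

```python
import math


def textbook_idf(n_docs, doc_freq):
    # Textbook form: never negative, because doc_freq <= n_docs.
    return math.log(n_docs / doc_freq)


def smoothed_idf(n_docs, doc_freq):
    # Assumed variant (not confirmed by the thread): 1 is added to the
    # document frequency only, so the ratio can drop below 1.
    return math.log(n_docs / (1.0 + doc_freq))


n_docs = 4
for doc_freq in (1, 3, 4):
    # A term that occurs in every document (doc_freq == n_docs) gets a
    # negative weight under the assumed smoothed variant.
    print(doc_freq, textbook_idf(n_docs, doc_freq), smoothed_idf(n_docs, doc_freq))
```

Under that assumption, any term present in every document has a ratio below 1 and therefore a negative logarithm, and multiplying by a positive term frequency keeps the sign.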

[Scikit-learn-general] Working on Python 3 support

2011-12-23 Thread Olivier Grisel
Hi all, In order to help us make scikit-learn work on Python 3.x, I have set up a new Jenkins slave configuration to build the project using Python 3.2.2 and the latest stable releases of NumPy and SciPy. The console log of a failed build such as the following shows us what remains to be achieved: https

Re: [Scikit-learn-general] KMeans implementation in C with OpenMP

2011-12-23 Thread Olivier Grisel
2011/12/23 Gael Varoquaux: > The reason that we have integrated openmp code in the scikit so far, is that > it does not seem that there is a reliable way of testing if a compiler > supports it with distutils. The danger is thus to break the build for certain > environments. This is very true, w

Re: [Scikit-learn-general] KMeans implementation in C with OpenMP

2011-12-23 Thread Gael Varoquaux
On Fri, Dec 23, 2011 at 10:42:45PM +0100, Gael wrote: > The reason that we have integrated openmp code in the scikit so far, is I meant: that we have not integrated > that it does not seem that there is a reliable way of testing if a > compiler supports it with distutils. The danger is thus to

Re: [Scikit-learn-general] KMeans implementation in C with OpenMP

2011-12-23 Thread Gael Varoquaux
The reason that we have integrated openmp code in the scikit so far, is that it does not seem that there is a reliable way of testing if a compiler supports it with distutils. The danger is thus to break the build for certain environments. Gael - Original message - > 2011/12/23 Benjamin

Re: [Scikit-learn-general] KMeans implementation in C with OpenMP

2011-12-23 Thread Olivier Grisel
2011/12/23 Benjamin Hepp : > Hi, > > I was wondering about the KMeans implementation in scikit-learn. From a > quick scan of the code I see that the main stuff is implemented in > Cython but it's spread in two different functions for the m- and the > e-step and the main loop is in python. I'm using

Re: [Scikit-learn-general] KMeans implementation in C with OpenMP

2011-12-23 Thread Kenneth C. Arnold
It may be relevant to note that Cython has recently gained some OpenMP support: http://docs.cython.org/src/userguide/parallelism.html -- I haven't tried it, but perhaps it could help improve the scikit-learn implementation. -Ken On Dec 23, 2011 7:31 AM, "Benjamin Hepp" wrote: > > Hi, > > I was

Re: [Scikit-learn-general] Zipped dump in joblib

2011-12-23 Thread Gael Varoquaux
On Fri, Dec 23, 2011 at 12:18:49PM +0100, Olivier Grisel wrote: > Looks good too here (just did a bunch of calls to dumpz / loadz on > various objects). Hum, so you tried out an oldish version of the code, as dumpz / loadz have disappeared and have been replaced by a 'zipped=True'. > What about i

[Scikit-learn-general] KMeans implementation in C with OpenMP

2011-12-23 Thread Benjamin Hepp
Hi, I was wondering about the KMeans implementation in scikit-learn. From a quick scan of the code I see that the main stuff is implemented in Cython, but it's split across two different functions for the m- and the e-step, and the main loop is in Python. I'm using my own KMeans routine written as a Py
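For readers unfamiliar with the structure described above, here is a minimal pure-Python sketch of k-means with a separate e-step (assignment), a separate m-step (center update), and a Python main loop. It is an illustration only, not scikit-learn's Cython code; the names `e_step`, `m_step`, and `kmeans` are hypothetical.

```python
def e_step(points, centers):
    """Assign each point to the index of its nearest center."""
    labels = []
    for p in points:
        # Squared Euclidean distance from p to every center.
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
        labels.append(dists.index(min(dists)))
    return labels


def m_step(points, labels, centers):
    """Recompute each center as the mean of its assigned points."""
    new_centers = []
    for k, old in enumerate(centers):
        members = [p for p, label in zip(points, labels) if label == k]
        if not members:
            # Leave a center with no assigned points where it is.
            new_centers.append(old)
            continue
        new_centers.append(
            tuple(sum(m[d] for m in members) / len(members)
                  for d in range(len(old)))
        )
    return new_centers


def kmeans(points, centers, max_iter=100):
    """Alternate e-step and m-step until the labels stop changing."""
    labels = None
    for _ in range(max_iter):
        new_labels = e_step(points, centers)
        if new_labels == labels:
            break
        labels = new_labels
        centers = m_step(points, labels, centers)
    return labels, centers


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, centers = kmeans(points, centers=[(0.0, 0.0), (10.0, 10.0)])
print(labels, centers)
```

Each pass through the main loop calls the two steps once, which is the per-iteration overhead a single compiled routine would avoid.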

Re: [Scikit-learn-general] Zipped dump in joblib

2011-12-23 Thread Olivier Grisel
Looks good here too (just did a bunch of calls to dumpz / loadz on various objects). What about implementing a dump(obj, path, compress='gzip') with the standard multi-file output, as discussed on the plane? Do you still have this in mind, or is the zip archive enough in your opinion? See also: ht