Re: [Scikit-learn-general] Using TFxIDF with HashingVectorizer

2014-09-08 Thread Apu Mishra
Lars Buitinck writes: > The way to combine HV and > Tfidf is > > hashing = HashingVectorizer(non_negative=True, norm=None) > tfidf = TfidfTransformer() > hashing_tfidf = Pipeline([("hashing", hashing), ("tfidf", tfidf)]) > I notice your use of the non_negative option in HashingVectorizer(), whe
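
For readers on a recent scikit-learn, the quoted pipeline can be sketched as follows. This is a minimal sketch, not Lars's exact code: it assumes scikit-learn 0.19 or later, where `alternate_sign=False` is the closest current equivalent of the `non_negative=True` option quoted above (that option was later removed). The toy corpus and the `n_features` value are assumptions made for illustration.

```python
from sklearn.feature_extraction.text import HashingVectorizer, TfidfTransformer
from sklearn.pipeline import Pipeline

# Assumption: scikit-learn >= 0.19. `alternate_sign=False` keeps hashed counts
# non-negative, which stands in for the 2014-era `non_negative=True`.
hashing = HashingVectorizer(alternate_sign=False, norm=None, n_features=2**10)
tfidf = TfidfTransformer()
hashing_tfidf = Pipeline([("hashing", hashing), ("tfidf", tfidf)])

# Hypothetical toy corpus, only to show the call sequence.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]

# The hashing step is stateless; only the IDF weights are learned in fit.
X = hashing_tfidf.fit_transform(docs)
print(X.shape)
```

Normalization is left to TfidfTransformer (`norm=None` on the hashing step), so the IDF weights are applied to raw hashed counts.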

Re: [Scikit-learn-general] Backward compat policy in utils

2014-09-08 Thread Joel Nothman
I'm happy with these proposals, but expect that some users will find themselves using sparsefuncs or extmath. On 9 September 2014 07:31, Kyle Kastner wrote: > I agree as well. Maybe default to everything other than validation > private? Then see what people want to become public? Don't know wha

Re: [Scikit-learn-general] Backward compat policy in utils

2014-09-08 Thread Kyle Kastner
I agree as well. Maybe default to everything other than validation private? Then see what people want to become public? Don't know what nilearn is using but that should obviously be public too... On Mon, Sep 8, 2014 at 5:17 PM, Olivier Grisel wrote: > +1 as well for the combined proposal of Gael

Re: [Scikit-learn-general] Backward compat policy in utils

2014-09-08 Thread Olivier Grisel
+1 as well for the combined proposal of Gael and Mathieu (explicit __all__ in sklearn/utils/__init__.py) + prefixing private utils with `_`. -- Olivier
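
The convention proposed here (an explicit `__all__` plus a `_` prefix for private helpers) can be illustrated with a small self-contained sketch. The module name `demo_utils` and the helpers `check_rows` and `_as_list` are hypothetical and are not taken from sklearn/utils.

```python
import sys
import types

# Hypothetical module body standing in for a utils package.
source = '''
__all__ = ["check_rows"]

def check_rows(x):
    """Public helper: listed in __all__, so covered by a compatibility promise."""
    return _as_list(x)

def _as_list(x):
    """Private helper: underscore prefix, may change without notice."""
    return list(x)
'''

# Build the module in memory and register it so it can be imported by name.
demo_utils = types.ModuleType("demo_utils")
exec(source, demo_utils.__dict__)
sys.modules["demo_utils"] = demo_utils

# A star-import only picks up the names listed in __all__.
namespace = {}
exec("from demo_utils import *", namespace)
exported = sorted(name for name in namespace if not name.startswith("__"))
print(exported)  # ['check_rows']
```

The private helper stays reachable as `demo_utils._as_list`, so the underscore is a signal to users rather than an access restriction.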

Re: [Scikit-learn-general] scikit-learn 0.15.2 is out!

2014-09-08 Thread Yaroslav Halchenko
On Mon, 08 Sep 2014, Yaroslav Halchenko wrote: > hm... actually not clear since it claims that it is because of missing > bdepends > scikit-learn build-depends on missing: > - libsvm-dev (>= 2.84.0) > while that one is available :-/ I will check yeap -- not yet available on arm64. -- Yarosl

Re: [Scikit-learn-general] scikit-learn 0.15.2 is out!

2014-09-08 Thread Yaroslav Halchenko
On Mon, 08 Sep 2014, Olivier Grisel wrote: > 2014-09-08 7:46 GMT-07:00 Yaroslav Halchenko : > > It is a bit early to say about Debian servers conclusively -- I have just > > uploaded to Debian proper, so they have been rebuilt across > > architectures: > > https://buildd.debian.org/status/packa

Re: [Scikit-learn-general] scikit-learn 0.15.2 is out!

2014-09-08 Thread Olivier Grisel
2014-09-08 7:46 GMT-07:00 Yaroslav Halchenko : > > It is a bit early to say about Debian servers conclusively -- I have just > uploaded to Debian proper, so they have been rebuilt across > architectures: > > https://buildd.debian.org/status/package.php?p=scikit-learn&suite=unstable > and armel seem

Re: [Scikit-learn-general] outlier measure random forest

2014-09-08 Thread Gilles Louppe
Variants include: - Taking into account common internal nodes reached by two samples. In this sense, proximity takes into account the paths that are common and not only the leaves. - Normalizing the counts by the number of training samples within the common leaves (instead of simply counting +1 fo

Re: [Scikit-learn-general] outlier measure random forest

2014-09-08 Thread Mathieu Blondel
On Mon, Sep 8, 2014 at 11:55 PM, Gilles Louppe wrote: > I am rather -1 on making this a transform. There are many ways to come > up with proximity measures in forests -- in fact, I don't think > Breiman's is particularly well designed. > I think this is actually an argument for non-inclusion in th

Re: [Scikit-learn-general] outlier measure random forest

2014-09-08 Thread Gilles Louppe
I am rather -1 on making this a transform. There are many ways to come up with proximity measures in forests -- in fact, I don't think Breiman's is particularly well designed. On 8 September 2014 16:52, Gael Varoquaux wrote: > On Mon, Sep 08, 2014 at 11:49:26PM +0900, Mathieu Blondel wrote: >> This

Re: [Scikit-learn-general] Backward compat policy in utils

2014-09-08 Thread Gael Varoquaux
I agree with everything you said, Mathieu (which of course does not answer the questions that you raise). Gaël On Mon, Sep 08, 2014 at 11:01:44PM +0900, Mathieu Blondel wrote: > Maintaining backward compatibility for a subset of the utils only means that > from now on we will have to decide whet

Re: [Scikit-learn-general] outlier measure random forest

2014-09-08 Thread Gael Varoquaux
> I don't think that it can be a transform, because currently transform > cannot modify y (and that's really a problem). Brainfart! I hadn't thought about the problem well enough. Please disregard the previous message. G ---

Re: [Scikit-learn-general] outlier measure random forest

2014-09-08 Thread Gael Varoquaux
On Mon, Sep 08, 2014 at 11:49:26PM +0900, Mathieu Blondel wrote: > This could be a transform method added to RandomForestClassifier / > RandomForestRegressor. I don't think that it can be a transform, because currently transform cannot modify y (and that's really a problem). G --

Re: [Scikit-learn-general] scikit-learn 0.15.2 is out!

2014-09-08 Thread Andreas Mueller
Awesome, Olivier, thanks a lot! On Sep 6, 2014 2:27 AM, "Olivier Grisel" wrote: > Hi all, > > I just released 0.15.2. The source and binary packages for this > release are on PyPi as usual: > > https://pypi.python.org/pypi/scikit-learn/0.15.2 > > The website has the change log: > > http://scikit-le

Re: [Scikit-learn-general] outlier measure random forest

2014-09-08 Thread Mathieu Blondel
This could be a transform method added to RandomForestClassifier / RandomForestRegressor. On Mon, Sep 8, 2014 at 11:14 PM, Gilles Louppe wrote: > Hi Luca, > > This may not be the fastest implementation, but random forest > proximities can be computed quite straightforwardly in Python given > our

Re: [Scikit-learn-general] scikit-learn 0.15.2 is out!

2014-09-08 Thread Yaroslav Halchenko
On Mon, 08 Sep 2014, Olivier Grisel wrote: > >> I just released 0.15.2. The source and binary packages for this > >> release are on PyPi as usual: > >> https://pypi.python.org/pypi/scikit-learn/0.15.2 > > Congrats! > > And FWIW -- 0.15.2 is available now from NeuroDebian for all > > Debian/Ubun

Re: [Scikit-learn-general] scikit-learn 0.15.2 is out!

2014-09-08 Thread Olivier Grisel
2014-09-08 6:57 GMT-07:00 Yaroslav Halchenko : > On Sat, 06 Sep 2014, Olivier Grisel wrote: > >> I just released 0.15.2. The source and binary packages for this >> release are on PyPi as usual: > >> https://pypi.python.org/pypi/scikit-learn/0.15.2 > > Congrats! > > And FWIW -- 0.15.2 is available n

Re: [Scikit-learn-general] outlier measure random forest

2014-09-08 Thread Sam Nicholls
+1 for seeing this implemented. I feel it would be a useful addition for work we do here that involves use of random forests. On Mon, Sep 8, 2014 at 3:14 PM, Gilles Louppe wrote: > Hi Luca, > > This may not be the fastest implementation, but random forest > proximities can be computed quite stra

Re: [Scikit-learn-general] outlier measure random forest

2014-09-08 Thread Peter Prettenhofer
+1 -- looks like a very handy 3-liner :) 2014-09-08 16:14 GMT+02:00 Gilles Louppe : > Hi Luca, > > This may not be the fastest implementation, but random forest > proximities can be computed quite straightforwardly in Python given > our 'apply' function. > See for instance > > https://github.com/

Re: [Scikit-learn-general] outlier measure random forest

2014-09-08 Thread Gilles Louppe
Hi Luca, This may not be the fastest implementation, but random forest proximities can be computed quite straightforwardly in Python given our 'apply' function. See for instance https://github.com/glouppe/phd-thesis/blob/master/scripts/ch4_proximity.py#L12 From a personal point of view, I never
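
A minimal sketch of the idea, not a copy of the linked script: if `leaves` is the array of leaf indices with one row per sample and one column per tree (the layout a fitted forest's `apply(X)` returns), the proximity of two samples is the fraction of trees in which they land in the same leaf. The function name and the hand-made leaf indices below are assumptions for illustration.

```python
import numpy as np

def proximity_matrix(leaves):
    """Fraction of trees in which each pair of samples shares a leaf.

    `leaves` has shape (n_samples, n_trees), e.g. the output of a fitted
    forest's `apply(X)`.
    """
    leaves = np.asarray(leaves)
    # Compare every pair of samples tree by tree, then average over trees.
    same_leaf = leaves[:, None, :] == leaves[None, :, :]
    return same_leaf.mean(axis=2)

# Hypothetical leaf indices for 4 samples in a 3-tree forest.
leaves = np.array([
    [1, 4, 2],
    [1, 4, 3],
    [2, 5, 3],
    [7, 8, 9],
])
prox = proximity_matrix(leaves)
print(prox.round(2))
```

The pairwise comparison builds an (n_samples, n_samples, n_trees) boolean array, so for large datasets a loop over trees would use less memory.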

Re: [Scikit-learn-general] Backward compat policy in utils

2014-09-08 Thread Mathieu Blondel
Maintaining backward compatibility for a subset of the utils only means that from now on we will have to decide whether a util deserves to be public or not. While we are at it, I would rather make it explicit and use an underscore prefix for private utils and no prefix for public utils. This can b

Re: [Scikit-learn-general] scikit-learn 0.15.2 is out!

2014-09-08 Thread Yaroslav Halchenko
On Sat, 06 Sep 2014, Olivier Grisel wrote: > I just released 0.15.2. The source and binary packages for this > release are on PyPi as usual: > https://pypi.python.org/pypi/scikit-learn/0.15.2 Congrats! And FWIW -- 0.15.2 is available now from NeuroDebian for all Debian/Ubuntu-powered folks. --

Re: [Scikit-learn-general] Scikit-learn-general Digest, Vol 56, Issue 13

2014-09-08 Thread Luca Puggini
> > for personal reason I am writing a function to compute the outlier > > measure from random forest > > http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm# > > outliers > > > with a little more work I can include the function in the sklearn > > random forest class. > > Do you have a

Re: [Scikit-learn-general] probabilistic values from KNeighborsClassifier

2014-09-08 Thread Patrick Short
Hi Sheila, I think if you use an odd number of neighbors you can break your ties. Without a weight function, the probability should be composed of votes from the k nearest neighbors. So, the tie at 0.5 means two neighbors are class 2 and two are class 3 for the first two samples and a tie would b
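
The voting argument can be checked with a few lines of plain Python. This is a sketch of uniform-weight voting only, with a hypothetical helper name and made-up neighbor labels; it does not call KNeighborsClassifier.

```python
from collections import Counter

def vote_probabilities(neighbor_labels, classes):
    """Uniform-weight k-NN probabilities: each neighbor casts one vote."""
    counts = Counter(neighbor_labels)
    k = len(neighbor_labels)
    return [counts[c] / k for c in classes]

classes = [1, 2, 3]
even_k = vote_probabilities([2, 2, 3, 3], classes)     # k = 4: a 2-vs-2 tie
odd_k = vote_probabilities([2, 2, 3, 3, 2], classes)   # k = 5: tie broken
print(even_k)  # [0.0, 0.5, 0.5]
print(odd_k)   # [0.0, 0.6, 0.4]
```

An odd k rules out ties only when two classes compete for the votes; with three or more classes a split such as 2-2-1 is still possible.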

[Scikit-learn-general] Backward compat policy in utils

2014-09-08 Thread Gael Varoquaux
Hi people, So far we have had no policy of backward compatibility in sklearn/utils. However, some of the utilities there are very useful for packages that want to extend scikit-learn's functionality, such as seqlearn, sklearn-theano, nilearn... The latest set of changes in the validation utilitie

Re: [Scikit-learn-general] outlier measure random forest

2014-09-08 Thread Gael Varoquaux
On Mon, Sep 08, 2014 at 10:05:58AM +0100, Luca Puggini wrote: > for personal reasons I am writing a function to compute the outlier > measure from random forest > http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#outliers > with a little more work I can include the function in the

[Scikit-learn-general] outlier measure random forest

2014-09-08 Thread Luca Puggini
Hi, for personal reasons I am writing a function to compute the outlier measure from random forest http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#outliers with a little more work I can include the function in the sklearn random forest class. Is the community interested? Should I d
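
For context, here is a rough sketch of the outlier measure as described on the linked Breiman page, under stated assumptions; it is not Luca's function. It assumes a precomputed proximity matrix, takes the raw score of a sample to be n_samples divided by the sum of its squared proximities to the other samples of its class, and then centers by the within-class median and scales by the median absolute deviation. Excluding the self-proximity and guarding against zero denominators are choices made here, and the toy matrix is made up.

```python
import numpy as np

def outlier_measure(prox, y):
    """Within-class outlier score computed from a proximity matrix."""
    prox = np.asarray(prox, dtype=float)
    y = np.asarray(y)
    n_samples = prox.shape[0]
    scores = np.zeros(n_samples)
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        within = prox[np.ix_(idx, idx)].copy()
        np.fill_diagonal(within, 0.0)  # assumption: ignore self-proximity
        sum_sq = (within ** 2).sum(axis=1)
        # Low proximity to the rest of the class gives a large raw score.
        raw = n_samples / np.maximum(sum_sq, 1e-12)
        median = np.median(raw)
        mad = np.median(np.abs(raw - median))
        scores[idx] = (raw - median) / (mad if mad > 0 else 1.0)
    return scores

# Hypothetical proximities: samples 0-2 are close, sample 3 is isolated.
prox = np.array([
    [1.0, 0.9, 0.8, 0.1],
    [0.9, 1.0, 0.9, 0.1],
    [0.8, 0.9, 1.0, 0.2],
    [0.1, 0.1, 0.2, 1.0],
])
y = np.array([0, 0, 0, 0])
scores = outlier_measure(prox, y)
print(scores.argmax())  # index of the most outlying sample
```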

Re: [Scikit-learn-general] Print coordinate descent coefficients at each iteration

2014-09-08 Thread Danny Sullivan
Sorry it took a while to respond to this. I believe you'll just have to acquire the GIL before each print statement. At the beginning of the enet_coordinate_descent algorithm you'll see a statement "with nogil:" which releases the Python GIL, increasing the C performance. I suppose you could jus

Re: [Scikit-learn-general] probabilistic values from KNeighborsClassifier

2014-09-08 Thread Sheila the angel
Any suggestions about KNeighborsClassifier().predict_proba? On 3 September 2014 14:57, Sheila the angel wrote: > I am using KNeighborsClassifier and trying to obtain probabilistic output. > But for many of the test sets I am getting equal probability for all classes. > > >>>X_train, X_test, y_tra

Re: [Scikit-learn-general] Multi-target regression

2014-09-08 Thread Giuseppe Marco Randazzo
Hello, look in Wikipedia. There you will find the general algorithm to estimate the beta coefficient in a simple linear regression through Ordinary Least Squares. All that you need is in the page: Then... Marco On 08 Sep 2014, at 09:54, Philipp Singer wrote: > Is there a description about t

Re: [Scikit-learn-general] Multi-target regression

2014-09-08 Thread Philipp Singer
Is there a description of this somewhere? I can’t find it in the docs. Thanks! On 05.09.2014 at 18:40, Flavio Vinicius wrote: > In the case of LinearRegression, independent models are being fit for > each response. But this is not the case for every multi-response > estimator. AFAIK, the mult
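
The claim about independent fits follows from the ordinary least squares solution: for a multi-column target Y, each column of the coefficient matrix (X^T X)^-1 X^T Y depends only on the matching column of Y. A small NumPy sketch with synthetic data (all values below are made up) checks this numerically; it illustrates plain least squares and does not reproduce the LinearRegression source.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(50, 3)
X1 = np.hstack([np.ones((50, 1)), X])          # prepend an intercept column
true_coef = np.array([[1.0, -2.0],
                      [0.5, 0.0],
                      [0.0, 3.0],
                      [2.0, 1.0]])              # shape (4, 2): two responses
Y = X1 @ true_coef + 0.01 * rng.randn(50, 2)

# Joint fit: one least-squares solve with a two-column target.
coef_joint, *_ = np.linalg.lstsq(X1, Y, rcond=None)

# Separate fits: one least-squares solve per response column.
coef_separate = np.column_stack([
    np.linalg.lstsq(X1, Y[:, j], rcond=None)[0] for j in range(Y.shape[1])
])

print(np.allclose(coef_joint, coef_separate))  # True
```

As the quoted message notes, this independence is specific to plain least squares and does not hold for every multi-response estimator.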