Re: [Scikit-learn-general] mixture.GMM and data ranges

2012-10-31 Thread bthirion
> Hi,
>
> Thanks for this - yes I think I see that now. (The values do indeed
> differ by n_dim * n_samples * log(scale), but no 0.5 here.)
>
> I guess in a way the issue is that we typically evaluate point
> likelihoods, rather than e.g. integrals within some bounds of certainty
> of the measurement
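[The n_dim * n_samples * log(scale) shift follows from the change-of-variables formula for densities: if Y = s*X in d dimensions, then p_Y(y) = p_X(y/s) / s^d, so each point's log-likelihood drops by d*log(s). A minimal numpy check of this, with made-up data and a hypothetical helper, not code from the thread:

    import numpy as np

    rng = np.random.RandomState(0)
    n, d, s = 5, 2, 100.0                    # samples, dims, scale factor
    X = rng.randn(n, d)

    def gauss_loglik(X, cov):
        # per-row log N(x | 0, cov)
        d = X.shape[1]
        inv = np.linalg.inv(cov)
        logdet = np.linalg.slogdet(cov)[1]
        quad = np.einsum('ij,jk,ik->i', X, inv, X)
        return -0.5 * (d * np.log(2 * np.pi) + logdet + quad)

    ll = gauss_loglik(X, np.eye(d))                  # original data
    ll_s = gauss_loglik(s * X, s ** 2 * np.eye(d))   # same data, scaled by s

    # each point's log-likelihood shifts by d*log(s); summed over the
    # data set that is n*d*log(s) -- and indeed no factor 0.5
    print(np.allclose(ll - ll_s, d * np.log(s)))     # True
]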

Re: [Scikit-learn-general] mixture.GMM and data ranges

2012-10-31 Thread Dan Stowell
On 31/10/12 16:09, bthirion wrote:
> On 10/31/2012 04:50 PM, Dan Stowell wrote:
>> Hi all,
>>
>> I'm still getting odd results using mixture.GMM depending on data
>> scaling. In the following code example, I change the overall scaling but
>> I do NOT change the relative scaling of the dimensions. Yet

Re: [Scikit-learn-general] mixture.GMM and data ranges

2012-10-31 Thread Martin Fergie
Hi Dan,

I would have thought that it is the relative scaling that is important,
not the overall scaling. I.e. each feature of your data set should have
zero mean and unit variance.

Martin

On 31 October 2012 16:09, bthirion wrote:
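[A minimal sketch of that advice, assuming the 2012-era mixture.GMM API (the modern equivalent is mixture.GaussianMixture); the synthetic data and parameters below are illustrative, not from the thread:

    import numpy as np
    from sklearn.preprocessing import StandardScaler
    from sklearn import mixture

    rng = np.random.RandomState(0)
    X = np.vstack([rng.randn(200, 2), rng.randn(200, 2) + [5, 5]]) * 1e4

    # standardize each feature to zero mean / unit variance before fitting
    X_std = StandardScaler().fit_transform(X)
    g = mixture.GMM(n_components=2)
    g.fit(X_std)
    print(g.means_)
]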

Re: [Scikit-learn-general] LarsLasso and Lasso normalize option

2012-10-31 Thread Olivier Grisel
2012/10/31 Alexandre Gramfort:
> fine with me but do you push the logic further to any linear estimator?
> For example in Ridge we also have normalize=False by default.
>
> I would say that LassoLars is more the exception than the norm.

Indeed, yet another tricky mission for the Consistency Brigade

Re: [Scikit-learn-general] LarsLasso and Lasso normalize option

2012-10-31 Thread Alexandre Gramfort
fine with me but do you push the logic further to any linear estimator?
For example in Ridge we also have normalize=False by default.

I would say that LassoLars is more the exception than the norm.

Alex

On Wed, Oct 31, 2012 at 11:53 AM, Jaques Grobler wrote:
> It makes sense to me to make the

Re: [Scikit-learn-general] mixture.GMM and data ranges

2012-10-31 Thread bthirion
On 10/31/2012 04:50 PM, Dan Stowell wrote:
> Hi all,
>
> I'm still getting odd results using mixture.GMM depending on data
> scaling. In the following code example, I change the overall scaling but
> I do NOT change the relative scaling of the dimensions. Yet under the
> three different scaling settings

Re: [Scikit-learn-general] mixture.GMM and data ranges

2012-10-31 Thread Dan Stowell
Hi all,

I'm still getting odd results using mixture.GMM depending on data
scaling. In the following code example, I change the overall scaling but
I do NOT change the relative scaling of the dimensions. Yet under the
three different scaling settings I get completely different results:
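[Dan's code is truncated in the archive; what follows is a guessed reconstruction of the kind of experiment described, assuming the 2012-era mixture.GMM API -- the synthetic data, component count and scale factors are made up:

    import numpy as np
    from sklearn import mixture

    rng = np.random.RandomState(42)
    base = np.vstack([rng.randn(200, 2), rng.randn(200, 2) + [3, 3]])

    # same data, same relative scaling of dimensions, different overall scale
    for scale in (0.001, 1.0, 1000.0):
        g = mixture.GMM(n_components=2, covariance_type='full')
        g.fit(base * scale)
        print(scale, np.round(np.sort(g.weights_), 3))

One plausible source of scale sensitivity worth checking: the old GMM applied an absolute covariance floor, min_covar=1e-3 by default, which dominates the true covariances at small data scales but is negligible at large ones.]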

Re: [Scikit-learn-general] API for multi-sample "documents"

2012-10-31 Thread Andreas Mueller
Hi Vlad.

This is definitely a good question. I have that often when representing
an image as bags of keypoints / features. Why is it not a good solution
to have X be a list of arrays / lists? Which algorithms do you want to
use such samples in? The text feature extraction sort of deals with

[Scikit-learn-general] API for multi-sample "documents"

2012-10-31 Thread Vlad Niculae
Hello,

It seems I have again reached the need for something that became
apparent when working with image patches last summer. Sometimes we
don't have a 1-to-1 correspondence between samples (rows in X) and the
actual documents we are interested in scoring over. Instead, each
document consists of (a di
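[One possible representation, purely illustrative and not something the list settled on: keep X flat and carry a parallel `groups` vector mapping each row to its document, then pool per-row scores per document:

    import numpy as np

    # 7 rows (e.g. image patches), 5 features each; `groups` is a
    # hypothetical convention saying which document each row belongs to
    X = np.random.randn(7, 5)
    groups = np.array([0, 0, 0, 1, 1, 2, 2])

    row_scores = X.sum(axis=1)     # stand-in for clf.decision_function(X)
    doc_scores = (np.bincount(groups, weights=row_scores)
                  / np.bincount(groups))
    print(doc_scores)              # one pooled (mean) score per document
]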

Re: [Scikit-learn-general] Trees in Mahout

2012-10-31 Thread Andreas Mueller
On 10/31/2012 11:45 AM, Joseph Turian wrote:
>> As far as I understand, we are not really sure what is the best way to
>> build the trees (masks / no masks, pre-sorting / lazy sorting..).
> Are you talking about efficiency in training time, or generalization accuracy?

Training time.

Re: [Scikit-learn-general] Trees in Mahout

2012-10-31 Thread Joseph Turian
> As far as I understand, we are not really sure what is the best way to
> build the trees (masks / no masks, pre-sorting / lazy sorting..).

Are you talking about efficiency in training time, or generalization
accuracy?

Best,
Joseph

Re: [Scikit-learn-general] LarsLasso and Lasso normalize option

2012-10-31 Thread Jaques Grobler
It makes sense to me to make the change - however, scikit-learn users
would need to be warned about this. Perhaps for now we can just add a
warning that the API will be changing, to make users well aware (before
actually changing the API), and that they must manually set it up in the
meanwhile
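[A sketch of that warn-before-flipping pattern; the class and message below are hypothetical, not actual scikit-learn code:

    import warnings

    _UNSET = object()

    class MyLasso:
        # toy estimator showing a warn-first default change
        def __init__(self, normalize=_UNSET):
            if normalize is _UNSET:
                warnings.warn(
                    "the default of `normalize` will change from False "
                    "to True in a future release; set it explicitly to "
                    "keep the current behavior and silence this warning",
                    FutureWarning)
                normalize = False
            self.normalize = normalize

    MyLasso()                   # warns
    MyLasso(normalize=False)    # silent: the user opted in explicitly
]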

Re: [Scikit-learn-general] What caused the HMM test failure

2012-10-31 Thread Lars Buitinck
2012/10/31 Olivier Grisel:
>>> Can we have a vote on this?

+1

--
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam

Re: [Scikit-learn-general] What caused the HMM test failure

2012-10-31 Thread Olivier Grisel
2012/10/31 Andreas Mueller:
> On 10/31/2012 10:09 AM, Gael Varoquaux wrote:
>> On Tue, Oct 30, 2012 at 03:40:48PM +0100, Lars Buitinck wrote:
>>> Agree with David, int->float conversion should be expected to produce
>>> larger arrays.
>> Can we have a vote on this?

+1 too

--
Olivier
http://twit

Re: [Scikit-learn-general] What caused the HMM test failure

2012-10-31 Thread Andreas Mueller
On 10/31/2012 10:09 AM, Gael Varoquaux wrote:
> On Tue, Oct 30, 2012 at 03:40:48PM +0100, Lars Buitinck wrote:
>> Agree with David, int->float conversion should be expected to produce
>> larger arrays.
> Can we have a vote on this?
>
> I am +0 on int->float conversion always giving float64 (np.float)

Re: [Scikit-learn-general] LarsLasso and Lasso normalize option

2012-10-31 Thread Olivier Grisel
2012/10/31 Gael Varoquaux:
>
> I want to change this (warning: backward compatibility breakage :$). I
> want to change Lasso to have normalize=True, because in my experience
> this is a sane behavior. This would imply, for consistency, changing
> ElasticNet to also have normalize=True. We would ha

[Scikit-learn-general] LarsLasso and Lasso normalize option

2012-10-31 Thread Gael Varoquaux
* First some background: LarsLasso and Lasso are two different
algorithms to solve the same problem (l1-penalized linear model). As
with all linear models, they have a 'normalize' parameter that can be
turned on so that regressors are normalized. This is useful because the
'good' penalty on each
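[To see why this matters, a minimal sketch with made-up data and alpha: without normalization, the same alpha penalizes a rescaled regressor very differently.

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.RandomState(0)
    X = rng.randn(50, 2)
    y = X[:, 0] + 0.1 * rng.randn(50)

    # same alpha, same data up to a rescaling of the regressors:
    # the effective amount of shrinkage changes completely
    print(Lasso(alpha=0.1).fit(X, y).coef_)
    print(Lasso(alpha=0.1).fit(X * 100.0, y).coef_)

Normalizing each column first makes a single alpha roughly comparable across such rescalings, which is the behavior being argued for.]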

Re: [Scikit-learn-general] What caused the HMM test failure

2012-10-31 Thread Gael Varoquaux
On Tue, Oct 30, 2012 at 03:40:48PM +0100, Lars Buitinck wrote:
> Agree with David, int->float conversion should be expected to produce
> larger arrays.

Can we have a vote on this?

I am +0 on int->float conversion always giving float64 (np.float).

G
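[The size concern in concrete terms -- a tiny numpy illustration, not from the thread:

    import numpy as np

    X = np.arange(6, dtype=np.int32).reshape(3, 2)
    X64 = X.astype(np.float64)   # int32 -> float64 doubles the memory
    X32 = X.astype(np.float32)   # int32 -> float32 keeps the size

    print(X.nbytes, X64.nbytes, X32.nbytes)   # 24 48 24
]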

Re: [Scikit-learn-general] Prediction Probabilities in LinearSVC with scikit-learn >0.12

2012-10-31 Thread Olivier Grisel
2012/10/31 Afik Cohen:
>
> Hah, thanks for the explanation :) But yes, the accuracy was terrible.
> In fact, we just ran another cross-validated k=3 run with our current
> data, and got these results:
>
> Training LogisticRegression(C=1.0, class_weight=None, dual=False,
>     fit_intercept=True,
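[For context on the thread's subject: LinearSVC exposes no predict_proba, so LogisticRegression is the usual drop-in when probability estimates are needed. A sketch on toy data (the dataset and parameters are illustrative):

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import LinearSVC

    X, y = make_classification(n_samples=200, random_state=0)

    proba = LogisticRegression(C=1.0).fit(X, y).predict_proba(X)  # probabilities
    margin = LinearSVC(C=1.0).fit(X, y).decision_function(X)      # only margins
    print(proba[:3])
    print(margin[:3])
]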

Re: [Scikit-learn-general] Trees in Mahout

2012-10-31 Thread Peter Prettenhofer
2012/10/31 Andreas Mueller:
> Hey everybody.
> I noticed Mahout also has random forest algorithms. Has anyone tried
> those? Has anyone done any timing comparisons?
> As far as I understand, we are not really sure what is the best way to
> build the trees (masks / no masks, pre-sorting / lazy so

[Scikit-learn-general] Trees in Mahout

2012-10-31 Thread Andreas Mueller
Hey everybody.

I noticed Mahout also has random forest algorithms. Has anyone tried
those? Has anyone done any timing comparisons?

As far as I understand, we are not really sure what is the best way to
build the trees (masks / no masks, pre-sorting / lazy sorting..). I
thought it might be a
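[A minimal baseline for the scikit-learn side of such a timing comparison; the dataset size and forest parameters are arbitrary choices, not from the thread:

    import time

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=10000, n_features=20,
                               random_state=0)

    clf = RandomForestClassifier(n_estimators=100, n_jobs=1, random_state=0)
    t0 = time.time()
    clf.fit(X, y)
    print("fit time: %.2fs" % (time.time() - t0))
]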