Re: [Scikit-learn-general] mean square error

2012-02-01 Thread Olivier Grisel
2012/2/1 Mathieu Blondel : > On Wed, Feb 1, 2012 at 10:10 PM, David Warde-Farley > wrote: > >> I might suggest mean over training examples but sum over output dimensions, >> if there is more than one. > > Currently, Ridge is the only estimator in scikit-learn supporting > multivariate regression

Re: [Scikit-learn-general] Joblib and IPython

2012-02-01 Thread Olivier Grisel
Andreas: you should do some timing tests for data transfer using the plain numpy + IPython.parallel API (without scikit-learn or joblib) to check that you are able to broadcast your data efficiently without memory copy. Once you have an optimal time, check that you can build an application in reverse

Re: [Scikit-learn-general] Joblib and IPython

2012-02-01 Thread Gael Varoquaux
On Wed, Feb 01, 2012 at 05:48:44PM +0100, Olivier Grisel wrote: > > IPython uses pickling, which is really slow. > This is not the case for plain numpy arrays > http://ipython.org/ipython-doc/stable/parallel/parallel_details.html#non-copying-sends-and-numpy-arrays Yes, but as soon as you use obj

Re: [Scikit-learn-general] Joblib and IPython

2012-02-01 Thread Olivier Grisel
2012/2/1 Gael Varoquaux : > On Wed, Feb 01, 2012 at 03:05:49PM +0100, Andreas wrote: >> I started working with IPython.parallel for training the trees using joblib. >> It works in principle, but it is SLOW. >> The time between starting and the jobs arriving at the engines is really >> long. >> I'm

Re: [Scikit-learn-general] mean square error

2012-02-01 Thread Vlad Niculae
Sent from my iPod On 01.02.2012, at 15:43, Mathieu Blondel wrote: > On Wed, Feb 1, 2012 at 10:10 PM, David Warde-Farley > wrote: > >> I might suggest mean over training examples but sum over output dimensions, >> if there is more than one. > > Currently, Ridge is the only estimator in scikit-l

Re: [Scikit-learn-general] Joblib and IPython

2012-02-01 Thread Gael Varoquaux
On Wed, Feb 01, 2012 at 03:05:49PM +0100, Andreas wrote: > I started working with IPython.parallel for training the trees using joblib. > It works in principle, but it is SLOW. > The time between starting and the jobs arriving at the engines is really > long. > I'm sending around 20.000x2000 float

Re: [Scikit-learn-general] Joblib and IPython

2012-02-01 Thread Andreas
On 02/01/2012 03:05 PM, Andreas wrote: > I started working with IPython.parallel for training the trees using joblib. > It works in principle, but it is SLOW. > The time between starting and the jobs arriving at the engines is really > long. > I'm sending around 20.000x2000 float64 matrices, but th

Re: [Scikit-learn-general] Joblib and IPython

2012-02-01 Thread Andreas
I started working with IPython.parallel for training the trees using joblib. It works in principle, but it is SLOW. The time between starting and the jobs arriving at the engines is really long. I'm sending around 20.000x2000 float64 matrices, but this is gigabit ethernet and I wouldn't expect it

Re: [Scikit-learn-general] mean square error

2012-02-01 Thread Alexandre Gramfort
>> I might suggest mean over training examples but sum over output dimensions, >> if there is more than one. > > Currently, Ridge is the only estimator in scikit-learn supporting > multivariate regression (it does so in a way which is more efficient > than solving `n_responses` problems). It would

Re: [Scikit-learn-general] mean square error

2012-02-01 Thread Mathieu Blondel
On Wed, Feb 1, 2012 at 10:10 PM, David Warde-Farley wrote: > I might suggest mean over training examples but sum over output dimensions, > if there is more than one. Currently, Ridge is the only estimator in scikit-learn supporting multivariate regression (it does so in a way which is more effi
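The convention suggested in the quoted text (mean over training examples, sum over output dimensions) could be sketched as follows. This is only an illustration under stated assumptions: the function name `mse_mean_over_samples` is hypothetical and is not a scikit-learn function, and the arrays are assumed to have shape (n_samples,) or (n_samples, n_outputs).

```python
import numpy as np


def mse_mean_over_samples(y_true, y_pred):
    """Hypothetical helper: sum squared errors over outputs, then average over samples."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Treat a 1-D target as a single output column.
    if y_true.ndim == 1:
        y_true = y_true[:, np.newaxis]
        y_pred = y_pred[:, np.newaxis]
    # Sum over output dimensions (axis=1), giving one error per sample.
    per_sample = np.sum((y_true - y_pred) ** 2, axis=1)
    # Mean over training examples.
    return np.mean(per_sample)
```

With a single output this reduces to the plain mean of the squared errors; with several outputs it differs from averaging over every entry of the array by a factor equal to the number of outputs.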

Re: [Scikit-learn-general] mean square error

2012-02-01 Thread David Warde-Farley
On 2012-02-01, at 5:10 AM, Mathieu Blondel wrote: > Hello, > > I just realized that the function "mean_square_error" returns > np.sum((y_true - y_pred) ** 2) instead of np.mean((y_true - y_pred) ** > 2). Hence it is more a cumulated error than a mean error. > > I would like to fix this but this

Re: [Scikit-learn-general] model persistence and sklearn version upgrades

2012-02-01 Thread David Warde-Farley
On 2012-02-01, at 2:53 AM, Gael Varoquaux wrote: > All your remarks are valid, but what it really boils down to is that general > purpose persistence is hard. Given well-defined objects, a good persistence > scheme can be developed, but then you have to worry about transition that > code as the

Re: [Scikit-learn-general] mean square error

2012-02-01 Thread Gael Varoquaux
On Wed, Feb 01, 2012 at 07:22:38PM +0900, Mathieu Blondel wrote: > I will rename the function from "mean_square_error" to > "mean_squared_error", as this is how Wikipedia calls it anyway. This > way, we can keep the old one for two releases. Sounds good. We can add a deprecation warning. Thanks,

Re: [Scikit-learn-general] mean square error

2012-02-01 Thread Mathieu Blondel
On Wed, Feb 1, 2012 at 7:14 PM, Gael Varoquaux wrote: > But at least with a warning. We can't have such a change silent. I will rename the function from "mean_square_error" to "mean_squared_error", as this is how Wikipedia calls it anyway. This way, we can keep the old one for two releases. Mat
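A rough sketch of the rename proposed here, assuming the old name keeps its summed result during the transition and only gains a warning. This is not the actual scikit-learn code; the warning text and the choice to leave the old behavior untouched are assumptions made for illustration.

```python
import warnings

import numpy as np


def mean_squared_error(y_true, y_pred):
    """New name: returns the mean of the squared errors."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)


def mean_square_error(y_true, y_pred):
    """Old name, kept for a transition period (assumed to keep the summed result)."""
    warnings.warn(
        "mean_square_error returns a sum, not a mean; "
        "use mean_squared_error instead.",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller, not at this wrapper
    )
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sum((y_true - y_pred) ** 2)
```

Keeping the old function's result unchanged means existing scripts still produce the same numbers, while the warning tells users that the value is a sum and that the new name behaves differently.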

Re: [Scikit-learn-general] mean square error

2012-02-01 Thread Alexandre Gramfort
+1 for fixing the bug, possibly with a warning notifying users of the change in behavior A On Wed, Feb 1, 2012 at 11:10 AM, Mathieu Blondel wrote: > Hello, > > I just realized that the function "mean_square_error" returns > np.sum((y_true - y_pred) ** 2) instead of np.mean((y_true - y_pred) ** > 2).

Re: [Scikit-learn-general] mean square error

2012-02-01 Thread Gael Varoquaux
On Wed, Feb 01, 2012 at 11:12:33AM +0100, Olivier Grisel wrote: > > I would like to fix this but this will change people's results. > +1 for changing and documenting it in whats_new.rst. But at least with a warning. We can't have such a change silent. On the other hand, I agree that the current

Re: [Scikit-learn-general] mean square error

2012-02-01 Thread Olivier Grisel
2012/2/1 Mathieu Blondel : > Hello, > > I just realized that the function "mean_square_error" returns > np.sum((y_true - y_pred) ** 2) instead of np.mean((y_true - y_pred) ** > 2). Hence it is more a cumulated error than a mean error. > > I would like to fix this but this will change people's resul

[Scikit-learn-general] mean square error

2012-02-01 Thread Mathieu Blondel
Hello, I just realized that the function "mean_square_error" returns np.sum((y_true - y_pred) ** 2) instead of np.mean((y_true - y_pred) ** 2). Hence it is more a cumulated error than a mean error. I would like to fix this but this will change people's results. Mathieu -
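The difference between the two expressions in this message can be checked with NumPy alone. The sample values below are made up for illustration and do not come from the thread.

```python
import numpy as np

# Illustrative values only (not from the thread).
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

squared_errors = (y_true - y_pred) ** 2

# The expression the message says the function returned: grows with the number of samples.
summed_error = np.sum(squared_errors)

# The expression the function name suggests: independent of the number of samples.
mean_error = np.mean(squared_errors)

print(summed_error, mean_error)
```

The two values differ by a factor of the number of samples, so scores computed on datasets of different sizes are only comparable with the mean.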

Re: [Scikit-learn-general] model persistence and sklearn version upgrades

2012-02-01 Thread Olivier Grisel
For handling versioning and efficient binary serialization there is also Avro (which stems from the Hadoop and Cassandra communities): https://avro.apache.org/docs/current/ It works with a versioned schema embedded as a prefix of the blob that guarantees backward compatibility with old versions (but you

Re: [Scikit-learn-general] model persistence and sklearn version upgrades

2012-02-01 Thread Olivier Grisel
2012/1/31 David Warde-Farley : > On Tue, Jan 31, 2012 at 02:08:21PM -0500, Jeff Farris wrote: >> I'm currently using pickle to persist models (e.g. SVC).   After upgrading >> sklearn, these pickled models from a previous version of sklearn don't tend >> to work and then I need to retrain.  Is there

Re: [Scikit-learn-general] Joblib and IPython

2012-02-01 Thread Gael Varoquaux
On Mon, Jan 30, 2012 at 05:22:35PM +0100, Andreas wrote: > I implemented a somewhat trivial solution here: > https://github.com/amueller/joblib/tree/ipython_refactoring > It can be used like this: > https://gist.github.com/1705235 > Not sure if this is a good way to do things but it > was a very

Re: [Scikit-learn-general] GaussianProcess 'nugget'

2012-02-01 Thread Gael Varoquaux
On Tue, Jan 31, 2012 at 12:44:58PM -0800, Jacob VanderPlas wrote: > I've been working on applying Gaussian Processes to noisy input data. > The scikit-learn docs are not especially helpful on this topic, :$ Agreed. > Would it make more sense to rename "nugget" to "training_variance" or > someth

Re: [Scikit-learn-general] Causes for one class dominating?

2012-02-01 Thread Gael Varoquaux
On Tue, Jan 31, 2012 at 10:28:55AM -0800, Michael Waskom wrote: > First, I realized that my original PCA did not make much sense.  What > I want to do is reduce the feature dimensions in my classification, > but keep the number of observations. That's what the scikit's PCA does. I don't understan

Re: [Scikit-learn-general] model persistence and sklearn version upgrades

2012-02-01 Thread Gael Varoquaux
On Wed, Feb 01, 2012 at 05:14:32PM +0900, Mathieu Blondel wrote: > On Wed, Feb 1, 2012 at 6:57 AM, Andreas wrote: > > I imagine it would be things like renaming and removing attributes. What > > else is there? > I've seen pickle complaining even when the attributes didn't change > but couldn't fi

Re: [Scikit-learn-general] model persistence and sklearn version upgrades

2012-02-01 Thread Mathieu Blondel
On Wed, Feb 1, 2012 at 6:57 AM, Andreas wrote: > I imagine it would be things like renaming and removing attributes. What > else is there? I've seen pickle complaining even when the attributes didn't change but couldn't figure out what was the cause... Pickling the entire estimator is very conv