2012/2/1 Mathieu Blondel:
> On Wed, Feb 1, 2012 at 10:10 PM, David Warde-Farley
> wrote:
>
>> I might suggest mean over training examples but sum over output dimensions,
>> if there is more than one.
>
> Currently, Ridge is the only estimator in scikit-learn supporting
> multivariate regression (it does so in a way which is more efficient
> than solving `n_responses` problems).
Andreas: you should do some timing tests for data transfer using the
plain numpy + IPython.parallel API (without scikit-learn or joblib)
to check that you are able to broadcast your data efficiently without
a memory copy.
Once you have optimal timings, check that you can build an application in
reverse.
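A minimal sketch of such a timing test, assuming the IPython 0.12-era
IPython.parallel API and an ipcluster that is already running:

    import time
    import numpy as np
    from IPython.parallel import Client  # IPython 0.12-era API

    rc = Client()                        # connects to the running ipcluster
    dview = rc[:]                        # a DirectView on all engines

    X = np.random.randn(20000, 2000)     # ~320 MB of float64, like Andreas' data

    tic = time.time()
    dview.push({'X': X}, block=True)     # contiguous arrays go out as raw buffers
    print("broadcast took %.1f s" % (time.time() - tic))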
On Wed, Feb 01, 2012 at 05:48:44PM +0100, Olivier Grisel wrote:
> > IPython uses pickling, which is really slow.
> This is not the case for plain numpy arrays
> http://ipython.org/ipython-doc/stable/parallel/parallel_details.html#non-copying-sends-and-numpy-arrays
Yes, but as soon as you use obj
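For reference, a sketch of the distinction the linked page makes (same
assumed API as above): arrays with a plain dtype are sent without copying,
while object-dtype data falls back to pickling:

    import numpy as np
    from IPython.parallel import Client

    rc = Client()
    dview = rc[:]

    a = np.zeros((1000, 1000))           # plain float64: sent as a raw buffer
    b = np.empty(3, dtype=object)        # object dtype: has to be pickled
    b[:] = [{'x': 1}, {'x': 2}, {'x': 3}]
    dview.push({'a': a, 'b': b}, block=True)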
2012/2/1 Gael Varoquaux:
> On Wed, Feb 01, 2012 at 03:05:49PM +0100, Andreas wrote:
>> I started working with IPython.parallel for training the trees using joblib.
>> It works in principle, but it is SLOW.
>> The time between starting and the jobs arriving at the engines is really
>> long.
>> I'm sending around 20000x2000 float64 matrices
Sent from my iPod
On 01.02.2012, at 15:43, Mathieu Blondel wrote:
> On Wed, Feb 1, 2012 at 10:10 PM, David Warde-Farley
> wrote:
>
>> I might suggest mean over training examples but sum over output dimensions,
>> if there is more than one.
>
> Currently, Ridge is the only estimator in scikit-learn supporting
> multivariate regression
On Wed, Feb 01, 2012 at 03:05:49PM +0100, Andreas wrote:
> I started working with IPython.parallel for training the trees using joblib.
> It works in principle, but it is SLOW.
> The time between starting and the jobs arriving at the engines is really
> long.
> I'm sending around 20000x2000 float64 matrices
On 02/01/2012 03:05 PM, Andreas wrote:
> I started working with IPython.parallel for training the trees using joblib.
> It works in principle, but it is SLOW.
> The time between starting and the jobs arriving at the engines is really
> long.
> I'm sending around 20000x2000 float64 matrices, but this is gigabit
> ethernet and I wouldn't expect it to take that long.
I started working with IPython.parallel for training the trees using joblib.
It works in principle, but it is SLOW.
The time between starting and the jobs arriving at the engines is really
long.
I'm sending around 20000x2000 float64 matrices, but this is gigabit
ethernet and I wouldn't expect it to take that long.
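For scale, a back-of-envelope estimate of the ideal wire time for one such
matrix:

    n_bytes = 20000 * 2000 * 8            # float64 matrix: 320 MB
    wire_bytes_per_s = 125e6              # 1 Gbit/s ~ 125 MB/s, ignoring overhead
    print(n_bytes / wire_bytes_per_s)     # ~2.6 s per engine in the ideal case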
>> I might suggest mean over training examples but sum over output dimensions,
>> if there is more than one.
>
> Currently, Ridge is the only estimator in scikit-learn supporting
> multivariate regression (it does so in a way which is more efficient
> than solving `n_responses` problems). It would
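For reference, a quick numpy sketch of the convention David suggests (the
helper name is hypothetical):

    import numpy as np

    def multioutput_mse(y_true, y_pred):
        # mean over training examples (axis 0), sum over output dimensions
        return np.sum(np.mean((y_true - y_pred) ** 2, axis=0))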
On Wed, Feb 1, 2012 at 10:10 PM, David Warde-Farley
wrote:
> I might suggest mean over training examples but sum over output dimensions,
> if there is more than one.
Currently, Ridge is the only estimator in scikit-learn supporting
multivariate regression (it does so in a way which is more efficient
than solving `n_responses` problems). It would
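A minimal illustration of that multivariate support (data and shapes are
made up):

    import numpy as np
    from sklearn.linear_model import Ridge

    X = np.random.randn(100, 10)
    Y = np.random.randn(100, 3)           # three output dimensions

    ridge = Ridge(alpha=1.0).fit(X, Y)    # one fit covers all responses at once
    print(ridge.coef_.shape)              # (3, 10): one coefficient row per output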
On 2012-02-01, at 5:10 AM, Mathieu Blondel wrote:
> Hello,
>
> I just realized that the function "mean_square_error" returns
> np.sum((y_true - y_pred) ** 2) instead of np.mean((y_true - y_pred) **
> 2). Hence it is a cumulative error rather than a mean error.
>
> I would like to fix this but this will change people's results.
On 2012-02-01, at 2:53 AM, Gael Varoquaux wrote:
> All your remarks are valid, but what it really boils down to is that general
> purpose persistence is hard. Given well-defined objects, a good persistence
> scheme can be developed, but then you have to worry about transitioning that
> code as the
On Wed, Feb 01, 2012 at 07:22:38PM +0900, Mathieu Blondel wrote:
> I will rename the function from "mean_square_error" to
> "mean_squared_error", as this is how Wikipedia calls it anyway. This
> way, we can keep the old one for two releases.
Sounds good. We can add a deprecation warning.
Thanks,
On Wed, Feb 1, 2012 at 7:14 PM, Gael Varoquaux
wrote:
> But at least with a warning. We can't make such a change silently.
I will rename the function from "mean_square_error" to
"mean_squared_error", as this is how Wikipedia calls it anyway. This
way, we can keep the old one for two releases.
Mathieu
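A minimal sketch of what such a deprecation shim could look like
(hypothetical code, not the actual patch):

    import warnings
    import numpy as np

    def mean_squared_error(y_true, y_pred):
        # the new name, with the fixed semantics: a mean, not a sum
        return np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)

    def mean_square_error(y_true, y_pred):
        # deprecated alias, kept around for two releases
        warnings.warn("mean_square_error is deprecated; use "
                      "mean_squared_error instead (note the change "
                      "from sum to mean)", DeprecationWarning)
        return mean_squared_error(y_true, y_pred)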
+1 for fixing the bug eventually, with a warning notifying users of the
change in behavior.
A
On Wed, Feb 1, 2012 at 11:10 AM, Mathieu Blondel wrote:
> Hello,
>
> I just realized that the function "mean_square_error" returns
> np.sum((y_true - y_pred) ** 2) instead of np.mean((y_true - y_pred) **
> 2).
On Wed, Feb 01, 2012 at 11:12:33AM +0100, Olivier Grisel wrote:
> > I would like to fix this but this will change people's results.
> +1 for changing and documenting it in whats_new.rst.
But at least with a warning. We can't make such a change silently.
On the other hand, I agree that the current
2012/2/1 Mathieu Blondel:
> Hello,
>
> I just realized that the function "mean_square_error" returns
> np.sum((y_true - y_pred) ** 2) instead of np.mean((y_true - y_pred) **
> 2). Hence it is a cumulative error rather than a mean error.
>
> I would like to fix this but this will change people's results.
Hello,
I just realized that the function "mean_square_error" returns
np.sum((y_true - y_pred) ** 2) instead of np.mean((y_true - y_pred) **
2). Hence it is a cumulative error rather than a mean error.
I would like to fix this but this will change people's results.
Mathieu
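A quick illustration of the discrepancy (assuming the function is imported
from sklearn.metrics):

    import numpy as np
    from sklearn.metrics import mean_square_error  # assumed import path

    y_true = np.array([1.0, 2.0, 3.0, 4.0])
    y_pred = np.array([1.5, 2.5, 2.5, 3.5])

    print(mean_square_error(y_true, y_pred))   # 1.0: the summed squared error
    print(np.mean((y_true - y_pred) ** 2))     # 0.25: the actual mean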
For handling versioning and efficient binary serialization there is
also Avro (which stems from the Hadoop and Cassandra communities):
https://avro.apache.org/docs/current/
It works with a versioned schema embedded as a prefix of the blob that
guarantees backward compat with old versions (but you
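A minimal sketch of the Avro container-file pattern (assuming the avro
Python package of that era; the record layout is made up for illustration):

    import json
    import avro.schema
    from avro.datafile import DataFileWriter
    from avro.io import DatumWriter

    schema = avro.schema.parse(json.dumps({
        "namespace": "example", "type": "record", "name": "Model",
        "fields": [
            {"name": "coef", "type": {"type": "array", "items": "double"}},
            {"name": "intercept", "type": "double"},
        ],
    }))

    # the schema is embedded at the start of the file, so old blobs can
    # still be decoded after the schema evolves
    writer = DataFileWriter(open("model.avro", "wb"), DatumWriter(), schema)
    writer.append({"coef": [0.5, -1.2], "intercept": 0.1})
    writer.close()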
2012/1/31 David Warde-Farley:
> On Tue, Jan 31, 2012 at 02:08:21PM -0500, Jeff Farris wrote:
>> I'm currently using pickle to persist models (e.g. SVC). After upgrading
>> sklearn, these pickled models from a previous version of sklearn tend not
>> to work and then I need to retrain. Is there
On Mon, Jan 30, 2012 at 05:22:35PM +0100, Andreas wrote:
> I implemented a somewhat trivial solution here:
> https://github.com/amueller/joblib/tree/ipython_refactoring
> It can be used like this:
> https://gist.github.com/1705235
> Not sure if this is a good way to do things but it
> was a very
On Tue, Jan 31, 2012 at 12:44:58PM -0800, Jacob VanderPlas wrote:
> I've been working on applying Gaussian Processes to noisy input data.
> The scikit-learn docs are not especially helpful on this topic,
:$ Agreed.
> Would it make more sense to rename "nugget" to "training_variance" or
> something
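For context, a small sketch of what the nugget does in the GaussianProcess
API of that time:

    import numpy as np
    from sklearn.gaussian_process import GaussianProcess  # pre-0.18 API

    rng = np.random.RandomState(0)
    X = np.linspace(0, 10, 20)[:, np.newaxis]
    y = np.sin(X).ravel() + 0.1 * rng.randn(20)

    # the nugget is added to the diagonal of the assumed training
    # covariance, i.e. it models noise on the training targets -- hence
    # the suggestion to call it something like "training_variance"
    gp = GaussianProcess(nugget=1e-2).fit(X, y)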
On Tue, Jan 31, 2012 at 10:28:55AM -0800, Michael Waskom wrote:
> First, I realized that my original PCA did not make much sense. What
> I want to do is reduce the feature dimensions in my classification,
> but keep the number of observations.
That's what the scikit's PCA does. I don't understand
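A minimal example of that use of PCA (fewer features, same number of
observations):

    import numpy as np
    from sklearn.decomposition import PCA

    X = np.random.randn(50, 300)          # 50 observations, 300 features
    X_red = PCA(n_components=10).fit_transform(X)
    print(X_red.shape)                    # (50, 10)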
On Wed, Feb 01, 2012 at 05:14:32PM +0900, Mathieu Blondel wrote:
> On Wed, Feb 1, 2012 at 6:57 AM, Andreas wrote:
> > I imagine it would be things like renaming and removing attributes. What
> > else is there?
> I've seen pickle complaining even when the attributes didn't change
> but couldn't figure out what was the cause...
On Wed, Feb 1, 2012 at 6:57 AM, Andreas wrote:
> I imagine it would be things like renaming and removing attributes. What
> else is there?
I've seen pickle complaining even when the attributes didn't change
but couldn't figure out what was the cause...
Pickling the entire estimator is very convenient
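A sketch of that convenience, and of where it bites (the version caveat
being the point of this thread):

    import pickle
    import numpy as np
    from sklearn.svm import SVC

    X = np.random.randn(20, 4)
    y = np.random.randint(0, 2, 20)

    clf = SVC().fit(X, y)
    blob = pickle.dumps(clf)   # one call captures the whole fitted estimator
    clf2 = pickle.loads(blob)  # only safe with the same sklearn version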