Re: [Scikit-learn-general] Large dataset causing Array can't be memory-mapped. Python objects in dtype.

2014-08-18 Thread Joel Nothman
You can also modify that line in sklearn/externals/joblib/pool.py in your local copy of scikit-learn to include an additional condition:

    and a.dtype.kind != 'O'

On 19 August 2014 16:55, Joel Nothman wrote:
> Oh well. I'm not a very experienced monkey-patcher. There may be a better
> way to do i
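The extra condition Joel suggests can be sketched as a small, self-contained predicate. `should_memmap` below is a hypothetical helper, not joblib's actual code (current scikit-learn releases no longer ship joblib under sklearn.externals). The 1,000,000-byte threshold is an assumption loosely modelled on the `max_nbytes='1M'` default of `joblib.Parallel`.

```python
import numpy as np


def should_memmap(a, max_nbytes=1_000_000):
    """Hypothetical helper (not joblib's real code) mirroring the criterion
    discussed in this thread: memory-map only large NumPy arrays, and skip
    arrays whose dtype holds Python objects."""
    if max_nbytes is None:  # max_nbytes=None disables memmapping entirely
        return False
    return (
        isinstance(a, np.ndarray)
        and a.nbytes > max_nbytes
        and a.dtype.kind != 'O'  # the extra condition suggested above
    )


floats = np.zeros(200_000)                 # 1.6 MB of float64
objects = np.empty(200_000, dtype=object)  # pointers to Python objects

print(should_memmap(floats))                   # True
print(should_memmap(objects))                  # False
print(should_memmap(floats, max_nbytes=None))  # False
```

Note that `a.dtype.kind != 'O'` only catches plain object arrays. A structured dtype with an object field, such as `np.dtype([('x', 'f8'), ('y', 'O')])`, has kind `'V'`, so `a.dtype.hasobject` would be the more thorough check.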

Re: [Scikit-learn-general] Large dataset causing Array can't be memory-mapped. Python objects in dtype.

2014-08-18 Thread Joel Nothman
Oh well. I'm not a very experienced monkey-patcher. There may be a better way to do it (make sure you apply the monkey patch before importing any other scikit-learn modules).

On 19 August 2014 16:52, Anders Aagaard wrote:
> It does work with 1 job.
>
> I tried your monkey patch:
> # joblib.Para
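Joel's caveat about import order follows from how Python binds names: rebinding a module attribute does not affect modules that already ran `from ... import Parallel`, because they hold their own reference to the original object. One plausible reason the patch had no effect (the thread preview does not confirm it) is that the relevant scikit-learn module had already imported `Parallel` before the patch ran. The sketch uses a hypothetical stand-in module, `fake_joblib`, rather than the real joblib.

```python
import functools
import types

# Hypothetical stand-in for a library module; NOT the real joblib.
fake_joblib = types.ModuleType("fake_joblib")


class Parallel:
    def __init__(self, n_jobs=1, max_nbytes="1M"):
        self.n_jobs = n_jobs
        self.max_nbytes = max_nbytes


fake_joblib.Parallel = Parallel

# A module imported *before* the patch that did `from fake_joblib import Parallel`
# keeps its own reference to the original class.
early_reference = fake_joblib.Parallel

# The monkey patch: rebind the module attribute to a partial with max_nbytes=None.
fake_joblib.Parallel = functools.partial(fake_joblib.Parallel, max_nbytes=None)

# Code that looks the name up on the module after the patch sees the partial.
late_reference = fake_joblib.Parallel

print(late_reference(n_jobs=2).max_nbytes)   # None (patched default)
print(early_reference(n_jobs=2).max_nbytes)  # 1M   (patch never seen)
```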

Re: [Scikit-learn-general] Large dataset causing Array can't be memory-mapped. Python objects in dtype.

2014-08-18 Thread Anders Aagaard
It does work with 1 job.

I tried your monkey patch:

    # joblib.Parallel
    functools.partial(, max_nbytes=None)

I still get the same error though.

On Tue, Aug 19, 2014 at 8:19 AM, Joel Nothman wrote:
> I suspect this is a bug in joblib, and that you won't get it with
> n_jobs=1. Joblib employs me

Re: [Scikit-learn-general] Large dataset causing Array can't be memory-mapped. Python objects in dtype.

2014-08-18 Thread Joel Nothman
I suspect this is a bug in joblib, and that you won't get it with n_jobs=1. Joblib employs memmap for inter-process communication if the array is larger than a fixed size: https://github.com/joblib/joblib/blob/master/joblib/pool.py#L203. It seems it needs another criterion to ensure that the
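Because joblib only memory-maps arrays above its size threshold, a subset of the data can pass while the full dataset fails, which matches Anders' report. The refusal itself can be reproduced with NumPy alone, without joblib: `numpy.lib.format.open_memmap` will not create a file-backed array whose dtype holds Python objects. This is a minimal sketch assuming a writable temporary directory.

```python
import os
import tempfile

import numpy as np
from numpy.lib.format import open_memmap

with tempfile.TemporaryDirectory() as tmp:
    # A plain float64 array can be backed by a file on disk.
    mm = open_memmap(os.path.join(tmp, "floats.npy"), mode="w+",
                     dtype=np.float64, shape=(1000,))
    mm[:] = 1.0
    mm.flush()
    float_ok = bool(np.all(mm == 1.0))
    del mm  # release the file handle before the directory is removed

    # An object array only stores pointers to Python objects, which cannot
    # be shared through a file, so NumPy refuses to memory-map it.
    try:
        open_memmap(os.path.join(tmp, "objects.npy"), mode="w+",
                    dtype=object, shape=(1000,))
        raised = None
    except ValueError as exc:
        raised = str(exc)

print(float_ok)
print(raised)
```

The captured message should match the error quoted in the original report.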

[Scikit-learn-general] [GSOC] Wrap up blog post

2014-08-18 Thread Hamzeh Alsalhi
Hello, I am wrapping up my final blog post, and I want to say that this was an awesome summer of code! It has been a great feeling for me to wake up daily and contribute to a software project that is valuable to me. Best, http://hamzehgsoc.blogspot.com/2014/08/google-summer-of-code-2014-final-summ

Re: [Scikit-learn-general] TdidfTransformer when applied to test dataset

2014-08-18 Thread Joel Nothman
If I understand your question correctly, the answer is yes! If you want a clearer response, you might clarify what the alternative hypothesis to your suggestion is.

On 19 August 2014 03:13, ZORAIDA HIDALGO SANCHEZ wrote:
> I am using TfidfTransformer on

[Scikit-learn-general] Large dataset causing Array can't be memory-mapped. Python objects in dtype.

2014-08-18 Thread Anders Aagaard
Hi, I've got a reasonably large dataset I'm trying to do a grid search on. If I feed in a subset of it, it works fine, but if I feed in the entire file it dies with: "Array can't be memory-mapped: Python objects in dtype." Now I realize what that's telling me, but I seem to remember building pipeli

Re: [Scikit-learn-general] bootstrap depracation warning

2014-08-18 Thread Sebastian Raschka
On Aug 18, 2014, at 12:15 PM, Olivier Grisel wrote:
> > since it would make the "estimate" and "error" calculation more convenient,
> > right?
>
> I don't understand what you mean "estimate" by "error". Both the model
> parameters, its individual predictions and its cross-validation scores or

Re: [Scikit-learn-general] bootstrap depracation warning

2014-08-18 Thread josef.pktd
On Mon, Aug 18, 2014 at 12:43 PM, Olivier Grisel wrote:
> 2014-08-18 18:28 GMT+02:00 :
> > On Mon, Aug 18, 2014 at 12:15 PM, Olivier Grisel wrote:
> >> On 18 August 2014 16:16, "Sebastian Raschka" wrote:
> >> > O

[Scikit-learn-general] TdidfTransformer when applied to test dataset

2014-08-18 Thread ZORAIDA HIDALGO SANCHEZ
I am using TfidfTransformer on documents that I need to classify. In order to evaluate the model, I need to apply the whole pipeline (TfidfTransformer, classifier) to the test dataset. In the training step, I am using cross-validation, and in each iteration I am applying tdidf.fit_transform/transfo
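The question is truncated, so this is a sketch under the assumption that it asks whether the test data should go through `transform` with the IDF weights learned on the training data, which would be consistent with Joel's "yes". It uses `CountVectorizer` to produce the term counts, a toy corpus, and `LogisticRegression` as a stand-in classifier; none of these come from the original message. It also uses `sklearn.model_selection`, which has replaced the `sklearn.cross_validation` module of the 2014 releases.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus and labels, for illustration only.
docs = np.array(["good movie", "great film", "bad movie",
                 "awful film", "good great", "bad awful"] * 5)
labels = np.array([1, 1, 0, 0, 1, 0] * 5)

# Manual version: fit on the training fold, only transform the held-out fold.
fold_scores = []
skf = StratifiedKFold(n_splits=5)
for train_idx, test_idx in skf.split(np.zeros(len(labels)), labels):
    counts = CountVectorizer()
    tfidf = TfidfTransformer()
    # Vocabulary and IDF weights are learned from the training fold only.
    X_train = tfidf.fit_transform(counts.fit_transform(docs[train_idx]))
    # transform() reuses the vocabulary and IDF weights learned above.
    X_test = tfidf.transform(counts.transform(docs[test_idx]))
    clf = LogisticRegression().fit(X_train, labels[train_idx])
    fold_scores.append(clf.score(X_test, labels[test_idx]))

# Equivalent with a Pipeline: cross_val_score refits every step on each
# training fold and calls only transform/predict on the held-out fold.
pipe = make_pipeline(CountVectorizer(), TfidfTransformer(), LogisticRegression())
pipe_scores = cross_val_score(pipe, docs, labels, cv=5)

# For the final test set: fit once on all training data, then predict.
pipe.fit(docs, labels)
test_predictions = pipe.predict(["good film", "awful movie"])
```

`TfidfVectorizer` combines the counting and TF-IDF steps into one transformer if a separate `CountVectorizer` is not needed.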

Re: [Scikit-learn-general] bootstrap depracation warning

2014-08-18 Thread Olivier Grisel
2014-08-18 18:28 GMT+02:00 :
> On Mon, Aug 18, 2014 at 12:15 PM, Olivier Grisel wrote:
>> On 18 August 2014 16:16, "Sebastian Raschka" wrote:
>> > On Aug 18, 2014, at 3:46 AM, Olivier Grisel wrote:
>> > > But the sklearn.cross_validation.Bootstrap curren

Re: [Scikit-learn-general] bootstrap depracation warning

2014-08-18 Thread josef.pktd
On Mon, Aug 18, 2014 at 12:15 PM, Olivier Grisel wrote:
> On 18 August 2014 16:16, "Sebastian Raschka" wrote:
> > On Aug 18, 2014, at 3:46 AM, Olivier Grisel wrote:
> > > But the sklearn.cross_validation.Bootstrap currently implemented in
> > > sklearn is a cross-validation itera

Re: [Scikit-learn-general] bootstrap depracation warning

2014-08-18 Thread Olivier Grisel
On 18 August 2014 16:16, "Sebastian Raschka" wrote:
> On Aug 18, 2014, at 3:46 AM, Olivier Grisel wrote:
> > But the sklearn.cross_validation.Bootstrap currently implemented in sklearn is a cross-validation iterator, not a generic resampling method to estimate variance or confidence interv

Re: [Scikit-learn-general] bootstrap depracation warning

2014-08-18 Thread Sebastian Raschka
On Aug 18, 2014, at 3:46 AM, Olivier Grisel wrote:
> But the sklearn.cross_validation.Bootstrap currently implemented in sklearn
> is a cross-validation iterator, not a generic resampling method to estimate
> variance or confidence intervals. Don't be misled by the name. If we chose
> to dep

Re: [Scikit-learn-general] MNIST benchmark

2014-08-18 Thread Lars Buitinck
2014-08-17 7:26 GMT+02:00 Amey:
> Zero-one classification loss: 0.0426
>
> I would like somebody to help me interpret this in terms of benchmarks:
> http://yann.lecun.com/exdb/mnist/
>
> What is the test error % metric given on the link corresponding to the
> metrics I have computed?

It's ze
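Assuming the "test error (%)" column on the MNIST page is the percentage of misclassified test images, it is the zero-one loss times 100, so a loss of 0.0426 corresponds to 4.26%. The sketch checks this arithmetic on hypothetical labels with `sklearn.metrics.zero_one_loss`, which by default returns the fraction of misclassified samples.

```python
import numpy as np
from sklearn.metrics import accuracy_score, zero_one_loss

# Hypothetical labels: 10 test digits, one of which is misclassified.
y_true = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y_pred = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 4])

loss = zero_one_loss(y_true, y_pred)       # fraction misclassified (normalize=True by default)
error_pct = 100 * loss                     # "test error %" style figure
accuracy = accuracy_score(y_true, y_pred)  # equals 1 - loss

print(f"zero-one loss = {loss:.2f}, test error = {error_pct:.1f}%, accuracy = {accuracy:.2f}")
# zero-one loss = 0.10, test error = 10.0%, accuracy = 0.90
```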

Re: [Scikit-learn-general] bootstrap depracation warning

2014-08-18 Thread Arman Eshaghi
Thanks, very informative.

On Mon, Aug 18, 2014 at 1:08 PM, Olivier Grisel wrote:
> On 18 August 2014 09:57, "Arman Eshaghi" wrote:
> > thanks for the discussion. Could you please explain what the right way of using
> > bootstrapping for confidence interval calculation (or other statistics) would

Re: [Scikit-learn-general] bootstrap depracation warning

2014-08-18 Thread Olivier Grisel
On 18 August 2014 09:57, "Arman Eshaghi" wrote:
> thanks for the discussion. Could you please explain what the right way of using bootstrapping for confidence interval calculation (or other statistics) would be? I mean, what would you do to get, as Olivier said, a "generic resampling method to estimate var

Re: [Scikit-learn-general] bootstrap depracation warning

2014-08-18 Thread Arman Eshaghi
Thanks for the discussion. Could you please explain what the right way of using bootstrapping for confidence interval calculation (or other statistics) would be? I mean, what would you do to get, as Olivier said, a "generic resampling method to estimate variance or confidence intervals"? I'm under the impressi
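One common answer to this question is a percentile bootstrap: resample the data with replacement many times, recompute the statistic each time, and read the interval off the percentiles of those recomputed values. This is a sketch of that generic technique, not necessarily what Olivier went on to recommend; `bootstrap_ci` is a hypothetical helper and the correctness vector is made-up data.

```python
import numpy as np


def bootstrap_ci(values, statistic=np.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `statistic` (a sketch)."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values)
    n = len(values)
    boot_stats = np.empty(n_resamples)
    for i in range(n_resamples):
        # Draw n indices with replacement and recompute the statistic.
        idx = rng.integers(0, n, size=n)
        boot_stats[i] = statistic(values[idx])
    lower, upper = np.percentile(boot_stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return statistic(values), lower, upper


# Hypothetical per-sample correctness of a classifier on 100 test points
# (1 = correct, 0 = wrong), i.e. 85% accuracy.
correct = np.array([1] * 85 + [0] * 15)
estimate, low, high = bootstrap_ci(correct)
print(f"accuracy = {estimate:.2f}, 95% CI ~ [{low:.2f}, {high:.2f}]")
```

Resampling the test-set predictions of one fitted model only captures test-set sampling variability, not variability due to retraining on different data. `scipy.stats.bootstrap` (SciPy 1.7 and later) provides a ready-made implementation with additional interval methods.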

Re: [Scikit-learn-general] bootstrap depracation warning

2014-08-18 Thread Olivier Grisel
But the sklearn.cross_validation.Bootstrap currently implemented in sklearn is a cross-validation iterator, not a generic resampling method to estimate variance or confidence intervals. Don't be misled by the name. If we chose to deprecate and then remove this class, it's precisely because it caus