Sorry for not being able to help you with the actual problem, but
another hint:
I have a pull request for randomly sampling the parameter space, which
should be much more efficient in a model with so many parameters.
https://github.com/scikit-learn/scikit-learn/pull/1194
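For readers who want to try the idea, here is a minimal sketch of randomized parameter sampling. It assumes the `RandomizedSearchCV` class from current scikit-learn releases, which may differ from the API in the pull request above; the synthetic data, the `SGDClassifier` estimator and the parameter ranges are illustrative assumptions, not taken from this thread.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic data stands in for the real task (assumption).
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Distributions (or lists) to sample from, instead of a fixed grid.
param_distributions = {
    "alpha": loguniform(1e-6, 1e-1),
    "penalty": ["l2", "l1", "elasticnet"],
}

search = RandomizedSearchCV(
    SGDClassifier(max_iter=1000, tol=1e-3, random_state=0),
    param_distributions,
    n_iter=10,        # number of sampled settings, fixed up front
    cv=3,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_)
```

The cost is set by `n_iter` rather than by the product of all grid sizes, which is why sampling tends to scale better when there are many parameters.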
>> 2) how would I go about grid search over different vectorizers (e.g.
>> CountVectorizer(analyzer="word"), CountVectorizer(analyzer="char_wb"), and a
>> FeatureUnion of the two)?
>
You could always use a FeatureUnion and give it different TransformerLists
via the GridSearchCV (at least I think t
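One way to search over the three vectorizer choices from the question is sketched below. It takes a slightly different route than the reply above: instead of passing transformer lists to a FeatureUnion, it replaces the whole pipeline step, which current scikit-learn supports. The step names `vect` and `clf`, the toy documents and the classifier are assumptions made for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import FeatureUnion, Pipeline

# Toy corpus (assumption), four documents per class.
docs = ["good movie", "great film", "fine acting", "nice plot",
        "bad movie", "awful film", "poor acting", "dull plot"]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

word = CountVectorizer(analyzer="word")
char = CountVectorizer(analyzer="char_wb", ngram_range=(2, 3))
both = FeatureUnion([
    ("word", CountVectorizer(analyzer="word")),
    ("char", CountVectorizer(analyzer="char_wb", ngram_range=(2, 3))),
])

pipe = Pipeline([("vect", word), ("clf", LogisticRegression())])

# Each candidate replaces the entire "vect" step of the pipeline.
param_grid = {"vect": [word, char, both]}

search = GridSearchCV(pipe, param_grid, cv=2)
search.fit(docs, labels)
print(type(search.best_params_["vect"]).__name__)
```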
On Fri, Nov 16, 2012 at 3:28 PM, Gael Varoquaux <
gael.varoqu...@normalesup.org> wrote:
> On Thu, Nov 15, 2012 at 05:07:24PM -0800, Fred Mailhot wrote:
> > 1) there are a few LinearSVC options (penalty/loss, penalty/dual) for
> which
> > certain values are incompatible, but which are not documente
On Thu, Nov 15, 2012 at 05:07:24PM -0800, Fred Mailhot wrote:
> 1) there are a few LinearSVC options (penalty/loss, penalty/dual) for which
> certain values are incompatible, but which are not documented as such...this
> makes grid search a bit of a pain.
Indeed, they should be documented. Pull re
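Until the documentation covers this, one workaround is to pass GridSearchCV a list of grids so that only compatible combinations are ever tried. The sketch below uses the option names from current scikit-learn (`"hinge"` and `"squared_hinge"`), which differ from the names used when this thread was written; the data and the `C` values are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

# Synthetic data stands in for the real task (assumption).
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# One dict per compatible family of settings; the grids are searched
# separately, so invalid penalty/loss/dual mixes never get built.
param_grid = [
    {"penalty": ["l2"], "loss": ["hinge"], "dual": [True],
     "C": [0.1, 1.0]},
    {"penalty": ["l2"], "loss": ["squared_hinge"], "dual": [True, False],
     "C": [0.1, 1.0]},
    {"penalty": ["l1"], "loss": ["squared_hinge"], "dual": [False],
     "C": [0.1, 1.0]},
]

search = GridSearchCV(LinearSVC(max_iter=5000), param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```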
Hello Jake,
The error is easy to reproduce, after downloading the data for the file
sdss_photoz via the fetch_data script:
import numpy as np
data = np.load('./sklearn_tutorial/doc/data/sdss_photoz/sdss_photoz.npy')
print(data.dtype.names)
#
count=0
N=len(data)
X
I already know that things work with n_jobs=1. I just tried n_jobs=-1 with
a few smaller datasets (100 & 1000 items) and things seem to have worked
fine (without LinearSVC, see below). Possibly there's something wrong with
the larger dataset...investigating now.
A couple of points related to grid
Are you sure the error is related to n_jobs, not a specific classifier?
Could you run with n_jobs=1 and a very small training set (like 100
examples or something)
and see if it runs through?
(Actually I'm totally clueless but that doesn't look like a
multiprocessing error to me)
On 11/15/201
Argh, copy-paste error:
https://gist.github.com/e2ca1910450819a8a287
As for Accelerate, I'm not 100% sure how to check that (I cloned & ran "setup.py
build" and "setup.py install" without making any changes, if memory
serves), but this leads me to think "yes":
$ otool -L
/Users/aboutuser/Development/
> I definitely would like to see the term "data mining" stay -- we want
> to show up in results for "python data mining" in google. But I
> wouldn't mind "applications like data mining", and saying that sklearn
> is a "statistical package" or something similar.
Maybe we want something like 'keywor
On 16 November 2012 00:36, Lars Buitinck wrote:
> 2012/11/15 Jaques Grobler :
> > @Lars you countered Olivier's paragraph with a quote from Olivier :D hehe
>
> Oops, I intended to reply to Nelle. Sorry Olivier! :)
>
> --
> Lars Buitinck
> Scientific programmer, ILPS
> University of Amsterdam
>
>
>
Hi Fred.
The link is dead for me.
Do you link against Accelerate (not sure if this is relevant)?
Cheers,
Andy
On 11/15/2012 08:45 PM, Fred Mailhot wrote:
Dear list,
I'm using GridSearchCV to do some simple model selection for a text
classification task. I've got it working (see below for cave
Dear list,
I'm using GridSearchCV to do some simple model selection for a text
classification task. I've got it working (see below for caveat), but I'm
not convinced that I'm making the best use of this tool. If someone has the
time/inclination, I'd love a set of eyes to check the following gist t
Olivier,
actually, SGDRegressor is best on boston (of those that give coefs)
so that would be my first choice, for problems big or small.
Grid search? Who has the time?
OK ... in fact L1-regularization shrinks coefs and R2 towards 0
but av, max |residuals| get worse --
SGDRegressor boston pen
Hi Leon,
I haven't run into any NaN issues, or heard of anyone else having that
problem. Can you send the traceback for the specific error you're
getting? Thanks
Jake
On 11/15/2012 04:14 AM, Jaques Grobler wrote:
Hi Leon -
I hadn't encountered this back when I looked at this.
I think @J
2012/11/15 Jaques Grobler :
> @Lars you countered Olivier's paragraph with a quote from Olivier :D hehe
Oops, I intended to reply to Nelle. Sorry Olivier! :)
--
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam
-
@Lars you countered Olivier's paragraph with a quote from Olivier :D hehe
That's why I think we could, if we wanna keep the Data Mining solutions in
there, just mention that sklearn can be applied to areas like data mining,
etc. IMHO :)
2012/11/15 Lars Buitinck
> 2012/11/15 Olivier Grisel :
> >
2012/11/15 Olivier Grisel :
> I think that using an unsupervised model for clustering or using a random
> forest to rank features by importance can be part of data mining tasks.
> Even building predictive models with a supervised signal can
> sometimes be considered data mining.
Sure, but as Olivier sa
I think that using an unsupervised model for clustering or using a random
forest to rank features by importance can be part of data mining tasks.
Even building predictive models with a supervised signal can
sometimes be considered data mining.
However scikit-learn is not a full-fledged data mining soft
What if we just mention that it can be applied to fields like data-mining
etc.
Then it doesn't claim to be a data-mining package or library but mentions
that
it can be used/applied for that.
Unless we drop 'Data mining' altogether from there.
2012/11/15 Nelle Varoquaux
>
>
>
> On 15 November 2
On 15 November 2012 12:35, Mathieu Blondel wrote:
>
>
> On Thu, Nov 15, 2012 at 8:21 PM, Lars Buitinck wrote:
>
>> 2012/11/15 Gael Varoquaux :
>> > scikit-learn integrates machine learning algorithms in the tightly-knit
>> > scientific Python world, building upon numpy, scipy, and matplotlib. It
On Thu, Nov 15, 2012 at 08:35:36PM +0900, Mathieu Blondel wrote:
> "well-known algorithms" would do the trick too.
"reference algorithms"?
G
--
Monitor your physical, virtual and cloud infrastructure from a single
web co
Hi Leon -
I hadn't encountered this back when I looked at this.
I think @JacobVanderPlas would perhaps be best with this since he put that
tutorial together.
I'm sure he'll be able to help with this.
ping @jakevp :)
Regards, J
2012/11/15 Leon Palafox
>
> Hey Guys,
>
> I was running the dat
On Thu, Nov 15, 2012 at 8:21 PM, Lars Buitinck wrote:
> 2012/11/15 Gael Varoquaux :
> > scikit-learn integrates machine learning algorithms in the tightly-knit
> > scientific Python world, building upon numpy, scipy, and matplotlib. It
> > provides simple, efficient and effective data mining solu
Hey Guys,
I was running the data set in the Tree Regression Example for the astroml (
http://astroml.github.com/sklearn_tutorial/regression.html#a-simple-method-decision-tree-regression
)
And I ran into some NaNs that come from the dataset.
Has anyone else encountered this issue, and if so, ho
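A quick way to locate NaNs in a structured array like the one the tutorial loads is sketched below. The field names `u` and `redshift` and the values are made up for illustration; the real file's fields and contents are not shown in this thread.

```python
import numpy as np

# Small stand-in for the structured array (hypothetical fields and values).
data = np.array(
    [(0.1, 1.0), (np.nan, 2.0), (0.3, np.nan), (0.4, 4.0)],
    dtype=[("u", "f8"), ("redshift", "f8")],
)

# Flag every row that has a NaN in at least one field.
bad = np.zeros(len(data), dtype=bool)
for name in data.dtype.names:
    bad |= np.isnan(data[name])

print("rows with NaN:", np.flatnonzero(bad))

# Keep only the complete rows before fitting a model.
clean = data[~bad]
```

Dropping rows is only one option; whether it is appropriate depends on how many rows are affected and why the values are missing.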
2012/11/15 Gael Varoquaux :
> scikit-learn integrates machine learning algorithms in the tightly-knit
> scientific Python world, building upon numpy, scipy, and matplotlib. It
> provides simple, efficient and effective data mining solutions,
> accessible to everybody and reusable in various context
On 15 November 2012 20:55, Andreas Mueller wrote:
> Am 15.11.2012 10:50, schrieb Olivier Grisel:
> > Andy, please feel free to add a new page to the documentation named
> > "Who uses scikit-learn?" and where we can collect a bunch of
> > testimonies (it's interesting not only to collect names of
Am 15.11.2012 10:50, schrieb Olivier Grisel:
> Andy, please feel free to add a new page to the documentation named
> "Who uses scikit-learn?" and where we can collect a bunch of
> testimonies (it's interesting not only to collect names of companies /
> organizations but also what specific component
Am 15.11.2012 10:34, schrieb Mathieu Blondel:
> Tackling this one would be nice:
> https://github.com/scikit-learn/scikit-learn/issues/1327
>
> Currently, PassiveAggressiveClassifier is quite a bit slower than Perceptron.
>
There is a list of issues tagged with the 0.13 milestone:
https://github.com/scik
Andy, please feel free to add a new page to the documentation named
"Who uses scikit-learn?" and where we can collect a bunch of
testimonies (it's interesting not only to collect names of companies /
organizations but also what specific components they use for which
kind of problems).
I like the new version. If we wanted to keep the word 'classic' in there,
I'd go for something like 'scikit-learn integrates both classic and recent
machine learning algorithms in the tightly-knit scientific Python world,
building upon numpy, scipy, and matplotlib.' Beyond that I think it's just
pe
Tackling this one would be nice:
https://github.com/scikit-learn/scikit-learn/issues/1327
Currently, PassiveAggressiveClassifier is quite a bit slower than Perceptron.
Mathieu