Re: [Scikit-learn-general] TF-Idf

2012-10-25 Thread Ark
> Can you try to turn off IDF normalization using `use_idf=False` in
> the constructor params of your vectorizer and retry (fit + predict) to
> see if it's related to IDF normalization?
> How many dimensions do you have in your fitted model?

https://gist.github.com/3933727

data_vectors.shape = (10361
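
For readers following along, the suggestion quoted above could be tried roughly as follows. This is only a sketch: it assumes the vectorizer in question is scikit-learn's `TfidfVectorizer` (the snippet does not say which one was used), and the three-document corpus is invented for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus, invented for illustration; not the data from the thread.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
]

# Same vectorizer twice: once with IDF reweighting, once without.
with_idf = TfidfVectorizer(use_idf=True)
without_idf = TfidfVectorizer(use_idf=False)

X_idf = with_idf.fit_transform(docs)
X_tf = without_idf.fit_transform(docs)

# Both matrices have shape (n_documents, n_vocabulary_terms), which also
# answers the "how many dimensions" question for this toy corpus.
print(X_idf.shape, X_tf.shape)
```

Refitting the classifier on `X_tf` and comparing its predictions against the model trained on `X_idf` would show whether the IDF step is what changes the behaviour.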

Re: [Scikit-learn-general] LinearSVC and copy/deepcopy

2012-10-25 Thread David Warde-Farley
On Thu, Oct 25, 2012 at 4:15 PM, Gael Varoquaux wrote:
> Hi David,
>
> Thanks for the heads up. It's good to have you around.
>
> On Thu, Oct 25, 2012 at 04:12:28PM -0400, David Warde-Farley wrote:
>> A workaround would be to check flags in predict() and do the necessary
>> reshuffle if the flags

Re: [Scikit-learn-general] LinearSVC and copy/deepcopy

2012-10-25 Thread Gael Varoquaux
Hi David,

Thanks for the heads up. It's good to have you around.

On Thu, Oct 25, 2012 at 04:12:28PM -0400, David Warde-Farley wrote:
> A workaround would be to check flags in predict() and do the necessary
> reshuffle if the flags have been botched, but that's kind of a
> maintenance headache.

I

[Scikit-learn-general] LinearSVC and copy/deepcopy

2012-10-25 Thread David Warde-Farley
A colleague of mine ran into this this morning, and I've fixed the upstream bug in NumPy, but I thought I'd let you guys know in case you want to put in a workaround. Basically, if you copy.copy or copy.deepcopy a LinearSVC object (and probably some other objects where the underlying code relies o
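
The message above is cut off before it says what goes wrong after the copy, so the following is only a sketch of how one might check a given installation. It assumes that comparing predictions and the memory-layout flags of `coef_` is a reasonable probe (the replies mention "flags", but which flag matters is a guess here); the toy data is invented.

```python
import copy

import numpy as np
from sklearn.svm import LinearSVC

# Tiny, linearly separable toy data, invented for illustration.
X = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])

clf = LinearSVC(random_state=0).fit(X, y)
clone = copy.deepcopy(clf)

# A copied estimator should predict exactly like the original.
same_predictions = np.array_equal(clf.predict(X), clone.predict(X))
print("predictions match:", same_predictions)

# Assumption: the contiguity flag is the relevant one to inspect.
print("original C-contiguous:", clf.coef_.flags["C_CONTIGUOUS"])
print("copy C-contiguous:", clone.coef_.flags["C_CONTIGUOUS"])
```

On a NumPy release that includes the upstream fix mentioned above, the copy would be expected to behave like the original.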

Re: [Scikit-learn-general] Python MapReduce

2012-10-25 Thread Olivier Grisel
2012/10/25 Nikit Saraf:
> Hi
>
> I'm fairly new to the field of Machine Learning and as a result new user of
> scikit-learn. I'm learning about the Map Reduce parallel implementation of
> Machine Learning Algorithms in python. So I was thinking of ways to
> MapReduce the cross-validation. Anyone h

[Scikit-learn-general] Python MapReduce

2012-10-25 Thread Nikit Saraf
Hi, I'm fairly new to the field of Machine Learning and, as a result, a new user of scikit-learn. I'm learning about the MapReduce parallel implementation of Machine Learning algorithms in Python. So I was thinking of ways to MapReduce the cross-validation. Anyone having any ideas on how to translate
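
One possible way to frame the question, as a sketch only: each fold is an independent task, so the map step can fit and score one fold and the reduce step can combine the per-fold counts. Everything below is hypothetical and uses only the standard library; `make_folds`, `map_fold`, and `reduce_scores` are illustrative names, the "model" is a majority-class stand-in rather than a scikit-learn estimator, and the labels are invented.

```python
from collections import Counter
from functools import reduce


def make_folds(n_samples, n_folds):
    """Yield (train_idx, test_idx) pairs; samples are assigned round-robin."""
    indices = list(range(n_samples))
    for k in range(n_folds):
        test = indices[k::n_folds]
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test


def map_fold(task):
    """Map step: fit on the training split, score on the held-out split."""
    labels, train, test = task
    # Stand-in model: always predict the most common training label.
    majority = Counter(labels[i] for i in train).most_common(1)[0][0]
    correct = sum(1 for i in test if labels[i] == majority)
    return correct, len(test)


def reduce_scores(a, b):
    """Reduce step: add up (correct, total) pairs from two partial results."""
    return a[0] + b[0], a[1] + b[1]


labels = [0, 0, 0, 1, 0, 0, 1, 0, 0, 0]  # invented toy labels
tasks = [(labels, train, test) for train, test in make_folds(len(labels), 5)]

# The built-in map runs sequentially; a parallel or distributed map could
# be swapped in because the fold tasks do not depend on each other.
partials = list(map(map_fold, tasks))
correct, total = reduce(reduce_scores, partials, (0, 0))
accuracy = correct / total
print(accuracy)
```

Returning counts rather than per-fold accuracies keeps the reduce step associative, which is what lets partial results be combined in any order.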