Perhaps more oddly, why is GridSearchCV so sensitive to scaling? Note that a
simple svm.SVC().fit(X, y) without scaling was already fast.
In other words, it looks like scaling affects GridSearchCV in particular.
Jacob
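
The thread does not settle why scaling matters so much. A common explanation, and only an assumption here, is that the grid includes a linear kernel with large C, which the libsvm solver handles slowly when feature magnitudes differ by orders of magnitude, whereas a bare svm.SVC() uses the default RBF kernel with C=1. A minimal sketch of folding the scaling step into the search itself, written against the current scikit-learn API (not 0.13.1) and using synthetic data as a stand-in for X and y:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the data in the thread (an assumption):
# 27 samples, 4 features, 5 positives, two columns on a very different scale.
rng = np.random.RandomState(0)
y = np.array([0] * 22 + [1] * 5)
X = rng.normal(size=(27, 4))
X[y == 1, 0] += 2.0                  # give the classes some separation
X[:, 2] = 1e3 * X[:, 2] - 3.6e3      # magnitudes loosely modelled on the ?X output
X[:, 3] = 1e2 * X[:, 3] - 2.0e5

print(X.std(axis=0))                                   # widely different spreads
print(StandardScaler().fit_transform(X).std(axis=0))   # approximately 1 per column

# Scaling lives inside the pipeline, so each CV split is scaled using
# statistics from its own training part only.
pipe = make_pipeline(StandardScaler(), SVC(kernel="linear"))
search = GridSearchCV(pipe, {"svc__C": [1, 10, 100, 1000]}, cv=3)
search.fit(X, y)
print(search.best_params_)
```

Calling preprocessing.scale(X) once up front, as in the quoted message, has a similar effect on speed; the pipeline form just avoids computing the scaling statistics on the held-out samples.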
On Wed, Jul 3, 2013 at 3:35 PM, Josh Wasserstein <ribonucle...@gmail.com> wrote:
> Hmm, I noticed that if I run
>
> from sklearn import preprocessing
> X = preprocessing.scale(X)
>
> beforehand, it runs extremely fast!
>
> Why is that?
>
> Jacob
>
>
> On Wed, Jul 3, 2013 at 3:07 PM, Josh Wasserstein
> <ribonucle...@gmail.com> wrote:
>
>> Thank you Vlad. I think you are right and there may be a problem with
>> parallel jobs.
>>
>> When I run the code with the verbosity option enabled I see output coming
>> out slowly. The strange thing is that doing a simple SVM fit is basically
>> instantaneous (literally less than half a second), so I am not sure why
>> this takes so long.
>>
>> Below is the session:
>>
>> # I reduced the dimensionality even further:
>> > X.shape
>> Out[2]: (27, 4)
>>
>> # There are 5 positives (1), the rest are negatives (0)
>> > y.shape
>> (27,)
>>
>> > clf.fit(X,y,cv=loo)
>> [GridSearchCV] kernel=linear, C=1 ..............................................
>> [GridSearchCV] ..................................... kernel=linear, C=1 - 1.3min
>> [Parallel(n_jobs=1)]: Done 1 jobs | elapsed: 1.3min
>> [GridSearchCV] kernel=linear, C=1 ..............................................
>> [GridSearchCV] ..................................... kernel=linear, C=1 - 26.8s
>> [GridSearchCV] kernel=linear, C=1 ..............................................
>> [GridSearchCV] ..................................... kernel=linear, C=1 - 50.2s
>> [GridSearchCV] kernel=linear, C=100 ............................................
>> [GridSearchCV] ................................... kernel=linear, C=100 - 1.9min
>> [GridSearchCV] kernel=linear, C=100 ............................................
>> [GridSearchCV] ................................... kernel=linear, C=100 - 1.9min
>> [GridSearchCV] kernel=linear, C=100 ............................................
>>
>> and it keeps running...
>>
>> On the other hand, a simple fitting is extremely fast:
>>
>> > %timeit clf2 = svm.SVC(); clf2.fit(X,y)
>>
>>
>> 1000 loops, best of 3: 328 us per loop
>>
>> Why? I am not importing multiprocessing nor setting any n_jobs anywhere
>> in my code (the code is effectively what I sent earlier)
>>
>> Jacob
>>
>>
>>
>> On Wed, Jul 3, 2013 at 2:42 PM, Vlad Niculae <zephy...@gmail.com> wrote:
>>
>>> From your code it doesn't seem like it, but are you using
>>> multiprocessing (i.e. n_jobs > 1)? It causes issues on certain
>>> configurations.
>>>
>>> Either way, try to pass `verbose=2` to the grid search constructor.
>>>
>>> Yours,
>>> Vlad
>>>
>>> On Wed, Jul 3, 2013 at 9:36 PM, Josh Wasserstein <ribonucle...@gmail.com>
>>> wrote:
>>> > This is odd. I can successfully run the example `grid_search_digits.py`.
>>> > However, I am unable to do a grid search on my own data.
>>> >
>>> > I have the following setup:
>>> > ===============
>>> > import sklearn
>>> > from sklearn.svm import SVC
>>> > from sklearn.grid_search import GridSearchCV
>>> > from sklearn.cross_validation import LeaveOneOut
>>> > from sklearn.metrics import auc_score
>>> >
>>> > # ... Build X and y ....
>>> >
>>> > tuned_parameters = [{'kernel': ['rbf'], 'gamma': [1e-3, 1e-4],
>>> > 'C': [1, 10, 100, 1000]},
>>> > {'kernel': ['linear'], 'C': [1, 10, 100, 1000]}]
>>> >
>>> > loo = LeaveOneOut(len(y))
>>> > clf = GridSearchCV(SVC(C=1), tuned_parameters, score_func=auc_score)
>>> > clf.fit(X, y, cv=loo)
>>> > ....
>>> > print clf.best_estimator_
>>> > ....
>>> > ===============
>>> > But I never get past `clf.fit` (I let it run for ~1 hr).
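
For readers on a current scikit-learn release, a minimal sketch of roughly the same setup follows. It is not the 0.13.1 code above: `sklearn.grid_search` and `sklearn.cross_validation` have since been merged into `sklearn.model_selection`, `scoring="roc_auc"` takes the place of `score_func=auc_score`, and `cv` is given to the `GridSearchCV` constructor rather than to `fit`. The data is a synthetic placeholder (an assumption), and stratified folds are swapped in for leave-one-out because ROC AUC is undefined on a test fold that holds a single sample.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

# Synthetic placeholder for "Build X and y" (an assumption):
# 27 samples, 4 well-scaled features, 5 positives.
rng = np.random.RandomState(0)
y = np.array([0] * 22 + [1] * 5)
X = rng.normal(size=(27, 4))
X[y == 1] += 1.5  # shift the positives so there is something to learn

tuned_parameters = [
    {"kernel": ["rbf"], "gamma": [1e-3, 1e-4], "C": [1, 10, 100, 1000]},
    {"kernel": ["linear"], "C": [1, 10, 100, 1000]},
]

# With 5 positives and 5 stratified folds, every test fold contains
# one positive and several negatives, so ROC AUC can be computed.
skf = StratifiedKFold(n_splits=5)

clf = GridSearchCV(
    SVC(C=1),
    tuned_parameters,
    scoring="roc_auc",  # takes the place of score_func=auc_score
    cv=skf,             # passed to the constructor, not to fit()
    verbose=2,          # per-fit progress output, as suggested in the thread
)
clf.fit(X, y)
print(clf.best_estimator_)
```

The per-fit timings printed by `verbose=2` make it easy to see which parameter combination, if any, is the slow one.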
>>> >
>>> > I have tried also with
>>> >
>>> > clf.fit(X, y, cv=10)
>>> >
>>> > and with
>>> >
>>> > skf = StratifiedKFold(y,2)
>>> > clf.fit(X, y, cv=skf)
>>> >
>>> > and had the same problem (it never finishes the clf.fit statement). My
>>> > data is simple:
>>> >
>>> > > X.shape
>>> > (27,26)
>>> >
>>> > > y.shape
>>> > 5
>>> >
>>> > > y.dtype
>>> > dtype('int64')
>>> >
>>> >
>>> > >?y
>>> > Type: ndarray
>>> > String Form:[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1]
>>> > Length: 27
>>> > File:
>>> > /home/jacob04/opt/python/numpy/numpy-1.7.1/lib/python2.7/site-packages/numpy/__init__.py
>>> > Docstring: <no docstring>
>>> > Class Docstring:
>>> > ndarray(shape, dtype=float, buffer=None, offset=0,
>>> > strides=None, order=None)
>>> >
>>> > > ?X
>>> > Type: ndarray
>>> > String Form:
>>> > [[ -3.61238468e+03 -3.61253920e+03 -3.61290196e+03
>>> > -3.61326679e+03
>>> > 7.84590361e+02 0.0000 <...> 0000e+00 2.22389150e+00
>>> > 2.53252959e+00
>>> > 2.11606216e+00 -1.99613432e+05 -1.99564828e+05]]
>>> > Length: 27
>>> > File:
>>> > /home/jacob04/opt/python/numpy/numpy-1.7.1/lib/python2.7/site-packages/numpy/__init__.py
>>> > Docstring: <no docstring>
>>> > Class Docstring:
>>> > ndarray(shape, dtype=float, buffer=None, offset=0,
>>> > strides=None, order=None)
>>> >
>>> > This is all with the latest version of scikit-learn (0.13.1) and:
>>> >
>>> > $ pip freeze
>>> > Cython==0.19.1
>>> > PIL==1.1.7
>>> > PyXB==1.2.2
>>> > PyYAML==3.10
>>> > argparse==1.2.1
>>> > distribute==0.6.34
>>> > epc==0.0.5
>>> > ipython==0.13.2
>>> > jedi==0.6.0
>>> > matplotlib==1.3.x
>>> > nltk==2.0.4
>>> > nose==1.3.0
>>> > numexpr==2.1
>>> > numpy==1.7.1
>>> > pandas==0.11.0
>>> > pyparsing==1.5.7
>>> > python-dateutil==2.1
>>> > pytz==2013b
>>> > rpy2==2.3.1
>>> > scikit-learn==0.13.1
>>> > scipy==0.12.0
>>> > sexpdata==0.0.3
>>> > six==1.3.0
>>> > stemming==1.0.1
>>> > -e git+https://github.com/PyTables/PyTables.git@df7b20444b0737cf34686b5d88b4e674ec85575b#egg=tables-dev
>>> > tornado==3.0.1
>>> > wsgiref==0.1.2
>>> >
>>> > Thanks,
>>> >
>>> > Jacob
>>> >
>>> > PS: This thread is based on the following StackOverflow post:
>>> >
>>> > http://stackoverflow.com/questions/17455302/clf-fit-freezes-on-small-dataset-in-scikit-learn
>>> >
>>> >
>>> >
>>> ------------------------------------------------------------------------------
>>> > This SF.net email is sponsored by Windows:
>>> >
>>> > Build for Windows Store.
>>> >
>>> > http://p.sf.net/sfu/windows-dev2dev
>>> > _______________________________________________
>>> > Scikit-learn-general mailing list
>>> > Scikit-learn-general@lists.sourceforge.net
>>> > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>> >
>>>
>>>
>>>
>>
>>
>