Hmm, I noticed that if I run

    from sklearn import preprocessing
    X = preprocessing.scale(X)

beforehand, it runs extremely fast!

Why is that?

Jacob
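
A note on the scaling step above: preprocessing.scale standardizes each
column of X to zero mean and unit variance. SVMs are not scale invariant,
and the scikit-learn documentation recommends scaling the data before
fitting an SVC. The X in the original post has columns whose magnitudes
range from about 1 to about 1e5, and the slow fits in the log all use
kernel=linear, while the fast timed fit uses SVC() with its default rbf
kernel. Unscaled features are therefore a plausible reason for the slow
grid search, but nothing in this thread confirms that diagnosis. The sketch
below only shows the standardization arithmetic in plain Python. The helper
name standardize_columns and the X_demo values are made up for
illustration; they are not part of scikit-learn or of the original data.

```python
import math


def standardize_columns(rows):
    """Return a copy of rows with every column shifted to zero mean and
    divided by its population standard deviation."""
    n_rows = len(rows)
    n_cols = len(rows[0])
    means = [sum(row[j] for row in rows) / n_rows for j in range(n_cols)]
    stds = []
    for j in range(n_cols):
        variance = sum((row[j] - means[j]) ** 2 for row in rows) / n_rows
        std = math.sqrt(variance)
        # A constant column has zero spread; divide by 1 to avoid 0/0.
        stds.append(std if std > 0 else 1.0)
    return [[(row[j] - means[j]) / stds[j] for j in range(n_cols)]
            for row in rows]


# Hypothetical rows on very different scales (not the original data).
X_demo = [
    [-3612.4, 2.2, -199613.4],
    [-3610.1, 2.5, -199564.8],
    [-3615.0, 2.1, -199700.0],
]
X_scaled = standardize_columns(X_demo)
```

After this transform every column has mean 0 and variance 1, so no single
feature dominates the kernel values. In practice one would simply call
preprocessing.scale(X) as in the message above.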


On Wed, Jul 3, 2013 at 3:07 PM, Josh Wasserstein <ribonucle...@gmail.com> wrote:

> Thank you Vlad. I think you are right and there may be a problem with
> parallel jobs.
>
> When I run the code with the verbosity option enabled I see output coming
> out slowly. The strange thing is that doing a simple SVM fit is basically
> instantaneous (literally less than half a second), so I am not sure why
> this takes so long.
>
> Below is the session:
>
> # I reduced the dimensionality even further:
> > X.shape
> Out[2]: (27, 4)
>
> # There are 5 positives (1), the rest are negatives (0)
> >  y.shape
>  (27,)
>
> > clf.fit(X,y,cv=loo)
> [GridSearchCV] kernel=linear, C=1 ..............................................
> [GridSearchCV] ..................................... kernel=linear, C=1 - 1.3min
> [Parallel(n_jobs=1)]: Done   1 jobs       | elapsed:  1.3min
> [GridSearchCV] kernel=linear, C=1 ..............................................
> [GridSearchCV] ..................................... kernel=linear, C=1 -  26.8s
> [GridSearchCV] kernel=linear, C=1 ..............................................
> [GridSearchCV] ..................................... kernel=linear, C=1 -  50.2s
> [GridSearchCV] kernel=linear, C=100 ............................................
> [GridSearchCV] ................................... kernel=linear, C=100 - 1.9min
> [GridSearchCV] kernel=linear, C=100 ............................................
> [GridSearchCV] ................................... kernel=linear, C=100 - 1.9min
> [GridSearchCV] kernel=linear, C=100 ............................................
>
> and it keeps running...
>
> On the other hand, a simple fitting is extremely fast:
>
> >  %timeit clf2 = svm.SVC(); clf2.fit(X,y)
>
> 1000 loops, best of 3: 328 us per loop
>
> Why? I am not importing multiprocessing or setting n_jobs anywhere in
> my code (the code is effectively what I sent earlier).
>
> Jacob
>
>
>
> On Wed, Jul 3, 2013 at 2:42 PM, Vlad Niculae <zephy...@gmail.com> wrote:
>
>> From your code it doesn't seem like it, but are you using
>> multiprocessing (i.e. n_jobs > 1)?  It causes issues on certain
>> configurations.
>>
>> Either way, try to pass `verbose=2` to the grid search constructor.
>>
>> Yours,
>> Vlad
>>
>> On Wed, Jul 3, 2013 at 9:36 PM, Josh Wasserstein <ribonucle...@gmail.com>
>> wrote:
>> > This is odd. I can successfully run the example `grid_search_digits.py`.
>> > However, I am unable to do a grid search on my own data.
>> >
>> > I have the following setup:
>> > ===============
>> >     import sklearn
>> >     from sklearn.svm import SVC
>> >     from sklearn.grid_search import GridSearchCV
>> >     from sklearn.cross_validation import LeaveOneOut
>> >     from sklearn.metrics import auc_score
>> >
>> >     # ... Build X and y ....
>> >
>> >     tuned_parameters = [{'kernel': ['rbf'], 'gamma': [1e-3, 1e-4],
>> >                          'C': [1, 10, 100, 1000]},
>> >                         {'kernel': ['linear'], 'C': [1, 10, 100, 1000]}]
>> >
>> >     loo = LeaveOneOut(len(y))
>> >     clf = GridSearchCV(SVC(C=1), tuned_parameters, score_func=auc_score)
>> >     clf.fit(X, y, cv=loo)
>> >     ....
>> >     print clf.best_estimator_
>> >     ....
>> > ===============
>> > But I never get past `clf.fit` (I left it running for ~1hr).
>> >
>> > I have tried also with
>> >
>> >     clf.fit(X, y, cv=10)
>> >
>> > and with
>> >
>> >     skf = StratifiedKFold(y,2)
>> >     clf.fit(X, y, cv=skf)
>> >
>> > and had the same problem (it never finishes the clf.fit statement). My
>> > data is simple:
>> >
>> >     > X.shape
>> >     (27,26)
>> >
>> >     > y.shape
>> >     5
>> >
>> >     > y.dtype
>> >     dtype('int64')
>> >
>> >
>> >     >?y
>> >     Type:       ndarray
>> >     String Form:[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1]
>> >     Length:     27
>> >     File:       /home/jacob04/opt/python/numpy/numpy-1.7.1/lib/python2.7/site-packages/numpy/__init__.py
>> >     Docstring:  <no docstring>
>> >     Class Docstring:
>> >     ndarray(shape, dtype=float, buffer=None, offset=0,
>> >             strides=None, order=None)
>> >
>> >     > ?X
>> >     Type:       ndarray
>> >     String Form:
>> >            [[ -3.61238468e+03  -3.61253920e+03  -3.61290196e+03  -3.61326679e+03
>> >                7.84590361e+02   0.0000 <...> 0000e+00   2.22389150e+00   2.53252959e+00
>> >                2.11606216e+00  -1.99613432e+05  -1.99564828e+05]]
>> >     Length:     27
>> >     File:       /home/jacob04/opt/python/numpy/numpy-1.7.1/lib/python2.7/site-packages/numpy/__init__.py
>> >     Docstring:  <no docstring>
>> >     Class Docstring:
>> >     ndarray(shape, dtype=float, buffer=None, offset=0,
>> >             strides=None, order=None)
>> >
>> > This is all with the latest version of scikit-learn (0.13.1) and:
>> >
>> >     $ pip freeze
>> >     Cython==0.19.1
>> >     PIL==1.1.7
>> >     PyXB==1.2.2
>> >     PyYAML==3.10
>> >     argparse==1.2.1
>> >     distribute==0.6.34
>> >     epc==0.0.5
>> >     ipython==0.13.2
>> >     jedi==0.6.0
>> >     matplotlib==1.3.x
>> >     nltk==2.0.4
>> >     nose==1.3.0
>> >     numexpr==2.1
>> >     numpy==1.7.1
>> >     pandas==0.11.0
>> >     pyparsing==1.5.7
>> >     python-dateutil==2.1
>> >     pytz==2013b
>> >     rpy2==2.3.1
>> >     scikit-learn==0.13.1
>> >     scipy==0.12.0
>> >     sexpdata==0.0.3
>> >     six==1.3.0
>> >     stemming==1.0.1
>> >     -e git+https://github.com/PyTables/PyTables.git@df7b20444b0737cf34686b5d88b4e674ec85575b#egg=tables-dev
>> >     tornado==3.0.1
>> >     wsgiref==0.1.2
>> >
>> > Thanks,
>> >
>> > Jacob
>> >
>> > PS: This thread is based on the following StackOverflow post:
>> > http://stackoverflow.com/questions/17455302/clf-fit-freezes-on-small-dataset-in-scikit-learn
>> >
>> >
>> >
>> > ------------------------------------------------------------------------------
>> > This SF.net email is sponsored by Windows:
>> >
>> > Build for Windows Store.
>> >
>> > http://p.sf.net/sfu/windows-dev2dev
>> > _______________________________________________
>> > Scikit-learn-general mailing list
>> > Scikit-learn-general@lists.sourceforge.net
>> > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>> >
>>
>>
>>
>
>