Thank you, Vlad. I think you are right and there may be a problem with
parallel jobs.
When I run the code with the verbosity option enabled, the output appears
very slowly. The strange thing is that a simple SVM fit is basically
instantaneous (literally less than half a second), so I am not sure why
the grid search takes so long.
Below is the session:
# I reduced the dimensionality even further:
> X.shape
Out[2]: (27, 4)
# There are 5 positives (1), the rest are negatives (0)
> y.shape
(27,)
> clf.fit(X,y,cv=loo)
[GridSearchCV] kernel=linear, C=1 ..............................................
[GridSearchCV] ..................................... kernel=linear, C=1 - 1.3min
[Parallel(n_jobs=1)]: Done 1 jobs | elapsed: 1.3min
[GridSearchCV] kernel=linear, C=1 ..............................................
[GridSearchCV] ..................................... kernel=linear, C=1 - 26.8s
[GridSearchCV] kernel=linear, C=1 ..............................................
[GridSearchCV] ..................................... kernel=linear, C=1 - 50.2s
[GridSearchCV] kernel=linear, C=100 ............................................
[GridSearchCV] ................................... kernel=linear, C=100 - 1.9min
[GridSearchCV] kernel=linear, C=100 ............................................
[GridSearchCV] ................................... kernel=linear, C=100 - 1.9min
[GridSearchCV] kernel=linear, C=100 ............................................
and it keeps running...
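For scale, here is a rough count of how many fits the search has to run. This is only a sketch: it assumes the parameter grid from the original message quoted below, and it tries two fold counts, 27 (leave-one-out over the 27 samples) and 3 (the log above appears to show three fits per parameter setting). The 60 seconds per fit is a guess taken from the roughly 27 s to 1.9 min range in the log, not a measurement.

```python
# Grid copied from the original message quoted below.
tuned_parameters = [
    {'kernel': ['rbf'], 'gamma': [1e-3, 1e-4], 'C': [1, 10, 100, 1000]},
    {'kernel': ['linear'], 'C': [1, 10, 100, 1000]},
]


def count_candidates(grid):
    """Count the parameter combinations in a list of grid dicts."""
    total = 0
    for block in grid:
        n = 1
        for values in block.values():
            n *= len(values)
        total += n
    return total


def count_fits(grid, n_folds):
    """One fit per (parameter combination, fold) pair."""
    return count_candidates(grid) * n_folds


n_samples = 27
loo_fits = count_fits(tuned_parameters, n_samples)  # leave-one-out: one fold per sample
three_fold_fits = count_fits(tuned_parameters, 3)   # what the log seems to show

# Assumption: about one minute per fit, eyeballed from the verbose log.
assumed_seconds_per_fit = 60.0
loo_hours = loo_fits * assumed_seconds_per_fit / 3600.0
three_fold_hours = three_fold_fits * assumed_seconds_per_fit / 3600.0

print("candidates:", count_candidates(tuned_parameters))
print("fits with leave-one-out:", loo_fits, "(~%.1f h)" % loo_hours)
print("fits with 3 folds:", three_fold_fits, "(~%.1f h)" % three_fold_hours)
```

If the per-fit time really is on the order of a minute, a leave-one-out search over this grid would not finish within an hour, so the slow part to explain is the individual fit rather than the search loop.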
On the other hand, a single plain fit is extremely fast:
> %timeit clf2 = svm.SVC(); clf2.fit(X,y)
1000 loops, best of 3: 328 us per loop
Why? I am not importing multiprocessing or setting n_jobs anywhere in
my code (the code is effectively what I sent earlier).
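One caveat on the timing above: `svm.SVC()` with no arguments uses the default RBF kernel with C=1, so it does not time the `kernel=linear` settings that are slow in the log. A like-for-like check could look like the sketch below. The data here is a synthetic stand-in (an assumption, not my real X), and `max_iter` is only there so the sketch cannot hang; it assumes a scikit-learn release whose `SVC` accepts that parameter.

```python
import time
import warnings

import numpy as np
from sklearn.svm import SVC

# Synthetic stand-in for the real data (assumption): 27 samples, 4 features,
# 5 positives, with large unscaled feature values like those in the thread.
rng = np.random.RandomState(0)
X = rng.normal(loc=-3600.0, scale=50.0, size=(27, 4))
y = np.array([0] * 22 + [1] * 5)


def time_fit(**svc_params):
    """Time one SVC fit; max_iter caps the solver so the call always returns."""
    clf = SVC(max_iter=100000, **svc_params)
    start = time.time()
    with warnings.catch_warnings():
        # Hitting the cap raises a ConvergenceWarning; silence it here.
        warnings.simplefilter("ignore")
        clf.fit(X, y)
    return time.time() - start


# Same settings as the plain SVC() call, plus the two slow ones from the log.
settings = [
    {'kernel': 'rbf', 'C': 1},
    {'kernel': 'linear', 'C': 1},
    {'kernel': 'linear', 'C': 100},
]

timings = {}
for params in settings:
    timings[(params['kernel'], params['C'])] = time_fit(**params)

for (kernel, C), seconds in sorted(timings.items()):
    print("kernel=%s, C=%s: %.4f s" % (kernel, C, seconds))
```

Running the same three settings on the real X and y would show whether the linear-kernel fits are slow on their own, independent of GridSearchCV.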
Jacob
On Wed, Jul 3, 2013 at 2:42 PM, Vlad Niculae <zephy...@gmail.com> wrote:
> From your code it doesn't seem like it, but are you using
> multiprocessing (ie. n_jobs > 1)? It causes issues on certain
> configurations.
>
> Either way, try to pass `verbose=2` to the grid search constructor.
>
> Yours,
> Vlad
>
> On Wed, Jul 3, 2013 at 9:36 PM, Josh Wasserstein <ribonucle...@gmail.com>
> wrote:
> > This is odd. I can successfully run the example `grid_search_digits.py`.
> > However, I am unable to do a grid search on my own data.
> >
> > I have the following setup:
> > ===============
> > import sklearn
> > from sklearn.svm import SVC
> > from sklearn.grid_search import GridSearchCV
> > from sklearn.cross_validation import LeaveOneOut
> > from sklearn.metrics import auc_score
> >
> > # ... Build X and y ....
> >
> > tuned_parameters = [{'kernel': ['rbf'], 'gamma': [1e-3, 1e-4],
> > 'C': [1, 10, 100, 1000]},
> > {'kernel': ['linear'], 'C': [1, 10, 100, 1000]}]
> >
> > loo = LeaveOneOut(len(y))
> > clf = GridSearchCV(SVC(C=1), tuned_parameters, score_func=auc_score)
> > clf.fit(X, y, cv=loo)
> > ....
> > print clf.best_estimator_
> > ....
> > ===============
> > But I never get past `clf.fit` (I let it run for ~1 hr).
> >
> > I have tried also with
> >
> > clf.fit(X, y, cv=10)
> >
> > and with
> >
> > skf = StratifiedKFold(y,2)
> > clf.fit(X, y, cv=skf)
> >
> > and had the same problem (it never finishes the clf.fit statement). My data
> > is simple:
> >
> > > X.shape
> > (27,26)
> >
> > > y.shape
> > 5
> >
> > > y.dtype
> > dtype('int64')
> >
> >
> > >?y
> > Type: ndarray
> > String Form:[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1]
> > Length: 27
> > File:
> > /home/jacob04/opt/python/numpy/numpy-1.7.1/lib/python2.7/site-
> > packages/numpy/__init__.py
> > Docstring: <no docstring>
> > Class Docstring:
> > ndarray(shape, dtype=float, buffer=None, offset=0,
> > strides=None, order=None)
> >
> > > ?X
> > Type: ndarray
> > String Form:
> > [[ -3.61238468e+03 -3.61253920e+03 -3.61290196e+03
> > -3.61326679e+03
> > 7.84590361e+02 0.0000 <...> 0000e+00 2.22389150e+00
> > 2.53252959e+00
> > 2.11606216e+00 -1.99613432e+05 -1.99564828e+05]]
> > Length: 27
> > File:
> > /home/jacob04/opt/python/numpy/numpy-1.7.1/lib/python2.7/site-
> > packages/numpy/__init__.py
> > Docstring: <no docstring>
> > Class Docstring:
> > ndarray(shape, dtype=float, buffer=None, offset=0,
> > strides=None, order=None)
> >
> > This is all with the latest version of scikit-learn (0.13.1) and:
> >
> > $ pip freeze
> > Cython==0.19.1
> > PIL==1.1.7
> > PyXB==1.2.2
> > PyYAML==3.10
> > argparse==1.2.1
> > distribute==0.6.34
> > epc==0.0.5
> > ipython==0.13.2
> > jedi==0.6.0
> > matplotlib==1.3.x
> > nltk==2.0.4
> > nose==1.3.0
> > numexpr==2.1
> > numpy==1.7.1
> > pandas==0.11.0
> > pyparsing==1.5.7
> > python-dateutil==2.1
> > pytz==2013b
> > rpy2==2.3.1
> > scikit-learn==0.13.1
> > scipy==0.12.0
> > sexpdata==0.0.3
> > six==1.3.0
> > stemming==1.0.1
> > -e git+https://github.com/PyTables/PyTables.git@df7b20444b0737cf34686b5d88b4e674ec85575b#egg=tables-dev
> > tornado==3.0.1
> > wsgiref==0.1.2
> >
> > Thanks,
> >
> > Jacob
> >
> > PS: This thread is based on the following StackOverflow post:
> >
> > http://stackoverflow.com/questions/17455302/clf-fit-freezes-on-small-dataset-in-scikit-learn
> >
> >
> >
> > ------------------------------------------------------------------------------
> > This SF.net email is sponsored by Windows:
> >
> > Build for Windows Store.
> >
> > http://p.sf.net/sfu/windows-dev2dev
> > _______________________________________________
> > Scikit-learn-general mailing list
> > Scikit-learn-general@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
> >