Thanks Olivier and Lars, I'll take another look.

Cheers, Nigel
07914 740972



On 18 October 2013 09:16, Olivier Grisel <olivier.gri...@ensta.org> wrote:

> 2013/10/18 Lars Buitinck <larsm...@gmail.com>:
> > 2013/10/18 Nigel Legg <nigel.l...@gmail.com>:
> >> What am I doing wrong here?
> >
> > Could be lots of things. In any case, using an untuned SVC for this
> > task is a bad idea because (a) you need to tune it and (b) it's an
> > SVC. Better try LinearSVC or SGDClassifier.
>
> Indeed, SVC uses an RBF kernel by default, which is not well suited
> for text classification. A linear model is often much better (and much
> faster to train) for sparse, very high-dimensional data such as text
> data.
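
To make the linear-model suggestion concrete, here is a minimal sketch of a TF-IDF plus LinearSVC pipeline. The toy corpus, the labels and all variable names are made up for illustration and are not from the original question, and the imports assume a reasonably recent scikit-learn release. The default C is left untuned here on purpose.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy corpus, only here to keep the example self-contained.
docs = [
    "the team won the football match",
    "the striker scored a late goal",
    "the goalkeeper saved the penalty",
    "the match ended in a draw",
    "the processor runs the program faster",
    "install the software on your computer",
    "the compiler reported a syntax error",
    "the program crashed with a memory error",
]
labels = ["sport"] * 4 + ["tech"] * 4

# TF-IDF produces a sparse, high-dimensional matrix; LinearSVC fits a
# linear decision function directly on it, with no kernel involved.
text_clf = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LinearSVC()),
])
text_clf.fit(docs, labels)

new_docs = [
    "a late goal won the match",
    "the software crashed on my computer",
]
predicted = text_clf.predict(new_docs)
for doc, label in zip(new_docs, predicted):
    print(f"{label}: {doc}")
```

Swapping `LinearSVC()` for `SGDClassifier()` (from `sklearn.linear_model`) in the same pipeline should work the same way, since both accept sparse input.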
>
> Also, you should never expect the classifiers to work correctly with
> the default parameter values. You have to grid search (manually or
> automatically with GridSearchCV) for the most important parameters,
> typically the regularizer strength for linear models such as LinearSVC
> (the C parameter) and SGDClassifier (the alpha parameter).
>
> Have a look at the document classification example for models and
> ranges of parameter values that work on text classification:
>
>
> http://scikit-learn.org/stable/auto_examples/document_classification_20newsgroups.html
>
> --
> Olivier
>
>
> ------------------------------------------------------------------------------
> October Webinars: Code for Performance
> Free Intel webinars can help you accelerate application performance.
> Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most
> from
> the latest Intel processors and coprocessors. See abstracts and register >
> http://pubads.g.doubleclick.net/gampad/clk?id=60135031&iu=/4140/ostg.clktrk
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>