Re: [Scikit-learn-general] using string features for classification

2011-12-29 Thread David Warde-Farley
On 2011-12-29, at 3:18 PM, Bronco Zaurus wrote: > Hello, > > I have a beginner's question: how do you classify using non-numerical > features, concretely strings (for example: 'audi', 'bmw', > 'chevrolet')? > > One way that comes to mind is to give each value a number. Is there a > more s

Re: [Scikit-learn-general] using string features for classification

2011-12-29 Thread xinfan meng
There is actually work on embedding word senses into a vector space, "Word representations: A simple and general method for semi-supervised learning" for example. On Fri, Dec 30, 2011 at 6:26 AM, Robert Layton wrote: > On 30 December 2011 08:57, Gael Varoquaux > wrote: > >> On Thu, Dec 29, 2011 a

Re: [Scikit-learn-general] using string features for classification

2011-12-29 Thread Robert Layton
On 30 December 2011 08:57, Gael Varoquaux wrote: > On Thu, Dec 29, 2011 at 09:18:38PM +0100, Bronco Zaurus wrote: > >I have a beginner's question: how do you classify using non-numerical > >features, concretely strings (for example: 'audi', 'bmw', > >'chevrolet')? > > You are in troubl

Re: [Scikit-learn-general] ROC curve

2011-12-29 Thread Paolo Losi
Hi Adnan, probability=True performs a probability calibration on the decision function. In order to generate the ROC curve you could directly use the output of the decision_function method and obtain exactly the same result as if you used probability calibration (this is because calibration is a stric
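A minimal sketch of why raw decision values are enough for a ROC curve: the curve depends only on how the samples are ranked, so any strictly increasing transform of the scores leaves it unchanged. The helper `roc_points`, the sample scores, and the sigmoid parameters are hypothetical illustrations written in plain Python, not scikit-learn code. With scikit-learn one would typically pass the output of `decision_function` to `sklearn.metrics.roc_curve` instead.

```python
import math


def roc_points(scores, labels):
    """Return the ROC curve as a list of (false_positive_rate, true_positive_rate)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # Sweep the threshold from the highest score down to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only once all samples tied at this score are counted.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical decision values, standing in for what decision_function returns.
decision_values = [2.1, -0.4, 0.3, -1.7, 1.2, -0.1, 0.8, -2.5]
true_labels = [1, 0, 1, 0, 1, 1, 0, 0]

roc_raw = roc_points(decision_values, true_labels)
# A strictly increasing map to (0, 1), standing in for a calibration step.
calibrated = [sigmoid(2.0 * d - 0.5) for d in decision_values]
roc_calibrated = roc_points(calibrated, true_labels)
```

Both lists of points are identical, because the sigmoid preserves the ordering of the samples.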

Re: [Scikit-learn-general] ROC curve

2011-12-29 Thread Gael Varoquaux
On Thu, Dec 29, 2011 at 12:46:36PM -0800, adnan rajper wrote: >I use LinearSVC for text classification. My problem is that I want to >generate a ROC curve for LinearSVC. Since LinearSVC does not output >probabilities, is there any other way to generate a ROC curve for LinearSVC? >I have

Re: [Scikit-learn-general] using string features for classification

2011-12-29 Thread Gael Varoquaux
On Thu, Dec 29, 2011 at 09:18:38PM +0100, Bronco Zaurus wrote: >I have a beginner's question: how do you classify using non-numerical >features, concretely strings (for example: 'audi', 'bmw', >'chevrolet')? You are in trouble as your input space is not metric: what's .5*('audi' + 'che

[Scikit-learn-general] ROC curve

2011-12-29 Thread adnan rajper
Hi everybody, I use LinearSVC for text classification. My problem is that I want to generate a ROC curve for LinearSVC. Since LinearSVC does not output probabilities, is there any other way to generate a ROC curve for LinearSVC? I have tried svm.SVC(kernel='linear', probability=True) but it gets

[Scikit-learn-general] using string features for classification

2011-12-29 Thread Bronco Zaurus
Hello, I have a beginner's question: how do you classify using non-numerical features, concretely strings (for example: 'audi', 'bmw', 'chevrolet')? One way that comes to mind is to give each value a number. Is there a more straightforward way of using string features in sklearn? ---
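Giving each string a single number imposes an artificial order and distance on the values (for example, 'audi' would end up "closer" to 'bmw' than to 'chevrolet'). A common alternative is one-hot encoding, where every distinct string gets its own 0/1 column. The sketch below is plain Python with hypothetical helper names (`fit_vocabulary`, `one_hot`), shown only to illustrate the idea. Recent scikit-learn releases provide `sklearn.feature_extraction.DictVectorizer` for this kind of encoding.

```python
def fit_vocabulary(values):
    """Map each distinct string to a column index (sorted for reproducibility)."""
    return {value: index for index, value in enumerate(sorted(set(values)))}


def one_hot(value, vocabulary):
    """Encode one string as a 0/1 vector with a single 1 in its own column."""
    row = [0] * len(vocabulary)
    row[vocabulary[value]] = 1  # raises KeyError for a value unseen during fitting
    return row


makes = ["audi", "bmw", "chevrolet", "bmw"]
vocabulary = fit_vocabulary(makes)
encoded = [one_hot(make, vocabulary) for make in makes]
```

Each row of `encoded` can then be used as a numeric feature vector. All categories are equally far apart, so no ordering is implied.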

Re: [Scikit-learn-general] Joblib compression and LFW

2011-12-29 Thread Gael Varoquaux
On Thu, Dec 29, 2011 at 10:34:16AM -0800, Josh Bleecher Snyder wrote: > If you want to experiment with more options, you might also play with > blosc (http://blosc.pytables.org/trac). The compression level is not > as good as heavier weight algorithms, but it is really zippy. I ended > up using it

Re: [Scikit-learn-general] Joblib compression and LFW

2011-12-29 Thread Josh Bleecher Snyder
> Obviously the fine-tuning that I did is not needed for the > scikit's storage of the datasets, but it general fast dump/load of Python > objects is useful for scientific computing and big data (think caching or > message passing parallel computing). If you want to experiment with more options, y

Re: [Scikit-learn-general] Joblib compression and LFW

2011-12-29 Thread Gael Varoquaux
On Wed, Dec 28, 2011 at 05:21:39PM +0100, Alexandre Gramfort wrote: > thanks Gael for the Christmas present :) I just couldn't help playing more. I have pushed a new update that makes it possible to control the compression level and, in general, can achieve better compromises between speed and compression. H
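To make the speed versus size trade-off concrete, here is a rough sketch that uses only the standard library (`pickle` plus `zlib`). It is not joblib's implementation, and the helpers `dump_compressed` and `load_compressed` as well as the sample data are hypothetical. In joblib itself the level is selected through the `compress` argument of `joblib.dump`.

```python
import pickle
import zlib


def dump_compressed(obj, level):
    """Pickle obj and compress the bytes; level runs from 0 (none) to 9 (smallest, slowest)."""
    payload = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    return zlib.compress(payload, level)


def load_compressed(blob):
    """Reverse dump_compressed: decompress, then unpickle."""
    return pickle.loads(zlib.decompress(blob))


# Hypothetical, highly repetitive data standing in for a cached dataset.
data = {"pixels": [i % 16 for i in range(50000)], "name": "sample"}

raw_size = len(pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL))
# Compressed size in bytes for a fast, a default-like, and a thorough level.
sizes = {level: len(dump_compressed(data, level)) for level in (1, 6, 9)}
restored = load_compressed(dump_compressed(data, 3))
```

Comparing `sizes` against `raw_size`, and timing the calls on real data, shows which level gives an acceptable compromise for a given workload.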

Re: [Scikit-learn-general] Properties

2011-12-29 Thread Gael Varoquaux
On Thu, Dec 29, 2011 at 04:55:48PM +0100, Andreas Müller wrote: > I was wondering whether it is a good idea to use properties as to me > that seems very unlike the rest of the user-interface. > Also, from the documentation it is not entirely clear which attributes are > properties and which are no

[Scikit-learn-general] Properties

2011-12-29 Thread Andreas Müller
Hi Everybody. As you might have noticed, I am trying to get all the errors out of the docs. One thing I noticed today is that there are two (or six, depending on how you count) places where properties are used: The gmm and hmm modules. I was wondering whether it is a good idea to use properties