On Jan 3, 2012, at 17:02 , Olivier Grisel wrote:
2012/1/3 Lars Buitinck :
>
>> We probably need to extend the sklearn.feature_extraction.text package
>> to make it more user friendly to work with pure categorical feature
>> occurrences:
>
> I'm not sure this belongs in feature_extraction.text; it's much more
> broadly applicable.
>
> If you
2011/12/30 Olivier Grisel :
> Alright, then the name for this kind of feature is "categorical
> features" in machine learning jargon: the string is used as an
> identifier and the ordered sequence of letters is not exploited by the
> model. By contrast, "string features" means something very specific.
2011/12/30 Bronco Zaurus :
> One more way would be to compute the classification probability for each
> value and plug the resulting number back into the data. For example, let's
> say there are 10 samples with BMW in the training set, and 3 of them are 1
> (true) and 7 are 0 (false). So the maximum likelihood estimate for that
> value is 3/10 = 0.3.
In the previous mail variable `X` should be replaced by `data`.
--
Olivier
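Bronco's probability-plugging idea (nowadays usually called target encoding) can be sketched as follows. This is my own illustration of the suggestion above, using pandas; the column names and the toy data (10 BMW rows, 3 of them positive, as in his example) are assumptions:

```python
import pandas as pd

# toy training set: 10 BMW samples with 3 positives, plus 4 Audi samples
df = pd.DataFrame({
    "make": ["bmw"] * 10 + ["audi"] * 4,
    "y":    [1, 1, 1, 0, 0, 0, 0, 0, 0, 0] + [1, 1, 0, 0],
})

# empirical P(y=1 | make): the maximum likelihood estimate per value
rates = df.groupby("make")["y"].mean()

# plug the resulting number back into the data in place of the string
df["make_encoded"] = df["make"].map(rates)

print(rates)  # bmw -> 0.3, audi -> 0.5
```

One caveat worth noting: in practice these rates should be estimated on training folds only, otherwise the encoded column leaks the target into the features.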
2011/12/30 Bronco Zaurus :
> Thank you for all the answers. Yes, I'm not dealing with arbitrary strings,
> just a set of possible values, so the binary representation seems OK.
On 2011-12-29, at 3:18 PM, Bronco Zaurus wrote:
> Hello,
>
> I have a beginner's question: how do you classify using non-numerical
> features, concretely strings (for example: 'audi', 'bmw',
> 'chevrolet')?
>
> One way that comes to mind is to give each value a number. Is there a
> more s
There is actually work on embedding word senses into vector spaces; see
"Word representations: A simple and general method for semi-supervised
learning", for example.
On Fri, Dec 30, 2011 at 6:26 AM, Robert Layton wrote:
On 30 December 2011 08:57, Gael Varoquaux wrote:
On Thu, Dec 29, 2011 at 09:18:38PM +0100, Bronco Zaurus wrote:
> I have a beginner's question: how do you classify using non-numerical
> features, concretely strings (for example: 'audi', 'bmw',
> 'chevrolet')?

You are in trouble as your input space is not metric: what's
.5*('audi' + 'chevrolet')?
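Gael's point is easy to see numerically: on raw strings the average of two values is undefined, but after a one-hot expansion it becomes an ordinary point in the vector space. A small illustration (the one-hot codes and their ordering are assumed, not from the thread):

```python
import numpy as np

# one-hot codes for the three makes (assumed column order: audi, bmw, chevrolet)
audi      = np.array([1.0, 0.0, 0.0])
chevrolet = np.array([0.0, 0.0, 1.0])

# .5*('audi' + 'chevrolet') is meaningless on strings, but on one-hot
# vectors it is simply the midpoint between two corners of the simplex
midpoint = 0.5 * (audi + chevrolet)
print(midpoint)  # [0.5 0.  0.5]
```

That midpoint ("half audi, half chevrolet") is exactly the kind of interpolation a linear model or a distance-based method needs the input space to support.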