Hi Jeremy,

Thank you for your answer.

I don't have any data yet. I am just trying to understand and learn
more about Mahout, since I am a beginner in machine learning.

Mahout in Action says that there are typically four types of features:
categorical, word-like, text-like and continuous.

So, let's say I have a descriptive text of 100-200 words (text-like).
Does this mean that I have one feature (the description), or does it
mean that I have 100-200 features (the words)?

The OnlineLogisticRegression class requires me to tell it how many
categories there are and how many features I want to provide.

My question now is: if I have one categorical and one text-like
feature, do I have to tell the class that I am going to add two
features?

What happens if I encode 20 different features into the vector but
misconfigure the algorithm by telling it there are only 10 features?
I am missing a formula or something similar for the algorithms that
are part of Mahout. I think that would make the different parameters
easier to understand.
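
To make the question concrete, here is a small sketch in plain Java of
how I imagine a categorical field and a text-like field might end up in
one fixed-length vector. This is not Mahout's API: the class and method
names are made up, and the hashing scheme is only my guess. It assumes
that the feature count means the length of the numeric vector rather
than the number of input fields, which is exactly the part I am unsure
about.

```java
import java.util.Arrays;

public class HashedEncodingSketch {

    // Hypothetical helper: pick a slot by hashing the feature name and
    // add the weight there. Different names may collide in one slot.
    static void addToVector(String name, double weight, double[] vector) {
        int slot = Math.floorMod(name.hashCode(), vector.length);
        vector[slot] += weight;
    }

    // Encode one categorical value and one text-like value into a
    // vector whose length is fixed up front (numFeatures), no matter
    // how many words the description contains.
    static double[] encode(String category, String description, int numFeatures) {
        double[] vector = new double[numFeatures];
        addToVector("category=" + category, 1.0, vector);
        for (String word : description.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                addToVector("word=" + word, 1.0, vector);
            }
        }
        return vector;
    }

    public static void main(String[] args) {
        double[] v = encode("books", "a short description of a short book", 10);
        // The vector always has 10 entries, even though 8 values
        // (1 category + 7 words) were added to it.
        System.out.println(v.length + " entries: " + Arrays.toString(v));
    }
}
```

If this picture is right, the two input fields would not mean "two
features"; the number passed to the learner would be the vector length
chosen here (10 in the sketch).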

That's what I meant.
I hope my explanation is clearer now.

Thank you,
Em

On 22.05.2011 18:15, Jeremy Lewi wrote:
> Em,
> 
> Typically in machine learning a feature vector is just a vector of
> numbers which describes the data.
> 
> For example, if you are trying to classify images, the features might be
> a vector of pixel intensities. Or you could process the image to extract
> higher level features. For example, you might compute some basic
> statistics of the pixel intensities for each image (e.g., the mean, max,
> min, etc.) and then use those summary statistics as the features for
> each image.
> 
> So in your case if you use key and value as the features then you have a
> 2-d feature vector.
> 
> Can you describe your data a little more? 
> 
> J 
> On Sun, 2011-05-22 at 05:56 -0700, Em wrote:
>> Hi list,
>>
>> I just read Mahout in Action and I tried to understand the chapter about
>> classifying data.
>> While I am reimplementing one of the examples from the book, I get really
>> confused and a little bit disappointed about the assumptions the author
>> makes about the reader.
>>
>> There are some lines of code where you can see a variable is in use but you
>> never saw where and how it was defined.
>>
>> So far, my question is:
>>
>> When using the OnlineLogisticRegression algorithm, what is meant by "feature"?
>>
>> Let's say I have a bunch of data in CSV format.
>> There are the following columns I want to consider for classification:
>> "Key", "Value" - does it mean I have two features?
>>
>> Thanks,
>> Em
>>
>> --
>> View this message in context: 
>> http://lucene.472066.n3.nabble.com/Beginner-s-Question-What-is-a-feature-tp2971745p2971745.html
>> Sent from the Mahout User List mailing list archive at Nabble.com.
> 
> 
