Hi Jeremy, thank you for your answer.
I don't have any data yet; I am just trying to understand and learn more about Mahout, since I am a beginner in machine learning.

Mahout in Action says that there are typically four types of features: categorical, word-like, text-like and continuous. So, let's say I have a descriptive text of 100-200 words (text-like). Does this mean that I have one feature (the description), or that I have 100-200 features (the words)?

The OnlineLogisticRegression class requires me to tell it how many categories there are and how many features I would like to provide. My question now is: if I have one categorical and one text-like feature, do I have to tell the class that I am going to add two features? What happens if I encode 20 different features into the vector but misconfigure the algorithm by telling it there are only 10 features?

What I am missing is a formula or something similar for the algorithms that are part of Mahout. I think this would make the different parameters easier to understand. That's what I meant.

I hope my explanation is clearer now.

Thank you,
Em

On 22.05.2011 18:15, Jeremy Lewi wrote:
> Em,
>
> Typically in machine learning a feature vector is just a vector of
> numbers which describes the data.
>
> For example, if you are trying to classify images, the features might be
> a vector of pixel intensities. Or you could process the image to extract
> higher level features. For example, you might compute some basic
> statistics of the pixel intensities for each image (e.g., the mean, max,
> min, etc.) and then use those summary statistics as the features for
> each image.
>
> So in your case if you use key and value as the features then you have a
> 2-d feature vector.
>
> Can you describe your data a little more?
>
> J
> On Sun, 2011-05-22 at 05:56 -0700, Em wrote:
>> Hi list,
>>
>> I just read Mahout in Action and I tried to understand the chapter about
>> classifying data.
>> While I am reimplementing one of the examples from the book, I get really
>> confused and a little bit disappointed about the assumptions the author
>> makes about the reader.
>>
>> There are some lines of code where you can see a variable is in use but you
>> never saw where and how it was defined.
>>
>> So far, my question is:
>>
>> When using an OnlineLogisticRegression algorithm, what is meant by "feature"?
>>
>> Let's say I have a bunch of data in CSV format.
>> There are the following columns I want to consider for classification:
>> "Key", "Value" - does it mean I have two features?
>>
>> Thanks,
>> Em
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Beginner-s-Question-What-is-a-feature-tp2971745p2971745.html
>> Sent from the Mahout User List mailing list archive at Nabble.com.
>
>
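
To make the "one feature or 100-200 features?" question concrete, here is a minimal sketch of feature hashing, one common way to turn a categorical field plus a free-text field into a vector of fixed length. It is plain Python and does not use Mahout's API. The function name `encode`, the `category=` and `text:` prefixes, and the choice of CRC32 as the hash are all assumptions made for illustration. Under this scheme the "number of features" is the vector length chosen up front, not the number of input fields or the number of words.

```python
import zlib


def encode(category, text, num_features):
    """Hash one categorical value and one free-text field into a fixed-size vector.

    Hypothetical helper for illustration only; not Mahout's encoder API.
    """
    vec = [0.0] * num_features

    # Categorical field: a single slot, chosen by hashing "name=value".
    slot = zlib.crc32(("category=" + category).encode("utf-8")) % num_features
    vec[slot] += 1.0

    # Text-like field: every word is hashed to a slot; colliding words add up.
    for word in text.lower().split():
        slot = zlib.crc32(("text:" + word).encode("utf-8")) % num_features
        vec[slot] += 1.0

    return vec


# A 6-word description and a 180-word description give vectors of the same length.
short = encode("sports", "the match ended in a draw", 10)
long_ = encode("sports", "the match ended in a draw " * 30, 10)
print(len(short), len(long_))
```

In this sketch a longer text does not add dimensions; it only adds more weight to the existing slots. A smaller `num_features` means more collisions between unrelated words, which is the trade-off behind choosing that parameter.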
