Hi Ted,

Thanks for the response. I had a quick look at chapter 14, and that part of the book is about classification, i.e. supervised learning that involves training. I want to run an unsupervised learning algorithm on the data, since I don't have any training data. That is why I was looking at clustering.
Actually, from my reading, it seems that Apriori or FP-growth would be the most useful algorithms for extracting information from this data, but it seems these algorithms have not been implemented in Mahout yet. So I guess the question to ask is: given that my data consists of key-value pairs where both keys and values are strings, what unsupervised algorithms are available in Mahout that I can use to learn about this data?

Many thanks,
Haddad

On 10 January 2013 07:05, Ted Dunning <[email protected]> wrote:

> Look at the last third of the book, especially chapter 14.
>
> One important thing to check is whether your integers represent codes or
> actually represent numbers. Codes should be encoded as key words.
>
> Hashed vector encoding should work quite well.
>
> On Wed, Jan 9, 2013 at 10:10 PM, Haddad Said <[email protected]>
> wrote:
>
> > Hi,
> >
> > I have a data set in CSV which is a set of key-value pairs. The data set
> > is huge, and the values are a mixture of integers and short strings (i.e.
> > not lengthy texts, but rather key words). I want to process it using
> > Mahout's clustering algorithms.
> >
> > The issue is in converting this CSV into vectors that can be consumed by
> > Mahout. I have been reading "Mahout In Action" and there seem to be two
> > options for vectorizing: using numeric values with Mahout's DenseVector,
> > RandomAccessSparseVector, and SequentialAccessSparseVector
> > implementations, or using the Vector Space Model to vectorize text
> > documents.
> >
> > The data I want to vectorize is not really a text document, but as it is
> > a huge data set with many different keys and values, it is difficult to
> > map it to numeric values. What is the best way to vectorize this kind of
> > data for use in Mahout?
> >
> > Any pointers would be appreciated.
> >
> > Thanks
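
For reference, the hashed vector encoding suggested in the quoted reply can be sketched roughly as below. This is a minimal, library-free illustration of the general hashing trick, not Mahout's own encoder API. The function name `hash_encode`, the vector size of 16, and the sample record are all assumptions made up for illustration. Note that the integer-looking value "404" is treated as a code (a key word), in line with the advice above.

```python
import hashlib


def hash_encode(record, dim=16):
    """Map a dict of string keys and string values onto a fixed-length vector.

    Hypothetical helper for illustration only; it is not part of Mahout.
    """
    vec = [0.0] * dim
    for key, value in record.items():
        # Treat each key=value pair as a single categorical feature.
        token = f"{key}={value}".encode("utf-8")
        # Use a stable hash so the same pair always maps to the same slot.
        digest = hashlib.md5(token).digest()
        index = int.from_bytes(digest[:4], "big") % dim
        # Count occurrences; colliding features simply add up.
        vec[index] += 1.0
    return vec


# Made-up sample record: every value is a string, including the code "404".
record = {"country": "UK", "browser": "firefox", "status": "404"}
vector = hash_encode(record)
```

The vector length stays fixed no matter how many distinct keys and values appear in the data, which is what makes this approach attractive for a huge key-value data set. The trade-off is that a smaller `dim` makes collisions between unrelated features more likely.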
