The data source system (e.g. MySQL) won't really matter, since you'll be writing out a file in a specific format for the clustering algorithm to pick up. As long as you can get the data out of your source system and into the accepted input format, you'll be fine.
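
To make the export step concrete, here is a minimal sketch in Java, assuming the goal is a directory with one plain-text file per record, which is the kind of input the `seqdirectory` step in the Reuters walkthrough converts to sequence files. The class name `DescriptionExporter`, the method `writeDocuments`, and the sample rows are all hypothetical and not part of Mahout. In practice the map would be filled from a JDBC query against your own table, which is left out here so the example runs without a database.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not a Mahout class): writes one UTF-8 text file per record.
final class DescriptionExporter {

    // Writes each description to <outputDir>/<id>.txt and returns the number of files written.
    // Assumes every id is safe to use as a file name.
    static int writeDocuments(Map<String, String> descriptionsById, Path outputDir) throws IOException {
        Files.createDirectories(outputDir);
        int written = 0;
        for (Map.Entry<String, String> row : descriptionsById.entrySet()) {
            Path file = outputDir.resolve(row.getKey() + ".txt");
            Files.write(file, row.getValue().getBytes(StandardCharsets.UTF_8));
            written++;
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        // Made-up stand-in rows; real ones would come from your database (id -> description).
        Map<String, String> rows = new LinkedHashMap<>();
        rows.put("1", "First sample description.");
        rows.put("2", "Second sample description.");

        Path outputDir = Files.createTempDirectory("mahout-input");
        int count = writeDocuments(rows, outputDir);
        System.out.println("Wrote " + count + " files to " + outputDir);
    }
}
```

The resulting directory could then be used as the input directory for the sequence file conversion described in the walkthrough. One file per record keeps the record id recoverable from the file name once the clusters come back.
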
I very strongly suggest walking through that Reuters example step by step to get a feel for how your data needs to be structured as input, how the sequence file conversion works, etc. There are plenty of great resources out there on clustering text (or product descriptions, in your case) that are straightforward and informative, e.g.:

https://eastagile.com/blogs/text-mining-in-apache-mahout
http://ashokharnal.wordpress.com/2014/02/09/text-clustering-using-mahout-command-line-step-by-step/
http://blog.trifork.com/2011/04/04/how-to-cluster-seinfeld-episodes-with-mahout/ (fun one)

The Mahout in Action book would certainly be a great place to learn as well.

Happy clustering!

-----Original Message-----
From: N! [mailto:[email protected]]
Sent: Friday, February 14, 2014 2:33 AM
To: user
Subject: Re: get similar items

Thank you Sebastian&Martin&Scott.

I checked 'https://cwiki.apache.org/confluence/display/MAHOUT/Quick+tour+of+text+analysis+using+the+Mahout+command+line'. It looks like the case what I said. But I am using JAVA with a Mysql database, is there an example related to this?

thanks.

------------------ Original ------------------
From: "Scott C. Cote";<[email protected]>;
Date: Wed, Feb 12, 2014 11:47 PM
To: "[email protected]"<[email protected]>;
Subject: Re: get similar items

Since you are relying on unguided data - switch from recommenders/classifier to clustering.

Anyone else agree with me on this???

SCott

On 2/12/14 9:04 AM, "Martin, Nick" <[email protected]> wrote:

>Yeah, since it would appear you're lacking requisite data for
>recommenders the only other thing I can think of in this case is
>potentially treating the movie records as documents and clustering them
>(via whatever might be in the 'description' field).
>
>Have a look here
>https://cwiki.apache.org/confluence/display/MAHOUT/Quick+tour+of+text+analysis+using+the+Mahout+command+line
>and see if you can support something like this with your dataset.
>
>-----Original Message-----
>From: Sebastian Schelter [mailto:[email protected]]
>Sent: Wednesday, February 12, 2014 6:28 AM
>To: [email protected]
>Subject: Re: get similar items
>
>Hi,
>
>Mahout's recommenders are based on analyzing interactions between users
>and items/movies, e.g. ratings or counts how often the movie was watched.
>
>On 02/12/2014 11:34 AM, N! wrote:
>> Hi all:
>> Does anyone have any suggestions for the questions below?
>>
>> thanks a lot.
>>
>> ------------------ Original ------------------
>> Sender: "N!"<[email protected]>;
>> Send time: Wednesday, Feb 12, 2014 6:17 PM
>> To: "user"<[email protected]>;
>> Subject: Re: get similar items
>>
>> Hi Sean:
>> Thanks for the reply.
>> Assume I have only one table named 'movie' with 1000+ records, this table have three columns:'id','movieName','movieDescription'.
>> Can Mahout calculate the most similar movies for a movie.(based on only the 'movie' table)?
>> code like: List mostSimilarMovieList = recommender.mostSimilar(int movieId).
>> if not, do you have any suggestions for this scenario?
>>
