I generate my initial sequence files directly from records in my MySQL database. Follow Martin's advice and work through the tutorial; it is very helpful. Also, I really like MiA (Mahout in Action) even though it is a couple of versions behind. The clustering chapters still seem to be accurate :).
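To make the "cluster on the description text" idea concrete, here is a minimal plain-Java sketch of what happens underneath: each description becomes a term-frequency vector, and movies are compared by cosine similarity. This is not Mahout's API. The class name DescriptionSimilarity, its methods, and the sample descriptions are made up for illustration, and a real pipeline would normally use TF-IDF weighting and stop-word removal rather than raw counts.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy illustration (hypothetical class): term-frequency vectors plus cosine similarity. */
final class DescriptionSimilarity {

    /** Turns free text into a sparse term-frequency vector (term -> count). */
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    /** Euclidean length of a sparse vector. */
    private static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int count : v.values()) {
            sum += (double) count * count;
        }
        return Math.sqrt(sum);
    }

    /** Cosine similarity of two sparse vectors; 0.0 if either vector is empty. */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        double na = norm(a);
        double nb = norm(b);
        return (na == 0.0 || nb == 0.0) ? 0.0 : dot / (na * nb);
    }

    /** Ids of all other movies, ordered from most to least similar to movieId. */
    static List<Integer> mostSimilar(int movieId, Map<Integer, String> descriptions) {
        String text = descriptions.get(movieId);
        if (text == null) {
            throw new IllegalArgumentException("unknown movie id: " + movieId);
        }
        Map<String, Integer> target = termFrequencies(text);
        Map<Integer, Double> scores = new HashMap<>();
        List<Integer> ids = new ArrayList<>();
        for (Map.Entry<Integer, String> e : descriptions.entrySet()) {
            if (e.getKey() != movieId) {
                ids.add(e.getKey());
                scores.put(e.getKey(), cosine(target, termFrequencies(e.getValue())));
            }
        }
        ids.sort((x, y) -> Double.compare(scores.get(y), scores.get(x)));
        return ids;
    }

    public static void main(String[] args) {
        // Stand-in for rows of the 'movie' table (id -> movieDescription).
        Map<Integer, String> movies = new LinkedHashMap<>();
        movies.put(1, "A space crew fights an alien on a ship");
        movies.put(2, "An alien hunts a space crew aboard their ship");
        movies.put(3, "A romantic comedy about a wedding in Paris");
        System.out.println(mostSimilar(1, movies));
    }
}
```

In practice the map would be filled from the database instead of hard-coded. With only 1000+ rows, an all-pairs comparison like this is cheap enough to run in memory.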
You really need to get a good feel of what kind of vectors you are going to use as input to your clusters.

SCott

On 2/14/14 1:32 AM, "N!" <[email protected]> wrote:

>Thank you Sebastian&Martin&Scott.
>I checked
>'https://cwiki.apache.org/confluence/display/MAHOUT/Quick+tour+of+text+analysis+using+the+Mahout+command+line'.
>It looks like the case what I said. But I am using JAVA with a Mysql
>database, is there an example related to this?
>
>thanks.
>------------------ Original ------------------
>From: "Scott C. Cote";<[email protected]>;
>Date: Wed, Feb 12, 2014 11:47 PM
>To: "[email protected]"<[email protected]>;
>Subject: Re: get similar items
>
>Since you are relying on unguided data - switch from
>recommenders/classifier to clustering.
>
>Anyone else agree with me on this???
>
>SCott
>
>On 2/12/14 9:04 AM, "Martin, Nick" <[email protected]> wrote:
>
>>Yeah, since it would appear you're lacking requisite data for
>>recommenders the only other thing I can think of in this case is
>>potentially treating the movie records as documents and clustering them
>>(via whatever might be in the 'description' field).
>>
>>Have a look here
>>https://cwiki.apache.org/confluence/display/MAHOUT/Quick+tour+of+text+analysis+using+the+Mahout+command+line
>>and see if you can support something like this with your dataset.
>>
>>-----Original Message-----
>>From: Sebastian Schelter [mailto:[email protected]]
>>Sent: Wednesday, February 12, 2014 6:28 AM
>>To: [email protected]
>>Subject: Re: get similar items
>>
>>Hi,
>>
>>Mahout's recommenders are based on analyzing interactions between users
>>and items/movies, e.g. ratings or counts how often the movie was watched.
>>
>>On 02/12/2014 11:34 AM, N! wrote:
>>> Hi all:
>>> Does anyone have any suggestions for the questions below?
>>>
>>> thanks a lot.
>>>
>>> ------------------ Original ------------------
>>> Sender: "N!"<[email protected]>;
>>> Send time: Wednesday, Feb 12, 2014 6:17 PM
>>> To: "user"<[email protected]>;
>>> Subject: Re: get similar items
>>>
>>> Hi Sean:
>>> Thanks for the reply.
>>> Assume I have only one table named 'movie' with 1000+
>>>records, this table have three
>>>columns:'id','movieName','movieDescription'.
>>> Can Mahout calculate the most similar movies for a
>>>movie.(based on only the 'movie' table)?
>>> code like: List mostSimilarMovieList =
>>>recommender.mostSimilar(int movieId).
>>> if not, do you have any suggestions for this scenario?
