RE: get similar items

Martin, Nick Fri, 14 Feb 2014 06:13:25 -0800

The data source system (here, MySQL) won't really matter, since you'll be looking
to output a file in a specific format for the clustering algorithm to pick
up. As long as you can get the data out of your source system and into
the acceptable input format, you'll be fine.
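Since the question mentions Java and MySQL, here is a minimal sketch of that export step. It assumes the 'movie' table with 'id' and 'movieDescription' columns from the original question. It reads the descriptions over plain JDBC and writes one text file per movie into a directory, which is the kind of input the Quick tour feeds to "mahout seqdirectory". The class name, JDBC URL, credentials, and output directory are hypothetical placeholders, and a MySQL JDBC driver has to be on the classpath at runtime.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.LinkedHashMap;
import java.util.Map;

public class MovieExporter {

    // Reads id -> description from the 'movie' table described in the original question.
    // Assumes a MySQL JDBC driver is available on the classpath at runtime.
    static Map<Long, String> loadDescriptions(String jdbcUrl, String user, String password)
            throws SQLException {
        Map<Long, String> docs = new LinkedHashMap<>();
        try (Connection conn = DriverManager.getConnection(jdbcUrl, user, password);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT id, movieDescription FROM movie")) {
            while (rs.next()) {
                String desc = rs.getString("movieDescription");
                docs.put(rs.getLong("id"), desc == null ? "" : desc);
            }
        }
        return docs;
    }

    // Writes one UTF-8 text file per movie, named <id>.txt, into outDir:
    // a plain directory of documents for a text-to-sequence-file step.
    static void writeDocuments(Map<Long, String> docs, Path outDir) throws IOException {
        Files.createDirectories(outDir);
        for (Map.Entry<Long, String> e : docs.entrySet()) {
            Path file = outDir.resolve(e.getKey() + ".txt");
            Files.write(file, e.getValue().getBytes(StandardCharsets.UTF_8));
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical connection settings; replace with your own.
        Map<Long, String> docs = loadDescriptions(
                "jdbc:mysql://localhost:3306/moviedb", "user", "password");
        writeDocuments(docs, Paths.get("movie-docs"));
        System.out.println("Wrote " + docs.size() + " documents");
    }
}
```

Keeping the JDBC read separate from the file writing means the writer can be reused whatever the source system turns out to be.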

I very strongly suggest walking through that Reuters example step by step to
get a feel for how your data needs to be structured as input, how the
sequence file conversion works, etc. There are plenty of great resources out
there on clustering text (or product descriptions, in your case) that are
straightforward and informative, e.g.:
https://eastagile.com/blogs/text-mining-in-apache-mahout
http://ashokharnal.wordpress.com/2014/02/09/text-clustering-using-mahout-command-line-step-by-step/
http://blog.trifork.com/2011/04/04/how-to-cluster-seinfeld-episodes-with-mahout/ (a fun one)
The Mahout in Action book would certainly be a great place to learn as well.

Happy clustering!

-----Original Message-----
From: N! [mailto:[email protected]] 
Sent: Friday, February 14, 2014 2:33 AM
To: user
Subject: Re: get similar items

Thank you, Sebastian, Martin, and Scott.
I checked
'https://cwiki.apache.org/confluence/display/MAHOUT/Quick+tour+of+text+analysis+using+the+Mahout+command+line'.
It looks like the case I described. But I am using Java with a MySQL database;
is there an example related to this?

Thanks.
------------------ Original ------------------
From:  "Scott C. Cote";<[email protected]>;
Date:  Wed, Feb 12, 2014 11:47 PM
To:  "[email protected]"<[email protected]>; 

Subject:  Re: get similar items

Since you are relying on unlabeled data, switch from recommenders/classifiers to
clustering.
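To give a feel for what clustering the descriptions amounts to, here is a minimal plain-Java k-means sketch over term-frequency vectors, using cosine similarity. It is an illustration only, not Mahout's k-means; the class name, the tokenizer, and seeding the centroids with the first k documents are assumptions made to keep it short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class KMeansSketch {

    // Sparse term-frequency vector: lowercased, split on runs of non-letters.
    static Map<String, Double> termFrequencies(String text) {
        Map<String, Double> tf = new HashMap<>();
        for (String tok : text.toLowerCase().split("[^a-z]+")) {
            if (!tok.isEmpty()) tf.merge(tok, 1.0, Double::sum);
        }
        return tf;
    }

    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
            na += e.getValue() * e.getValue();
        }
        for (double v : b.values()) nb += v * v;
        return (na == 0 || nb == 0) ? 0 : dot / Math.sqrt(na * nb);
    }

    // Plain k-means with cosine similarity; the first k documents seed the centroids.
    // Returns the cluster index assigned to each document.
    static int[] cluster(List<String> docs, int k, int iterations) {
        if (k < 1 || k > docs.size()) throw new IllegalArgumentException("need 1 <= k <= docs");
        List<Map<String, Double>> vecs = new ArrayList<>();
        for (String d : docs) vecs.add(termFrequencies(d));
        List<Map<String, Double>> centroids = new ArrayList<>();
        for (int i = 0; i < k; i++) centroids.add(new HashMap<>(vecs.get(i)));
        int[] assign = new int[docs.size()];
        for (int it = 0; it < iterations; it++) {
            // Assignment step: each document joins its most similar centroid.
            for (int i = 0; i < vecs.size(); i++) {
                int best = 0;
                double bestSim = -1;
                for (int c = 0; c < k; c++) {
                    double s = cosine(vecs.get(i), centroids.get(c));
                    if (s > bestSim) { bestSim = s; best = c; }
                }
                assign[i] = best;
            }
            // Update step: each centroid becomes the mean of its members' vectors.
            for (int c = 0; c < k; c++) {
                Map<String, Double> sum = new HashMap<>();
                int n = 0;
                for (int i = 0; i < vecs.size(); i++) {
                    if (assign[i] != c) continue;
                    n++;
                    for (Map.Entry<String, Double> e : vecs.get(i).entrySet())
                        sum.merge(e.getKey(), e.getValue(), Double::sum);
                }
                if (n == 0) continue; // keep the old centroid if a cluster empties
                final int size = n;
                sum.replaceAll((term, v) -> v / size);
                centroids.set(c, sum);
            }
        }
        return assign;
    }

    public static void main(String[] args) {
        List<String> descriptions = Arrays.asList(
                "space ship crew explores distant galaxy",
                "romantic comedy about a wedding in paris",
                "alien invasion space battle galaxy",
                "love story wedding romance comedy");
        System.out.println(Arrays.toString(cluster(descriptions, 2, 10))); // [0, 1, 0, 1]
    }
}
```

On these four sample descriptions it groups the two space films and the two wedding comedies. Real data would want TF-IDF weighting, stop-word removal, and better seeding.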

Does anyone else agree with me on this?

Scott

On 2/12/14 9:04 AM, "Martin, Nick" <[email protected]> wrote:

>Yeah, since it would appear you're lacking the requisite data for
>recommenders, the only other thing I can think of in this case is
>potentially treating the movie records as documents and clustering them
>(via whatever might be in the 'description' field).
>
>Have a look here
>https://cwiki.apache.org/confluence/display/MAHOUT/Quick+tour+of+text+analysis+using+the+Mahout+command+line
>and see if you can support something like this with your dataset.
>
>-----Original Message-----
>From: Sebastian Schelter [mailto:[email protected]]
>Sent: Wednesday, February 12, 2014 6:28 AM
>To: [email protected]
>Subject: Re: get similar items
>
>Hi,
>
>Mahout's recommenders are based on analyzing interactions between users
>and items/movies, e.g. ratings or counts of how often the movie was watched.
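To illustrate the kind of interaction data meant here, this is a minimal plain-Java sketch that ranks movies by the cosine similarity of the sets of users who watched them. It is not Mahout's code; the class and method names and the in-memory userId -> movieIds map are hypothetical. Mahout's item-based recommenders offer a comparable mostSimilarItems(itemID, howMany) call, but it likewise needs this kind of user-item data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ItemSimilaritySketch {

    // interactions: userId -> ids of the movies that user rated or watched.
    // Scores each other movie b against the target a with the cosine similarity
    // of their binary user vectors: |U(a) ∩ U(b)| / sqrt(|U(a)| * |U(b)|).
    static List<Long> mostSimilar(Map<Long, Set<Long>> interactions, long target, int howMany) {
        // Invert the data to movieId -> users who interacted with that movie.
        Map<Long, Set<Long>> usersByMovie = new HashMap<>();
        for (Map.Entry<Long, Set<Long>> e : interactions.entrySet()) {
            for (Long movie : e.getValue()) {
                usersByMovie.computeIfAbsent(movie, m -> new HashSet<>()).add(e.getKey());
            }
        }
        Set<Long> targetUsers = usersByMovie.getOrDefault(target, Collections.emptySet());
        Map<Long, Double> scores = new HashMap<>();
        for (Map.Entry<Long, Set<Long>> e : usersByMovie.entrySet()) {
            if (e.getKey().longValue() == target) continue;
            Set<Long> common = new HashSet<>(e.getValue());
            common.retainAll(targetUsers);
            if (common.isEmpty()) continue; // no shared users, no evidence of similarity
            double denom = Math.sqrt((double) targetUsers.size() * e.getValue().size());
            scores.put(e.getKey(), common.size() / denom);
        }
        List<Long> ranked = new ArrayList<>(scores.keySet());
        // Highest score first; ties broken by movie id so the output is deterministic.
        ranked.sort((x, y) -> {
            int c = Double.compare(scores.get(y), scores.get(x));
            return c != 0 ? c : Long.compare(x, y);
        });
        return new ArrayList<>(ranked.subList(0, Math.min(howMany, ranked.size())));
    }

    public static void main(String[] args) {
        Map<Long, Set<Long>> watched = new HashMap<>();
        watched.put(1L, new HashSet<>(Arrays.asList(10L, 20L)));
        watched.put(2L, new HashSet<>(Arrays.asList(10L, 20L, 30L)));
        watched.put(3L, new HashSet<>(Arrays.asList(30L, 40L)));
        watched.put(4L, new HashSet<>(Arrays.asList(10L)));
        System.out.println(mostSimilar(watched, 10L, 2)); // [20, 30]
    }
}
```

With the sample data, movie 10 shares two viewers with movie 20 and one with movie 30, and none with movie 40, which is why a single table of movie metadata is not enough for this approach.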
>
>
>On 02/12/2014 11:34 AM, N! wrote:
>> Hi all:
>>   Does anyone have any suggestions for the questions below?
>>
>>
>>   Thanks a lot.
>>
>>
>> ------------------ Original ------------------
>> Sender: "N!"<[email protected]>;
>> Send time: Wednesday, Feb 12, 2014 6:17 PM
>> To: "user"<[email protected]>;
>>
>> Subject: Re: get similar items
>>
>>
>>
>> Hi Sean:
>>     Thanks for the reply.
>>     Assume I have only one table, named 'movie', with 1000+ records. This
>>     table has three columns: 'id', 'movieName', 'movieDescription'.
>>     Can Mahout calculate the most similar movies for a given movie (based
>>     only on the 'movie' table)?
>>     Code like: List mostSimilarMovieList = recommender.mostSimilar(int movieId).
>>     If not, do you have any suggestions for this scenario?
>>
>
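For the original question (most similar movies using only the 'movie' table), a content-based alternative to clustering is to compare the descriptions directly. This is a minimal sketch assuming TF-IDF weighting and cosine similarity, a swapped-in technique rather than anything Mahout's recommenders do; the class name, the mostSimilar signature, and the tokenizer are hypothetical. The descriptions map would be filled from the 'id' and 'movieDescription' columns.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DescriptionSimilaritySketch {

    // Lowercases and splits on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z0-9]+"))
            if (!t.isEmpty()) tokens.add(t);
        return tokens;
    }

    // TF-IDF weights per movie: tf(t, d) * ln(N / df(t)).
    static Map<Long, Map<String, Double>> tfidf(Map<Long, String> descriptions) {
        Map<String, Integer> df = new HashMap<>();
        Map<Long, Map<String, Double>> vectors = new HashMap<>();
        for (Map.Entry<Long, String> e : descriptions.entrySet()) {
            Map<String, Double> tf = new HashMap<>();
            for (String t : tokenize(e.getValue())) tf.merge(t, 1.0, Double::sum);
            for (String t : tf.keySet()) df.merge(t, 1, Integer::sum);
            vectors.put(e.getKey(), tf);
        }
        int n = descriptions.size();
        for (Map<String, Double> tf : vectors.values())
            tf.replaceAll((t, w) -> w * Math.log((double) n / df.get(t)));
        return vectors;
    }

    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
            na += e.getValue() * e.getValue();
        }
        for (double v : b.values()) nb += v * v;
        return (na == 0 || nb == 0) ? 0 : dot / Math.sqrt(na * nb);
    }

    // Up to howMany movie ids whose descriptions are most similar to movieId's,
    // skipping movies with no overlapping weighted terms.
    static List<Long> mostSimilar(Map<Long, String> descriptions, long movieId, int howMany) {
        Map<Long, Map<String, Double>> vectors = tfidf(descriptions);
        Map<String, Double> target = vectors.get(movieId);
        List<Long> ranked = new ArrayList<>();
        if (target == null) return ranked; // unknown movie id
        Map<Long, Double> scores = new HashMap<>();
        for (Map.Entry<Long, Map<String, Double>> e : vectors.entrySet()) {
            if (e.getKey().longValue() == movieId) continue;
            double s = cosine(target, e.getValue());
            if (s > 0) scores.put(e.getKey(), s);
        }
        ranked.addAll(scores.keySet());
        // Highest similarity first; ties broken by movie id.
        ranked.sort((x, y) -> {
            int c = Double.compare(scores.get(y), scores.get(x));
            return c != 0 ? c : Long.compare(x, y);
        });
        return new ArrayList<>(ranked.subList(0, Math.min(howMany, ranked.size())));
    }

    public static void main(String[] args) {
        Map<Long, String> movies = new LinkedHashMap<>();
        movies.put(1L, "Space crew explores a distant galaxy");
        movies.put(2L, "Alien fleet invades the galaxy");
        movies.put(3L, "Wedding comedy set in Paris");
        System.out.println(mostSimilar(movies, 1L, 2)); // [2]
    }
}
```

Movies 1 and 2 share only the term "galaxy", and movie 3 shares nothing with movie 1, so only movie 2 is returned. The sketch rebuilds every vector on each call, which is fine for illustration but worth caching for repeated lookups.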