Hi Swapna,
I can't give you source code for legal reasons, but you should be able to set up 
Hadoop-based MapReduce jobs that invoke Mahout's item similarity and clustering 
algorithms offline on a local or dev Hadoop cluster. I wrote some Scala code to 
invoke Mahout, stream the results to stdout, and then store them in HDFS. The 
Mahout documentation as well as the unit tests should help you get this 
configured and started. For the clustering algorithms you will need a process 
that takes a CSV file and generates a sequence file of vectors to serve as the 
input.
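To make the CSV step concrete, here is a minimal sketch of the vectorization it describes, in plain Python rather than the Scala/Hadoop code I actually used. The function name and CSV layout (an id column followed by numeric features) are assumptions for illustration; the real pipeline would write each vector as a VectorWritable record in a Hadoop SequenceFile, which is what Mahout's clustering jobs consume.

```python
import csv
import io

def csv_to_vectors(text):
    """Parse CSV rows of the form id,f1,f2,... into (id, [floats]) pairs.

    Illustrative stand-in: the real job writes each vector as a
    VectorWritable entry in a SequenceFile for Mahout k-means.
    """
    vectors = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        key, features = row[0], [float(x) for x in row[1:]]
        vectors.append((key, features))
    return vectors

sample = "item1,1.0,0.0,2.5\nitem2,0.5,3.0,0.0\n"
print(csv_to_vectors(sample))
```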

After the offline process completes, you can have a script or some other 
process move a subset (or maybe all) of the offline data computed by Mahout 
into a low-latency database, so the results can be served in real time through 
a webapp. I have most of this infrastructure working at the moment. Also, 
please direct these questions to the user alias so everyone can benefit from 
the discussion.
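As a rough sketch of that export step: Mahout's itemsimilarity job writes tab-separated `itemA<TAB>itemB<TAB>score` lines in its text output, and the loader below turns those into a per-item lookup table. The function name, the `top_n` parameter, and the in-memory dict are illustrative assumptions; a real deployment would bulk-load into whatever low-latency store backs your webapp instead.

```python
from collections import defaultdict

def load_similarities(lines, top_n=10):
    """Load 'itemA<TAB>itemB<TAB>score' lines into a per-item lookup,
    keeping only the top_n highest-scoring neighbours per item.

    The dict here is a stand-in for a low-latency store.
    """
    table = defaultdict(list)
    for line in lines:
        a, b, score = line.strip().split("\t")
        table[a].append((b, float(score)))
    for item in table:
        table[item].sort(key=lambda pair: pair[1], reverse=True)
        table[item] = table[item][:top_n]
    return dict(table)

rows = ["i1\ti2\t0.9", "i1\ti3\t0.4", "i2\ti3\t0.7"]
print(load_similarities(rows))
```

At serving time the webapp then only does a single key lookup per request, which is the point of precomputing everything offline.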

Let me know if you have more specific questions.

Regards

Sent from my iPhone

On Jul 16, 2012, at 4:26 PM, Swapna Yeleswarapu <[email protected]> wrote:

> Hi Saikat,
> 
> I was reading your questions on 
> 
> http://comments.gmane.org/gmane.comp.apache.mahout.user/13362
> 
> And was wondering if you got a chance to implement what you were trying to 
> do. I am trying to do something similar for learning stuff.
> 
> Can you give me an example of how you did the hybrid mode (offline 
> precomputation and online reco)?
> 
> I would appreciate it if you had any readily available source code which 
> could help me set things up.
> 
> Thanks
> Swapna
