Re: Mahout & Hadoop

Jeff Eastman Sat, 02 Oct 2010 10:11:09 -0700

 On 10/2/10 11:46 AM, Latency Buster wrote:

What did you want to do with Mahout?  How much data do you have?


There are many capabilities that don't use Hadoop, some that require it.
Others allow you to choose to use Hadoop only when you need to scale to
large volumes.

I have around 50GB of data and need to do some data mining. I do not
need real-time performance and can live with slow performance.

Can I assume that Hadoop is a 'not required' item in my case?

Thanks,

It depends upon what sort of data mining you want to do. FPGrowth and most of the clustering jobs have sequential operation as an option. If you have a multicore machine you may see performance improvements using Hadoop even on a single box. Some of the Mahout jobs only run on Hadoop. It's not that hard to bring up on a single machine. If you can borrow some cycles and disk space on other machines (I've been successful running Hadoop in the background on others' dev machines that were not heavily loaded while they were being used in the foreground for normal builds, etc.), it's pretty exciting to see the performance scale almost linearly with cores :)
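
To put a rough number on "almost linearly with cores", here is a minimal back-of-the-envelope sketch in Python. It assumes Amdahl's law is a fair model and uses a hypothetical 95% parallel fraction and a hypothetical 10-hour single-core runtime; none of these figures come from this thread or from measurements of any Mahout job, and the function names are made up for illustration.

```python
def speedup(cores, parallel_fraction=0.95):
    """Amdahl's-law speedup on `cores` cores.

    parallel_fraction is the share of the work that can be spread
    across cores (0.95 is an assumed value, not a measurement).
    """
    if cores < 1:
        raise ValueError("cores must be >= 1")
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


def estimated_runtime(single_core_seconds, cores, parallel_fraction=0.95):
    """Estimated wall-clock seconds given the single-core runtime."""
    return single_core_seconds / speedup(cores, parallel_fraction)


if __name__ == "__main__":
    single_core = 10 * 3600.0  # hypothetical 10-hour sequential run
    for cores in (1, 2, 4, 8, 16):
        hours = estimated_runtime(single_core, cores) / 3600.0
        print(f"{cores:2d} cores: {speedup(cores):5.2f}x, ~{hours:.1f} h")
```

With a parallel fraction of 1.0 the model gives exactly linear scaling; anything below that flattens out as cores are added, which is why "almost" linear is the realistic expectation when borrowing extra machines.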
