On 10/2/10 11:46 AM, Latency Buster wrote:
What did you want to do with Mahout? How much data do you have?
There are many capabilities that don't use Hadoop, and some that require it.
Others let you choose to use Hadoop only when you need to scale to
large volumes.
I have around 50 GB of data and need to do some data mining. I do not
need real-time performance and can live with slow processing.
Can I assume that Hadoop is a 'not required' item in my case?
Thanks,
It depends upon what sort of data mining you want to do. FPGrowth and
most of the clustering jobs have sequential operation as an option. If
you have a multicore machine you may see performance improvements using
Hadoop even on a single box. Some of the Mahout jobs only run on Hadoop.
Hadoop is not that hard to bring up on a single machine. If you can borrow
some cycles and disk space on other machines (I've been successful
running Hadoop in the background on others' dev machines that were not
heavily loaded while they were being used in the foreground for normal
builds, etc.), it's pretty exciting to see the performance scale almost
linearly with cores :)
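
To make the sequential-versus-partitioned choice a bit more concrete, here is a
rough sketch in plain Python. It is not Mahout or Hadoop code and uses none of
their APIs; every name in it (count_items, split, count_items_partitioned,
map_fn) is made up for illustration. It counts item frequencies over a list of
transactions, roughly the kind of first pass a frequent-pattern job has to make,
once sequentially and once as independent slices whose partial counts are
merged. That independence is what lets such a job spread across cores or
machines.

```python
from collections import Counter
from functools import reduce


def count_items(transactions):
    """Sequential pass: count how many transactions contain each item."""
    counts = Counter()
    for transaction in transactions:
        counts.update(set(transaction))  # count an item once per transaction
    return counts


def split(transactions, n_parts):
    """Cut the input into n_parts roughly equal, independent slices."""
    return [transactions[i::n_parts] for i in range(n_parts)]


def count_items_partitioned(transactions, n_parts, map_fn=map):
    """Map step: count each slice on its own. Reduce step: merge the partials.

    map_fn defaults to the built-in, sequential map. Passing something like
    multiprocessing.Pool(n).map instead would run the slices in separate
    processes (an assumption about how you would wire it up, not shown here).
    """
    partials = map_fn(count_items, split(transactions, n_parts))
    return reduce(lambda a, b: a + b, partials, Counter())


if __name__ == "__main__":
    data = [["milk", "bread"], ["bread", "eggs"],
            ["milk", "bread", "eggs"], ["milk"]]
    # Both routes must agree; only the way the work is scheduled differs.
    assert count_items(data) == count_items_partitioned(data, n_parts=2)
    print(count_items_partitioned(data, n_parts=2))
```

The sequential function is the analogue of running a job without Hadoop. The
partitioned one only pays off when the slices actually run in parallel and the
data is large enough to outweigh the cost of splitting and merging.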