Re: 40 hours to run 1/2 Netflix Data?

Sebastian Schelter Sun, 13 May 2012 22:44:49 -0700

Hi,

something must be going completely wrong in this experiment. Please use
the latest version of Mahout (Mahout 0.6) and tell us exactly at which
point the job fails.


I have been able to process datasets seven times as large as Netflix
(http://webscope.sandbox.yahoo.com/catalog.php?datatype=r) in a few
hours on a 6-machine cluster.

--sebastian

On 14.05.2012 03:44, 许春玲 wrote:
> Hi,
> 
>    I run the item recommender on the Netflix data, but it always fails
> because there is not enough local disk space. So I cut the user IDs in half
> (not the user accounts, but the user IDs) to reduce the temp data. Now it
> finishes, but it takes 40 hours. The command is as follows:
> 
> hadoop jar /app/mahout-distribution-0.5/core/target/mahout-core-0.5-job.jar 
> org.apache.mahout.cf.taste.hadoop.item.RecommenderJob -Dmapred.map.tasks=196 
> -Dmapred.reduce.tasks=196 -Dmapred.input.dir=NetFlix_data_new 
> -Dmapred.output.dir=output_netflix8
> 
> My Hadoop cluster:
> 
> 28 nodes
> 16 GB memory per node
> 8 cores per node
> 250 GB local disk per node
> 
> 
> 
>
