Setting mahout heapsize for rowsimilarity job

Mohit Singh Fri, 23 May 2014 12:00:52 -0700

Hi,
   I have a 1M x 6 matrix stored as a sequence file, and I am
trying to run RowSimilarityJob on it.
When I run the job, I get a Java heap space error in the second
step (RowSimilarityJob-CooccurrencesMapper-Reducer).
My raw sequence file is around 700 MB, and I have already set
MAHOUT_OPTS to (say) 7 GB,
but I am still seeing the error.
My command line args are:


hadoop jar /usr/lib/mahout/mahout-examples-0.8-cdh5.0.0-job.jar
org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob -i
$INPUT -o $OUTPUT -r 6 -s SIMILARITY_COSINE -m 15 --tempDir $TEMP -ess

Also, is the description of "-r" a typo? The help text says it is the
number of columns. Is it the column or the row dimension?

Thanks

-- 
Mohit

"When you want success as badly as you want the air, then you will get it.
There is no other secret of success."
-Socrates
