I saw this issue too with RowSimilarityJob (RSJ) in versions up to 0.8. Switch to Mahout 0.9: downsampling was introduced in RSJ in that release, which should avoid this error.
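
For a rough sense of why downsampling matters here, the sketch below counts the pairs a cooccurrence step would have to form. It is only back-of-the-envelope arithmetic, not Mahout code. It assumes the matrix is dense and that pairs are formed among the entries of each column; the cap of 500 is a hypothetical value chosen for illustration, so check the RSJ options in 0.9 for the actual setting.

```python
def pairs_per_column(nonzeros: int) -> int:
    """Number of unordered pairs among `nonzeros` entries in one column."""
    return nonzeros * (nonzeros - 1) // 2


ROWS = 1_000_000   # from the original post: a 1M x 6 matrix
COLUMNS = 6
SAMPLE_CAP = 500   # hypothetical per-column cap, for illustration only

# Assumption: every column is dense, so each one holds ROWS entries.
full = COLUMNS * pairs_per_column(ROWS)
sampled = COLUMNS * pairs_per_column(min(ROWS, SAMPLE_CAP))

print(f"pairs without downsampling: {full:,}")
print(f"pairs with a cap of {SAMPLE_CAP}: {sampled:,}")
```

Under those assumptions the pair count grows quadratically with the number of rows, so raising the heap size alone is unlikely to help, while capping the entries per column keeps the work bounded.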

On Fri, May 23, 2014 at 2:59 PM, Mohit Singh <[email protected]> wrote:

> Hi,
> I have a 1M X 6 dimensional matrix stored as sequence file and I am
> trying to use rowSimilarity for this job...
> But when I try to run the job, I see Java heap space error for the second
> step (RowSimilarityJob-CooccurrencesMapper-Reducer) .
> My raw sequence file is around 700MB and then I have already set
> MAHOUT_OPTS to (say) 7gb?
> But I am still seeing that error?
> My command line args are:
>
> hadoop jar /usr/lib/mahout/mahout-examples-0.8-cdh5.0.0-job.jar
> org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob -i
> $INPUT -o $OUTPUT *-r 6 *-s SIMILARITY_COSINE -m 15 --tempDir $TEMP -ess
>
> Also, is this "r" a typo.. the help file says that this is column length?
> Is it column or row dimension ?
>
> Thanks
>
> --
> Mohit
>
> "When you want success as badly as you want the air, then you will get it.
> There is no other secret of success."
> -Socrates
