I don't think you should use RowSimilarityJob for that case if you only have 6 columns.
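
A rough back-of-the-envelope estimate shows why. RowSimilarityJob compares rows through the columns they share, so a column with n nonzero entries contributes about n(n-1)/2 row pairs. With 1M rows and only 6 columns, every column is likely close to dense. The sketch below assumes fully dense columns (an assumption, as is the class name) and only counts pairs. It is not a model of Mahout's actual memory use.

```java
// Rough estimate of the row pairs RowSimilarityJob's cooccurrence step has to
// consider. Assumption: every column is fully dense (every row is nonzero).
public class CooccurrenceEstimate {

    // Unordered row pairs that share a nonzero in a column with n nonzeros:
    // n * (n - 1) / 2.
    static long pairsPerColumn(long nonZerosInColumn) {
        return nonZerosInColumn * (nonZerosInColumn - 1) / 2;
    }

    public static void main(String[] args) {
        long rows = 1_000_000L; // 1M rows, as in the question
        int columns = 6;        // 6 columns, as in the question

        long perColumn = pairsPerColumn(rows);
        // A pair is counted once for every column it shares, so this is the
        // number of cooccurrence entries emitted, not the number of distinct pairs.
        long total = perColumn * columns;

        System.out.println("pairs per column: " + perColumn);
        System.out.println("cooccurrence entries over all " + columns + " columns: " + total);
    }
}
```

Run as-is, it prints 499999500000 pairs per column and 2999997000000 cooccurrence entries over the six columns, which makes a heap error in the cooccurrence step unsurprising.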

Can you tell us a little bit about the data and what problem you are trying to solve?


On 05/23/2014 09:03 PM, Suneel Marthi wrote:
I had seen this issue too with RSJ up to 0.8. Switch to Mahout 0.9: downsampling
was introduced in RSJ there, which should avoid this error.
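
The idea behind downsampling can be sketched as follows. This is a hypothetical illustration, not Mahout 0.9's code: it caps each column at a fixed number of randomly chosen nonzero entries using reservoir sampling (Algorithm R), which may differ from the sampling scheme RSJ actually uses. The class name, method name and the cap of 500 are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Illustrative only: keep at most a fixed number of nonzero entries per column,
// chosen by reservoir sampling (Algorithm R). Not Mahout's implementation.
public class ColumnDownsampler {

    // Returns at most maxPerColumn row indices, chosen uniformly at random from
    // rowIndices. A column that is already small enough is returned unchanged.
    static List<Integer> downsample(List<Integer> rowIndices, int maxPerColumn, Random rng) {
        if (rowIndices.size() <= maxPerColumn) {
            return new ArrayList<>(rowIndices);
        }
        List<Integer> reservoir = new ArrayList<>(rowIndices.subList(0, maxPerColumn));
        for (int i = maxPerColumn; i < rowIndices.size(); i++) {
            // Replace a reservoir slot with probability maxPerColumn / (i + 1).
            int j = rng.nextInt(i + 1);
            if (j < maxPerColumn) {
                reservoir.set(j, rowIndices.get(i));
            }
        }
        return reservoir;
    }

    public static void main(String[] args) {
        // A dense column: each of the 1M rows has a nonzero entry.
        List<Integer> denseColumn = new ArrayList<>();
        for (int r = 0; r < 1_000_000; r++) {
            denseColumn.add(r);
        }
        int cap = 500; // hypothetical cap, chosen for illustration
        List<Integer> kept = downsample(denseColumn, cap, new Random(42));

        long pairsBefore = (long) denseColumn.size() * (denseColumn.size() - 1) / 2;
        long pairsAfter = (long) kept.size() * (kept.size() - 1) / 2;
        System.out.println("entries kept: " + kept.size());
        System.out.println("pairs before: " + pairsBefore + ", after: " + pairsAfter);
    }
}
```

With the cap, a dense 1M-entry column goes from 499999500000 candidate pairs to 124750, at the cost of ignoring most of that column's entries, so the resulting similarities are only approximate.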

On Fri, May 23, 2014 at 2:59 PM, Mohit Singh <mohit1...@gmail.com> wrote:

I have a 1M x 6 matrix stored as a sequence file, and I am trying to use
RowSimilarityJob on it.
But when I run the job, I get a Java heap space error in the second step
(RowSimilarityJob-CooccurrencesMapper-Reducer).
My raw sequence file is around 700 MB, and I have already set MAHOUT_OPTS
to (say) 7 GB, but I am still seeing that error.
My command line args are:

hadoop jar /usr/lib/mahout/mahout-examples-0.8-cdh5.0.0-job.jar \
  org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob \
  -i $INPUT -o $OUTPUT -r 6 -s SIMILARITY_COSINE -m 15 --tempDir $TEMP -ess

Also, is "-r" a typo here? The help text says it is the column length.
Is it the column or the row dimension?