Re: Setting mahout heapsize for rowsimilarity job

Mohit Singh Fri, 23 May 2014 12:20:33 -0700

Basically, finding the N most similar vectors. Adding columns isn't a
problem; this is just to get a "feel" for Mahout (in general).
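Sebastian suggests below that RowSimilarityJob is a poor fit for a 6-column matrix. With so few columns, the N vectors most similar to a given query can also be found by a plain brute-force cosine scan. A minimal sketch in shell/awk (not Mahout code), assuming the vectors have already been exported from the sequence file into a CSV file with `id,x1,...,x6` lines. The export step is not shown, and `topn_cosine` is a hypothetical helper name. The sketch scores one query vector against every row. Repeating it for all 1M rows is quadratic, so it is only useful for getting a feel for the data.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Mahout: brute-force top-N cosine similarity
# of every row in a CSV file against a single query vector.
# Assumed input format, one vector per line: id,x1,x2,...,xk
# Output: "id,similarity" lines, sorted by similarity (highest first), first N kept.
# Usage: topn_cosine FILE "q1,q2,...,qk" N
topn_cosine() {
  local file=$1 query=$2 n=$3
  awk -F, -v q="$query" '
    BEGIN {
      k = split(q, qv, ",")            # query components qv[1..k]
      for (i = 1; i <= k; i++) qnorm += qv[i] * qv[i]
      qnorm = sqrt(qnorm)
    }
    {
      dot = 0; rnorm = 0
      for (i = 1; i <= k; i++) {       # field 1 is the id, so values start at $2
        dot   += $(i + 1) * qv[i]
        rnorm += $(i + 1) * $(i + 1)
      }
      # Skip zero vectors, whose cosine similarity is undefined.
      if (rnorm > 0 && qnorm > 0) printf "%s,%.6f\n", $1, dot / (sqrt(rnorm) * qnorm)
    }' "$file" | sort -t, -k2,2gr | head -n "$n"
}
```

For example, `topn_cosine vectors.csv "0.1,0.5,0.2,0.9,0.3,0.4" 10` would print the ids and scores of the 10 rows closest to that (made-up) query vector in a (hypothetical) `vectors.csv`.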




On Fri, May 23, 2014 at 12:07 PM, Sebastian Schelter <[email protected]> wrote:

> I don't think you should use RowSimilarity job for that case, if you only
> have 6 columns.
>
> Can you tell us a little bit about the data and what problem you are
> trying to solve?
>
> --sebastian
>
>
>
> On 05/23/2014 09:03 PM, Suneel Marthi wrote:
>
>> I saw this issue too with RSJ up to 0.8. Switch to Mahout 0.9: it
>> introduced downsampling in RSJ, which should avoid this error.
>>
>>
>> On Fri, May 23, 2014 at 2:59 PM, Mohit Singh <[email protected]> wrote:
>>
>>> Hi,
>>>     I have a 1M x 6 matrix stored as a sequence file, and I am trying
>>> to use RowSimilarityJob on it.
>>> But when I run the job, I get a Java heap space error in the second
>>> step (RowSimilarityJob-CooccurrencesMapper-Reducer).
>>> My raw sequence file is around 700 MB, and I have already set
>>> MAHOUT_OPTS to (say) 7 GB,
>>> but I am still seeing that error.
>>> My command line args are:
>>>
>>> hadoop jar /usr/lib/mahout/mahout-examples-0.8-cdh5.0.0-job.jar \
>>>   org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob \
>>>   -i $INPUT -o $OUTPUT -r 6 -s SIMILARITY_COSINE -m 15 --tempDir $TEMP -ess
>>>
>>> Also, is this "r" a typo.. the help file says that this is column length?
>>> Is it column or row dimension ?
>>>
>>> Thanks
>>>
>>> --
>>> Mohit
>>>
>>> "When you want success as badly as you want the air, then you will get
>>> it.
>>> There is no other secret of success."
>>> -Socrates
>>>
>>>
>>
>
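A likely reason the MAHOUT_OPTS setting had no effect is that MAHOUT_OPTS is read by the `bin/mahout` launcher script, not by `hadoop jar`. In addition, the heap error in the co-occurrence step is raised inside the map/reduce task JVMs, whose heap is normally controlled by Hadoop job properties. The sketch below passes those properties as generic `-D` options. It assumes RowSimilarityJob accepts Hadoop generic options, as ToolRunner-based drivers do, and that the cluster runs MapReduce on YARN (MR2, the CDH5 default); on MR1 the corresponding property is `mapred.child.java.opts`. The memory sizes, the `run_rsj` function and the `DRY_RUN` switch are illustrative assumptions. More heap may still not be enough. With only 6 dense columns, the co-occurrence step has to consider roughly every pair of the 1M rows, which is the situation Sebastian's and Suneel's replies address.

```shell
#!/usr/bin/env bash
# Sketch only: heap for the MapReduce *task* JVMs is set through Hadoop job
# properties, not MAHOUT_OPTS (which `hadoop jar` does not read).
# Property names are the Hadoop 2 / YARN ones; the sizes are placeholders.
MAHOUT_JOB=/usr/lib/mahout/mahout-examples-0.8-cdh5.0.0-job.jar
RSJ=org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob

run_rsj() {
  # Generic -D options go right after the class name, before the job's own options.
  # Keep each -Xmx below its container size (*.memory.mb) to leave non-heap headroom.
  local heap_opts=(
    -Dmapreduce.map.memory.mb=4096
    -Dmapreduce.map.java.opts=-Xmx3g
    -Dmapreduce.reduce.memory.mb=8192
    -Dmapreduce.reduce.java.opts=-Xmx6g
  )
  # Set DRY_RUN=1 (hypothetical switch) to print the command instead of submitting it.
  ${DRY_RUN:+echo} hadoop jar "$MAHOUT_JOB" "$RSJ" "${heap_opts[@]}" \
    -i "$INPUT" -o "$OUTPUT" -r 6 -s SIMILARITY_COSINE -m 15 \
    --tempDir "$TEMP" -ess
}
```

With `INPUT`, `OUTPUT` and `TEMP` exported as in the original command, `DRY_RUN=1 run_rsj` shows the full command line and `run_rsj` submits it.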


