Basically, finding the N most similar vectors. Adding columns isn't a problem; this is just to get a "feel" for Mahout (in general).
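
To make the goal concrete, here is a minimal brute-force sketch in plain Python of "find the N most similar vectors" using cosine similarity, the measure requested via SIMILARITY_COSINE in the command quoted below. It does not use Mahout; the names `cosine` and `top_n_similar` and the tiny 4 x 6 sample matrix are made up for illustration. It scans every row for a single query row, so it only shows the idea for low-dimensional data and is not a scalable all-pairs job.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_n_similar(rows, query_index, n):
    """Return up to n (similarity, row_index) pairs most similar to rows[query_index]."""
    query = rows[query_index]
    # Score every other row against the query (hypothetical helper, not a Mahout API).
    scored = (
        (cosine(query, row), i)
        for i, row in enumerate(rows)
        if i != query_index
    )
    # nlargest keeps only n candidates in memory at a time.
    return heapq.nlargest(n, scored)


# Illustrative data only: 4 rows with 6 columns each.
rows = [
    [1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [2.0, 0.0, 0.0, 0.0, 0.0, 0.0],  # same direction as row 0
    [0.0, 1.0, 0.0, 0.0, 0.0, 0.0],  # orthogonal to row 0
    [1.0, 1.0, 0.0, 0.0, 0.0, 0.0],  # 45 degrees from row 0
]
result = top_n_similar(rows, 0, 2)
```

Here `result` holds the two rows closest in direction to row 0, best first. With only 6 columns, each comparison is cheap, so the cost is dominated by the number of rows scanned per query.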

On Fri, May 23, 2014 at 12:07 PM, Sebastian Schelter <[email protected]> wrote:

> I don't think you should use RowSimilarity job for that case, if you only
> have 6 columns.
>
> Can you tell us a little bit about the data and what problem your are
> trying to solve?
>
> --sebastian
>
>
>
> On 05/23/2014 09:03 PM, Suneel Marthi wrote:
>
>> I had seen this issue too with RSJ until 0.8. Switch to using Mahout 0.9,
>> downsampling was introduced in RSJ which should avoid this error.
>>
>>
>> On Fri, May 23, 2014 at 2:59 PM, Mohit Singh <[email protected]> wrote:
>>
>> Hi,
>>> I have a 1M X 6 dimensional matrix stored as sequence file and I am
>>> trying to use rowSimilarity for this job...
>>> But when I try to run the job, I see Java heap space error for the second
>>> step (RowSimilarityJob-CooccurrencesMapper-Reducer) .
>>> My raw sequence file is around 700MB and then I have already set
>>> MAHOUT_OPTS to (say) 7gb?
>>> But I am still seeing that error?
>>> My command line args are:
>>>
>>> hadoop jar /usr/lib/mahout/mahout-examples-0.8-cdh5.0.0-job.jar
>>> org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob
>>> -i
>>> $INPUT -o $OUTPUT *-r 6 *-s SIMILARITY_COSINE -m 15 --tempDir $TEMP -ess
>>>
>>> Also, is this "r" a typo.. the help file says that this is column length?
>>> Is it column or row dimension ?
>>>
>>> Thanks
>>>
>>> --
>>> Mohit
>>>
>>> "When you want success as badly as you want the air, then you will get
>>> it.
>>> There is no other secret of success."
>>> -Socrates
>>>
>>>
>>
>

--
Mohit

"When you want success as badly as you want the air, then you will get it.
There is no other secret of success."
-Socrates
