Here is an example that takes a PairRDD, which is an RDD of pairs of strings. Each pair is expected to hold a row-id and a column-id, so this method takes the sparse matrix one element at a time. If the row-id is a user-id and the column-id is an item-id, it turns the pairs into an IndexedDatasetSpark, which is essentially two BiMaps (one for users, one for items) and a DRM. Once you have the IndexedDataset, pass it to SimilarityAnalysis.

https://github.com/apache/mahout/blob/master/spark/src/main/scala/org/apache/mahout/sparkbindings/indexeddataset/IndexedDatasetSpark.scala#L68
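
To make the "two BiMaps and a DRM" idea concrete, here is a rough plain-Scala sketch of the same transformation, using product-ids as rows and tags as columns. It uses no Spark or Mahout at all: `PairsToIndexedSketch`, `IndexedSketch`, and `fromPairs` are made-up names for illustration, and a `Set` of (row, column) positions stands in for the DRM. It is not Mahout's implementation, only a picture of what the linked code does with the pairs.

```scala
// Plain-Scala illustration only; NOT Mahout's implementation.
// All names here are hypothetical and chosen for this sketch.
object PairsToIndexedSketch {

  // Stand-in for an IndexedDataset: two ID dictionaries plus sparse entries.
  final case class IndexedSketch(
      rowIds: Map[String, Int],    // e.g. product-id -> row index
      columnIds: Map[String, Int], // e.g. tag -> column index
      entries: Set[(Int, Int)])    // (row, column) positions that are set

  def fromPairs(pairs: Seq[(String, String)]): IndexedSketch = {
    // Give each distinct string ID a dense integer index,
    // in order of first appearance.
    val rowIds = pairs.map(_._1).distinct.zipWithIndex.toMap
    val columnIds = pairs.map(_._2).distinct.zipWithIndex.toMap
    // Each (row-id, column-id) pair becomes one element of the sparse matrix.
    val entries = pairs.map { case (r, c) => (rowIds(r), columnIds(c)) }.toSet
    IndexedSketch(rowIds, columnIds, entries)
  }

  def main(args: Array[String]): Unit = {
    // In a real job these pairs would come from your database query.
    val pairs = Seq(("p1", "cotton"), ("p1", "slim"), ("p2", "cotton"))
    val ds = fromPairs(pairs)
    println(ds.rowIds)
    println(ds.columnIds)
    println(ds.entries)
  }
}
```

In the real thing the pairs live in an RDD built from your DB content, and the dictionaries are what let you map the similarity output back to your original string IDs.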

On May 4, 2016, at 6:12 AM, Rohit Jain <rohitkjai...@gmail.com> wrote:

I am still searching for an answer. It would be great if somebody could help me with this :)

On Wed, May 4, 2016 at 11:25 AM, Rohit Jain <rohitkjai...@gmail.com> wrote:

> And if yes, can you please help me with what exactly you mean by "You
> can then just write some simple pre processing code that converts your
> database files to the appropriate format for Mahout and read it in as an
> indexed dataset."
>
> On Wed, May 4, 2016 at 11:21 AM, Rohit Jain <rohitkjai...@gmail.com> wrote:
>
>> Hello Nikaash,
>> So you mean I need to first read data from my MongoDB using Scala's Mongo
>> driver, then convert it into indexed datasets, and then process it using
>> row similarity?
>>
>> On Wed, May 4, 2016 at 7:56 AM, Nikaash Puri <nikaashp...@gmail.com> wrote:
>>
>>> Hi Rohit,
>>>
>>> This would be a good place to start.
>>> https://github.com/apache/mahout/blob/master/spark/src/main/scala/org/apache/mahout/drivers/RowSimilarityDriver.scala
>>>
>>> This bit of code, in particular, is how to call spark-rowsimilarity
>>> from Scala:
>>>
>>> val rowSimilarityIDS =
>>> SimilarityAnalysis.rowSimilarityIDS(indexedDataset,…)
>>>
>>> You can then just write some simple pre processing code that converts
>>> your database files to the appropriate format for Mahout and read it in as
>>> an indexed dataset.
>>>
>>> This is another great end-to-end example that achieves a similar result
>>> using spark-itemsimilarity.
>>> https://mahout.apache.org/users/environment/how-to-build-an-app.html
>>>
>>> Let me know if you need more help.
>>>
>>> Thank you,
>>> Nikaash Puri
>>>
>>>> On 03-May-2016, at 9:49 PM, Rohit Jain <rohitkjai...@gmail.com> wrote:
>>>>
>>>> Hello Pat,
>>>> Can you please explain it in a little more detail? I didn't understand
>>>> how to go about it.
>>>>
>>>> On Tue, May 3, 2016 at 9:08 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
>>>>
>>>>> Sure, but at least some would be Scala. There are examples in Mahout that
>>>>> take PairRDDs as input but anything that constructs an IndexedDataset would
>>>>> be fine. I use this code in a system that creates an RDD from HBase. Think
>>>>> of the task as one of how to create a Spark RDD from your DB content.
>>>>>
>>>>> On May 3, 2016, at 4:32 AM, Rohit Jain <rohitkjai...@gmail.com> wrote:
>>>>>
>>>>> Hello Everyone,
>>>>> I have products, and each product has certain associated tags. To find
>>>>> similar products I am using the Mahout spark-rowsimilarity algorithm in
>>>>> the following manner.
>>>>>
>>>>> $MAHOUT_HOME/mahout spark-rowsimilarity -i hdfs://0.0.0.0:9000/wtrousers -o hdfs://0.0.0.0:9000/s_trousers_out1/ -D:spark.io.compression.=lzf -ma spark://0.0.0.0:7077
>>>>>
>>>>> To run this command I need to pull data from the database into a flat
>>>>> file. Is there any way I can use this command, or write Java code, to
>>>>> work directly on the database?
>>>>>
>>>>> --
>>>>> Thanks & Regards,
>>>>>
>>>>> *Rohit Jain*
>>>>> Web developer | Consultant
>>>>> Mob +91 8097283931