Re: Mahout rowSimilarity

Rohit Jain Tue, 03 May 2016 22:55:34 -0700

And if so, could you please explain what exactly you mean by "You
can then just write some simple pre processing code that converts your
database files to the appropriate format for Mahout and read it in as an
indexed dataset."
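
One way to read the pre-processing step quoted above, as a minimal sketch and not Mahout's own code: flatten each product and its tags into (row ID, column ID) string pairs, a shape that could later be parallelized into a Spark RDD and wrapped in an IndexedDataset. The Product case class and the RowSimilarityInput object with its two methods are hypothetical names. The tab and space delimiters in toLines are an assumption about the text format the spark-rowsimilarity command reads, so they should be checked against the driver's options. Loading from MongoDB and the Spark and Mahout calls themselves are left out.

```scala
// Hypothetical record type standing in for one product document from the database.
final case class Product(id: String, tags: Seq[String])

object RowSimilarityInput {
  // Flatten each product into (rowId, columnId) pairs.
  // Tags are trimmed, inner whitespace becomes "_", and blanks and duplicates are dropped.
  def toPairs(products: Seq[Product]): Seq[(String, String)] =
    for {
      p   <- products
      tag <- p.tags.map(_.trim.replaceAll("\\s+", "_")).filter(_.nonEmpty).distinct
    } yield (p.id, tag)

  // Render one text line per product: row ID, a tab, then space-separated tags.
  // The delimiters are assumed defaults, not confirmed in this thread.
  def toLines(pairs: Seq[(String, String)]): Seq[String] =
    pairs.groupBy(_._1).toSeq.sortBy(_._1).map { case (id, ps) =>
      id + "\t" + ps.map(_._2).mkString(" ")
    }
}
```

The pairs from toPairs are the natural input if the data goes straight into an RDD, while toLines only matters if a flat file is still written for the command-line driver.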


On Wed, May 4, 2016 at 11:21 AM, Rohit Jain <rohitkjai...@gmail.com> wrote:

> Hello Nikaash,
> So you mean I need to first read data from my MongoDB using Scala's Mongo
> driver, then convert it into an indexed dataset, and then process it using
> row similarity?
>
> On Wed, May 4, 2016 at 7:56 AM, Nikaash Puri <nikaashp...@gmail.com>
> wrote:
>
>> Hi Rohit,
>>
>> This would be a good place to start.
>> https://github.com/apache/mahout/blob/master/spark/src/main/scala/org/apache/mahout/drivers/RowSimilarityDriver.scala
>>
>> This bit of code, in particular, shows how to call spark-rowsimilarity
>> from Scala:
>>
>> val rowSimilarityIDS =
>> SimilarityAnalysis.rowSimilarityIDS(indexedDataset,…)
>>
>> You can then just write some simple pre processing code that converts
>> your database files to the appropriate format for Mahout and read it in as
>> an indexed dataset.
>>
>> This is another great end-to-end example that achieves a similar result
>> using spark-itemsimilarity.
>> https://mahout.apache.org/users/environment/how-to-build-an-app.html
>>
>> Let me know if you need more help.
>>
>> Thank you,
>> Nikaash Puri
>> > On 03-May-2016, at 9:49 PM, Rohit Jain <rohitkjai...@gmail.com> wrote:
>> >
>> > Hello Pat,
>> > Can you please explain it in a little more detail? I didn't understand
>> > how to go about it.
>> >
>> > On Tue, May 3, 2016 at 9:08 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
>> >
>> >> Sure, but at least some would be Scala. There are examples in Mahout that
>> >> take PairRDDs as input but anything that constructs an IndexedDataset would
>> >> be fine. I use this code in a system that creates an RDD from HBase. Think
>> >> of the task as one of how to create a Spark RDD from your DB content.
>> >>
>> >> On May 3, 2016, at 4:32 AM, Rohit Jain <rohitkjai...@gmail.com> wrote:
>> >>
>> >> Hello Everyone,
>> >> I have products, and each product has certain associated tags. To find
>> >> similar products I am using Mahout's spark-rowsimilarity algorithm in the
>> >> following manner:
>> >>
>> >> $MAHOUT_HOME/mahout spark-rowsimilarity -i hdfs://0.0.0.0:9000/wtrousers -o hdfs://0.0.0.0:9000/s_trousers_out1/ -D:spark.io.compression.=lzf -ma spark://0.0.0.0:7077
>> >>
>> >> To run this command I need to pull data from the database into a flat file.
>> >> Is there any way I can use this command, or write Java code, to work
>> >> directly on the database?
>> >>
>> >> --
>> >> Thanks & Regards,
>> >>
>> >> *Rohit Jain*
>> >> Web developer | Consultant
>> >> Mob +91 8097283931
>> >>
>> >>
>> >
>> >
>>
>>
>
>
>



-- 
Thanks & Regards,

*Rohit Jain*
Web developer | Consultant
Mob +91 8097283931
