Re: running SparkALS

Diana Carroll Mon, 28 Apr 2014 12:36:27 -0700

Should I file a JIRA to remove the example?  I think it is confusing to
include example code without explanation of how to run it, and it sounds
like this one isn't worth running or reviewing anyway.





On Mon, Apr 28, 2014 at 2:34 PM, Debasish Das <debasish.da...@gmail.com> wrote:

> Don't use SparkALS...that's the first version of the code and does not
> scale...
>
> Li is right...you have to do the dictionary generation on users, products
> and then generate indexed file....I wrote some utilities but looks like it
> is application dependent....the indexed netflix format is more generic...
>
>
> On Mon, Apr 28, 2014 at 10:47 AM, Diana Carroll <dcarr...@cloudera.com> wrote:
>
>> Thanks, Deb.  But I'm looking at org.apache.spark.examples.SparkALS,
>> which is not in the mllib examples, and does not take any file parameters.
>>
>> I don't see the class you refer to in the examples ...however, if I did
>> want to run that example, where would I find the file in question?
>>
>> It would be great if this were documented, perhaps in the source code.
>> I'll add a JIRA.
>>
>> Thanks,
>> Diana
>>
>>
>> On Mon, Apr 28, 2014 at 1:41 PM, Debasish Das 
>> <debasish.da...@gmail.com> wrote:
>>
>>> Diana,
>>>
>>> Here are the parameters:
>>>
>>> ./bin/spark-class org.apache.spark.mllib.recommendation.ALS
>>> Usage: ALS <master> <ratings_file> <rank> <iterations> <output_dir>
>>> [<lambda>] [<implicitPrefs>] [<alpha>] [<blocks>]
>>>
>>> master: Local/Deployed spark cluster master
>>> ratings_file: Netflix format data
>>> rank: Reduced dimension of the User and Product factors
>>> iterations: How many ALS iterations you would like to run
>>> output_dir: Where to generate the user and product factors
>>>
>>> lambda: regularization for nuclear norm
>>> implicitPrefs: true will run Koren's netflix prize paper's implicit
>>> algorithm
>>> alpha: only used when implicitPrefs is true
>>> blocks: How many blocks you want to partition your rating, user and
>>> product factor matrix
>>>
>>> I have run with 64 GB JVM with 20M users, 1.6M ratings and 50
>>> factors....you should be able to go even beyond that if you want to
>>> increase the JVM size...
>>>
>>> The scalability issue comes from the fact that each JVM has to collect
>>> either user/product factors before doing a BLAS posv solve....seems like
>>> that's the bottleneck step...but making double to float is one way to scale
>>> it even further...
>>>
>>> Thanks.
>>> Deb
>>>
>>>
>>>
>>> On Mon, Apr 28, 2014 at 10:30 AM, Diana Carroll 
>>> <dcarr...@cloudera.com> wrote:
>>>
>>>> Hi everyone.  I'm trying to run some of the Spark example code, and
>>>> most of it appears to be undocumented (unless I'm missing something).  Can
>>>> someone help me out?
>>>>
>>>> I'm particularly interested in running SparkALS, which wants parameters:
>>>> M U F iter slices
>>>>
>>>> What are these variables?  They appear to be integers and the default
>>>> values are 100, 500 and 10 respectively but beyond that...huh?
>>>>
>>>> Thanks!
>>>>
>>>> Diana
>>>>
>>>
>>>
>>
>
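
The dictionary-generation step mentioned in the thread (map raw user and product IDs to integer indices, then write an indexed ratings file) could be sketched roughly as below. This is only an illustration and not the utilities Deb refers to: the function names build_index and index_ratings, the sample IDs, and the comma-separated "user,product,rating" output layout are all assumptions made for the example.

```python
def build_index(raw_ids):
    """Assign a contiguous integer index to each distinct ID, in first-seen order."""
    index = {}
    for raw in raw_ids:
        if raw not in index:
            index[raw] = len(index)
    return index


def index_ratings(ratings):
    """Turn (user, product, rating) triples with arbitrary IDs into integer-indexed triples.

    Returns the indexed triples plus both dictionaries, so the factors
    produced later can be mapped back to the original IDs.
    """
    ratings = list(ratings)  # materialize once; the input is scanned three times
    user_index = build_index(u for u, _, _ in ratings)
    product_index = build_index(p for _, p, _ in ratings)
    indexed = [(user_index[u], product_index[p], r) for u, p, r in ratings]
    return indexed, user_index, product_index


if __name__ == "__main__":
    # Hypothetical raw ratings with string IDs.
    raw = [("alice", "B00X", 5.0), ("bob", "B00X", 3.0), ("alice", "A17", 1.0)]
    indexed, users, products = index_ratings(raw)
    for u, p, r in indexed:
        # Assumed output layout: one "user,product,rating" line per rating.
        print(f"{u},{p},{r}")
```

For data too large for one machine the same idea would have to be done as a distributed job; the in-memory version only shows the shape of the mapping.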

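As a rough back-of-the-envelope check on the scalability remark in the thread (each JVM collecting a full set of factors, and double versus float), the raw size of a factor matrix can be estimated as rows x rank x bytes per value. The sketch below uses the 20M users and 50 factors quoted above; the helper name factor_matrix_bytes is made up for the example, and the figures ignore JVM object overhead and any other data held in memory.

```python
def factor_matrix_bytes(rows, rank, bytes_per_value):
    """Raw storage for a dense rows x rank factor matrix, ignoring all overhead."""
    return rows * rank * bytes_per_value


if __name__ == "__main__":
    users = 20_000_000  # user count quoted in the thread
    rank = 50           # number of factors quoted in the thread

    as_double = factor_matrix_bytes(users, rank, 8)  # 8-byte double
    as_float = factor_matrix_bytes(users, rank, 4)   # 4-byte float

    print(f"double: {as_double / 1e9:.1f} GB")
    print(f"float:  {as_float / 1e9:.1f} GB")
```

Under these assumptions the user factors alone take about 8 GB as doubles and about 4 GB as floats, which is consistent with the suggestion that switching to float would leave room to scale further.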