Diana,

Here are the parameters:

./bin/spark-class org.apache.spark.mllib.recommendation.ALS
Usage: ALS <master> <ratings_file> <rank> <iterations> <output_dir>
[<lambda>] [<implicitPrefs>] [<alpha>] [<blocks>]

master: URL of a local or deployed Spark cluster master
ratings_file: ratings data in Netflix format
rank: reduced dimension of the user and product factors
iterations: how many ALS iterations to run
output_dir: where to write the user and product factors

lambda: regularization weight (nuclear-norm regularization)
implicitPrefs: if true, runs the implicit-feedback algorithm from
Koren's Netflix Prize paper
alpha: confidence weight; only used when implicitPrefs is true
blocks: how many blocks to partition the rating, user, and product
factor matrices into
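
For example, an explicit-feedback run might look like this (the paths
and values here are made up for illustration; local[4] just means a
local master with 4 threads):

./bin/spark-class org.apache.spark.mllib.recommendation.ALS \
  local[4] ratings.txt 50 10 /tmp/als-output 0.1 false 1.0 8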

I have run this with a 64 GB JVM on 20M users, 1.6M ratings, and 50
factors. You should be able to go beyond that if you increase the JVM
heap size.

The scalability limit comes from the fact that each JVM has to collect
either the user or product factors before doing a BLAS posv solve; that
seems to be the bottleneck step. Switching the factors from double to
float precision is one way to scale even further.
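
To make that bottleneck step concrete, here is a rough sketch of the
per-user normal-equations solve in the explicit-feedback case (my own
illustration, not MLlib's actual code; the names are made up, and it
uses Breeze's dense solve in place of the direct LAPACK posv call):

import breeze.linalg._

// itemFactors: rank x n matrix, one column per product this user rated
// ratings: the user's n ratings for those products
// lambda: the regularization weight from the command line
def solveUserFactor(itemFactors: DenseMatrix[Double],
                    ratings: DenseVector[Double],
                    lambda: Double): DenseVector[Double] = {
  val rank = itemFactors.rows
  // Normal equations: (Y * Y^T + lambda * n * I) * x = Y * r
  val gram = itemFactors * itemFactors.t +
    DenseMatrix.eye[Double](rank) * (lambda * ratings.length)
  val rhs = itemFactors * ratings
  gram \ rhs  // dense solve standing in for the Cholesky-based posv
}

The solve itself is only rank x rank; the expensive part is gathering
the factor columns each user needs, which is the collection step
described above.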

Thanks.
Deb



On Mon, Apr 28, 2014 at 10:30 AM, Diana Carroll <dcarr...@cloudera.com> wrote:

> Hi everyone.  I'm trying to run some of the Spark example code, and most
> of it appears to be undocumented (unless I'm missing something).  Can
> someone help me out?
>
> I'm particularly interested in running SparkALS, which wants parameters:
> M U F iter slices
>
> What are these variables?  They appear to be integers and the default
> values are 100, 500 and 10 respectively but beyond that...huh?
>
> Thanks!
>
> Diana
>
