Diana,

Here are the parameters:
./bin/spark-class org.apache.spark.mllib.recommendation.ALS

Usage: ALS <master> <ratings_file> <rank> <iterations> <output_dir> [<lambda>] [<implicitPrefs>] [<alpha>] [<blocks>]

master: local or deployed Spark cluster master
ratings_file: Netflix-format ratings data
rank: reduced dimension of the user and product factors
iterations: how many ALS iterations you would like to run
output_dir: where to write the user and product factors
lambda: regularization (nuclear norm)
implicitPrefs: true runs the implicit-feedback algorithm from Koren's Netflix Prize paper
alpha: only used when implicitPrefs is true
blocks: how many blocks you want to partition your rating, user, and product factor matrices into

I have run with a 64 GB JVM with 20M users, 1.6M ratings, and 50 factors... you should be able to go even beyond that if you want to increase the JVM size.

The scalability issue comes from the fact that each JVM has to collect either the user or the product factors before doing a BLAS posv solve... that seems to be the bottleneck step. Switching from double to float is one way to scale it even further.

I've put an example invocation and a rough API sketch at the bottom of this mail, below the quoted message.

Thanks.
Deb


On Mon, Apr 28, 2014 at 10:30 AM, Diana Carroll <dcarr...@cloudera.com> wrote:

> Hi everyone. I'm trying to run some of the Spark example code, and most
> of it appears to be undocumented (unless I'm missing something). Can
> someone help me out?
>
> I'm particularly interested in running SparkALS, which wants parameters:
> M U F iter slices
>
> What are these variables? They appear to be integers and the default
> values are 100, 500 and 10 respectively but beyond that... huh?
>
> Thanks!
>
> Diana
>
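
P.S. For concreteness, a hypothetical invocation showing the argument order. The master URL, HDFS paths, and the values (rank 50, 10 iterations, lambda 0.065, implicit prefs with alpha 40, 20 blocks) are placeholders I made up to illustrate, not from a real run:

  ./bin/spark-class org.apache.spark.mllib.recommendation.ALS \
    spark://master-host:7077 \
    hdfs:///data/ratings.txt \
    50 10 \
    hdfs:///output/als-factors \
    0.065 true 40.0 20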
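And if you'd rather call ALS from your own driver than through the class's main(), a rough Scala sketch against the MLlib API. The file path, line parsing, and parameter values are again just placeholders; adjust the split for your data's actual format:

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.mllib.recommendation.{ALS, Rating}

  val sc = new SparkContext(new SparkConf().setAppName("ALSSketch"))

  // Parse ratings into (user, product, rating) triples.
  val ratings = sc.textFile("hdfs:///data/ratings.txt").map { line =>
    val Array(user, product, rating) = line.split(',')
    Rating(user.toInt, product.toInt, rating.toDouble)
  }

  val rank = 50        // reduced dimension of the user/product factors
  val iterations = 10
  val lambda = 0.065   // regularization
  val blocks = 20      // partitions for the rating and factor matrices

  // Explicit-feedback ALS
  val model = ALS.train(ratings, rank, iterations, lambda, blocks)

  // Implicit-feedback variant (what implicitPrefs = true toggles); the rating
  // values are treated as confidence weights scaled by alpha rather than scores.
  val alpha = 40.0
  val implicitModel = ALS.trainImplicit(ratings, rank, iterations, lambda, blocks, alpha)

  // The factor matrices, analogous to what the runner writes to <output_dir>.
  model.userFeatures.saveAsTextFile("hdfs:///output/userFactors")
  model.productFeatures.saveAsTextFile("hdfs:///output/productFactors")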