Hi Yanbo,

You mean pyspark.mllib.recommendation, right? That is the one used in the official tutorial.
Thank you,

From: Yanbo Liang <yblia...@gmail.com>
Date: Friday, 4 December 2015 03:17
To: Felix Cheung <felixcheun...@hotmail.com>
Cc: Roberto Pagliari <roberto.pagli...@asos.com>, "user@spark.apache.org" <user@spark.apache.org>
Subject: Re: Python API Documentation Mismatch

Hi Roberto,

There are two ALS implementations available: ml.recommendation.ALS <http://spark.apache.org/docs/latest/api/python/pyspark.ml.html#module-pyspark.ml.recommendation> and mllib.recommendation.ALS <http://spark.apache.org/docs/latest/api/python/pyspark.mllib.html#module-pyspark.mllib.recommendation>. They have different usage and methods. I know it is confusing that Spark provides two versions of the same algorithm. I strongly recommend using the ALS algorithm in the ML package.

Yanbo

2015-12-04 1:31 GMT+08:00 Felix Cheung <felixcheun...@hotmail.com>:

Please open an issue in JIRA, thanks!

On Thu, Dec 3, 2015 at 3:03 AM -0800, "Roberto Pagliari" <roberto.pagli...@asos.com> wrote:

Hello,

I believe there is a mismatch between the API documentation (1.5.2) and the software currently available. Not all functions mentioned here

http://spark.apache.org/docs/latest/api/python/pyspark.ml.html#module-pyspark.ml.recommendation

are in fact available. For example, the code below from the tutorial works:

    # Build the recommendation model using Alternating Least Squares
    rank = 10
    numIterations = 10
    model = ALS.train(ratings, rank, numIterations)

while the alternative shown in the API documentation does not. It complains that ALS takes no arguments. Also, by inspecting the module with Python utilities, I could not find several of the methods mentioned in the API docs:

    >>> df = sqlContext.createDataFrame(
    ...     [(0, 0, 4.0), (0, 1, 2.0), (1, 1, 3.0), (1, 2, 4.0), (2, 1, 1.0), (2, 2, 5.0)],
    ...     ["user", "item", "rating"])
    >>> als = ALS(rank=10, maxIter=5)
    >>> model = als.fit(df)

Thank you,
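
For anyone hitting the same error, one way to check which of the two ALS classes a session has actually imported is to inspect it with the standard library. The sketch below is not from the thread: `api_summary`, `load_class` and `main` are hypothetical helper names, and it assumes only the Python standard library plus, optionally, an installed pyspark. The expectation (worth verifying locally) is that pyspark.mllib.recommendation.ALS exposes `train` while pyspark.ml.recommendation.ALS exposes `fit`, which would likely explain the "takes no arguments" complaint when the mllib class is called as a constructor.

```python
import importlib
import inspect


def api_summary(cls):
    """Return the defining module, class name, and public callables of a class."""
    public = sorted(
        name
        for name, member in inspect.getmembers(cls)
        if callable(member) and not name.startswith("_")
    )
    return {"module": cls.__module__, "name": cls.__name__, "public": public}


def load_class(module_name, class_name):
    """Import module_name and return class_name from it, or None if unavailable."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, class_name, None)


def main():
    # The two module paths come from the thread; everything else is illustrative.
    for module_name in ("pyspark.mllib.recommendation", "pyspark.ml.recommendation"):
        cls = load_class(module_name, "ALS")
        if cls is None:
            print(module_name, "is not importable in this environment")
            continue
        info = api_summary(cls)
        print(info["module"], "has train:", "train" in info["public"],
              "has fit:", "fit" in info["public"])


if __name__ == "__main__":
    main()
```

Calling `api_summary(ALS)` on whatever `ALS` is bound to in the current session shows its defining module, which tells you whether the tutorial-style `ALS.train(...)` call or the API-docs-style `ALS(...).fit(df)` call is the one that applies.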