Hello,

I am using the ALS recommender from MLlib. To select the optimal rank, I hold out a set of users who have each used multiple items as my test set. I then get the predictions for these users and compare them to the observed ratings, using RegressionMetrics to estimate R^2. I keep getting negative values, for example:

r2 = -1.18966999676 explained var = -1.18955347415 count = 11620309

Here is my PySpark code:
import math
from pyspark.ml.recommendation import ALS
from pyspark.mllib.evaluation import RegressionMetrics

train1.cache()
test1.cache()
numIterations = 10
for i in range(10):
    rank = int(40 + i * 10)  # ranks 40, 50, ..., 130
    als = ALS(rank=rank, maxIter=numIterations, implicitPrefs=False)
    model = als.fit(train1)
    # (prediction, observation) pairs; drop NaN predictions
    predobs = (model.transform(test1)
               .select("prediction", "rating")
               .rdd
               .map(lambda p: (p.prediction, p.rating))
               .filter(lambda p: not math.isnan(p[0])))
    metrics = RegressionMetrics(predobs)
    mycount = predobs.count()
    myr2 = metrics.r2
    myvar = metrics.explainedVariance
    print("hooo %d r2 = %s explained var = %s count = %d" % (rank, myr2, myvar, mycount))
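
For reference, this is the definition of R^2 I am assuming: the textbook 1 - SS_res/SS_tot, which is not necessarily what RegressionMetrics computes internally. Below is a minimal plain-Python sketch of it; the function name r_squared and the toy pairs are made up for illustration and are not part of my Spark job:

```python
def r_squared(pairs):
    """Textbook R^2 = 1 - SS_res / SS_tot for (prediction, observation) pairs."""
    pairs = list(pairs)
    n = len(pairs)
    mean_obs = sum(obs for _, obs in pairs) / float(n)
    # Residual sum of squares: error of the predictions.
    ss_res = sum((obs - pred) ** 2 for pred, obs in pairs)
    # Total sum of squares: error of always predicting the observed mean.
    ss_tot = sum((obs - mean_obs) ** 2 for _, obs in pairs)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy data, (prediction, observation).
good = [(1.1, 1.0), (2.1, 2.0), (2.9, 3.0)]   # close to the observations
bad = [(3.0, 1.0), (2.0, 2.0), (1.0, 3.0)]    # anti-correlated predictions

print(r_squared(good))
print(r_squared(bad))   # negative: worse than predicting the mean
```

Under this definition the value goes negative whenever the predictions fit the test ratings worse than always predicting their mean would, so a negative result is not arithmetically impossible.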