Hi all,

There are a few algorithms in PySpark (e.g. ALS, decision trees) whose prediction part is implemented in Scala, so it is not easy to manipulate their prediction methods.
I think it is a very common scenario that the user wants to generate predictions for a dataset so that each predicted value is identifiable (e.g. has a unique ID attached to it). This is not possible with the current implementation: the predict functions take feature vectors and return only the predicted values, and, I believe, the order of the output is not guaranteed, so there is no way to join the predictions back with the original data they were generated from.

Is there a way around this at the moment?

Thanks,

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/pass-unique-ID-to-mllib-algorithms-pyspark-tp18051.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
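For reference, the data flow being asked for can be sketched in plain Python, with no Spark involved. Each record's ID stays paired with its feature vector, predictions come out keyed by that ID, and they are joined back onto the original rows by key, so output order never matters. `predict_keyed`, `join_back` and `toy_predict` are hypothetical names, and `toy_predict` is only a stand-in for a trained model. This sketch does not show how to get the same behavior from the Scala-backed PySpark models discussed above.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Features = Sequence[float]


def predict_keyed(
    records: List[Tuple[str, Features]],
    predict_one: Callable[[Features], float],
) -> Dict[str, float]:
    """Apply a per-record predict function, keeping each record's ID attached."""
    # The ID travels with its features, so the result does not depend on output order.
    return {rec_id: predict_one(features) for rec_id, features in records}


def join_back(original: Dict[str, dict], predictions: Dict[str, float]) -> Dict[str, dict]:
    """Inner-join predictions onto the original rows by ID."""
    return {
        rec_id: {**row, "prediction": predictions[rec_id]}
        for rec_id, row in original.items()
        if rec_id in predictions
    }


def toy_predict(features: Features) -> float:
    # Hypothetical stand-in for a trained model: a fixed linear scorer.
    return 2.0 * features[0] + 1.0 * features[1]


if __name__ == "__main__":
    rows = {"a": {"x": [1.0, 0.0]}, "b": {"x": [0.5, 3.0]}}
    keyed = [(rid, row["x"]) for rid, row in rows.items()]
    print(join_back(rows, predict_keyed(keyed, toy_predict)))
```

The point of the sketch is only the shape: keyed input produces keyed output. A predict function that accepts bare feature vectors and returns bare values discards the key, which is the limitation described in the question.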