Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3459#discussion_r20901112
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala ---
@@ -28,13 +28,16 @@ import org.apache.spark.rdd.RDD
/**
* Model representing the result of matrix factorization.
*
+ * NB: If you create the model directly using constructor, please be aware that fast prediction
+ * requires cached user/product features and the availability of their partitioning information.
+ *
* @param rank Rank for the features in this model.
* @param userFeatures RDD of tuples where each tuple represents the userId and
* the features computed for this user.
* @param productFeatures RDD of tuples where each tuple represents the productId
* and the features computed for this product.
*/
-class MatrixFactorizationModel private[mllib] (
+class MatrixFactorizationModel(
--- End diff --
With this now public, it might be good to add either (a) a single check at
initialization that does a take(1) on the feature RDDs and compares the
feature vector length with rank, or (b) runtime checks in the various methods
in MatrixFactorizationModel. IMO, it's OK to skip both, since either would add
extra overhead, but in that case the constructor's doc should probably carry a
warning that its arguments are not checked.
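
For reference, option (a) could look roughly like the sketch below. The names
`RankCheck` and `validateRank` are hypothetical and not part of MLlib, and the
`rank > 0` check is an extra assumption beyond what is suggested above. The
helper takes the already-collected result of `take(1)`, so the sketch itself
does not need Spark on the classpath.

```scala
// Sketch only: `RankCheck` and `validateRank` are hypothetical names, not MLlib API.
object RankCheck {
  /**
   * Checks that the sampled feature vector has length `rank`.
   * `sample` is meant to be the result of `features.take(1)` on an
   * RDD[(Int, Array[Double])], so at most one element is inspected.
   * An empty sample passes, since there is nothing to compare.
   */
  def validateRank(rank: Int, sample: Array[(Int, Array[Double])], name: String): Unit = {
    // Assumption: a non-positive rank is never valid.
    require(rank > 0, s"rank must be positive, got $rank")
    sample.headOption.foreach { case (id, features) =>
      require(features.length == rank,
        s"$name: feature vector for id $id has length ${features.length}, expected rank $rank")
    }
  }
}

// Intended use inside the constructor body (assuming Spark RDDs):
//   RankCheck.validateRank(rank, userFeatures.take(1), "userFeatures")
//   RankCheck.validateRank(rank, productFeatures.take(1), "productFeatures")
```

Each `take(1)` launches a small Spark job, which is the extra overhead
mentioned above. `require` throws an IllegalArgumentException on mismatch.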