felixcheung commented on a change in pull request #25789: [SPARK-28927][ML] Show warning when input data to ALS is indeterminate
URL: https://github.com/apache/spark/pull/25789#discussion_r324443661
 
 

 ##########
 File path: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala
 ##########
 @@ -920,6 +924,14 @@ object ALS extends DefaultParamsReadable[ALS] with Logging {
     require(intermediateRDDStorageLevel != StorageLevel.NONE,
       "ALS is not designed to run without persisting intermediate RDDs.")
 
+    // Indeterminate rating RDD causes inconsistent in/out blocks in case of rerun.
+    // It can cause runtime error when matching in/out user/item blocks.
+    if (ratings.outputDeterministicLevel == DeterministicLevel.INDETERMINATE) {
 
 Review comment:
   If I understand this correctly, mismatch -> failure is only one possible
outcome? It could also end up matching the wrong user/item because the index is
wrong? That seems more subtle and much harder to detect.
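
A minimal sketch of the two outcomes raised above, in plain Scala with no Spark dependency. It is not taken from ALS.scala: `localIndex`, `lookup`, and the `factor-of-N` labels are hypothetical, and the sketch assumes only that a block numbers the ids it sees by arrival order. Under that assumption, a recomputation that yields the same ids in a different order returns another id's data without any error, while a recomputation that yields extra ids fails loudly.

```scala
import scala.util.Try

// Hypothetical illustration only; none of these names exist in ALS.scala.
object IndeterminateIndexSketch {

  // Number each id by the order in which it arrives (assumed numbering scheme).
  def localIndex(ids: Seq[Int]): Map[Int, Int] = ids.zipWithIndex.toMap

  // Fetch the stored entry for `id` through a given id -> position index.
  def lookup(stored: Vector[String], index: Map[Int, Int], id: Int): String =
    stored(index(id))

  def main(args: Array[String]): Unit = {
    // Data laid out according to the order seen on the first computation.
    val firstRun = Seq(10, 20, 30)
    val stored = firstRun.map(id => s"factor-of-$id").toVector

    // Index rebuilt from an identical ordering: the lookup is correct.
    println(lookup(stored, localIndex(firstRun), 10))

    // Index rebuilt after a rerun that reordered the ids: no exception,
    // but id 10 now resolves to the entry stored for id 20.
    println(lookup(stored, localIndex(Seq(20, 10, 30)), 10))

    // Index rebuilt after a rerun that produced an extra id: the position
    // is out of range, so this case does surface as a runtime failure.
    println(Try(lookup(stored, localIndex(Seq(10, 20, 30, 40)), 40)).isFailure)
  }
}
```

Only the third case is visible as a runtime error; the second is the silent mismatch the comment asks about.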
