viirya commented on a change in pull request #25789: [SPARK-28927][ML] Show
warning when input data to ALS is indeterminate
URL: https://github.com/apache/spark/pull/25789#discussion_r324478452
##########
File path: R/pkg/R/mllib_recommendation.R
##########
@@ -82,6 +82,10 @@ setClass("ALSModel", representation(jobj = "jobj"))
#' statsS <- summary(modelS)
#' }
#' @note spark.als since 2.1.0
+#' @note the input rating dataframe to the ALS implementation should not be indeterminate.
Review comment:
I think checkpointing is relatively reliable. If a checkpoint is lost, the
Spark job fails instead of recomputing the data, so you should not get
inconsistent data once you have checkpointed.
We have two ways to fix it: one is to checkpoint, the other is to sort the
data before sample/randomSplit. I added this to the updated note.
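To illustrate why sorting helps, here is a toy sketch in plain Python. It is not Spark code: `random_split`, `stable_split`, and the sample ratings are hypothetical names and data made up for this example. It assumes a seeded split makes one pseudo-random draw per row in arrival order, so a recomputed input arriving in a different order may put rows in different splits, while sorting first pins the order.

```python
import random


def random_split(rows, train_fraction, seed):
    """Toy seeded split: one pseudo-random draw per row, in iteration order."""
    rng = random.Random(seed)
    train, test = [], []
    for row in rows:
        (train if rng.random() < train_fraction else test).append(row)
    return train, test


def stable_split(rows, train_fraction, seed):
    """Sort first so the draw each row receives does not depend on arrival order."""
    return random_split(sorted(rows), train_fraction, seed)


# Hypothetical (user, item, rating) rows. The same data arrives in two
# different orders, standing in for an unsorted input that is recomputed.
ratings = [(u, i, float((u * 7 + i) % 5 + 1)) for u in range(10) for i in range(5)]
first_run = list(ratings)
rerun = list(reversed(ratings))

# Without sorting, nothing forces the two runs to agree on the training set.
unsorted_agree = (
    set(random_split(first_run, 0.8, seed=42)[0])
    == set(random_split(rerun, 0.8, seed=42)[0])
)

# With sorting, both runs see the rows in the same order and split identically.
sorted_agree = stable_split(first_run, 0.8, seed=42) == stable_split(rerun, 0.8, seed=42)

print("unsorted runs agree:", unsorted_agree)
print("sorted runs agree:", sorted_agree)
```

The sorted variant depends only on the set of rows and the seed, which is the property the note asks users to guarantee for the ALS input.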
Sounds like targeting a specific problem here is better. I am catching the
AIOOBE (ArrayIndexOutOfBoundsException) instead and removing the warning,
since it does not seem very useful.