[GitHub] spark pull request #20759: Added description of checkpointInterval parameter
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20759#discussion_r205838324

    --- Diff: docs/ml-collaborative-filtering.md ---
    @@ -19,6 +19,7 @@ by a small set of latent factors that can be used to predict missing entries.
     algorithm to learn these latent factors. The implementation in `spark.ml` has the following parameters:
    +* *checkpointInterval* helps with recovery when nodes fail and StackOverflow exceptions caused by long lineage. **Will be silently ignored if *SparkContext.CheckpointDir* is not set.** (defaults to 10).
    --- End diff --

    Nit: StackOverflow exceptions -> either StackOverflowError or stack overflow errors. Also you're nesting `*` and `**` in the markdown; does that work?

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20759: Added description of checkpointInterval parameter
Github user MrMathias commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20759#discussion_r173563217

    --- Diff: docs/ml-collaborative-filtering.md ---
    @@ -19,6 +19,7 @@ by a small set of latent factors that can be used to predict missing entries.
     algorithm to learn these latent factors. The implementation in `spark.ml` has the following parameters:
    +* *checkpointInterval* helps with recovery when nodes fail and StackOverflow exceptions caused by long lineage. **Will be silently ignored if *SparkContext.CheckpointDir* is not set.** (defaults to 10).
    --- End diff --

    Checkpointing exists to deal better with node failure and to reduce the memory consumed by long lineage. This wording is taken from the parameter comment in the ALS implementation itself, so I think it is fitting.

    The list of parameters here is only a subset and is not ordered.
[GitHub] spark pull request #20759: Added description of checkpointInterval parameter
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20759#discussion_r173379554

    --- Diff: docs/ml-collaborative-filtering.md ---
    @@ -19,6 +19,7 @@ by a small set of latent factors that can be used to predict missing entries.
     algorithm to learn these latent factors. The implementation in `spark.ml` has the following parameters:
    +* *checkpointInterval* helps with recovery when nodes fail and StackOverflow exceptions caused by long lineage. **Will be silently ignored if *SparkContext.CheckpointDir* is not set.** (defaults to 10).
    --- End diff --

    The wording is a bit severe... Do we have to mention node failure or stack overflow (the latter should be rare anyway)?

    Also, is this list of params sorted in any way? Perhaps add checkpointInterval at the end?
[GitHub] spark pull request #20759: Added description of checkpointInterval parameter
GitHub user MrMathias opened a pull request:

    https://github.com/apache/spark/pull/20759

    Added description of checkpointInterval parameter

    The current behavior of ALS with checkpointInterval can be unexpected; I have added an explicit description to hopefully reduce confusion.

    ## What changes were proposed in this pull request?

    Better documentation of ml.ALS.

    ## How was this patch tested?

    Compiled the docs.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/MrMathias/spark patch-1

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20759.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20759

----
commit 17a71c357bcc5ca68f3fd11f49bb61a91603527a
Author: Mathias Andersen
Date:   2018-03-07T13:50:20Z

    Added description of checkpointInterval parameter

    Current behavior of ALS and checkpointInterval can result in unexpected behavior, I have added explicit description to hopefully reduce confusion.
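
For readers following the thread, the behavior the proposed doc line describes can be sketched in plain Python. This is a minimal model, not the ALS implementation: the helper names `should_checkpoint` and `checkpointed_iterations`, the 1-based iteration counter, and the example directory path are all assumptions made for illustration. The rule it models is taken from the diff: with no checkpoint directory set, `checkpointInterval` has no effect and no error is raised.

```python
from typing import List, Optional


def should_checkpoint(iteration: int,
                      checkpoint_dir: Optional[str],
                      checkpoint_interval: int) -> bool:
    """Decide whether a given iteration would be checkpointed.

    Hypothetical helper modelling the rule in the proposed doc line:
    without a checkpoint directory the interval is silently ignored.
    An interval of -1 is treated as "checkpointing disabled" (assumption).
    """
    if checkpoint_interval != -1 and checkpoint_interval < 1:
        raise ValueError("checkpoint_interval must be >= 1, or -1 to disable")
    return (checkpoint_dir is not None
            and checkpoint_interval != -1
            and iteration % checkpoint_interval == 0)


def checkpointed_iterations(max_iter: int,
                            checkpoint_dir: Optional[str],
                            checkpoint_interval: int = 10) -> List[int]:
    """List the (1-based, by assumption) iterations that would checkpoint."""
    return [i for i in range(1, max_iter + 1)
            if should_checkpoint(i, checkpoint_dir, checkpoint_interval)]


if __name__ == "__main__":
    # No directory set: the interval is ignored and nothing is checkpointed.
    print(checkpointed_iterations(25, None))
    # Directory set (hypothetical path): every 10th iteration checkpoints.
    print(checkpointed_iterations(25, "/tmp/als-checkpoints"))
```

In an actual PySpark job the directory would be set with `SparkContext.setCheckpointDir` before fitting `ALS` with a `checkpointInterval`; the sketch only shows why omitting that call leaves the parameter without effect.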