[GitHub] spark pull request: [SPARK-2418] Custom checkpointing with an exte...
Github user darabos commented on the pull request: https://github.com/apache/spark/pull/3346#issuecomment-114877984 Thanks for the note, @andrewor14. @Forevian has not been working with Spark lately, but I'm happy to take over this change from him. From a superficial look at the code, it seems to me that the same approach would still work. It's a fantastically flexible solution, yet it's entirely backward-compatible. Since the old code path would remain unchanged, there is also very little risk in it. So I'd like to dust it off and send a new pull request against the current master. I'd just like to ask first if you have any recommendations for avoiding the same fate as this pull request. Why was it never reviewed? Why did it not get any comments? Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-2418] Custom checkpointing with an exte...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/3346#issuecomment-114968139 Hi @darabos, in my experience, the PRs that do get attention are usually high-priority issues or patches that many in the community are interested in. Pinging committers for review is a start, but it also really helps if the original author is active and responsive. In this particular case, @pwendell has been busy with many other things and hasn't been reviewing many patches, so he may not be the best person to ping. For core patches like this, you could ping me, and I will either review it myself or try to triage it to the right person.
[GitHub] spark pull request: [SPARK-2418] Custom checkpointing with an exte...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3346
[GitHub] spark pull request: [SPARK-2418] Custom checkpointing with an exte...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/3346#issuecomment-113321056 @Forevian, thanks for submitting the patch. Unfortunately, it has mostly gone stale at this point, and many changes have gone into master since it was created. Since it's unlikely to be merged, would you mind closing this PR? Feel free to reopen it against the same issue and we can start the discussion there.
[GitHub] spark pull request: [SPARK-2418] Custom checkpointing with an exte...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3346#issuecomment-96770081 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-2418] Custom checkpointing with an exte...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3346#issuecomment-72533564 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-2418] Custom checkpointing with an exte...
Github user Forevian closed the pull request at: https://github.com/apache/spark/pull/1345
[GitHub] spark pull request: [SPARK-2418] Custom checkpointing with an exte...
GitHub user Forevian opened a pull request: https://github.com/apache/spark/pull/3346 [SPARK-2418] Custom checkpointing with an external function as parameter https://issues.apache.org/jira/browse/SPARK-2418 If a job consists of many shuffle-heavy transformations, the current resilience model might be unsatisfactory. In our current use case, we need a persistent checkpoint that we can use to save our RDDs to disk in a custom location and load them back even if the driver dies. (Other possible use cases: storing the checkpointed data in various formats such as SequenceFile, CSV, Parquet, MySQL, etc.) After talking to Patrick Wendell at Spark Summit 2014, we concluded that a checkpoint whose saving and RDD-reloading behavior can be customized would be a good solution. I am open to further suggestions if you have better ideas about how to make checkpointing more flexible. *** Note1: I deleted the previous fork because I had messed up that version with an unsuccessful rebasing attempt. Note2: The contribution is my original work and I license the work to the project under the project's open source license. You can merge this pull request into a Git repository by running: $ git pull https://github.com/Forevian/spark custom-checkpoint-f Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3346.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3346 commit f3c8efbb64dbff3bd4aa93161eefe75ff6c433a7 Author: Forevian andras.bar...@gmail.com Date: 2014-11-18T14:48:54Z Custom checkpointing by a provided function
[GitHub] spark pull request: [SPARK-2418] Custom checkpointing with an exte...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3346#issuecomment-63482653 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-2418] Custom checkpointing with an exte...
Github user Forevian commented on the pull request: https://github.com/apache/spark/pull/3346#issuecomment-63482768 @pwendell, I have adapted it to the current Spark core master.
[GitHub] spark pull request: [SPARK-2418] Custom checkpointing with an exte...
Github user Forevian commented on the pull request: https://github.com/apache/spark/pull/1345#issuecomment-63482943 I failed to rebase this properly onto the most recent Spark version, so I have re-forked Spark and created a new, clean PR: https://github.com/apache/spark/pull/3346
[GitHub] spark pull request: [SPARK-2418] Custom checkpointing with an exte...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1345#issuecomment-63110318 Hey @Forevian, would you have any interest in bringing this up to date for a contribution? This was brought up in the context of some other use cases, and I think it would be nice to have.
[GitHub] spark pull request: [SPARK-2418] Custom checkpointing with an exte...
Github user Forevian commented on the pull request: https://github.com/apache/spark/pull/1345#issuecomment-63116043 Sure, I will do that next week!
[GitHub] spark pull request: [SPARK-2418] Custom checkpointing with an exte...
Github user darabos commented on the pull request: https://github.com/apache/spark/pull/1345#issuecomment-55912866 @Forevian, can you please update it to merge cleanly? Then hunt down a reviewer! It would be great to have this in 1.2. It would make our code significantly more efficient. (Currently we checkpoint by saving to S3 and then loading back from S3. With your change, I think we could avoid the unnecessary loading.)
[GitHub] spark pull request: [SPARK-2418] Custom checkpointing with an exte...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1345#issuecomment-54694609 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-2418] Custom checkpointing with an exte...
Github user darabos commented on the pull request: https://github.com/apache/spark/pull/1345#issuecomment-52090746 @Forevian is on vacation from tomorrow until next Tuesday, but if you have any questions, I can try to answer them in the meantime. @pwendell, are you interested in this?
[GitHub] spark pull request: [SPARK-2418] Custom checkpointing with an exte...
GitHub user Forevian opened a pull request: https://github.com/apache/spark/pull/1345 [SPARK-2418] Custom checkpointing with an external function as parameter https://issues.apache.org/jira/browse/SPARK-2418 If a job consists of many shuffle-heavy transformations, the current resilience model might be unsatisfactory. In our current use case, we need a persistent checkpoint that we can use to save our RDDs to disk in a custom location and load them back even if the driver dies. (Other possible use cases: storing the checkpointed data in various formats such as SequenceFile, CSV, Parquet, MySQL, etc.) After talking to Patrick Wendell at Spark Summit 2014, we concluded that a checkpoint whose saving and RDD-reloading behavior can be customized would be a good solution. I am open to further suggestions if you have better ideas about how to make checkpointing more flexible. You can merge this pull request into a Git repository by running: $ git pull https://github.com/Forevian/spark custom-checkpoint Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1345.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1345 commit fe7a234e33baa0c5e9c0b90b1daaa00f8a29293d Author: Forevian andras.bar...@lynxanalytics.com Date: 2014-07-09T15:04:50Z Custom checkpointing with an external function as parameter commit 9c2adce783dbb61c81aff775fdb0f3216a36412f Author: András Barják andras.bar...@gmail.com Date: 2014-07-09T16:53:53Z parallelize - makeRDD to stay consistent with the other tests
[GitHub] spark pull request: [SPARK-2418] Custom checkpointing with an exte...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1345#issuecomment-48503192 Can one of the admins verify this patch?