[GitHub] spark pull request: [SPARK-2418] Custom checkpointing with an exte...

2015-06-24 Thread darabos
Github user darabos commented on the pull request:

https://github.com/apache/spark/pull/3346#issuecomment-114877984
  
Thanks for the note, @andrewor14. @Forevian is not working with Spark 
lately, but I'm happy to take over this change from him. From a superficial 
look at the code it seems to me that the same approach would still work. It's a 
fantastically flexible solution yet it's entirely backward-compatible. Since 
the old code path would remain unchanged, there is also very little risk in it.

So I'd like to dust it off and send a new pull request against the current 
master. I'd just like to ask first if you have any recommendations for avoiding 
the same fate as this pull request. Why was it never reviewed? Why did it not 
get any comments? Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-2418] Custom checkpointing with an exte...

2015-06-24 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/3346#issuecomment-114968139
  
Hi @darabos, in my experience the PRs that do get attention are usually
high-priority issues or patches that many in the community are interested
in. Pinging committers for review is a start, but it also really helps if
the original author is active and responsive.

In this particular case, @pwendell has been busy with many other things and
has not been reviewing many patches, so he may not be the best person to
ping. For core patches like this, you can ping me and I will either review
them myself or route them to the right person.





[GitHub] spark pull request: [SPARK-2418] Custom checkpointing with an exte...

2015-06-22 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/3346





[GitHub] spark pull request: [SPARK-2418] Custom checkpointing with an exte...

2015-06-18 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/3346#issuecomment-113321056
  
@Forevian thanks for submitting the patch. Unfortunately it has mostly gone 
stale at this point and many changes have gone into master between now and when 
it was created. Since it's unlikely to be merged, would you mind closing this 
PR? Feel free to reopen it against the same issue and we can start the 
discussion there.





[GitHub] spark pull request: [SPARK-2418] Custom checkpointing with an exte...

2015-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3346#issuecomment-96770081
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-2418] Custom checkpointing with an exte...

2015-02-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3346#issuecomment-72533564
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-2418] Custom checkpointing with an exte...

2014-11-18 Thread Forevian
Github user Forevian closed the pull request at:

https://github.com/apache/spark/pull/1345





[GitHub] spark pull request: [SPARK-2418] Custom checkpointing with an exte...

2014-11-18 Thread Forevian
GitHub user Forevian opened a pull request:

https://github.com/apache/spark/pull/3346

[SPARK-2418] Custom checkpointing with an external function as parameter

https://issues.apache.org/jira/browse/SPARK-2418

If a job consists of many shuffle-heavy transformations, the current
resilience model might be unsatisfactory. In our current use case we need a
persistent checkpoint that we can use to save our RDDs on disk in a custom
location and load them back even if the driver dies. (Other possible use
cases: storing the checkpointed data in various formats such as
SequenceFile, CSV, Parquet file, MySQL, etc.)
After talking to Patrick Wendell at Spark Summit 2014, we concluded that
a checkpoint where one can customize the saving and RDD reloading behavior can
be a good solution. I am open to further suggestions if you have better ideas
about how to make checkpointing more flexible.
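
The idea of a checkpoint with user-supplied save and reload behavior can be
sketched roughly as follows. This is plain Python, not Spark, and it is not
the API proposed in this pull request: the helper name checkpoint_with, the
JSON Lines format, and all function signatures are assumptions made purely
for illustration.

```python
import json
import os
import tempfile


def checkpoint_with(compute, save, load, location):
    """Hypothetical helper: reuse a checkpoint at `location` if one exists.

    `compute` produces the records, `save` persists them in a caller-chosen
    format, and `load` reads them back. Because the data lives at a
    caller-chosen location, a restarted process can reload it without
    recomputing.
    """
    if not os.path.exists(location):
        save(compute(), location)
    # Always read back through `load`, so callers see the persisted form.
    return load(location)


def save_json_lines(records, location):
    # Write to a temporary file first, then rename, so that a crash
    # mid-write does not leave a partial checkpoint at `location`.
    tmp = location + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
    os.replace(tmp, location)


def load_json_lines(location):
    with open(location, encoding="utf-8") as f:
        return [json.loads(line) for line in f]
```

Swapping save_json_lines and load_json_lines for another pair of functions
changes the storage format without touching the checkpoint logic, which is
the flexibility the description above is asking for.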

***

Note1: I deleted the previous fork because an unsuccessful rebase attempt
had left that version in a broken state.
Note2: The contribution is my original work and I license the work to the 
project under the project's open source license.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Forevian/spark custom-checkpoint-f

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3346.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3346


commit f3c8efbb64dbff3bd4aa93161eefe75ff6c433a7
Author: Forevian andras.bar...@gmail.com
Date:   2014-11-18T14:48:54Z

Custom checkpointing by a provided function







[GitHub] spark pull request: [SPARK-2418] Custom checkpointing with an exte...

2014-11-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3346#issuecomment-63482653
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-2418] Custom checkpointing with an exte...

2014-11-18 Thread Forevian
Github user Forevian commented on the pull request:

https://github.com/apache/spark/pull/3346#issuecomment-63482768
  
@pwendell, I have adapted it to the current Spark core master.





[GitHub] spark pull request: [SPARK-2418] Custom checkpointing with an exte...

2014-11-18 Thread Forevian
Github user Forevian commented on the pull request:

https://github.com/apache/spark/pull/1345#issuecomment-63482943
  
I could not rebase this cleanly onto the most recent Spark version, so I
re-forked Spark and opened a new, clean PR:
https://github.com/apache/spark/pull/3346





[GitHub] spark pull request: [SPARK-2418] Custom checkpointing with an exte...

2014-11-14 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/1345#issuecomment-63110318
  
Hey @Forevian, would you have any interest in bringing this up to date for a 
contribution? This was brought up in the context of some other use cases, and I 
think it would be nice to have.





[GitHub] spark pull request: [SPARK-2418] Custom checkpointing with an exte...

2014-11-14 Thread Forevian
Github user Forevian commented on the pull request:

https://github.com/apache/spark/pull/1345#issuecomment-63116043
  
Sure, I will do that next week!





[GitHub] spark pull request: [SPARK-2418] Custom checkpointing with an exte...

2014-09-17 Thread darabos
Github user darabos commented on the pull request:

https://github.com/apache/spark/pull/1345#issuecomment-55912866
  
@Forevian, can you please update it to merge cleanly? Then hunt down a 
reviewer! It would be great to have this in 1.2. It would make our code 
significantly more efficient. (Currently we save to S3 and load from S3 to 
checkpoint. With your change I think we could avoid the unnecessary loading.)





[GitHub] spark pull request: [SPARK-2418] Custom checkpointing with an exte...

2014-09-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1345#issuecomment-54694609
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-2418] Custom checkpointing with an exte...

2014-08-13 Thread darabos
Github user darabos commented on the pull request:

https://github.com/apache/spark/pull/1345#issuecomment-52090746
  
@Forevian is on vacation from tomorrow until next Tuesday, but if you have
any questions I can try to answer them in the meantime. @pwendell, are you
interested in this?





[GitHub] spark pull request: [SPARK-2418] Custom checkpointing with an exte...

2014-07-09 Thread Forevian
GitHub user Forevian opened a pull request:

https://github.com/apache/spark/pull/1345

[SPARK-2418] Custom checkpointing with an external function as parameter

https://issues.apache.org/jira/browse/SPARK-2418

If a job consists of many shuffle-heavy transformations, the current
resilience model might be unsatisfactory. In our current use case we need a
persistent checkpoint that we can use to save our RDDs on disk in a custom
location and load them back even if the driver dies. (Other possible use
cases: storing the checkpointed data in various formats such as
SequenceFile, CSV, Parquet file, MySQL, etc.)
After talking to Patrick Wendell at Spark Summit 2014, we concluded that
a checkpoint where one can customize the saving and RDD reloading behavior can
be a good solution. I am open to further suggestions if you have better ideas
about how to make checkpointing more flexible.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Forevian/spark custom-checkpoint

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1345.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1345


commit fe7a234e33baa0c5e9c0b90b1daaa00f8a29293d
Author: Forevian andras.bar...@lynxanalytics.com
Date:   2014-07-09T15:04:50Z

Custom checkpointing with an external function as parameter

commit 9c2adce783dbb61c81aff775fdb0f3216a36412f
Author: András Barják andras.bar...@gmail.com
Date:   2014-07-09T16:53:53Z

parallelize -> makeRDD

to stay consistent with the other tests






[GitHub] spark pull request: [SPARK-2418] Custom checkpointing with an exte...

2014-07-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1345#issuecomment-48503192
  
Can one of the admins verify this patch?

