[GitHub] incubator-spark pull request: [SPARK-1100] prevent Spark from over...

2014-02-23 Thread CodingCat
Github user CodingCat commented on the pull request: https://github.com/apache/incubator-spark/pull/626#issuecomment-35851996 but why not just prevent users from overwriting the directory, regardless of whether it contains part-* files?
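
The difference between the two guards under discussion can be sketched in plain Scala. This is a hypothetical illustration, not Spark code: `OutputGuard`, `hasPartFiles` and `refusesWrite` are invented names, and the local filesystem stands in for HDFS. Guard A refuses only when `part-*` files are already present; guard B (the suggestion above) refuses whenever the path exists at all.

```scala
import java.io.File

// Hypothetical helpers (not Spark API) contrasting two guards for an output directory.
object OutputGuard {
  // Guard A: refuse only when the directory already holds part-* files.
  def hasPartFiles(dir: File): Boolean =
    Option(dir.listFiles()).getOrElse(Array.empty[File])
      .exists(_.getName.startsWith("part-"))

  // Guard B (the suggestion above): refuse whenever the path exists,
  // whatever it contains.
  def refusesWrite(dir: File, partFilesOnly: Boolean): Boolean =
    if (partFilesOnly) hasPartFiles(dir) else dir.exists()
}
```

With these helpers, a directory holding only an unrelated file (say a README) passes guard A but is refused by guard B, which is the simpler rule being proposed.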

[GitHub] incubator-spark pull request: [SPARK-1100] prevent Spark from over...

2014-02-23 Thread CodingCat
Github user CodingCat commented on the pull request: https://github.com/apache/incubator-spark/pull/626#issuecomment-35851751 I just went through the Spark Streaming documentation; it seems safe to follow your suggestion, @pwendell.

[GitHub] incubator-spark pull request: [SPARK-1100] prevent Spark from over...

2014-02-23 Thread CodingCat
Github user CodingCat commented on the pull request: https://github.com/apache/incubator-spark/pull/626#issuecomment-35842285 @pwendell the second situation can be avoided, sorry, my mistake. The only issue is if there is a component that relies on the fact that Spark allows th

[GitHub] incubator-spark pull request: [SPARK-1100] prevent Spark from over...

2014-02-23 Thread CodingCat
Github user CodingCat commented on the pull request: https://github.com/apache/incubator-spark/pull/626#issuecomment-35841703 @pwendell Thanks for the comments. I also considered what you mentioned, but would that prevent other components, such as Spark Streaming, from working correctly?

[GitHub] incubator-spark pull request: [SPARK-1100] prevent Spark from over...

2014-02-23 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/incubator-spark/pull/626#issuecomment-35841445 Hey @CodingCat, this approach has a few drawbacks. First, it would mean a pretty bad regression for some users. For instance, say that a user is calling saveAsHad

[GitHub] incubator-spark pull request: [SPARK-1100] prevent Spark from over...

2014-02-23 Thread CodingCat
Github user CodingCat commented on the pull request: https://github.com/apache/incubator-spark/pull/626#issuecomment-35838665 OK, I fixed some bugs and squashed the commits; I think it's ready for further review.

[GitHub] incubator-spark pull request: [SPARK-1100] prevent Spark from over...

2014-02-21 Thread CodingCat
Github user CodingCat commented on the pull request: https://github.com/apache/incubator-spark/pull/626#issuecomment-35761128 I ran the following commands: scala> val a = sc.textFile("/Users/nanzhu/code/incubator-spark/LICENSE", 4).map(line => ("a", "b")) scala> val a =

[GitHub] incubator-spark pull request: [SPARK-1100] prevent Spark from over...

2014-02-21 Thread CodingCat
Github user CodingCat commented on the pull request: https://github.com/apache/incubator-spark/pull/626#issuecomment-35760516 @mridulm I tested that and found that it is actually not handled in Spark: ![abc](https://f.cloud.github.com/assets/678008/2233295/4a6efde6-9b28-11e3-

[GitHub] incubator-spark pull request: [SPARK-1100] prevent Spark from over...

2014-02-21 Thread CodingCat
Github user CodingCat commented on the pull request: https://github.com/apache/incubator-spark/pull/626#issuecomment-35731273 @jyotiska that would be nice!

[GitHub] incubator-spark pull request: [SPARK-1100] prevent Spark from over...

2014-02-21 Thread CodingCat
Github user CodingCat commented on the pull request: https://github.com/apache/incubator-spark/pull/626#issuecomment-35731089 @mridulm Thank you for explaining the standard solution; I will revise my patch today. I learnt a lot from our discussions on my other patches.

[GitHub] incubator-spark pull request: [SPARK-1100] prevent Spark from over...

2014-02-20 Thread jyotiska
Github user jyotiska commented on the pull request: https://github.com/apache/incubator-spark/pull/626#issuecomment-35704511 I think it is a good idea to add an extra flag for overwriting. If the flag is not present, Spark should throw an exception. I will see if the bug is also prese
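
The overwrite-flag idea can be sketched as follows. This is a minimal local-filesystem illustration under stated assumptions: `SaveWithFlag`, `save` and the `overwrite` parameter are hypothetical names, not an existing Spark method or option. If the output directory exists and `overwrite` is false, the save fails with `java.nio.file.FileAlreadyExistsException`; if it is true, the old directory is removed first, so no stale files survive.

```scala
import java.io.{File, IOException, PrintWriter}
import java.nio.file.FileAlreadyExistsException

// Hypothetical save routine (not Spark API) with an explicit overwrite flag.
object SaveWithFlag {
  // Depth-first delete of a file or directory tree.
  def deleteRecursively(f: File): Unit = {
    Option(f.listFiles()).foreach(_.foreach(deleteRecursively))
    if (!f.delete() && f.exists()) throw new IOException(s"Could not delete $f")
  }

  def save(dir: File, lines: Seq[String], overwrite: Boolean = false): Unit = {
    if (dir.exists()) {
      // Without the flag, refuse instead of silently writing into old output.
      if (!overwrite) throw new FileAlreadyExistsException(dir.toString)
      // With the flag, clear the old output completely before writing.
      deleteRecursively(dir)
    }
    if (!dir.mkdirs()) throw new IOException(s"Could not create $dir")
    val out = new PrintWriter(new File(dir, "part-00000"), "UTF-8")
    try lines.foreach(line => out.println(line)) finally out.close()
  }
}
```

Deleting the whole directory when `overwrite = true` (rather than only replacing matching part files) is what keeps a later save with fewer partitions from leaving old files behind.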

[GitHub] incubator-spark pull request: [SPARK-1100] prevent Spark from over...

2014-02-20 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/incubator-spark/pull/626#issuecomment-35703455 Typically, the way this gets done is: write to a temporary directory, taking care of multiple attempts for the same partition (failure case)/multiple concurrent exe
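
The write-to-temporary-directory approach can be sketched as a small commit protocol. This is a simplified, hypothetical illustration in the spirit of Hadoop's FileOutputCommitter, not its actual API: the `TempDirCommitter` class, its method names, the `_temporary/attempt_<partition>_<attempt>` layout and the "first committer wins" rule are all assumptions made for the example, and it runs on a single local filesystem within one JVM.

```scala
import java.io.{File, IOException, PrintWriter}
import java.nio.file.{FileAlreadyExistsException, Files, StandardCopyOption}

// Hypothetical, simplified commit protocol (local filesystem, single JVM).
class TempDirCommitter(outputDir: File) {
  private val tempRoot = new File(outputDir, "_temporary")

  private def attemptDir(partition: Int, attempt: Int): File =
    new File(tempRoot, s"attempt_${partition}_$attempt")

  private def finalFile(partition: Int): File =
    new File(outputDir, f"part-$partition%05d")

  // Refuse to start if the output already exists, then create the scratch area.
  def setupJob(): Unit = {
    if (outputDir.exists()) throw new FileAlreadyExistsException(outputDir.toString)
    if (!tempRoot.mkdirs()) throw new IOException(s"Could not create $tempRoot")
  }

  // Each attempt writes only into its own private directory, so a failed or
  // duplicate attempt never touches the final output.
  def writeAttempt(partition: Int, attempt: Int, lines: Seq[String]): Unit = {
    val dir = attemptDir(partition, attempt)
    dir.mkdirs()
    val out = new PrintWriter(new File(dir, finalFile(partition).getName), "UTF-8")
    try lines.foreach(line => out.println(line)) finally out.close()
  }

  // Move the attempt's file into place unless another attempt for the same
  // partition has already committed (first committer wins).
  def commitTask(partition: Int, attempt: Int): Boolean = synchronized {
    val target = finalFile(partition)
    if (target.exists()) false
    else {
      val src = new File(attemptDir(partition, attempt), target.getName)
      Files.move(src.toPath, target.toPath, StandardCopyOption.ATOMIC_MOVE)
      true
    }
  }

  // Job commit: discard everything left behind by failed or losing attempts.
  def commitJob(): Unit = deleteRecursively(tempRoot)

  private def deleteRecursively(f: File): Unit = {
    Option(f.listFiles()).foreach(_.foreach(deleteRecursively))
    f.delete()
  }
}
```

For example, if two attempts for partition 0 both finish, only the first to call `commitTask` is moved into `part-00000`; the other is discarded when `commitJob` removes `_temporary`, so the final directory never mixes output from different attempts.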

[GitHub] incubator-spark pull request: [SPARK-1100] prevent Spark from over...

2014-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/incubator-spark/pull/626#issuecomment-35701775 Can one of the admins verify this patch?

[GitHub] incubator-spark pull request: [SPARK-1100] prevent Spark from over...

2014-02-20 Thread CodingCat
GitHub user CodingCat opened a pull request: https://github.com/apache/incubator-spark/pull/626 [SPARK-1100] prevent Spark from overwriting directory silently and leaving dirty directory. Thanks to Diana Carroll for reporting this issue. The current saveAsTextFile/SequenceFile w
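
The PR description is truncated here, so the following is an assumption about what "dirty directory" refers to: a writer that emits part files into an existing directory without clearing it first. Saving 4 partitions and then 2 partitions to the same path then leaves `part-00002` and `part-00003` from the earlier run mixed in with the new output. The sketch simulates that on the local filesystem; `DirtyDirDemo` and `naiveSave` are hypothetical names, not Spark code.

```scala
import java.io.File
import java.nio.file.Files

// Hypothetical simulation of a silent-overwrite writer (not Spark code).
object DirtyDirDemo {
  // Writes one part file per partition into dir, creating it if needed,
  // but never removes files that are already there.
  def naiveSave(dir: File, partitions: Seq[Seq[String]]): Unit = {
    dir.mkdirs()
    partitions.zipWithIndex.foreach { case (lines, i) =>
      val part = new File(dir, f"part-$i%05d")
      Files.write(part.toPath, lines.mkString("\n").getBytes("UTF-8"))
    }
  }
}
```

After `naiveSave(dir, Seq.fill(4)(Seq("old")))` followed by `naiveSave(dir, Seq.fill(2)(Seq("new")))`, the directory still lists four part files, and the last two still contain `old`; anything that later reads the whole directory gets a mix of both runs.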