Changing how we compute release hashes

2018-03-15 Thread Nicholas Chammas
To verify that I’ve downloaded a Hadoop release correctly, I can just do this:

    $ shasum --check hadoop-2.7.5.tar.gz.sha256
    hadoop-2.7.5.tar.gz: OK

However, since we generate Spark release hashes with GPG …
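[The excerpt is cut off, but the contrast appears to be between shasum's checkable "HASH  FILENAME" format and whatever format GPG emits for Spark's hashes. A minimal Scala sketch (illustrative only, not from the thread) of producing a line that `shasum --check` can consume; `hadoop-2.7.5.tar.gz` is just the example artifact from the message:]

    import java.nio.file.{Files, Paths}
    import java.security.MessageDigest

    val file = "hadoop-2.7.5.tar.gz"
    val bytes = Files.readAllBytes(Paths.get(file))
    val digest = MessageDigest.getInstance("SHA-256").digest(bytes)
    // Render as lowercase hex, then use the "HEX<two spaces>FILENAME"
    // layout that shasum --check expects in its checksum file.
    val hex = digest.map("%02x".format(_)).mkString
    println(s"$hex  $file")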

Re: Accumulators of Spark 1.x no longer work with Spark 2.x

2018-03-15 Thread Sergey Zhemzhitsky
One more option is to override writeReplace [1] in LegacyAccumulatorWrapper to prevent such failures. What do you think?

[1] https://github.com/apache/spark/blob/4f5bad615b47d743b8932aea1071652293981604/core/src/main/scala/org/apache/spark/util/AccumulatorV2.scala#L158
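[For context: writeReplace is the standard java.io.Serializable hook that lets an object substitute a replacement for itself at serialization time. A minimal sketch of the shape of the proposal, with illustrative names; the real LegacyAccumulatorWrapper lives in AccumulatorV2.scala and its details differ:]

    class ExampleAccumulator(@volatile private var sum: Long = 0L)
      extends Serializable {
      def add(v: Long): Unit = sum += v
      def value: Long = sum

      // Java serialization hook: the object returned here is serialized
      // in place of `this`. Shipping a fresh zero-valued copy sidesteps
      // any equality-based zero check on the current value.
      private def writeReplace(): Object = new ExampleAccumulator(0L)
    }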

Accumulators of Spark 1.x no longer work with Spark 2.x

2018-03-15 Thread Sergey Zhemzhitsky
Hi there, I've noticed that accumulators of Spark 1.x no longer work with Spark 2.x, failing with:

    java.lang.AssertionError: assertion failed: copyAndReset must return a zero value copy

It happens while serializing an accumulator here [1], although copyAndReset returns a zero-value copy for sure …
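[A minimal repro sketch of the kind of 1.x accumulator that can hit this. It assumes, which is my reading rather than something the truncated message confirms, that the zero check compares values with == on a type lacking value equality, so a genuinely zero copy still fails the assertion:]

    import org.apache.spark.{AccumulatorParam, SparkConf, SparkContext}

    // A plain class without equals/hashCode: == falls back to reference
    // equality, so a freshly built zero copy never compares equal to zero.
    class Counter(val n: Long) extends Serializable

    object CounterParam extends AccumulatorParam[Counter] {
      def zero(initial: Counter): Counter = new Counter(0L)
      def addInPlace(a: Counter, b: Counter): Counter = new Counter(a.n + b.n)
    }

    val sc = new SparkContext(
      new SparkConf().setAppName("repro").setMaster("local[*]"))
    // Spark 2.x wraps this in LegacyAccumulatorWrapper; serializing the
    // wrapper calls copyAndReset() and asserts the copy isZero.
    val acc = sc.accumulator(new Counter(0L))(CounterParam)
    sc.parallelize(1 to 10).foreach(i => acc += new Counter(i))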

RDD checkpoint failures in case of insufficient memory

2018-03-15 Thread Sergey Zhemzhitsky
Hi there, A while ago, while running GraphX jobs, I discovered that PeriodicRDDCheckpointer fails with FileNotFoundExceptions in case of insufficient memory resources. I believe that any iterative job which uses PeriodicRDDCheckpointer (like ML) suffers from the same issue (but it's not visible …
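[The message is truncated, but the failure mode it describes, reading checkpoint files after the in-memory copy has been evicted, suggests a generic guard of persisting to disk as well. A hedged sketch of that workaround, not the thread's confirmed fix; it assumes a SparkContext `sc` in scope (as in spark-shell), and `expensiveStep` is a placeholder:]

    import org.apache.spark.storage.StorageLevel

    sc.setCheckpointDir("hdfs:///tmp/checkpoints")  // illustrative path

    def expensiveStep(i: Int): Int = i * 2  // placeholder transformation

    val rdd = sc.parallelize(1 to 1000000).map(expensiveStep)
    // MEMORY_AND_DISK lets partitions spill instead of being dropped under
    // memory pressure, so later reads need not recompute from checkpoint
    // files that may already have been cleaned up.
    rdd.persist(StorageLevel.MEMORY_AND_DISK)
    rdd.checkpoint()
    rdd.count()  // first action materializes the RDD and writes the checkpoint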