[GitHub] spark pull request: SPARK-729: Closures not always serialized at c...

2014-04-07 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/189#issuecomment-39812228 I'm still tracking down exactly where this problem is coming from, but here's a little more detail on what's going wrong with Python accumulators when the closure passed

2014-04-04 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/189#issuecomment-39610852 So I just added this to `ClosureCleanerSuite` and I'm not seeing the same behavior. (I had already added a captured-field test to `ClosureCleanerSuite`.) I don't have
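For readers unfamiliar with what a captured-field test checks, here is a sketch of the general idea in Python rather than in the Scala of `ClosureCleanerSuite`, whose test code this thread does not show. It assumes the concern is whether a closure sees a captured value as it was when the closure was cleaned or as it is when the job later runs. A closure is modeled as a task body plus an explicit dict of captured variables, `pickle` stands in for Spark's closure serializer, and `make_filter`, `clean_eagerly`, `run_job_lazily`, and `run_job_from_snapshot` are hypothetical names.

```python
import pickle


def make_filter(env):
    """Hypothetical task body: keeps values above the captured threshold."""
    return lambda xs: [x for x in xs if x > env["threshold"]]


def run_job_lazily(env, data):
    # Captured state is serialized only when the job runs,
    # so any mutation made after capture is visible to the task.
    shipped_env = pickle.loads(pickle.dumps(env))
    return make_filter(shipped_env)(data)


def clean_eagerly(env):
    # Captured state is serialized at "cleaning" time, freezing it.
    return pickle.dumps(env)


def run_job_from_snapshot(snapshot, data):
    # The task sees the captured state exactly as it was at cleaning time.
    return make_filter(pickle.loads(snapshot))(data)


data = [1, 5, 10]
env = {"threshold": 2}         # hypothetical variables captured by the closure
snapshot = clean_eagerly(env)  # eager capture at cleaning time

env["threshold"] = 7           # driver mutates captured state before the job runs

lazy_result = run_job_lazily(env, data)
eager_result = run_job_from_snapshot(snapshot, data)
print(lazy_result)   # [10]
print(eager_result)  # [5, 10]
```

Serializing at cleaning time gives snapshot semantics: a later change to `env` on the driver does not affect the job. Deferring serialization means the job sees whatever the captured state happens to be at run time.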

2014-04-04 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/189#issuecomment-39610969 (e.g. here: https://github.com/willb/spark/commit/12c63a7e03bce359fd7eb7faf0a054bd32f85824#diff-f949ef08cc8a2b36861af3beb4309a88R161)

2014-04-04 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/189#issuecomment-39613329 @mateiz, these naming and stylistic suggestions make sense; thanks! Cloning the closure in `runJob` is what caused Python accumulators to stop working. I will have

2014-04-04 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/189#issuecomment-39613399 (BTW, as a matter of style, is it better to rebase my branch or add another commit that makes the change?)

2014-04-04 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/189#issuecomment-39617193 OK, I've made the changes and will push updates after re-running tests locally. (I'll also follow up on the Python accumulators.)

2014-04-04 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/189#issuecomment-39626070 Matei, my latest commit addresses the style and naming issues; I'll need to dig into the cause of the Python issue some more over the weekend. Thanks again for your

2014-04-03 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/189#issuecomment-39490073 Thanks for taking another look, Matei! I know there's a lot of stuff to get in before the merge window closes and appreciate the update.

2014-03-31 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/189#issuecomment-39144319 It looks like this Travis error is the same one others have seen on the dev list -- that is, the Hive test is timing out.

2014-03-27 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/189#issuecomment-38839409 I have rebased this branch to remove the commit that took out the serializability check in `DAGScheduler`.

2014-03-21 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/189#issuecomment-38330588 @mateiz, I was pretty confused about this, but it looks like the Python accumulator tests are what is failing. I'm not super-familiar with PySpark yet but am trying

2014-03-21 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/189#issuecomment-38340085 The Python accumulators still work on master and with the changes from #143 (which serializes the closure at closure cleaning but doesn't use the serialized value
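The thread does not explain why cloning the closure in `runJob` broke Python accumulators while serializing the closure without using the result, as #143 does, left them working. One generic way this can happen is sketched below in Python, under the assumption that the driver reads results from the exact objects it registered: a serialize/deserialize round trip produces a different object, so updates made through the copy never reach the registered original. This is an illustration only, not PySpark's accumulator implementation; `registry`, `new_accumulator`, `check_serializable`, `clone`, and `run_locally` are hypothetical names.

```python
import pickle

# Hypothetical driver-side registry: the driver reads accumulator results
# from the exact objects it created, looked up by id.
registry = {}


def new_accumulator(acc_id):
    # Hypothetical accumulator modeled as a plain dict.
    acc = {"id": acc_id, "value": 0}
    registry[acc_id] = acc
    return acc


def check_serializable(closure_env):
    # Variant A: serialize only to fail fast on unserializable state,
    # then discard the bytes and keep using the original objects.
    pickle.dumps(closure_env)
    return closure_env


def clone(closure_env):
    # Variant B: serialize and deserialize, then use the copy from here on.
    return pickle.loads(pickle.dumps(closure_env))


def run_locally(closure_env, data):
    # Stand-in for a job that updates the accumulator it captured.
    for x in data:
        closure_env["acc"]["value"] += x


data = [1, 2, 3]

acc_a = new_accumulator("a")
run_locally(check_serializable({"acc": acc_a}), data)

acc_b = new_accumulator("b")
run_locally(clone({"acc": acc_b}), data)

print(registry["a"]["value"])  # 6: updates reached the registered original
print(registry["b"]["value"])  # 0: updates went to a detached copy
```

Variant A still fails fast if something captured cannot be pickled, but it leaves object identity intact; variant B gives the job an isolated copy at the cost of that identity.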
