[GitHub] spark issue #22635: [SPARK-25591][PySpark][SQL] Avoid overwriting deserializ...

2018-10-08 Thread AbdealiJK
Github user AbdealiJK commented on the issue:

https://github.com/apache/spark/pull/22635
  
@cloud-fan @viirya Any chance of this making it into 2.4?





[GitHub] spark pull request #22635: [SPARK-25591][PySpark][SQL] Avoid overwriting des...

2018-10-05 Thread AbdealiJK
Github user AbdealiJK commented on a diff in the pull request:

https://github.com/apache/spark/pull/22635#discussion_r222954381
  
--- Diff: python/pyspark/accumulators.py ---
@@ -109,10 +109,14 @@
 
 def _deserialize_accumulator(aid, zero_value, accum_param):
 from pyspark.accumulators import _accumulatorRegistry
-accum = Accumulator(aid, zero_value, accum_param)
-accum._deserialized = True
-_accumulatorRegistry[aid] = accum
-return accum
+# If this certain accumulator was deserialized, don't overwrite it.
+if aid in _accumulatorRegistry:
--- End diff --

I see - got it :+1:





[GitHub] spark pull request #22635: [SPARK-25591][PySpark][SQL] Avoid overwriting des...

2018-10-04 Thread AbdealiJK
Github user AbdealiJK commented on a diff in the pull request:

https://github.com/apache/spark/pull/22635#discussion_r222899988
  
--- Diff: python/pyspark/accumulators.py ---
@@ -109,10 +109,14 @@
 
 def _deserialize_accumulator(aid, zero_value, accum_param):
 from pyspark.accumulators import _accumulatorRegistry
-accum = Accumulator(aid, zero_value, accum_param)
-accum._deserialized = True
-_accumulatorRegistry[aid] = accum
-return accum
+# If this certain accumulator was deserialized, don't overwrite it.
+if aid in _accumulatorRegistry:
--- End diff --

That doesn't seem right, because the constructor for `Accumulator` has:
```
...
self._deserialized = False
_accumulatorRegistry[aid] = self
```
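For illustration, a small standalone sketch of the side effect being pointed at here: because the constructor registers itself, building a second `Accumulator` for an `aid` that is already registered replaces the existing entry and resets its flag. This uses direct construction with the built-in `INT_ACCUMULATOR_PARAM` purely as an example; it is not part of the patch.
```
# Illustration only (not part of the patch): the constructor registers itself,
# so a second Accumulator built for the same aid overwrites the first entry
# and starts with _deserialized = False again.
from pyspark.accumulators import Accumulator, INT_ACCUMULATOR_PARAM, _accumulatorRegistry

first = Accumulator(aid=1, value=0, accum_param=INT_ACCUMULATOR_PARAM)
second = Accumulator(aid=1, value=0, accum_param=INT_ACCUMULATOR_PARAM)

print(_accumulatorRegistry[1] is second)   # True: the earlier entry was overwritten
print(second._deserialized)                # False: reset by the constructor
```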





[GitHub] spark pull request #22635: [SPARK-25591][PySpark][SQL] Avoid overwriting des...

2018-10-04 Thread AbdealiJK
Github user AbdealiJK commented on a diff in the pull request:

https://github.com/apache/spark/pull/22635#discussion_r222890103
  
--- Diff: python/pyspark/accumulators.py ---
@@ -109,10 +109,14 @@
 
 def _deserialize_accumulator(aid, zero_value, accum_param):
 from pyspark.accumulators import _accumulatorRegistry
-accum = Accumulator(aid, zero_value, accum_param)
-accum._deserialized = True
-_accumulatorRegistry[aid] = accum
-return accum
+# If this certain accumulator was deserialized, don't overwrite it.
+if aid in _accumulatorRegistry:
--- End diff --

Should it be `if aid in _accumulatorRegistry and _accumulatorRegistry[aid]._deserialized is True`
or:
```
if aid in _accumulatorRegistry:
    _accumulatorRegistry[aid]._deserialized = True
    return _accumulatorRegistry[aid]
```
to make doubly sure that this function always returns a deserialized version of the accumulator?
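For reference, a minimal sketch of how the guarded helper could read end to end if the already-registered instance is simply reused (an assumption; the final patch may resolve this differently):
```
def _deserialize_accumulator(aid, zero_value, accum_param):
    # Sketch of the patched module-level helper in pyspark/accumulators.py,
    # where Accumulator is defined; reuse an already-registered accumulator
    # instead of constructing a new one that would overwrite (and reset) it.
    from pyspark.accumulators import _accumulatorRegistry
    if aid in _accumulatorRegistry:
        return _accumulatorRegistry[aid]
    accum = Accumulator(aid, zero_value, accum_param)
    accum._deserialized = True
    _accumulatorRegistry[aid] = accum
    return accum
```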

