[GitHub] spark issue #22635: [SPARK-25591][PySpark][SQL] Avoid overwriting deserializ...
Github user AbdealiJK commented on the issue: https://github.com/apache/spark/pull/22635 @cloud-fan @viirya Any chance of this making it into 2.4 ? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22635: [SPARK-25591][PySpark][SQL] Avoid overwriting des...
Github user AbdealiJK commented on a diff in the pull request: https://github.com/apache/spark/pull/22635#discussion_r222954381 --- Diff: python/pyspark/accumulators.py --- @@ -109,10 +109,14 @@ def _deserialize_accumulator(aid, zero_value, accum_param): from pyspark.accumulators import _accumulatorRegistry -accum = Accumulator(aid, zero_value, accum_param) -accum._deserialized = True -_accumulatorRegistry[aid] = accum -return accum +# If this certain accumulator was deserialized, don't overwrite it. +if aid in _accumulatorRegistry: --- End diff -- I see - got it :+1: --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22635: [SPARK-25591][PySpark][SQL] Avoid overwriting des...
Github user AbdealiJK commented on a diff in the pull request: https://github.com/apache/spark/pull/22635#discussion_r222899988 --- Diff: python/pyspark/accumulators.py --- @@ -109,10 +109,14 @@ def _deserialize_accumulator(aid, zero_value, accum_param): from pyspark.accumulators import _accumulatorRegistry -accum = Accumulator(aid, zero_value, accum_param) -accum._deserialized = True -_accumulatorRegistry[aid] = accum -return accum +# If this certain accumulator was deserialized, don't overwrite it. +if aid in _accumulatorRegistry: --- End diff -- That doesnt seem right because the constructor for `Accumulator` has: ``` ... self._deserialized = False _accumulatorRegistry[aid] = self ``` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22635: [SPARK-25591][PySpark][SQL] Avoid overwriting des...
Github user AbdealiJK commented on a diff in the pull request: https://github.com/apache/spark/pull/22635#discussion_r222890103 --- Diff: python/pyspark/accumulators.py --- @@ -109,10 +109,14 @@ def _deserialize_accumulator(aid, zero_value, accum_param): from pyspark.accumulators import _accumulatorRegistry -accum = Accumulator(aid, zero_value, accum_param) -accum._deserialized = True -_accumulatorRegistry[aid] = accum -return accum +# If this certain accumulator was deserialized, don't overwrite it. +if aid in _accumulatorRegistry: --- End diff -- Should it be `if aid in _accumulatorRegistry and _accumulatorRegistry[aid]._deserialized is True` or: ``` if aid in _accumulatorRegistry: _accumulatorRegistry[aid]._deserialize = True return _accumulatorRegistry[aid] ``` To make double sure that this function always returns a deserialize version of the accum ? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org