Yikun opened a new pull request #34285:
URL: https://github.com/apache/spark/pull/34285


   ### What changes were proposed in this pull request?
   Switch (or upgrade) from [irmen/pyrolite.pickle](https://github.com/irmen/Pyrolite/tree/master) v4.30 to [irmen/pickle](https://github.com/irmen/pickle) v1.2 in this patch.
   
   ### Why are the changes needed?
   - Spark was using `Pyrolite.pickle` (v4.30) to pickle Java objects to Python objects, but there was [a problem when pickling decimal(NaN)](https://github.com/irmen/pickle/issues/7).
   - As of Pyrolite v5, the pickle code has been split out into the separate [irmen/pickle](https://github.com/irmen/pickle) library, and the bugfix will not be backported to the v4.x line, which means we have to switch from Pyrolite to pickle.
   - The decimal NaN pickling issue was fixed in https://github.com/irmen/pickle/issues/7 and released in [irmen/pickle](https://github.com/irmen/pickle) v1.2.
   
   So, we switch (or upgrade) pyrolite.pickle to pickle in this patch.
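   For context, here is a minimal pure-Python sketch (standard library only, not part of this patch) of the round-trip that the JVM-side unpickler has to reproduce when PySpark sends a `Decimal('NaN')` to an executor:

   ```python
   import pickle
   import decimal

   # Python's pickle serializes Decimal('NaN') via its constructor arguments;
   # the JVM-side unpickler (net.razorvine.pickle) must reconstruct it the same
   # way, which is where Pyrolite v4.30 threw InvocationTargetException.
   nan = decimal.Decimal('NaN')
   payload = pickle.dumps(nan)
   restored = pickle.loads(payload)
   assert restored.is_nan()
   ```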
   
   Before this patch:
   ```python
   >>> import decimal
   >>> spark.createDataFrame(data=[decimal.Decimal('NaN')], schema='decimal')
   DataFrame[value: decimal(10,0)]
   >>> spark.createDataFrame(data=[decimal.Decimal('NaN')], 
schema='decimal').collect()
   21/10/14 18:06:47 ERROR Executor: Exception in task 7.0 in stage 5.0 (TID 31)
   net.razorvine.pickle.PickleException: problem construction object: 
java.lang.reflect.InvocationTargetException
       at 
net.razorvine.pickle.objects.AnyClassConstructor.construct(AnyClassConstructor.java:29)
       at net.razorvine.pickle.Unpickler.load_reduce(Unpickler.java:773)
       at net.razorvine.pickle.Unpickler.dispatch(Unpickler.java:213)
       at net.razorvine.pickle.Unpickler.load(Unpickler.java:123)
       at net.razorvine.pickle.Unpickler.loads(Unpickler.java:136)
       at 
org.apache.spark.api.python.SerDeUtil$.$anonfun$pythonToJava$2(SerDeUtil.scala:121)
       ... ...
   ```
   After this patch:
   ```python
   >>> import decimal
   >>> spark.createDataFrame(data=[decimal.Decimal('NaN')], schema='decimal')
   DataFrame[value: decimal(10,0)]
   >>> spark.createDataFrame(data=[decimal.Decimal('NaN')], 
schema='decimal').collect()
   [Row(value=None)]
   ```
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   ```python
   >>> import decimal
   >>> spark.createDataFrame(data=[decimal.Decimal('NaN')], schema='decimal').collect()
   [Row(value=None)]
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

