Yikun opened a new pull request #34285:

URL: https://github.com/apache/spark/pull/34285
### What changes were proposed in this pull request?

This patch switches (or upgrades) from [irmen/pyrolite.pickle](https://github.com/irmen/Pyrolite/tree/master) v4.30 to [irmen/pickle](https://github.com/irmen/pickle) v1.2.

### Why are the changes needed?

- Spark was using `Pyrolite.pickle` (v4.30) to convert pickled objects between Python and Java, but there was [a problem when pickling decimal(NaN)](https://github.com/irmen/pickle/issues/7).
- After Pyrolite v5, the pickle code in irmen/Pyrolite was split out into the separate [irmen/pickle](https://github.com/irmen/pickle) library, and the bugfix will not be backported to v4.x. This means we have to switch from Pyrolite to pickle.
- The decimal NaN issue (https://github.com/irmen/pickle/issues/7) is fixed in [irmen/pickle](https://github.com/irmen/pickle) v1.2.

So, this patch switches (or upgrades) from Pyrolite's pickle to pickle.

Before this patch:

```python
>>> import decimal
>>> spark.createDataFrame(data=[decimal.Decimal('NaN')], schema='decimal')
DataFrame[value: decimal(10,0)]
>>> spark.createDataFrame(data=[decimal.Decimal('NaN')], schema='decimal').collect()
21/10/14 18:06:47 ERROR Executor: Exception in task 7.0 in stage 5.0 (TID 31)
net.razorvine.pickle.PickleException: problem construction object: java.lang.reflect.InvocationTargetException
    at net.razorvine.pickle.objects.AnyClassConstructor.construct(AnyClassConstructor.java:29)
    at net.razorvine.pickle.Unpickler.load_reduce(Unpickler.java:773)
    at net.razorvine.pickle.Unpickler.dispatch(Unpickler.java:213)
    at net.razorvine.pickle.Unpickler.load(Unpickler.java:123)
    at net.razorvine.pickle.Unpickler.loads(Unpickler.java:136)
    at org.apache.spark.api.python.SerDeUtil$.$anonfun$pythonToJava$2(SerDeUtil.scala:121)
    ...
...
```

After this patch:

```python
>>> import decimal
>>> spark.createDataFrame(data=[decimal.Decimal('NaN')], schema='decimal')
DataFrame[value: decimal(10,0)]
>>> spark.createDataFrame(data=[decimal.Decimal('NaN')], schema='decimal').collect()
[Row(value=None)]
```

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

```python
>>> import decimal
>>> spark.createDataFrame(data=[decimal.Decimal('NaN')], schema='decimal').collect()
```

--
This is an automated message from the Apache Git Service.
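For context on the "Before this patch" failure in this PR: on the Python side, a `decimal.Decimal` pickles as a call to `decimal.Decimal` with its string form, so `Decimal('NaN')` reaches the JVM as the string `'NaN'`. `java.math.BigDecimal` cannot represent NaN (its string constructor throws `NumberFormatException` for it), which is presumably what the `InvocationTargetException` raised from `AnyClassConstructor` wraps. That mapping is an assumption inferred from the stack trace, not something this PR states. The sketch below only inspects the Python side with the standard `pickle` and `pickletools` modules. The helper `pickle_opcodes` is hypothetical, and protocol 2 is chosen only because its opcodes are easy to read, not because PySpark necessarily uses it.

```python
import decimal
import pickle
import pickletools


def pickle_opcodes(value, protocol=2):
    """Hypothetical helper: list (opcode name, argument) pairs for a pickle.

    Protocol 2 is an assumption made for readability (its GLOBAL opcode
    shows "module name" directly); it is not necessarily what PySpark uses.
    """
    data = pickle.dumps(value, protocol=protocol)
    return [(op.name, arg) for op, arg, _pos in pickletools.genops(data)]


nan = decimal.Decimal('NaN')

# Decimal reduces to "call decimal.Decimal with str(self)", so whatever
# unpickles it has to build a decimal from the string 'NaN'.
print(nan.__reduce__())  # (<class 'decimal.Decimal'>, ('NaN',))

# Python itself round-trips the value without trouble.
print(pickle.loads(pickle.dumps(nan)).is_nan())  # True

# The opcode stream an unpickler on the JVM side would receive:
# GLOBAL 'decimal Decimal', the string 'NaN', then REDUCE to apply the call.
for name, arg in pickle_opcodes(nan):
    print(name, arg)
```

Per the PR, irmen/pickle v1.2 handles this case, and the "After this patch" output shows the value coming back as `None`.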
