[ https://issues.apache.org/jira/browse/SPARK-30473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17021695#comment-17021695 ]
Hyukjin Kwon commented on SPARK-30473:
--------------------------------------

This was fixed in the upstream master by upgrading cloudpickle.

> PySpark enum subclass crashes when used inside UDF
> --------------------------------------------------
>
>                 Key: SPARK-30473
>                 URL: https://issues.apache.org/jira/browse/SPARK-30473
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.4.4
>         Environment: Databricks Runtime 6.2 (includes Apache Spark 2.4.4,
> Scala 2.11)
>            Reporter: Max Härtwig
>            Priority: Major
>
> PySpark enum subclass crashes when used inside a UDF.
>
> Example:
> {code:java}
> from enum import Enum
> class Direction(Enum):
>     NORTH = 0
>     SOUTH = 1
> {code}
>
> Working:
> {code:java}
> Direction.NORTH{code}
>
> Crashing:
> {code:java}
> @udf
> def fn(a):
>     Direction.NORTH
>     return ""
> df.withColumn("test", fn("a")){code}
>
> Stacktrace:
> {noformat}
> SparkException: Job aborted due to stage failure: Task 0 in stage 9.0 failed
> 4 times, most recent failure: Lost task 0.3 in stage 9.0 (TID 235,
> 10.139.64.21, executor 0): org.apache.spark.api.python.PythonException:
> Traceback (most recent call last):
>   File "/databricks/spark/python/pyspark/serializers.py", line 182, in _read_with_length
>     return self.loads(obj)
>   File "/databricks/spark/python/pyspark/serializers.py", line 695, in loads
>     return pickle.loads(obj, encoding=encoding)
>   File "/databricks/python/lib/python3.7/enum.py", line 152, in __new__
>     enum_members = {k: classdict[k] for k in classdict._member_names}
> AttributeError: 'dict' object has no attribute '_member_names'{noformat}
>
> I suspect the problem is in *python/pyspark/cloudpickle.py*. On line 586 in
> the function *_save_dynamic_enum*, the attribute *_member_names* is removed
> from the enum. Yet, this attribute is required by the *Enum* class. This
> results in all Enum subclasses crashing.
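
For readers who want to see why a plain dict is not enough here, the following is a minimal sketch using only the standard library. It illustrates the general technique of rebuilding an Enum class through the metaclass's own namespace object, which is what tracks member names. It is not taken from Spark or cloudpickle, and the helper name `rebuild_enum` is hypothetical.

```python
from enum import Enum


def rebuild_enum(name, members, base=Enum):
    """Recreate an Enum class from its name and a {member_name: value} mapping.

    Hypothetical helper for illustration only; not part of PySpark or cloudpickle.
    """
    metacls = type(base)
    # __prepare__ returns the enum-specific namespace object. Unlike a plain
    # dict, it records member names as they are assigned, which is what the
    # metaclass reads back when it builds the class.
    classdict = metacls.__prepare__(name, (base,))
    for member_name, value in members.items():
        classdict[member_name] = value
    return metacls.__new__(metacls, name, (base,), classdict)


# Rebuild the Direction enum from the report out of plain data.
Direction = rebuild_enum("Direction", {"NORTH": 0, "SOUTH": 1})
```

The design point is that only plain data (the class name and a mapping of member values) has to cross the serialization boundary, and the enum-aware namespace is recreated on the receiving side instead of being pickled.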