Hi, from pyspark.sql.functions import pandas_udf, PandasUDFType import pyspark from pyspark.sql import SparkSession spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame( [(1, True, 1.0, 'aa'), (1, False, 2.0, 'aa'), (2, True, 3.0, 'aa'), (2, True, 5.0, 'aa'), (2, True, 10.0, 'aa')], ("id", "is_deleted", "v", "a")) @pandas_udf("id long, is_deleted boolean, v double, a string", PandasUDFType.GROUPED_MAP) def subtract_mean(pdf): v = pdf.v return pdf.assign(v=v - v.mean()) df.groupby("id").apply(subtract_mean).show() ----> works fine. @pandas_udf('double', PandasUDFType.SCALAR) def pandas_plus_one(v): return v + 1 df.withColumn('v2', pandas_plus_one(df.v)).show() --->* throw error* Error: Traceback (most recent call last): File "<input>", line 1, in <module> File "/Users/lianj/repo/historical-data-storage-clean/lib/python2.7/site-packages/pyspark/sql/dataframe.py", line 380, in show print(self._jdf.showString(n, 20, vertical)) File "/Users/lianj/repo/historical-data-storage-clean/lib/python2.7/site-packages/py4j/java_gateway.py", line 1257, in __call__ answer, self.gateway_client, self.target_id, self.name) File "/Users/lianj/repo/historical-data-storage-clean/lib/python2.7/site-packages/pyspark/sql/utils.py", line 79, in deco raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace) IllegalArgumentException: u'Unsupported class file major version 56' Any idea? Appreciate any help! https://stackoverflow.com/questions/53583199/pyspark-error-unsupported-class-file-major-version-55 does not help since I have already been using java 1.8. I am using: pyspark 2.4.4 pandas 0.23.4 numpy 1.14.5 pyarrow 0.10.0 Note that I got numpy, pyarrow, pandas versions from https://stackoverflow.com/questions/51713705/python-pandas-udf-spark-error which help to make UDAF subtract_mean work.