The syntax looks right. Are you still getting the error when you open a new Python session and run this same code? Are you running Spark in local mode on your laptop, or on a YARN-based cluster? It does seem like something in your Python session isn't getting serialized correctly, but it doesn't look like it's related to this code snippet.
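Since the traceback dies in pickle.loads while the worker is deserializing the udf, one quick sanity check is to express the same transform with the built-in when/otherwise column functions, which never ship a pickled Python function to the executors. A minimal sketch, reusing the file and column names from your snippet: if this version runs but the udf version still fails in a fresh session, that points at the serializer/environment rather than your code.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder \
    .appName("udf-sanity-check") \
    .getOrCreate()

df = spark.read.csv("full201801.dat", header="true")

# Same transform as the udf, but built entirely from Column expressions,
# so nothing is pickled and sent to the Python workers.
df.select(
    df.PRODUCT_NC,
    F.when(df.PRODUCT_NC == '23040010', 'Non-Fat Dry Milk')
     .otherwise('foo')
     .alias('COMMODITY')
).show()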
On Thu, Apr 4, 2019 at 3:49 PM Adaryl Wakefield <adaryl.wakefi...@hotmail.com> wrote:

> Are we not supposed to be using udfs anymore? I copied an example straight
> from a book and I'm getting weird results, and I think it's because the
> book is using a much older version of Spark. The code below is pretty
> straightforward, but I'm getting an error nonetheless. I've been doing a
> bunch of googling and not getting many results.
>
> from pyspark.sql import SparkSession
> from pyspark.sql.functions import *
> from pyspark.sql.types import *
>
> spark = SparkSession \
>     .builder \
>     .appName("Python Spark SQL basic example") \
>     .getOrCreate()
>
> df = spark.read.csv("full201801.dat", header="true")
>
> columntransform = udf(lambda x: 'Non-Fat Dry Milk' if x == '23040010' else 'foo', StringType())
>
> df.select(df.PRODUCT_NC, columntransform(df.PRODUCT_NC).alias('COMMODITY')).show()
>
> Error:
>
> Py4JJavaError: An error occurred while calling o110.showString.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task
> 0 in stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage
> 2.0 (TID 2, localhost, executor driver):
> org.apache.spark.api.python.PythonException: Traceback (most recent call last):
>   File "c:\spark\python\lib\pyspark.zip\pyspark\worker.py", line 242, in main
>   File "c:\spark\python\lib\pyspark.zip\pyspark\worker.py", line 144, in read_udfs
>   File "c:\spark\python\lib\pyspark.zip\pyspark\worker.py", line 120, in read_single_udf
>   File "c:\spark\python\lib\pyspark.zip\pyspark\worker.py", line 60, in read_command
>   File "c:\spark\python\lib\pyspark.zip\pyspark\serializers.py", line 171, in _read_with_length
>     return self.loads(obj)
>   File "c:\spark\python\lib\pyspark.zip\pyspark\serializers.py", line 566, in loads
>     return pickle.loads(obj, encoding=encoding)
> TypeError: _fill_function() missing 4 required positional arguments: 'defaults', 'dict', 'module', and 'closure_values'
>
> B.