Running `pig-0.17.0/bin/pig -x local`. Very basic UDF file:
```python
#!/usr/bin/python3
from pig_util import outputSchema

@outputSchema("as:int")
def square(num):
    if num is None:
        return None
    return num * num

@outputSchema("word:chararray")
def concat(word):
    return word + word
```

Exceedingly simple Pig script:

```
REGISTER '/home/scs/woodcock/SD411/lab_udf/test.py' USING org.apache.pig.scripting.streaming.Python.PythonScriptEngine AS myFuncs;
A = LOAD '/home/scs/woodcock/SD411/DATA/accident.csv' USING PigStorage(',') AS (state:int,name:chararray);
B = FOREACH A GENERATE myFuncs.square(state) AS state, name;
```

If I run `DUMP A;`, I get exactly what I expect. But `DUMP B;` produces a failed job:

```
java.lang.Exception: org.apache.pig.impl.streaming.StreamingUDFException: LINE :
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: org.apache.pig.impl.streaming.StreamingUDFException: LINE :
    at org.apache.pig.impl.builtin.StreamingUDF$ProcessErrorThread.run(StreamingUDF.java:506)
grunt> Exception in thread "Thread-82" java.lang.NullPointerException: Cannot invoke "java.util.concurrent.BlockingQueue.put(Object)" because the return value of "org.apache.pig.impl.builtin.StreamingUDF.access$500(org.apache.pig.impl.builtin.StreamingUDF)" is null
    at org.apache.pig.impl.builtin.StreamingUDF$ProcessOutputThread.run(StreamingUDF.java:471)
2024-10-29 13:02:15,296 [communication thread] INFO  org.apache.hadoop.mapred.LocalJobRunner - map > map
```
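To separate the UDF logic from Pig's streaming layer, the two functions can be exercised in plain Python. This is a minimal sketch under stated assumptions: `pig_util` is only importable when Pig launches the script, so `outputSchema` below is a hypothetical stand-in that just records the schema string, and `sample_lines` is made-up data in the `state,name` shape the `LOAD` statement expects, not the contents of `accident.csv`.

```python
# Hypothetical stand-in for pig_util.outputSchema: pig_util is only
# available inside Pig's streaming environment, so this decorator just
# records the schema string and returns the function unchanged.
def outputSchema(schema):
    def decorator(func):
        func.output_schema = schema
        return func
    return decorator


@outputSchema("as:int")
def square(num):
    if num is None:
        return None
    return num * num


@outputSchema("word:chararray")
def concat(word):
    return word + word


def to_int(field):
    # An empty or non-numeric field becomes None, roughly like Pig
    # yielding a null when a chararray cannot be cast to int.
    try:
        return int(field)
    except ValueError:
        return None


# Rough plain-Python analogue of the LOAD + FOREACH ... GENERATE above.
# The sample rows are hypothetical illustration data.
sample_lines = ["1,Alabama", "2,Alaska", ",Unknown"]
rows = [line.split(",") for line in sample_lines]
B = [(square(to_int(state)), name) for state, name in rows]
print(B)  # [(1, 'Alabama'), (4, 'Alaska'), (None, 'Unknown')]
```

The list comprehension mirrors `FOREACH A GENERATE myFuncs.square(state) AS state, name` row by row; it does not reproduce how Pig serializes tuples to and from the external Python process.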