Hmmm,

1) So, I tried backing up to Pig 0.16. No joy.
2) Then, I checked that Java UDFs worked, and yes, that was good.
3) So, then I backed up to Pig 0.13, as I'm pretty sure this had worked in the past at that version. Lo and behold, it did actually work.

So, it seems that if you want to write Python UDFs (the language my students actually know...), you will need to use an older version of Pig. To be fair, I haven't checked 0.14 or 0.15, but 0.16 onward doesn't appear to be a great plan.

hth,
mew

On Wed, Oct 30, 2024 at 4:07 PM Mark Woodcock <woodc...@usna.edu> wrote:

> pig-0.17.0bin/pig -x local
>
> very basic UDF file:
>
> #!/usr/bin/python3
>
> from pig_util import outputSchema
>
> @outputSchema("as:int")
> def square(num):
>     if num == None:
>         return None
>     return ((num) * (num))
>
> @outputSchema("word:chararray")
> def concat(word):
>     return word + word
>
> Exceedingly simple pig script:
>
> REGISTER '/home/scs/woodcock/SD411/lab_udf/test.py' USING
> org.apache.pig.scripting.streaming.Python.PythonScriptEngine AS myFuncs;
>
> --N.B. I've also tried jython and streaming_python for the USING clause.
>
> A = LOAD '/home/scs/woodcock/SD411/DATA/accident.csv' USING
> PigStorage(',') AS (state:int,name:chararray);
>
> B = FOREACH A GENERATE myFuncs.square(state) AS state, name;
>
> If I do a "DUMP A" I get exactly what I would expect.
>
> But, on a "DUMP B", I get a failed job:
>
> java.lang.Exception: org.apache.pig.impl.streaming.StreamingUDFException: LINE :
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
> Caused by: org.apache.pig.impl.streaming.StreamingUDFException: LINE :
>     at org.apache.pig.impl.builtin.StreamingUDF$ProcessErrorThread.run(StreamingUDF.java:506)
>
> grunt> Exception in thread "Thread-82" java.lang.NullPointerException: Cannot invoke "java.util.concurrent.BlockingQueue.put(Object)" because the return value of "org.apache.pig.impl.builtin.StreamingUDF.access$500(org.apache.pig.impl.builtin.StreamingUDF)" is null
>     at org.apache.pig.impl.builtin.StreamingUDF$ProcessOutputThread.run(StreamingUDF.java:471)
> 2024-10-29 13:02:15,296 [communication thread] INFO org.apache.hadoop.mapred.LocalJobRunner - map > map
>
> ?
>
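
For anyone following along who wants to rule out the Python side before blaming the Pig version: the two UDFs can be exercised on their own, outside Pig. What follows is only a minimal sketch, not a fix. It assumes `pig_util` is not importable outside a Pig run, so it falls back to a no-op stand-in for `outputSchema`; that stand-in and the `is None` comparison are my additions and are not in the original test.py.

```python
#!/usr/bin/python3
# Standalone check of the UDF logic, runnable without Pig.

try:
    # Inside a Pig run this import is expected to succeed.
    from pig_util import outputSchema
except ImportError:
    # Assumption: outside Pig, a stand-in decorator that only records the
    # schema string on the function is enough for a local test.
    def outputSchema(schema):
        def decorator(func):
            func.outputSchema = schema
            return func
        return decorator


@outputSchema("as:int")
def square(num):
    # Pass nulls through unchanged, as the original UDF does.
    if num is None:
        return None
    return num * num


@outputSchema("word:chararray")
def concat(word):
    return word + word


if __name__ == "__main__":
    assert square(3) == 9
    assert square(None) is None
    assert concat("ab") == "abab"
    print("UDF logic OK")
```

If this passes under python3, the functions themselves are fine, and the failure is more likely in how Pig launches or talks to the Python process than in the UDF code.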