Thanks, Cheolsoo. The bigger problem right now is that JsonStorage() blows up when it tries to write .pig_schema, so I've had to fall back to PigStorage() and parse the output later. That makes the null:: field names a non-issue for now.

/a
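A minimal sketch of the "parse it later" workaround, assuming the relation was stored with PigStorage() and its default tab delimiter. The field list is copied from the DESCRIBE output quoted below; the names strip_prefix and parse_line are hypothetical helpers, not part of Pig.

```python
# Field names exactly as DESCRIBE reported them in the quoted message.
DESCRIBED_FIELDS = [
    "null::time", "null::from_call", "null::to_call", "null::digis",
    "null::gtype", "null::gate", "null::info", "null::firsthop",
]


def strip_prefix(name):
    """Drop any 'alias::' prefix and keep the bare field name."""
    return name.rsplit("::", 1)[-1]


# Bare names: time, from_call, to_call, ...
FIELDS = [strip_prefix(n) for n in DESCRIBED_FIELDS]


def parse_line(line, delimiter="\t"):
    """Map one line of PigStorage output to a dict keyed by bare field names.

    Assumes one record per line and no delimiter characters inside values.
    """
    values = line.rstrip("\n").split(delimiter)
    if len(values) != len(FIELDS):
        raise ValueError("expected %d fields, got %d" % (len(FIELDS), len(values)))
    return dict(zip(FIELDS, values))
```

Because the prefix is stripped on the reading side, the stored data can keep whatever names Pig assigned.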
On Sun, Jun 9, 2013 at 2:58 PM, Cheolsoo Park <[email protected]> wrote:

> Hi Alan,
>
> >> When I register this UDF an unexpected warning pops up which I'm going
> >> to ignore for now (unless someone says this is important):
>
> Yes, you can usually ignore them except ERROR messages. If these messages
> annoy you a lot, you can redirect stderr to a file (i.e. 2>errors.txt).
>
> >> The other strange thing is *null::* gets prepended to each field name.
> >> This is mostly annoying, and, in the case of JsonStorage(), clutters things
> >> unnecessarily. Is there a way to resolve this?
>
> The reason why "null::" is prepended is because a python udf returns a
> tuple, but the tuple is not given a name. So if you change the outputSchema
> of your udf to something like this:
>
> @outputSchema("t:( < field schemas here > )")
>
> You will see "t::" is prepended instead.
>
> You can also remove the prefix by adding another FOREACH and re-define
> names using AS clauses for every field. That is,
>
> aprs = FOREACH raw GENERATE FLATTEN(myudf.aprs(line));
> aprs_cleaned = FOREACH aprs GENERATE time AS time, from_call AS from_call,
> <and other fields>;
>
> This is somewhat annoying if there are a lot of fields like your example.
> In fact, there is a jira to add a built-in UDF that removes the prefixes:
> https://issues.apache.org/jira/browse/PIG-3088. I will probably rebase the
> patch and get it committed.
>
> Thanks,
> Cheolsoo
>
> On Sat, Jun 8, 2013 at 12:01 PM, Alan Crosswell <[email protected]> wrote:
>
> > Hello,
> >
> > I'm new to Pig and am having a few small problems that I'd appreciate some
> > help with. I'm using Pig-0.11.1 after 0.9.2 just plain didn't work right
> > with my Python UDF.
> >
> > I am using a Python UDF that has two functions with the following
> > outputSchema:
> >
> > @outputSchema("time:chararray,from_call:chararray,to_call:chararray,digis:chararray,gtype:chararray,gate:chararray,info:chararray,firsthop:chararray")
> > def aprs(l):
> >     ...
> >
> > and
> >
> > @outputSchema("latitude:double,longitude:double,ambiguity:double,course:double,speed:double")
> > def position(to_call,info):
> >     ...
> >
> > When I register this UDF an unexpected warning pops up which I'm going to
> > ignore for now (unless someone says this is important):
> >
> > grunt> *Register 's3n://n2ygk/aprspig.py' using jython as myudf;*
> > 2013-06-08 18:38:03,990 [main] INFO org.apache.hadoop.fs.s3native.NativeS3FileSystem - Opening 's3n://n2ygk/aprspig.py' for reading
> > 2013-06-08 18:38:04,118 [main] INFO org.apache.hadoop.util.NativeCodeLoader - Loaded the native-hadoop library
> > 2013-06-08 18:38:04,175 [main] INFO org.apache.pig.scripting.jython.JythonScriptEngine - created tmp python.cachedir=/tmp/pig_jython_6851471253258374122
> > 2013-06-08 18:38:08,576 [main] WARN org.apache.pig.scripting.jython.JythonScriptEngine - pig.cmd.args.remainders is empty. This is not expected unless on testing.
> > 2013-06-08 18:38:11,981 [main] INFO org.apache.pig.scripting.jython.JythonScriptEngine - Register scripting UDF: myudf.position
> > 2013-06-08 18:38:11,984 [main] INFO org.apache.pig.scripting.jython.JythonScriptEngine - Register scripting UDF: myudf.aprs
> >
> > The other strange thing is *null::* gets prepended to each field name. This
> > is mostly annoying, and, in the case of JsonStorage(), clutters
> > things unnecessarily. Is there a way to resolve this?
> >
> > grunt> *aprs = FOREACH raw GENERATE FLATTEN(myudf.aprs(line));*
> > 2013-06-08 01:06:37,324 [main] INFO org.apache.pig.scripting.jython.JythonFunction - Schema 'time:chararray,from_call:chararray,to_call:chararray,digis:chararray,gtype:chararray,gate:chararray,info:chararray,firsthop:chararray' defined for func aprs
> > grunt> *DESCRIBE aprs;*
> > aprs: {null::time: chararray,null::from_call: chararray,null::to_call: chararray,null::digis: chararray,null::gtype: chararray,null::gate: chararray,null::info: chararray,null::firsthop: chararray}
> >
> > Is my UDF being defined or invoked incorrectly to result in the null:: or
> > is this just a feature?
> >
> > This is just annoying but I'd appreciate any pointers on how to make it go
> > away.
> >
> > Thanks.
> > /a
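For reference, Cheolsoo's suggestion of naming the returned tuple could look roughly like the sketch below. The function name demo, its two-field schema, and its comma-splitting body are hypothetical stand-ins, since the real aprs() body is not shown in the thread. The fallback decorator is an assumption added only so the file can be imported outside Pig; under Pig's Jython engine the decorator is expected to be supplied already.

```python
# Under Pig's Jython engine, outputSchema should already be defined.
# This fallback is a stand-in for local testing, not Pig's implementation.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def wrap(func):
            func.outputSchema = schema  # record the schema string on the function
            return func
        return wrap


# The leading "t:(" gives the returned tuple the name "t".
@outputSchema("t:(time:chararray,from_call:chararray)")
def demo(line):
    # Hypothetical parsing: treat the input as "time,from_call".
    time, from_call = line.split(",", 1)
    return (time, from_call)
```

Per Cheolsoo's reply, with the tuple named this way DESCRIBE would be expected to show t::time and t::from_call instead of null::time and null::from_call, so the prefix changes but does not disappear.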
