Hi Prashant,

The Pig wiki says: "If a UDF returns a tuple or a bag and schema information is not provided, Pig assumes that the tuple contains a single field of type bytearray. If this is not the case, then not specifying the schema can cause failures." It also gives some examples for this case. The solution seems to be to override the outputSchema method in your EvalFunc and declare the schema of the returned tuple there. Pig then knows the fields are chararrays (or whatever types they really are) instead of assuming bytearrays. I found this at the following link: http://wiki.apache.org/pig/UDFManual
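Here is a minimal sketch of that approach. It assumes a hypothetical UDF named `MyTupleUdf` that takes three arguments and returns them as a tuple of three chararrays. The class name, the field names p, q and r, the tuple alias "result" and the placeholder exec() logic are my own assumptions for illustration. The outputSchema() override is the part that matters. The class needs the Pig jar on the classpath to compile.

```java
import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.DataType;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;
import org.apache.pig.impl.logicalLayer.FrontendException;
import org.apache.pig.impl.logicalLayer.schema.Schema;

/**
 * Hypothetical UDF returning a tuple of three chararray fields.
 * exec() is placeholder logic; the point is that outputSchema()
 * tells Pig the real field types instead of letting it assume bytearray.
 */
public class MyTupleUdf extends EvalFunc<Tuple> {
    private static final TupleFactory TUPLE_FACTORY = TupleFactory.getInstance();

    @Override
    public Tuple exec(Tuple input) throws IOException {
        if (input == null || input.size() < 3) {
            return null;
        }
        Tuple out = TUPLE_FACTORY.newTuple(3);
        for (int i = 0; i < 3; i++) {
            Object field = input.get(i);
            // Emit java.lang.String values so the data really are chararrays,
            // matching the types declared in outputSchema() below.
            out.set(i, field == null ? null : field.toString());
        }
        return out;
    }

    @Override
    public Schema outputSchema(Schema input) {
        try {
            // Inner schema: the three fields of the returned tuple.
            Schema inner = new Schema();
            inner.add(new Schema.FieldSchema("p", DataType.CHARARRAY));
            inner.add(new Schema.FieldSchema("q", DataType.CHARARRAY));
            inner.add(new Schema.FieldSchema("r", DataType.CHARARRAY));
            // Outer schema: a single tuple-typed field wrapping the inner schema.
            return new Schema(new Schema.FieldSchema("result", inner, DataType.TUPLE));
        } catch (FrontendException e) {
            // The (alias, schema, type) FieldSchema constructor declares FrontendException.
            throw new RuntimeException(e);
        }
    }
}
```

With the schema declared, FLATTEN in your script should produce chararray fields. The `(chararray)p` cast then becomes a no-op, and `FILTER B BY p matches 'pig'` should no longer hit the cast error. Note that exec() has to return objects matching the declared types. Declaring chararray while actually returning DataByteArray objects would most likely just move the failure elsewhere.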
Hope it helps.

Best,
Xuting

2011/11/17 pRaShAnT <[email protected]>
> As per Alan F. Gates in "Programming Pig":
>
> *Pig does not know whether integer values in baseball are stored as ASCII
> strings, Java serialized values, binary coded decimal, or some other
> format. So it asks the load function. It is the responsibility of the load
> function to cast bytearrays to other types. In general this works nicely,
> but it does lead to a few corner cases where Pig does not know how to cast
> a bytearray. In particular, if a UDF returns a bytearray Pig will not know
> how to perform casts on it, because that bytearray is not generated by a
> load function.*
>
> I have a UDF that does exactly this: it returns a Tuple of bytearrays, and I am
> unable to cast them to other types. *How can I get around this?*
>
> For example, UDF(A..Z) returns a tuple (bytearray, bytearray, ..., bytearray).
>
> A = load 'input' using PigStorage();
> B = FOREACH A GENERATE FLATTEN(UDF('arg1', 'arg2', 'arg3')) as (p,q,r);
>
> But if I try
>
> -- FAILS WITH CAST EXCEPTION: bytearray cannot be cast to chararray
> C = FOREACH B GENERATE (chararray)p as p;
>
> OR IF I TRY
>
> -- FAILS WITH CAST EXCEPTION: bytearray cannot be cast to chararray
> C = FILTER B BY p matches 'pig';
>
> Thanks,
> Prashant
