Hi Prashant,

   The Pig wiki says: "If a UDF returns a tuple or a bag and schema
information is not provided, Pig assumes that the tuple contains a single
field of type bytearray. If this is not the case, then not specifying the
schema can cause failures." It also gives some examples of this case. The
solution seems to be to override outputSchema() in your EvalFunc and
declare the schema of the returned tuple there. I found this at the
following link:
http://wiki.apache.org/pig/UDFManual
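
   To make that concrete, here is a rough sketch of what such a UDF might
look like. The class name ThreeStrings and the field names p, q, r are only
placeholders, and I am assuming your UDF can return its three fields as
Java Strings (chararray) rather than raw bytearrays, so adjust it to your
real logic:

```java
import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.DataType;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;
import org.apache.pig.impl.logicalLayer.FrontendException;
import org.apache.pig.impl.logicalLayer.schema.Schema;

// Hypothetical UDF: returns a tuple of three chararray fields and tells
// Pig so through outputSchema(), so no bytearray cast is needed later.
public class ThreeStrings extends EvalFunc<Tuple> {

    @Override
    public Tuple exec(Tuple input) throws IOException {
        if (input == null || input.size() < 3) {
            return null;
        }
        Tuple out = TupleFactory.getInstance().newTuple(3);
        for (int i = 0; i < 3; i++) {
            Object value = input.get(i);
            // Store real Strings so the data matches the declared chararray type.
            out.set(i, value == null ? null : value.toString());
        }
        return out;
    }

    @Override
    public Schema outputSchema(Schema input) {
        try {
            // Schema of the fields inside the returned tuple.
            Schema fields = new Schema();
            fields.add(new Schema.FieldSchema("p", DataType.CHARARRAY));
            fields.add(new Schema.FieldSchema("q", DataType.CHARARRAY));
            fields.add(new Schema.FieldSchema("r", DataType.CHARARRAY));
            // Wrap the fields in a single tuple-typed field ("t" is an arbitrary alias).
            return new Schema(new Schema.FieldSchema("t", fields, DataType.TUPLE));
        } catch (FrontendException e) {
            throw new RuntimeException(e);
        }
    }
}
```

   With the schema declared this way, Pig should treat p, q and r as
chararray after the FLATTEN, so the explicit cast and the matches filter
should no longer have to go through a bytearray cast. I have not tested
this against your script.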

   Hope this helps.
Best,
Xuting

2011/11/17 pRaShAnT <[email protected]>

> As per Alan F Gates in "Programming Pig" :
>
> *Pig does not know whether integer values in baseball are stored as ASCII
> strings, Java serialized values, binary coded decimal, or some other
> format. So it asks the load function. It is the responsibility of the load
> function to cast bytearrays to other types. In general this works nicely,
> but it does lead to a few corner cases where Pig does not know how to cast
> a bytearray. In particular, if a UDF returns a bytearray Pig will not know
> how to perform casts on it, because that bytearray is not generated by a
> load function.*
>
> I have a UDF that does exactly this, return a Tuple of bytearrays and I am
> unable to cast them to other types. *How can I get around this?
> *
> For eg, UDF(A..Z) returns a tuple (bytearray, bytearray...bytearray).
>
> A = load 'input' using PigStorage();
> B = FOREACH A GENERATE FLATTEN(UDF('arg1', 'arg2', 'arg3')) as (p,q,r);
>
> But if I try
>
> C = FOREACH B GENERATE (chararray)p as p; //FAILS OUT WITH CAST EXCEPTION -
> bytearray cannot be cast to chararray
>
> OR IF I TRY
>
> C = FILTER B BY p matches "pig"; //FAILS OUT WITH CAST EXCEPTION-
> bytearray cannot be cast to chararray
>
> Thanks,
> Prashant
>
