[ 
https://issues.apache.org/jira/browse/PIG-354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12619722#action_12619722
 ] 

Alan Gates commented on PIG-354:
--------------------------------

I don't think we want to convert data to chararray by default on input to 
UDFs, for several reasons:

1 It's expensive.
2 It mangles any data that isn't UTF-8.
3 It is a fair amount of work for users to provide type-specific 
implementations of their UDFs, so I suspect most won't.
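
To illustrate point 2, here is a minimal Java sketch of what happens when 
bytes that are not valid UTF-8 are pushed through a string and back.  The 
class and method names are hypothetical and this is not Pig's conversion 
code; it only assumes that a bytearray to chararray conversion decodes the 
bytes as UTF-8.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; not part of Pig.
class Utf8RoundTrip {
    // Decode the bytes as UTF-8 text, then encode that text back to bytes.
    static byte[] roundTrip(byte[] raw) {
        String asText = new String(raw, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    // True if the bytes come back unchanged after the round trip.
    static boolean survives(byte[] raw) {
        return Arrays.equals(raw, roundTrip(raw));
    }

    public static void main(String[] args) {
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(survives(utf8));   // true
        System.out.println(survives(latin1)); // false: 0xE9 becomes U+FFFD
    }
}
```

The Latin-1 input loses its last byte to the replacement character, so a UDF 
that expected the original bytes would never see them.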

By contrast, on the outbound side I agree that chararray is the right default, 
for two reasons:

1 It's very easy to determine what type the UDF is returning, either from a 
declared schema or by Pig reflecting on the return type.  Only when the UDF 
gives no schema and its return type is tuple or bag (so we have no idea what 
is inside that tuple or bag) will we be forcing data to strings.

2 In general Pig does not assume any particular representation of data in byte 
arrays.  That's why we make the load function provide casts.  So if we treated 
this unknown data from UDFs as byte arrays, we'd have no idea how to convert 
it to anything else.  Conversions from strings, on the other hand, are well 
understood.
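
Point 2 can be sketched in a few lines of Java.  The names below are 
hypothetical and this is not Pig code; it only shows that the same byte array 
gives different values depending on the representation you assume, while a 
string has one obvious conversion.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration only; not part of Pig.
class ByteArrayAmbiguity {
    // Interpret the first four bytes as a big-endian int.
    static int asBigEndianInt(byte[] raw) {
        return ByteBuffer.wrap(raw).order(ByteOrder.BIG_ENDIAN).getInt();
    }

    // Interpret the same four bytes as a little-endian int.
    static int asLittleEndianInt(byte[] raw) {
        return ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    // A string has a single well-understood conversion to int.
    static int fromText(String text) {
        return Integer.parseInt(text.trim());
    }

    public static void main(String[] args) {
        byte[] raw = {0, 0, 0, 42};
        System.out.println(asBigEndianInt(raw));    // 42
        System.out.println(asLittleEndianInt(raw)); // 704643072
        System.out.println(fromText("42"));         // 42
    }
}
```

Without the load function's casts there is nothing to say which of the first 
two readings is the intended one.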

> Change to default outputSchema for UDFs
> ---------------------------------------
>
>                 Key: PIG-354
>                 URL: https://issues.apache.org/jira/browse/PIG-354
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: types_branch
>            Reporter: Olga Natkovich
>            Priority: Critical
>             Fix For: types_branch
>
>
> Currently, if the UDF writer does not specify outputSchema, the default is 
> bytearray, which is not what you would want most of the time. Making 
> chararray the default would make things backward compatible.

