[
https://issues.apache.org/jira/browse/PIG-312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alan Gates updated PIG-312:
---------------------------
Attachment: intcast.patch
Attached is a fix for this issue. It differs a bit from what I proposed in the
initial posting. Anything that cannot be cast to the requested numeric type is
still returned as a null rather than 0. After further thought, this seems like
the better course: a 0 implies we managed to cast the data, whereas a null
signals that we did not know what to do with it.
The patch does fix the issue of casting double values to ints and longs. The
casts now first try to cast to int (or long); if that fails, they cast to a
double and then cast that to an int (or long), checking to make sure there
isn't an overflow.
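
As a rough illustration of the cast order described above (this is not the code
in intcast.patch), a minimal Java sketch might look like the following. The
class and method names are hypothetical, the byte array is assumed to hold the
number as UTF-8 text, and returning null on overflow is an assumption; the
patch may handle that case differently. The long case would follow the same
pattern.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the int cast order: try int first, then fall back
// to double and narrow it, returning null when no cast is possible.
class ByteArrayCastSketch {

    static Integer bytesToInteger(byte[] bytes) {
        if (bytes == null) {
            return null; // a null input stays null
        }
        String text = new String(bytes, StandardCharsets.UTF_8).trim();

        // Step 1: try a direct parse as an int.
        try {
            return Integer.valueOf(text);
        } catch (NumberFormatException e) {
            // Not a plain integer; fall through to the double path.
        }

        // Step 2: parse as a double, then narrow to int with an overflow check.
        try {
            double d = Double.parseDouble(text);
            if (Double.isNaN(d) || d > Integer.MAX_VALUE || d < Integer.MIN_VALUE) {
                return null; // assumption: out-of-range values become null
            }
            return (int) d; // truncates toward zero
        } catch (NumberFormatException e) {
            return null; // not numeric at all: null rather than 0
        }
    }

    public static void main(String[] args) {
        String[] samples = {"42", "3.7", "abc", "1e20"};
        for (String s : samples) {
            byte[] raw = s.getBytes(StandardCharsets.UTF_8);
            System.out.println(s + " -> " + bytesToInteger(raw));
        }
    }
}
```

With this sketch, "3.7" is cast to 3, while "abc" and "1e20" both come back as
null, which matches the null-rather-than-0 behavior described above.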
> Casting a byte array that contains a double value to an int results in a null
> pointer
> -------------------------------------------------------------------------------------
>
> Key: PIG-312
> URL: https://issues.apache.org/jira/browse/PIG-312
> Project: Pig
> Issue Type: Bug
> Affects Versions: types_branch
> Reporter: Alan Gates
> Assignee: Alan Gates
> Fix For: types_branch
>
> Attachments: intcast.patch
>
>
> {code}
> a = load 'myfile' as (name, age, gpa);
>
> c = foreach a generate age * 10, (int)gpa * 2;
>
>
> store c into 'outfile';
> {code}
> The values in gpa are doubles. The issue is that they are read as byte
> arrays, and when the user tries to cast them to an int, the system does a
> direct cast from byte array to int, which results in a null. First of all,
> it should result in a zero, not a null (unless the underlying value is null).
> Second, we have to clarify the semantics here. gpa was never officially
> declared to be a double, so a direct cast from bytearray to int is a
> reasonable thing to do. But users may not see it that way. Do we want to
> always cast numbers to double first and then to the requested type to avoid
> this? Or should we force users to write this as (int)(double)gpa * 2 so we
> know to cast first to double and then to int? In the interest of speed
> (especially considering the rarity of doubles in most data) I'd vote for the
> latter.