Pradeep Kamath updated PIG-538:
Status: Patch Available (was: Open)
The patch has the following changes:
* support a null constant through the "null" keyword
* allow for picking the right overloaded UDF when the input schema has
bytearrays and there are many candidate UDFs to pick from
* fix for a bug in the matching function for picking the right overloaded
function when the UDF supports an input schema with complex types which have
null inner schemas
1) Support for null constants - For this there are changes in QueryParser.jjt
and DataType.java to represent null as a Constant of type bytearray. Besides
that, there are changes in TypeCheckingVisitor to handle implicit casting of
null constants (which are bytearrays) to the appropriate type in Binconds, AND,
OR, ==, !=. Arithmetic operators already cast bytearrays to doubles.
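
To illustrate the idea in 1), here is a rough sketch in Python (not the Java code in the patch). It assumes that "the appropriate type" for a null constant is the type of the other operand; the names Expr, NULL and align_null_operand are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass

BYTEARRAY = "bytearray"  # simplified type name, not Pig's DataType constant


@dataclass(frozen=True)
class Expr:
    """A typed expression; is_null_constant marks the literal null."""
    type: str
    is_null_constant: bool = False


# The null literal is modeled as a constant of type bytearray.
NULL = Expr(BYTEARRAY, is_null_constant=True)


def align_null_operand(lhs, rhs):
    """Return (lhs, rhs) with a null constant cast to the other operand's type.

    Assumption: this models a two-operand check such as the two branches of a
    bincond or the two sides of ==; the real visitor may differ.
    """
    if lhs.type == rhs.type:
        return lhs, rhs
    if lhs.is_null_constant:
        return Expr(rhs.type, True), rhs
    if rhs.is_null_constant:
        return lhs, Expr(lhs.type, True)
    raise TypeError(f"incompatible operand types: {lhs.type} and {rhs.type}")
```

With this, a bincond whose branches are null and a chararray expression ends up with both branches typed as chararray, while two unrelated non-null types are still rejected.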
2) The algorithm for picking the right overloaded UDF function is explained in
a comment in TypeCheckingVisitor.java pasted here for reference. The changes
follow the comment:
* Here is an explanation of the way the matching UDF funcspec will be chosen
* based on actual types in the input schema.
* First an "exact" match is tried for each of the fields in the input schema
* with the corresponding fields in the candidate funcspecs' schemas.
* If exact match fails, then first a check is made if the input schema has any
* bytearrays in it.
* If there are NO bytearrays in the input schema, then a best fit match is
* attempted for the different fields. Essentially a permissible cast from one
* type to another is given a "score" based on its position in the "castLookup"
* table. A score for a candidate funcspec is deduced from these cast scores.
* If no permissible casts are possible, the score for the candidate is -1. Among
* the non -1 score candidates, the candidate with the lowest score is chosen.
* If there are bytearrays in the input schema, a modified exact match is tried.
* In this matching, bytearrays in the input schema are not considered. As a
* result of ignoring the bytearrays, we could get multiple candidate funcspecs
* which match "exactly" for the other columns - if this is the case, we notify
* the user of the ambiguity and error out. Else if all other (non byte array)
* fields matched exactly, then we can cast bytearray(s) to the corresponding
* type(s) in the matched udf schema. If this modified exact match fails, the
* above best fit algorithm is attempted by initially coming up with scores and
* candidate funcSpecs (with bytearray(s) being ignored in the scoring process).
* Then a check is made to ensure that the positions which have bytearrays in the
* input schema have the same type (for a given position) in the corresponding
* positions in all the candidate funcSpecs. If this is not the case, it
* indicates a conflict and the user is notified of the error (because we have
* more than one choice for the destination type of the cast for the bytearray).
* Otherwise, the candidate with the lowest score is chosen.
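
The selection steps in the comment can be sketched roughly as follows, in Python rather than the Java of the patch. The contents of CAST_LOOKUP and the scoring rule (a plain sum of cast positions) are assumptions made for this illustration; the names pick_funcspec, cast_score and AmbiguousMatchError are hypothetical as well.

```python
BYTEARRAY = "bytearray"

# Hypothetical cast table: permissible targets per source type, best first.
CAST_LOOKUP = {
    "int": ["long", "float", "double"],
    "long": ["float", "double"],
    "float": ["double"],
}


class AmbiguousMatchError(Exception):
    """More than one candidate fits and there is no way to choose."""


def cast_score(input_schema, candidate, skip):
    """Sum cast positions over fields not in `skip`; -1 if a cast is not allowed.

    Assumption: the real score is computed differently; only "lower is
    better" and "-1 means no permissible cast" are taken from the comment.
    """
    if len(input_schema) != len(candidate):
        return -1
    total = 0
    for i, (src, dst) in enumerate(zip(input_schema, candidate)):
        if i in skip or src == dst:
            continue
        targets = CAST_LOOKUP.get(src, [])
        if dst not in targets:
            return -1
        total += targets.index(dst) + 1
    return total


def pick_funcspec(input_schema, candidates):
    """Return the chosen candidate schema (a tuple of types), or None."""
    input_schema = tuple(input_schema)
    candidates = [tuple(c) for c in candidates]
    # Step 1: exact match on every field.
    for cand in candidates:
        if cand == input_schema:
            return cand
    byte_pos = {i for i, t in enumerate(input_schema) if t == BYTEARRAY}
    if byte_pos:
        # Step 2: modified exact match, ignoring bytearray positions.
        exact = [c for c in candidates
                 if cast_score(input_schema, c, byte_pos) == 0]
        if len(exact) > 1:
            raise AmbiguousMatchError("several candidates match exactly")
        if exact:
            return exact[0]
    # Step 3: best fit, with bytearray positions left out of the scoring.
    scored = [(cast_score(input_schema, c, byte_pos), c) for c in candidates]
    scored = [(s, c) for s, c in scored if s != -1]
    if not scored:
        return None
    # Every bytearray position must have one single destination type.
    for i in byte_pos:
        if len({c[i] for _, c in scored}) > 1:
            raise AmbiguousMatchError(f"conflicting cast target at field {i}")
    return min(scored, key=lambda sc: sc[0])[1]
```

For example, an input of (int, int) against candidates (long, long) and (double, double) picks (long, long) under this assumed table, and an input of (bytearray, int) against (chararray, int) and (double, int) is reported as ambiguous.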
3) To allow the matching function to pick a UDF which supports a schema with a
complex type which has a null inner schema, the schema equality check used for
matching purposes is relaxed for inner schemas of complex types.
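
A rough sketch of what such a relaxed comparison could look like, again in Python and not the code in the patch. A field is modeled as a (type, inner_schema) pair. The sketch assumes that only a null inner schema on the UDF side acts as a wildcard; the actual relaxation in the patch may be broader. The names fields_match and schemas_match are hypothetical.

```python
COMPLEX_TYPES = {"bag", "tuple", "map"}


def fields_match(actual, expected):
    """Compare one input field against one field of a UDF's declared schema."""
    a_type, a_inner = actual
    e_type, e_inner = expected
    if a_type != e_type:
        return False
    if a_type not in COMPLEX_TYPES:
        return True
    # Relaxed rule (assumption): a null inner schema declared by the UDF
    # accepts any inner schema in the input.
    if e_inner is None:
        return True
    if a_inner is None:
        return False
    return schemas_match(a_inner, e_inner)


def schemas_match(actual, expected):
    """Compare two schemas field by field using the relaxed rule."""
    return len(actual) == len(expected) and all(
        fields_match(a, e) for a, e in zip(actual, expected)
    )
```

Under this rule a UDF declared as taking a bag with no inner schema matches an input bag of tuples of any shape, while a bag still never matches a tuple.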
> bincond can't work with flatten bags
> Key: PIG-538
> URL: https://issues.apache.org/jira/browse/PIG-538
> Project: Pig
> Issue Type: Bug
> Affects Versions: types_branch
> Reporter: Olga Natkovich
> Assignee: Pradeep Kamath
> Fix For: types_branch
> The following script is used with trunk code to simulate an outer join, which
> is not directly supported by pig:
> A = load '/studenttab10k' as (name: chararray, age: int, gpa: float);
> B = load 'votertab10k' as (name: chararray, age: int, registration:
> chararray, donation: float);
> C = cogroup A by name, B by name;
> D = foreach C generate group, (IsEmpty(A) ? '' : flatten(A)), (IsEmpty(B) ?
> 'null' : flatten(B));
> On the types branch this gives a syntax error, and even beyond that it is not
> supported since bincond requires that both expressions be of the same type.
> Santhosh suggested having a special NULL expression that matches any type.
> This seems to make sense.