Pradeep Kamath updated PIG-538:

    Status: Patch Available  (was: Open)

The patch has the following changes:
   * support a null constant through the "null" keyword
   * allow picking the right overloaded UDF when the input schema has 
bytearrays and there are several candidate UDFs to pick from
   * fix a bug in the matching function that picks the right overloaded 
UDF when the UDF supports an input schema with complex types which have null 
inner schemas

1) Support for null constants - For this there are changes in QueryParser.jjt 
and DataType.java to represent null as a Constant of type bytearray. Besides 
that, there are changes in TypeCheckingVisitor to handle implicit casting of 
null constants (which are bytearrays) to the appropriate type in Binconds, AND, 
OR, ==, !=. Arithmetic operators already cast bytearrays to doubles.
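
As a rough illustration of the implicit cast for a bincond, here is a minimal 
Java sketch. It is not the TypeCheckingVisitor code from the patch: the class 
NullCastSketch, the Expr node and the reconcile() helper are hypothetical names 
made up for this example, and the rule shown (a null constant takes the type of 
the other branch) is only my reading of the description above.

```java
public class NullCastSketch {
    enum Type { BYTEARRAY, INTEGER, DOUBLE, CHARARRAY }

    // Hypothetical expression node: its type, whether it is the "null"
    // constant, and a printable form.
    static final class Expr {
        final Type type;
        final boolean isNullConstant;
        final String text;
        Expr(Type type, boolean isNullConstant, String text) {
            this.type = type;
            this.isNullConstant = isNullConstant;
            this.text = text;
        }
    }

    // Wrap an expression in a cast to the target type.
    static Expr cast(Expr e, Type target) {
        return new Expr(target, e.isNullConstant,
                "(" + target.name().toLowerCase() + ")" + e.text);
    }

    // Reconcile the two branches of a bincond. A null constant (typed as
    // bytearray) is cast to the type of the other branch; any other type
    // mismatch is still an error.
    static Expr[] reconcile(Expr lhs, Expr rhs) {
        if (lhs.type == rhs.type) return new Expr[] { lhs, rhs };
        if (lhs.isNullConstant) return new Expr[] { cast(lhs, rhs.type), rhs };
        if (rhs.isNullConstant) return new Expr[] { lhs, cast(rhs, lhs.type) };
        throw new IllegalArgumentException(
                "bincond branches differ: " + lhs.type + " vs " + rhs.type);
    }

    public static void main(String[] args) {
        Expr age = new Expr(Type.INTEGER, false, "age");
        Expr nul = new Expr(Type.BYTEARRAY, true, "null");
        Expr[] out = reconcile(age, nul);
        System.out.println(out[0].text + " : " + out[1].text); // age : (integer)null
    }
}
```

The same reconciliation would apply to the operands of ==, !=, AND and OR under 
this reading.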

2) The algorithm for picking the right overloaded UDF function is explained in 
a comment in TypeCheckingVisitor.java pasted here for reference. The changes 
follow the comment:
         * Here is an explanation of the way the matching UDF funcspec will be
         * chosen based on actual types in the input schema.
         * First an "exact" match is tried for each of the fields in the input
         * schema with the corresponding fields in the candidate funcspecs'
         * schemas.
         * If the exact match fails, a check is first made to see whether the
         * input schema has any bytearrays in it.
         * If there are NO bytearrays in the input schema, then a best fit
         * match is attempted for the different fields. Essentially, a
         * permissible cast from one type to another is given a "score" based
         * on its position in the "castLookup" table. A final score for a
         * candidate funcspec is deduced as
         *               SUM(score_of_particular_cast*noOfCastsSoFar).
         * If no permissible casts are possible, the score for the candidate is
         * -1. Among the non -1 score candidates, the candidate with the lowest
         * score is chosen.
         *
         * If there are bytearrays in the input schema, a modified exact match
         * is tried. In this matching, bytearrays in the input schema are not
         * considered. As a result of ignoring the bytearrays, we could get
         * multiple candidate funcspecs which match "exactly" for the other
         * columns - if this is the case, we notify the user of the ambiguity
         * and error out. Else if all other (non byte array) fields matched
         * exactly, then we can cast bytearray(s) to the corresponding type(s)
         * in the matched udf schema. If this modified exact match fails, the
         * above best fit algorithm is attempted by initially coming up with
         * scores and candidate funcSpecs (with bytearray(s) being ignored in
         * the scoring process). Then a check is made to ensure that the
         * positions which have bytearrays in the input schema have the same
         * type (for a given position) in the corresponding positions in all
         * the candidate funcSpecs. If this is not the case, it indicates a
         * conflict and the user is notified of the error (because we have more
         * than one choice for the destination type of the cast for the
         * bytearray). If this is the case, the candidate with the lowest score
         * is chosen.
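
The steps in the comment can be sketched in Java roughly as follows. This is an 
illustration only and not the code in TypeCheckingVisitor.java: the class 
UdfMatchSketch, the T enum and the NUMERIC list (a stand-in for the 
"castLookup" table, whose real contents are not shown here) are assumptions 
made for the example. The sketch also assumes that noOfCastsSoFar counts the 
current cast, and that ties in the best fit go to the first candidate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class UdfMatchSketch {
    enum T { BYTEARRAY, INT, LONG, FLOAT, DOUBLE, CHARARRAY }

    // Assumed stand-in for castLookup: permitted widening order.
    static final List<T> NUMERIC = Arrays.asList(T.INT, T.LONG, T.FLOAT, T.DOUBLE);

    // 0 if the types are equal, distance in NUMERIC if the cast is a
    // permitted widening, -1 if no cast is permitted.
    static int castScore(T from, T to) {
        if (from == to) return 0;
        int i = NUMERIC.indexOf(from), j = NUMERIC.indexOf(to);
        return (i >= 0 && j > i) ? j - i : -1;
    }

    // SUM(score_of_particular_cast * noOfCastsSoFar), or -1 if some field
    // cannot be cast. Bytearray input fields are ignored in the scoring.
    static int score(List<T> in, List<T> cand) {
        if (in.size() != cand.size()) return -1;
        int total = 0, casts = 0;
        for (int k = 0; k < in.size(); k++) {
            if (in.get(k) == T.BYTEARRAY) continue;
            int s = castScore(in.get(k), cand.get(k));
            if (s < 0) return -1;
            if (s > 0) total += s * ++casts;
        }
        return total;
    }

    static List<T> pick(List<T> in, List<List<T>> cands) {
        // Step 1: exact match on every field.
        for (List<T> c : cands) if (c.equals(in)) return c;

        // Step 2: modified exact match, only when the input has bytearrays.
        // A score of 0 means every non-bytearray field matched exactly.
        if (in.contains(T.BYTEARRAY)) {
            List<List<T>> exact = new ArrayList<>();
            for (List<T> c : cands) if (score(in, c) == 0) exact.add(c);
            if (exact.size() > 1) throw new IllegalStateException("ambiguous: " + exact);
            if (exact.size() == 1) return exact.get(0);
        }

        // Step 3: best fit, lowest non-negative score wins.
        List<T> best = null;
        int bestScore = Integer.MAX_VALUE;
        List<List<T>> viable = new ArrayList<>();
        for (List<T> c : cands) {
            int s = score(in, c);
            if (s < 0) continue;
            viable.add(c);
            if (s < bestScore) { bestScore = s; best = c; }
        }
        if (best == null) throw new IllegalStateException("no matching funcspec");

        // Step 4: every bytearray position must have a single target type
        // across all viable candidates, otherwise the cast is ambiguous.
        for (int k = 0; k < in.size(); k++) {
            if (in.get(k) != T.BYTEARRAY) continue;
            for (List<T> c : viable)
                if (c.get(k) != best.get(k))
                    throw new IllegalStateException("conflicting cast target at position " + k);
        }
        return best;
    }

    public static void main(String[] args) {
        List<T> in = Arrays.asList(T.INT, T.BYTEARRAY);
        List<List<T>> cands = Arrays.asList(
                Arrays.asList(T.LONG, T.CHARARRAY),
                Arrays.asList(T.DOUBLE, T.CHARARRAY));
        System.out.println(pick(in, cands)); // [LONG, CHARARRAY]
    }
}
```

In the example, neither candidate matches exactly even with the bytearray 
ignored, so the best fit runs: (long, chararray) scores 1 and (double, 
chararray) scores 3, both agree on chararray for the bytearray position, and 
the lower score is chosen.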

3) To allow the matching function to pick a UDF which supports a schema with a 
complex type whose inner schema is null, the schema equality check used for 
matching is relaxed for the inner schemas of complex types.
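
A relaxed comparison of this kind might look like the following Java sketch. 
The class RelaxedSchemaSketch, the Field type and the matches() method are 
hypothetical names for illustration, not Pig's Schema API, and treating a null 
inner schema on either side as a match is an assumption about what "relaxed" 
means here.

```java
import java.util.Arrays;
import java.util.List;

public class RelaxedSchemaSketch {
    enum Kind { INT, CHARARRAY, TUPLE, BAG }

    // Hypothetical field: a type plus an inner schema, which is null when
    // it is unknown or when the type is not complex.
    static final class Field {
        final Kind kind;
        final List<Field> inner;
        Field(Kind kind, List<Field> inner) {
            this.kind = kind;
            this.inner = inner;
        }
        static Field of(Kind kind, Field... inner) {
            return new Field(kind, inner.length == 0 ? null : Arrays.asList(inner));
        }
    }

    // Relaxed equality: the types must agree, but inner schemas are only
    // compared when both sides actually declare one.
    static boolean matches(Field input, Field udf) {
        if (input.kind != udf.kind) return false;
        if (input.inner == null || udf.inner == null) return true;
        if (input.inner.size() != udf.inner.size()) return false;
        for (int i = 0; i < input.inner.size(); i++) {
            if (!matches(input.inner.get(i), udf.inner.get(i))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Field input = Field.of(Kind.BAG, Field.of(Kind.TUPLE, Field.of(Kind.INT)));
        Field udfAnyBag = Field.of(Kind.BAG); // null inner schema
        System.out.println(matches(input, udfAnyBag)); // true
    }
}
```

With strict equality the bag of (int) tuples would not match a UDF declared on 
a bag with no inner schema; with the relaxed check it does.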

> bincond can't work with flatten bags
> ------------------------------------
>                 Key: PIG-538
>                 URL: https://issues.apache.org/jira/browse/PIG-538
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: types_branch
>            Reporter: Olga Natkovich
>            Assignee: Pradeep Kamath
>             Fix For: types_branch
> The following script is used with trunk code to simulate an outer join, 
> which is not directly supported by Pig:
> A = load '/studenttab10k' as (name: chararray, age: int, gpa: float);
> B = load 'votertab10k' as (name: chararray, age: int, registration: chararray, donation: float);
> C = cogroup A by name, B by name;
> D = foreach C generate group, (IsEmpty(A) ? '' : flatten(A)), (IsEmpty(B) ? 'null' : flatten(B));
> On the types branch this gives a syntax error, and even beyond that it is 
> not supported, since bincond requires that both expressions be of the same 
> type. Santhosh suggested having a special NULL expression that matches any 
> type. This seems to make sense.
