[ https://issues.apache.org/jira/browse/PIG-538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Pradeep Kamath updated PIG-538:
-------------------------------
Status: Patch Available (was: Open)
The patch has the following changes:
* support a null constant through the "null" keyword
* allow picking the right overloaded UDF function when the input schema has bytearrays and there are many candidate UDFs to pick from
* fix a bug in the matching function that picks the right overloaded function when the UDF supports an input schema with complex types that have null inner schemas
1) Support for null constants: QueryParser.jjt and DataType.java are changed to represent null as a constant of type bytearray. In addition, TypeCheckingVisitor is changed to handle implicit casting of null constants (which are bytearrays) to the appropriate type in bincond, AND, OR, == and !=. Arithmetic operators already cast bytearrays to doubles.
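A minimal sketch of the casting rule in (1), assuming a simplified type model. The class `NullConstantCasting` and its members `Expr`, `unify`, `forBooleanOp` and `nullLiteral` are hypothetical illustrations, not code from the patch. The parser turns `null` into a bytearray constant; a bincond branch or an ==/!= operand that is a lone null constant takes the other side's type, and an AND/OR operand that is a null constant becomes boolean. Arithmetic is left out since, per the description above, bytearrays are already cast to doubles there.

```java
/**
 * Hypothetical sketch (not Pig's TypeCheckingVisitor): a "null" constant is
 * typed as bytearray and implicitly cast to the type its operator requires.
 */
public class NullConstantCasting {
    // Simplified stand-in for a few Pig types (assumed, not DataType itself).
    enum Type { BYTEARRAY, BOOLEAN, INTEGER, DOUBLE, CHARARRAY }

    /** A typed expression; nullConstant marks the parser's "null" keyword. */
    record Expr(Type type, boolean nullConstant) {
        Expr castTo(Type t) {
            return new Expr(t, nullConstant);
        }
    }

    /** Parser side: the "null" keyword becomes a constant of type bytearray. */
    static Expr nullLiteral() {
        return new Expr(Type.BYTEARRAY, true);
    }

    /** Bincond branches and ==/!= operands: a lone null constant takes the other side's type. */
    static Expr[] unify(Expr lhs, Expr rhs) {
        if (lhs.nullConstant() && !rhs.nullConstant()) {
            lhs = lhs.castTo(rhs.type());
        } else if (rhs.nullConstant() && !lhs.nullConstant()) {
            rhs = rhs.castTo(lhs.type());
        }
        // Bincond still requires both sides to end up with the same type.
        if (lhs.type() != rhs.type()) {
            throw new IllegalArgumentException(
                "operands must have the same type: " + lhs.type() + " vs " + rhs.type());
        }
        return new Expr[] {lhs, rhs};
    }

    /** AND/OR operands: a null constant is cast to boolean; anything else must already be boolean. */
    static Expr forBooleanOp(Expr e) {
        if (e.nullConstant()) {
            return e.castTo(Type.BOOLEAN);
        }
        if (e.type() != Type.BOOLEAN) {
            throw new IllegalArgumentException("AND/OR needs boolean, got " + e.type());
        }
        return e;
    }
}
```

Under this rule, `cond ? null : x` with a chararray `x` type-checks as chararray, while two non-null branches of different types are still rejected.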
2) The algorithm for picking the right overloaded UDF function is explained in a comment in TypeCheckingVisitor.java, pasted here for reference. The changes follow the comment:
{noformat}
/**
 * Here is an explanation of the way the matching UDF funcspec will be chosen
 * based on actual types in the input schema.
 * First an "exact" match is tried for each of the fields in the input schema
 * with the corresponding fields in the candidate funcspecs' schemas.
 *
 * If exact match fails, then a check is first made whether the input schema
 * has any bytearrays in it.
 *
 * If there are NO bytearrays in the input schema, then a best fit match is
 * attempted for the different fields. Essentially, a permissible cast from one
 * type to another is given a "score" based on its position in the "castLookup"
 * table. A final score for a candidate funcspec is deduced as
 * SUM(score_of_particular_cast*noOfCastsSoFar).
 * If no permissible casts are possible, the score for the candidate is -1.
 * Among the non -1 score candidates, the candidate with the lowest score is
 * chosen.
 *
 * If there are bytearrays in the input schema, a modified exact match is tried.
 * In this matching, bytearrays in the input schema are not considered. As a
 * result of ignoring the bytearrays, we could get multiple candidate funcspecs
 * which match "exactly" for the other columns; if this is the case, we notify
 * the user of the ambiguity and error out. Else, if all other (non-bytearray)
 * fields matched exactly, then we can cast the bytearray(s) to the
 * corresponding type(s) in the matched UDF schema. If this modified exact match
 * fails, the above best fit algorithm is attempted by initially coming up with
 * scores and candidate funcSpecs (with bytearray(s) being ignored in the
 * scoring process). Then a check is made to ensure that the positions which
 * have bytearrays in the input schema have the same type (for a given position)
 * in the corresponding positions in all the candidate funcSpecs. If this is not
 * the case, it indicates a conflict and the user is notified of the error
 * (because we have more than one choice for the destination type of the cast
 * for the bytearray). If the types do agree, the candidate with the lowest
 * score is chosen.
 */
{noformat}
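The matching rules in the comment above can be sketched as follows. This is an illustration under stated assumptions, not the code in TypeCheckingVisitor: schemas are flattened to lists of field types, `UdfMatcher`, `score`, `exactIgnoringBytearrays`, `choose` and the `CAST_LOOKUP` contents are hypothetical, and a cast's score is assumed to be its 1-based position in the lookup list. Errors (no match, ambiguity, conflicting bytearray casts) are reported as `IllegalStateException`.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the funcspec matching rules; schemas are flat lists of types. */
public class UdfMatcher {
    enum Type { BYTEARRAY, INTEGER, LONG, FLOAT, DOUBLE, CHARARRAY }

    // Assumed, simplified castLookup: each type maps to the wider types it may be cast to.
    static final Map<Type, List<Type>> CAST_LOOKUP = new EnumMap<>(Type.class);
    static {
        CAST_LOOKUP.put(Type.INTEGER, List.of(Type.LONG, Type.FLOAT, Type.DOUBLE));
        CAST_LOOKUP.put(Type.LONG, List.of(Type.FLOAT, Type.DOUBLE));
        CAST_LOOKUP.put(Type.FLOAT, List.of(Type.DOUBLE));
    }

    /** SUM(score_of_particular_cast * noOfCastsSoFar), or -1 if some cast is not permissible. */
    static long score(List<Type> input, List<Type> candidate, boolean ignoreBytearrays) {
        if (input.size() != candidate.size()) return -1;
        long total = 0;
        int castsSoFar = 0;
        for (int i = 0; i < input.size(); i++) {
            Type in = input.get(i);
            Type target = candidate.get(i);
            if (in == target || (ignoreBytearrays && in == Type.BYTEARRAY)) continue;
            List<Type> allowed = CAST_LOOKUP.get(in);
            if (allowed == null || !allowed.contains(target)) return -1;
            castsSoFar++;
            // Assumption: a cast's score is its 1-based position in the lookup list.
            total += (long) (allowed.indexOf(target) + 1) * castsSoFar;
        }
        return total;
    }

    /** "Modified exact" match: every non-bytearray field must match exactly. */
    static boolean exactIgnoringBytearrays(List<Type> input, List<Type> candidate) {
        if (input.size() != candidate.size()) return false;
        for (int i = 0; i < input.size(); i++) {
            if (input.get(i) != Type.BYTEARRAY && input.get(i) != candidate.get(i)) return false;
        }
        return true;
    }

    /** Picks a candidate schema, or throws on no match, ambiguity, or a bytearray cast conflict. */
    static List<Type> choose(List<Type> input, List<List<Type>> candidates) {
        for (List<Type> c : candidates) {
            if (c.equals(input)) return c; // plain exact match
        }
        boolean hasBytearray = input.contains(Type.BYTEARRAY);
        if (hasBytearray) {
            List<List<Type>> exact = new ArrayList<>();
            for (List<Type> c : candidates) {
                if (exactIgnoringBytearrays(input, c)) exact.add(c);
            }
            if (exact.size() > 1) throw new IllegalStateException("ambiguous: " + exact);
            if (exact.size() == 1) return exact.get(0);
        }
        // Best fit; bytearray positions are ignored in scoring when present.
        List<List<Type>> fits = new ArrayList<>();
        List<Type> best = null;
        long bestScore = Long.MAX_VALUE;
        for (List<Type> c : candidates) {
            long s = score(input, c, hasBytearray);
            if (s < 0) continue;
            fits.add(c);
            if (s < bestScore) {
                bestScore = s;
                best = c;
            }
        }
        if (best == null) throw new IllegalStateException("no matching funcspec");
        if (hasBytearray) {
            // All fitting candidates must agree on the cast target at each bytearray position.
            for (int i = 0; i < input.size(); i++) {
                if (input.get(i) != Type.BYTEARRAY) continue;
                for (List<Type> c : fits) {
                    if (c.get(i) != best.get(i)) {
                        throw new IllegalStateException("conflicting casts for bytearray at position " + i);
                    }
                }
            }
        }
        return best;
    }
}
```

With the assumed table, an (int, int) input prefers the candidate (int, float), which needs one cast scoring 2, over (long, long), which scores 1*1 + 1*2 = 3. The noOfCastsSoFar multiplier therefore penalizes candidates that need many casts.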
3) To allow the matching function to pick a UDF that supports a schema with a complex type whose inner schema is null, schema equality for matching purposes is relaxed for inner schemas of complex types.
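A rough sketch of what relaxed inner-schema equality could look like. The exact relaxation in the patch is not spelled out here, so this assumes one reading: top-level fields must match in count and type, while a null inner schema on either side matches any inner schema. `RelaxedSchemaMatch`, `Field`, `matches` and `innerMatches` are hypothetical names, not Pig's `Schema` API.

```java
import java.util.List;

/** Hypothetical sketch, not Pig's Schema.equals: inner schemas are compared leniently. */
public class RelaxedSchemaMatch {
    enum Type { INTEGER, CHARARRAY, TUPLE, BAG, MAP }

    /** A field with an optional inner schema (null when the inner schema is unknown). */
    record Field(Type type, List<Field> inner) {}

    /** Top-level fields must line up by count and type; inner schemas use the relaxed rule. */
    static boolean matches(List<Field> input, List<Field> udf) {
        if (input.size() != udf.size()) return false;
        for (int i = 0; i < input.size(); i++) {
            Field a = input.get(i);
            Field b = udf.get(i);
            if (a.type() != b.type()) return false;
            if (!innerMatches(a.inner(), b.inner())) return false;
        }
        return true;
    }

    /** Assumed relaxation: a null inner schema on either side matches anything. */
    static boolean innerMatches(List<Field> a, List<Field> b) {
        if (a == null || b == null) return true;
        return matches(a, b);
    }
}
```

Under this reading, a UDF declared as taking a bag with no inner schema still matches an input bag of ints, while a bag input never matches a map parameter.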
> bincond can't work with flatten bags
> ------------------------------------
>
> Key: PIG-538
> URL: https://issues.apache.org/jira/browse/PIG-538
> Project: Pig
> Issue Type: Bug
> Affects Versions: types_branch
> Reporter: Olga Natkovich
> Assignee: Pradeep Kamath
> Fix For: types_branch
>
>
> The following script is used with trunk code to simulate an outer join, which is not
> directly supported by pig:
> A = load '/studenttab10k' as (name: chararray, age: int, gpa: float);
> B = load 'votertab10k' as (name: chararray, age: int, registration: chararray, donation: float);
> C = cogroup A by name, B by name;
> D = foreach C generate group, (IsEmpty(A) ? '' : flatten(A)), (IsEmpty(B) ? 'null' : flatten(B));
> On the types branch this gives a syntax error, and even beyond that it is not supported,
> since bincond requires that both expressions be of the same type. Santhosh
> suggested having a special NULL expression that matches any type. This seems
> to make sense.
--
This message is automatically generated by JIRA.