[ https://issues.apache.org/jira/browse/PIG-538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Pradeep Kamath updated PIG-538:
-------------------------------

    Status: Patch Available  (was: Open)

The patch has the following changes:
* support a null constant through the "null" keyword
* allow for picking the right overloaded UDF function when the input schema has bytearrays and there are many candidate UDFs to pick from
* fix a bug in the matching function for picking the right overloaded function when the UDF supports an input schema with complex types which have null inner schemas

1) Support for null constants - For this there are changes in QueryParser.jjt and DataType.java to represent null as a Constant of type bytearray. Besides that, there are changes in TypeCheckingVisitor to handle implicit casting of null constants (which are bytearrays) to the appropriate type in Binconds, AND, OR, ==, !=. Arithmetic operators already cast bytearrays to doubles.

2) The algorithm for picking the right overloaded UDF function is explained in a comment in TypeCheckingVisitor.java, pasted here for reference. The changes follow the comment:
{noformat}
/**
 * Here is an explanation of the way the matching UDF funcspec will be chosen
 * based on actual types in the input schema.
 * First an "exact" match is tried for each of the fields in the input schema
 * with the corresponding fields in the candidate funcspecs' schemas.
 *
 * If exact match fails, then a check is first made whether the input schema has any
 * bytearrays in it.
 *
 * If there are NO bytearrays in the input schema, then a best fit match is attempted
 * for the different fields. Essentially, a permissible cast from one type to another
 * is given a "score" based on its position in the "castLookup" table. A final
 * score for a candidate funcspec is deduced as
 * SUM(score_of_particular_cast*noOfCastsSoFar).
 * If no permissible casts are possible, the score for the candidate is -1. Among
 * the non -1 score candidates, the candidate with the lowest score is chosen.
 *
 * If there are bytearrays in the input schema, a modified exact match is tried. In this
 * matching, bytearrays in the input schema are not considered. As a result of
 * ignoring the bytearrays, we could get multiple candidate funcspecs which match
 * "exactly" for the other columns - if this is the case, we notify the user of
 * the ambiguity and error out. Else if all other (non bytearray) fields
 * matched exactly, then we can cast bytearray(s) to the corresponding type(s)
 * in the matched udf schema. If this modified exact match fails, the above best fit
 * algorithm is attempted by initially coming up with scores and candidate funcSpecs
 * (with bytearray(s) being ignored in the scoring process). Then a check is
 * made to ensure that the positions which have bytearrays in the input schema
 * have the same type (for a given position) in the corresponding positions in
 * all the candidate funcSpecs. If this is not the case, it indicates a conflict
 * and the user is notified of the error (because we have more than
 * one choice for the destination type of the cast for the bytearray). If the types
 * do agree, the candidate with the lowest score is chosen.
 */
{noformat}

3) To allow the matching function to pick a UDF which supports a schema with a complex type which has a null inner schema, the schema equality for matching purposes is relaxed for inner schemas of complex types.
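
For readers who want to see the best fit scoring from the comment above in concrete form, here is a minimal Java sketch. It is not the code in TypeCheckingVisitor.java. The class name BestFitSketch, the methods score and bestFit, the use of plain strings for types, and the contents of CAST_LOOKUP are all hypothetical. It also assumes one reading of SUM(score_of_particular_cast*noOfCastsSoFar): the cast score is the 1-based position of the target type in the lookup list, and noOfCastsSoFar counts the current cast as well. The bytearray handling is left out.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the "best fit" step only; not Pig's actual implementation.
class BestFitSketch {
    // Assumed stand-in for the "castLookup" table: for each source type,
    // the permissible target types in order of preference.
    static final Map<String, List<String>> CAST_LOOKUP = new HashMap<>();
    static {
        CAST_LOOKUP.put("int", Arrays.asList("long", "float", "double"));
        CAST_LOOKUP.put("long", Arrays.asList("float", "double"));
        CAST_LOOKUP.put("float", Arrays.asList("double"));
    }

    // Score of fitting the input schema to one candidate schema,
    // or -1 if some field has no permissible cast.
    static long score(List<String> input, List<String> candidate) {
        if (input.size() != candidate.size()) {
            return -1;
        }
        long total = 0;
        int castsSoFar = 0;
        for (int i = 0; i < input.size(); i++) {
            String from = input.get(i);
            String to = candidate.get(i);
            if (from.equals(to)) {
                continue; // same type, no cast needed, contributes nothing
            }
            List<String> targets = CAST_LOOKUP.get(from);
            int pos = (targets == null) ? -1 : targets.indexOf(to);
            if (pos < 0) {
                return -1; // no permissible cast for this field
            }
            castsSoFar++;
            total += (long) (pos + 1) * castsSoFar;
        }
        return total;
    }

    // Index of the candidate with the lowest non-negative score, or -1 if
    // no candidate is reachable. Ties go to the earliest candidate here.
    static int bestFit(List<String> input, List<List<String>> candidates) {
        int best = -1;
        long bestScore = Long.MAX_VALUE;
        for (int i = 0; i < candidates.size(); i++) {
            long s = score(input, candidates.get(i));
            if (s >= 0 && s < bestScore) {
                bestScore = s;
                best = i;
            }
        }
        return best;
    }
}
```

Under these assumptions, an input schema of (int, float) prefers a candidate taking (long, double) over one taking (double, double), and a candidate taking (chararray, double) is rejected because int has no permissible cast to chararray in the assumed table.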
> bincond can't work with flatten bags
> ------------------------------------
>
>                 Key: PIG-538
>                 URL: https://issues.apache.org/jira/browse/PIG-538
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: types_branch
>            Reporter: Olga Natkovich
>            Assignee: Pradeep Kamath
>             Fix For: types_branch
>
>
> The following script is used with trunk code to simulate an outer join,
> which is not directly supported by Pig:
> A = load '/studenttab10k' as (name: chararray, age: int, gpa: float);
> B = load 'votertab10k' as (name: chararray, age: int, registration: chararray, donation: float);
> C = cogroup A by name, B by name;
> D = foreach C generate group, (IsEmpty(A) ? '' : flatten(A)), (IsEmpty(B) ? 'null' : flatten(B));
> On the types branch this gives a syntax error and, even beyond that, it is
> not supported, since bincond requires that both expressions be of the same
> type. Santhosh suggested having a special NULL expression that matches any
> type. This seems to make sense.