[
https://issues.apache.org/jira/browse/PIG-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Shravan Matthur Narayanamurthy updated PIG-427:
-----------------------------------------------
Status: Patch Available (was: Open)
The patch implements the following logic:
It checks whether a fit is possible and, if so, returns a score. The lower the
score, the better the fit.
A table of possible casts is maintained, and the table is ordered so as to
produce a sensible heuristic for the fit score. The principle behind the
heuristic is that it prefers fewer casts and, when the number of casts is the
same, prefers conversions to a smaller type, where the ordering of types is:
INTEGER, LONG, FLOAT, DOUBLE, CHARARRAY, TUPLE, BAG, MAP (from small to
big).
Once the best fit is determined, casts are introduced to suit that fit.
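As a rough illustration of the scoring described above (not the code in the
patch), the heuristic could be sketched as follows. The class name, the cast
table contents (widening numeric casts only), the -1 "no fit" value and the
weight of 1000 per cast are all assumptions made for this example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

public class BestFitSketch {
    // Types ordered from "small" to "big", as listed in the comment.
    enum Type { INTEGER, LONG, FLOAT, DOUBLE, CHARARRAY, TUPLE, BAG, MAP }

    // Hypothetical cast table: only widening numeric casts are allowed here.
    static final Map<Type, List<Type>> CASTS = new EnumMap<>(Type.class);
    static {
        CASTS.put(Type.INTEGER, Arrays.asList(Type.LONG, Type.FLOAT, Type.DOUBLE));
        CASTS.put(Type.LONG, Arrays.asList(Type.FLOAT, Type.DOUBLE));
        CASTS.put(Type.FLOAT, Arrays.asList(Type.DOUBLE));
    }

    // Returns -1 if no fit is possible, otherwise a score (lower is better).
    // The number of casts dominates; the "size" of the target types breaks ties.
    static int score(List<Type> input, List<Type> candidate) {
        if (input.size() != candidate.size()) {
            return -1;
        }
        int casts = 0;
        int size = 0;
        for (int i = 0; i < input.size(); i++) {
            Type from = input.get(i);
            Type to = candidate.get(i);
            if (from == to) {
                continue; // exact match, no cast needed
            }
            if (!CASTS.getOrDefault(from, Collections.<Type>emptyList()).contains(to)) {
                return -1; // this argument cannot be cast, so no fit
            }
            casts++;
            size += to.ordinal();
        }
        return casts * 1000 + size; // assumed weighting, for illustration only
    }

    // Picks the candidate with the lowest score, or null if nothing fits.
    static List<Type> bestFit(List<Type> input, List<List<Type>> candidates) {
        List<Type> best = null;
        int bestScore = Integer.MAX_VALUE;
        for (List<Type> candidate : candidates) {
            int s = score(input, candidate);
            if (s >= 0 && s < bestScore) {
                bestScore = s;
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<List<Type>> declared = Arrays.asList(
                Arrays.asList(Type.INTEGER),
                Arrays.asList(Type.DOUBLE));
        // A LONG input cannot be narrowed to INTEGER, so DOUBLE is chosen.
        System.out.println(bestFit(Arrays.asList(Type.LONG), declared)); // prints [DOUBLE]
    }
}
```

With this table, a Long argument offered to a UDF that declares Int and Double
resolves to Double, which matches the example in the issue description.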
However, if the schema contains a schema embedded as a Tuple or a Bag, the
bestFit function requires these schemas to match exactly. For example, if SUM
provides a mapping to BAG(integers) and BAG(floats), and we have BAG(longs) as
input, the best fit doesn't try to insert a cast here, because the nesting can
be arbitrary and finding the right project where the cast should be inserted is
a bit complicated.
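The exact-match rule for embedded schemas can be pictured with a small sketch
like the one below. It is only an illustration under assumptions: the Field
class, the castable rule and the fits method are hypothetical names, not the
types used in the patch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

public class NestedFitSketch {
    enum Type { INTEGER, LONG, FLOAT, DOUBLE, CHARARRAY, TUPLE, BAG, MAP }

    // Hypothetical field: a type plus, for TUPLE and BAG, an embedded schema.
    static final class Field {
        final Type type;
        final List<Field> inner; // empty for simple types

        Field(Type type, Field... inner) {
            this.type = type;
            this.inner = Arrays.asList(inner);
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Field)) {
                return false;
            }
            Field other = (Field) o;
            return type == other.type && inner.equals(other.inner);
        }

        @Override
        public int hashCode() {
            return Objects.hash(type, inner);
        }
    }

    // Assumed rule: only widening casts between simple numeric types.
    static boolean castable(Type from, Type to) {
        return from.ordinal() < to.ordinal() && to.ordinal() <= Type.DOUBLE.ordinal();
    }

    static boolean isNested(Type t) {
        return t == Type.TUPLE || t == Type.BAG;
    }

    // Simple types may fit through a cast; embedded schemas must match exactly.
    static boolean fits(Field input, Field declared) {
        if (isNested(input.type) || isNested(declared.type)) {
            return input.equals(declared);
        }
        return input.type == declared.type || castable(input.type, declared.type);
    }

    public static void main(String[] args) {
        Field bagOfLongs = new Field(Type.BAG, new Field(Type.LONG));
        Field bagOfFloats = new Field(Type.BAG, new Field(Type.FLOAT));
        // No cast is attempted inside the bag, so this is not a fit.
        System.out.println(fits(bagOfLongs, bagOfFloats)); // prints false
        // The same pair of simple types does fit, through a cast.
        System.out.println(fits(new Field(Type.LONG), new Field(Type.FLOAT))); // prints true
    }
}
```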
The patch also includes a test case which tests three scenarios for casting.
> casting parameters of a UDF
> ---------------------------
>
> Key: PIG-427
> URL: https://issues.apache.org/jira/browse/PIG-427
> Project: Pig
> Issue Type: Improvement
> Affects Versions: types_branch
> Reporter: Olga Natkovich
> Assignee: Shravan Matthur Narayanamurthy
> Fix For: types_branch
>
>
> Currently if we have a UDF that declares via getArgToFuncMapping that it can
> only handle a subset of types, passing any other types to the function would
> result in an error. However, some types can be promoted to others, and it
> would be useful if the typechecker performed a best-fit cast. For instance,
> if the input parameter has type Long and the UDF supports Int and Double, the
> code should cast the parameter to Double.
> This would be very useful for conversion of the UDFs from the piggybank to
> the new code.