[ 
https://issues.apache.org/jira/browse/PIG-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-1595:
-------------------------------

    Description: 
If a load function that does not follow the same bytearray format as PigStorage 
for the other supported datatypes, or that does not implement the LoadCaster 
interface, is used in 'casting relation to scalar' (PIG-1434), the query can 
fail or produce incorrect results.

The root cause is that there is a real dependency between the ReadScalars udf 
that returns the scalar value and the LogicalOperator that acts as its input, 
but the logical plan does not capture this dependency. As a result, the 
SchemaResetter visitor used by the optimizer does not take it into account in 
the order in which schemas are reset and evaluated. If the schema of the input 
LogicalOperator is not evaluated before the ReadScalars udf, the result type of 
the ReadScalars udf becomes bytearray. POUserFunc then converts the input to 
bytearray using 'new DataByteArray(inp.toString().getBytes())'. This bytearray 
encoding of the other supported types might not be the same as the one used by 
the LoadFunction associated with the column, and that can cause problems.
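
To illustrate the encoding mismatch described above, here is a minimal 
standalone sketch that does not use any Pig classes. It assumes a hypothetical 
loader that stores an int as 4 big-endian bytes; the class and method names 
(ScalarEncodingSketch, textEncode, binaryEncodeInt, binaryDecodeInt) are made 
up for illustration, and UTF-8 is pinned where the snippet in the description 
uses the platform default charset. It only shows why bytes produced by 
toString().getBytes() may be rejected or misread by a caster that expects a 
different format.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ScalarEncodingSketch {

    // Text-style encoding, mirroring inp.toString().getBytes() from the
    // description. UTF-8 is pinned here (an assumption) so the result does
    // not depend on the platform default charset.
    public static byte[] textEncode(Object value) {
        return value.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Hypothetical binary loader format: an int stored as 4 big-endian bytes.
    public static byte[] binaryEncodeInt(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    // Hypothetical caster for that binary format, standing in for a loader's
    // bytes-to-int conversion.
    public static int binaryDecodeInt(byte[] bytes) {
        if (bytes.length != 4) {
            throw new IllegalArgumentException(
                "expected 4 bytes, got " + bytes.length);
        }
        return ByteBuffer.wrap(bytes).getInt();
    }

    public static void main(String[] args) {
        // Case 1: the text bytes have the wrong length, so the caster fails.
        byte[] shortText = textEncode(42);
        System.out.println("text bytes for 42: " + Arrays.toString(shortText));
        try {
            binaryDecodeInt(shortText);
        } catch (IllegalArgumentException e) {
            System.out.println("caster failed: " + e.getMessage());
        }

        // Case 2: the text bytes happen to be 4 bytes long, so the caster
        // succeeds but returns a value different from the original.
        int scalar = 1000;
        byte[] asText = textEncode(scalar);
        byte[] asBinary = binaryEncodeInt(scalar);
        System.out.println("text bytes:   " + Arrays.toString(asText));
        System.out.println("binary bytes: " + Arrays.toString(asBinary));
        System.out.println("caster on binary bytes: " + binaryDecodeInt(asBinary));
        System.out.println("caster on text bytes:   " + binaryDecodeInt(asText));
    }
}
```

The two cases correspond to the two symptoms in the description: the first 
makes the conversion fail, the second silently yields a wrong value.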



  was:
If load functions that don't follow the same bytearray format as PigStorage for 
other supported datatypes, or those that don't implement the LoadCaster 
interface are used in 'casting relation to scalar' (PIG-1434), it can cause the 
query to fail or create incorrect results.

(I will add an example and elaborate further).



> casting relation to scalar- problem with handling of data from non PigStorage 
> loaders
> -------------------------------------------------------------------------------------
>
>                 Key: PIG-1595
>                 URL: https://issues.apache.org/jira/browse/PIG-1595
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Thejas M Nair
>            Assignee: Thejas M Nair
>             Fix For: 0.8.0
>

