Cheolsoo Park created PIG-4227:
----------------------------------

             Summary: Streaming Python UDF handles bag outputs incorrectly
                 Key: PIG-4227
                 URL: https://issues.apache.org/jira/browse/PIG-4227
             Project: Pig
          Issue Type: Bug
            Reporter: Cheolsoo Park
            Assignee: Cheolsoo Park
             Fix For: 0.15.0


I have a UDF that generates different outputs when run as Jython and as 
streaming Python.
{code:title=jython}
{([[BBC Worldwide]])}
{code} 
{code:title=streaming python}
{(BC Worldwid)}
{code}
The problem is that streaming Python encodes a bag output incorrectly. For this 
particular example, it serializes the output string as follows:
{code}
|{_[[BBC Worldwide]]|}_
{code}
where '|' and '_' wrap the bag delimiters '{' and '}', i.e. '{' => '|{_' and 
'}' => '|}_'.

But this is wrong because a bag must contain tuples, not chararrays, i.e. the 
correct encoding is as follows:
{code}
|{_|(_[[BBC Worldwide]]|)_|}_
{code}
where '|' and '_' wrap tuple delimiters '(' and ')' as well as bag delimiters.
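
The difference between the two encodings can be sketched in a few lines of Python. This is only an illustration of the wrapping scheme described above, not the actual Pig streaming code; the helper names (wrap, encode_bag_without_tuple, encode_bag_with_tuple) are hypothetical, and only a bag holding a single one-field tuple is handled.

```python
def wrap(delim):
    """Wrap a structural delimiter as '|' + delim + '_' (scheme taken from the report)."""
    return "|" + delim + "_"


def encode_bag_without_tuple(value):
    # Encoding reported as incorrect: the chararray sits directly inside the bag.
    return wrap("{") + value + wrap("}")


def encode_bag_with_tuple(value):
    # Encoding reported as correct: the chararray is wrapped in a tuple first.
    return wrap("{") + wrap("(") + value + wrap(")") + wrap("}")


value = "[[BBC Worldwide]]"
print(encode_bag_without_tuple(value))  # |{_[[BBC Worldwide]]|}_
print(encode_bag_with_tuple(value))     # |{_|(_[[BBC Worldwide]]|)_|}_
```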

This results in truncated outputs.
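
A note on the truncation: the streaming output above equals the original string with three characters removed from each end, which is the length of a wrapped delimiter such as '|(_'. The sketch below only checks that arithmetic. That the reader strips an assumed tuple wrapper from the bag content is a guess consistent with the outputs shown, not something confirmed in this report.

```python
value = "[[BBC Worldwide]]"

# Assumption: the reader expects a wrapped tuple delimiter on each side
# of the bag content and drops that many characters unconditionally.
wrapped_tuple_open = "|(_"
n = len(wrapped_tuple_open)

truncated = value[n:-n]
print(truncated)  # BC Worldwid
```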



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
