Cheolsoo Park created PIG-4227:
----------------------------------

             Summary: Streaming Python UDF handles bag outputs incorrectly
                 Key: PIG-4227
                 URL: https://issues.apache.org/jira/browse/PIG-4227
             Project: Pig
          Issue Type: Bug
            Reporter: Cheolsoo Park
            Assignee: Cheolsoo Park
             Fix For: 0.15.0

I have a UDF that generates different outputs when run as Jython and as streaming Python.

{code:title=jython}
{([[BBC Worldwide]])}
{code}

{code:title=streaming python}
{(BC Worldwid)}
{code}

The problem is that streaming Python encodes a bag output incorrectly. For this particular example, it serializes the output string as follows:

{code}
|{_[[BBC Worldwide]]|}_
{code}

where '|' and '_' wrap the bag delimiters '{' and '}', i.e. '{' => '|{_' and '}' => '|}_'. This is wrong because a bag must contain tuples, not chararrays. The correct encoding is as follows:

{code}
|{_|(_[[BBC Worldwide]]|)_|}_
{code}

where '|' and '_' wrap the tuple delimiters '(' and ')' as well as the bag delimiters. The missing tuple delimiters result in truncated outputs.
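
The difference between the two encodings can be sketched in a few lines of Python. This is only an illustration of the wrapping scheme as described in this report ('|' and '_' around each delimiter). The function names are hypothetical and are not taken from Pig's source, and only the single-element bag from the example is handled.

```python
# Wrapping scheme as described in the report: '|' + delimiter + '_'.
# All names below are hypothetical and exist only for illustration.
PRE_WRAP = "|"
POST_WRAP = "_"


def wrap(delimiter):
    """Wrap a structural delimiter, e.g. '{' -> '|{_'."""
    return PRE_WRAP + delimiter + POST_WRAP


def encode_bag_as_reported(value):
    """Encoding seen in the bug: the chararray sits directly inside the bag."""
    return wrap("{") + value + wrap("}")


def encode_bag_as_expected(value):
    """Encoding the report calls correct: the chararray is wrapped in a tuple."""
    return wrap("{") + wrap("(") + value + wrap(")") + wrap("}")


value = "[[BBC Worldwide]]"
reported = encode_bag_as_reported(value)   # |{_[[BBC Worldwide]]|}_
expected = encode_bag_as_expected(value)   # |{_|(_[[BBC Worldwide]]|)_|}_

# Observation only (an inference from the two outputs, not stated in the report):
# dropping as many characters from each end as a wrapped tuple delimiter occupies
# reproduces the truncated streaming output.
delimiter_width = len(wrap("("))           # 3
truncated = value[delimiter_width:-delimiter_width]  # BC Worldwid
```

The last two lines are consistent with a reader that assumes every bag element starts and ends with a wrapped tuple delimiter, but the report itself does not confirm that this is the mechanism behind the truncation.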