[ https://issues.apache.org/jira/browse/PIG-504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alan Gates reassigned PIG-504:
------------------------------

    Assignee: Shubham Chopra

> Illustrate and Dump do not seem to work correctly for files containing utf8
> ---------------------------------------------------------------------------
>
>                 Key: PIG-504
>                 URL: https://issues.apache.org/jira/browse/PIG-504
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.2.0
>         Environment: Hadoop 18
>            Reporter: Viraj Bhat
>            Assignee: Shubham Chopra
>             Fix For: 0.2.0
>
>         Attachments: 504.patch, utf8.txt
>
>
> The following snippet, run on the latest types branch (utf8.txt attached):
> {code}
> A = load 'utf8.txt' using PigStorage() as (t1: chararray);
> illustrate A;
> {code}
> produces this output:
> -------------------------------
> | A     | t1: bytearray cn: 1 |
> -------------------------------
> |       | gabriella??         |
> -------------------------------
> Three observations:
> 1) text should be chararray, not bytearray.
> 2) cn: 1 should be removed from the display.
> 3) The value for text, "username??", is not displayed properly.
> Now replacing illustrate with dump:
> {code}
> A = load 'utf8.txt' using PigStorage() as (t1: chararray);
> dump A;
> {code}
> (david?)
> (rachel?)
> (jessica?)
> (sarah?)
> (katie?)
> (wendy?)
> (david?)
> (priscilla?)
> (oscar?)
> (xavier?)
> ..some more.
> The utf8 characters after the username are not displayed correctly; they are
> replaced by ?.
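
The report describes the symptom (non-ASCII characters shown as ?) but does not state the cause. As a minimal sketch of one common way this symptom arises in Java code, and not a diagnosis of PIG-504 or a description of Pig's internals, the class below encodes a string with a charset that cannot represent one of its characters. The class name `QuestionMarkDemo`, the helper `roundTrip`, and the sample value `"david\u00e9"` are hypothetical and are not taken from utf8.txt.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Pig.
public class QuestionMarkDemo {

    // Encode the text with the given charset, then decode the bytes as UTF-8.
    // String.getBytes(Charset) replaces characters the charset cannot
    // represent with that charset's replacement byte, which is '?' for US-ASCII.
    static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Assumed sample value: a user name followed by one non-ASCII character.
        String name = "david\u00e9";

        // Lossy path: the non-ASCII character becomes '?'.
        System.out.println(roundTrip(name, StandardCharsets.US_ASCII));

        // Lossless path: the same charset is used to encode and decode.
        System.out.println(roundTrip(name, StandardCharsets.UTF_8).equals(name));
    }
}
```

If something similar is happening here, the substitution would occur wherever bytes are produced with a charset other than UTF-8. The issue text alone does not show which step that would be.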