[ 
https://issues.apache.org/jira/browse/PIG-63?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-63:
------------------------------

    Attachment: utf8_v4.patch

I have made two more changes to the code to make the end-to-end tests pass:

(1) Made sure that we write UTF-8 in the store function.
(2) Made sure that we properly maintain the position in readLine.
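
For readers without the patch at hand, the two changes can be illustrated roughly as follows. This is a minimal sketch, not the code in utf8_v4.patch: the class Utf8LineSketch and its methods writeLine, readLine, and getPosition are hypothetical names, and it assumes '\n' as the only line terminator and a modern JDK (StandardCharsets). Splitting on the raw byte '\n' is safe because no byte of a multi-byte UTF-8 sequence can equal 0x0A.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not the actual PigStorage code.
class Utf8LineSketch {

    // Number of bytes consumed from the stream so far, line terminators included.
    private long position = 0;

    // Change (1): encode explicitly as UTF-8 rather than with the platform default charset.
    static void writeLine(OutputStream out, String line) throws IOException {
        out.write(line.getBytes(StandardCharsets.UTF_8));
        out.write('\n');
    }

    // Change (2): read raw bytes up to '\n', count every byte consumed,
    // and only then decode the collected bytes as UTF-8.
    // Returns null once the stream is exhausted.
    String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        boolean sawAny = false;
        int b;
        while ((b = in.read()) != -1) {
            position++;
            sawAny = true;
            if (b == '\n') {
                break;
            }
            buf.write(b);
        }
        if (!sawAny) {
            return null;
        }
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }

    long getPosition() {
        return position;
    }
}
```

Counting bytes instead of decoded characters is the point of the sketch: a position derived from character counts drifts as soon as a line contains a multi-byte character.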

We still need a code review and performance numbers before this can be committed.

> PigStorage does not properly handle UTF8 data
> ---------------------------------------------
>
>                 Key: PIG-63
>                 URL: https://issues.apache.org/jira/browse/PIG-63
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Olga Natkovich
>            Assignee: Benjamin Reed
>         Attachments: utf8.patch, utf8.patch, utf8.patch, utf8_v4.patch
>
>
> From Ben:
> I just checked the code and the problem seems to be PigStorage. getNext() uses
> readLine(), which does not handle UTF8 correctly. putNext() also uses the default
> encoder rather than UTF8 explicitly.
> Internally and in BinStorage UTF8 appears to be handled correctly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
