[ 
https://issues.apache.org/jira/browse/PIG-771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703219#action_12703219
 ] 

David Ciemiewicz commented on PIG-771:
--------------------------------------

I'm just using the Mac OS terminal to connect through a RHEL-4 gateway server 
to a RHEL-4 grid.

To rule out the terminal, I changed the STORE statement to use the PigDump() 
storage format and reran the script.  The byte dump below shows that Pig itself 
is writing the question marks ('?', 0x3f).

{code}
-bash-3.00$ cat ch2.pig
A = load 'ch.txt' using PigStorage() as (str: chararray);
store A into 'ch.dmp' using PigDump();

-bash-3.00$ hadoop fs -cat ch.dmp/*
(????)

-bash-3.00$ hadoop fs -cat ch.dmp/* | od -xc
0000000 3f28 3f3f 293f 000a
          (   ?   ?   ?   ?   )  \n  \0
0000007
{code}
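
For comparison, here is a minimal Java sketch of one common way such 0x3f bytes can arise: encoding a string with a charset that cannot represent its characters. Whether this is what happens inside PigDump is an assumption, not something established above. The class name QuestionMarkDemo, the helper hex(), and the four-character sample string are hypothetical; the actual contents of ch.txt are not shown in this report.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Pig.
final class QuestionMarkDemo {

    // Render bytes as space-separated two-digit hex, similar in spirit to od.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumed sample tuple text: four Chinese characters in parentheses,
        // written with Unicode escapes so the source file encoding is irrelevant.
        String tuple = "(\u4e2d\u6587\u5b57\u7b26)\n";

        // String.getBytes(Charset) replaces every unmappable character with the
        // charset's replacement byte, which is '?' (0x3f) for US-ASCII.
        System.out.println(hex(tuple.getBytes(StandardCharsets.US_ASCII)));
        // prints "28 3f 3f 3f 3f 29 0a"

        // Encoding as UTF-8 keeps the characters (three bytes each here).
        System.out.println(hex(tuple.getBytes(StandardCharsets.UTF_8)));
    }
}
```

The US-ASCII line matches the byte pattern in the od output above, which is consistent with the characters being lost when the string is encoded for output, but it does not by itself identify where in Pig that happens.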

> PigDump does not properly output Chinese UTF8 characters - they are displayed 
> as question marks ??
> --------------------------------------------------------------------------------------------------
>
>                 Key: PIG-771
>                 URL: https://issues.apache.org/jira/browse/PIG-771
>             Project: Pig
>          Issue Type: Bug
>            Reporter: David Ciemiewicz
>
> PigDump does not properly output Chinese UTF8 characters.
> The reason for this is that the function Tuple.toString() is called.
> DefaultTuple implements Tuple.toString() and it calls Object.toString() on 
> the opaque object d.
> I think that the code should be changed to call the new 
> DataType.toString() function instead.
> {code}
>     @Override
>     public String toString() {
>         StringBuilder sb = new StringBuilder();
>         sb.append('(');
>         for (Iterator<Object> it = mFields.iterator(); it.hasNext();) {
>             Object d = it.next();
>             if(d != null) {
>                 if(d instanceof Map) {
>                     sb.append(DataType.mapToString((Map<Object, Object>)d));
>                 } else {
>                     sb.append(DataType.toString(d));  // <<< Change this one line
>                     if(d instanceof Long) {
>                         sb.append("L");
>                     } else if(d instanceof Float) {
>                         sb.append("F");
>                     }
>                 }
>             } else {
>                 sb.append("");
>             }
>             if (it.hasNext())
>                 sb.append(",");
>         }
>         sb.append(')');
>         return sb.toString();
>     }
> {code}

