[ 
https://issues.apache.org/jira/browse/PIG-771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703383#action_12703383
 ] 

David Ciemiewicz edited comment on PIG-771 at 4/27/09 2:20 PM:
---------------------------------------------------------------

Daniel,

Thanks.  The locale command reported LANG=POSIX.

I used locale -a to list the locales and then did:

export LANG=en_US.utf8

Then I got the correct PigDump output.

I found that setting LESSCHARSET to utf-8 helped as well.

For bash users:

export LANG=en_US.utf8
export LESSCHARSET=utf-8
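
A likely explanation for the question marks, assuming the JVM derives its output encoding from the locale (so LANG=POSIX yields an ASCII-only encoding): characters that cannot be encoded are replaced with '?'. The sketch below reproduces that effect with standard Java charset calls only; the class and method names are made up for illustration and are not part of Pig.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // Encode text to bytes in the given charset and decode it back,
    // roughly what happens to a string on its way to the terminal.
    public static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String chinese = "\u4e2d\u6587"; // two Chinese characters

        // Shows which encoding this JVM picked up from its environment.
        System.out.println("default charset: " + Charset.defaultCharset());

        // US-ASCII cannot represent these characters, so each becomes '?'.
        System.out.println("US-ASCII: " + roundTrip(chinese, StandardCharsets.US_ASCII));

        // UTF-8 round-trips the text unchanged.
        System.out.println("UTF-8 preserved: "
                + roundTrip(chinese, StandardCharsets.UTF_8).equals(chinese));
    }
}
```

Running it once with LANG=POSIX and once with LANG=en_US.utf8 should show whether the default charset reported by the JVM changes with the locale.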


It would be useful if dump/PigDump() warned the user that, when LANG=POSIX, 
Asian-language characters may not display properly.  
Something like:

{code}
// System.getenv("LANG") is one way to read the setting; using it here is an assumption.
if ("POSIX".equals(System.getenv("LANG"))) {
    System.out.println("WARNING: dump will not properly display multibyte UTF-8 characters when\n" +
        "environment variable LANG=\"POSIX\".  Try setting your environment variable LANG=en_US.utf8.\n" +
        "See locale -a for other possible values.");
}
{code}
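
A slightly fuller sketch of such a check, under some assumptions: on POSIX systems LC_ALL takes precedence over LC_CTYPE, which takes precedence over LANG, and the "C" locale is equivalent to "POSIX". The class and method names are hypothetical, not existing Pig code, and treating an unset locale as a warning case is a guess about the desired behavior.

```java
import java.util.Map;

public class LocaleCheck {
    // Returns the effective character-type locale using POSIX precedence:
    // LC_ALL, then LC_CTYPE, then LANG. Returns null when none is set.
    public static String effectiveCtype(Map<String, String> env) {
        for (String name : new String[] {"LC_ALL", "LC_CTYPE", "LANG"}) {
            String value = env.get(name);
            if (value != null && !value.isEmpty()) {
                return value;
            }
        }
        return null;
    }

    // True when the locale is unset, "C", or "POSIX", i.e. the cases
    // in which multibyte output is likely to be mangled.
    public static boolean needsWarning(Map<String, String> env) {
        String locale = effectiveCtype(env);
        return locale == null || locale.equals("C") || locale.equals("POSIX");
    }

    public static void main(String[] args) {
        if (needsWarning(System.getenv())) {
            System.err.println("WARNING: dump may not display multibyte UTF-8 characters "
                    + "with the current locale. Try setting LANG=en_US.utf8; "
                    + "see locale -a for other possible values.");
        }
    }
}
```

Passing the environment in as a map keeps the check easy to test without changing the real environment.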

> PigDump does not properly output Chinese UTF8 characters - they are displayed 
> as question marks ??
> --------------------------------------------------------------------------------------------------
>
>                 Key: PIG-771
>                 URL: https://issues.apache.org/jira/browse/PIG-771
>             Project: Pig
>          Issue Type: Bug
>            Reporter: David Ciemiewicz
>
> PigDump does not properly output Chinese UTF8 characters.
> The reason for this is that the function Tuple.toString() is called.
> DefaultTuple implements Tuple.toString() and it calls Object.toString() on 
> the opaque object d.
> I think the code should be changed to call the new 
> DataType.toString() function instead.
> {code}
>     @Override
>     public String toString() {
>         StringBuilder sb = new StringBuilder();
>         sb.append('(');
>         for (Iterator<Object> it = mFields.iterator(); it.hasNext();) {
>             Object d = it.next();
>             if(d != null) {
>                 if(d instanceof Map) {
>                     sb.append(DataType.mapToString((Map<Object, Object>)d));
>                 } else {
>                     sb.append(DataType.toString(d));  // <<< Change this one line
>                     if(d instanceof Long) {
>                         sb.append("L");
>                     } else if(d instanceof Float) {
>                         sb.append("F");
>                     }
>                 }
>             } else {
>                 sb.append("");
>             }
>             if (it.hasNext())
>                 sb.append(",");
>         }
>         sb.append(')');
>         return sb.toString();
>     }
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
