[https://issues.apache.org/jira/browse/PIG-771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703383#action_12703383]
David Ciemiewicz commented on PIG-771:
--------------------------------------
Daniel,
Thanks. Running locale reported LANG=POSIX.
I used locale -a to list the locales and then did:
export LANG=en_US.utf8
Then I got the correct PigDump output.
I also found it useful to set LESSCHARSET to utf-8.
For bash users:
export LANG=en_US.utf8
export LESSCHARSET=utf-8
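Why LANG matters here: a Java String holds the Chinese characters correctly, but they have to be encoded to bytes when printed, and on JVMs of this era System.out encodes with the platform default charset, which a POSIX/C locale typically maps to US-ASCII. The sketch below is not Pig code (EncodingDemo, encodeForTerminal and utf8Stdout are hypothetical names). It reproduces the question marks and shows an assumed alternative to changing LANG: wrapping stdout in a PrintStream that always encodes UTF-8 (this PrintStream constructor needs Java 10 or later).

```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.PrintStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingDemo {
    // Encodes text the way a PrintStream backed by `cs` would before writing
    // bytes to the terminal. String.getBytes(Charset) substitutes the charset's
    // replacement bytes ('?' for US-ASCII) for characters it cannot represent,
    // which is where the "??" in the dump output comes from.
    static byte[] encodeForTerminal(String text, Charset cs) {
        return text.getBytes(cs);
    }

    // Hypothetical workaround (not part of Pig): a stdout stream that always
    // encodes UTF-8, regardless of the JVM's default charset. The terminal
    // must still interpret the bytes as UTF-8 for the characters to render.
    static PrintStream utf8Stdout() {
        return new PrintStream(new FileOutputStream(FileDescriptor.out), true, StandardCharsets.UTF_8);
    }
}
```

Setting LANG=en_US.utf8 as described above makes such a wrapper unnecessary, because the JVM then picks UTF-8 as its default; LESSCHARSET only controls how less displays the resulting bytes.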
It would be useful if dump/PigDump() warned the user that, when LANG=POSIX,
Asian-language characters may not display properly.
Something like:
// System.getenv("LANG") returns null when LANG is unset; "POSIX".equals(...) is null-safe.
if ("POSIX".equals(System.getenv("LANG"))) {
    // Write to stderr so the warning does not mix with the dumped data.
    System.err.println("WARNING: dump will not properly display multibyte UTF-8 "
        + "characters when the environment variable LANG=\"POSIX\". Try setting "
        + "LANG=en_US.utf8; run locale -a to list other possible values.");
}
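Checking LANG alone can miss cases: under POSIX rules LC_ALL overrides LC_CTYPE, which overrides LANG, and LANG=C or an unset locale is just as ASCII-only as POSIX. A sketch of a broader check follows, assuming the goal is "warn unless the effective character-type locale names UTF-8"; LocaleCheck, effectiveCtype and shouldWarn are hypothetical names, not existing Pig functions.

```java
import java.util.Locale;
import java.util.Map;

class LocaleCheck {
    // Returns the locale name that governs character handling, following the
    // POSIX precedence LC_ALL, then LC_CTYPE, then LANG. Empty values count as unset.
    static String effectiveCtype(Map<String, String> env) {
        for (String name : new String[] {"LC_ALL", "LC_CTYPE", "LANG"}) {
            String value = env.get(name);
            if (value != null && !value.isEmpty()) {
                return value;
            }
        }
        // Nothing set: systems commonly fall back to the C/POSIX locale.
        return "C";
    }

    // True when the effective locale does not name a UTF-8 codeset
    // (this covers C, POSIX, and non-UTF-8 locales such as en_US.ISO-8859-1).
    static boolean shouldWarn(Map<String, String> env) {
        String lower = effectiveCtype(env).toLowerCase(Locale.ROOT);
        return !(lower.contains("utf-8") || lower.contains("utf8"));
    }
}
```

Usage would be something like `if (LocaleCheck.shouldWarn(System.getenv())) { System.err.println(...); }`, with the same warning text as above.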
> PigDump does not properly output Chinese UTF8 characters - they are displayed as question marks ??
> --------------------------------------------------------------------------------------------------
>
> Key: PIG-771
> URL: https://issues.apache.org/jira/browse/PIG-771
> Project: Pig
> Issue Type: Bug
> Reporter: David Ciemiewicz
>
> PigDump does not properly output Chinese UTF8 characters.
> The reason for this is that the function Tuple.toString() is called.
> DefaultTuple implements Tuple.toString() and it calls Object.toString() on
> the opaque object d.
> I think the code should be changed to call the new
> DataType.toString() function instead.
> {code}
> // Method of DefaultTuple; the class needs java.util.Iterator and java.util.Map imports.
> @Override
> public String toString() {
>     StringBuilder sb = new StringBuilder();
>     sb.append('(');
>     for (Iterator<Object> it = mFields.iterator(); it.hasNext();) {
>         Object d = it.next();
>         if (d != null) {
>             if (d instanceof Map) {
>                 sb.append(DataType.mapToString((Map<Object, Object>) d));
>             } else {
>                 sb.append(DataType.toString(d)); // <<< Change this one line
>                 // Type suffixes keep Long and Float values distinguishable in the output
>                 if (d instanceof Long) {
>                     sb.append("L");
>                 } else if (d instanceof Float) {
>                     sb.append("F");
>                 }
>             }
>         } else {
>             sb.append(""); // null fields print as empty
>         }
>         if (it.hasNext()) {
>             sb.append(",");
>         }
>     }
>     sb.append(')');
>     return sb.toString();
> }
> {code}
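To see what the proposed method produces, here is a standalone sketch of the same formatting logic. It is not Pig code: TupleFormat and fieldToString are hypothetical names, String.valueOf stands in for DataType.toString(), and the Map branch is left out. The String built this way still contains the original characters; per the comment above, the question marks appeared only when that String was encoded for a terminal under LANG=POSIX.

```java
import java.util.Iterator;
import java.util.List;

class TupleFormat {
    // Hypothetical stand-in for DataType.toString(Object).
    static String fieldToString(Object d) {
        return String.valueOf(d);
    }

    // Mirrors DefaultTuple.toString() above: comma-separated fields inside
    // parentheses, an "L" suffix for Long, an "F" suffix for Float, and
    // null fields printed as empty. Maps are not special-cased here.
    static String format(List<Object> fields) {
        StringBuilder sb = new StringBuilder("(");
        for (Iterator<Object> it = fields.iterator(); it.hasNext();) {
            Object d = it.next();
            if (d != null) {
                sb.append(fieldToString(d));
                if (d instanceof Long) {
                    sb.append('L');
                } else if (d instanceof Float) {
                    sb.append('F');
                }
            }
            if (it.hasNext()) {
                sb.append(',');
            }
        }
        return sb.append(')').toString();
    }
}
```

For example, a row holding 1L, 2.5f, null and the string 中文 formats as (1L,2.5F,,中文).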
--
This message is automatically generated by JIRA.