Hello!
I have run into a problem with character sets. I use Hadoop to store log
data, and my log data is not encoded in UTF-8; in China, for example, it is
GBK. If I use PigStorage() to load the data, it is treated as UTF-8. My
program then processes what it assumes is UTF-8 data. It still runs, but the
results are not correct.
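
To illustrate what I think is happening, here is a small sketch in Python
(not Pig itself). It assumes the loader decodes the raw bytes as UTF-8 and
replaces invalid sequences, which is my guess at the behaviour and not
something I have confirmed in the Pig source. The sample bytes and the
function names are made up for the example.

```python
# Hypothetical sample: the two characters "中文" encoded as GBK,
# standing in for one field of a log line.
GBK_SAMPLE = b"\xd6\xd0\xce\xc4"


def decode_as_utf8(raw: bytes) -> str:
    """Decode bytes as UTF-8, replacing invalid sequences with U+FFFD.

    Assumption: this mimics a lenient UTF-8 decoder such as the one a
    text loader might use.
    """
    return raw.decode("utf-8", errors="replace")


def roundtrip_via_utf8(raw: bytes) -> bytes:
    """Load as UTF-8 text, then store as UTF-8 again."""
    return decode_as_utf8(raw).encode("utf-8")


def roundtrip_as_bytes(raw: bytes) -> bytes:
    """Pass the bytes through untouched, never interpreting them as text."""
    return bytes(raw)


if __name__ == "__main__":
    print(GBK_SAMPLE.decode("gbk"))                        # the intended text
    print(roundtrip_via_utf8(GBK_SAMPLE) == GBK_SAMPLE)    # data was changed?
    print(roundtrip_as_bytes(GBK_SAMPLE) == GBK_SAMPLE)    # data preserved?
```

In this sketch the GBK bytes are not valid UTF-8, so the text round trip
cannot give back the original bytes, while the byte-for-byte pass-through
can. That is the behaviour I would like from LOAD and STORE.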
Can Pig's LOAD and STORE work the way Hadoop does, without changing the
original charset of the data, so that it is stored exactly as it was? Can
anyone help me? Or can someone tell me why UTF-8 is used as the default?