PigStorage is written to work with UTF-8 data. You will need to write
your own load/store function to get different semantics.
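
To see why the charset matters, here is a rough sketch in plain Python
(not Pig code, and not the Pig load/store API). The sample text and the
roundtrip helper are made up for illustration. It shows that GBK bytes
pushed through a UTF-8 decode do not come back unchanged, while the same
bytes survive when decoded with the right charset or with a
byte-preserving one such as Latin-1.

```python
# Hypothetical illustration only: none of these names come from Pig.

# Two Chinese characters, encoded the way a GBK log file would store them.
original = "\u4e2d\u6587".encode("gbk")


def roundtrip(raw: bytes, charset: str) -> bytes:
    """Decode raw bytes with the given charset, then encode them back."""
    # errors="replace" substitutes U+FFFD for byte sequences that are
    # invalid in the charset instead of raising an exception.
    return raw.decode(charset, errors="replace").encode(charset)


# Treating GBK bytes as UTF-8 replaces the invalid sequences, so the
# bytes written back differ from the bytes that were read.
utf8_result = roundtrip(original, "utf-8")

# Decoding with the charset the data was written in is lossless.
gbk_result = roundtrip(original, "gbk")

# Latin-1 maps every byte to exactly one code point, so it preserves
# arbitrary bytes even though the decoded text is not meaningful.
latin1_result = roundtrip(original, "latin-1")

print(utf8_result == original)
print(gbk_result == original)
print(latin1_result == original)
```

A custom load/store function would need to do the equivalent of the
second or third case: either decode with the real charset of the data,
or pass the bytes through untouched.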

Olga 

> -----Original Message-----
> From: paradisehit [mailto:[EMAIL PROTECTED] 
> Sent: Tuesday, August 26, 2008 1:52 AM
> To: [EMAIL PROTECTED]; [email protected]
> Subject: Why the default LOAD and STORE use UTF-8? Why not use byte?
> 
> Hello!
>     I have run into a character encoding problem. I use 
> Hadoop to store log data, and my log data is not encoded in 
> UTF-8; it is in GBK, for example, here in China. If I use 
> PigStorage() to process my data, the data is treated as 
> UTF-8. My program then processes it as UTF-8 data; it still 
> runs, but the results are not correct.
>     Can Pig's LOAD and STORE work like Hadoop and leave the 
> original data's charset unchanged, storing it as it was? Can 
> anyone help me, or tell me why UTF-8 is the default?
>  
> 
