PigStorage is written to work with UTF-8 data. You will need to write your own load/store function to get different semantics.
Olga

> -----Original Message-----
> From: paradisehit [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, August 26, 2008 1:52 AM
> To: [EMAIL PROTECTED]; [email protected]
> Subject: Why the default LOAD and STORE use UTF-8? Why not use byte?
>
> Hello!
> I have met a problem with the charset. I use Hadoop to store
> the log data, and my log data is not encoded in UTF-8; for
> example, it is GBK in China. If I use PigStorage() to process
> my data, the data will be treated as UTF-8. When I then use my
> program to process that data, it still runs, but the result
> is not right.
> Can we use the Pig LOAD and STORE like Hadoop, without changing
> the original data charset, and store it as it was? Can anyone
> help me? Or tell me why the default is UTF-8?
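
For readers of this thread, here is a minimal sketch of why treating GBK bytes as UTF-8 gives wrong results, and why a custom load/store function would need to keep the raw bytes or decode them with the right charset. It is plain Java and does not use any Pig API; the class name `CharsetRoundTrip`, the helper `roundTrip`, and the sample bytes are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CharsetRoundTrip {
    // The two characters "中文" encoded as GBK (hard-coded so the example
    // does not depend on the GBK charset being installed in the JVM).
    static final byte[] GBK_SAMPLE = {
        (byte) 0xD6, (byte) 0xD0, (byte) 0xCE, (byte) 0xC4
    };

    // Decode the bytes with the given charset, then encode them again,
    // which is roughly what happens when a loader turns bytes into text
    // and a storer writes that text back out.
    static byte[] roundTrip(byte[] raw, Charset charset) {
        return new String(raw, charset).getBytes(charset);
    }

    public static void main(String[] args) {
        // These bytes are not valid UTF-8, so the decoder substitutes
        // replacement characters and the original bytes are lost.
        byte[] viaUtf8 = roundTrip(GBK_SAMPLE, StandardCharsets.UTF_8);
        System.out.println("UTF-8 round trip preserved bytes: "
                + Arrays.equals(GBK_SAMPLE, viaUtf8));

        // ISO-8859-1 maps every byte value to exactly one character,
        // so it can carry arbitrary bytes through unchanged.
        byte[] viaLatin1 = roundTrip(GBK_SAMPLE, StandardCharsets.ISO_8859_1);
        System.out.println("ISO-8859-1 round trip preserved bytes: "
                + Arrays.equals(GBK_SAMPLE, viaLatin1));
    }
}
```

The ISO-8859-1 case only shows that the bytes survive; the characters seen in between are not the intended Chinese text, so any processing that depends on the text itself would still need to decode with GBK.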
