Re: Some questions about decode/encode

2008-01-28 Thread glacier
On Jan 28, 2:31 pm, John Machin <[EMAIL PROTECTED]> wrote: > On Jan 28, 2:53 pm, glacier <[EMAIL PROTECTED]> wrote: > > > > > Thanks, John. > > There's no doubt that you proved SAX didn't support GBK encoding. > > But can you give some suggestion on how to make SAX parse some GBK > > string? > > Yes, t

Re: Some questions about decode/encode

2008-01-27 Thread John Machin
On Jan 28, 2:53 pm, glacier <[EMAIL PROTECTED]> wrote: > > Thanks, John. > There's no doubt that you proved SAX didn't support GBK encoding. > But can you give some suggestion on how to make SAX parse some GBK > string? Yes, the same suggestion as was given to you by others very early in this thread,

Re: Some questions about decode/encode

2008-01-27 Thread glacier
On Jan 28, 5:50 am, John Machin <[EMAIL PROTECTED]> wrote: > On Jan 28, 7:47 am, "Mark Tolonen" <[EMAIL PROTECTED]> > wrote: > > > > > > > >"John Machin" <[EMAIL PROTECTED]> wrote in message > > >news:[EMAIL PROTECTED] > > >On Jan 27, 9:17 pm, glacier <[EMAIL PROTECTED]> wrote: > > >> On Jan 24, 3 pm

Re: Some questions about decode/encode

2008-01-27 Thread John Machin
On Jan 28, 7:47 am, "Mark Tolonen" <[EMAIL PROTECTED]> wrote: > >"John Machin" <[EMAIL PROTECTED]> wrote in message > >news:[EMAIL PROTECTED] > >On Jan 27, 9:17 pm, glacier <[EMAIL PROTECTED]> wrote: > >> On Jan 24, 3:29 pm, "Gabriel Genellina" <[EMAIL PROTECTED]> > >> wrote: > > >*IF* the file is w

Re: Some questions about decode/encode

2008-01-27 Thread Mark Tolonen
>"John Machin" <[EMAIL PROTECTED]> wrote in message >news:[EMAIL PROTECTED] >On Jan 27, 9:17 pm, glacier <[EMAIL PROTECTED]> wrote: >> On Jan 24, 3:29 pm, "Gabriel Genellina" <[EMAIL PROTECTED]> >> wrote: > >*IF* the file is well-formed GBK, then the codec will not mess up when >decoding it to Un

Re: Some questions about decode/encode

2008-01-27 Thread Martin v. Löwis
>> Is there any way to solve this better? >> I mean if I shouldn't convert the GBK string to unicode string, what >> should I do to make SAX work? > > Decode it and then encode it to utf-8 before feeding it to the parser. The tricky part is that you also need to change the encoding declaration in
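
The transcoding approach described above (decode the GBK bytes, re-encode them as UTF-8, and update the encoding declaration) might look roughly like the following. This is a minimal sketch in Python 3 rather than the Python 2 used in the thread. The names `gbk_xml_to_utf8`, `TextCollector`, and `parse_gbk_xml` are hypothetical, and the regular expression assumes that the XML declaration, if present, sits at the very start of the document.

```python
import re
import xml.sax

# Matches the encoding pseudo-attribute inside a leading XML declaration.
# Assumption: the declaration starts at offset 0 (no BOM or leading space).
_ENCODING_DECL = re.compile(r"""^(<\?xml[^>]*?encoding\s*=\s*)(["'])[^"']*\2""")


def gbk_xml_to_utf8(raw: bytes) -> bytes:
    """Decode GBK bytes, fix the declared encoding, re-encode as UTF-8."""
    text = raw.decode("gbk")
    text = _ENCODING_DECL.sub(r"\1\2utf-8\2", text, count=1)
    return text.encode("utf-8")


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler that simply gathers all character data."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_gbk_xml(raw: bytes) -> str:
    """Feed the transcoded document to SAX and return the collected text."""
    handler = TextCollector()
    xml.sax.parseString(gbk_xml_to_utf8(raw), handler)
    return "".join(handler.chunks)
```

If the declaration is left saying GBK while the bytes are UTF-8, the parser is told the wrong encoding, which is why both steps go together.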

Re: Some questions about decode/encode

2008-01-27 Thread glacier
On Jan 27, 7:20 pm, John Machin <[EMAIL PROTECTED]> wrote: > On Jan 27, 9:17 pm, glacier <[EMAIL PROTECTED]> wrote: > > > > > > > On Jan 24, 3:29 pm, "Gabriel Genellina" <[EMAIL PROTECTED]> wrote: > > > > On Thu, 24 Jan 2008 04:52:22 -0200, glacier <[EMAIL PROTECTED]> wrote: > > > > > According to

Re: Some questions about decode/encode

2008-01-27 Thread glacier
On Jan 27, 7:04 pm, John Machin <[EMAIL PROTECTED]> wrote: > On Jan 27, 9:18 pm, glacier <[EMAIL PROTECTED]> wrote: > > > > > > > On Jan 24, 4:44 pm, Marc 'BlackJack' Rintsch <[EMAIL PROTECTED]> wrote: > > > > On Wed, 23 Jan 2008 19:49:01 -0800, glacier wrote: > > > > My second question is: is there

Re: Some questions about decode/encode

2008-01-27 Thread John Machin
On Jan 27, 9:17 pm, glacier <[EMAIL PROTECTED]> wrote: > On Jan 24, 3:29 pm, "Gabriel Genellina" <[EMAIL PROTECTED]> wrote: > > > > > On Thu, 24 Jan 2008 04:52:22 -0200, glacier <[EMAIL PROTECTED]> wrote: > > > > According to your reply, what will happen if I try to decode a long > > > string sep

Re: Some questions about decode/encode

2008-01-27 Thread John Machin
On Jan 27, 9:18 pm, glacier <[EMAIL PROTECTED]> wrote: > On Jan 24, 4:44 pm, Marc 'BlackJack' Rintsch <[EMAIL PROTECTED]> wrote: > > > On Wed, 23 Jan 2008 19:49:01 -0800, glacier wrote: > > > My second question is: is there any one who has tested very long mbcs > > > decode? I tried to decode a long

Re: Some questions about decode/encode

2008-01-27 Thread Marc 'BlackJack' Rintsch
On Sun, 27 Jan 2008 02:18:48 -0800, glacier wrote: > Yepp. I feed SAX with the unicode string since SAX didn't support my > encoding system(GBK). If the `decode()` method supports it, IMHO SAX should too. > Is there any way to solve this better? > I mean if I shouldn't convert the GBK string to

Re: Some questions about decode/encode

2008-01-27 Thread glacier
On Jan 24, 5:51 pm, John Machin <[EMAIL PROTECTED]> wrote: > On Jan 24, 2:49 pm, glacier <[EMAIL PROTECTED]> wrote: > > > I use Chinese characters as an example here. > > > >>>s1='你好吗' > > >>>repr(s1) > > > "'\\xc4\\xe3\\xba\\xc3\\xc2\\xf0'" > > > >>>b1=s1.decode('GBK') > > > My first question is:

Re: Some questions about decode/encode

2008-01-27 Thread glacier
On Jan 24, 4:44 pm, Marc 'BlackJack' Rintsch <[EMAIL PROTECTED]> wrote: > On Wed, 23 Jan 2008 19:49:01 -0800, glacier wrote: > > My second question is: is there any one who has tested very long mbcs > > decode? I tried to decode a long(20+MB) xml yesterday, which turns out > > to be very strange and

Re: Some questions about decode/encode

2008-01-27 Thread glacier
On Jan 24, 3:29 pm, "Gabriel Genellina" <[EMAIL PROTECTED]> wrote: > On Thu, 24 Jan 2008 04:52:22 -0200, glacier <[EMAIL PROTECTED]> wrote: > > > According to your reply, what will happen if I try to decode a long > > string separately. > > I mean: > > ## > > a

Re: Some questions about decode/encode

2008-01-24 Thread 7stud
On Jan 24, 1:44 am, Marc 'BlackJack' Rintsch <[EMAIL PROTECTED]> wrote: > On Wed, 23 Jan 2008 19:49:01 -0800, glacier wrote: > > My second question is: is there any one who has tested very long mbcs > > decode? I tried to decode a long(20+MB) xml yesterday, which turns out > > to be very strange an

Re: Some questions about decode/encode

2008-01-24 Thread John Machin
On Jan 24, 2:49 pm, glacier <[EMAIL PROTECTED]> wrote: > I use Chinese characters as an example here. > > >>>s1='你好吗' > >>>repr(s1) > > "'\\xc4\\xe3\\xba\\xc3\\xc2\\xf0'" > > >>>b1=s1.decode('GBK') > > My first question is: what strategy does 'decode' use to tell the way > to separate the words. I

Re: Some questions about decode/encode

2008-01-24 Thread Marc 'BlackJack' Rintsch
On Wed, 23 Jan 2008 19:49:01 -0800, glacier wrote: > My second question is: is there any one who has tested very long mbcs > decode? I tried to decode a long(20+MB) xml yesterday, which turns out > to be very strange and causes SAX to fail to parse the decoded string. That's because SAX wants bytes,

Re: Some questions about decode/encode

2008-01-23 Thread glacier
On Jan 24, 1:49 pm, [EMAIL PROTECTED] wrote: > On Jan 23, 8:49 pm, glacier <[EMAIL PROTECTED]> wrote: > > > I use Chinese characters as an example here. > > > >>>s1='你好吗' > > >>>repr(s1) > > > "'\\xc4\\xe3\\xba\\xc3\\xc2\\xf0'" > > > >>>b1=s1.decode('GBK') > > > My first question is: what strategy

Re: Some questions about decode/encode

2008-01-23 Thread Gabriel Genellina
On Thu, 24 Jan 2008 04:52:22 -0200, glacier <[EMAIL PROTECTED]> wrote: > According to your reply, what will happen if I try to decode a long > string separately. > I mean: > ## > a='你好吗'*10 > s1 = u'' > cur = 0 > while cur < len(a): > d = min(len(a)-i
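
On the question of decoding a long GBK string piece by piece: one option is an incremental decoder, which holds back a trailing lead byte until the rest of the character arrives in the next chunk. The sketch below is Python 3 and is not necessarily what was proposed in the thread. `decode_in_chunks` and the chunk size of 5 are hypothetical choices for illustration.

```python
import codecs


def decode_in_chunks(data: bytes, chunk_size: int, encoding: str = "gbk") -> str:
    """Decode `data` in fixed-size slices without breaking multi-byte characters."""
    decoder = codecs.getincrementaldecoder(encoding)()
    parts = []
    for start in range(0, len(data), chunk_size):
        # An incomplete trailing character is buffered inside the decoder.
        parts.append(decoder.decode(data[start:start + chunk_size]))
    # final=True makes the decoder raise if a dangling lead byte is left over.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


data = ("你好吗" * 10).encode("gbk")   # 60 bytes, two per character
text = decode_in_chunks(data, chunk_size=5)
```

By contrast, a plain `data[:5].decode("gbk")` raises `UnicodeDecodeError`, because the fifth byte is only the first half of a two-byte character.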

Re: Some questions about decode/encode

2008-01-23 Thread glacier
On Jan 24, 1:41 pm, Ben Finney <[EMAIL PROTECTED]> wrote: > Ben Finney <[EMAIL PROTECTED]> writes: > > glacier <[EMAIL PROTECTED]> writes: > > > > I use Chinese characters as an example here. > > > > >>>s1='你好吗' > > > >>>repr(s1) > > > "'\\xc4\\xe3\\xba\\xc3\\xc2\\xf0'" > > > >>>b1=s1.decode('GBK')

Re: Some questions about decode/encode

2008-01-23 Thread bbtestingbb
On Jan 23, 8:49 pm, glacier <[EMAIL PROTECTED]> wrote: > I use Chinese characters as an example here. > > >>>s1='你好吗' > >>>repr(s1) > > "'\\xc4\\xe3\\xba\\xc3\\xc2\\xf0'" > > >>>b1=s1.decode('GBK') > > My first question is: what strategy does 'decode' use to tell the way > to separate the words.

Re: Some questions about decode/encode

2008-01-23 Thread Ben Finney
Ben Finney <[EMAIL PROTECTED]> writes: > glacier <[EMAIL PROTECTED]> writes: > > > I use Chinese characters as an example here. > > > > >>>s1='你好吗' > > >>>repr(s1) > > "'\\xc4\\xe3\\xba\\xc3\\xc2\\xf0'" > > >>>b1=s1.decode('GBK') > > > > My first question is: what strategy does 'decode' use to

Re: Some questions about decode/encode

2008-01-23 Thread Ben Finney
glacier <[EMAIL PROTECTED]> writes: > I use Chinese characters as an example here. > > >>>s1='你好吗' > >>>repr(s1) > "'\\xc4\\xe3\\xba\\xc3\\xc2\\xf0'" > >>>b1=s1.decode('GBK') > > My first question is: what strategy does 'decode' use to tell the way > to separate the words. I mean since s1 is an

Some questions about decode/encode

2008-01-23 Thread glacier
I use Chinese characters as an example here. >>>s1='你好吗' >>>repr(s1) "'\\xc4\\xe3\\xba\\xc3\\xc2\\xf0'" >>>b1=s1.decode('GBK') My first question is: what strategy does 'decode' use to tell the way to separate the words. I mean since s1 is a multi-byte-char string, how did it determine to seper
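
As a rough illustration of how a decoder can find character boundaries in GBK without any separator: a byte below 0x80 stands alone as ASCII, while a byte in the range 0x81 to 0xFE is a lead byte that takes the following byte with it. The Python 3 sketch below is a simplified model under that assumption, not the actual codec implementation. `split_gbk` is a hypothetical helper and does not validate trail bytes.

```python
def split_gbk(data: bytes) -> list:
    """Split GBK bytes into per-character byte sequences (simplified model)."""
    pieces = []
    i = 0
    while i < len(data):
        # A lead byte (0x81-0xFE) announces a two-byte character.
        width = 2 if 0x81 <= data[i] <= 0xFE else 1
        pieces.append(data[i:i + width])
        i += width
    return pieces


s1 = b"\xc4\xe3\xba\xc3\xc2\xf0"       # the bytes shown by repr(s1) in the post
pieces = split_gbk(s1)                  # three two-byte pieces
chars = [p.decode("gbk") for p in pieces]
```

The value of each byte, read in order from the start, determines where the next character begins, so no explicit separator is needed.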