Re: Newbie needs regex help

2010-12-06 Thread Dan M
On Mon, 06 Dec 2010 18:12:33 +0100, Peter Otten wrote: > By the way: > print quopri.decodestring("=E4=F6=FC").decode("iso-8859-1") > äöü print r"\xe4\xf6\xfc".decode("string-escape").decode("iso-8859-1") > äöü Ah - better than a regex. Thanks!
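For anyone reading this on Python 3: the quoted lines are Python 2, and the "string-escape" codec no longer exists there. A rough equivalent, as a sketch only, might look like the following. The variable names are made up for illustration, and using "unicode_escape" in place of "string-escape" is my substitution, not something from the thread. It works here because \xhh decodes to code point U+00hh, which matches ISO-8859-1 for these values.

```python
import quopri

# Quoted-printable form: "=hh" sequences (bytes in, bytes out).
qp_bytes = b"=E4=F6=FC"
qp_text = quopri.decodestring(qp_bytes).decode("iso-8859-1")

# Backslash form: the raw bytes literal holds the four characters
# '\', 'x', 'h', 'h' per escape, as in the data files.
# Assumption: "unicode_escape" stands in for Python 2's "string-escape".
esc_bytes = rb"\xe4\xf6\xfc"
esc_text = esc_bytes.decode("unicode_escape")

print(qp_text)
print(esc_text)
```

Note that "unicode_escape" also interprets other backslash escapes (\n, \t, \uXXXX), so it is only safe if the data contains nothing but the \xhh form.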

Re: Newbie needs regex help

2010-12-06 Thread Peter Otten
Dan M wrote: > I'm getting bogged down with backslash escaping. > > I have some text files containing characters with the 8th bit set. These > characters are encoded one of two ways: either "=hh" or "\xhh", where "h" > represents a hex digit, and "\x" is a literal backslash followed by a > lower-

Re: Newbie needs regex help

2010-12-06 Thread Dan M
On Mon, 06 Dec 2010 09:44:39 -0600, Dan M wrote: > That's what I had initially assumed was the case, but looking at the > data files with a hex editor showed me that I do indeed have > four-character sequences. That's what makes this such an interesting > task! Sorry, I misunderstood the first ti

Re: Newbie needs regex help

2010-12-06 Thread Dan M
On Mon, 06 Dec 2010 16:34:56 +0100, Alain Ketterlin wrote: > Dan M writes: > >> I took a look at http://docs.python.org/howto/regex.html, especially >> the section titled "The Backslash Plague". I started out trying: > > import re > r = re.compile('x([0-9a-fA-F]{2})') a = "This \x

Re: Newbie needs regex help

2010-12-06 Thread Dan M
On Mon, 06 Dec 2010 10:29:41 -0500, Mel wrote: > What you're missing is that string `a` doesn't actually contain four- > character sequences like '\', 'x', 'a', 'a' . It contains single > characters that you encode in string literals as '\xaa' and so on. You > might do better with > > p1 = r'([
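The distinction Mel describes can be checked directly. This is a small illustrative sketch in Python 3, with hypothetical variable names: a normal string literal turns \xaa into one character, while a raw literal (or text read from a file that really contains a backslash) keeps all four.

```python
import re

escaped_in_source = "\xaa"   # one character: U+00AA
literal_in_data = r"\xaa"    # four characters: backslash, x, a, a

print(len(escaped_in_source))
print(len(literal_in_data))
print(list(literal_in_data))

# A pattern for a literal backslash followed by "x" and two hex digits
# matches only the four-character form.
pattern = re.compile(r"\\x[0-9a-fA-F]{2}")
print(bool(pattern.search(escaped_in_source)))
print(bool(pattern.search(literal_in_data)))
```

So a test string typed as "\xaa" in source code will never exercise a regex meant for the four-character sequences in the data files; a raw string is needed for that.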

Re: Newbie needs regex help

2010-12-06 Thread Alain Ketterlin
Dan M writes: > I took a look at http://docs.python.org/howto/regex.html, especially the > section titled "The Backslash Plague". I started out trying: import re r = re.compile('x([0-9a-fA-F]{2})') a = "This \xef file \xef has \x20 a bunch \xa0 of \xb0 crap \xc0 The backs

Re: Newbie needs regex help

2010-12-06 Thread Mel
Dan M wrote: > I'm getting bogged down with backslash escaping. > > I have some text files containing characters with the 8th bit set. These > characters are encoded one of two ways: either "=hh" or "\xhh", where "h" > represents a hex digit, and "\x" is a literal backslash followed by a > lower-

Newbie needs regex help

2010-12-06 Thread Dan M
I'm getting bogged down with backslash escaping. I have some text files containing characters with the 8th bit set. These characters are encoded one of two ways: either "=hh" or "\xhh", where "h" represents a hex digit, and "\x" is a literal backslash followed by a lower-case x. Catching the f
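For reference, one way the two encodings described above could be caught with a single regex is sketched below (Python 3). The names HEX_ESCAPE, decode_escapes and sample are hypothetical, and mapping each hex value straight to a code point assumes the data is ISO-8859-1, which the post does not state.

```python
import re

# Matches "=hh" or a literal backslash + "x" + "hh". In a raw string the
# regex escape \\ stands for exactly one literal backslash.
HEX_ESCAPE = re.compile(r"(?:=|\\x)([0-9a-fA-F]{2})")

def decode_escapes(text):
    """Replace each =hh or \\xhh sequence with the character U+00hh."""
    return HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)

# Raw string, so the backslash is really present in the sample text.
sample = r"caf=E9 and na\xefve"
print(decode_escapes(sample))
```

One caveat with this sketch: any "=" followed by two hex digits in ordinary text (for example "a=10") would be converted as well, so it only suits input where "=" appears solely as an escape marker.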