Re: [Python-Dev] Regular expressions, Unicode etc.

2007-08-10 Thread Greg Ewing
Nick Maclaren wrote: > You can convert them to things that are sort of NFA/DFA > hybrids, If you could express it as an NFA, then you could (in principle) convert it to a DFA. So whatever it's using can't be an NFA either. -- Greg
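Greg's argument rests on the standard subset construction: any NFA can be turned into an equivalent DFA whose states are sets of NFA states. The sketch below is a minimal illustration, assuming a hypothetical dict-based NFA encoding and omitting epsilon moves (a regex-to-NFA translation normally adds them, which would need an extra epsilon-closure step).

```python
from collections import deque

def nfa_to_dfa(nfa, start, accepting, alphabet):
    """Subset construction (no epsilon moves): each DFA state is a
    frozenset of NFA states. Returns (transitions, start, accepting)."""
    start_set = frozenset({start})
    transitions = {}
    queue = deque([start_set])
    seen = {start_set}
    while queue:
        current = queue.popleft()
        transitions[current] = {}
        for sym in alphabet:
            # Union of every NFA move on `sym` from every state in the set.
            nxt = frozenset(t for s in current for t in nfa.get(s, {}).get(sym, ()))
            transitions[current][sym] = nxt
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    # A DFA state accepts if it contains any accepting NFA state.
    dfa_accepting = {S for S in transitions if S & accepting}
    return transitions, start_set, dfa_accepting

def dfa_accepts(transitions, start, accepting, text):
    state = start
    for ch in text:
        if ch not in transitions[state]:
            return False  # symbol outside the alphabet
        state = transitions[state][ch]
    return state in accepting

# Hypothetical NFA for strings over {a, b} that end in "ab".
nfa = {0: {"a": {0, 1}, "b": {0}}, 1: {"b": {2}}}
dfa, dfa_start, dfa_acc = nfa_to_dfa(nfa, 0, {2}, "ab")
```

In the worst case an n-state NFA yields 2^n DFA states; this small example needs only three.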

Re: [Python-Dev] Regular expressions, Unicode etc.

2007-08-10 Thread Nick Maclaren
James Y Knight <[EMAIL PROTECTED]> wrote: > > > Firstly, things like backreferences are an absolute no-no. They > > are not regular, and REs with them in cannot be converted to DFAs. > > That could be 'solved' by a parser that kicked out such constructions, > > but it would get screams from many u

Re: [Python-Dev] Regular expressions, Unicode etc.

2007-08-10 Thread Nick Maclaren
"Martin v. Löwis" <[EMAIL PROTECTED]> wrote: > > Your specification was "For Unicode, whatever people agree!" > > I would not call that "Unicode-based". Can we drop this, please? I am happy to agree that I was being unclear (it is a common failing of mine), but I did prov

Re: [Python-Dev] Regular expressions, Unicode etc.

2007-08-10 Thread Greg Ewing
James Y Knight wrote: > On Aug 8, 2007, at 3:47 PM, Nick Maclaren wrote: > > Firstly, things like backreferences are an absolute no-no. They > > are not regular, and REs with them in cannot be converted to DFAs. > > People keep saying things like this as if GNU grep and tcl's regular > expressio

Re: [Python-Dev] Regular expressions, Unicode etc.

2007-08-09 Thread James Y Knight
On Aug 8, 2007, at 3:47 PM, Nick Maclaren wrote: > Firstly, things like backreferences are an absolute no-no. They > are not regular, and REs with them in cannot be converted to DFAs. > That could be 'solved' by a parser that kicked out such constructions, > but it would get screams from many user
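To make the "backreferences are not regular" point concrete: the pattern `(a+)b\1` matches exactly the strings a^n b a^n, and the pumping lemma shows no finite automaton can recognise that language, because it would need to remember an unbounded count. Python's backtracking `re` module handles it directly, as this small sketch shows.

```python
import re

# (a+) captures a run of a's; \1 demands the *same* run again after the b.
SAME_RUN = re.compile(r"(a+)b\1")

def matches(s):
    return SAME_RUN.fullmatch(s) is not None

for s in ["aba", "aabaa", "aaba", "abaa"]:
    print(s, matches(s))  # True, True, False, False respectively
```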

Re: [Python-Dev] Regular expressions, Unicode etc.

2007-08-09 Thread Greg Ewing
Martin v. Löwis wrote: > I know the term "printable character", which is what I read > in definitions of the isprint() routine. "printing character" > I never heard before. Hmmm... I guess this means your brain is using a part-of-speech-sensitive word->technical_meaning mapping. Perhaps this will

Re: [Python-Dev] Regular expressions, Unicode etc.

2007-08-09 Thread Martin v. Löwis
Nick Maclaren schrieb: >> The relevance is that your specification of "printing character" >> as "isprint returns true" is nearly useless, as it only applies >> to byte-oriented characters. > > Eh? That's ALL I used it to specify! I used a Unicode-based > specification for Unicode. Your specifi

Re: [Python-Dev] Regular expressions, Unicode etc.

2007-08-09 Thread Nick Maclaren
"Martin v. Löwis" <[EMAIL PROTECTED]> wrote: > > There is no problem for isalnum: it will just go away if > byte-oriented characters go away. Fortunately, we have a > replacement for the Unicode case. As we do for isprint. > The relevance is that your specification of "pr

Re: [Python-Dev] Regular expressions, Unicode etc.

2007-08-08 Thread Martin v. Löwis
>> In the mediate term, locale-based testing will go away/be not >> implementable (in particular, Py3k won't have a byte-oriented >> character string type, so we can't use isprint). In general, >> isprint is unsuitable since it doesn't support multi-byte >> character sets. > > Well, iswprint isn't
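The multi-byte objection to isprint is easy to demonstrate: a UTF-8 encoded character occupies several bytes, none of which passes an ASCII-style byte test on its own. In this sketch, `byte_isprint` is a hypothetical stand-in for C `isprint()` in the "C" locale (real `isprint()` results depend on the current locale).

```python
def byte_isprint(b: int) -> bool:
    # Stand-in for C isprint() in the "C" locale: 0x20..0x7E inclusive.
    return 0x20 <= b <= 0x7E

def all_bytes_printable(text: str, encoding: str = "utf-8") -> bool:
    # Byte-oriented view: test each encoded byte separately.
    return all(byte_isprint(b) for b in text.encode(encoding))

word = "Löwis"
print(all_bytes_printable(word))  # False: 'ö' encodes as b'\xc3\xb6'
print(word.isprintable())         # True: classified per character
```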

Re: [Python-Dev] Regular expressions, Unicode etc.

2007-08-08 Thread Nick Maclaren
"Martin v. Löwis" <[EMAIL PROTECTED]> wrote: > > >> Before discussing the escape, I'd like to see a specification of > >> it first - what characters precisely would classify as "printing"? > > > > For basic ASCII and locale-based testing, whatever isprint() says. > > Just

Re: [Python-Dev] Regular expressions, Unicode etc.

2007-08-08 Thread Martin v. Löwis
>> Before discussing the escape, I'd like to see a specification of >> it first - what characters precisely would classify as "printing"? > > For basic ASCII and locale-based testing, whatever isprint() says. > Just as for isalpha(). In the mediate term, locale-based testing will go away/be not i
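One concrete Unicode-based answer to "what characters precisely would classify as printing" is the rule CPython documents for `str.isprintable()`: a character is non-printing if its Unicode general category is Other (C*) or Separator (Z*), except that the ASCII space counts as printing. The sketch below spells that rule out with `unicodedata` and checks it against the built-in; `is_printing` is a hypothetical name.

```python
import unicodedata

def is_printing(ch: str) -> bool:
    """Printing unless the general category is C* (Other) or
    Z* (Separator), with U+0020 SPACE as the one exception."""
    if ch == " ":
        return True
    return unicodedata.category(ch)[0] not in ("C", "Z")

# Tab (Cc), NO-BREAK SPACE (Zs), ZERO WIDTH SPACE (Cf), LINE SEPARATOR (Zl).
samples = ["a", " ", "\t", "\u00a0", "\u200b", "\u2028", "é", "\u20ac"]
for ch in samples:
    assert is_printing(ch) == ch.isprintable(), repr(ch)
```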

Re: [Python-Dev] Regular expressions, Unicode etc.

2007-08-08 Thread Mike Klaas
On 8-Aug-07, at 12:47 PM, Nick Maclaren wrote: > >>> The other approach, which is to stick to true regular expressions, >>> and wholly or partially convert to DFAs, has already been rendered >>> impossible by even the limited Perl/PCRE extensions that Python >>> has adopted. >> >> Impossible? Sur

Re: [Python-Dev] Regular expressions, Unicode etc.

2007-08-08 Thread Nick Maclaren
I am not on "Python 3000", so am restricting. Mike Klaas <[EMAIL PROTECTED]> wrote: > > > I have needed to push my stack to teach REs (don't ask), and am > > taking a look at the RE code. I may be able to extend it to support > > RFE 694374 and (more importantly) atomic groups and possessive > >

Re: [Python-Dev] Regular expressions, Unicode etc.

2007-08-08 Thread Nick Maclaren
[ I would appreciate not getting private copies as well. ] "Martin v. Löwis" <[EMAIL PROTECTED]> wrote: > > Before discussing the escape, I'd like to see a specification of > it first - what characters precisely would classify as "printing"? For basic ASCII and locale-bas

Re: [Python-Dev] Regular expressions, Unicode etc.

2007-08-08 Thread Mike Klaas
On 8-Aug-07, at 2:28 AM, Nick Maclaren wrote: > I have needed to push my stack to teach REs (don't ask), and am > taking a look at the RE code. I may be able to extend it to support > RFE 694374 and (more importantly) atomic groups and possessive > quantifiers. While I regard such things as revo

Re: [Python-Dev] Regular expressions, Unicode etc.

2007-08-08 Thread Martin v. Löwis
> Further to the above, I found the Unicode sources, have rebuilt > the files, but it involved some fairly serious hacking to the > building mechanism and I have had to disable the Unicode 3.2 > support. And, of course, that means that 4 of the tests fail. > > This area needs addressing, not leas

Re: [Python-Dev] Regular expressions, Unicode etc.

2007-08-08 Thread Martin v. Löwis
> My second one is about Unicode. I really, but REALLY regard it as > a serious defect that there is no escape for printing characters. > Any code that checks arbitrary text is likely to need them - yes, > I know why Perl and hence PCRE doesn't have that, but let's skip > that. That is easy to ad
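Python's `re` has no escape for printing or non-printing characters. One way to approximate one, assuming `str.isprintable()` is an acceptable definition, is to generate an explicit character class of all non-printing code points once at start-up. A sketch; `build_nonprinting_class` and `first_nonprinting` are hypothetical helpers, not part of `re`.

```python
import re
import sys

def build_nonprinting_class() -> str:
    """Hypothetical helper: a character class of every code point for which
    str.isprintable() is False, collapsed into contiguous ranges."""
    ranges = []
    start = None
    for cp in range(sys.maxunicode + 1):
        if not chr(cp).isprintable():
            if start is None:
                start = cp  # open a new run of non-printing code points
        elif start is not None:
            ranges.append((start, cp - 1))  # close the current run
            start = None
    if start is not None:
        ranges.append((start, sys.maxunicode))
    parts = []
    for lo, hi in ranges:
        first = re.escape(chr(lo))
        parts.append(first if lo == hi else first + "-" + re.escape(chr(hi)))
    return "[" + "".join(parts) + "]"

NONPRINTING = re.compile(build_nonprinting_class())

def first_nonprinting(text):
    """Index of the first non-printing character, or None."""
    m = NONPRINTING.search(text)
    return None if m is None else m.start()
```

The one-off scan over every code point typically takes well under a second; a real escape built into the engine would avoid that cost.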

Re: [Python-Dev] Regular expressions, Unicode etc.

2007-08-08 Thread Georg Brandl
Nick Maclaren schrieb: > Further to the above, I found the Unicode sources, have rebuilt > the files, but it involved some fairly serious hacking to the > building mechanism and I have had to disable the Unicode 3.2 > support. And, of course, that means that 4 of the tests fail. > > This area nee

Re: [Python-Dev] Regular expressions, Unicode etc.

2007-08-08 Thread Nick Maclaren
Further to the above, I found the Unicode sources, have rebuilt the files, but it involved some fairly serious hacking to the building mechanism and I have had to disable the Unicode 3.2 support. And, of course, that means that 4 of the tests fail. This area needs addressing, not least because Py
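For context on the Unicode 3.2 dependency: CPython keeps a frozen Unicode 3.2 view of its character database as `unicodedata.ucd_3_2_0`, because IDNA (via the `stringprep` module) is defined against that specific version. This sketch shows it reporting a character added long after 3.2 as unassigned.

```python
import unicodedata

old = unicodedata.ucd_3_2_0
emoji = "\U0001F600"  # GRINNING FACE, assigned long after Unicode 3.2

print(old.unidata_version)          # 3.2.0
print(old.category(emoji))          # Cn: unassigned in the 3.2 database
print(unicodedata.category(emoji))  # So in the current database
```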

[Python-Dev] Regular expressions, Unicode etc.

2007-08-08 Thread Nick Maclaren
I have needed to push my stack to teach REs (don't ask), and am taking a look at the RE code. I may be able to extend it to support RFE 694374 and (more importantly) atomic groups and possessive quantifiers. While I regard such things as revolting beyond belief, they make a HELL of a difference t
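Atomic groups and possessive quantifiers were later added natively in Python 3.11 (`(?>...)`, `*+`, `++`, `?+`). On older versions, a well-known emulation of an atomic group is a capturing lookahead followed by a backreference: `(?=(X))\1` consumes what `X` matches but cannot be backtracked into, because Python's lookaheads are themselves atomic. A sketch contrasting it with an ordinary greedy quantifier:

```python
import re
import sys

GREEDY = re.compile(r"\d+\d")          # \d+ may give a digit back to the final \d
ATOMIC = re.compile(r"(?=(\d+))\1\d")  # emulates the atomic group (?>\d+)\d

print(bool(GREEDY.fullmatch("123")))   # True: \d+ backtracks to "12"
print(bool(ATOMIC.fullmatch("123")))   # False: the captured run cannot shrink

if sys.version_info >= (3, 11):
    # Native atomic-group and possessive-quantifier syntax behaves the same way.
    print(bool(re.fullmatch(r"(?>\d+)\d", "123")))  # False
    print(bool(re.fullmatch(r"\d++\d", "123")))     # False
```

Refusing to give back characters is what lets such constructs cut off the exponential backtracking that nested quantifiers can otherwise trigger.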