Re: [Python-Dev] Regular expressions: splitting on zero-width patterns

2017-11-28 Thread MRAB
On 2017-11-28 22:27, Guido van Rossum wrote: On Tue, Nov 28, 2017 at 2:23 PM, MRAB wrote: On 2017-11-28 20:04, Serhiy Storchaka wrote: The two largest problems in the re module are splitting on zero-width patterns and complete and

Re: [Python-Dev] Regular expressions: splitting on zero-width patterns

2017-11-28 Thread Guido van Rossum
On Tue, Nov 28, 2017 at 2:23 PM, MRAB wrote: > On 2017-11-28 20:04, Serhiy Storchaka wrote: > >> The two largest problems in the re module are splitting on zero-width >> patterns and complete and correct support of the Unicode standard. These >> problems are solved in regex. regex has many other

Re: [Python-Dev] Regular expressions: splitting on zero-width patterns

2017-11-28 Thread MRAB
On 2017-11-28 20:04, Serhiy Storchaka wrote: The two largest problems in the re module are splitting on zero-width patterns and complete and correct support of the Unicode standard. These problems are solved in regex. regex has many other features, but they are less important. I want to tell the

Re: [Python-Dev] Regular expressions: splitting on zero-width patterns

2017-11-28 Thread Guido van Rossum
I trust your instincts and powers of analysis here. Maybe MRAB has some useful feedback on the tar in the honey? On Tue, Nov 28, 2017 at 12:04 PM, Serhiy Storchaka wrote: > The two largest problems in the re module are splitting on zero-width > patterns and complete and correct support of the Un

[Python-Dev] Regular expressions: splitting on zero-width patterns

2017-11-28 Thread Serhiy Storchaka
The two largest problems in the re module are splitting on zero-width patterns and complete and correct support of the Unicode standard. These problems are solved in regex. regex has many other features, but they are less important. I want to tell the problem of splitting on zero-width pattern
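As a rough illustration of what "splitting on zero-width patterns" means, here is a minimal sketch. It assumes Python 3.7 or later, where `re.split()` splits at empty matches; earlier versions did not split on them. The sample strings are made up for illustration and are not taken from the thread.

```python
import re

text = "Words, words"

# \b matches the empty string at word boundaries, so it is a zero-width
# pattern. On Python 3.7+ re.split() splits at each such empty match.
parts = re.split(r"\b", text)
print(parts)

# A lookahead is also zero-width: split just before each capital letter
# without consuming it.
camel = re.split(r"(?=[A-Z])", "splitOnZeroWidth")
print(camel)  # ['split', 'On', 'Zero', 'Width']
```

Because nothing is consumed by a zero-width match, joining the pieces back together reproduces the original string.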

Re: [Python-Dev] Regular expressions, Unicode etc.

2007-08-10 Thread Greg Ewing
Nick Maclaren wrote: > You can convert them to things that are sort of NFA/DFA > hybrids, If you could express it as an NFA, then you could (in principle) convert it to a DFA. So whatever it's using can't be an NFA either. -- Greg

Re: [Python-Dev] Regular expressions, Unicode etc.

2007-08-10 Thread Nick Maclaren
James Y Knight wrote: > > > Firstly, things like backreferences are an absolute no-no. They > > are not regular, and REs with them in cannot be converted to DFAs. > > That could be 'solved' by a parser that kicked out such constructions, > > but it would get screams from many u

Re: [Python-Dev] Regular expressions, Unicode etc.

2007-08-10 Thread Nick Maclaren
Martin v. Löwis wrote: > > Your specification was "For Unicode, whatever people agree!" > > I would not call that "Unicode-based". Can we drop this, please? I am happy to agree that I was being unclear (it is a common failing of mine), but I did prov

Re: [Python-Dev] Regular expressions, Unicode etc.

2007-08-10 Thread Greg Ewing
James Y Knight wrote: > On Aug 8, 2007, at 3:47 PM, Nick Maclaren wrote: > > Firstly, things like backreferences are an absolute no-no. They > > are not regular, and REs with them in cannot be converted to DFAs. > > People keep saying things like this as if GNU grep and tcl's regular > expressio

Re: [Python-Dev] Regular expressions, Unicode etc.

2007-08-09 Thread James Y Knight
On Aug 8, 2007, at 3:47 PM, Nick Maclaren wrote: > Firstly, things like backreferences are an absolute no-no. They > are not regular, and REs with them in cannot be converted to DFAs. > That could be 'solved' by a parser that kicked out such constructions, > but it would get screams from many user
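To make the quoted point about backreferences concrete, here is a small sketch using Python's `re` module. The pattern and test strings are illustrative assumptions, not taken from the thread. The pattern accepts a run of `a`s, a `b`, and then exactly the same run again; that language is not regular, which is why a pure DFA cannot recognise it.

```python
import re

# (a+) captures a run of 'a's; \1 requires the identical run after the 'b'.
# Matching "a^n b a^n" needs unbounded memory of n, so it is not regular.
pattern = re.compile(r"(a+)b\1")

results = {}
for candidate in ["aba", "aabaa", "aaba", "abaa"]:
    # fullmatch() succeeds only if the whole string fits the pattern.
    results[candidate] = pattern.fullmatch(candidate) is not None
    print(candidate, results[candidate])
```

Only the strings with equal runs on both sides of the `b` are accepted.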

Re: [Python-Dev] Regular expressions, Unicode etc.

2007-08-09 Thread Greg Ewing
Martin v. Löwis wrote: > I know the term "printable character", which is what I read > in definitions of the isprint() routine. "printing character" > I never heard before. Hmmm... I guess this means your brain is using a part-of-speech-sensitive word->technical_meaning mapping. Perhaps this will

Re: [Python-Dev] Regular expressions, Unicode etc.

2007-08-09 Thread Martin v. Löwis
Nick Maclaren schrieb: >> The relevance is that your specification of "printing character" >> as "isprint returns true" is nearly useless, as it only applies >> to byte-oriented characters. > > Eh? That's ALL I used it to specify! I used a Unicode-based > specification for Unicode. Your specifi

Re: [Python-Dev] Regular expressions, Unicode etc.

2007-08-09 Thread Nick Maclaren
Martin v. Löwis wrote: > > There is no problem for isalnum: it will just go away if > byte-oriented characters go away. Fortunately, we have a > replacement for the Unicode case. As we do for isprint. > The relevance is that your specification of "pr

Re: [Python-Dev] Regular expressions, Unicode etc.

2007-08-08 Thread Martin v. Löwis
>> In the mediate term, locale-based testing will go away/be not >> implementable (in particular, Py3k won't have a byte-oriented >> character string type, so we can't use isprint). In general, >> isprint is unsuitable since it doesn't support multi-byte >> character sets. > > Well, iswprint isn't

Re: [Python-Dev] Regular expressions, Unicode etc.

2007-08-08 Thread Nick Maclaren
Martin v. Löwis wrote: > > >> Before discussing the escape, I'd like to see a specification of > >> it first - what characters precisely would classify as "printing"? > > > > For basic ASCII and locale-based testing, whatever isprint() says. > > Just

Re: [Python-Dev] Regular expressions, Unicode etc.

2007-08-08 Thread Martin v. Löwis
>> Before discussing the escape, I'd like to see a specification of >> it first - what characters precisely would classify as "printing"? > > For basic ASCII and locale-based testing, whatever isprint() says. > Just as for isalpha(). In the mediate term, locale-based testing will go away/be not i

Re: [Python-Dev] Regular expressions, Unicode etc.

2007-08-08 Thread Mike Klaas
On 8-Aug-07, at 12:47 PM, Nick Maclaren wrote: > >>> The other approach, which is to stick to true regular expressions, >>> and wholly or partially convert to DFAs, has already been rendered >>> impossible by even the limited Perl/PCRE extensions that Python >>> has adopted. >> >> Impossible? Sur

Re: [Python-Dev] Regular expressions, Unicode etc.

2007-08-08 Thread Nick Maclaren
I am not on "Python 3000", so am restricting. Mike Klaas wrote: > > > I have needed to push my stack to teach REs (don't ask), and am > > taking a look at the RE code. I may be able to extend it to support > > RFE 694374 and (more importantly) atomic groups and possessive > >

Re: [Python-Dev] Regular expressions, Unicode etc.

2007-08-08 Thread Nick Maclaren
[ I would appreciate not getting private copies as well. ] Martin v. Löwis wrote: > > Before discussing the escape, I'd like to see a specification of > it first - what characters precisely would classify as "printing"? For basic ASCII and locale-bas

Re: [Python-Dev] Regular expressions, Unicode etc.

2007-08-08 Thread Mike Klaas
On 8-Aug-07, at 2:28 AM, Nick Maclaren wrote: > I have needed to push my stack to teach REs (don't ask), and am > taking a look at the RE code. I may be able to extend it to support > RFE 694374 and (more importantly) atomic groups and possessive > quantifiers. While I regard such things as revo

Re: [Python-Dev] Regular expressions, Unicode etc.

2007-08-08 Thread Martin v. Löwis
> Further to the above, I found the Unicode sources, have rebuilt > the files, but it involved some fairly serious hacking to the > building mechanism and I have had to disable the Unicode 3.2 > support. And, of course, that means that 4 of the tests fail. > > This area needs addressing, not leas

Re: [Python-Dev] Regular expressions, Unicode etc.

2007-08-08 Thread Martin v. Löwis
> My second one is about Unicode. I really, but REALLY regard it as > a serious defect that there is no escape for printing characters. > Any code that checks arbitrary text is likely to need them - yes, > I know why Perl and hence PCRE doesn't have that, but let's skip > that. That is easy to ad

Re: [Python-Dev] Regular expressions, Unicode etc.

2007-08-08 Thread Georg Brandl
Nick Maclaren schrieb: > Further to the above, I found the Unicode sources, have rebuilt > the files, but it involved some fairly serious hacking to the > building mechanism and I have had to disable the Unicode 3.2 > support. And, of course, that means that 4 of the tests fail. > > This area nee

Re: [Python-Dev] Regular expressions, Unicode etc.

2007-08-08 Thread Nick Maclaren
Further to the above, I found the Unicode sources, have rebuilt the files, but it involved some fairly serious hacking to the building mechanism and I have had to disable the Unicode 3.2 support. And, of course, that means that 4 of the tests fail. This area needs addressing, not least because Py

[Python-Dev] Regular expressions, Unicode etc.

2007-08-08 Thread Nick Maclaren
I have needed to push my stack to teach REs (don't ask), and am taking a look at the RE code. I may be able to extend it to support RFE 694374 and (more importantly) atomic groups and possessive quantifiers. While I regard such things as revolting beyond belief, they make a HELL of a difference t

Re: [Python-Dev] Regular expressions

2005-11-24 Thread Dennis Allison
This is probably OT for [Python-dev] I suspect that your problem is not the GIL but is due to something else. Rather than dorking with the interpreter's threading, you probably would be better off rethinking your problem and finding a better way to accomplish your task. On Thu, 24 Nov 2005, Du