The "why" question is not very interesting -- it probably wasn't in PCRE and nobody was familiar with it when we moved off PCRE (maybe it wasn't even in Perl at the time -- it was ~15 years ago).
I didn't understand your description of \G so I googled it and found a helpful StackOverflow article:
https://stackoverflow.com/questions/21971701/when-is-g-useful-application-in-a-regex

From this I understand that when using e.g. findall() it forces successive matches to be adjacent.

In general this seems to be a unique property of \G: it preserves *state* from one match to the next. This will make it somewhat difficult to implement -- e.g. that state should probably be thread-local in case multiple threads use the same compiled regex. It's also unclear when that state should be reset. (Only when you compile the regex? Each time you pass it a different source string?)

So I'm not sure it's reasonable to add. But I also don't see a reason why it shouldn't be added -- presuming we can decide on good answers for the questions above about the "scope" of the anchor.

I think it's okay to start a discussion on bugs.python.org about the precise specification of \G for Python. OTOH I expect that most core devs won't find this a very interesting problem (Python relies on regexes for parsing a lot less than Perl does).

Good luck!

On Thu, Oct 26, 2017 at 11:03 PM, Ed Peschko <horo...@gmail.com> wrote:

> All,
>
> perl has a regex assertion (\G) that allows multiple-match regular
> expressions to be able to use the position of the last match. Perl's
> documentation puts it this way:
>
>     \G Match only at pos() (e.g. at the end-of-match position of prior
>     m//g)
>
> Anyways, this is exceedingly powerful for matching regularly
> structured free-form records, and I was really surprised when I found
> out that python did not have it. For example, if findall supported
> this, it would be possible to write things like this (a quick and
> dirty ifconfig parser):
>
> pat = re.compile(r'\G(\S+)(.*?\n)(?=\S+|\Z)', re.S)
>
> val = """
> eth2 Link encap:Ethernet HWaddr xx
>      inet addr: xx.xx.xx.xx Bcast:xx.xx.xx.xx Mask:xx.xx.xx.xx
> ...
> lo Link encap:Local Loopback
>      inet addr:127.0.0.1 Mask:255.0.0.0
> """
> matches = re.findall(pat, val)
>
> So - why doesn't python have this? is it something that simply was
> overlooked, or is there another method of doing the same thing with
> arbitrarily complex freeform records?
>
> thanks much..
> _______________________________________________
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/
> guido%40python.org
>

--
--Guido van Rossum (python.org/~guido)
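
For reference, the "successive matches must be adjacent" behaviour discussed above can be approximated in today's Python without \G. This is only a sketch, not a proposed specification. It relies on the fact that a compiled pattern's match() method accepts a starting position and anchors the match there. The helper name iter_adjacent and the sample data are made up for illustration. The position is a local variable of the generator, so no state is stored on the compiled regex and the thread-safety and reset questions do not arise in this form.

```python
import re


def iter_adjacent(pattern, text, pos=0):
    """Yield matches of `pattern` in `text`, each one starting exactly
    where the previous one ended (hypothetical helper, roughly what
    \\G would enforce)."""
    while pos <= len(text):
        # Pattern.match(string, pos) only succeeds if the match starts at pos.
        m = pattern.match(text, pos)
        if m is None or m.end() == pos:
            # Stop at the first gap; also stop on an empty match so the
            # loop cannot spin forever at the same position.
            break
        yield m
        pos = m.end()


pat = re.compile(r'(\w+)=(\d+);')
data = "a=1;b=2;oops c=3;"

# Adjacent matches stop at the stray "oops ":
adjacent = [m.groups() for m in iter_adjacent(pat, data)]
# adjacent == [('a', '1'), ('b', '2')]

# findall() skips over the gap and keeps going:
everything = pat.findall(data)
# everything == [('a', '1'), ('b', '2'), ('c', '3')]
```

The trade-off is that the caller drives the loop, so this does not help with findall() or sub() directly, which is where a real \G anchor would matter.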