On Mon, Feb 14, 2022 at 05:13:38PM -0600, Tim Peters wrote:

> An interesting lesson nobody wants to learn: the original major
> string-processing language, SNOBOL, had powerful pattern matching but
> no regexps. Griswold's more modern successor language, Icon, found no
> reason to change that.

I've been curious about SNOBOL string scanning for a long time, but 
I know very little about it.

How does it differ from regexes, and why have programming languages 
pretty much standardised on regexes rather than other forms of string 
matching?


> Naive regexps are both clumsy and prone to bad
> timing in many tasks that "should be" very easy to express. For
> example, "now match up to the next occurrence of 'X'". In SNOBOL and
> Icon, that's trivial. 75% of regexp users will write ".*X", with scant
> understanding that it may match waaaay more than they intended.

Indeed, I've been bitten by that many times :-)
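
For anyone who hasn't hit it yet, here is a minimal sketch of that failure 
mode. The sample string and the names `text` and `greedy` are made up for 
illustration; they are not from Tim's post.

```python
import re

# Hypothetical sample text containing more than one 'X'.
text = "aaXbbXcc"

# Greedy '.*' consumes as much as it can, then backs off only as far as
# it must, so the match ends at the LAST 'X', not the next one.
greedy = re.match(r".*X", text)
print(greedy.group())
```

The match here runs through the second 'X', which is rarely what 
"up to the next X" was meant to say.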


> Another 20% will write ".*?X", with scant understanding that it may
> extend beyond _just_ "the next" X in some cases.

But this surprises me. Do you have an example?
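
The only case I can construct myself is when more pattern follows the X, 
so that backtracking can push the lazy match past the first X. This is my 
guess at what is meant, not Tim's example; the sample string and variable 
names are hypothetical.

```python
import re

# Hypothetical sample: the first 'X' is NOT followed by '!'.
text = "aXbX!"

# Lazy '.*?' stops at the first 'X', but '!' then fails to match, so the
# engine backtracks and lets '.*?' grow past that 'X' to the second one.
lazy = re.match(r".*?X!", text)
print(lazy.group())

# '[^X]*' can never step over an 'X', so this pattern fails outright
# instead of silently matching something longer.
strict = re.match(r"[^X]*X!", text)
print(strict)
```

If that is the sort of case intended, then ".*?X" only means "the next X" 
when nothing after it in the pattern can fail.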

> That leaves the happy
> 5% who write "[^X]*X", which finally says what they intended from the
> start.

Doesn't that only work if X is literally a single character?

>>> import re
>>> string = "This is some spam and extra spam."
>>> re.search('[^spam]*spam', string)
<re.Match object; span=(11, 17), match='e spam'>

Whereas this seems to do what I expected:

>>> re.search('.*?spam', string)
<re.Match object; span=(0, 17), match='This is some spam'>
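
For completeness, a rough sketch of two ways a multi-character analogue of 
"[^X]*X" might be written. Neither was proposed in this thread; the 
negative-lookahead pattern and the str.find fallback are my own 
assumptions about how one could express "up to the next 'spam'".

```python
import re

string = "This is some spam and extra spam."

# Advance one character at a time, but only while 'spam' does not start
# at the current position; then require 'spam' itself.
m = re.search(r"(?:(?!spam).)*spam", string)
print(m.span(), m.group())

# Without regexes: str.find returns the index of the next occurrence,
# or -1 if there is none.
idx = string.find("spam")
upto = string[:idx + len("spam")] if idx != -1 else None
print(upto)
```

Both give the same span as '.*?spam' on this input, but the lookahead 
version cannot be stretched past the first 'spam' by backtracking.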


-- 
Steve
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/XDTMX2JUSGOBT4KNRSAGJT3BBPDY645Q/
Code of Conduct: http://python.org/psf/codeofconduct/
