On 2022-02-15 06:05, Tim Peters wrote:
[Steven D'Aprano <st...@pearwood.info>]
I've been interested in the existence of SNOBOL string scanning for
a long time, but I know very little about it.

How does it differ from regexes, and why have programming languages
pretty much standardised on regexes rather than other forms of string
matching?

What we call "regexps" today contain all sorts of things that aren't
in the original formal definition of "regular expressions". For
example, even the ubiquitous "^" and "$" (start- and end-of-line
assertions) go beyond what the phrase formally means.
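
To make that concrete, here is a small sketch using Python's `re` module (the sample strings are made up for illustration). It shows that "^" and "$" consume no characters at all: they assert a position, which the formal definition of a regular expression has no notion of.

```python
import re

# "^" matches at a position and consumes nothing: the match is the
# empty string at index 0.
start = re.search(r"^", "abc").span()

# With re.MULTILINE, "^" and "$" assert the start and end of each line.
# Only lines consisting of a single word survive.
text = "one\ntwo words\nthree"
whole_lines = re.findall(r"^\w+$", text, re.MULTILINE)

print(start)        # (0, 0)
print(whole_lines)  # ['one', 'three']
```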

So the question is ill-defined. Since Perl added recursive regular
expressions, I'm not sure any real difference in theoretical
capability remains. Without recursion, though, you can't, for example,
write a regular expression that matches strings with balanced
parentheses ("regexps can't count"), while I earlier posted a simple
2-liner in SNOBOL that implements such a thing (patterns in SNOBOL can
freely invoke other patterns, including themselves).
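
This is not the SNOBOL 2-liner mentioned above, only a rough Python sketch of the same idea: a matcher that invokes itself for each nested group, which is exactly the step a classical regular expression cannot take. The names `match_balanced` and `is_balanced` are hypothetical, chosen for this illustration.

```python
def match_balanced(s: str, i: int = 0) -> int:
    """Return the index just past the longest balanced run starting at s[i]."""
    while i < len(s):
        if s[i] == "(":
            # Recurse to match the contents of the nested group.
            j = match_balanced(s, i + 1)
            if j >= len(s) or s[j] != ")":
                return i  # unmatched "(": stop before it
            i = j + 1     # step over the closing ")"
        elif s[i] == ")":
            return i      # leave the ")" for the caller to consume
        else:
            i += 1        # any other character is trivially balanced
    return i


def is_balanced(s: str) -> bool:
    """True if the whole string has balanced parentheses."""
    return match_balanced(s) == len(s)


print(is_balanced("(a(b)c)"))  # True
print(is_balanced("(()"))      # False
print(is_balanced(")("))       # False
```

The recursion depth tracks the nesting depth, which is the "counting" that a finite automaton has no memory for.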

As to why regexps prevailed, traction! They are useful tools, and
_started_ life as pretty simple things, with small, elegant, and
efficient implementations. Feature creep and "faster! faster! faster!"
turned the implementations more into bottomless pits now ;-)

Adoption breeds more adoption in the computer world. They have no real
competition anymore. The same sociological illness has also cursed us,
e.g., with an eternity of floating point signed zeroes ;--)

Chris didn't say this, but I will: I'm amazed that things much
_simpler_ than regexps, like his scanf and REXX PARSE
examples, haven't spread more. Simple solutions to simple problems are
very appealing to me. Although, to be fair, I get a kick too out of
massive overkill ;-)

Regexes were simple to start with, so only a few metacharacters were needed, the remaining characters being treated as literals.

As new features were added, they had to remain backwards-compatible, so the existing metacharacters were reused in combinations that had been illegal until then.
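
A small sketch of what that looks like in Python's `re` module (the sample strings are invented for illustration). As far as I know, a "?" with nothing before it to quantify was an error in the older syntaxes, which is what freed up "(?" for extensions, and a "?" directly after another quantifier was likewise available to mean "lazy".

```python
import re

# A bare "?" with nothing to repeat is still rejected today.
try:
    re.compile(r"?a")
    bare_question_ok = True
except re.error:
    bare_question_ok = False

# "(?" was repurposed as the gateway to extensions:
# a non-capturing group ...
non_capturing = re.fullmatch(r"(?:ab)+", "ababab")
# ... and named groups.
date = re.match(r"(?P<year>\d{4})-(?P<month>\d{2})", "2022-02-15")

# "?" after a quantifier was repurposed to make that quantifier lazy.
greedy = re.match(r"<.*>", "<a><b>").group()
lazy = re.match(r"<.*?>", "<a><b>").group()

print(bare_question_ok)        # False
print(non_capturing.groups())  # ()
print(date.group("year"))      # 2022
print(greedy, lazy)            # <a><b> <a>
```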

On top of that, there are multiple implementations whose features and behaviours differ, sometimes only slightly.

It's a good example of evolution: often messy, and resulting in clunky designs.
