On Mon, Feb 14, 2022 at 05:13:38PM -0600, Tim Peters wrote:

> An interesting lesson nobody wants to learn: the original major
> string-processing language, SNOBOL, had powerful pattern matching but
> no regexps. Griswold's more modern successor language, Icon, found no
> reason to change that.

I've been interested in the existence of SNOBOL string scanning for a
long time, but I know very little about it.

How does it differ from regexes, and why have programming languages
pretty much standardised on regexes rather than other forms of string
matching?

> Naive regexps are both clumsy and prone to bad
> timing in many tasks that "should be" very easy to express. For
> example, "now match up to the next occurrence of 'X'". In SNOBOL and
> Icon, that's trivial. 75% of regexp users will write ".*X", with scant
> understanding that it may match waaaay more than they intended.

Indeed, I've been bitten by that many times :-)

> Another 20% will write ".*?X", with scant understanding that may
> extend beyond _just_ "the next" X in some cases.

But this surprises me. Do you have an example?

> That leaves the happy
> 5% who write "[^X]*X", which finally says what they intended from the
> start.

Doesn't that only work if X is literally a single character?

>>> import re
>>> string = "This is some spam and extra spam."
>>> re.search('[^spam]*spam', string)
<re.Match object; span=(11, 17), match='e spam'>

Whereas this seems to do what I expected:

>>> re.search('.*?spam', string)
<re.Match object; span=(0, 17), match='This is some spam'>

--
Steve
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/XDTMX2JUSGOBT4KNRSAGJT3BBPDY645Q/
Code of Conduct: http://python.org/psf/codeofconduct/