On 25 Mar 2020, at 9:48, Stephen J. Turnbull wrote:

> Walter Dörwald writes:
>
> > A `find()` that supports multiple search strings (and returns the
> > leftmost position where a search string can be found) is a great
> > help in implementing some kind of tokenizer:
>
> In other words, you want the equivalent of Emacs's "(search-forward
> (regexp-opt list-of-strings))", which also meets the requirement of
> returning which string was found (as "(match-string 0)").

Sounds like it. I'm not familiar with Emacs.

> Since Python already has a functionally similar API for regexps, we
> can add a regexp-opt (with appropriate name) method to re, perhaps as
> .compile_string_list(), and provide a convenience function
> re.search_string_list() for your application.

If you're using regexps anyway, building the appropriate or-expression shouldn't be a problem. I guess that's what most lexers/tokenizers do anyway.
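For illustration, here is a minimal sketch of building such an or-expression, borrowing the `compile_string_list` name from Stephen's suggestion (the name and exact API are only a proposal, not anything in `re` today):

```python
import re

def compile_string_list(strings):
    # Sort longest-first so overlapping needles (e.g. "==" vs "=")
    # match greedily, the way Emacs's regexp-opt arranges alternatives.
    escaped = sorted(map(re.escape, strings), key=len, reverse=True)
    return re.compile("|".join(escaped))

pattern = compile_string_list(["==", "=", "->"])
m = pattern.search("x -> y == z")
# m.start() gives the leftmost position, m.group(0) the string found
```

`re.escape` keeps literal needles such as `->` from being misread as regexp syntax, which is the main pitfall when hand-building the alternation.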

> I'm applying practicality before purity, of course.  To some extent
> we want to encourage simple string approaches, and putting this in
> regex is not optimal for that.

Exactly. I'm always a bit hesitant to use regexps when a simpler string approach would do.
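As an illustration of such a simpler approach, a multi-needle `find()` can be written with plain string methods (the name `multi_find` is hypothetical, not a proposed API):

```python
def multi_find(s, needles, start=0):
    # Return (position, needle) for the leftmost occurrence of any
    # needle in s at or after start, or (-1, None) if none is found.
    best_pos, best_needle = -1, None
    for needle in needles:
        pos = s.find(needle, start)
        if pos != -1 and (best_pos == -1 or pos < best_pos):
            best_pos, best_needle = pos, needle
    return best_pos, best_needle
```

Note that ties at the same position go to the needle listed first; a tokenizer would typically order needles longest-first to get the same greedy behaviour as the regexp alternation.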

> Steve

Servus,
   Walter
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/46KMMKYHW7DIDNZFO27GNQCJVILNSQ6Q/
Code of Conduct: http://python.org/psf/codeofconduct/
