Re: [Python-ideas] What about regexp string litterals : re".*" ?

Abe Dillon Thu, 30 Mar 2017 18:38:37 -0700

> a huge advantage of REs is that they are common to many
> languages. You can take a regex from grep to Perl to your editor to
> Python. They're not absolutely identical, of course, but the basics
> are all the same. Creating a new search language means everyone has to
> learn anew.
> ChrisA

1) I'm not suggesting we get rid of the re module (the VE implementation I
linked requires it)
2) You can easily output regex from verbal expressions
3) verbal expressions are implemented in many different languages too:
https://verbalexpressions.github.io/
4) It even has a generic interface that all implementations are meant to
follow:
https://github.com/VerbalExpressions/implementation/wiki/List-of-methods-to-implement

Note that the entire documentation is 250 words while just the syntax
portion of Python docs for the re module is over 3000 words.

> You think that example is more readable than the proposed translation
>     ^(http)(s)?(\:\/\/)(www\.)?([^\ ]*)$
> which is better written
>     ^https?://(www\.)?[^ ]*$
> or even
>     ^https?://[^ ]*$

Yes. I find it *far* more readable. It's not a soup of symbols like Perl
code. I can only surmise that you're fluent in regex because it seems
difficult for you to see how the above could be less readable than English
words.
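
For reference, the three quoted patterns can be checked against each other with the standard `re` module. This is only a sketch: the sample strings are made up for illustration and are not from the thread.

```python
import re

# The three patterns quoted above, from most to least verbose.
patterns = [
    r"^(http)(s)?(\:\/\/)(www\.)?([^\ ]*)$",
    r"^https?://(www\.)?[^ ]*$",
    r"^https?://[^ ]*$",
]

# Hypothetical sample inputs, chosen only to exercise the anchors.
samples = [
    "http://example.com",
    "https://www.example.com/path",
    "ftp://example.com",
    "see https://example.com",
    "https://example.com trailing",
]


def verdicts(pattern):
    """Return one True/False per sample for the given pattern."""
    compiled = re.compile(pattern)
    return [compiled.search(s) is not None for s in samples]


results = [verdicts(p) for p in patterns]
```

On these samples the three patterns agree with one another, so the disagreement in the thread is about readability rather than behavior.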

> which makes it obvious that the regexp is not very useful from the
> word "^"?  (It matches only URLs which are the only thing, including
> whitespace, on the line, probably not what was intended.)

I could tell it only matches URLs that are the only thing inside the string
because it clearly says:
start_of_line() and end_of_line(). I would have had to refer to a reference
to know that "^" doesn't always mean "not", it sometimes means "start of
string" and probably other things. I would also have to check a reference
to know that "$" can mean "end of string" (and probably other things).
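
A short sketch of those different meanings, using only Python's standard `re` module (the test strings are illustrative):

```python
import re

# "^" at the start of a pattern anchors the match to the start of the string.
anchored_hit = re.search(r"^abc", "abcdef") is not None
anchored_miss = re.search(r"^abc", "xabc") is None

# "^" as the first character inside [...] negates the character class.
non_space_runs = re.findall(r"[^ ]+", "a b  c")

# "$" matches at the end of the string, and also just before a trailing newline.
end_hit = re.search(r"abc$", "xabc") is not None
end_before_newline = re.search(r"abc$", "xabc\n") is not None

# With re.MULTILINE, "^" and "$" also match at the start and end of each line.
per_line = re.findall(r"^\w+$", "one\ntwo", flags=re.MULTILINE)
whole_string_only = re.findall(r"^\w+$", "one\ntwo")
```

So the same two symbols carry at least three meanings depending on position and flags, which is the kind of thing a reader has to look up.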

> Are those groups capturing in Verbal Expressions?  The use of "find"
> (~ "search") rather than "match" is disconcerting to the experienced
> user.

You can alternatively use the word "then". The source code is just one Python
file. It's very easy to read. I actually like "then" over "find" for the
example:

# Parenthesized so the method chain can span several lines.
tester = (verbal_expression.start_of_line()
    .then('http')
    .maybe('s')
    .then('://')
    .maybe('www.')
    .anything_but(' ')
    .end_of_line())
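
To make it concrete how a chain like this can sit on top of the re module and hand back an ordinary regex, here is a minimal sketch. `TinyVerEx` is a hypothetical stand-in written for this illustration, not the actual Verbal Expressions source; only the method names are borrowed from the example above, and the grouping choices are assumptions.

```python
import re


class TinyVerEx:
    """Hypothetical, minimal stand-in for a verbal-expression builder."""

    def __init__(self):
        self._parts = []

    def _add(self, fragment):
        self._parts.append(fragment)
        return self  # returning self is what makes chaining possible

    def start_of_line(self):
        return self._add("^")

    def end_of_line(self):
        return self._add("$")

    def then(self, text):
        # Literal text, escaped so that "." and friends lose their special meaning.
        return self._add("(?:" + re.escape(text) + ")")

    def maybe(self, text):
        return self._add("(?:" + re.escape(text) + ")?")

    def anything_but(self, chars):
        return self._add("[^" + re.escape(chars) + "]*")

    def source(self):
        """Return the plain regex string built so far."""
        return "".join(self._parts)

    def compile(self):
        return re.compile(self.source())


url = (TinyVerEx()
    .start_of_line()
    .then('http')
    .maybe('s')
    .then('://')
    .maybe('www.')
    .anything_but(' ')
    .end_of_line())

pattern = url.compile()
```

Here `url.source()` returns a regex string that can be pasted into grep, Perl, or an editor, and `pattern` is a normal compiled `re` object.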

> What does alternation look like?

.OR(option1).OR(option2).OR(option3)...

> How about alternation of
> non-trivial regular expressions?

.OR(other_verbal_expression)
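
In plain `re` terms, alternation of non-trivial expressions amounts to joining the sub-patterns with "|" inside a group. The sketch below shows that idea only; `either` is a hypothetical helper, not part of any Verbal Expressions implementation, and the host pattern is deliberately simplistic.

```python
import re


def either(*alternatives):
    """Join already-built regex sources into one non-capturing alternation."""
    return "(?:" + "|".join("(?:" + a + ")" for a in alternatives) + ")"


# Alternation of simple literals.
scheme = either(re.escape("http"), re.escape("https"), re.escape("ftp"))

# Alternation where one branch is itself a non-trivial expression.
host = either(r"localhost", r"[a-z0-9-]+(?:\.[a-z0-9-]+)+")

url = re.compile("^" + scheme + "://" + host + "$")
```

Wrapping each branch in its own group keeps the "|" from leaking into neighboring parts of the pattern, which is the detail that is easy to get wrong when writing the string by hand.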

> As far as I can see, Verbal Expressions are basically a way of making
> it so painful to write regular expressions that people will restrict
> themselves to regular expressions

What's so painful to write about them? Does your IDE not have
autocompletion? I find REs so painful to write that I usually just use
string methods if at all feasible.
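
As an illustration of the string-method route, the URL check discussed earlier can be approximated without any regex. `looks_like_url` is a name invented for this sketch.

```python
def looks_like_url(text):
    """Rough string-method counterpart of ^https?://[^ ]*$ (illustrative only)."""
    has_scheme = text.startswith(("http://", "https://"))
    return has_scheme and " " not in text
```

`str.startswith` accepts a tuple of prefixes, which covers the optional "s" without any special syntax.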

> I don't think that this failure to respect the
> developer's taste is restricted to this particular implementation,
> either.

I generally find it distasteful to write a pseudolanguage in strings inside
of other languages (this applies to SQL as well). Especially when the
design principles of that pseudolanguage are *diametrically opposed* to the
design principles of the host language. A key principle of Python's design
is: "you read code a lot more often than you write code, so emphasize
readability". Regex seems to be based on: "Do the most with the fewest
keystrokes. Readability be damned!". It makes a lot more sense to wrap the
pseudolanguage in constructs that bring it in line with the host language
than to take on the mental burden of trying to comprehend two different
languages at the same time.

If you disagree, nothing's stopping you from continuing to write REs the
old-fashioned way. Can we at least agree that baking special re syntax
directly into the language is a bad idea?

On Wed, Mar 29, 2017 at 11:49 PM, Nick Coghlan <ncogh...@gmail.com> wrote:

> On 28 March 2017 at 01:17, Simon D. <si...@acoeuro.com> wrote:
> > It would ease the use of regexps in Python
>
> We don't really want to ease the use of regexps in Python - while
> they're an incredibly useful tool in a programmer's toolkit, they're
> so cryptic that they're almost inevitably a maintainability nightmare.
>
> Baking them directly into the language runtime also locks people in to
> a particular regex engine implementation, rather than being able to
> swap in a third party one if they choose to do so (as many folks
> currently do with the `regex` PyPI module).
>
> So it's appropriate to keep them as a string-based library level
> capability, and hence on a relatively level playing field with less
> comprehensive, but typically easier to maintain, options like string
> methods and third party text parsing libraries (such as
> https://pypi.python.org/pypi/parse for something close to the inverse
> of str.format)
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
> _______________________________________________
> Python-ideas mailing list
> Python-ideas@python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
