Re: [Python-ideas] What about regexp string litterals : re".*" ?

2017-04-03 Thread Neil Girdhar
I've tried PyParsing.  I haven't tried Grako.

On Mon, Apr 3, 2017 at 8:54 AM Ryan Gonzalez  wrote:

> Have you tried PyParsing and/or Grako? They're some of my favorites (well,
> I like PLY too, but I'm thinking you wouldn't like it too much).
>
> --
> Ryan (ライアン)
> Yoko Shimomura > ryo (supercell/EGOIST) > Hiroyuki Sawano >> everyone else
> http://refi64.com
>
> On Apr 3, 2017 3:26 AM, "Neil Girdhar"  wrote:
>
>
>
> On Mon, Apr 3, 2017 at 2:31 AM Mark Lawrence via Python-ideas <
> python-ideas@python.org> wrote:
>
> On 03/04/2017 02:22, Neil Girdhar wrote:
> > Same.  One day, Python will have a decent parsing library.
> >
>
> Nothing here https://wiki.python.org/moin/LanguageParsing suits your
> needs?
>
>
> No, unfortunately.
>
> I tried to make a simple grammar that parses latex code, and it was
> basically impossible with these tools.
>
> From what I remember, you need the match objects to be able to accept or
> reject their matched sub-nodes.
>
> It's the same thing if you want to parse Python in one pass (not the usual two
> passes that CPython does whereby it creates an AST and then validates it).
> It would be cooler to validate as you go, since the errors can be much
> richer when you have the whole parsing context.
>
> It's been a while, so I might be forgetting something, but I remember
> thinking that I'll check back in five years and see if anything new has
> come out.
>
>
> --
> My fellow Pythonistas, ask not what our language can do for you, ask
> what you can do for our language.
>
> Mark Lawrence
>
> ___
> Python-ideas mailing list
> Python-ideas@python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] What about regexp string litterals : re".*" ?

2017-04-03 Thread Ryan Gonzalez
Have you tried PyParsing and/or Grako? They're some of my favorites (well,
I like PLY too, but I'm thinking you wouldn't like it too much).

--
Ryan (ライアン)
Yoko Shimomura > ryo (supercell/EGOIST) > Hiroyuki Sawano >> everyone else
http://refi64.com

On Apr 3, 2017 3:26 AM, "Neil Girdhar"  wrote:

>
>
> On Mon, Apr 3, 2017 at 2:31 AM Mark Lawrence via Python-ideas <
> python-ideas@python.org> wrote:
>
>> On 03/04/2017 02:22, Neil Girdhar wrote:
>> > Same.  One day, Python will have a decent parsing library.
>> >
>>
>> Nothing here https://wiki.python.org/moin/LanguageParsing suits your
>> needs?
>>
>
> No, unfortunately.
>
> I tried to make a simple grammar that parses latex code, and it was
> basically impossible with these tools.
>
> From what I remember, you need the match objects to be able to accept or
> reject their matched sub-nodes.
>
> It's the same thing if you want to parse Python in one pass (not the usual two
> passes that CPython does whereby it creates an AST and then validates it).
> It would be cooler to validate as you go, since the errors can be much
> richer when you have the whole parsing context.
>
> It's been a while, so I might be forgetting something, but I remember
> thinking that I'll check back in five years and see if anything new has
> come out.
>
>>
>> --
>> My fellow Pythonistas, ask not what our language can do for you, ask
>> what you can do for our language.
>>
>> Mark Lawrence
>>


Re: [Python-ideas] What about regexp string litterals : re".*" ?

2017-04-03 Thread Neil Girdhar
On Mon, Apr 3, 2017 at 2:31 AM Mark Lawrence via Python-ideas <
python-ideas@python.org> wrote:

> On 03/04/2017 02:22, Neil Girdhar wrote:
> > Same.  One day, Python will have a decent parsing library.
> >
>
> Nothing here https://wiki.python.org/moin/LanguageParsing suits your
> needs?
>

No, unfortunately.

I tried to make a simple grammar that parses latex code, and it was
basically impossible with these tools.

From what I remember, you need the match objects to be able to accept or
reject their matched sub-nodes.

It's the same thing if you want to parse Python in one pass (not the usual two
passes that CPython does whereby it creates an AST and then validates it).
It would be cooler to validate as you go, since the errors can be much
richer when you have the whole parsing context.
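
A minimal sketch of the "validate as you go" idea, assuming a toy LaTeX-like
input made only of plain text and \begin{name}...\end{name} blocks. The names
ParseError, TOKEN and parse are hypothetical and do not come from any of the
libraries mentioned in this thread. The point is only that the rule closing an
environment inspects what it matched and can reject it while the parsing
context is still at hand:

```python
import re


class ParseError(Exception):
    """Raised as soon as a matched construct is rejected."""


# Either \begin{name} / \end{name}, or a run of text without backslashes.
TOKEN = re.compile(r"\\(begin|end)\{(\w+)\}|([^\\]+)")


def parse(text):
    """Parse nested environments in one pass, validating while parsing."""
    pos = 0
    stack = [("document", [])]  # (environment name, children so far)
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ParseError(f"unexpected input at offset {pos}")
        kind, name, plain = m.groups()
        if kind == "begin":
            stack.append((name, []))
        elif kind == "end":
            open_name, children = stack.pop()
            # Accept or reject the sub-tree right here, with full context.
            if open_name != name or not stack:
                raise ParseError(
                    f"\\end{{{name}}} at offset {pos} "
                    f"closes \\begin{{{open_name}}}"
                )
            stack[-1][1].append((name, children))
        else:
            stack[-1][1].append(plain)
        pos = m.end()
    if len(stack) != 1:
        raise ParseError(f"unclosed \\begin{{{stack[-1][0]}}}")
    return stack[0][1]
```

Because the check runs at the moment the closing token is seen, the error
message can name both the offending token and the environment it was expected
to close, which a separate validation pass over a finished tree would have to
reconstruct.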

It's been a while, so I might be forgetting something, but I remember
thinking that I'll check back in five years and see if anything new has
come out.

>
> --
> My fellow Pythonistas, ask not what our language can do for you, ask
> what you can do for our language.
>
> Mark Lawrence
>


Re: [Python-ideas] What about regexp string litterals : re".*" ?

2017-04-03 Thread Mark Lawrence via Python-ideas

On 03/04/2017 02:22, Neil Girdhar wrote:

Same.  One day, Python will have a decent parsing library.



Nothing here https://wiki.python.org/moin/LanguageParsing suits your needs?

--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence



Re: [Python-ideas] What about regexp string litterals : re".*" ?

2017-04-02 Thread Neil Girdhar
Same.  One day, Python will have a decent parsing library.

On Friday, March 31, 2017 at 4:21:51 AM UTC-4, Stephan Houben wrote:
>
> Hi all, 
>
> FWIW, I also strongly prefer the Verbal Expression style and consider 
> "normal" regular expressions to become quickly unreadable and 
> unmaintainable. 
>
> Verbal Expressions are also much more composable. 
>
> Stephan 
>
> 2017-03-31 9:23 GMT+02:00 Stephen J. Turnbull:
> > Abe Dillon writes: 
> > 
> >  > Note that the entire documentation is 250 words while just the syntax 
> >  > portion of Python docs for the re module is over 3000 words. 
> > 
> > Since Verbal Expressions (below, VEs, indicating notation) "compile" 
> > to regular expressions (spelling out indicates the internal matching 
> > implementation), the documentation of VEs presumably ignores 
> > everything except the limited language it's useful for.  To actually 
> > understand VEs, you need to refer to the RE docs.  Not a win IMO. 
> > 
> >  > > You think that example is more readable than the proposed translation
> >  > > ^(http)(s)?(\:\/\/)(www\.)?([^\ ]*)$ 
> >  > > which is better written 
> >  > > ^https?://(www\.)?[^ ]*$ 
> >  > > or even 
> >  > > ^https?://[^ ]*$ 
> >  > 
> >  > 
> >  > Yes. I find it *far* more readable. It's not a soup of symbols like Perl
> >  > code. I can only surmise that you're fluent in regex because it seems
> >  > difficult for you to see how the above could be less readable than English
> >  > words.
> > 
> > Yes, I'm fairly fluent in regular expression notation (below, REs). 
> > I've maintained a compiler for one dialect. 
> > 
> > I'm not interested in the difference between words and punctuation 
> > though.  The reason I find the middle RE most readable is that it 
> > "looks like" what it's supposed to match, in a contiguous string as 
> > the object it will match will be contiguous.  If I need to parse it to 
> > figure out *exactly* what it matches, yes, that takes more effort. 
> > But to understand a VE's semantics correctly, I'd have to look it up 
> > as often as you have to look up REs because many words chosen to notate 
> > VEs have English meanings that are (a) ambiguous, as in all natural 
> > language, and (b) only approximate matches to RE semantics. 
> > 
> >  > I could tell it only matches URLs that are the only thing inside 
> >  > the string because it clearly says: start_of_line() and 
> >  > end_of_line(). 
> > 
> > That's not the problem.  The problem is the semantics of the method 
> > "find".  "then" would indeed read better, although it doesn't exactly 
> > match the semantics of concatenation in REs. 
> > 
> >  > I would have had to refer to a reference to know that "^" doesn't 
> >  > always mean "not", it sometimes means "start of string" and 
> >  > probably other things. I would also have to check a reference to 
> >  > know that "$" can mean "end of string" (and probably other things). 
> > 
> > And you'll still have to do that when reading other people's REs. 
> > 
> >  > > Are those groups capturing in Verbal Expressions?  The use of 
> >  > > "find" (~ "search") rather than "match" is disconcerting to the 
> >  > > experienced user. 
> >  > 
> >  > You can alternately use the word "then". The source code is just 
> >  > one python file. It's very easy to read. I actually like "then" 
> >  > over "find" for the example: 
> > 
> > You're missing the point.  The reader does not get to choose the 
> > notation, the author does.  I do understand what several varieties of 
> > RE mean, but the variations are of two kinds: basic versus extended 
> > (ie, what tokens need to be escaped to be taken literally, which ones 
> > have special meaning if escaped), and extensions (which can be 
> > ignored).  Modern RE facilities are essentially all of the extended 
> > variety.  Once you've learned that, you're in good shape for almost 
> > any RE that should be written outside of an obfuscated code contest. 
> > 
> > This is a fundamental principle of Python design: don't make readers 
> > of code learn new things.  That includes using notation developed 
> > elsewhere in many cases. 
> > 
> >  > What does alternation look like? 
> >  > 
> >  > .OR(option1).OR(option2).OR(option3)... 
> >  > 
> >  > How about alternation of 
> >  > > non-trivial regular expressions? 
> >  > 
> >  > .OR(other_verbal_expression) 
> > 
> > Real examples, rather than pseudo code, would be nice.  I think you, 
> > too, will find that examples of even fairly simple nested alternations 
> > containing other constructs become quite hard to read, as they fall 
> > off the bottom of the screen. 
> > 
> > For example, the VE equivalent of 
> > 
> > scheme = "(https?|ftp|file):" 
> > 
> > would be (AFAICT): 
> > 
> > scheme = VerEx().then(VerEx().then("http") 
> >  .maybe("s") 
> >  .OR("ftp") 
> >

Re: [Python-ideas] What about regexp string litterals : re".*" ?

2017-03-31 Thread Paul Moore
On 31 March 2017 at 09:20, Stephan Houben  wrote:
> FWIW, I also strongly prefer the Verbal Expression style and consider
> "normal" regular expressions to become quickly unreadable and
> unmaintainable.

Do you publish your code widely? What's the view of 3rd party users of
your code? Until this thread, I'd never even heard of the Verbal
Expression style, and I read a *lot* of open source Python code. While
it's purely anecdotal, that suggests to me that the style isn't
particularly commonly used.

(OTOH, there's also a lot less use of REs in Python code than in other
languages. Much string manipulation in Python avoids using regular
expressions at all, in my experience. I think that's a good thing - use
simpler tools when appropriate and keep the power tools for the hard
cases where they justify their complexity).
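
As a small illustration of that trade-off (a sketch only, with a made-up
sample URL), the same split of a URL into scheme and host can be written with
string methods or with the re module:

```python
import re

url = "https://www.python.org/dev/peps"

# String methods: each step says what it does.
scheme, sep, rest = url.partition("://")
host = rest.split("/", 1)[0]
is_web = sep == "://" and scheme in ("http", "https")

# The same extraction with a regular expression.
m = re.match(r"(https?)://([^/]*)", url)

# Both approaches agree on this input.
same = (m.group(1), m.group(2)) == (scheme, host)
```

For a check this simple the string-method version needs no pattern syntax at
all; the regular expression starts to pay for itself only once the format
gets more irregular.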

Paul


Re: [Python-ideas] What about regexp string litterals : re".*" ?

2017-03-31 Thread Stephan Houben
Hi all,

FWIW, I also strongly prefer the Verbal Expression style and consider
"normal" regular expressions to become quickly unreadable and
unmaintainable.

Verbal Expressions are also much more composable.

Stephan

2017-03-31 9:23 GMT+02:00 Stephen J. Turnbull:
> Abe Dillon writes:
>
>  > Note that the entire documentation is 250 words while just the syntax
>  > portion of Python docs for the re module is over 3000 words.
>
> Since Verbal Expressions (below, VEs, indicating notation) "compile"
> to regular expressions (spelling out indicates the internal matching
> implementation), the documentation of VEs presumably ignores
> everything except the limited language it's useful for.  To actually
> understand VEs, you need to refer to the RE docs.  Not a win IMO.
>
>  > > You think that example is more readable than the proposed translation
>  > > ^(http)(s)?(\:\/\/)(www\.)?([^\ ]*)$
>  > > which is better written
>  > > ^https?://(www\.)?[^ ]*$
>  > > or even
>  > > ^https?://[^ ]*$
>  >
>  >
>  > Yes. I find it *far* more readable. It's not a soup of symbols like Perl
>  > code. I can only surmise that you're fluent in regex because it seems
>  > difficult for you to see how the above could be less readable than English
>  > words.
>
> Yes, I'm fairly fluent in regular expression notation (below, REs).
> I've maintained a compiler for one dialect.
>
> I'm not interested in the difference between words and punctuation
> though.  The reason I find the middle RE most readable is that it
> "looks like" what it's supposed to match, in a contiguous string as
> the object it will match will be contiguous.  If I need to parse it to
> figure out *exactly* what it matches, yes, that takes more effort.
> But to understand a VE's semantics correctly, I'd have to look it up
> as often as you have to look up REs because many words chosen to notate
> VEs have English meanings that are (a) ambiguous, as in all natural
> language, and (b) only approximate matches to RE semantics.
>
>  > I could tell it only matches URLs that are the only thing inside
>  > the string because it clearly says: start_of_line() and
>  > end_of_line().
>
> That's not the problem.  The problem is the semantics of the method
> "find".  "then" would indeed read better, although it doesn't exactly
> match the semantics of concatenation in REs.
>
>  > I would have had to refer to a reference to know that "^" doesn't
>  > always mean "not", it sometimes means "start of string" and
>  > probably other things. I would also have to check a reference to
>  > know that "$" can mean "end of string" (and probably other things).
>
> And you'll still have to do that when reading other people's REs.
>
>  > > Are those groups capturing in Verbal Expressions?  The use of
>  > > "find" (~ "search") rather than "match" is disconcerting to the
>  > > experienced user.
>  >
>  > You can alternately use the word "then". The source code is just
>  > one python file. It's very easy to read. I actually like "then"
>  > over "find" for the example:
>
> You're missing the point.  The reader does not get to choose the
> notation, the author does.  I do understand what several varieties of
> RE mean, but the variations are of two kinds: basic versus extended
> (ie, what tokens need to be escaped to be taken literally, which ones
> have special meaning if escaped), and extensions (which can be
> ignored).  Modern RE facilities are essentially all of the extended
> variety.  Once you've learned that, you're in good shape for almost
> any RE that should be written outside of an obfuscated code contest.
>
> This is a fundamental principle of Python design: don't make readers
> of code learn new things.  That includes using notation developed
> elsewhere in many cases.
>
>  > What does alternation look like?
>  >
>  > .OR(option1).OR(option2).OR(option3)...
>  >
>  > How about alternation of
>  > > non-trivial regular expressions?
>  >
>  > .OR(other_verbal_expression)
>
> Real examples, rather than pseudo code, would be nice.  I think you,
> too, will find that examples of even fairly simple nested alternations
> containing other constructs become quite hard to read, as they fall
> off the bottom of the screen.
>
> For example, the VE equivalent of
>
> scheme = "(https?|ftp|file):"
>
> would be (AFAICT):
>
> scheme = (VerEx().then(VerEx().then("http")
>                               .maybe("s")
>                               .OR("ftp")
>                               .OR("file"))
>                  .then(":"))
>
> which is pretty hideous, I think.  And the colon is captured by a
> group.  If perversely I wanted to extract that group from a match,
> what would its index be?
>
> I guess you could keep the linear arrangement with
>
> scheme = (VerEx().add("(")
>  .then("http")
>  .maybe("s")
>  .OR("ftp")
>  .OR("file")
>  

Re: [Python-ideas] What about regexp string litterals : re".*" ?

2017-03-31 Thread Stephen J. Turnbull
Abe Dillon writes:

 > Note that the entire documentation is 250 words while just the syntax
 > portion of Python docs for the re module is over 3000 words.

Since Verbal Expressions (below, VEs, indicating notation) "compile"
to regular expressions (spelling out indicates the internal matching
implementation), the documentation of VEs presumably ignores
everything except the limited language it's useful for.  To actually
understand VEs, you need to refer to the RE docs.  Not a win IMO.

 > > You think that example is more readable than the proposed translation
 > > ^(http)(s)?(\:\/\/)(www\.)?([^\ ]*)$
 > > which is better written
 > > ^https?://(www\.)?[^ ]*$
 > > or even
 > > ^https?://[^ ]*$
 > 
 > 
 > Yes. I find it *far* more readable. It's not a soup of symbols like Perl
 > code. I can only surmise that you're fluent in regex because it seems
 > difficult for you to see how the above could be less readable than English
 > words.

Yes, I'm fairly fluent in regular expression notation (below, REs).
I've maintained a compiler for one dialect.

I'm not interested in the difference between words and punctuation
though.  The reason I find the middle RE most readable is that it
"looks like" what it's supposed to match, in a contiguous string as
the object it will match will be contiguous.  If I need to parse it to
figure out *exactly* what it matches, yes, that takes more effort.
But to understand a VE's semantics correctly, I'd have to look it up
as often as you have to look up REs because many words chosen to notate
VEs have English meanings that are (a) ambiguous, as in all natural
language, and (b) only approximate matches to RE semantics.
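
To make the comparison concrete, here is a small sketch that runs the three
REs quoted above against a few sample strings. The sample strings and the
names patterns, samples, results and group_counts are illustrative
assumptions, not part of the original discussion:

```python
import re

patterns = {
    "literal translation": r"^(http)(s)?(\:\/\/)(www\.)?([^\ ]*)$",
    "simplified": r"^https?://(www\.)?[^ ]*$",
    "shortest": r"^https?://[^ ]*$",
}

samples = [
    "http://example.com",
    "https://www.python.org/",
    "ftp://example.com",
    "see http://example.com",
]

# For each pattern, record which samples it accepts.
results = {
    name: [re.match(pattern, s) is not None for s in samples]
    for name, pattern in patterns.items()
}

# The patterns differ only in how many capturing groups they create.
group_counts = {
    name: re.compile(pattern).groups for name, pattern in patterns.items()
}
```

On these samples all three patterns accept and reject the same strings; what
changes is the number of capturing groups, which matters only if the caller
extracts pieces of the match.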

 > I could tell it only matches URLs that are the only thing inside
 > the string because it clearly says: start_of_line() and
 > end_of_line().

That's not the problem.  The problem is the semantics of the method
"find".  "then" would indeed read better, although it doesn't exactly
match the semantics of concatenation in REs.

 > I would have had to refer to a reference to know that "^" doesn't
 > always mean "not", it sometimes means "start of string" and
 > probably other things. I would also have to check a reference to
 > know that "$" can mean "end of string" (and probably other things).

And you'll still have to do that when reading other people's REs.

 > > Are those groups capturing in Verbal Expressions?  The use of
 > > "find" (~ "search") rather than "match" is disconcerting to the
 > > experienced user.
 > 
 > You can alternately use the word "then". The source code is just
 > one python file. It's very easy to read. I actually like "then"
 > over "find" for the example:

You're missing the point.  The reader does not get to choose the
notation, the author does.  I do understand what several varieties of
RE mean, but the variations are of two kinds: basic versus extended
(ie, what tokens need to be escaped to be taken literally, which ones
have special meaning if escaped), and extensions (which can be
ignored).  Modern RE facilities are essentially all of the extended
variety.  Once you've learned that, you're in good shape for almost
any RE that should be written outside of an obfuscated code contest.

This is a fundamental principle of Python design: don't make readers
of code learn new things.  That includes using notation developed
elsewhere in many cases.

 > What does alternation look like?
 > 
 > .OR(option1).OR(option2).OR(option3)...
 >
 > How about alternation of
 > > non-trivial regular expressions?
 > 
 > .OR(other_verbal_expression)

Real examples, rather than pseudo code, would be nice.  I think you,
too, will find that examples of even fairly simple nested alternations
containing other constructs become quite hard to read, as they fall
off the bottom of the screen.

For example, the VE equivalent of

scheme = "(https?|ftp|file):"

would be (AFAICT):

scheme = (VerEx().then(VerEx().then("http")
                              .maybe("s")
                              .OR("ftp")
                              .OR("file"))
                 .then(":"))

which is pretty hideous, I think.  And the colon is captured by a
group.  If perversely I wanted to extract that group from a match,
what would its index be?

I guess you could keep the linear arrangement with

scheme = (VerEx().add("(")
                 .then("http")
                 .maybe("s")
                 .OR("ftp")
                 .OR("file")
                 .add(")")
                 .then(":"))

but is that really an improvement over

scheme = VerEx().add("(https?|ftp|file):")

;-)
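
For comparison, a short sketch of how the group-index question looks with the
plain re module (this says nothing about how a VE implementation numbers its
groups; the sample URLs and the group names "scheme" and "colon" are
illustrative):

```python
import re

# In the plain RE only the alternation is captured, so it is group 1.
scheme = re.compile(r"(https?|ftp|file):")
m = scheme.match("ftp://example.com")
first_group = m.group(1)

# If the colon really had to be extracted, named groups avoid counting
# parentheses by hand; groupindex maps each name to its number.
named = re.compile(r"(?P<scheme>https?|ftp|file)(?P<colon>:)")
m2 = named.match("https://example.com")
parts = (m2.group("scheme"), m2.group("colon"))
colon_index = named.groupindex["colon"]
```

In the written-out RE the group numbers can be read off by counting opening
parentheses from the left, which is exactly the information a chained builder
hides.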

 > > As far as I can see, Verbal Expressions are basically a way of
 > > making it so painful to write regular expressions that people
 > > will restrict themselves to regular expressions
 > 
 > What's so painful to write about them?

One thing that's painful is that VEs "look like" context-free
grammars, but clumsy and without the powerful 

Re: [Python-ideas] What about regexp string litterals : re".*" ?

2017-03-30 Thread Abe Dillon
> a huge advantage of REs is that they are common to many
> languages. You can take a regex from grep to Perl to your editor to
> Python. They're not absolutely identical, of course, but the basics
> are all the same. Creating a new search language means everyone has to
> learn anew.
> ChrisA


1) I'm not suggesting we get rid of the re module (the VE implementation I
linked requires it)
2) You can easily output regex from verbal expressions
3) verbal expressions are implemented in many different languages too:
https://verbalexpressions.github.io/
4) It even has a generic interface that all implementations are meant to
follow:
https://github.com/VerbalExpressions/implementation/wiki/List-of-methods-to-implement

Note that the entire documentation is 250 words while just the syntax
portion of Python docs for the re module is over 3000 words.


> You think that example is more readable than the proposed translation
> ^(http)(s)?(\:\/\/)(www\.)?([^\ ]*)$
> which is better written
> ^https?://(www\.)?[^ ]*$
> or even
> ^https?://[^ ]*$


Yes. I find it *far* more readable. It's not a soup of symbols like Perl
code. I can only surmise that you're fluent in regex because it seems
difficult for you to see how the above could be less readable than English
words.

> which makes it obvious that the regexp is not very useful from the
> word "^"?  (It matches only URLs which are the only thing, including
> whitespace, on the line, probably not what was intended.)


I could tell it only matches URLs that are the only thing inside the string
because it clearly says:
start_of_line() and end_of_line(). I would have had to refer to a reference
to know that "^" doesn't always mean "not", it sometimes means "start of
string" and probably other things. I would also have to check a reference
to know that "$" can mean "end of string" (and probably other things).

> Are those groups capturing in Verbal Expressions?  The use of "find"
> (~ "search") rather than "match" is disconcerting to the experienced
> user.


You can alternately use the word "then". The source code is just one python
file. It's very easy to read. I actually like "then" over "find" for the
example:

(verbal_expression.start_of_line()
                  .then('http')
                  .maybe('s')
                  .then('://')
                  .maybe('www.')
                  .anything_but(' ')
                  .end_of_line())
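
For readers who want to line the two notations up, here is a rough sketch of
the same pattern written with the standard library's re.VERBOSE flag, with
each fragment annotated by the chained call it corresponds to. The mapping
follows the translation quoted earlier in this thread; the non-capturing
group for 'www.' is a choice made for this sketch, not a statement about what
the Verbal Expressions library emits:

```python
import re

url = re.compile(
    r"""
    ^            # start_of_line()
    http         # then('http')
    s?           # maybe('s')
    ://          # then('://')
    (?:www\.)?   # maybe('www.')
    [^ ]*        # anything_but(' ')
    $            # end_of_line()
    """,
    re.VERBOSE,
)

ok = url.match("https://www.python.org") is not None
rejected = url.match("http://example.com and more") is None
```

In verbose mode whitespace outside character classes is ignored and "#"
starts a comment, so the pattern can be documented line by line while
remaining an ordinary regular expression.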

> What does alternation look like?


.OR(option1).OR(option2).OR(option3)...

> How about alternation of
> non-trivial regular expressions?


.OR(other_verbal_expression)

> As far as I can see, Verbal Expressions are basically a way of making
> it so painful to write regular expressions that people will restrict
> themselves to regular expressions


What's so painful to write about them? Does your IDE not have
autocompletion? I find REs so painful to write that I usually just use
string methods if at all feasible.

> I don't think that this failure to respect the
> developer's taste is restricted to this particular implementation,
> either.


I generally find it distasteful to write a pseudolanguage in strings inside
of other languages (this applies to SQL as well). Especially when the
design principles of that pseudolanguage are *diametrically opposed* to the
design principles of the host language. A key principle of Python's design
is: "you read code a lot more often than you write code, so emphasize
readability". Regex seems to be based on: "Do the most with the fewest
key-strokes. Readability be damned!". It makes a lot more sense to wrap the
pseudolanguage in constructs that bring it in-line with the host language
than to take on the mental burden of trying to comprehend two different
languages at the same time.

If you disagree, nothing's stopping you from continuing to write REs the
old-fashioned way. Can we at least agree that baking special re syntax
directly into the language is a bad idea?

On Wed, Mar 29, 2017 at 11:49 PM, Nick Coghlan  wrote:

> On 28 March 2017 at 01:17, Simon D.  wrote:
> > It would ease the use of regexps in Python
>
> We don't really want to ease the use of regexps in Python - while
> they're an incredibly useful tool in a programmer's toolkit, they're
> so cryptic that they're almost inevitably a maintainability nightmare.
>
> Baking them directly into the language runtime also locks people in to
> a particular regex engine implementation, rather than being able to
> swap in a third party one if they choose to do so (as many folks
> currently do with the `regex` PyPI module).
>
> So it's appropriate to keep them as a string-based library level
> capability, and hence on a relatively level playing field with less
> comprehensive, but typically easier to maintain, options like string
> methods and third party text parsing libraries (such as
> https://pypi.python.org/pypi/parse for something close to the inverse
> of str.format)
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
> 

Re: [Python-ideas] What about regexp string litterals : re".*" ?

2017-03-29 Thread Nick Coghlan
On 28 March 2017 at 01:17, Simon D.  wrote:
> It would ease the use of regexps in Python

We don't really want to ease the use of regexps in Python - while
they're an incredibly useful tool in a programmer's toolkit, they're
so cryptic that they're almost inevitably a maintainability nightmare.

Baking them directly into the language runtime also locks people in to
a particular regex engine implementation, rather than being able to
swap in a third party one if they choose to do so (as many folks
currently do with the `regex` PyPI module).

So it's appropriate to keep them as a string-based library level
capability, and hence on a relatively level playing field with less
comprehensive, but typically easier to maintain, options like string
methods and third party text parsing libraries (such as
https://pypi.python.org/pypi/parse for something close to the inverse
of str.format)
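
A minimal sketch of what that swap can look like in practice, assuming the
third-party `regex` module is used only through the part of its interface
that mirrors the standard `re` module. The names re_engine, URL and is_url
are hypothetical:

```python
try:
    import regex as re_engine  # third-party engine from PyPI, if installed
except ImportError:
    import re as re_engine  # standard library fallback

# The pattern stays an ordinary string, so either engine can compile it.
URL = re_engine.compile(r"^https?://[^ ]*$")


def is_url(text):
    """Return True if the whole string looks like an http(s) URL."""
    return URL.match(text) is not None
```

Because the pattern is plain data handed to a library function, the choice of
engine is a one-line import decision rather than something fixed by the
language's syntax.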

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-ideas] What about regexp string litterals : re".*" ?

2017-03-29 Thread Chris Angelico
On Thu, Mar 30, 2017 at 12:47 PM, Abe Dillon  wrote:
>> I feel like that borders on a bit too wordy...
>
>
> I think the use of words instead of symbols is one of the things that makes
> Python so readable. The ternary operator is done with words:
>
> value = option1 if condition else option2
>
> reads almost like English, while:
>
> value = condition ? option1: option2;
>
> Is just weird.
>
> I can read Verbal Expressions very quickly and understand exactly what's
> going on. If I have a decent IDE, I can write them almost as easily. I see
> no problem with wordiness if it means I don't have to stare at the code and
> scratch my head longer, or worse, open a reference to help me translate it
> (which is invariably the case when I look at regular expressions).

However, a huge advantage of REs is that they are common to many
languages. You can take a regex from grep to Perl to your editor to
Python. They're not absolutely identical, of course, but the basics
are all the same. Creating a new search language means everyone has to
learn anew.

ChrisA


Re: [Python-ideas] What about regexp string litterals : re".*" ?

2017-03-29 Thread Abe Dillon
>
> I feel like that borders on a bit too wordy...


I think the use of words instead of symbols is one of the things that makes
Python so readable. The ternary operator is done with words:

value = option1 if condition else option2

reads almost like English, while:

value = condition ? option1: option2;

Is just weird.

I can read Verbal Expressions very quickly and understand exactly what's
going on. If I have a decent IDE, I can write them almost as easily. I see
no problem with wordiness if it means I don't have to stare at the code and
scratch my head longer, or worse, open a reference to help me translate it
(which is invariably the case when I look at regular expressions).

On Wed, Mar 29, 2017 at 8:16 PM, Ryan Gonzalez  wrote:

> I feel like that borders on a bit too wordy...
>
> Personally, I'd like to see something like Felix's regular definitions:
>
>
> http://felix-lang.org/share/src/web/tut/regexp_01.fdoc#Regular_definitions._h
>
>
> --
> Ryan (ライアン)
> Yoko Shimomura > ryo (supercell/EGOIST) > Hiroyuki Sawano >> everyone else
> http://refi64.com
>
> On Mar 29, 2017 3:30 PM, "Abe Dillon"  wrote:
>
> My 2 cents is that regular expressions are pretty un-pythonic because of
> their horrible readability. I would much rather see Python adopt something
> like Verbal Expressions ( https://github.com/VerbalExp
> ressions/PythonVerbalExpressions ) into the standard library than add
> special syntax support for normal REs.
>
> On Tue, Mar 28, 2017 at 3:31 AM, Paul Moore  wrote:
>
>> On 28 March 2017 at 08:54, Simon D.  wrote:
>> > I believe that the u"" notation in Python 2.7 is defined by while
>> > importing the unicode_litterals module.
>>
>> That's not true. The u"..." syntax is part of the language. from
>> future import unicode_literals is something completely different.
>>
>> > Each regexp lib could provide its instanciation of regexp litteral
>> > notation.
>>
>> The Python language has no way of doing that - user (or library)
>> defined literals are not possible.
>>
>> > And if only the default one does, it would still be won for the
>> > beginers, and the majority of persons using the stdlib.
>>
>> How? You've yet to prove that having a regex literal form is an
>> improvement over re.compile(r'put your regex here'). You've asserted
>> it, but that's a matter of opinion. We'd need evidence of real-life
>> code that was clearly improved by the existence of your proposed
>> construct.
>>
>> Paul
>> ___
>> Python-ideas mailing list
>> Python-ideas@python.org
>> https://mail.python.org/mailman/listinfo/python-ideas
>> Code of Conduct: http://python.org/psf/codeofconduct/
>>
>
>
> ___
> Python-ideas mailing list
> Python-ideas@python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
>
>


Re: [Python-ideas] What about regexp string litterals : re".*" ?

2017-03-29 Thread Ryan Gonzalez
I feel like that borders on a bit too wordy...

Personally, I'd like to see something like Felix's regular definitions:


http://felix-lang.org/share/src/web/tut/regexp_01.fdoc#Regular_definitions._h


--
Ryan (ライアン)
Yoko Shimomura > ryo (supercell/EGOIST) > Hiroyuki Sawano >> everyone else
http://refi64.com





Re: [Python-ideas] What about regexp string litterals : re".*" ?

2017-03-29 Thread Markus Meskanen
On Mar 29, 2017 23:31, "Abe Dillon"  wrote:

> My 2 cents is that regular expressions are pretty un-pythonic because of
> their horrible readability. I would much rather see Python adopt something
> like Verbal Expressions (
> https://github.com/VerbalExpressions/PythonVerbalExpressions ) into the
> standard library than add special syntax support for normal REs.


I've never heard of this before; it looks *awesome*. Thanks! If it's as good
as it sounds, I too would love to see something like this added to the
standard library.


Re: [Python-ideas] What about regexp string litterals : re".*" ?

2017-03-29 Thread Abe Dillon
My 2 cents is that regular expressions are pretty un-pythonic because of
their horrible readability. I would much rather see Python adopt something
like Verbal Expressions (
https://github.com/VerbalExpressions/PythonVerbalExpressions ) into the
standard library than add special syntax support for normal REs.
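
To give a rough idea of the style I mean, here is a minimal sketch of a
word-based pattern builder on top of the stdlib re module. The class name
WordyPattern and its method names are made up for this illustration; this
is not the actual PythonVerbalExpressions API.

```python
import re


class WordyPattern:
    """Hypothetical builder; NOT the PythonVerbalExpressions API."""

    def __init__(self):
        self._parts = []

    def start_of_line(self):
        self._parts.append("^")
        return self

    def then(self, text):
        # Literal text that must appear; re.escape keeps it literal.
        self._parts.append(re.escape(text))
        return self

    def maybe(self, text):
        # Literal text that may appear zero or one time.
        self._parts.append("(?:" + re.escape(text) + ")?")
        return self

    def anything_but(self, chars):
        # Zero or more characters, none of which is in `chars`.
        self._parts.append("[^" + re.escape(chars) + "]*")
        return self

    def end_of_line(self):
        self._parts.append("$")
        return self

    def compile(self):
        # The result is an ordinary compiled regex object.
        return re.compile("".join(self._parts))


url = (
    WordyPattern()
    .start_of_line()
    .then("http")
    .maybe("s")
    .then("://")
    .maybe("www.")
    .anything_but(" ")
    .end_of_line()
    .compile()
)

is_url = url.match("https://www.python.org") is not None
is_not_url = url.match("ftp://example.org") is None
```

The builder only assembles an ordinary pattern string, so everything the re
module offers is still available on the compiled result.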



Re: [Python-ideas] What about regexp string litterals : re".*" ?

2017-03-28 Thread Paul Moore
On 28 March 2017 at 08:54, Simon D.  wrote:
> I believe that the u"" notation in Python 2.7 is defined by
> importing the unicode_literals module.

That's not true. The u"..." syntax is part of the language. from
__future__ import unicode_literals is something completely different.

> Each regexp lib could provide its own implementation of the regexp
> literal notation.

The Python language has no way of doing that - user (or library)
defined literals are not possible.

> And even if only the default one does, it would still be a win for
> beginners and for the majority of people using the stdlib.

How? You've yet to prove that having a regex literal form is an
improvement over re.compile(r'put your regex here'). You've asserted
it, but that's a matter of opinion. We'd need evidence of real-life
code that was clearly improved by the existence of your proposed
construct.

Paul


Re: [Python-ideas] What about regexp string litterals : re".*" ?

2017-03-28 Thread Simon D.
* Serhiy Storchaka  [2017-03-27 18:39:19 +0300]:
> There are several regular expression libraries for Python. One of them is
> included in the stdlib, but it is not the first regular expression library
> in the stdlib and may not be the last. A particular project can choose to
> use an alternative regular expression library (because it has additional
> features or is faster in particular cases).
>

I believe that the u"" notation in Python 2.7 is defined by
importing the unicode_literals module.

Each regexp lib could provide its own implementation of the regexp
literal notation.

And even if only the default one does, it would still be a win for
beginners and for the majority of people using the stdlib.

--
Simon Descarpentries
+336 769 702 53
http://s.d12s.fr


Re: [Python-ideas] What about regexp string litterals : re".*" ?

2017-03-27 Thread Markus Meskanen
On Tue, Mar 28, 2017 at 8:37 AM, Chris Angelico  wrote:
>
> Yes, but if the "in" operator is used, it would still work, because
> r"..." is a str, and "str" in "string" is meaningful.
>
> But I think a better solution will be for regex literals to be
> syntax-highlighted differently. If they're a truly-supported syntactic
> feature, they can be made visually different in your editor, making
> the distinction blatantly obvious.
>
> That said, though, I'm -1 on this. Currently, every prefix letter has
> its own meaning, and broadly speaking, combining them combines their
> meanings. An re"..." literal should be a raw "e-string", whatever that
> is, so I would expect that e"..." is the same kind of thing but with
> different backslash handling.
> 
>

Fair enough, I haven't followed this thread too closely and didn't consider
the "in" operator being used. Even then I find it unlikely that confusing
re'...' with r'...' and not noticing would turn out to be an issue.

That being said, I'm also -1 on this, especially now after your point on
"e-string". Adding these re-strings would outright prevent e-strings
from ever being implemented.


Re: [Python-ideas] What about regexp string litterals : re".*" ?

2017-03-27 Thread Markus Meskanen
On Mar 28, 2017 06:08, "Steven D'Aprano"  wrote:

> On Mon, Mar 27, 2017 at 05:17:40PM +0200, Simon D. wrote:
>
> > The regexp string literal could be represented by: re""
> >
> > It would ease the use of regexps in Python, allowing regexp
> > literals, like in Perl or JavaScript.
> >
> > We may end up with an integration like:
> >
> > >>> import re
> > >>> if re".k" in 'ok':
> > ...     print("ok")
> > ok
>
> I dislike the suggested syntax re".k". It looks ugly and not different
> enough from a raw string. I can easily see people accidentally writing:
>
>     if r".k" in 'ok':
>         ...
>
> and wondering why their regex isn't working.


While I agree with most of your arguments, surely you must be the one
joking here? "Ugly" is obviously a matter of opinion; I personally find the
proposed syntax more beautiful than the // used in many other languages.
But claiming it's bad because people would mix it up with raw strings
without noticing is nonsense. Not only does it look very different, but
attempting to call match() or any other regex method on it would surely
give a reasonable error:

  AttributeError: 'str' object has no attribute 'match'

which _in the worst case_ leads to googling it, where the top-rated
Stack Overflow question clearly explains the difference between r''
and re''.
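
For what it's worth, the error is easy to reproduce today with a raw string
standing in for the mistaken literal. A small sketch (the variable names are
only for illustration):

```python
import re

pattern = r".k"  # a raw string: still a plain str, not a compiled regex

try:
    pattern.match("ok")  # str has no regex methods
except AttributeError as exc:
    message = str(exc)

print(message)

# The working spelling today goes through re.compile():
compiled = re.compile(pattern)
matched = compiled.match("ok").group()
print(matched)
```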


Re: [Python-ideas] What about regexp string litterals : re".*" ?

2017-03-27 Thread Steven D'Aprano
On Mon, Mar 27, 2017 at 05:17:40PM +0200, Simon D. wrote:

> The regexp string literal could be represented by: re""
>
> It would ease the use of regexps in Python, allowing regexp
> literals, like in Perl or JavaScript.
>
> We may end up with an integration like:
>
> >>> import re
> >>> if re".k" in 'ok':
> ...     print("ok")
> ok

I dislike the suggested syntax re".k". It looks ugly and not different 
enough from a raw string. I can easily see people accidentally writing:

    if r".k" in 'ok':
        ...

and wondering why their regex isn't working.
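
To make the failure mode concrete, here is a small sketch of what the
mistyped version does in today's Python, next to the regex search that was
presumably intended. The variable names are only for illustration.

```python
import re

text = "ok"

# A raw string is still a plain str, so "in" is a substring test:
# the two characters ".k" do not occur in "ok".
substring_hit = r".k" in text

# Handing the same pattern to the re module treats it as a regex:
# "." matches "o" and "k" matches "k".
regex_hit = re.search(r".k", text) is not None

print(substring_hit, regex_hit)
```

No exception is raised in the first case; the test is simply false, which is
why the mistake would be easy to miss.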


JavaScript uses /regex/ as a literal syntax for creating RegExp objects.
That's the closest equivalent to the way Python would have to operate, 
although I don't think we can use the /.../ syntax without breaking the 
rule that Python's parser will not be more complex than LL(1). So I 
think /.../ is definitely out.

Perl 6 uses m/regex/ and a number of other variations:

https://docs.perl6.org/language/regexes


I doubt that this will actually be useful. It *seems* useful if you just
write trivial regexes like your example, but without Perl's rich set of
terse (cryptic?) operators, I don't know that literal regexes
make enough difference to be worth the trouble. There's not very
much difference between (say) these:

    mo = re.search(r'.k', mystring)
    if mo:
        print(mo.group())

    mo = re'.k'.search(mystring)
    if mo:
        print(mo.group())


You effectively save two parentheses, that's all. That doesn't seem like 
much of a win for introducing new syntax. Can you show some example code 
where a regex literal will have a worthwhile advantage?


> Regexps are part of the language in Perl, and the rather complicated
> integration of regexp in other languages, especially in Python, is
> something that comes up easily in language comparing discussion.

Surely you are joking?

Regex integration in Python is simple. Regular expression objects are 
ordinary objects, like lists and dicts and floats. The only difference 
is that you don't call the Regex object constructor directly, you either 
pass a string to a module-level function:

    re.match(r'my regex', mystring)

or you create a regex object:

    regex = re.compile(r'my regex')
    regex.match(mystring)


That's very neat, Pythonic and simple. The regex itself is very close to
the syntax used by Perl, JavaScript or other variations. The only
complication is that due to Python's escaping rules you should use a raw
string r'' instead of doubling up all backslashes. I wouldn't call that
"rather complicated" -- it is a lot less complicated than Perl:

- m// can be abbreviated //
- when do you use // directly and when do you use qr// ?
- s/// operator implicitly defines a regex

In Perl 6, I *think* they use rx// instead of qr//, or are they 
different things? Both m// and the s/// operator can use arbitrary 
delimiters, e.g. ! or , (but not : or parentheses) instead of the 
slashes, and m// regexes will implicitly match against $_ if you don't 
explicitly match against something else.

Compared to Perl, I don't think Python's regexes are complicated.
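
As a quick illustration of the raw-string point and of the two calling
styles mentioned above, here is a small sketch (the sample pattern and
input string are made up for the example):

```python
import re

# Without a raw string, every regex backslash has to be doubled.
doubled = "\\d+\\.\\d+"
# With a raw string, the pattern reads the way the regex engine sees it.
raw = r"\d+\.\d+"

# Both spellings produce exactly the same str object contents.
same_pattern = doubled == raw

# The module-level function and a compiled object give the same result.
via_function = re.match(raw, "3.14 is pi").group()
via_object = re.compile(raw).match("3.14 is pi").group()

print(same_pattern, via_function, via_object)
```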


-- 
Steve


Re: [Python-ideas] What about regexp string litterals : re".*" ?

2017-03-27 Thread Serhiy Storchaka

On 27.03.17 18:17, Simon D. wrote:

> After some French discussions about this idea, I subscribed here to
> suggest adding a new string literal, for regexp, inspired by other
> types like: u"", r"", b"", br"", f""…
>
> The regexp string literal could be represented by: re""
>
> It would ease the use of regexps in Python, allowing regexp
> literals, like in Perl or JavaScript.


There are several regular expression libraries for Python. One of them
is included in the stdlib, but it is not the first regular expression
library in the stdlib and may not be the last. A particular project can
choose to use an alternative regular expression library (because it has
additional features or is faster in particular cases).


