Re: [Python-ideas] Built-in parsing library

2019-04-09 Thread Barry Scott
Nam,

I'm not so sure that a "universal parsing library" is possible for the stdlib.

I think one way you could find out what the requirements are is to refactor
at least 2 of the existing stdlib modules that you have identified as needing
a better parser.

Did you find that you could use the same parser code for both?
Would it apply to other modules?

Barry


> On 9 Apr 2019, at 17:06, Nam Nguyen  wrote:
> 
> On Mon, Apr 8, 2019 at 7:59 AM Christopher Barker wrote:
> 
> 
> On Mon, Apr 8, 2019 at 12:02 AM Paul Moore wrote:
> I would expect that the only reasonable way of getting a parsing
> library in the stdlib would be to propose an established one from PyPI
> to be moved into the stdlib
> 
> Absolutely -- unlike some proposals, a stand-alone parsing lib could very 
> easily be developed external to the stdlib. If one gains traction as an 
> obvious choice, then we can talk about bringing it in.
> 
> All options are still on the table. It is important to closely align the 
> solution to the goal of making itself available for *internal use* in the 
> stdlib itself. Having a parser library in the stdlib for *general use* is not 
> an explicit goal that I am aiming for, just as pgen2 was not intended that 
> way. Neither should that deter one from being considered.
> 
> Nam
>  
> 
> -CHB
> 
> 
> -- 
> Christopher Barker, PhD
> 
> Python Language Consulting
>   - Teaching
>   - Scientific Software Development
>   - Desktop GUI and Web Development
>   - wxPython, numpy, scipy, Cython
> ___
> Python-ideas mailing list
> Python-ideas@python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/



Re: [Python-ideas] Built-in parsing library

2019-04-09 Thread Nam Nguyen
On Mon, Apr 8, 2019 at 7:59 AM Christopher Barker 
wrote:

>
>
> On Mon, Apr 8, 2019 at 12:02 AM Paul Moore  wrote:
>
>> I would expect that the only reasonable way of getting a parsing
>> library in the stdlib would be to propose an established one from PyPI
>> to be moved into the stdlib
>
>
> Absolutely -- unlike some proposals, a stand-alone parsing lib could very
> easily be developed external to the stdlib. If one gains traction as an
> obvious choice, then we can talk about bringing it in.
>

All options are still on the table. It is important to closely align the
solution to the goal of making itself available for *internal use* in the
stdlib itself. Having a parser library in the stdlib for *general use* is
not an explicit goal that I am aiming for, just as pgen2 was not intended
that way. Neither should that deter one from being considered.

Nam


>
> -CHB
>
>
> --
> Christopher Barker, PhD
>
> Python Language Consulting
>   - Teaching
>   - Scientific Software Development
>   - Desktop GUI and Web Development
>   - wxPython, numpy, scipy, Cython
>


Re: [Python-ideas] Built-in parsing library

2019-04-08 Thread Ai mu
@DavidMertz

> Each one of them takes a dramatically different approach to defining a
> grammar.

They work more towards implementing well-known standards like BNF, although
internally they might parse in different ways.

Abdur-Rahmaan Janhangeer
Mauritius


Re: [Python-ideas] Built-in parsing library

2019-04-08 Thread Christopher Barker
On Mon, Apr 8, 2019 at 12:02 AM Paul Moore  wrote:

> I would expect that the only reasonable way of getting a parsing
> library in the stdlib would be to propose an established one from PyPI
> to be moved into the stdlib


Absolutely -- unlike some proposals, a stand-alone parsing lib could very
easily be developed external to the stdlib. If one gains traction as an
obvious choice, then we can talk about bringing it in.

-CHB


-- 
Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython


Re: [Python-ideas] Built-in parsing library

2019-04-08 Thread Paul Moore
On Mon, 8 Apr 2019 at 02:54, Nam Nguyen  wrote:

> Back to my original goal, I've gathered that there is some interest in having 
> a more general parser library in the stdlib. "Some", but not "much". Should I 
> start out with a straw proposal so that we can hash it out further?

I would expect that the only reasonable way of getting a parsing
library in the stdlib would be to propose an established one from PyPI
to be moved into the stdlib - and that would require the active
support of the library author. I can't imagine any way that I'd
support a brand new parsing library getting put in the stdlib - the
area is sufficiently complex, and the external alternatives too
mature, to make having a new, relatively untried library in the stdlib
be a good idea.

Paul


Re: [Python-ideas] Built-in parsing library

2019-04-07 Thread Nam Nguyen
On Mon, Apr 1, 2019 at 3:13 PM Terry Reedy  wrote:

> On 4/1/2019 1:14 AM, Guido van Rossum wrote:
> > We do have a parser generator in the standard library:
> > https://github.com/python/cpython/tree/master/Lib/lib2to3/pgen2
>
> It is effectively undocumented and by inference discouraged from use.
>

I've tried it out over the weekend. The lack of documentation is kinda
annoying but surmountable. What I found was that this library is tightly
coupled to the Python language, both at the lexer and parser levels. For
example, defining a simple grammar like this would not work:

  genericurl: scheme '://'
  scheme: ...

The reason is that '://' is not a known token type in the Python language.
That is a real bummer.
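
A rough way to see this, as a sketch rather than a test of pgen2 itself: pgen2 ships its own tokenizer module, and the assumption here is that it splits text along the same lines as the stdlib `tokenize` module, which is used below as a stand-in. The helper name `python_tokens` is made up for illustration.

```python
import io
import tokenize


def python_tokens(source: str):
    """Return (type name, text) pairs for the NAME and OP tokens in source."""
    readline = io.StringIO(source).readline
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
        if tok.type in (tokenize.NAME, tokenize.OP)
    ]


# A URL is lexed as Python source: ':' and '//' come out as two separate
# operator tokens, so a grammar can never see a single '://' token.
tokens = python_tokens("http://example.com\n")
print(tokens)
```

The scheme separator arrives as the operators `:` and `//` (floor division), which is why a literal `'://'` in a grammar has nothing to match against.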

Back to my original goal, I've gathered that there is some interest in
having a more general parser library in the stdlib. "Some", but not "much".
Should I start out with a straw proposal so that we can hash it out further?

Cheers,
Nam

> The entry for lib2to3 in the 2to3 doc:
> https://docs.python.org/3/library/2to3.html#module-lib2to3
>
> "lib2to3 - 2to3’s library
> Source code: Lib/lib2to3/
> Note: The lib2to3 API should be considered unstable and may change
> drastically in the future."
>
> help(pgen2) is not much more helpful:
>
> Help on package lib2to3.pgen2 in lib2to3:
>
> NAME
>  lib2to3.pgen2 - The pgen2 package.
>
> PACKAGE CONTENTS
>  conv
>  driver
>  grammar
>  literals
>  parse
>  pgen
>  token
>  tokenize
>
> FILE
>  c:\programs\python38\lib\lib2to3\pgen2\__init__.py
>
>
>
> --
> Terry Jan Reedy
>
>
>


Re: [Python-ideas] Built-in parsing library

2019-04-01 Thread Terry Reedy

On 4/1/2019 1:14 AM, Guido van Rossum wrote:

> We do have a parser generator in the standard library:
> https://github.com/python/cpython/tree/master/Lib/lib2to3/pgen2


It is effectively undocumented and by inference discouraged from use. 
The entry for lib2to3 in the 2to3 doc:

https://docs.python.org/3/library/2to3.html#module-lib2to3

"lib2to3 - 2to3’s library
Source code: Lib/lib2to3/
Note: The lib2to3 API should be considered unstable and may change
drastically in the future."

help(pgen2) is not much more helpful:

Help on package lib2to3.pgen2 in lib2to3:

NAME
lib2to3.pgen2 - The pgen2 package.

PACKAGE CONTENTS
conv
driver
grammar
literals
parse
pgen
token
tokenize

FILE
c:\programs\python38\lib\lib2to3\pgen2\__init__.py



--
Terry Jan Reedy




Re: [Python-ideas] Built-in parsing library

2019-04-01 Thread Nam Nguyen
Sure! Some of the examples mentioned in Victor's
https://vstinner.github.io/tag/security.html could have been fixed by
having a proper parser. This one that I helped author was also a
parsing issue.

https://python-security.readthedocs.io/vuln/bpo-30500_urllib_connects_to_a_wrong_host.html
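
To illustrate the class of bug being discussed, here is a minimal sketch. The URL and the regular expression are both made up for this example; neither is taken from the linked report or from stdlib code. It compares a naive pattern against `urllib.parse.urlsplit`, which treats `#` as the start of the fragment.

```python
import re
from urllib.parse import urlsplit

# Hypothetical, deliberately naive pattern: optional "userinfo@", then the host.
NAIVE_HOST = re.compile(r"https?://(?:[^@/]*@)?([^/:]+)")


def naive_host(url: str) -> str:
    """Extract the host with the naive regular expression above."""
    match = NAIVE_HOST.match(url)
    if match is None:
        raise ValueError(f"not an http(s) URL: {url!r}")
    return match.group(1)


# Illustrative input: a '#' (fragment delimiter) appears before an '@'.
url = "http://127.0.0.1#@evil.com/"

print(naive_host(url))         # the regex treats "127.0.0.1#" as userinfo
parts = urlsplit(url)
print(parts.hostname, parts.fragment)
```

The regex reports `evil.com` as the host, while `urlsplit` reports `127.0.0.1` with `@evil.com/` as the fragment. Two components disagreeing about the same string is the kind of ambiguity a shared parser would aim to remove.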

Thanks for the pointer to pgen2, Guido. I have only quickly skimmed through
it and thought it was really closely tied to the Python language. Maybe I'm
wrong, so I'll need some time to try it out on some of those previous
security fixes.

Cheers,
Nam

On Mon, Apr 1, 2019 at 12:17 PM Nathaniel Smith  wrote:

> On Sun, Mar 31, 2019 at 9:17 PM Nam Nguyen  wrote:
> > Installing a package out of stdlib does not solve the problem that
> motivated this thread. The libraries included in the stdlib can't use those
> parsers.
>
> Can you be more specific about exactly which code in the stdlib you
> think should be rewritten to use a parsing library?
>
> -n
>
> --
> Nathaniel J. Smith -- https://vorpus.org
>


Re: [Python-ideas] Built-in parsing library

2019-04-01 Thread Nathaniel Smith
On Sun, Mar 31, 2019 at 9:17 PM Nam Nguyen  wrote:
> Installing a package out of stdlib does not solve the problem that motivated 
> this thread. The libraries included in the stdlib can't use those parsers.

Can you be more specific about exactly which code in the stdlib you
think should be rewritten to use a parsing library?

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-ideas] Built-in parsing library

2019-04-01 Thread Stephen J. Turnbull
David Mertz writes:

 > OK, I'll acknowledge my comment might have overstated the bar to overcome.
 > A parser added to the standard library doesn't need to be perfect for
 > everyone.  But adding to stdlib *does* provide a kind of endorsement of the
 > right default way to go about things.

Indeed it does, but TOOWTDI is not absolute.

 > However, cross-cutting that formal power issue, there are two main
 > programming styles used by different libraries.

I concede this tends to raise the bar quite a bit.

 > Something in the standard library would have to be partisan in
 > selecting one particular approach as the "official" one.

Perhaps.  Even there, though, we have an example: XML.  We gotcher
SAX, we gotcher DOM, we gotcher ElementTree, we gotcher expat.
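
As a small illustration of that coexistence, the sketch below reads the same document through two of the stdlib XML interfaces. The document and its element names are invented for the example.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET

# Made-up sample document for illustration.
DOC = "<urls><url scheme='http'>example.com</url></urls>"

# ElementTree style: elements behave like lightweight Python objects.
root = ET.fromstring(DOC)
et_hosts = [el.text for el in root.iter("url")]

# DOM style: a W3C-like node tree with explicit text nodes.
dom = xml.dom.minidom.parseString(DOC)
dom_hosts = [node.firstChild.data for node in dom.getElementsByTagName("url")]

print(et_hosts, dom_hosts)
```

Both approaches yield the same data; they differ only in the programming style they ask of the caller.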

I think XML processing is probably a *lot* more used and in a lot more
modes than general parsing.  But the analogy is valid, even though I
can't say it's powerful *enough*.

There definitely is a bar to clear.  I don't know if it's worth Nam's
effort to try to clear it -- there's no guarantee of success on
something like this.  I just think we shouldn't be *too* discouraging.
And I personally think parsing formal languages is an important enough
field to deserve consideration for stdlib inclusion, even if it's not
going to be used every day.

Steve


Re: [Python-ideas] Built-in parsing library

2019-04-01 Thread Stephen J. Turnbull
David Mertz writes:

 > While I can imagine proposing one for inclusion in the standard
 > library, you'd have to choose one (or write a new one) and explain
 > why that one is better for everyone (or at least a better starting
 > point) than all the others are.

In principle, no, one just needs to explain why this battery fits most
of the toys encountered in practice.  That's good enough, and if
during discussion somebody shows another one is better on a lot of
fronts, sure, do that instead.  We should avoid letting the perfect be
the enemy of the good (as people keep insisting about str.cutsuffix).

Politically, sure, it's almost 100% certain that somebody will object
that there's a whole class of cases handled by the PackMule parser
that the ShavedYacc parser doesn't handle, and somebody else will
point out the opposite, so neither is acceptable.  Ignore them,
they're both wrong about "acceptable". ;-)

 > You'd also have to explain why it needs to be in the standard
 > library rather than installed by 'pip install someparser'.

Again, the bar isn't so high as "needs".  There's a balance of
equities, such as people with Python installations restricted by QA or
security vetting, applications where you really don't want to spend
most of your hour allocated to teaching the feature downloading
requirements, and cases where pretty much everybody performs the task
frequently (for some value of frequently), vs. costs of maintenance
(we generally require that a core developer vouch for someone who
volunteers to take responsibility for it for 3-5 years) and effects on
complexity of learning Python (usually not great for such a module,
since the excess burden on documentation ends up being one line in the
TOC and a half-dozen in the index).

Yes, Nam should be prepared for pushback on both grounds.  Most
pressingly, without a specific package being proposed, discussion will
just go in circles indefinitely.  But a parser generator package is
something that's been lurking, waiting for an enthusiastic proponent
for a long time.  There's a lot of low-level support for it.  Maybe it
just needs a specific proposal to take off.  And maybe it won't.  He
won't know unless he tries.

Steve

P.S. Guido mentioned lib2to3.pgen2, which is in the stdlib.  But
help(pgen2) isn't very helpful, so there's at least some documentation
work to be done there.



Re: [Python-ideas] Built-in parsing library

2019-03-31 Thread James Lu
Stack-based LL(1) pushdown automata can be implemented by hand; indeed, isn't
that what a TextMate language file is? There's also the option of using Iro to
generate a tmLanguage.
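
For reference, a hand-written stack-based LL(1) recognizer can be quite small. The sketch below is an assumption-laden toy, not tied to TextMate or Iro: it recognizes the grammar `S -> '(' S ')' S | empty` (balanced parentheses), and the function name `ll1_balanced` is made up.

```python
def ll1_balanced(text: str) -> bool:
    """Recognize balanced parentheses with an explicit LL(1) parse stack."""
    tokens = list(text) + ["$"]   # "$" marks end of input
    stack = ["$", "S"]            # start symbol on top of the end marker
    i = 0
    while stack:
        top = stack.pop()
        look = tokens[i]          # single token of lookahead
        if top == "S":
            if look == "(":
                # S -> ( S ) S : push the right-hand side in reverse order.
                stack.extend(reversed(["(", "S", ")", "S"]))
            elif look in (")", "$"):
                pass              # S -> empty
            else:
                return False      # no rule for this lookahead
        else:
            if top != look:       # terminal on the stack must match the input
                return False
            i += 1
    return i == len(tokens)


print(ll1_balanced("(()())"), ll1_balanced("(()"))
```

The stack holds the symbols still expected, and each step is decided by the top of the stack plus one token of lookahead.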


Re: [Python-ideas] Built-in parsing library

2019-03-31 Thread Guido van Rossum
We do have a parser generator in the standard library:
https://github.com/python/cpython/tree/master/Lib/lib2to3/pgen2

On Sun, Mar 31, 2019 at 9:17 PM Nam Nguyen  wrote:

> On Sun, Mar 31, 2019 at 12:13 PM David Mertz  wrote:
>
>> I just found this nice summary. It's not complete, but it looks well
>> written. https://tomassetti.me/parsing-in-python/
>>
>> On Sun, Mar 31, 2019, 3:09 PM David Mertz  wrote:
>>
>>> There are about a half dozen widely used parsing libraries for Python.
>>> Each one of them takes a dramatically different approach to defining a
>>> grammar. Each one has been debugged for over a decade.
>>>
>>> While I can imagine proposing one for inclusion in the standard library,
>>> you'd have to choose one (or write a new one) and explain why that one is
>>> better for everyone (or at least a better starting point) than all the
>>> others are.
>>>
>>
> I'm not at that stage, yet. By the way, it still is not clear to me if you
> think having one in the stdlib is desirable.
>
>
>> You'd also have to explain why it needs to be in the standard library
>>> rather than installed by 'pip install someparser'.
>>>
>>
> Installing a package out of stdlib does not solve the problem that
> motivated this thread. The libraries included in the stdlib can't use those
> parsers.
>
> Cheers,
> Nam
>
>
>>
>>> On Sat, Mar 30, 2019, 1:58 PM Nam Nguyen  wrote:
>>>
>>>> Hello list,
>>>>
>>>> What do you think of a universal parsing library in the stdlib mainly
>>>> for use by other libraries in the stdlib?
>>>>
>>>> Throughout the years we have had many issues with protocol parsing.
>>>> Some have even introduced security bugs. The main cause of these issues is
>>>> the use of simple regular expressions.
>>>>
>>>> Having a universal parsing library in the stdlib would help cut down
>>>> these issues. Such a library should be minimal yet encompassing, and whole
>>>> parse trees should be entirely expressible in code. I am thinking of
>>>> combinatoric parsing as the main candidate that fits this bill.
>>>>
>>>> What do you say?
>>>>
>>>> Thanks!
>>>> Nam

>


-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-ideas] Built-in parsing library

2019-03-31 Thread Nam Nguyen
On Sun, Mar 31, 2019 at 12:13 PM David Mertz  wrote:

> I just found this nice summary. It's not complete, but it looks well
> written. https://tomassetti.me/parsing-in-python/
>
> On Sun, Mar 31, 2019, 3:09 PM David Mertz  wrote:
>
>> There are about a half dozen widely used parsing libraries for Python.
>> Each one of them takes a dramatically different approach to defining a
>> grammar. Each one has been debugged for over a decade.
>>
>> While I can imagine proposing one for inclusion in the standard library,
>> you'd have to choose one (or write a new one) and explain why that one is
>> better for everyone (or at least a better starting point) than all the
>> others are.
>>
>
I'm not at that stage, yet. By the way, it still is not clear to me if you
think having one in the stdlib is desirable.


> You'd also have to explain why it needs to be in the standard library
>> rather than installed by 'pip install someparser'.
>>
>
Installing a package out of stdlib does not solve the problem that
motivated this thread. The libraries included in the stdlib can't use those
parsers.

Cheers,
Nam


>
>> On Sat, Mar 30, 2019, 1:58 PM Nam Nguyen  wrote:
>>
>>> Hello list,
>>>
>>> What do you think of a universal parsing library in the stdlib mainly
>>> for use by other libraries in the stdlib?
>>>
>>> Throughout the years we have had many issues with protocol parsing.
>>> Some have even introduced security bugs. The main cause of these issues is
>>> the use of simple regular expressions.
>>>
>>> Having a universal parsing library in the stdlib would help cut down
>>> these issues. Such a library should be minimal yet encompassing, and whole
>>> parse trees should be entirely expressible in code. I am thinking of
>>> combinatoric parsing as the main candidate that fits this bill.
>>>
>>> What do you say?
>>>
>>> Thanks!
>>> Nam
>>>
>>


Re: [Python-ideas] Built-in parsing library

2019-03-31 Thread David Mertz
I just found this nice summary. It's not complete, but it looks well
written. https://tomassetti.me/parsing-in-python/

On Sun, Mar 31, 2019, 3:09 PM David Mertz  wrote:

> There are about a half dozen widely used parsing libraries for Python.
> Each one of them takes a dramatically different approach to defining a
> grammar. Each one has been debugged for over a decade.
>
> While I can imagine proposing one for inclusion in the standard library,
> you'd have to choose one (or write a new one) and explain why that one is
> better for everyone (or at least a better starting point) than all the
> others are. You'd also have to explain why it needs to be in the standard
> library rather than installed by 'pip install someparser'.
>
> On Sat, Mar 30, 2019, 1:58 PM Nam Nguyen  wrote:
>
>> Hello list,
>>
>> What do you think of a universal parsing library in the stdlib mainly for
>> use by other libraries in the stdlib?
>>
>> Throughout the years we have had many issues with protocol parsing. Some
>> have even introduced security bugs. The main cause of these issues is the
>> use of simple regular expressions.
>>
>> Having a universal parsing library in the stdlib would help cut down
>> these issues. Such a library should be minimal yet encompassing, and whole
>> parse trees should be entirely expressible in code. I am thinking of
>> combinatoric parsing as the main candidate that fits this bill.
>>
>> What do you say?
>>
>> Thanks!
>> Nam
>>
>


Re: [Python-ideas] Built-in parsing library

2019-03-31 Thread Nick Timkovich
What does it mean to be a universal parser? In my mind, to be universal you
should be able to parse anything, so you'd need something as versatile as
any Turing-complete language, so one could stick with the one we already have
(Python). I'm vaguely aware of levels of grammar (regular, context-free?,
etc.), and how things like XML can't/shouldn't be parsed with regex [1].
Most protocols probably aren't *completely* free to do whatever and
probably fit into some level of the hierarchy, what level would this
putative parser perform at?
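
To make the hierarchy question concrete: arbitrarily deep nesting is the textbook case that sits above the regular level and needs at least a context-free parser. The sketch below is a minimal hand-written recursive-descent parser for nested parentheses; the function names and the toy input format are assumptions made for illustration.

```python
def parse_group(text: str, pos: int = 0):
    """Parse items until ')' or end of input; return (items, new position)."""
    items = []
    while pos < len(text) and text[pos] != ")":
        if text[pos] == "(":
            # Recurse for the nested group, then require its closing ')'.
            inner, pos = parse_group(text, pos + 1)
            if pos >= len(text) or text[pos] != ")":
                raise ValueError("unclosed '('")
            pos += 1
            items.append(inner)
        else:
            items.append(text[pos])  # any other character is a leaf
            pos += 1
    return items, pos


def parse(text: str):
    """Parse the whole string into nested lists, rejecting stray ')'."""
    tree, pos = parse_group(text)
    if pos != len(text):
        raise ValueError(f"unexpected ')' at position {pos}")
    return tree


print(parse("a(b(c))d"))
```

The recursion depth tracks the nesting depth of the input, which is exactly the unbounded memory a plain regular expression lacks.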

Doing something like this from scratch is a very tall order. Are there
candidate libraries that you'd want to see included in the stdlib? There is
an argument for trying to "promote" a library that would add security into the
standard library over others that would just add features: trying to make
the "one obvious way to do it" also the safe way. However, all things being
equal, more widely used libraries tend to be more secure. I think suggestions
of this form need to propose a library that a) exists, b) is well used and
well regarded, c) is stable (once in the stdlib things are hard to change), and
d) has maintainers that are amenable to inclusion.

Nick

[1]: https://stackoverflow.com/a/1732454/194586

On Sat, Mar 30, 2019 at 12:57 PM Nam Nguyen  wrote:

> Hello list,
>
> What do you think of a universal parsing library in the stdlib mainly for
> use by other libraries in the stdlib?
>
> Throughout the years we have had many issues with protocol parsing. Some
> have even introduced security bugs. The main cause of these issues is the
> use of simple regular expressions.
>
> Having a universal parsing library in the stdlib would help cut down these
> issues. Such a library should be minimal yet encompassing, and whole parse
> trees should be entirely expressible in code. I am thinking of combinatoric
> parsing as the main candidate that fits this bill.
>
> What do you say?
>
> Thanks!
> Nam
>


[Python-ideas] Built-in parsing library

2019-03-30 Thread Nam Nguyen
Hello list,

What do you think of a universal parsing library in the stdlib mainly for
use by other libraries in the stdlib?

Throughout the years we have had many issues with protocol parsing. Some
have even introduced security bugs. The main cause of these issues is the
use of simple regular expressions.

Having a universal parsing library in the stdlib would help cut down these
issues. Such a library should be minimal yet encompassing, and whole parse
trees should be entirely expressible in code. I am thinking of combinatoric
parsing as the main candidate that fits this bill.
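
For readers unfamiliar with the term, here is a minimal sketch of what combinator-style parsing could look like, with the grammar written entirely in code. This is an illustration under assumptions, not a proposed API: the names `literal`, `many1`, `seq`, and the toy `genericurl` grammar are all made up here.

```python
from typing import Any, Callable, Optional, Tuple

# A parser takes (text, position) and returns (value, new position),
# or None when it does not match.
Parser = Callable[[str, int], Optional[Tuple[Any, int]]]


def literal(expected: str) -> Parser:
    """Match an exact string."""
    def parse(text: str, pos: int):
        if text.startswith(expected, pos):
            return expected, pos + len(expected)
        return None
    return parse


def many1(predicate: Callable[[str], bool]) -> Parser:
    """Match one or more characters that satisfy predicate."""
    def parse(text: str, pos: int):
        end = pos
        while end < len(text) and predicate(text[end]):
            end += 1
        return (text[pos:end], end) if end > pos else None
    return parse


def seq(*parsers: Parser) -> Parser:
    """Match each parser in order, collecting their values in a list."""
    def parse(text: str, pos: int):
        values = []
        for parser in parsers:
            result = parser(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return parse


# Toy grammar: genericurl = scheme "://" rest
# (str.isalpha also accepts non-ASCII letters; good enough for a sketch.)
scheme = many1(str.isalpha)
rest = many1(lambda ch: not ch.isspace())
genericurl = seq(scheme, literal("://"), rest)

print(genericurl("http://example.com", 0))
print(genericurl("http:/example.com", 0))
```

The first call returns `(['http', '://', 'example.com'], 18)` and the second returns `None`. Small parsers are ordinary functions, and larger ones are built by composing them.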

What do you say?

Thanks!
Nam