On Sun, Mar 31, 2019 at 11:00 AM Nick Timkovich <prometheus...@gmail.com> wrote:
> What does it mean to be a universal parser? In my mind, to be universal
> you should be able to parse anything, so you'd need something as versatile
> as any Turing language,
>

I'm not aware of, nor looking for, such Turing-complete parsers. Parsing
algorithms such as Earley's, Generalized LL/LR, and parser combinators are
often called universal in the sense that they can work with all
context-free grammars. I do not know if they are Turing complete.

> so one could stick with the one we already have (Python).
>

That is one of the reasons why the parser should be "coded" rather than
declared (e.g. in the sense of EBNF). Combinatoric parsers are usually
glued together with functions which can act based on the current parse
tree.

> I'm vaguely aware of levels of grammar (regular, context-free?, etc.), and
> how things like XML can't/shouldn't be parsed with regex [1]. Most
> protocols probably aren't *completely* free to do whatever and probably
> fit into some level of the hierarchy, what level would this putative parser
> perform at?
>

I'd say any context-free grammar should be supported. But given the
immediate use case (to help other libraries in the stdlib), this could
start small (but complete and correct). I am talking about simple parsing
needs such as email validation, the HTTP cookie format, URL parsing, and
well-known date formats. In fact, I would expect this parsing library to
offer only primitives such as: parse any character, parse a character
matching a predicate, parse a string, etc.

>
> Doing something like this from-scratch is a very tall order, are there
> candidate libraries that you'd want to see included in the stdlib? There is
> an argument for trying to "promote" a library that would security into the
> standard library over others that would just add features: trying to make
> the "one obvious way to do it" also the safe way. However, all things
> equal, more used libraries tend to be more secure. I think suggestions of
> this form need to pose a library that a) exists, b) is well used and
> regarded, c) stable (once in the stdlib things are hard to change), and
> d) has maintainers that are amenable to inclusion.
>

This email wasn't meant to promote or consider any library in particular.
I'm more interested in finding out which way the consensus leans with
respect to the need. Implementation-wise, I'm thinking of this paper from
~25 years ago and a very bare-bones pyparsing.

http://www.cs.nott.ac.uk/~pszgmh/monparsing.pdf

Cheers,
Nam

>
> Nick
>
> [1]: https://stackoverflow.com/a/1732454/194586
>
> On Sat, Mar 30, 2019 at 12:57 PM Nam Nguyen <bits...@gmail.com> wrote:
>
>> Hello list,
>>
>> What do you think of a universal parsing library in the stdlib mainly for
>> use by other libraries in the stdlib?
>>
>> Throughout the years we have had many issues with protocol parsing. Some
>> have even introduced security bugs. The main cause of these issues is the
>> use of simple regular expressions.
>>
>> Having a universal parsing library in the stdlib would help cut down
>> these issues. Such a library should be minimal yet encompassing, and whole
>> parse trees should be entirely expressible in code. I am thinking of
>> combinatoric parsing as the main candidate that fits this bill.
>>
>> What do you say?
>>
>> Thanks!
>> Nam
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas@python.org
>> https://mail.python.org/mailman/listinfo/python-ideas
>> Code of Conduct: http://python.org/psf/codeofconduct/
>>
>
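
For illustration, the primitives named in the reply (parse any character,
parse a character matching a predicate, parse a string) could look roughly
like the sketch below. Everything here is an assumption made for the sake of
the example: the names any_char, satisfy, string, seq, many1 and fmap are
hypothetical and are not taken from pyparsing, from the paper linked above,
or from any stdlib proposal. A parser is modelled as a plain function from
(text, position) to either (value, new_position) or None.

```python
from typing import Any, Callable, Optional, Tuple

# Assumed representation: a parser returns (value, new_pos) or None on failure.
Result = Optional[Tuple[Any, int]]
Parser = Callable[[str, int], Result]


def any_char() -> Parser:
    """Primitive: consume exactly one character, whatever it is."""
    def parse(text: str, pos: int) -> Result:
        return (text[pos], pos + 1) if pos < len(text) else None
    return parse


def satisfy(predicate: Callable[[str], bool]) -> Parser:
    """Primitive: consume one character if it matches the predicate."""
    def parse(text: str, pos: int) -> Result:
        if pos < len(text) and predicate(text[pos]):
            return text[pos], pos + 1
        return None
    return parse


def string(expected: str) -> Parser:
    """Primitive: consume an exact string."""
    def parse(text: str, pos: int) -> Result:
        if text.startswith(expected, pos):
            return expected, pos + len(expected)
        return None
    return parse


def seq(*parsers: Parser) -> Parser:
    """Combinator: run parsers one after another, collecting their values."""
    def parse(text: str, pos: int) -> Result:
        values = []
        for parser in parsers:
            result = parser(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return parse


def many1(parser: Parser) -> Parser:
    """Combinator: apply a parser one or more times."""
    def parse(text: str, pos: int) -> Result:
        values = []
        while True:
            result = parser(text, pos)
            if result is None or result[1] == pos:  # stop on failure or no progress
                break
            value, pos = result
            values.append(value)
        return (values, pos) if values else None
    return parse


def fmap(func: Callable[[Any], Any], parser: Parser) -> Parser:
    """Combinator: transform the value of a successful parse with a function."""
    def parse(text: str, pos: int) -> Result:
        result = parser(text, pos)
        if result is None:
            return None
        value, new_pos = result
        return func(value), new_pos
    return parse


# Usage: a tiny YYYY-MM-DD style parser built only from the pieces above.
digit = satisfy(lambda c: c in "0123456789")
number = fmap(lambda chars: int("".join(chars)), many1(digit))
iso_date = fmap(
    lambda parts: (parts[0], parts[2], parts[4]),
    seq(number, string("-"), number, string("-"), number),
)

print(iso_date("2019-03-31", 0))  # ((2019, 3, 31), 10)
print(iso_date("2019/03/31", 0))  # None
```

The point of the sketch is only that the glue between primitives is ordinary
Python functions (here fmap and seq), which is what lets a combinator-built
parser act on intermediate results instead of being a purely declarative
grammar. It does no range checking on the date fields and reports no error
positions, both of which a real library would need.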