alister <alister.nospam.w...@ntlworld.com> writes:

> On Tue, 13 Jan 2015 04:36:38 +0000, Steven D'Aprano wrote:
>
> > On Mon, 12 Jan 2015 19:48:18 +0000, Ian wrote:
> >
> >> My recommendation would be to write a recursive descent parser for
> >> your files.
> >>
> >> That way will be easier to write,
> >
> > I know that writing parsers is a solved problem in computer
> > science, and that doing so is allegedly one of the more trivial
> > things computer scientists are supposed to be able to do, but the
> > learning curve to write parsers is if anything even higher than
> > the learning curve to write a regex.
> >
> > I wish that Python made it as easy to use EBNF to write a parser as it
> > makes to use a regex :-(
> >
> > http://en.wikipedia.org/wiki/Extended_Backus–Naur_Form
>
> I would not say that writing parsers is a solved problem. There may
> be solutions for a number of specific cases but many cases still
> cause difficulty, as an example I do not think there is a 100%
> complete parser for English (even native English speakers don't
> always get it)

There is no complete characterization of English as a set of character
strings, nor will there ever be. Linguists have a slogan for this: All
Grammars Leak. (They used to write formal grammars to characterize "all
and only the well-formed sentences" of a language, or to capture
"necessary and sufficient conditions", and those grammars turned out to
both "over-generate" and "under-generate".)

Ambiguity doesn't help. In practice, it's not enough to find a parse.
One wants a contextually appropriate parse. Sometimes this requires
genuine understanding and knowledge.

Also in practice, one may not be in the business of rejecting ill-formed
sentences: one wants to make partial sense of even those.

So, no, never 100 percent complete or 100 percent correct :)

The solved problem is the unambiguous parsing of formal languages that
are defined by a formal grammar to begin with, like the configuration
file format at hand.
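
To make that last point concrete, here is a minimal sketch of a
recursive descent parser in Python. The grammar is a toy one invented
for this example (the actual configuration file format from the thread
is not shown here), and every name in it (TOKEN_RE, tokenize, Parser,
parse_config) is likewise hypothetical. Each grammar rule becomes one
method, and because the grammar is unambiguous, one token of lookahead
is always enough to decide what to do next.

```python
import re

# Hypothetical toy grammar (EBNF), invented for illustration only:
#   config  = { section } ;
#   section = "[" NAME "]" { entry } ;
#   entry   = NAME "=" value ;
#   value   = NUMBER | STRING | NAME ;

TOKEN_RE = re.compile(r"""
    (?P<WS>\s+)
  | (?P<NUMBER>\d+)
  | (?P<STRING>"[^"\n]*")
  | (?P<NAME>[A-Za-z_][A-Za-z0-9_]*)
  | (?P<PUNCT>[\[\]=])
""", re.VERBOSE)


def tokenize(text):
    """Split text into (kind, value) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    tokens.append(("EOF", ""))
    return tokens


class Parser:
    """One method per grammar rule; one token of lookahead."""

    def __init__(self, text):
        self.tokens = tokenize(text)
        self.i = 0

    def peek(self):
        return self.tokens[self.i]

    def expect(self, kind, value=None):
        # Consume the next token if it matches, otherwise reject the input.
        tok_kind, tok_value = self.tokens[self.i]
        if tok_kind != kind or (value is not None and tok_value != value):
            raise SyntaxError(f"expected {value or kind}, got {tok_value!r}")
        self.i += 1
        return tok_value

    def parse_config(self):
        # config = { section } ;
        sections = {}
        while self.peek() == ("PUNCT", "["):
            name, entries = self.parse_section()
            sections[name] = entries
        self.expect("EOF")
        return sections

    def parse_section(self):
        # section = "[" NAME "]" { entry } ;
        self.expect("PUNCT", "[")
        name = self.expect("NAME")
        self.expect("PUNCT", "]")
        entries = {}
        while self.peek()[0] == "NAME":
            key, value = self.parse_entry()
            entries[key] = value
        return name, entries

    def parse_entry(self):
        # entry = NAME "=" value ;
        key = self.expect("NAME")
        self.expect("PUNCT", "=")
        return key, self.parse_value()

    def parse_value(self):
        # value = NUMBER | STRING | NAME ;
        kind, value = self.peek()
        if kind == "NUMBER":
            self.i += 1
            return int(value)
        if kind == "STRING":
            self.i += 1
            return value[1:-1]  # strip the surrounding quotes
        return self.expect("NAME")


def parse_config(text):
    return Parser(text).parse_config()
```

Calling parse_config() on a well-formed file returns a dict of
sections, each a dict of entries; anything the grammar does not
generate raises SyntaxError. That all-or-nothing behaviour is exactly
what is available for a formal language and not for English.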