Bob Foster wrote: > Yes, PEG is a scannerless parsing method. But I don't see how that leads to > adding every character to the AST. That would seem to make the program that > parses the AST reparse the original input.
You don't add every character to the AST, just the ones you choose. I'm certainly open to finding a better way, though. One thing I could do is introduce a rule type that concatenates a sequence of matched characters. It would save me calling (list->string (ast-c ast)) when I have rules like these: Id <- +[a-zA-Z] Ws Num <- +[0-9] Ws Allowing individual characters in the AST is nice because, I'm able to write things like this: Sum <- Prod *([+-] Ws Prod) Prod <- Num *([*/] Ws Num) No information is lost about the operator type but, I don't have to introduce a new rule, either. Here's an example from the JSON grammar: Number <- ?'-' ('0' | [1-9] *[0-9]) ?('.' +[0-9]) ?([eE] ?[+-] +[0-9]) As it's written, the code that processes Number is going to have to do a little bit of work. Is it negative? Is it an integer? Is there an exponent? It's not hard to do but, you could call that parsing since it does treat things as different 'parts'. The work is the same whether Number contains a string or an explicit list of characters. It could be rewritten so that more information is given in the structure. Here's a rewrite that makes use of allowing individual characters in the AST. Number <- ?'-' (Float | Integer) ?Exponent Float <- Integer :'.' +[0-9] Integer <- '0' | [1-9] *[0-9] Exponent <- :[eE] ?[+-] +[0-9] >> I'll gladly be wrong on this but, when only memoizing non-terminals, >> doesn't the following grammar give you O(n^2)? >> >> e.g. >> >> This grammar: >> >> S <- +A >> >> A <- *'a' 'b' | 'a' >> >> Parsing this input: >> >> aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa > > Yes, it does. To get linear complexity, you need to memoize terminals as > well. > > I'd be interested to know, though, as a matter of practice how much your > strategy of memoizing only non-terminals really costs. I'm not sure what it costs because I haven't done any experiments or benchmarks. I didn't notice much change when I recently added memoization to the automata based design, though, the files I'm parsing aren't that big. My guess is that, for most grammars, it's enough to just memoize the non-terminals. Or, even just some of them. Orlando. _______________________________________________ PEG mailing list PEG@lists.csail.mit.edu https://lists.csail.mit.edu/mailman/listinfo/peg