RE: [ruby.parslet] Re: issue #64 discussion (error trees)

Jonathan Rochkind Mon, 16 Apr 2012 19:11:58 -0700

I've struggled with figuring out how to provide useful parse error messages to 
users too.

If there are hints for how to do that with present parslet, please do share 
examples. 

If Parslet can be modified to make this easier, I'm all for it.  If you have to 
choose between error messages useful for grammar developers and error messages 
useful for end-users entering strings to be parsed -- I'd definitely choose in 
favor of the end-users. 

The developer has other tools available to him or her, like a debugger, and 
unit tests, and the ability to change the code and see what happens, and the 
ability to send sub-strings through specific rules like Parslet makes so easy.  

The end-user has nothing but the error messages we can provide them. 

Parslet makes it super easy to debug grammars even without the error trees -- 
its pure Ruby nature and ability to take an individual rule and parse a 
sub-string with it make it easier than anything else I've used. I've barely 
ever needed to use the 'error tree' to debug my grammars when writing 'em. 

But I've needed to figure out a way to provide good "can't parse" errors to 
end-users, and not been able to figure one out. 
________________________________________
From: [email protected] [[email protected]] on behalf of John 
Mettraux [[email protected]]
Sent: Monday, April 16, 2012 6:17 PM
To: [email protected]
Subject: Re: [ruby.parslet] Re: issue #64 discussion (error trees)

On Mon, Apr 16, 2012 at 07:14:19PM +0200, Kaspar Schiess wrote:
>
> Just reviewing your latest set of changes on the arboriculture branch of
> your parslet repository. What you basically propose is a different
> approach to what errors matter and how they should be displayed. Let me
> try to explain to you your own approach and see if I get your idea
> straight. Only then will I try to do a critique of it and then maybe we
> can find convergence.

Hello Kaspar,

great.

> You use the concept of deepest error, which is the error that happened
> at the parse position that was most advanced in the source file. The
> changes you propose would completely remove the stack-trace like
> error-trees and replace them with just one error message that is
> associated with that deepest parse. It would also be mostly associated
> with the most concrete parse at that position: the error would not say
> that a high level rule failed, but that a rule like match() or str() failed.
>
> If the above doesn't capture your idea, consider what is below
> irrelevant to discussion. Feel free to set me right.

You're right. My motivation is coming from "end users" complaining about
error messages pointing at the haystack rather than at the needle. I was
already offering some pinpointing by reading the error_tree, but it fell
short because the error_tree felt truncated (hence my opening of
https://github.com/kschiess/parslet/issues/64 ).
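
As a rough illustration of the "deepest error" idea discussed above, here is a
minimal sketch in plain Ruby. It does not use Parslet: the Failure struct, its
fields and the sample messages are hypothetical stand-ins for whatever a parser
records when a rule fails. Ties at the same position go to the most deeply
nested (most concrete) rule.

```ruby
# Hypothetical failure record; Parslet's own error objects look different.
# pos     - offset in the source where the rule failed
# depth   - nesting level of the rule (higher = more concrete, e.g. str/match)
# message - text that would be shown to the user
Failure = Struct.new(:pos, :depth, :message)

# Pick the failure that got furthest into the input. When several failures
# share that position, prefer the most concrete one.
def deepest_failure(failures)
  failures.max_by { |f| [f.pos, f.depth] }
end

failures = [
  Failure.new(0, 0, "Failed to match rule :document"),
  Failure.new(12, 1, "Failed to match rule :assignment"),
  Failure.new(12, 2, "Expected \"=\", but got \"+\""),
  Failure.new(4, 2, "Expected a digit")
]

best = deepest_failure(failures)
puts "#{best.message} at position #{best.pos}"
```

Whether the deepest failure is also the most helpful one depends on the
grammar, which is the concern raised in the next quoted paragraph.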

> This approach seems to work well with the grammar you use. Have you
> thought about how this generalizes? It seems really easy to construct a
> pathological grammar where the deepest error carries no meaning to the
> user of your language.

Granted, my vision is certainly limited to the walls of my "grammar cubicle".

I have a couple other grammars I use elsewhere but they're not as fat as the
one with which I'm striving now.

> How does the grammar writer know how the deepest error relates to the
> grammar? What should I fiddle with if I know the input is correct but
> the grammar is not? It seems that we have two set of needs here. As a
> grammar writer I want to know how my grammar failed to parse X; as a
> writer of X I might indeed just want to know about one position to
> twiddle. The error tree anchors the errors back into the structure of
> the grammar; but it leaves the problem of what to display to the user
> (writer of X) completely unsolved. I know I've gone half way only and
> solved my own problem there. Finally somebody notices.

When developing the grammar, I used the error_tree a lot. Now that I'm
handing grammar, parser and transformer to the user, I need a helpful error;
the users aren't as patient as I am.

The parser I work with most of my time is the Ruby one. Most of the time it's
providing me with a decent error message pointing at my mistake, the rest of
the time

> Another concern I've been having (that you probably didn't think of
> here) is the time parslet is spending in the management of all those
> error objects. Even with efficient GC, constructing all those objects
> takes a lot of time when we probably don't need half of them. Your
> approach doesn't address the problem, it just filters what to keep
> differently.

Exactly.

"half of them": from what my exploration taught me, we could keep the error
with the deepest pos and discard errors (ie not instantiate them) with
smaller pos. Granted, some combinations of grammar and source could yield
"instantiate 90% of the errors" and the win would be worthless.

> I am thinking: could we do a first parse for getting just results, and
> once that fails, do a second parse that constructs error information
> using a kind of aggregator? Aggregation could then implement either of
> our ideas about how errors should look like... We might be winning on
> more than one front at once. How does that sound?

I like the idea a lot, sounds right, keep the happy path lean.
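
The two-pass idea might look roughly like the sketch below. It is a toy in
plain Ruby, not Parslet code: ListParser, DeepestError and
parse_with_diagnostics are hypothetical names, and the "grammar" is just a
comma-separated list of integers. The first pass does no error bookkeeping;
only when it fails is the input parsed again with an aggregator, which here
keeps just the failure at the furthest position.

```ruby
require 'strscan'

# Hypothetical aggregator, not Parslet API: keeps only the failure at the
# furthest input position. The message comes from a block, so failures at
# shallower positions never build their message string.
class DeepestError
  attr_reader :pos, :message

  def initialize
    @pos = -1
    @message = nil
  end

  def report(pos)
    return if pos < @pos
    @pos = pos
    @message = yield
  end

  def to_s
    "#{message} at position #{pos}"
  end
end

# Toy parser for input such as "1,22,3". The aggregator is optional, so the
# first pass does no error bookkeeping at all.
class ListParser
  def parse(source, aggregator = nil)
    scanner = StringScanner.new(source)
    numbers = []
    loop do
      digits = scanner.scan(/\d+/)
      return failure(scanner, "a number", aggregator) unless digits
      numbers << digits.to_i
      break if scanner.eos?
      return failure(scanner, "','", aggregator) unless scanner.scan(/,/)
    end
    numbers
  end

  private

  # Records the failure only when an aggregator was supplied; always
  # returns nil to signal that the parse failed.
  def failure(scanner, expected, aggregator)
    aggregator&.report(scanner.pos) do
      "expected #{expected}, got #{scanner.peek(1).inspect}"
    end
    nil
  end
end

def parse_with_diagnostics(source)
  parser = ListParser.new
  result = parser.parse(source)      # pass 1: happy path, results only
  return [result, nil] if result

  aggregator = DeepestError.new
  parser.parse(source, aggregator)   # pass 2: runs only after a failure
  [nil, aggregator.to_s]
end

p parse_with_diagnostics("1,22,3")
p parse_with_diagnostics("1,,3")
```

This toy parser stops at its first failure, so the aggregator sees a single
report; a backtracking parser would call report many times, and a different
aggregator class could build a full error tree instead.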

> We'd finally be comparing different kinds of apples when benchmarking
> against Treetop, at least...
>
> I will now try to hack your grammar to produce better error messages,
> without changing parslet. Just because I think this might be doable ;)
> I'll report back.

Looking forward to the results.

Please take some time to look at individual commits in my arboriculture fork,
it's not all misguided adventure ;-)

Thanks a ton!

John
