On 2/21/13, David A. Wheeler <dwhee...@dwheeler.com> wrote:
> I said:
>> > I have a *lot* of concerns with that particular construct.
> Alan Manuel Gloria:
>> Why?  Compare:
> ....
> It's not "must never happen", but I have a lot of concerns.  Here are ones
> that come to mind:
> 1. It really complicates explanation and implementation of "$".  Some people
> require a second explanation now, and "$" is really simple. Adding this
> capability to "$" makes it much more complicated to describe.  Every time we
> add a complication, we risk losing some potential users and implementers.

Fair point.  "$" in current semantics is already difficult to explain as-is.

> 2. I'm not sure that there's enough *value* to adding it.  There *ARE* use
> cases, and these use cases are definitely common enough to discuss doing
> something special with them.  But I worry that the contravening downsides
> will overwhelm it.  Currently, in certain cases we have to add "\\"-only
> lines; that's not really a hardship, especially since the resulting
> constructs are pretty easy to understand.


> 3. It can be viewed as complicating the reading of code that uses it.  Up to
> this point, a dedent always ended the whole line above; now it can end only
> a part of it.  Perhaps the reduction in line count is fair compensation; that's not
> clear to me.

I suppose the main reason is "it's too easy to abuse".  Beni's
formulation has a single use case so far, the aforementioned let, but
excess misuse of the Beni SUBLIST can make users suspicious of using

> 4. There's already a body of material on how to handle indentation-based
> languages, which tend to follow Python approaches and specifically do NOT
> differentiate between "indent 3 spaces" and "indent 1 space", just INDENT.
> We leave better-understood parsing theory if we do this.  I want to have it
> easily implemented, with many reasons to be *confident* it is
> well-designed... the more we leave established theory, the harder it is to
> do that.

Well, my formulation of Beni's formulation removes SUBLIST and SPLIT
(\\-inline) handling from the hands of the indentation parser and puts
it into the hands of the indentation preprocessor.  It could even
remove GROUP (\\-at-start) handling from the indentation parser and
keep it in the preprocessor, as long as the indentation parser can
handle two INDENTs in sequence.

> Let me speak to the last point.  If we *did* go this way (and I'm dubious
> right now), we need to make sure that this construct is clearly and
> unambiguously defined as part of some well-checked BNF grammar.  Turning
> every space into an INDENT, and reduced space into a DEDENT, seems to make
> this much worse.   I don't know of anyone who handles indent/dedent
> processing this way; people normally tokenize indentation to make parsing
> easier.  I want to stick to better-understood ground where we can, so we
> avoid any surprise disasters.
> So if we went this way, I suspect it would be better to model this by adding
> a new indentation token, DEDENT_PARTIAL, in addition to DEDENT.  A DEDENT
> undents back to the previous parent level; a DEDENT_PARTIAL undents back to
> something consistent with the parent and the current indent, but is
> (strictly) between them.  The indent parser would have to change to generate
> a DEDENT_PARTIAL, and the BNF would have to change to support
> DEDENT_PARTIAL.  That way, we at least continue to tokenize indentation
> changes.  I don't know if the BNF change would be easy or hard; if it's
> hard, I'm *really* disinclined.

I think a different approach is better.  The problem is that
DEDENT_PARTIAL cannot give information about *how many* ? exist on the
indent stack.

Instead, I think this calls for a more complicated indentation preprocessor:

1.  If you encounter a SUBLIST, emit an INDENT (or EOL-INDENT since
that seems to be your preferred formulation) and push ? on the indent
stack.
2.  If you encounter a GROUP/SPLIT that is inline (SPLIT meaning):
2.1.  If there is at least one ? on the indent stack top, pop off all
? until you reach a non-? item; emit a DEDENT for each ? popped.
2.2.  Otherwise, emit SAME (or just EOL, since that is how the current
BNF works).
3.  If you encounter an EOL, slurp the indentation, then:
3.1.  If the topmost non-? stack item is less than the indentation,
push the indentation on the stack and emit INDENT.
3.2.  If the topmost non-? stack item is equal to the indentation:
; comment: 3.2.1 and 3.2.2 are copies of 2.1 and 2.2, respectively
3.2.1.  If there is at least one ? on the indent stack top, pop off
all ? until you reach a non-? item; emit a DEDENT for each ? popped.
3.2.2.  Otherwise, emit SAME (or just EOL, since that is how the
current BNF works).
3.3.  Otherwise, the topmost non-? stack item is greater than the
indentation, so:
3.3.1.  Pop off stack items until the topmost non-? stack item is less
than or equal to the indentation; emit a DEDENT for each.
3.3.2.  If the topmost non-? stack item is equal to the indentation,
pop off all ? items on the stack top; emit a DEDENT for each.
3.3.3.  If the topmost stack item is ?, pop it off and push the
indentation (i.e. replace the stack top with the indentation)
3.3.4.  Otherwise, inconsistent indent, signal an error!
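
To make the steps above concrete, here is a rough Python sketch of how I read them.  It is only an illustration under several assumptions that are not part of the proposal: indentation is modelled as a column count rather than the actual whitespace string, the input is a list of hypothetical event tuples (("TOKEN", text), ("SUBLIST",), ("SPLIT",), ("EOL", columns)), the stack starts with a base level of 0, and 3.3.2 to 3.3.4 are treated as an if/else-if/else chain.  The names preprocess, top_indent and pop_marks are made up for this sketch.

```python
MARK = "?"  # placeholder pushed on the indent stack for each SUBLIST


def top_indent(stack):
    """Return the topmost stack item that is not the ? marker."""
    for item in reversed(stack):
        if item != MARK:
            return item
    raise ValueError("indent stack has no base level")


def pop_marks(stack, out):
    """Steps 2.1 / 3.2.1: pop every ? on the stack top, one DEDENT each.

    Returns the number of markers popped.
    """
    popped = 0
    while stack and stack[-1] == MARK:
        stack.pop()
        out.append("DEDENT")
        popped += 1
    return popped


def preprocess(events):
    """Turn SUBLIST/SPLIT/EOL events into INDENT/DEDENT/SAME tokens."""
    stack = [0]  # assumed base indentation level
    out = []
    for ev in events:
        kind = ev[0]
        if kind == "SUBLIST":                      # step 1
            out.append("INDENT")
            stack.append(MARK)
        elif kind == "SPLIT":                      # step 2
            if not pop_marks(stack, out):
                out.append("SAME")
        elif kind == "EOL":                        # step 3
            n = ev[1]
            top = top_indent(stack)
            if top < n:                            # 3.1
                stack.append(n)
                out.append("INDENT")
            elif top == n:                         # 3.2
                if not pop_marks(stack, out):
                    out.append("SAME")
            else:                                  # 3.3
                while top_indent(stack) > n:       # 3.3.1
                    stack.pop()
                    out.append("DEDENT")
                if top_indent(stack) == n:         # 3.3.2
                    pop_marks(stack, out)
                elif stack[-1] == MARK:            # 3.3.3
                    stack[-1] = n
                else:                              # 3.3.4
                    raise ValueError("inconsistent indent")
        else:
            out.append(ev[1])                      # ordinary token, passed through
    return out
```

For example, a line "let $ x 5" followed by a body at column 4, then a line at column 2, then a line at column 0, would be fed in as TOKEN let, SUBLIST, TOKEN x, TOKEN 5, EOL 4, TOKEN body, EOL 2, TOKEN foo, EOL 0.  The partial dedent to column 2 replaces the ? with 2 (step 3.3.3), so every INDENT still gets exactly one DEDENT.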

Basically, the formulation would remove all mention of GROUP_SPLIT and
SUBLIST (and all branches where they occur) but complicate the
indentation preprocessor.

The main problem is that this moves a chunk of code from the parser to
the indentation preprocessor, and ANTLR completely ignores the
indentation preprocessor (it just copies the indentation preprocessor
directly to the output).  There's no need for a DEDENT_PARTIAL, as the
indentation preprocessor (which is the one that handles the indentation
stack) will emit correctly matched pairs of INDENT and DEDENT.

The indentation preprocessor is not implemented in ANTLR, so it
*can't* be proven automatically by ANTLR (or whatever ANTLR can do to
the parser spec).

FWIW, the parser ends up being *nearer* to the standard
indentation-parsing lore, as GROUP_SPLIT and SUBLIST are innovations
currently used only in readable-discuss.

So, the problems with accepting this are:

1.  The new syntax is complicated to explain informally.
2.  It's easier to misuse.  You have to be a bit more careful of your
indentation after the line that you use SUBLIST on.
3.  It's not clear that the benefits are worth it - there seems little gain.
4.  It removes a bunch of code from the parser and places it into the
indentation preprocessor, whose code we cannot prove in ANTLR.

> Anyone want to try out trying to define this meaning in BNF using
> DEDENT_PARTIAL?  Seeing what it would mean, as a BNF, might make it much
> easier to understand its pros and cons.

Readable-discuss mailing list