Re: Multiline comments in Perl6

Jonathan Lang Wed, 02 Jan 2008 22:17:47 -0800

I've been putting a fair amount of thought into this.  Here's what
I've come up with:


Perl 6 has several instances where whitespace is required or forbidden
in order to better facilitate "Do What I Mean" programming: for
instance, by having the presence or absence of whitespace before curly
braces affect their meaning, you're allowed to omit the parentheses
around the parameters of the various control structures: e.g., 'if $x
{ ... }' is now valid, whereas in Perl 5 you would have had to say 'if
($x) { ... }'.  Likewise, the same technique lets you provide an
unambiguous distinction between an infix operator and a postfix
operator (IIRC).  So it isn't much of a stretch to require the use of
whitespace in order to distinguish between a standard "line comment"
and an embedded comment.

Except that this isn't what Perl 6 is doing.  All it does is say
"there's this one case where there's some ambiguity between line
comments and embedded comments; it's up to the programmer to remove
the ambiguity, through whatever means he sees fit."  In many ways,
this is the opposite of the above cases, and is more akin to how role
composition must be explicitly disambiguated by the programmer,
instead of having the compiler take a best guess.

I must admit: as nice as it is to be able to create an embedded
comment by wrapping the actual comment in brackets, the existence of
that one point of ambiguity is troubling.

--

What I like about the current embedded comments:

1. They're short.  You need a single leading character (the '#'),
followed by content wrapped in as little as a pair of bracketing
characters.  That's three characters in addition to the content
itself.

2. They're flexible.  By going with user-specified bracketing
characters, the programmer can choose an intuitive "closing sequence"
that won't conflict with content that he's commenting out.

Of the two features, the second one is the more important one.

Likewise, the single most important feature of line comments
is that you can initiate them using a single character, allowing you
to reliably comment out a set of lines through a straightforward - and
short - sequence of keystrokes.

The problem arises from the fact that embedded comments start with the
same character that line comments start with.  This means that the
second character is the one that gets used to distinguish between line
comments and embedded comments, which at best interferes with the main
benefit of line comments described above, and at worst leads to an
extra round of debugging as the programmer is forced to go through and
add whitespace (or other characters) to disambiguate the two.
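To make the ambiguity concrete, here's a rough sketch in Python of a
scanner that has to decide what a '#' means by peeking at the next
character.  The function names and the set of opening brackets are my
own assumptions for illustration; this models only the ambiguity, not
how any Perl 6 implementation actually resolves it.

```python
# Hypothetical sketch: classify a '#' by the character that follows it.
# OPENERS is an assumed, deliberately small set of bracketing characters.
OPENERS = "([{<"


def classify_comment(text, pos):
    """Return 'embedded' or 'line' for the '#' found at text[pos]."""
    if text[pos] != "#":
        raise ValueError("no '#' at that position")
    nxt = text[pos + 1 : pos + 2]  # empty string if '#' is the last char
    return "embedded" if nxt and nxt in OPENERS else "line"


def comment_out(line):
    """Naive editor macro: comment out a line by prefixing a single '#'."""
    return "#" + line


# A line that happens to start with a bracket changes category when it
# is commented out with the one-keystroke macro.
plain = comment_out("say 1;")
risky = comment_out("[1, 2, 3].map: { say $_ };")
```

Here `plain` is classified as a line comment, while `risky` is
classified as the start of an embedded comment, even though the same
keystroke produced both.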

--

The solution, then, would be to change embedded comments so that
they're initiated by something that doesn't begin with a '#'.

Ideally, you'd still use a single character (plus the brackets) to
define an embedded comment.  Unfortunately, I'm pretty sure that we've
got a severe shortage of such characters available for use.  Assume,
then, that a two-character introducer is going to be needed, and that
the first character cannot be a '#'.

--

I'm leery about making the first character be a '=', as there's the
potential for conflict with POD sections.  IIRC, there's a (currently
unspoken?) design goal involving POD sections that says that any line
beginning with a '=' ought to be able to be stripped out of the
program without affecting the code.  Could those with more familiarity
with POD design philosophy please speak up?

OTOH, it might be possible that '=#[ ... ]' could be used to add
versatility to the POD syntax.  Consider the possibility of saying
that '=#[blah blah blah]' at the start of a line is equivalent to
'=blah blah blah', except that the POD header ends at the ']' instead
of the first '\n'.  This could be used to wrap a POD header over
several lines, and/or to put the first line of POD content on the same
line as a short POD header.  So:

  =#[for comment
  <params>] text

  foo;

would be equivalent to:

  =for comment <params>
  text

  foo;

...or not; this could lead to the same sort of trouble that we
currently have with line comments vs. embedded comments.  If we were
to go this route, I'd be inclined to say that '=#[ ... ]' isn't just
an embedded comment; it's an "embedded POD header".  This removes all
ambiguity regarding what it is, at the expense of forcing the POD
parser to look at more than just the first character of each line to
determine whether or not it's meaningful.  The expense may be too
great.  At the very least, it opens up a whole new can of worms.
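As a rough illustration of the '=#[ ... ]' equivalence described
above, here's a Python sketch of the rewrite.  Everything in it is an
assumption: the function name is made up, nested brackets inside the
header aren't handled, and no POD parser is claimed to behave this
way.

```python
# Hypothetical sketch of the proposed "embedded POD header" rewrite.
def expand_embedded_header(block):
    """Rewrite '=#[header] text' into '=header' followed by 'text'.

    Assumes the header contains no nested ']' characters.
    """
    if not block.startswith("=#["):
        return block  # not an embedded header; leave untouched
    close = block.find("]", 3)
    if close == -1:
        raise ValueError("unterminated embedded POD header")
    # Collapse line breaks inside the brackets so the header ends up
    # on a single line, as a conventional POD header would.
    header = " ".join(block[3:close].split())
    rest = block[close + 1 :].lstrip(" ")
    return "=" + header + "\n" + rest


example = "=#[for comment\n<params>] text\n\nfoo;"
expanded = expand_embedded_header(example)
```

Note that the check at the top has to look at three characters rather
than one, which is exactly the extra cost to the POD parser mentioned
above.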

--

OK; so let's assume a two-character sequence prior to the opening
bracket of an embedded comment, with the first character being
something other than '#' or '='.  It's perfectly acceptable (and,
IMHO, desirable) for the second character to be a '#'.

How about '~#', meaning something along the lines of "string-like
comment"?  The idea is that the syntax that follows this would conform
closely to that of string literals (i.e., quotes).  We might even
consider loosening the restrictions on delimiter characters, allowing
the full versatility of quoting delimiters, since there'd no longer be
any danger of confusing this with a line comment.  So:

  ~#[ comment ]
  ~#( comment )
  ~#{ comment }
  ~#< comment >
  ~#/ comment /
  ~#" comment "

...and so on.
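Here's a minimal Python sketch of how a scanner might skip such
comments.  It's only an illustration under stated assumptions: the
'~#' syntax is just the proposal above, the function name and the
bracket table are made up, nesting isn't handled, and '~#' inside a
string literal isn't treated specially.

```python
# Hypothetical sketch: strip '~#' comments using quote-like delimiters.
# Bracketing openers close with their partner; any other punctuation
# character closes with itself, as with quoting delimiters.
PAIRS = {"[": "]", "(": ")", "{": "}", "<": ">"}


def strip_tilde_comments(src):
    """Return src with every '~#<delim> ... <delim>' span removed."""
    out = []
    i = 0
    while i < len(src):
        opener = src[i + 2 : i + 3]  # empty if '~#' ends the input
        if (
            src.startswith("~#", i)
            and opener
            and not (opener.isspace() or opener.isalnum())
        ):
            closer = PAIRS.get(opener, opener)
            end = src.find(closer, i + 3)
            if end == -1:
                raise ValueError("unterminated ~# comment")
            i = end + 1  # resume scanning just past the closer
            continue
        out.append(src[i])
        i += 1
    return "".join(out)


cleaned = strip_tilde_comments("foo(~#[ a ] 1, ~#/ b / 2)")
```

Since the '~' comes first, the scanner never has to wonder whether it
is looking at a line comment, which is the whole point of the change.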

Again, my preference would be for a single character instead of '~#';
but I could live with the latter.

-- 
Jonathan "Dataweaver" Lang
