I've been putting a fair amount of thought into this. Here's what I've come up with:

Perl 6 has several places where whitespace is required or forbidden in order to better facilitate "Do What I Mean" programming. For instance, by having the presence or absence of whitespace before curly braces affect their meaning, you're allowed to omit the parentheses around the parameters of the various control structures: 'if $x { ... }' is now valid, whereas in Perl 5 you would have had to say 'if ($x) { ... }'. Likewise, the same technique lets you draw an unambiguous distinction between an infix operator and a postfix operator (IIRC). So it isn't much of a stretch to require the use of whitespace in order to distinguish between a standard "line comment" and an embedded comment.

Except that that isn't what Perl 6 is doing. All that it does is to say "there's this one case where there's some ambiguity between line comments and embedded comments; it's up to the programmer to remove the ambiguity, through whatever means he sees fit." In many ways, this is the opposite of the above cases, and is more akin to how role composition must be explicitly disambiguated by the programmer, instead of having the compiler take a best guess.

I must admit: as nice as it is to be able to create an embedded comment by wrapping the actual comment in brackets, the existence of that one point of ambiguity is troubling.

--

What I like about the current embedded comments:

1. They're short. You need a single leading character (the '#'), followed by content wrapped in as little as a pair of bracketing characters. That's three characters in addition to the content itself.

2. They're flexible. By going with user-specified bracketing characters, the programmer can choose an intuitive "closing sequence" that won't conflict with the content that he's commenting out.

Of the two features, the second is the more important.
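
To make the bracket choice and the point of ambiguity concrete, here is a rough sketch in Python. It is a toy model, not the actual Perl 6 grammar: the function name `comment_extent`, the bracket set, and the "no nesting" rule are all assumptions made for illustration.

```python
# Toy model of '#' comments. The bracket set and the rules below are
# assumptions for illustration only, not the real Perl 6 grammar.
BRACKETS = {"[": "]", "(": ")", "{": "}", "<": ">"}


def comment_extent(src: str, start: int) -> int:
    """Return the index just past the comment that starts at src[start]."""
    if src[start:start + 1] != "#":
        raise ValueError("no comment starts here")
    # The character right after '#' decides which kind of comment this is.
    opener = src[start + 1:start + 2]
    if opener in BRACKETS:
        # Embedded comment: runs to the matching closer (no nesting here).
        end = src.find(BRACKETS[opener], start + 2)
        if end == -1:
            raise ValueError("unterminated embedded comment")
        return end + 1
    # Line comment: runs up to, but not including, the next newline.
    end = src.find("\n", start)
    return len(src) if end == -1 else end
```

In this model, commenting out the line '[1, 2].say;' by prepending a '#' gives '#[1, 2].say;', which is read as an embedded comment that ends at the ']' and leaves '.say;' behind as live code.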

For line comments, the single most important feature is that you can initiate them with a single character, allowing you to reliably comment out a set of lines through a straightforward - and short - sequence of keystrokes.

The problem arises from the fact that embedded comments start with the same character that line comments start with. This means that the second character is the one that gets used to distinguish between line comments and embedded comments, which at best interferes with the main benefit of line comments described above, and at worst leads to an extra round of debugging as the programmer is forced to go through and add whitespace (or other characters) to disambiguate the two.

--

The solution, then, would be to change embedded comments so that they're initiated by something that doesn't begin with a '#'. Ideally, you'd still use a single character (plus the brackets) to define an embedded comment. Unfortunately, I'm pretty sure that we've got a severe shortage of such characters available for use. Assume, then, that a two-character initializer is going to be needed, and that the first character cannot be a '#'.

--

I'm leery about making the first character a '=', as there's the potential for conflict with POD sections. IIRC, there's a (currently unspoken?) design goal involving POD sections that says that any line beginning with a '=' ought to be able to be stripped out of the program without affecting the code. Could those with more familiarity with the POD design philosophy please speak up?

OTOH, it might be possible to use '=#[ ... ]' to add versatility to the POD syntax. Consider the possibility of saying that '=#[blah blah blah]' at the start of a line is equivalent to '=blah blah blah', except that the POD header ends at the ']' instead of the first '\n'. This could be used to wrap a POD header over several lines, and/or to put the first line of POD content on the same line as a short POD header.

So:

    =#[for comment <params>] text foo;

would be equivalent to:

    =for comment <params> text foo;

...or not; this could lead to the same sort of trouble that we currently have with line comments vs. embedded comments. If we were to go this route, I'd be inclined to say that '=#[ ... ]' isn't just an embedded comment; it's an "embedded POD header". This removes all ambiguity regarding what it is, at the expense of forcing the POD Parser to look at more than just the first character of each line to determine whether or not it's meaningful. The expense may be too great. At the very least, it opens up a whole new can of worms.

--

OK; so let's assume a two-character sequence prior to the opening bracket of an embedded comment, with the first character being something other than '#' or '='. It's perfectly acceptable (and, IMHO, desirable) for the second character to be a '#'.

How about '~#', meaning something along the lines of "string-like comment"? The idea is that the syntax that follows this would conform closely to that of string literals (i.e., quotes). We might even consider loosening the restrictions on delimiter characters, allowing the full versatility of quoting delimiters, since there'd no longer be any danger of confusing this with a line comment. So:

    ~#[ comment ]
    ~#( comment )
    ~#{ comment }
    ~#< comment >
    ~#/ comment /
    ~#" comment "

...and so on.

Again, my preference would be for a single character instead of '~#'; but I could live with the latter.

--
Jonathan "Dataweaver" Lang
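
For anyone who wants to experiment with the '~#' idea, here is a rough Python sketch of how such comments might be stripped from source text. Everything in it is an assumption rather than a spec: the name `strip_tilde_comments`, the rule that a bracket closes with its mirror image while any other delimiter closes with itself, and the lack of nesting or escapes. It also ignores string literals and ordinary line comments.

```python
# Hypothetical model of the proposed '~#' embedded comment. The delimiter
# rules are assumptions: bracket pairs close with their mirror character,
# any other non-space, non-alphanumeric delimiter closes with itself, and
# there is no nesting or escaping. String literals are not handled.
PAIRS = {"[": "]", "(": ")", "{": "}", "<": ">"}


def strip_tilde_comments(src: str) -> str:
    """Return src with every '~#<delim> ... <delim>' comment removed."""
    out = []
    i = 0
    while i < len(src):
        opener = src[i + 2:i + 3]
        is_comment = (
            src.startswith("~#", i)
            and opener != ""
            and not opener.isalnum()
            and not opener.isspace()
        )
        if is_comment:
            closer = PAIRS.get(opener, opener)
            end = src.find(closer, i + 3)
            if end == -1:
                raise ValueError(f"unterminated ~# comment at offset {i}")
            i = end + 1  # resume right after the closing delimiter
        else:
            out.append(src[i])
            i += 1
    return "".join(out)
```

Because a '~#' comment never begins with '#', prepending '#' to a line in this model can only ever produce a line comment, whatever the second character is.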