> As for mixed whitespace, I think the only sane thing to do would be to 
> only allow leading tabs *or* spaces.  Mixing tabs and spaces in the leading 
> whitespace would be a syntax error.  All lines in the string would need to 
> use tabs or all lines use spaces, you could not have one line with tabs and 
> another with spaces.  This would keep the compiler out of the business of 
> making any assumptions or guesses, would not be a problem often, and would be 
> very easy to fix if it ever happens accidentally.

The sane thing to do would be to require every line be prefixed with *exactly* 
the same sequence of characters as the closing delimiter line. Anything else 
(except perhaps a completely blank line, to permit whitespace trimming) would 
be a syntax error.
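Stated as code, the exact-prefix rule is small. The following is a minimal Swift sketch, not any compiler's actual implementation. It assumes a lexer has already split the literal into body lines and the closing `"""` line, and `stripMargin(body:closingLine:)` and `MarginError` are hypothetical names invented for illustration. Note that it rejects the mixed tabs-and-spaces case from the quoted message without any separate rule, because a tab-indented line simply does not match a space-indented margin.

```swift
// Hypothetical error type for this sketch; not part of any real Swift API.
enum MarginError: Error, Equatable {
    case invalidClosingLine          // closing line isn't just indentation + """
    case prefixMismatch(line: Int)   // 1-based body line that breaks the rule
}

/// Sketch of the "exact prefix" rule: the margin is whatever precedes the
/// closing `"""`, and every body line must begin with exactly that sequence
/// of characters. A completely blank (empty or whitespace-only) line is
/// accepted and becomes "". Body lines are assumed to be already split out.
func stripMargin(body: [String], closingLine: String) throws -> String {
    let isIndentChar: (Character) -> Bool = { $0 == " " || $0 == "\t" }

    guard closingLine.hasSuffix("\"\"\"") else { throw MarginError.invalidClosingLine }
    let margin = closingLine.dropLast(3)
    guard margin.allSatisfy(isIndentChar) else { throw MarginError.invalidClosingLine }

    var stripped: [String] = []
    for (index, line) in body.enumerated() {
        if line.allSatisfy(isIndentChar) {
            stripped.append("")                    // blank line: trim it entirely
        } else if line.starts(with: margin) {
            stripped.append(String(line.dropFirst(margin.count)))
        } else {
            throw MarginError.prefixMismatch(line: index + 1)
        }
    }
    // Whether to append a final "\n" is a design choice; this sketch doesn't.
    return stripped.joined(separator: "\n")
}

do {
    // Closing line: four spaces, then """. A tab-indented line would not match.
    let text = try stripMargin(body: ["    <catalog>", "      <book/>", "    </catalog>"],
                               closingLine: "    \"\"\"")
    print(text)
} catch {
    print("syntax error: \(error)")
}
```

The sketch arbitrarily leaves off a final newline; either choice is defensible, which is part of the trailing-newline confusion raised in point 2.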

But take a moment to consider the downsides before you leap to adopt this 
solution.

1. You have introduced tab-space confusion into the equation.

2. You have introduced trailing-newline confusion into the equation.

3. The #escaped and #marginStripped keywords are now specific to multiline 
strings, yet #escaped in particular would be attractive in single-line strings 
for tasks like regexes. You will have to invent a different syntax for it there.

4. This form of `"""` can't be used to avoid escaping `"` in a single-line 
string; you now have to invent a separate mechanism for that.

5. You can't necessarily look at a line and tell whether it's code or string. 
And—especially with the #escaped-style constructs—the delimiters don't 
necessarily "pop" visually; they're too small and easy to miss compared to the 
text they contain. In extremis, you actually have to look at the entire file 
from top to bottom, counting the `"""`s to figure out whether you're in a 
string or not. Granted, you *usually* can tell from context, but it's a far cry 
from what continuation quotes offer.

6. You are now forcing *any* string literal of more than one line to include 
two extra lines devoted wholly to the quoting syntax. In my Swift-generating 
example, that would change shorter snippets like this:

code +=      "    
             "    static var messages: [HTTPStatus: String] = [
             ""

Into things like this:

code +=      """
                 
                 static var messages: [HTTPStatus: String] = [
                            
             """

To my mind, the second syntax is actually *heavier*, despite not requiring 
every line be marked, because it takes two extra lines and additional 
punctuation.

7. You are also introducing visual ambiguity into the equation—in the above 
example, the left margin is now ambiguous to the eye (even if it's not 
ambiguous to the compiler). You could recover it by permitting non-whitespace 
prefix characters:

code +=      """
            |    
            |    static var messages: [HTTPStatus: String] = [
            |
            |"""

...but then we're back to annotating every line, *plus* we have the leading and 
trailing `"""` lines. Worst of both worlds.

8. In longer examples, you are dividing the expression in half in a way that 
makes it difficult to read. For instance, consider this code:

        socket.send( 
            """ #escaped #marginStripped 
            <?xml version="1.0"?>
            <catalog>
               <book id="bk101" empty="">
                   <author>\(author)</author>
                   <title>XML Developer's Guide</title>
                   <genre>Computer</genre>
                   <price>44.95</price>
                   <publish_date>2000-10-01</publish_date>
                   <description>An in-depth look at creating applications with XML.</description>
               </book>
            </catalog>
            """.data(using: .utf8))

The effect—particularly with even larger literals than this—is not unlike 
pausing in the middle of reading an article to watch a movie. What were we 
talking about again?

This problem is neatly avoided by a heredoc syntax, which keeps the expression 
together and then collects the string below it:

        socket.send(""".data(using: .utf8))
            <?xml version="1.0"?>
            <catalog>
               <book id="bk101" empty="">
                   <author>\(author)</author>
                   <title>XML Developer's Guide</title>
                   <genre>Computer</genre>
                   <price>44.95</price>
                   <publish_date>2000-10-01</publish_date>
                   <description>An in-depth look at creating applications with XML.</description>
               </book>
            </catalog>
            """

(I'm assuming there's no need for #escaped or #marginStripped; they're both 
enabled by default.)
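The `|` margin in point 7 behaves much like Scala's `stripMargin`, except that Scala leaves lines without a `|` unchanged. Here is a minimal Swift sketch, assuming that the `"""` delimiter lines have already been removed and that a body line missing its `|` counts as a syntax error; `stripPipeMargin` is a hypothetical name.

```swift
/// Hypothetical helper for the `|`-margin variant in point 7.
/// Assumes the `"""` delimiter lines were already removed by the lexer.
/// Each line may start with spaces or tabs, which must be followed by `|`;
/// everything after the `|` is kept verbatim. Returns nil (a "syntax error")
/// if any line lacks the marker.
func stripPipeMargin(_ lines: [String]) -> String? {
    var result: [String] = []
    for line in lines {
        // Skip the visual indentation to the left of the margin.
        let afterIndent = line.drop(while: { $0 == " " || $0 == "\t" })
        guard afterIndent.first == "|" else { return nil }
        // Keep everything to the right of the `|`, including leading spaces.
        result.append(String(afterIndent.dropFirst()))
    }
    return result.joined(separator: "\n")
}

// The body of the point 7 example, without its delimiter lines.
let pipeBody = [
    "            |    ",
    "            |    static var messages: [HTTPStatus: String] = [",
    "            |",
]
print(stripPipeMargin(pipeBody) ?? "syntax error")
```

Whether the result ends in a newline depends entirely on whether that final bare `|` line is present, which is one more place for trailing-newline confusion to creep in.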

* * *

Let's actually talk about heredocs. Leaving aside indentation (which can be 
applied to either feature) and the traditional token choices (which can be 
changed), I think these are the pros of heredocs compared to Python 
triple-quotes:

H1: Doesn't break up expressions, as discussed above.
H2: Literal content formatting is completely unaffected by code formatting, 
including the first and last lines.

Here are the pros of Python triple-quotes compared to heredocs:

P1: Simpler to explain: "like a string literal, but really big".
P2: Lighter syntactic weight, enough to make `"""` usable as a single-line 
syntax.
P3: Less trailing-newline confusion.

(There is one other difference: `"""` is simpler to parse, so we might be able 
to get it in Swift 3, whereas heredocs probably have to wait for Swift 4. But I 
don't think we should pick one feature over another merely so we can get it 
sooner. If you plan to eventually introduce both features, as I plan to 
eventually have both continuation quotes and heredocs, it makes sense to 
introduce each one as soon as you can; it's another thing entirely to choose 
one feature over another specifically because it can be implemented sooner.)

But the design you're discussing trades P2 and P3—and frankly, with the 
mandatory newlines, part of P1—away in an attempt to get H2. So we end up 
deciding between these two selling points:

* This triple-quotes design: Simpler to explain.
* Heredocs: Doesn't break up expressions.

Simplicity is good, but I really like the code reading benefits of heredocs. 
Your code is your code and your text is your text. The interface between them 
is a bit funky, but within their separate worlds, they're both pretty nice.
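That separation can be sketched from the lexer's side. With a heredoc, the expression on the opening line is lexed as usual, and the body is then read verbatim from the lines below it. Here is a minimal Swift sketch of that collection step, assuming a terminator that is a line containing only `"""` (optionally indented); `collectHeredoc` is a hypothetical helper, and indentation stripping is deliberately left out.

```swift
/// Hypothetical lexer step for a heredoc: after the line containing the opener,
/// copy following lines verbatim until a terminator line. Assumed terminator:
/// a line consisting only of `"""`, optionally indented. Returns the body and
/// the index of the terminator, or nil if the heredoc is never closed.
func collectHeredoc(_ source: [String], openerLine: Int) -> (body: [String], terminator: Int)? {
    var body: [String] = []
    var index = openerLine + 1
    while index < source.count {
        let line = source[index]
        if line.drop(while: { $0 == " " || $0 == "\t" }) == "\"\"\"" {
            return (body, index)
        }
        body.append(line)   // never reformatted: code indentation can't affect it
        index += 1
    }
    return nil              // ran off the end of the file: unterminated heredoc
}

// The expression stays whole on line 0; the string is collected below it.
let source = [
    "socket.send(\"\"\".data(using: .utf8))",
    "<catalog>",
    "   <book id=\"bk101\"/>",
    "</catalog>",
    "    \"\"\"",
    "print(\"sent\")",
]
if let heredoc = collectHeredoc(source, openerLine: 0) {
    print(heredoc.body.joined(separator: "\n"))
    print("terminator on line", heredoc.terminator)
}
```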

* * *

Either way, heredocs or multiline-only triple quotes could be tweaked to 
support indentation by using the indentation of the end delimiter. But as I 
explained above, I don't think that's a great idea for either triple quotes 
*or* heredocs—the edge of the indentation is not visually well defined enough.

That's why I came to the conclusion that trying to cram every multiline literal 
into one syntax is trying to cram too many peg shapes into one hole shape. 
Indentation should *only* be supported by a dedicated syntax which is also 
designed for the smallest multiline strings, where indentation support is most 
useful. A separate feature without indentation support should handle longer 
strings, where the length alone is so disruptive to the flow of your code that 
there's just no point even trying to indent them to match (and the break with 
normal indentation itself assists you in finding the end of the string).

And I think that the best choice for the first feature is continuation quotes, 
and for the second is heredocs. Triple-quote syntaxes—either Python's or this 
modification—are jacks of all trades, but masters of none.
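For comparison, here is a minimal Swift sketch of decoding a continuation-quote literal. Its semantics are inferred from the `code +=` example in point 6 rather than taken from a written proposal: every line starts, after indentation, with `"`; every line except the last contributes its text plus a newline; and the last line ends with the closing `"`. `decodeContinuationQuotes` is a hypothetical name, and escapes and interpolation are ignored.

```swift
/// Hypothetical decoder for a continuation-quoted literal, with semantics
/// inferred from the `code +=` example (not from a written specification).
/// `lines` are the literal's source lines starting at the opening quote.
/// Each line must begin, after optional indentation, with `"`; each line but
/// the last contributes its text plus "\n"; the last must end with a closing `"`.
/// Escape sequences and interpolation are deliberately ignored.
func decodeContinuationQuotes(_ lines: [String]) -> String? {
    guard !lines.isEmpty else { return nil }
    var result = ""
    for (index, line) in lines.enumerated() {
        // Indentation to the left of the `"` is code formatting, not content.
        let afterIndent = line.drop(while: { $0 == " " || $0 == "\t" })
        guard afterIndent.first == "\"" else { return nil }
        let content = afterIndent.dropFirst()
        if index == lines.count - 1 {
            guard content.last == "\"" else { return nil }   // unterminated literal
            result += String(content.dropLast())
        } else {
            result += String(content) + "\n"
        }
    }
    return result
}

// The point 6 example, starting at the opening quote of its first line.
let literal = [
    "\"    ",
    "             \"    static var messages: [HTTPStatus: String] = [",
    "             \"\"",
]
print(decodeContinuationQuotes(literal) ?? "syntax error")
```

Because each line marks its own left edge, there is no margin to infer and nothing for tabs and spaces to disagree about.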

-- 
Brent Royal-Gordon
Architechies

_______________________________________________
swift-evolution mailing list
https://lists.swift.org/mailman/listinfo/swift-evolution
