@Brent, I suggest renaming the proposal to make clear that it does not try to solve the character-escaping problem, i.e. text *as-is*, and that it just removes the \n"+ from the end of each line of the string. I think many readers will expect an "as-is" text feature when they start reading your proposal, or will ask questions like "why doesn't the multi-line proposal include as-is multi-line text?". I feel the title is too generic.

Regarding the proposal itself: I'm ready to support it, provided you add a 'specification' of your multi-line feature to the title, like "multi-line with support for escaping and interpolation", so that we can then have another proposal like "multi-line without escaping, with text as-is".

One question: what about trailing spaces/tabs at the end of each line? IMO there should be one strict rule to prevent hard-to-find bugs: your feature must either trim all trailing whitespace, or have an explicit marker for when to do so.

On 29.04.2016 0:56, Brent Royal-Gordon via swift-evolution wrote:
Awesome.  Some specific suggestions below, but feel free to iterate in a
pull request if you prefer that.

I've adopted these suggestions in some form, though I also ended up
rewriting the explanation of why the feature was designed as it is and
fusing it with material from "Alternatives considered".

(Still not sure who I should list as a co-author. I'm currently thinking
John, Tyler, and maybe Chris? Who's supposed to go there?)


  Multiline string literals

  * Proposal: SE-NNNN
    <https://github.com/apple/swift-evolution/blob/master/proposals/NNNN-name.md>
  * Author(s): Brent Royal-Gordon <https://github.com/brentdax>
  * Status: *Second Draft*
  * Review manager: TBD


    
    Introduction

In Swift 2.2, the only means to insert a newline into a string literal is
the |\n| escape. String literals specified in this way are generally ugly
and unreadable. We propose a multiline string feature inspired by English
punctuation which is a straightforward extension of our existing string
literals.

This proposal is one step in a larger plan to improve how string literals
address various challenging use cases. It is not meant to solve all
problems with escaping, nor to serve all use cases involving very long
string literals. See the "Future directions for string literals in general"
section for a sketch of the problems we ultimately want to address and some
ideas of how we might do so.

Swift-evolution threads: multi-line string literals (April)
<https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20160418/015500.html>,
multi-line string literals (December)
<https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20151214/002349.html>


    
    Draft Notes

  *

    Removes the comment feature, which was felt to be an unnecessary
    complication. This and the backslash feature have been listed as future
    directions.

  *

    Loosens the specification of diagnostics, suggesting instead of
    requiring fix-its.

  *

    Splits a "Rationale" section out of the "Proposed solution" section.

  *

    Adds extensive discussion of other features which would combine with
    this one.

  *

    I've listed only myself as an author because I don't want to put anyone
    else's name to a document they haven't seen, but there are others who
    deserve to be listed (John Holdsworth at least). Let me know if you
    think you should be included.


    
    Motivation

As Swift begins to move into roles beyond app development, code which needs
to generate text becomes a more important use case. Consider, for instance,
generating even a small XML string:

let xml = "<?xml version=\"1.0\"?>\n<catalog>\n\t<book id=\"bk101\" empty=\"\">\n\t\t<author>\(author)</author>\n\t</book>\n</catalog>"

The string is practically unreadable, its structure drowned in escapes and
run-together lines; it looks like little more than line noise. We can
improve its readability somewhat by concatenating separate strings for each
line and using real tabs instead of |\t| escapes:

let xml = "<?xml version=\"1.0\"?>\n" +
          "<catalog>\n" +
          " <book id=\"bk101\" empty=\"\">\n" +
          " <author>\(author)</author>\n" +
          " </book>\n" +
          "</catalog>"

However, this creates a more complex expression for the type checker, and
there's still far more punctuation than ought to be necessary. If the most
important goal of Swift is making code readable, this kind of code falls
far short of that goal.


    
    Proposed solution

We propose that, when Swift is parsing a string literal, if it reaches the
end of the line without encountering an end quote, it should look at the
next line. If it sees a quote at the beginning (a "continuation quote"),
the string literal contains a newline and then continues on that line.
Otherwise, the string literal is unterminated and syntactically invalid.

Our sample above could thus be written as:

let xml = "<?xml version=\"1.0\"?>
          "<catalog>
          "    <book id=\"bk101\" empty=\"\">
          "        <author>\(author)</author>
          "    </book>
          "</catalog>"

If the second or any subsequent line had not begun with a quotation mark, or
the trailing quotation mark after the |</catalog>| tag had not been
included, Swift would have emitted an error.
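To make the intended value concrete, here is a minimal sketch in current Swift of the string the proposed literal is meant to produce: each source line contributes its text, with a newline between lines and none after the last. The `author` value is a placeholder, and using an array with `joined(separator:)` is only a stand-in for the proposed syntax, which no shipping compiler accepts.

```swift
// Placeholder value; any string would do.
let author = "Example Author"

// Each element is the text that would follow the opening quote or a
// continuation quote in the proposed syntax.
let xml = [
    "<?xml version=\"1.0\"?>",
    "<catalog>",
    "\t<book id=\"bk101\" empty=\"\">",
    "\t\t<author>\(author)</author>",
    "\t</book>",
    "</catalog>",
].joined(separator: "\n")

print(xml)
```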


      
      Rationale

This design is rather unusual, and it's worth pausing a moment to explain
why it has been chosen.

The traditional design for this feature, seen in languages like Perl and
Python, simply places one delimiter at the beginning of the literal and
another at the end. Individual lines in the literal are not marked in any way.

We think continuation quotes offer several important advantages over the
traditional design:

 1.

    *They help the compiler pinpoint errors in string literal
    delimiting.* Traditional multiline strings have a serious weakness: if
    you forget the closing quote, the compiler has no idea where you wanted
    the literal to end. It simply continues on until the compiler
    encounters another quote (or the end of the file). If you're lucky, the
    text after that quote is not valid code, and the resulting error will
    at least point you to the next string literal in the file. If you're
    unlucky, you'll get a seemingly unrelated error several literals later,
    an unbalanced brace error at the end of the file, or perhaps even code
    that compiles but does something totally wrong.

    (This is not a minor concern. Many popular languages, including C and
    Swift 2, specifically reject newlines in string literals to prevent
    this from happening.)

    Continuation quotes provide the compiler with redundant information
    about your intent. If you forget a closing quote, the continuation
    quotes give the compiler a very good idea of where you meant to put it.
    The compiler can point you to (or at least very near) the /end/ of the
    literal, where you want to insert the quote, rather than showing you
    the /beginning/ of the literal or even some unrelated error later in
    the file that was caused by the missing quote.

 2.

    *Temporarily unclosed literals don't make editors go haywire.* The
    syntax highlighter has the same trouble parsing half-written, unclosed
    traditional quotes that the compiler does: It can't tell where the
    literal is supposed to end and the code should begin. It must either
    apply heuristics to try to guess where the literal ends, or incorrectly
    color everything between the opening quote and the next closing quote
    as a string literal. This can cause the file's coloring to alternate
    distractingly between "string literal" and "running code".

    Continuation quotes give the syntax highlighter enough context to guess
    at the correct coloration, even when the string isn't complete yet.
    Lines with a continuation quote are literals; lines without are code.
    At worst, the syntax highlighter might incorrectly color a few
    characters at the end of a line, rather than the remainder of the file.

 3.

    *They separate indentation from the string's contents.* Traditional
    multiline strings usually include all of the content between the start
    and end delimiters, including leading whitespace. This means that it's
    usually impossible to indent a multiline string, so including one
    breaks up the flow of the surrounding code, making it less readable.
    Some languages apply heuristics or mode switches to try to remove
    indentation, but like all heuristics, these are mistake-prone and murky.

    Continuation quotes neatly avoid this problem. Whitespace before the
    continuation quote is indentation used to format the source code;
    whitespace after the continuation quote is part of the string literal.
    The interpretation of the code is perfectly clear to both compiler and
    programmer.

 4.

    *They improve the ability to quickly recognize the literal.* Traditional
    multiline strings don't provide much visual help. To find the end, you
    must visually scan until you find the matching delimiter, which may be
    only one or a few characters long. When looking at a random line of
    source, it can be hard to tell at a glance whether it's code or
    literal. Syntax highlighting can help with these issues, but it's often
    unreliable, especially with advanced, idiosyncratic string literal
    features like multiline strings.

    Continuation quotes solve these problems. To find the end of the
    literal, just scan down the column of continuation characters until
    they end. To figure out if a given line of source is part of a literal,
    just see if it starts with a quote mark. The meaning of the source
    becomes obvious at a glance.
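The "scan down the column" idea can be sketched in a few lines of current Swift. This is an illustration only: the function names are hypothetical, the source is modeled as an array of lines, and a line is treated as a continuation whenever its first non-whitespace character is a quote.

```swift
/// True if the first non-indentation character of `line` is a quote mark.
func startsWithContinuationQuote(_ line: String) -> Bool {
    let afterIndentation = line.drop(while: { $0 == " " || $0 == "\t" })
    return afterIndentation.first == "\""
}

/// Index of the last line belonging to a literal opened on line `start`.
/// Works even if the closing quote is missing, which is the property a
/// diagnostic or a syntax highlighter would rely on.
func lastLineOfLiteral(openedOn start: Int, in lines: [String]) -> Int {
    var last = start
    while last + 1 < lines.count, startsWithContinuationQuote(lines[last + 1]) {
        last += 1
    }
    return last
}

let sourceLines = [
    "let message = \"first line",
    "              \"second line",
    "              \"third line",   // closing quote forgotten
    "print(message)",
]
print(lastLineOfLiteral(openedOn: 0, in: sourceLines))
```

Even with the closing quote forgotten, the scan stops at the third line, which is where a fix-it would want to insert the missing quote.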

Nevertheless, the traditional design /does/ have a few advantages:

 1.

    *It is simpler.* Although continuation quotes are more complex, we
    believe that the advantages listed above pay for that complexity.

 2.

    *There is no need to edit the intervening lines to add continuation
    quotes.* While the additional effort required to insert continuation
    quotes is an important downside, we believe that tool support,
    including both compiler fix-its and perhaps editor support for commands
    like "Paste as String Literal", can address this issue. In some
    editors, new features aren't even necessary; TextMate, for instance,
    lets you insert a character on several lines simultaneously. And new
    tool features could also address other issues like escaping embedded
    quotes.

 3.

    *Naïve syntax highlighters may have trouble understanding this
    syntax.* This is true, but naïve syntax highlighters generally have
    terrible trouble with advanced string literal constructs; some struggle
    with even basic ones. While there are some designs (like
    Python's |"""| strings) which trick some syntax highlighters into
    working some of the time with some contents, we don't think this
    occasional, accidental compatibility is a big enough gain to justify
    changing the design.

 4.

    *It looks funny—quotes should always be in matched pairs.* We aren't
    aware of another programming language which uses unbalanced quotes in
    string literals, but there /is/ one very important precedent for this
    kind of formatting: natural languages. English, for instance, uses a
    very similar format for quoting multiple lines of dialog by the same
    speaker. As an English Stack Exchange answer illustrates
    <http://english.stackexchange.com/a/96613/64636>:

        “That seems like an odd way to use punctuation,” Tom said. “What
        harm would there be in using quotation marks at the end of every
        paragraph?”

        “Oh, that’s not all that complicated,” J.R. answered. “If you
        closed quotes at the end of every paragraph, then you would need to
        reidentify the speaker with every subsequent paragraph.

        “Say a narrative was describing two or three people engaged in a
        lengthy conversation. If you closed the quotation marks in the
        previous paragraph, then a reader wouldn’t be able to easily tell
        if the previous speaker was extending his point, or if someone else
        in the room had picked up the conversation. By leaving the previous
        paragraph’s quote unclosed, the reader knows that the previous
        speaker is still the one talking.”

        “Oh, that makes sense. Thanks!”

    In English, omitting the ending quotation mark tells the text's reader
    that the quote continues on the next line, while including a quotation
    mark at the beginning of the next line reminds the reader that they're
    in the middle of a quote.

    Similarly, in this proposal, omitting the ending quotation mark tells
    the code's reader (and compiler) that the string literal continues on
    the next line, while including a quotation mark at the beginning of the
    next line reminds the reader (and compiler) that they're in the middle
    of a string literal.

On balance, we think continuation quotes are the best design for this problem.


    
    Detailed design

When Swift is parsing a string literal and reaches the end of a line
without finding a closing quote, it examines the next line, applying the
following rules:

 1.

    If the next line begins with whitespace followed by a continuation
    quote, then the string literal contains a newline followed by the
    contents of the string literal starting on that line. (This line may
    itself have no closing quote, in which case the same rules apply to the
    line which follows.)

 2.

    If the next line contains anything else, Swift raises a syntax error
    for an unterminated string literal.
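The two rules above can be sketched as a small routine in current Swift. This is a hedged illustration rather than the compiler's lexer: `LiteralError`, `closingQuoteIndex`, and `literalContents` are hypothetical names, the source is modeled as an array of lines, escapes are left undecoded, and quotes nested inside `\( ... )` interpolations are not handled.

```swift
// Hypothetical error type; the proposal leaves exact diagnostics to implementers.
enum LiteralError: Error, Equatable {
    case noOpeningQuote
    case unterminated(line: Int) // zero-based index of the line missing its closing quote
}

/// Index of the first unescaped `"` in `text`, or nil if there is none.
func closingQuoteIndex(in text: Substring) -> Substring.Index? {
    var index = text.startIndex
    while index < text.endIndex {
        if text[index] == "\\" {
            // Step onto the escaped character so it is skipped below.
            index = text.index(after: index)
            if index == text.endIndex { return nil }
        } else if text[index] == "\"" {
            return index
        }
        index = text.index(after: index)
    }
    return nil
}

/// Returns the raw (still escaped) contents of a literal that opens on `lines[0]`.
func literalContents(in lines: [String]) throws -> String {
    guard let first = lines.first, let open = first.firstIndex(of: "\"") else {
        throw LiteralError.noOpeningQuote
    }
    var pieces: [Substring] = []
    var body = first[first.index(after: open)...]
    var lineNumber = 0
    while true {
        if let close = closingQuoteIndex(in: body) {
            pieces.append(body[..<close])
            return pieces.joined(separator: "\n")
        }
        pieces.append(body)
        let next = lineNumber + 1
        // Rule 2: anything other than a continuation quote is an error.
        guard next < lines.count else { throw LiteralError.unterminated(line: lineNumber) }
        let trimmed = lines[next].drop(while: { $0 == " " || $0 == "\t" })
        guard trimmed.first == "\"" else { throw LiteralError.unterminated(line: lineNumber) }
        // Rule 1: the literal gains a newline, then continues after the quote.
        body = trimmed.dropFirst()
        lineNumber = next
    }
}

let source = [
    "let greeting = \"Hello,",
    "               \"world!\"",
]
print((try? literalContents(in: source)) ?? "error")
```

The error carries the index of the line that lacks a closing quote, which is the position the fix-its would need.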

The exact error messages and diagnostics provided are left to the
implementers to determine, but we believe it should be possible to provide
two fix-its which will help users learn the syntax and correct string
literal mistakes:

  *

    Insert |"| at the end of the current line to terminate the quote.

  *

    Insert |"| at the beginning of the next line (with some indentation
    heuristics) to continue the quote on the next line.


    
    Impact on existing code

Failing to close a string literal before the end of the line is currently a
syntax error, so no valid Swift code should be affected by this change.


    
    Future directions for multiline string literals

  *

    We could permit comments before encountering a continuation quote to be
    counted as whitespace, and permit empty lines in the middle of string
    literals. This would allow you to comment out whole lines in the literal.

  *

    We could allow you to put a trailing backslash on a line to indicate
    that the newline isn't "real" and should be omitted from the literal's
    contents.
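As a rough sketch of how the trailing-backslash idea might behave, the helper below joins the per-line contents of a literal and omits the newline after any line that ends in a backslash. The function name is hypothetical, and the sketch deliberately ignores the question of how an escaped backslash (`\\`) at the end of a line would be told apart, which a real design would have to settle.

```swift
/// Joins literal lines with newlines, except after lines ending in a backslash,
/// where both the backslash and the newline are dropped.
func joinLiteralLines(_ lines: [String]) -> String {
    var result = ""
    for (index, line) in lines.enumerated() {
        if line.hasSuffix("\\") {
            // The newline isn't "real": drop the backslash and add nothing.
            result.append(contentsOf: line.dropLast())
        } else {
            result.append(line)
            if index < lines.count - 1 { result.append("\n") }
        }
    }
    return result
}

print(joinLiteralLines(["This sentence is hard-wrapped \\", "in the source.", "This one is not."]))
```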


    
    Future directions for string literals in general

There are other issues with Swift's string handling which this proposal
intentionally does not address:

  *

    Reducing the amount of double-backslashing needed when working with
    regular expression libraries, Windows paths, source code generation,
    and other tasks where backslashes are part of the data.

  *

    Alternate delimiters or other strategies for writing strings
    with |"| characters in them.

  *

    Accommodating code formatting concerns like hard wrapping and commenting.

  *

    String literals consisting of very long pieces of text which are best
    represented completely verbatim, with minimal alteration.

This section briefly outlines some future proposals which might address
these issues. Combined, we believe they would address most of the string
literal use cases which Swift is currently not very good at.

Please note that these are simply sketches of hypothetical future designs;
they may radically change before proposal, and some may never be proposed
at all. Many, perhaps most, will not be proposed for Swift 3. We are
sketching these designs not to propose and refine these features
immediately, but merely to show how we think they might be solved in ways
which complement this proposal.


      
      String literal modifiers

A string literal modifier is a cluster of identifier characters which goes
before a string literal and adjusts the way it is parsed. Modifiers only
alter the interpretation of the text in the literal, not the type of data
it produces; for instance, there will never be something like the
UTF-8/UTF-16/UTF-32 literal modifiers in C++. Uppercase characters enable a
feature; lowercase characters disable a feature.

Modifiers can be attached to both single-line and multiline literals, and
could also be attached to other literal syntaxes which might be introduced
in the future. When used with multiline strings, only the starting quote
needs to carry the modifiers, not the continuation quotes.

Modifiers are an extremely flexible feature which can be used for many
purposes. Of the ideas listed below, we believe the |e| modifier is an
urgent addition which should be included in Swift 3 if at all possible; the
others are less urgent and most of them could be deferred, or at least
added later if time allows.

  *

    *Escape disabling*: |e"\\\"| (string with three backslash characters)

  *

    *Fine-grained escape disabling*: |i"\(foo)\n"| (the
    string |\(foo)| followed by a newline); |eI"\(foo)\n"| (the contents
    of |foo| followed by the string |\n|); |b"\w+\n"| (the
    string |\w+| followed by a newline)

  *

    *Alternate delimiters*: |_| has no lowercase form, so it could be used
    to allow strings with internal quotes: |_"print("Hello,
    world!")"_|, |__"print("Hello, world!")"__|, etc.

  *

    *Whitespace normalization*: changes all runs of whitespace in the
    literal to single space characters; this would allow you to use
    multiline strings purely to improve code formatting.

    alert.informativeText =
        W"\(appName) could not typeset the element “\(title)” because
         "it includes a link to an element that has been removed from this
         "book."

  *

    *Localization*:

    alert.informativeText =
        LW"\(appName) could not typeset the element “\(title)” because
          "it includes a link to an element that has been removed from this
          "book."

  *

    *Comments*: Embedding comments in string literals might be useful for
    literals containing regular expressions or other code.
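The effect of the whitespace normalization modifier can be approximated today with an ordinary function. The sketch below is only an assumption about the intended behavior (every run of whitespace, including newlines, becomes a single space); `normalizeWhitespace` is a hypothetical name, and the real modifier would act at compile time rather than at run time.

```swift
/// Replaces each run of whitespace (spaces, tabs, newlines) with one space.
func normalizeWhitespace(_ text: String) -> String {
    var result = ""
    var inWhitespaceRun = false
    for character in text {
        if character.isWhitespace {
            // Emit a single space for the whole run.
            if !inWhitespaceRun { result.append(" ") }
            inWhitespaceRun = true
        } else {
            result.append(character)
            inWhitespaceRun = false
        }
    }
    return result
}

let wrapped = "it includes a link to an element that has been removed from this\nbook."
print(normalizeWhitespace(wrapped))
```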

Eventually, user-specified string modifiers could be added to Swift,
perhaps as part of a hygienic macro system. It might also become possible
to change the default modifiers applied to literals in a particular file or
scope.


      
      Heredocs or other "verbatim string literal" features

Sometimes it really is best to just splat something else down in the middle
of a file full of Swift source code. Maybe the file is essentially a
template and the literals are a majority of the code's contents, or maybe
you're writing a code generator and just want to get string data into it
with minimal fuss, or maybe people unfamiliar with Swift need to be able to
edit the literals. Whatever the reason, the normal string literal syntax is
just too burdensome.

One approach to this problem is heredocs. A heredoc allows you to put a
placeholder for a literal on one line; the contents of the literal begin on
the next line, running up to some delimiter. It would be possible to put
multiple placeholders in a single line, and to apply string modifiers to them.

In Swift, this might look like:

print(#to("---") + e#to("END"))
It was a dark and stormy \(timeOfDay) when
---
the Swift core team invented the \(interpolation) syntax.
END

Another possible approach would be to support traditional multiline string
literals bounded by a different delimiter, like |"""|. This might look like:

print("""
It was a dark and stormy \(timeOfDay) when
""" + e"""
the Swift core team invented the \(interpolation) syntax.
""")

Although heredocs could make a good addition to Swift eventually, there are
good reasons to defer them for now. Please see the "Alternatives
considered" section for details.


      
      First-class regular expressions

Members of the core team are interested in regular expressions, but they
don't want to just build a literal that wraps PCRE or libicu; rather, they
aim to integrate regexes into the pattern matching system and give them a
deep, Perl 6-style rethink. This would be a major effort, far beyond the
scope of Swift 3.

In the meantime, the |e| modifier and perhaps other string literal
modifiers will make it easier to specify regular expressions in string
literals for use with |NSRegularExpression| and other libraries accessible
from Swift.


    
    Alternatives considered


      
      Requiring no continuation character

The main alternative is to not require a continuation quote, and simply
extend the string literal from the starting quote to the ending quote,
including all newlines between them. For example:

let xml = "<?xml version=\"1.0\"?>
<catalog>
<book id=\"bk101\" empty=\"\">
<author>\(author)</author>
</book>
</catalog>"

This alternative is extensively discussed in the "Rationale" section above.


      
      Skip multiline strings and just support heredocs

There are definitely cases where a heredoc would be a better solution, such
as generated code or code which is mostly literals with a little Swift
sprinkled around. On the other hand, there are also cases where multiline
strings are better: short strings in code which is meant to be read. If a
single feature can't handle them both well, there's no shame in supporting
the two features separately.

It makes sense to support multiline strings first because:

  *

    They extend existing syntax instead of introducing new syntax.

  *

    They are much easier to parse; heredocs require some kind of mode in
    the parser which kicks in at the start of the next line, whereas
    multiline string literals can be handled in the lexer.

  *

    As discussed in "Rationale", they offer better diagnostics, code
    formatting, and visual scannability.


      
      Use a different delimiter for multiline strings

The initial suggestion was that multiline strings should use a different
delimiter, |"""|, at the beginning and end of the string, with no
continuation characters between. Like heredocs, this might be a good
alternative for certain use cases, but it has the same basic flaws as the
"no continuation character" solution.

--
Brent Royal-Gordon
Architechies



_______________________________________________
swift-evolution mailing list
[email protected]
https://lists.swift.org/mailman/listinfo/swift-evolution
