Awesome. Some specific suggestions below, but feel free to iterate in a
pull request if you prefer that.
I've adopted these suggestions in some form, though I also ended up
rewriting the explanation of why the feature was designed as it is and
fusing it with material from "Alternatives considered".
(Still not sure who I should list as a co-author. I'm currently thinking
John, Tyler, and maybe Chris? Who's supposed to go there?)
Multiline string literals
* Proposal: SE-NNNN
<https://github.com/apple/swift-evolution/blob/master/proposals/NNNN-name.md>
* Author(s): Brent Royal-Gordon <https://github.com/brentdax>
* Status: *Second Draft*
* Review manager: TBD
Introduction
In Swift 2.2, the only means to insert a newline into a string literal is
the |\n| escape. String literals specified in this way are generally ugly
and unreadable. We propose a multiline string feature inspired by English
punctuation which is a straightforward extension of our existing string
literals.
This proposal is one step in a larger plan to improve how string literals
address various challenging use cases. It is not meant to solve all
problems with escaping, nor to serve all use cases involving very long
string literals. See the "Future directions for string literals in general"
section for a sketch of the problems we ultimately want to address and some
ideas of how we might do so.
Swift-evolution threads: multi-line string literals. (April)
<https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20160418/015500.html>,
multi-line string literals (December)
<https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20151214/002349.html>
Draft Notes
*
Removes the comment feature, which was felt to be an unnecessary
complication. This and the backslash feature have been listed as future
directions.
*
Loosens the specification of diagnostics, suggesting instead of
requiring fix-its.
*
Splits a "Rationale" section out of the "Proposed solution" section.
*
Adds extensive discussion of other features which would combine with
this one.
*
I've listed only myself as an author because I don't want to put anyone
else's name to a document they haven't seen, but there are others who
deserve to be listed (John Holdsworth at least). Let me know if you
think you should be included.
Motivation
As Swift begins to move into roles beyond app development, code which needs
to generate text becomes a more important use case. Consider, for instance,
generating even a small XML string:
let xml = "<?xml version=\"1.0\"?>\n<catalog>\n\t<book id=\"bk101\" empty=\"\">\n\t\t<author>\(author)</author>\n\t</book>\n</catalog>"
The string is practically unreadable, its structure drowned in escapes and
run-together lines; it looks like little more than line noise. We can
improve its readability somewhat by concatenating separate strings for each
line and using real tabs instead of |\t| escapes:
let xml = "<?xml version=\"1.0\"?>\n" +
          "<catalog>\n" +
          "    <book id=\"bk101\" empty=\"\">\n" +
          "        <author>\(author)</author>\n" +
          "    </book>\n" +
          "</catalog>"
However, this creates a more complex expression for the type checker, and
there's still far more punctuation than ought to be necessary. If the most
important goal of Swift is making code readable, this kind of code falls
far short of that goal.
Proposed solution
We propose that, when Swift is parsing a string literal, if it reaches the
end of the line without encountering an end quote, it should look at the
next line. If it sees a quote at the beginning (a "continuation quote"),
the string literal contains a newline and then continues on that line.
Otherwise, the string literal is unterminated and syntactically invalid.
Our sample above could thus be written as:
let xml = "<?xml version=\"1.0\"?>
          "<catalog>
          "    <book id=\"bk101\" empty=\"\">
          "        <author>\(author)</author>
          "    </book>
          "</catalog>"
If the second or subsequent lines had not begun with a quotation mark, or
the trailing quotation mark after the |</catalog>| tag had not been
included, Swift would have emitted an error.
Rationale
This design is rather unusual, and it's worth pausing a moment to explain
why it has been chosen.
The traditional design for this feature, seen in languages like Perl and
Python, simply places one delimiter at the beginning of the literal and
another at the end. Individual lines in the literal are not marked in any way.
We think continuation quotes offer several important advantages over the
traditional design:
1.
*They help the compiler pinpoint errors in string literal
delimiting.* Traditional multiline strings have a serious weakness: if
you forget the closing quote, the compiler has no idea where you wanted
the literal to end. The literal simply runs on until the compiler
encounters another quote (or the end of the file). If you're lucky, the
text after that quote is not valid code, and the resulting error will
at least point you to the next string literal in the file. If you're
unlucky, you'll get a seemingly unrelated error several literals later,
an unbalanced brace error at the end of the file, or perhaps even code
that compiles but does something totally wrong.
(This is not a minor concern. Many popular languages, including C and
Swift 2, specifically reject newlines in string literals to prevent
this from happening.)
Continuation quotes provide the compiler with redundant information
about your intent. If you forget a closing quote, the continuation
quotes give the compiler a very good idea of where you meant to put it.
The compiler can point you to (or at least very near) the /end/ of the
literal, where you want to insert the quote, rather than showing you
the /beginning/ of the literal or even some unrelated error later in
the file that was caused by the missing quote.
2.
*Temporarily unclosed literals don't make editors go haywire.* The
syntax highlighter has the same trouble parsing half-written, unclosed
traditional quotes that the compiler does: It can't tell where the
literal is supposed to end and the code should begin. It must either
apply heuristics to try to guess where the literal ends, or incorrectly
color everything between the opening quote and the next closing quote
as a string literal. This can cause the file's coloring to alternate
distractingly between "string literal" and "running code".
Continuation quotes give the syntax highlighter enough context to guess
at the correct coloration, even when the string isn't complete yet.
Lines with a continuation quote are literals; lines without are code.
At worst, the syntax highlighter might incorrectly color a few
characters at the end of a line, rather than the remainder of the file.
3.
*They separate indentation from the string's contents.* Traditional
multiline strings usually include all of the content between the start
and end delimiters, including leading whitespace. This means that it's
usually impossible to indent a multiline string, so including one
breaks up the flow of the surrounding code, making it less readable.
Some languages apply heuristics or mode switches to try to remove
indentation, but like all heuristics, these are mistake-prone and murky.
Continuation quotes neatly avoid this problem. Whitespace before the
continuation quote is indentation used to format the source code;
whitespace after the continuation quote is part of the string literal.
The interpretation of the code is perfectly clear to both compiler and
programmer.
4.
*They improve the ability to quickly recognize the literal.* Traditional
multiline strings don't provide much visual help. To find the end, you
must visually scan until you find the matching delimiter, which may be
only one or a few characters long. When looking at a random line of
source, it can be hard to tell at a glance whether it's code or
literal. Syntax highlighting can help with these issues, but it's often
unreliable, especially with advanced, idiosyncratic string literal
features like multiline strings.
Continuation quotes solve these problems. To find the end of the
literal, just scan down the column of continuation characters until
they end. To figure out if a given line of source is part of a literal,
just see if it starts with a quote mark. The meaning of the source
becomes obvious at a glance.
Nevertheless, the traditional design /does/ have a few advantages:
1.
*It is simpler.* Although continuation quotes are more complex, we
believe that the advantages listed above pay for that complexity.
2.
*There is no need to edit the intervening lines to add continuation
quotes.* While the additional effort required to insert continuation
quotes is an important downside, we believe that tool support,
including both compiler fix-its and perhaps editor support for commands
like "Paste as String Literal", can address this issue. In some
editors, new features aren't even necessary; TextMate, for instance,
lets you insert a character on several lines simultaneously. And new
tool features could also address other issues like escaping embedded
quotes.
3.
*Naïve syntax highlighters may have trouble understanding this
syntax.* This is true, but naïve syntax highlighters generally have
terrible trouble with advanced string literal constructs; some struggle
with even basic ones. While there are some designs (like
Python's |"""| strings) which trick some syntax highlighters into
working some of the time with some contents, we don't think this
occasional, accidental compatibility is a big enough gain to justify
changing the design.
4.
*It looks funny—quotes should always be in matched pairs.* We aren't
aware of another programming language which uses unbalanced quotes in
string literals, but there /is/ one very important precedent for this
kind of formatting: natural languages. English, for instance, uses a
very similar format for quoting multiple lines of dialog by the same
speaker. As an English Stack Exchange answer illustrates
<http://english.stackexchange.com/a/96613/64636>:
“That seems like an odd way to use punctuation,” Tom said. “What
harm would there be in using quotation marks at the end of every
paragraph?”
“Oh, that’s not all that complicated,” J.R. answered. “If you
closed quotes at the end of every paragraph, then you would need to
reidentify the speaker with every subsequent paragraph.
“Say a narrative was describing two or three people engaged in a
lengthy conversation. If you closed the quotation marks in the
previous paragraph, then a reader wouldn’t be able to easily tell
if the previous speaker was extending his point, or if someone else
in the room had picked up the conversation. By leaving the previous
paragraph’s quote unclosed, the reader knows that the previous
speaker is still the one talking.”
“Oh, that makes sense. Thanks!”
In English, omitting the ending quotation mark tells the text's reader
that the quote continues on the next line, while including a quotation
mark at the beginning of the next line reminds the reader that they're
in the middle of a quote.
Similarly, in this proposal, omitting the ending quotation mark tells
the code's reader (and compiler) that the string literal continues on
the next line, while including a quotation mark at the beginning of the
next line reminds the reader (and compiler) that they're in the middle
of a string literal.
On balance, we think continuation quotes are the best design for this problem.
Detailed design
When Swift is parsing a string literal and reaches the end of a line
without finding a closing quote, it examines the next line, applying the
following rules:
1.
If the next line begins with whitespace followed by a continuation
quote, then the string literal contains a newline followed by the
contents of the string literal starting on that line. (This line may
itself have no closing quote, in which case the same rules apply to the
line which follows.)
2.
If the next line contains anything else, Swift raises a syntax error
for an unterminated string literal.
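The two rules above can be sketched as a toy line scanner. This is only an illustrative model, not the real Swift lexer: the names |LiteralError|, |scanLine|, and |parseLiteral| are hypothetical, escapes are skipped over but not decoded, and interpolation is ignored.

```swift
// Toy model of the continuation-quote rules; not the real Swift lexer.
// Assumptions: escapes are skipped over but not decoded, and
// interpolation is not handled.

enum LiteralError: Error, Equatable {
    case noOpeningQuote
    case unterminated(line: Int)  // 1-based number of the line missing its closing quote
}

/// Scans the text following an opening or continuation quote. Returns the
/// literal's contents on this line and whether a closing quote was found.
func scanLine(_ text: Substring) -> (contents: String, closed: Bool) {
    var contents = ""
    var escaped = false
    for ch in text {
        if escaped {
            contents.append(ch)
            escaped = false
        } else if ch == "\\" {
            contents.append(ch)
            escaped = true
        } else if ch == "\"" {
            return (contents, true)
        } else {
            contents.append(ch)
        }
    }
    return (contents, false)
}

/// Returns the contents of the string literal that opens on `lines[0]`.
func parseLiteral(_ lines: [String]) throws -> String {
    var pieces: [String] = []
    for (index, line) in lines.enumerated() {
        let body: Substring
        if index == 0 {
            guard let quote = line.firstIndex(of: "\"") else {
                throw LiteralError.noOpeningQuote
            }
            body = line[line.index(after: quote)...]
        } else {
            // Rule 1: only whitespace may precede the continuation quote.
            let trimmed = line.drop(while: { $0 == " " || $0 == "\t" })
            guard trimmed.first == "\"" else {
                // Rule 2: anything else means the previous line was unterminated.
                throw LiteralError.unterminated(line: index)
            }
            body = trimmed.dropFirst()
        }
        let (contents, closed) = scanLine(body)
        pieces.append(contents)
        if closed {
            // Each continuation contributes one newline to the literal.
            return pieces.joined(separator: "\n")
        }
    }
    throw LiteralError.unterminated(line: lines.count)
}
```

Because the error carries the number of the line that lacks its closing quote, a diagnostic built on it can point at the end of the literal rather than at some later, unrelated location.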
The exact error messages and diagnostics provided are left to the
implementers to determine, but we believe it should be possible to provide
two fix-its which will help users learn the syntax and correct string
literal mistakes:
*
Insert |"| at the end of the current line to terminate the quote.
*
Insert |"| at the beginning of the next line (with some indentation
heuristics) to continue the quote on the next line.
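As a rough illustration of the two fix-its, the sketch below computes both candidate repairs for an unterminated line. The helper name |suggestedFixes| is hypothetical, and the alignment heuristic (placing the continuation quote under the first quote on the current line) is only an assumption; the proposal deliberately leaves the heuristics to the implementers.

```swift
// Hypothetical helper producing the two fix-its for an unterminated line.
// The alignment heuristic is an assumption; the proposal leaves it open.

/// `current` is the line whose literal was not closed; `next` is the line after it.
func suggestedFixes(current: String, next: String) -> (terminate: String, continueOnNext: String) {
    // Fix-it 1: insert a quote at the end of the current line.
    let terminate = current + "\""

    // Fix-it 2: insert a continuation quote on the next line, aligned under
    // the first quote on the current line.
    let column: Int
    if let quote = current.firstIndex(of: "\"") {
        column = current.distance(from: current.startIndex, to: quote)
    } else {
        column = 0
    }
    let content = String(next.drop(while: { $0 == " " || $0 == "\t" }))
    let continueOnNext = String(repeating: " ", count: column) + "\"" + content
    return (terminate, continueOnNext)
}
```

Offering both repairs lets the user pick the one that matches their intent, since the compiler cannot know whether the line break was deliberate.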
Impact on existing code
Failing to close a string literal before the end of the line is currently a
syntax error, so no valid Swift code should be affected by this change.
Future directions for multiline string literals
*
We could permit comments before encountering a continuation quote to be
counted as whitespace, and permit empty lines in the middle of string
literals. This would allow you to comment out whole lines in the literal.
*
We could allow you to put a trailing backslash on a line to indicate
that the newline isn't "real" and should be omitted from the literal's
contents.
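To make the trailing-backslash idea concrete, here is a minimal sketch of how the per-line contents of a literal might be joined under that rule. The function |joinContents| is hypothetical, and it makes a simplifying assumption: it does not distinguish a lone trailing backslash from an escaped backslash at the end of a line, which a real design would have to handle.

```swift
// Hypothetical model of the trailing-backslash rule.
// Assumption: a line ending in an escaped backslash is not treated specially.

/// `lines` holds the undecoded contents of each line of one multiline literal.
func joinContents(_ lines: [String]) -> String {
    var result = ""
    for (index, line) in lines.enumerated() {
        let isLast = index == lines.count - 1
        if !isLast && line.hasSuffix("\\") {
            // The newline isn't "real": drop the backslash and add no newline.
            result += String(line.dropLast())
        } else {
            result += line
            if !isLast { result += "\n" }
        }
    }
    return result
}
```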
Future directions for string literals in general
There are other issues with Swift's string handling which this proposal
intentionally does not address:
*
Reducing the amount of double-backslashing needed when working with
regular expression libraries, Windows paths, source code generation,
and other tasks where backslashes are part of the data.
*
Alternate delimiters or other strategies for writing strings
with |"| characters in them.
*
Accommodating code formatting concerns like hard wrapping and commenting.
*
String literals consisting of very long pieces of text which are best
represented completely verbatim, with minimal alteration.
This section briefly outlines some future proposals which might address
these issues. Combined, we believe they would address most of the string
literal use cases which Swift is currently not very good at.
Please note that these are simply sketches of hypothetical future designs;
they may radically change before proposal, and some may never be proposed
at all. Many, perhaps most, will not be proposed for Swift 3. We are
sketching these designs not to propose and refine these features
immediately, but merely to show how we think they might be solved in ways
which complement this proposal.
String literal modifiers
A string literal modifier is a cluster of identifier characters which goes
before a string literal and adjusts the way it is parsed. Modifiers only
alter the interpretation of the text in the literal, not the type of data
it produces; for instance, there will never be something like the
UTF-8/UTF-16/UTF-32 literal modifiers in C++. Uppercase characters enable a
feature; lowercase characters disable a feature.
Modifiers can be attached to both single-line and multiline literals, and
could also be attached to other literal syntaxes which might be introduced
in the future. When used with multiline strings, only the starting quote
needs to carry the modifiers, not the continuation quotes.
Modifiers are an extremely flexible feature which can be used for many
purposes. Of the ideas listed below, we believe the |e| modifier is an
urgent addition which should be included in Swift 3 if at all possible; the
others are less urgent and most of them could be deferred, or at least
added later if time allows.
*
*Escape disabling*: |e"\\\"| (string with three backslash characters)
*
*Fine-grained escape disabling*: |i"\(foo)\n"| (the
string |\(foo)| followed by a newline); |eI"\(foo)\n"| (the contents
of |foo| followed by the string |\n|), |b"\w+\n"| (the
string |\w+| followed by a newline)
*
*Alternate delimiters*: |_| has no lowercase form, so it could be used
to allow strings with internal quotes: |_"print("Hello,
world!")"_|, |__"print("Hello, world!")"__|, etc.
*
*Whitespace normalization*: changes all runs of whitespace in the
literal to single space characters; this would allow you to use
multiline strings purely to improve code formatting.
alert.informativeText = W"\(appName) could not typeset the element “\(title)” because
                         "it includes a link to an element that has been removed from this
                         "book."
*
*Localization*:
alert.informativeText = LW"\(appName) could not typeset the element “\(title)” because
                          "it includes a link to an element that has been removed from this
                          "book."
*
*Comments*: Embedding comments in string literals might be useful for
literals containing regular expressions or other code.
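As one illustration of what a modifier would do, the whitespace normalization idea can be approximated today with an ordinary function applied at run time. The name |collapsingWhitespace| is hypothetical, and this is only a sketch of the intended effect, not a proposed implementation; a real |W| modifier would presumably do this work when the literal is parsed.

```swift
// Run-time approximation of the sketched whitespace-normalizing modifier.
// Every run of whitespace (spaces, tabs, newlines) becomes one space.
func collapsingWhitespace(_ text: String) -> String {
    var result = ""
    var inRun = false
    for ch in text {
        if ch.isWhitespace {
            // Emit a single space for the first character of each run.
            if !inRun { result.append(" ") }
            inRun = true
        } else {
            result.append(ch)
            inRun = false
        }
    }
    return result
}
```

Applied to a string whose lines were broken purely for source formatting, this yields a single line of text with the line breaks replaced by spaces.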
Eventually, user-specified string modifiers could be added to Swift,
perhaps as part of a hygienic macro system. It might also become possible
to change the default modifiers applied to literals in a particular file or
scope.
Heredocs or other "verbatim string literal" features
Sometimes it really is best to just splat something else down in the middle
of a file full of Swift source code. Maybe the file is essentially a
template and the literals are a majority of the code's contents, or maybe
you're writing a code generator and just want to get string data into it
with minimal fuss, or maybe people unfamiliar with Swift need to be able to
edit the literals. Whatever the reason, the normal string literal syntax is
just too burdensome.
One approach to this problem is heredocs. A heredoc allows you to put a
placeholder for a literal on one line; the contents of the literal begin on
the next line, running up to some delimiter. It would be possible to put
multiple placeholders in a single line, and to apply string modifiers to them.
In Swift, this might look like:
print(#to("---") + e#to("END"))
It was a dark and stormy \(timeOfDay) when
---
the Swift core team invented the \(interpolation) syntax.
END
Another possible approach would be to support traditional multiline string
literals bounded by a different delimiter, like |"""|. This might look like:
print("""
It was a dark and stormy \(timeOfDay) when
""" + e"""
the Swift core team invented the \(interpolation) syntax.
""")
Although heredocs could make a good addition to Swift eventually, there are
good reasons to defer them for now. Please see the "Alternatives
considered" section for details.
First-class regular expressions
Members of the core team are interested in regular expressions, but they
don't want to just build a literal that wraps PCRE or libicu; rather, they
aim to integrate regexes into the pattern matching system and give them a
deep, Perl 6-style rethink. This would be a major effort, far beyond the
scope of Swift 3.
In the meantime, the |e| modifier and perhaps other string literal
modifiers will make it easier to specify regular expressions in string
literals for use with |NSRegularExpression| and other libraries accessible
from Swift.
Alternatives considered
Requiring no continuation character
The main alternative is to not require a continuation quote, and simply
extend the string literal from the starting quote to the ending quote,
including all newlines between them. For example:
let xml = "<?xml version=\"1.0\"?>
<catalog>
    <book id=\"bk101\" empty=\"\">
        <author>\(author)</author>
    </book>
</catalog>"
This alternative is extensively discussed in the "Rationale" section above.
Skip multiline strings and just support heredocs
There are definitely cases where a heredoc would be a better solution, such
as generated code or code which is mostly literals with a little Swift
sprinkled around. On the other hand, there are also cases where multiline
strings are better: short strings in code which is meant to be read. If a
single feature can't handle them both well, there's no shame in supporting
the two features separately.
It makes sense to support multiline strings first because:
*
They extend existing syntax instead of introducing new syntax.
*
They are much easier to parse; heredocs require some kind of mode in
the parser which kicks in at the start of the next line, whereas
multiline string literals can be handled in the lexer.
*
As discussed in "Rationale", they offer better diagnostics, code
formatting, and visual scannability.
Use a different delimiter for multiline strings
The initial suggestion was that multiline strings should use a different
delimiter, |"""|, at the beginning and end of the string, with no
continuation characters between. Like heredocs, this might be a good
alternative for certain use cases, but it has the same basic flaws as the
"no continuation character" solution.
--
Brent Royal-Gordon
Architechies