On 28.04.2016 at 23:56, Brent Royal-Gordon via swift-evolution
<[email protected]> wrote:
Awesome. Some specific suggestions below, but feel free to iterate
in a pull request if you prefer that.
I've adopted these suggestions in some form, though I also ended up
rewriting the explanation of why the feature was designed as it is and
fusing it with material from "Alternatives considered".
(Still not sure who I should list as a co-author. I'm currently
thinking John, Tyler, and maybe Chris? Who's supposed to go there?)
Multiline string literals
• Proposal: SE-NNNN
• Author(s): Brent Royal-Gordon
• Status: Second Draft
• Review manager: TBD
Introduction
In Swift 2.2, the only means to insert a newline into a string literal
is the \n escape. String literals specified in this way are generally
ugly and unreadable. We propose a multiline string feature, inspired by
English punctuation, that is a straightforward extension of our
existing string literals.
This proposal is one step in a larger plan to improve how string
literals address various challenging use cases. It is not meant to
solve all problems with escaping, nor to serve all use cases involving
very long string literals. See the "Future directions for string
literals in general" section for a sketch of the problems we
ultimately want to address and some ideas of how we might do so.
Swift-evolution threads: multi-line string literals (April),
multi-line string literals (December)
Draft Notes
• Removes the comment feature, which was felt to be an unnecessary
complication. This and the backslash feature have been listed as
future directions.
• Loosens the specification of diagnostics, suggesting instead of
requiring fix-its.
• Splits a "Rationale" section out of the "Proposed solution"
section.
• Adds extensive discussion of other features which would combine with
this one.
• I've listed only myself as an author because I don't want to put
anyone else's name to a document they haven't seen, but there are
others who deserve to be listed (John Holdsworth at least). Let me
know if you think you should be included.
Motivation
As Swift begins to move into roles beyond app development, code which
needs to generate text becomes a more important use case. Consider,
for instance, generating even a small XML string:
    let xml = "<?xml version=\"1.0\"?>\n<catalog>\n\t<book id=\"bk101\" empty=\"\">\n\t\t<author>\(author)</author>\n\t</book>\n</catalog>"
The string is practically unreadable, its structure drowned in escapes
and run-together lines; it looks like little more than line noise. We
can improve its readability somewhat by concatenating separate strings
for each line and using real tabs instead of \t escapes:
    let xml = "<?xml version=\"1.0\"?>\n" +
              "<catalog>\n" +
              "	<book id=\"bk101\" empty=\"\">\n" +
              "		<author>\(author)</author>\n" +
              "	</book>\n" +
              "</catalog>"
However, this creates a more complex expression for the type checker,
and there's still far more punctuation than ought to be necessary. If
the most important goal of Swift is making code readable, this kind of
code falls far short of that goal.
Proposed solution
We propose that, when Swift is parsing a string literal, if it reaches
the end of the line without encountering an end quote, it should look
at the next line. If it sees a quote at the beginning (a "continuation
quote"), the string literal contains a newline and then continues on
that line. Otherwise, the string literal is unterminated and
syntactically invalid.
Our sample above could thus be written as:
    let xml = "<?xml version=\"1.0\"?>
              "<catalog>
              "	<book id=\"bk101\" empty=\"\">
              "		<author>\(author)</author>
              "	</book>
              "</catalog>"
If the second or subsequent lines had not begun with a quotation mark,
or the trailing quotation mark after the </catalog> tag had not been
included, Swift would have emitted an error.
Rationale
This design is rather unusual, and it's worth pausing a moment to
explain why it has been chosen.
The traditional design for this feature, seen in languages like Perl
and Python, simply places one delimiter at the beginning of the
literal and another at the end. Individual lines in the literal are
not marked in any way.
We think continuation quotes offer several important advantages over
the traditional design:
• They help the compiler pinpoint errors in string literal delimiting.
Traditional multiline strings have a serious weakness: if you forget
the closing quote, the compiler has no idea where you wanted the
literal to end. It simply continues on until the compiler encounters
another quote (or the end of the file). If you're lucky, the text
after that quote is not valid code, and the resulting error will at
least point you to the next string literal in the file. If you're
unlucky, you'll get a seemingly unrelated error several literals
later, an unbalanced brace error at the end of the file, or perhaps
even code that compiles but does something totally wrong.
(This is not a minor concern. Many popular languages, including C and
Swift 2, specifically reject newlines in string literals to prevent
this from happening.)
Continuation quotes provide the compiler with redundant information
about your intent. If you forget a closing quote, the continuation
quotes give the compiler a very good idea of where you meant to put
it. The compiler can point you to (or at least very near) the end of
the literal, where you want to insert the quote, rather than showing
you the beginning of the literal or even some unrelated error later in
the file that was caused by the missing quote.
• Temporarily unclosed literals don't make editors go haywire. The
syntax highlighter has the same trouble parsing half-written, unclosed
traditional quotes that the compiler does: It can't tell where the
literal is supposed to end and the code should begin. It must either
apply heuristics to try to guess where the literal ends, or
incorrectly color everything between the opening quote and the next
closing quote as a string literal. This can cause the file's coloring
to alternate distractingly between "string literal" and "running
code".
Continuation quotes give the syntax highlighter enough context to
guess at the correct coloration, even when the string isn't complete
yet. Lines with a continuation quote are literals; lines without are
code. At worst, the syntax highlighter might incorrectly color a few
characters at the end of a line, rather than the remainder of the
file.
• They separate indentation from the string's contents. Traditional
multiline strings usually include all of the content between the start
and end delimiters, including leading whitespace. This means that it's
usually impossible to indent a multiline string, so including one
breaks up the flow of the surrounding code, making it less readable.
Some languages apply heuristics or mode switches to try to remove
indentation, but like all heuristics, these are mistake-prone and
murky.
Continuation quotes neatly avoid this problem. Whitespace before the
continuation quote is indentation used to format the source code;
whitespace after the continuation quote is part of the string literal.
The interpretation of the code is perfectly clear to both compiler and
programmer.
• They improve the ability to quickly recognize the literal.
Traditional multiline strings don't provide much visual help. To find
the end, you must visually scan until you find the matching delimiter,
which may be only one or a few characters long. When looking at a
random line of source, it can be hard to tell at a glance whether it's
code or literal. Syntax highlighting can help with these issues, but
it's often unreliable, especially with advanced, idiosyncratic string
literal features like multiline strings.
Continuation quotes solve these problems. To find the end of the
literal, just scan down the column of continuation characters until
they end. To figure out if a given line of source is part of a
literal, just see if it starts with a quote mark. The meaning of the
source becomes obvious at a glance.
Nevertheless, the traditional design does have a few advantages:
• It is simpler. Although continuation quotes are more complex, we
believe that the advantages listed above pay for that complexity.
• There is no need to edit the intervening lines to add continuation
quotes. While the additional effort required to insert continuation
quotes is an important downside, we believe that tool support,
including both compiler fix-its and perhaps editor support for
commands like "Paste as String Literal", can address this issue. In
some editors, new features aren't even necessary; TextMate, for
instance, lets you insert a character on several lines simultaneously.
And new tool features could also address other issues like escaping
embedded quotes.
• Naïve syntax highlighters may have trouble understanding this
syntax. This is true, but naïve syntax highlighters generally have
terrible trouble with advanced string literal constructs; some
struggle with even basic ones. While there are some designs (like
Python's """ strings) which trick some syntax highlighters into
working some of the time with some contents, we don't think this
occasional, accidental compatibility is a big enough gain to justify
changing the design.
• It looks funny—quotes should always be in matched pairs. We aren't
aware of another programming language which uses unbalanced quotes in
string literals, but there is one very important precedent for this
kind of formatting: natural languages. English, for instance, uses a
very similar format for quoting multiple lines of dialog by the same
speaker. As an English Stack Exchange answer illustrates:
“That seems like an odd way to use punctuation,” Tom said. “What harm
would there be in using quotation marks at the end of every
paragraph?”
“Oh, that’s not all that complicated,” J.R. answered. “If you closed
quotes at the end of every paragraph, then you would need to
reidentify the speaker with every subsequent paragraph.
“Say a narrative was describing two or three people engaged in a
lengthy conversation. If you closed the quotation marks in the
previous paragraph, then a reader wouldn’t be able to easily tell if
the previous speaker was extending his point, or if someone else in
the room had picked up the conversation. By leaving the previous
paragraph’s quote unclosed, the reader knows that the previous speaker
is still the one talking.”
“Oh, that makes sense. Thanks!”
In English, omitting the ending
quotation mark tells the text's reader that the quote continues on the
next line, while including a quotation mark at the beginning of the
next line reminds the reader that they're in the middle of a quote.
Similarly, in this proposal, omitting the ending quotation mark tells
the code's reader (and compiler) that the string literal continues on
the next line, while including a quotation mark at the beginning of
the next line reminds the reader (and compiler) that they're in the
middle of a string literal.
On balance, we think continuation quotes are the best design for this
problem.
Detailed design
When Swift is parsing a string literal and reaches the end of a line
without finding a closing quote, it examines the next line, applying
the following rules:
• If the next line begins with optional whitespace followed by a continuation
quote, then the string literal contains a newline followed by the
contents of the string literal starting on that line. (This line may
itself have no closing quote, in which case the same rules apply to
the line which follows.)
• If the next line contains anything else, Swift raises a syntax error
for an unterminated string literal.
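As a rough illustration of the two rules above, here is a minimal sketch written in present-day Swift that applies them to raw source lines. It is not the proposed implementation: decodeContinuationLiteral and LiteralError are hypothetical names, the caller is assumed to pass the lines starting at the opening quote, leading whitespace before a continuation quote is assumed to be optional, and escapes are left uninterpreted.

```swift
// Hypothetical error type for this sketch; not part of the proposal.
enum LiteralError: Error {
    // `line` is the index of the line where a quote was expected but not found.
    case unterminated(line: Int)
}

// Hypothetical helper: applies the continuation-quote rules to source lines.
// Assumption: sourceLines[0] starts at the literal's opening quote.
func decodeContinuationLiteral(_ sourceLines: [String]) throws -> String {
    var contents: [String] = []
    for (index, line) in sourceLines.enumerated() {
        // Whitespace before the quote is source indentation, not string content.
        let trimmed = line.drop(while: { $0 == " " || $0 == "\t" })
        guard trimmed.first == "\"" else {
            // Anything other than a quote here means the literal is unterminated.
            throw LiteralError.unterminated(line: index)
        }
        var text = ""
        var escaped = false
        var closed = false
        for character in trimmed.dropFirst() {
            if escaped {
                // Keep the escaped character as written (escapes are not interpreted).
                text.append(character)
                escaped = false
            } else if character == "\\" {
                text.append(character)
                escaped = true
            } else if character == "\"" {
                // An unescaped quote closes the literal.
                closed = true
                break
            } else {
                text.append(character)
            }
        }
        contents.append(text)
        if closed {
            // Each line break between continuation lines contributes one newline.
            return contents.joined(separator: "\n")
        }
    }
    // Ran out of lines without finding a closing quote.
    throw LiteralError.unterminated(line: sourceLines.count)
}
```

Passing the three lines `"<catalog>`, an indented `"  <book/>`, and an indented `"</catalog>"` yields a string whose second line keeps its two leading spaces while the source indentation before each continuation quote is dropped. A real lexer would also report where the closing quote probably belongs, which this sketch does not attempt.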
The exact error messages and diagnostics provided are left to the
implementers to determine, but we believe it should be possible to
provide two fix-its which will help users learn the syntax and correct
string literal mistakes:
• Insert " at the end of the current line to terminate the quote.
• Insert " at the beginning of the next line (with some indentation
heuristics) to continue the quote on the next line.
Impact on existing code
Failing to close a string literal before the end of the line is
currently a syntax error, so no valid Swift code should be affected by
this change.
Future directions for multiline string literals
• We could permit comments before encountering a continuation quote to
be counted as whitespace, and permit empty lines in the middle of
string literals. This would allow you to comment out whole lines in
the literal.
• We could allow you to put a trailing backslash on a line to indicate
that the newline isn't "real" and should be omitted from the literal's
contents.
Future directions for string literals in general
There are other issues with Swift's string handling which this
proposal intentionally does not address:
• Reducing the amount of double-backslashing needed when working with
regular expression libraries, Windows paths, source code generation,
and other tasks where backslashes are part of the data.
• Alternate delimiters or other strategies for writing strings with "
characters in them.
• Accommodating code formatting concerns like hard wrapping and
commenting.
• String literals consisting of very long pieces of text which are
best represented completely verbatim, with minimal alteration.
This section briefly outlines some future proposals which might
address these issues. Combined, we believe they would address most of
the string literal use cases which Swift is currently not very good
at.
Please note that these are simply sketches of hypothetical future
designs; they may radically change before proposal, and some may never
be proposed at all. Many, perhaps most, will not be proposed for Swift
3. We are sketching these designs not to propose and refine these
features immediately, but merely to show how we think they might be
solved in ways which complement this proposal.
String literal modifiers
A string literal modifier is a cluster of identifier characters which
goes before a string literal and adjusts the way it is parsed.
Modifiers only alter the interpretation of the text in the literal, not
the type of data it produces; for instance, there will never be
something like the UTF-8/UTF-16/UTF-32 literal modifiers in C++.
Uppercase characters enable a feature; lowercase characters disable a
feature.
Modifiers can be attached to both single-line and multiline literals,
and could also be attached to other literal syntaxes which might be
introduced in the future. When used with multiline strings, only the
starting quote needs to carry the modifiers, not the continuation
quotes.
Modifiers are an extremely flexible feature which can be used for many
purposes. Of the ideas listed below, we believe the e modifier is an
urgent addition which should be included in Swift 3 if at all
possible; the others are less urgent and most of them could be
deferred, or at least added later if time allows.
• Escape disabling: e"\\\" (string with three backslash characters)
• Fine-grained escape disabling: i"\(foo)\n" (the string \(foo)
followed by a newline); eI"\(foo)\n" (the contents of foo followed by
the string \n); b"\w+\n" (the string \w+ followed by a newline)
• Alternate delimiters: _ has no lowercase form, so it could be used
to allow strings with internal quotes: _"print("Hello, world!")"_,
__"print("Hello, world!")"__, etc.
• Whitespace normalization: changes all runs of whitespace in the
literal to single space characters; this would allow you to use
multiline strings purely to improve code formatting.
    alert.informativeText = W"\(appName) could not typeset the element “\(title)” because
                             "it includes a link to an element that has been removed from this
                             "book."
• Localization:
    alert.informativeText = LW"\(appName) could not typeset the element “\(title)” because
                              "it includes a link to an element that has been removed from this
                              "book."
• Comments: Embedding comments in string literals might be useful for
literals containing regular expressions or other code.
Eventually, user-specified string modifiers could be added to Swift,
perhaps as part of a hygienic macro system. It might also become
possible to change the default modifiers applied to literals in a
particular file or scope.
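To make the whitespace normalization idea concrete, here is a small sketch in present-day Swift of what the hypothetical W modifier would do to a literal's contents. The function name normalizingWhitespace is an assumption for illustration only; under the sketched design the compiler would do this work while parsing the literal, not at run time.

```swift
// Hypothetical stand-in for the sketched W modifier: replaces every run of
// whitespace (spaces, tabs, newlines) with a single space character.
func normalizingWhitespace(_ text: String) -> String {
    var result = ""
    var insideRun = false
    for character in text {
        if character.isWhitespace {
            // Emit one space for the first character of a run, then skip the rest.
            if !insideRun {
                result.append(" ")
            }
            insideRun = true
        } else {
            result.append(character)
            insideRun = false
        }
    }
    return result
}
```

With this behavior, the line breaks introduced purely to wrap a long message across continuation lines would collapse into ordinary spaces in the resulting string.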
Heredocs or other "verbatim string literal" features
Sometimes it really is best to just splat something else down in the
middle of a file full of Swift source code. Maybe the file is
essentially a template and the literals are a majority of the code's
contents, or maybe you're writing a code generator and just want to
get string data into it with minimal fuss, or maybe people unfamiliar
with Swift need to be able to edit the literals. Whatever the reason,
the normal string literal syntax is just too burdensome.
One approach to this problem is heredocs. A heredoc allows you to put
a placeholder for a literal on one line; the contents of the literal
begin on the next line, running up to some delimiter. It would be
possible to put multiple placeholders in a single line, and to apply
string modifiers to them.
In Swift, this might look like:
    print(#to("---") + e#to("END"))
    It was a dark and stormy \(timeOfDay) when
    ---
    the Swift core team invented the \(interpolation) syntax.
    END
Another possible approach would be to support traditional multiline
string literals bounded by a different delimiter, like """. This might
look like:
    print("""
    It was a dark and stormy \(timeOfDay) when
    """ + e"""
    the Swift core team invented the \(interpolation) syntax.
    """)
Although heredocs could make a good addition to Swift eventually,
there are good reasons to defer them for now. Please see the
"Alternatives considered" section for details.
First-class regular expressions
Members of the core team are interested in regular expressions, but
they don't want to just build a literal that wraps PCRE or libicu;
rather, they aim to integrate regexes into the pattern matching system
and give them a deep, Perl 6-style rethink. This would be a major
effort, far beyond the scope of Swift 3.
In the meantime, the e modifier and perhaps other string literal
modifiers will make it easier to specify regular expressions in string
literals for use with NSRegularExpression and other libraries
accessible from Swift.
Alternatives considered
Requiring no continuation character
The main alternative is to not require a continuation quote, and
simply extend the string literal from the starting quote to the ending
quote, including all newlines between them. For example:
    let xml = "<?xml version=\"1.0\"?>
    <catalog>
    	<book id=\"bk101\" empty=\"\">
    		<author>\(author)</author>
    	</book>
    </catalog>"
This alternative is extensively discussed in the "Rationale" section
above.
Skip multiline strings and just support heredocs
There are definitely cases where a heredoc would be a better solution,
such as generated code or code which is mostly literals with a little
Swift sprinkled around. On the other hand, there are also cases where
multiline strings are better: short strings in code which is meant to
be read. If a single feature can't handle them both well, there's no
shame in supporting the two features separately.
It makes sense to support multiline strings first because:
• They extend existing syntax instead of introducing new syntax.
• They are much easier to parse; heredocs require some kind of mode in
the parser which kicks in at the start of the next line, whereas
multiline string literals can be handled in the lexer.
• As discussed in "Rationale", they offer better diagnostics, code
formatting, and visual scannability.
Use a different delimiter for multiline strings
The initial suggestion was that multiline strings should use a
different delimiter, """, at the beginning and end of the string, with
no continuation characters between. Like heredocs, this might be a
good alternative for certain use cases, but it has the same basic
flaws as the "no continuation character" solution.
--
Brent Royal-Gordon
Architechies