Re: [swift-evolution] multi-line string literals.

Michael Peternell via swift-evolution Thu, 28 Apr 2016 15:33:12 -0700

Is it just me who would prefer that a multiline string literal not require a 
\backslash before each "double quote"?


Have you ever really used multiline string literals? I have, mostly for quick 
hacks where I wrote a script or tried something out quickly. Or maybe I needed 
to put an XML snippet into a unit test case to check whether my parser 
correctly accepts or correctly rejects it. The current proposal doesn't help 
with this use case at all. I can't think of a use case that calls for 
multiline string literals in which double quotes still have to be escaped... 
I wouldn't use them if they were available. I'd become an Android developer 
instead ;)

-Michael

> Am 28.04.2016 um 23:56 schrieb Brent Royal-Gordon via swift-evolution 
> <[email protected]>:
> 
>> Awesome.  Some specific suggestions below, but feel free to iterate in a 
>> pull request if you prefer that.
> 
> I've adopted these suggestions in some form, though I also ended up rewriting 
> the explanation of why the feature was designed as it is and fusing it with 
> material from "Alternatives considered".
> 
> (Still not sure who I should list as a co-author. I'm currently thinking 
> John, Tyler, and maybe Chris? Who's supposed to go there?)
> 
> Multiline string literals
> 
>       • Proposal: SE-NNNN
>       • Author(s): Brent Royal-Gordon
>       • Status: Second Draft
>       • Review manager: TBD
> Introduction
> 
> In Swift 2.2, the only means to insert a newline into a string literal is the 
> \n escape. String literals specified in this way are generally ugly and 
> unreadable. We propose a multiline string feature inspired by English 
> punctuation which is a straightforward extension of our existing string 
> literals.
> 
> This proposal is one step in a larger plan to improve how string literals 
> address various challenging use cases. It is not meant to solve all problems 
> with escaping, nor to serve all use cases involving very long string 
> literals. See the "Future directions for string literals in general" section 
> for a sketch of the problems we ultimately want to address and some ideas of 
> how we might do so.
> 
> Swift-evolution threads: multi-line string literals (April), multi-line 
> string literals (December)
> 
> Draft Notes
> 
>       • Removes the comment feature, which was felt to be an unnecessary 
> complication. This and the backslash feature have been listed as future 
> directions. 
> 
>       • Loosens the specification of diagnostics, suggesting fix-its rather 
> than requiring them.
> 
>       • Splits a "Rationale" section out of the "Proposed solution" section.
> 
>       • Adds extensive discussion of other features which would combine with 
> this one.
> 
>       • I've listed only myself as an author because I don't want to put 
> anyone else's name to a document they haven't seen, but there are others who 
> deserve to be listed (John Holdsworth at least). Let me know if you think you 
> should be included.
> 
> Motivation
> 
> As Swift begins to move into roles beyond app development, code which needs 
> to generate text becomes a more important use case. Consider, for instance, 
> generating even a small XML string:
> 
> let xml = "<?xml version=\"1.0\"?>\n<catalog>\n\t<book id=\"bk101\" empty=\"\">\n\t\t<author>\(author)</author>\n\t</book>\n</catalog>"
>
> The string is practically unreadable, its structure drowned in escapes and 
> run-together lines; it looks like little more than line noise. We can improve 
> its readability somewhat by concatenating separate strings for each line and 
> using real tabs instead of \t escapes:
> 
> let xml = "<?xml version=\"1.0\"?>\n" +
>           "<catalog>\n" +
>           " <book id=\"bk101\" empty=\"\">\n" +
>           "     <author>\(author)</author>\n" +
>           " </book>\n" +
>           "</catalog>"
>
> However, this creates a more complex expression for the type checker, and 
> there's still far more punctuation than ought to be necessary. If the most 
> important goal of Swift is making code readable, this kind of code falls far 
> short of that goal.
> 
> Proposed solution
> 
> We propose that, when Swift is parsing a string literal, if it reaches the 
> end of the line without encountering an end quote, it should look at the next 
> line. If it sees a quote at the beginning (a "continuation quote"), the 
> string literal contains a newline and then continues on that line. Otherwise, 
> the string literal is unterminated and syntactically invalid.
> 
> Our sample above could thus be written as:
> 
> let xml = "<?xml version=\"1.0\"?>
>           "<catalog>
>           " <book id=\"bk101\" empty=\"\">
>           "     <author>\(author)</author>
>           " </book>
>           "</catalog>"
> 
> If the second or subsequent lines had not begun with a quotation mark, or the 
> trailing quotation mark after the </catalog> tag had not been included, Swift
> would have emitted an error.
> 
> Rationale
> 
> This design is rather unusual, and it's worth pausing a moment to explain why 
> it has been chosen.
> 
> The traditional design for this feature, seen in languages like Perl and 
> Python, simply places one delimiter at the beginning of the literal and 
> another at the end. Individual lines in the literal are not marked in any 
> way. 
> 
> We think continuation quotes offer several important advantages over the 
> traditional design:
> 
>       • They help the compiler pinpoint errors in string literal delimiting. 
> Traditional multiline strings have a serious weakness: if you forget the 
> closing quote, the compiler has no idea where you wanted the literal to end. 
> It simply continues on until the compiler encounters another quote (or the 
> end of the file). If you're lucky, the text after that quote is not valid 
> code, and the resulting error will at least point you to the next string 
> literal in the file. If you're unlucky, you'll get a seemingly unrelated 
> error several literals later, an unbalanced brace error at the end of the 
> file, or perhaps even code that compiles but does something totally wrong.
> 
> (This is not a minor concern. Many popular languages, including C and Swift 
> 2, specifically reject newlines in string literals to prevent this from 
> happening.)
> 
> Continuation quotes provide the compiler with redundant information about 
> your intent. If you forget a closing quote, the continuation quotes give the 
> compiler a very good idea of where you meant to put it. The compiler can 
> point you to (or at least very near) the end of the literal, where you want 
> to insert the quote, rather than showing you the beginning of the literal or 
> even some unrelated error later in the file that was caused by the missing 
> quote.
> 
>       • Temporarily unclosed literals don't make editors go haywire. The 
> syntax highlighter has the same trouble parsing half-written, unclosed 
> traditional quotes that the compiler does: It can't tell where the literal is 
> supposed to end and the code should begin. It must either apply heuristics to 
> try to guess where the literal ends, or incorrectly color everything between 
> the opening quote and the next closing quote as a string literal. This can 
> cause the file's coloring to alternate distractingly between "string literal" 
> and "running code".
> 
> Continuation quotes give the syntax highlighter enough context to guess at 
> the correct coloration, even when the string isn't complete yet. Lines with a 
> continuation quote are literals; lines without are code. At worst, the syntax 
> highlighter might incorrectly color a few characters at the end of a line, 
> rather than the remainder of the file.
> 
>       • They separate indentation from the string's contents. Traditional 
> multiline strings usually include all of the content between the start and 
> end delimiters, including leading whitespace. This means that it's usually 
> impossible to indent a multiline string, so including one breaks up the flow 
> of the surrounding code, making it less readable. Some languages apply 
> heuristics or mode switches to try to remove indentation, but like all 
> heuristics, these are mistake-prone and murky.
> 
> Continuation quotes neatly avoid this problem. Whitespace before the 
> continuation quote is indentation used to format the source code; whitespace 
> after the continuation quote is part of the string literal. The 
> interpretation of the code is perfectly clear to both compiler and programmer.
> 
>       • They improve the ability to quickly recognize the literal. 
> Traditional multiline strings don't provide much visual help. To find the 
> end, you must visually scan until you find the matching delimiter, which may 
> be only one or a few characters long. When looking at a random line of 
> source, it can be hard to tell at a glance whether it's code or literal. 
> Syntax highlighting can help with these issues, but it's often unreliable, 
> especially with advanced, idiosyncratic string literal features like 
> multiline strings.
> 
> Continuation quotes solve these problems. To find the end of the literal, 
> just scan down the column of continuation characters until they end. To 
> figure out if a given line of source is part of a literal, just see if it 
> starts with a quote mark. The meaning of the source becomes obvious at a 
> glance.
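The syntax-highlighting argument can be illustrated with a small sketch. The following Swift code is not part of the proposal. `LineKind`, `endsInsideLiteral`, and `classifyLines` are hypothetical names, and the quote tracking is deliberately simplified: it ignores quotes nested inside `\( ... )` interpolations. It shows that, under continuation-quote rules, a forgotten closing quote only disturbs the line where the quote is missing:

```swift
/// How a line-oriented highlighter might color each source line.
enum LineKind: Equatable {
    case code     // ordinary Swift, including a line that opens a literal
    case literal  // a line that continues a string literal
}

/// Returns true if a string literal is still open at the end of `text`.
/// `startsInside` says whether the line began inside a literal.
/// Simplification: a backslash consumes the next character, but quotes
/// nested inside `\( ... )` interpolations are not handled.
func endsInsideLiteral(_ text: Substring, startsInside: Bool) -> Bool {
    var inside = startsInside
    var escaped = false
    for ch in text {
        if !inside {
            if ch == "\"" { inside = true }
        } else if escaped {
            escaped = false
        } else if ch == "\\" {
            escaped = true
        } else if ch == "\"" {
            inside = false
        }
    }
    return inside
}

/// Classifies lines using only the continuation-quote rule: after a line that
/// leaves a literal open, a line whose first non-blank character is `"` is
/// literal text; every other line is treated as code.
func classifyLines(_ lines: [String]) -> [LineKind] {
    var kinds: [LineKind] = []
    var open = false  // did the previous line end inside a literal?
    for line in lines {
        let trimmed = line.drop(while: { $0 == " " || $0 == "\t" })
        if open && trimmed.first == "\"" {
            kinds.append(.literal)
            // Resume scanning just after the continuation quote.
            open = endsInsideLiteral(trimmed.dropFirst(), startsInside: true)
        } else {
            // No continuation quote: any open literal was unterminated, and
            // the highlighter recovers by treating this line as code.
            kinds.append(.code)
            open = endsInsideLiteral(Substring(line), startsInside: false)
        }
    }
    return kinds
}
```

With a missing closing quote, the line that follows is still classified as code because it lacks a continuation quote. The mistake therefore never spreads further down the file.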
> 
> Nevertheless, the traditional design does have a few advantages:
> 
>       • It is simpler. Although continuation quotes are more complex, we 
> believe that the advantages listed above pay for that complexity.
> 
>       • There is no need to edit the intervening lines to add continuation 
> quotes. While the additional effort required to insert continuation quotes is 
> an important downside, we believe that tool support, including both compiler 
> fix-its and perhaps editor support for commands like "Paste as String 
> Literal", can address this issue. In some editors, new features aren't even 
> necessary; TextMate, for instance, lets you insert a character on several 
> lines simultaneously. And new tool features could also address other issues 
> like escaping embedded quotes.
> 
>       • Naïve syntax highlighters may have trouble understanding this syntax. 
> This is true, but naïve syntax highlighters generally have terrible trouble 
> with advanced string literal constructs; some struggle with even basic ones. 
> While there are some designs (like Python's """ strings) which trick some 
> syntax highlighters into working some of the time with some contents, we 
> don't think this occasional, accidental compatibility is a big enough gain to 
> justify changing the design.
> 
>       • It looks funny—quotes should always be in matched pairs. We aren't 
> aware of another programming language which uses unbalanced quotes in string 
> literals, but there is one very important precedent for this kind of 
> formatting: natural languages. English, for instance, uses a very similar 
> format for quoting multiple lines of dialog by the same speaker. As an 
> English Stack Exchange answer illustrates:
> 
> “That seems like an odd way to use punctuation,” Tom said. “What harm would 
> there be in using quotation marks at the end of every paragraph?”
> 
> “Oh, that’s not all that complicated,” J.R. answered. “If you closed quotes 
> at the end of every paragraph, then you would need to reidentify the speaker 
> with every subsequent paragraph.
> 
> “Say a narrative was describing two or three people engaged in a lengthy 
> conversation. If you closed the quotation marks in the previous paragraph, 
> then a reader wouldn’t be able to easily tell if the previous speaker was 
> extending his point, or if someone else in the room had picked up the 
> conversation. By leaving the previous paragraph’s quote unclosed, the reader 
> knows that the previous speaker is still the one talking.”
> 
> “Oh, that makes sense. Thanks!”
>
> In English, omitting the ending quotation mark tells the text's reader that 
> the quote continues on the next line, while including a quotation mark at the 
> beginning of the next line reminds the reader that they're in the middle of a 
> quote.
> 
> Similarly, in this proposal, omitting the ending quotation mark tells the 
> code's reader (and compiler) that the string literal continues on the next 
> line, while including a quotation mark at the beginning of the next line 
> reminds the reader (and compiler) that they're in the middle of a string 
> literal.
> 
> On balance, we think continuation quotes are the best design for this problem.
> 
> Detailed design
> 
> When Swift is parsing a string literal and reaches the end of a line without 
> finding a closing quote, it examines the next line, applying the following 
> rules:
> 
>       • If the next line begins with whitespace followed by a continuation 
> quote, then the string literal contains a newline followed by the contents of 
> the string literal starting on that line. (This line may itself have no 
> closing quote, in which case the same rules apply to the line which follows.)
> 
>       • If the next line contains anything else, Swift raises a syntax error 
> for an unterminated string literal. 
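To make these rules concrete, here is a minimal sketch of how a lexer might scan such a literal. It is illustrative only and is not the intended compiler implementation. `scanLiteral` and `LiteralError` are hypothetical names. Escape sequences are kept as written rather than processed, and quotes nested inside interpolations are not handled:

```swift
/// Raised when a literal is still open at the end of line `line` and the
/// following line does not begin with a continuation quote.
enum LiteralError: Error, Equatable {
    case unterminated(line: Int)
}

/// Sketch of the proposed lexing rule. `rest` is the text after the opening
/// quote on line `startLine`. A backslash always consumes the next character,
/// so an escaped quote never ends the literal.
func scanLiteral(
    lines: [String],
    startLine: Int,
    afterOpeningQuote rest: Substring
) -> Result<(contents: String, endLine: Int), LiteralError> {
    var contents = ""
    var segment = rest
    var lineIndex = startLine
    while lineIndex < lines.count {
        // Look for an unescaped closing quote on the current line.
        var escaped = false
        for ch in segment {
            if escaped {
                escaped = false
            } else if ch == "\\" {
                escaped = true
            } else if ch == "\"" {
                return .success((contents: contents, endLine: lineIndex))
            }
            contents.append(ch)
        }
        // End of line without a closing quote: inspect the next line.
        let next = lineIndex + 1
        guard next < lines.count else { break }
        // Whitespace before the continuation quote is indentation, not content.
        let trimmed = lines[next].drop(while: { $0 == " " || $0 == "\t" })
        guard trimmed.first == "\"" else { break }
        // The literal gains a newline and resumes after the continuation quote.
        contents.append("\n")
        segment = trimmed.dropFirst()
        lineIndex = next
    }
    return .failure(.unterminated(line: lineIndex))
}
```

The scanner stops at the first line without a continuation quote. The error it reports therefore names the line where the closing quote is missing, rather than some later line in the file.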
> 
> The exact error messages and diagnostics provided are left to the 
> implementers to determine, but we believe it should be possible to provide 
> two fix-its which will help users learn the syntax and correct string literal 
> mistakes:
> 
>       • Insert " at the end of the current line to terminate the quote.
> 
>       • Insert " at the beginning of the next line (with some indentation 
> heuristics) to continue the quote on the next line.
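As an illustration only, the two fix-its could be produced as in the following sketch. `FixIt` and `unterminatedLiteralFixIts` are hypothetical. The indentation heuristic is an assumption that this proposal does not specify: the new continuation quote is aligned with the opening quote, and the next line's existing leading whitespace is discarded.

```swift
/// A hypothetical fix-it: a message plus the two affected lines after the fix.
struct FixIt: Equatable {
    let message: String
    let fixedLines: [String]
}

/// Offers both fix-its for a literal left open at the end of `line`.
/// Assumed heuristic: the continuation quote goes in the same (zero-based)
/// column as the literal's opening quote, and the next line's existing
/// leading whitespace is discarded.
func unterminatedLiteralFixIts(line: String, nextLine: String, openingQuoteColumn: Int) -> [FixIt] {
    // Fix-it 1: close the literal on the current line.
    let terminate = FixIt(
        message: "insert '\"' to terminate the string literal",
        fixedLines: [line + "\"", nextLine])
    // Fix-it 2: continue the literal by adding a continuation quote.
    let content = nextLine.drop(while: { $0 == " " || $0 == "\t" })
    let indent = String(repeating: " ", count: openingQuoteColumn)
    let continueLiteral = FixIt(
        message: "insert '\"' to continue the string literal on the next line",
        fixedLines: [line, indent + "\"" + String(content)])
    return [terminate, continueLiteral]
}
```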
> 
> Impact on existing code
> 
> Failing to close a string literal before the end of the line is currently a 
> syntax error, so no valid Swift code should be affected by this change.
> 
> Future directions for multiline string literals
> 
>       • We could permit comments before encountering a continuation quote to 
> be counted as whitespace, and permit empty lines in the middle of string 
> literals. This would allow you to comment out whole lines in the literal.
> 
>       • We could allow you to put a trailing backslash on a line to indicate 
> that the newline isn't "real" and should be omitted from the literal's 
> contents.
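If the trailing-backslash idea were adopted, assembling the literal's value from its lines might look like the following sketch. `joinSegments` is a hypothetical helper that takes each line's raw text after its continuation quote. It treats an odd number of trailing backslashes as the line-join marker, so an escaped backslash `\\` at the end of a line still keeps its newline. That rule is our assumption.

```swift
/// Joins the raw per-line segments of a multiline literal with newlines.
/// Hypothetical rule: if a segment (other than the last) ends in an odd
/// number of backslashes, the final backslash is removed and no newline
/// is inserted after it.
func joinSegments(_ segments: [String]) -> String {
    var result = ""
    for (i, segment) in segments.enumerated() {
        let isLast = i == segments.count - 1
        // Count consecutive backslashes at the end of the segment.
        let trailing = segment.reversed().prefix(while: { $0 == "\\" }).count
        if !isLast && trailing % 2 == 1 {
            result += String(segment.dropLast())  // drop the marker, no newline
        } else {
            result += segment
            if !isLast { result += "\n" }
        }
    }
    return result
}
```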
> 
> Future directions for string literals in general
> 
> There are other issues with Swift's string handling which this proposal 
> intentionally does not address:
> 
>       • Reducing the amount of double-backslashing needed when working with 
> regular expression libraries, Windows paths, source code generation, and 
> other tasks where backslashes are part of the data.
> 
>       • Alternate delimiters or other strategies for writing strings with " 
> characters in them.
> 
>       • Accommodating code formatting concerns like hard wrapping and 
> commenting.
> 
>       • String literals consisting of very long pieces of text which are best 
> represented completely verbatim, with minimal alteration.
> 
> This section briefly outlines some future proposals which might address these 
> issues. Combined, we believe they would address most of the string literal 
> use cases which Swift is currently not very good at.
> 
> Please note that these are simply sketches of hypothetical future designs; 
> they may radically change before proposal, and some may never be proposed at 
> all. Many, perhaps most, will not be proposed for Swift 3. We are sketching 
> these designs not to propose and refine these features immediately, but 
> merely to show how we think they might be solved in ways which complement 
> this proposal.
> 
> String literal modifiers
> 
> A string literal modifier is a cluster of identifier characters which goes 
> before a string literal and adjusts the way it is parsed. Modifiers only alter
> the interpretation of the text in the literal, not the type of data it 
> produces; for instance, there will never be something like the 
> UTF-8/UTF-16/UTF-32 literal modifiers in C++. Uppercase characters enable a 
> feature; lowercase characters disable a feature.
> 
> Modifiers can be attached to both single-line and multiline literals, and 
> could also be attached to other literal syntaxes which might be introduced in 
> the future. When used with multiline strings, only the starting quote needs 
> to carry the modifiers, not the continuation quotes.
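The uppercase-enables, lowercase-disables convention can be sketched as a left-to-right pass over the modifier cluster. Everything here is hypothetical: `LiteralFeatures`, `applyModifiers`, and the letters beyond those used in the examples below. We also assume that `e` switches off all backslash processing, including interpolation, so that `eI` can switch interpolation back on. The `b` modifier and `_` delimiters are not modeled.

```swift
/// Hypothetical features a modifier cluster can toggle.
/// The defaults match today's string literals.
struct LiteralFeatures: Equatable {
    var escapes = true               // e / E
    var interpolation = true         // i / I
    var normalizeWhitespace = false  // w / W
    var localize = false             // l / L
}

enum ModifierError: Error, Equatable {
    case unknownModifier(Character)
}

/// Applies a modifier cluster left to right: an uppercase letter enables its
/// feature and a lowercase letter disables it.
/// Assumption: `e`/`E` switches all backslash processing, interpolation included.
func applyModifiers(_ cluster: String) throws -> LiteralFeatures {
    var features = LiteralFeatures()
    for ch in cluster {
        let enable = ch.isUppercase
        switch ch.lowercased() {
        case "e":
            features.escapes = enable
            features.interpolation = enable
        case "i":
            features.interpolation = enable
        case "w":
            features.normalizeWhitespace = enable
        case "l":
            features.localize = enable
        default:
            throw ModifierError.unknownModifier(ch)
        }
    }
    return features
}
```

Processing the cluster in order is what makes `eI` meaningful: the later letter overrides part of what the earlier one set.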
> 
> Modifiers are an extremely flexible feature which can be used for many 
> purposes. Of the ideas listed below, we believe the e modifier is an urgent 
> addition which should be included in Swift 3 if at all possible; the others 
> are less urgent and most of them could be deferred, or at least added later 
> if time allows.
> 
>       • Escape disabling: e"\\\" (string with three backslash characters)
> 
>       • Fine-grained escape disabling: i"\(foo)\n" (the string \(foo) 
> followed by a newline); eI"\(foo)\n" (the contents of foo followed by the 
> string \n); b"\w+\n" (the string \w+ followed by a newline)
> 
>       • Alternate delimiters: _ has no lowercase form, so it could be used to 
> allow strings with internal quotes: _"print("Hello, world!")"_, 
> __"print("Hello, world!")"__, etc.
> 
>       • Whitespace normalization: changes all runs of whitespace in the 
> literal to single space characters; this would allow you to use multiline 
> strings purely to improve code formatting.
> 
> alert.informativeText =
>     W"\(appName) could not typeset the element “\(title)” because 
>      "it includes a link to an element that has been removed from this 
>      "book."
> 
>       • Localization: 
> 
> alert.informativeText =
>     LW"\(appName) could not typeset the element “\(title)” because 
>       "it includes a link to an element that has been removed from this 
>       "book."
> 
>       • Comments: Embedding comments in string literals might be useful for 
> literals containing regular expressions or other code.
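The whitespace-normalization modifier is simple enough to sketch directly. `normalizeWhitespace` is a hypothetical helper applied to the literal's contents after continuation lines have been joined. It follows the description above literally, so leading and trailing runs also become a single space rather than being trimmed:

```swift
/// Sketch of the hypothetical W modifier: every run of whitespace, including
/// newlines contributed by continuation lines, becomes one space character.
func normalizeWhitespace(_ s: String) -> String {
    var result = ""
    var inRun = false  // are we inside a run of whitespace?
    for ch in s {
        if ch.isWhitespace {
            if !inRun { result.append(" ") }
            inRun = true
        } else {
            result.append(ch)
            inRun = false
        }
    }
    return result
}
```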
> 
> Eventually, user-specified string modifiers could be added to Swift, perhaps 
> as part of a hygienic macro system. It might also become possible to change 
> the default modifiers applied to literals in a particular file or scope.
> 
> Heredocs or other "verbatim string literal" features
> 
> Sometimes it really is best to just splat something else down in the middle 
> of a file full of Swift source code. Maybe the file is essentially a template 
> and the literals are a majority of the code's contents, or maybe you're 
> writing a code generator and just want to get string data into it with 
> minimal fuss, or maybe people unfamiliar with Swift need to be able to edit 
> the literals. Whatever the reason, the normal string literal syntax is just 
> too burdensome.
> 
> One approach to this problem is heredocs. A heredoc allows you to put a 
> placeholder for a literal on one line; the contents of the literal begin on 
> the next line, running up to some delimiter. It would be possible to put 
> multiple placeholders in a single line, and to apply string modifiers to them.
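The following is a rough sketch of how the bodies for several placeholders on one line might be collected. `readHeredocs` and `HeredocError` are hypothetical. We assume that each delimiter line must match exactly, with no indentation, and that a body's lines are joined with newlines and carry no trailing newline. Note that reading has to resume on the line *after* the placeholders. That is the parser mode that distinguishes heredocs from ordinary string literals.

```swift
enum HeredocError: Error, Equatable {
    case missingDelimiter(String)
}

/// Collects heredoc bodies for the placeholders on `placeholderLine`, in order.
/// Each body starts on the line after the previous one ended and runs until a
/// line that is exactly its delimiter. Returns the bodies and the index of the
/// first line after the last delimiter, where ordinary code resumes.
func readHeredocs(lines: [String], placeholderLine: Int, delimiters: [String]) throws
    -> (bodies: [String], nextLine: Int)
{
    var bodies: [String] = []
    var i = placeholderLine + 1
    for delimiter in delimiters {
        var body: [String] = []
        while i < lines.count && lines[i] != delimiter {
            body.append(lines[i])
            i += 1
        }
        guard i < lines.count else { throw HeredocError.missingDelimiter(delimiter) }
        i += 1  // skip the delimiter line itself
        bodies.append(body.joined(separator: "\n"))
    }
    return (bodies: bodies, nextLine: i)
}
```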
> 
> In Swift, this might look like:
> 
> print(#to("---") + e#to("END"))
> It was a dark and stormy \(timeOfDay) when 
> ---
> the Swift core team invented the \(interpolation) syntax.
> END
> 
> Another possible approach would be to support traditional multiline string 
> literals bounded by a different delimiter, like """. This might look like:
> 
> print("""
> It was a dark and stormy \(timeOfDay) when 
> """ + e"""
> the Swift core team invented the \(interpolation) syntax.
> """)
>
> Although heredocs could make a good addition to Swift eventually, there are 
> good reasons to defer them for now. Please see the "Alternatives considered" 
> section for details.
> 
> First-class regular expressions
> 
> Members of the core team are interested in regular expressions, but they 
> don't want to just build a literal that wraps PCRE or libicu; rather, they 
> aim to integrate regexes into the pattern matching system and give them a 
> deep, Perl 6-style rethink. This would be a major effort, far beyond the 
> scope of Swift 3.
> 
> In the meantime, the e modifier and perhaps other string literal modifiers 
> will make it easier to specify regular expressions in string literals for use 
> with NSRegularExpression and other libraries accessible from Swift.
> 
> Alternatives considered
> 
> Requiring no continuation character
> 
> The main alternative is to not require a continuation quote, and simply 
> extend the string literal from the starting quote to the ending quote, 
> including all newlines between them. For example:
> 
> let xml = "<?xml version=\"1.0\"?>
> <catalog>
>     <book id=\"bk101\" empty=\"\">
>         <author>\(author)</author>
>     </book>
> </catalog>"
>
> This alternative is extensively discussed in the "Rationale" section above.
> 
> Skip multiline strings and just support heredocs
> 
> There are definitely cases where a heredoc would be a better solution, such 
> as generated code or code which is mostly literals with a little Swift 
> sprinkled around. On the other hand, there are also cases where multiline 
> strings are better: short strings in code which is meant to be read. If a 
> single feature can't handle them both well, there's no shame in supporting 
> the two features separately.
> 
> It makes sense to support multiline strings first because:
> 
>       • They extend existing syntax instead of introducing new syntax.
> 
>       • They are much easier to parse; heredocs require some kind of mode in 
> the parser which kicks in at the start of the next line, whereas multiline 
> string literals can be handled in the lexer.
> 
>       • As discussed in "Rationale", they offer better diagnostics, code 
> formatting, and visual scannability.
> 
> Use a different delimiter for multiline strings
> 
> The initial suggestion was that multiline strings should use a different 
> delimiter, """, at the beginning and end of the string, with no continuation 
> characters between. Like heredocs, this might be a good alternative for 
> certain use cases, but it has the same basic flaws as the "no continuation 
> character" solution.
> 
> -- 
> Brent Royal-Gordon
> Architechies
> 
> _______________________________________________
> swift-evolution mailing list
> [email protected]
> https://lists.swift.org/mailman/listinfo/swift-evolution
