I've written up a proposal for multi-line string literals. Before proposing it 
officially, I would like to get some informal feedback.

How can I propose it officially? Do I have to convert it to Markdown? I have no 
idea how to create a Markdown version of this, with all the quotes and funny 
characters in it ;)

-Michael

***

MULTI-LINE STRING LITERALS

- Proposal: SE-xxxx
- Author: Michael Peternell
- Status:
- Review manager:

INTRODUCTION

Multi-line string literals allow text that spans multiple lines to be 
included verbatim in a string literal. The string may even contain quote 
characters (" or '), and they don't have to be escaped.

MOTIVATION

Including many lines of text in a program often looks awkward, e.g. a 
JSON string where every quote needs to be escaped: 
"{\"response\":{\"result\":\"OK\"}}". With multi-line string literals, we can 
write """{"response":{"result":"OK"}}""" - note that the quotes in any valid 
JSON can be pasted as-is into a """3-quote string literal""", because 3 quotes 
(""") cannot appear in valid JSON. (JSON that contains backslash escapes is 
better served by the <<'HERE_DOC' form described below, because escape 
sequences are still processed inside a 3-quote literal.) Why would you want to 
have a JSON string in a program? Maybe you are writing unit tests for a JSON 
parser. Another usage example is below.

Some people had concerns that a string block may break the indentation of the 
code. E.g.

                // some deeply indented code
                doSomeStuff(2, 33.1)
                print("""Usage: \(program_name) <PARAM-X> <PARAM-Y> filename
Example: \(program_name) 3 1 countries.csv
This will print the 1st column of the 3rd non-empty non-header line from
countries.csv
"""             )
                exit(2)

That's the reason why there is also a HEREDOC-syntax in the proposal that can 
solve this problem. The example can be rewritten as:

                // some deeply indented code
                doSomeStuff(2, 33.1)
                print(<<USAGE_END)
                    Usage: \(program_name) <PARAM-X> <PARAM-Y> filename
                    Example: \(program_name) 3 1 countries.csv
                    This will print the 1st column of the 3rd non-empty
                    non-header line from countries.csv

                    USAGE_END
                exit(2)

This works unambiguously, as long as you don't mix tabs and spaces in your 
source code file.

PROPOSED SOLUTION

This proposal introduces three new forms of a String literal:

let INTERPOLATION = "String interpolation"

1. The """Python-style string literal. 3 Quotes (") at the beginning, 3 Quotes 
at the end, and Swift \(INTERPOLATION) is possible."""

2. The <<HERE_DOC, the string literal starts on the next line:
    A hereDoc may contain multiple lines. Leading space on each
    line is automatically truncated if the HERE_DOC delimiter
    is also indented. \(INTERPOLATION) is possible.
    HERE_DOC

3. A <<'HERE_DOC' with single quotes around it.
This is almost the same as a heredoc without single quotes, but text is 
included as-is.
You may include \ or " or ' or whatever (\" is just a backslash followed by a 
double quote).
The leading space rule is the same as for the other HERE_DOC.
Swift String interpolation is not possible here.
HERE_DOC

DETAILED DESIGN

The first type of String (the """Python-style multiline string""") behaves 
exactly like the "ordinary string literal", except for a few differences:
- A line break doesn't result in an error, but becomes part of the string's 
value.
- An included " doesn't end the string and does not need to be escaped.
- If you want to include """ in the string, you have to write ""\". This is a 
rare use case, and if you really need to do that, you may as well use one of 
the HERE_DOC styles instead.
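
To make the termination rule concrete, here is a rough sketch of how a lexer 
might find the end of a 3-quote literal. The function name tripleQuotedBody is 
made up for illustration, and the sketch assumes that the literal ends at the 
first run of three unescaped quotes; the proposal itself does not spell that 
out. It returns the raw body, with escape sequences still undecoded.

```swift
// Hypothetical helper, not part of the proposal.
// `source` is the text that follows the opening """.
// Returns the raw (still escaped) literal body, or nil if no closing """ is found.
func tripleQuotedBody(in source: String) -> String? {
    var body = ""
    var quoteRun = 0      // number of consecutive unescaped quotes just seen
    var escaped = false   // true if the previous character was an unescaped backslash

    for ch in source {
        if escaped {
            // Character after a backslash: never counts towards the closing """.
            body.append(ch)
            escaped = false
            quoteRun = 0
        } else if ch == "\\" {
            body.append(ch)
            escaped = true
            quoteRun = 0
        } else if ch == "\"" {
            quoteRun += 1
            if quoteRun == 3 {
                // The two previous quotes were already appended; drop them.
                body.removeLast(2)
                return body
            }
            body.append(ch)
        } else {
            body.append(ch)
            quoteRun = 0
        }
    }
    return nil  // unterminated literal
}
```

With this reading, a single " or a pair "" inside the literal is ordinary 
text, and ""\" never closes the literal because the backslash resets the count.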

The second type of String (the <<HERE_DOC with string interpolation) includes 
all lines after the line where <<HERE_DOC appears, up to the HERE_DOC delimiter 
line. The last newline before the HERE_DOC delimiter line is automatically 
removed from the string; otherwise it would not be possible to create a 
HERE_DOC string literal that does not end with a newline character. If you want 
the string literal to end with a newline character, you need an empty line 
before the HERE_DOC delimiter line (as in the "usage" example above).

The HERE_DOC delimiter line contains optional whitespace at the beginning, 
followed by the HERE_DOC token. If the line contains leading whitespace, every 
line within the literal has to start with exactly the same leading whitespace, 
which is then removed from the string. E.g. if the HERE_DOC line contains 4 
spaces, followed by "HERE_DOC", each line in the string literal has to start 
with 4 spaces as well (using one tab instead, or less whitespace, would be a 
parse error). Empty lines within the string literal are exempt from this 
requirement. They just translate to "\n".

(Fine print: if the HERE_DOC delimiter line is "\t\tHERE_DOC" and one of the 
lines in the string literal is just "    ", then it is not decidable whether 
the line should translate to "\n" (if "\t" is like "  " or larger) or to 
"  \n" (if "\t" is like " "), so this would also result in a parse error. The 
whitespace before the HERE_DOC on the HERE_DOC delimiter line must contain only 
spaces or only tabs, but not a mixture of both. These rules are a bit 
complicated for the language implementor, but for the user of the feature, they 
have an important advantage: if the code compiles, the string literal will 
behave as expected. Just don't mix tabs and spaces and you'll be fine.)
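
The whitespace rules above could be sketched roughly as follows. This is only 
an illustration of how I read them, not a reference implementation: the names 
heredocBody and HeredocError are made up, and the input is assumed to be the 
already separated lines between the <<HERE_DOC line and the delimiter line. 
Treating a whitespace-only line that is a shorter run of the same indent 
character as an empty line is my interpretation of the fine print.

```swift
// Hypothetical names, for illustration only.
enum HeredocError: Error {
    case mixedDelimiterIndent         // delimiter indent mixes tabs and spaces
    case badIndentation(lineIndex: Int)
}

// lines:           the raw source lines between <<HERE_DOC and the delimiter line
// delimiterIndent: the whitespace in front of the HERE_DOC token on the delimiter line
func heredocBody(lines: [String], delimiterIndent: String) throws -> String {
    // The delimiter indent must consist of only spaces or only tabs.
    let onlySpaces = delimiterIndent.allSatisfy { $0 == " " }
    let onlyTabs = delimiterIndent.allSatisfy { $0 == "\t" }
    guard onlySpaces || onlyTabs else { throw HeredocError.mixedDelimiterIndent }

    var stripped: [String] = []
    for (index, line) in lines.enumerated() {
        if line.isEmpty {
            // Empty lines are exempt from the indentation requirement.
            stripped.append("")
        } else if line.starts(with: delimiterIndent) {
            // Remove exactly the delimiter's leading whitespace.
            stripped.append(String(line.dropFirst(delimiterIndent.count)))
        } else if line.count < delimiterIndent.count && delimiterIndent.starts(with: line) {
            // Whitespace-only line, shorter than the indent but made of the
            // same character: treated like an empty line (assumption).
            stripped.append("")
        } else {
            // Wrong kind or amount of leading whitespace: parse error.
            throw HeredocError.badIndentation(lineIndex: index)
        }
    }
    // Joining with "\n" adds no newline after the last line, so the literal
    // ends with "\n" only if the last line before the delimiter is empty.
    return stripped.joined(separator: "\n")
}
```

Note that a line of four spaces under a "\t\t" delimiter falls through to the 
error case, which matches the ambiguous situation described in the fine print.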

The third type of string is exactly the same as the second type, with the only 
difference that the <<HERE_DOC syntax is changed to <<'HERE_DOC', and that all 
string interpolation and escape sequences are disabled within the literal. The 
end token is still HERE_DOC without single quotes, and not 'HERE_DOC'. The 
rules about leading whitespace on the HERE_DOC delimiter line are the same as 
for the second type.

For the HERE_DOC token, everything that is a valid variable name is allowed, so 
<<hello and <<END_OF_XML are both valid, but <<2442 is not. Furthermore, every 
token that matches /[a-zA-Z]+/ is also valid, so <<class should be okay as 
well. (The usual practice is to use SCREAMING_SNAKE_CASE tokens as delimiters.)
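
As a rough sketch of the delimiter rule, the check below accepts a token if it 
looks like an identifier. The function name isValidHeredocDelimiter is made up, 
and the check is an ASCII-only approximation: real Swift identifiers may also 
contain many Unicode characters, which this sketch rejects. Because it does not 
consult the keyword list, keywords such as class are accepted, which is what 
the /[a-zA-Z]+/ rule asks for.

```swift
// Hypothetical helper, ASCII-only approximation of the delimiter rule.
func isValidHeredocDelimiter(_ token: String) -> Bool {
    // A lone underscore is not a usable variable name in Swift.
    guard token != "_" else { return false }

    // First character: an ASCII letter or underscore (no digits).
    guard let first = token.first,
          first.isASCII,
          first.isLetter || first == "_" else {
        return false
    }

    // All characters: ASCII letters, digits or underscores.
    return token.allSatisfy { ch in
        ch.isASCII && (ch.isLetter || ch.isNumber || ch == "_")
    }
}
```

Under these assumptions hello, END_OF_XML and class are accepted, while 2442 
and the empty token are rejected.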

IMPACT ON EXISTING CODE

This is a purely additive feature. Code that uses these multi-line string 
literals does not compile with previous versions of Swift, so no existing code 
can break because of this change.

ALTERNATIVES CONSIDERED

1. Just copy all String-handling rules from Perl ;)

2. String literals of the form

    _"text text
    "text text"_

I don't like the continuation quote, so it doesn't solve the problem that I 
am trying to solve with this proposal. The same is true for a string literal 
where you would have to start each line with \\ .

3. eXML"a string literal that starts with e, followed by some token, and that 
ends with a quote (") followed by the same token"XML. This has the advantage 
that you can put anything between the start and the end, and that you can 
choose a delimiter. It's a flexible solution. I prefer HERE_DOCs though, 
because they are an already well-known programming language construct.

4. Do nothing, and just use string concatenation: "this string\n"+
"with newlines in it\n" works well. Maybe the optimizer can optimize this away 
anyway, so there wouldn't even be a performance cost.
