One feature common to many programming languages that Rust lacks is "raw" string literals. Specifically, these are string literals that don't interpret backslash-escapes. There are three obvious applications at the moment: regular expressions, Windows file paths, and format!() strings that want to embed { and } chars. I'm sure there are more as well, such as large string literals that contain things like HTML text.

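
To make the motivation concrete, here is a minimal sketch of what two of those applications look like with ordinary escaped string literals. The path and the pattern are made-up examples, and the function names `windows_path` and `decimal_pattern` are hypothetical helpers for illustration only.

```rust
// Sketch: ordinary (escaped) string literals force every backslash to be doubled.
// The path and pattern below are illustrative values, not taken from real code.

/// A hypothetical Windows path. The source needs `\\` for each `\` in the value.
fn windows_path() -> &'static str {
    "C:\\temp\\log.txt"
}

/// A hypothetical regex pattern for a decimal number such as 3.14.
/// The intended pattern text is \d+\.\d+ but the literal needs six backslashes.
fn decimal_pattern() -> &'static str {
    "\\d+\\.\\d+"
}

fn main() {
    // Prints the value with single backslashes: C:\temp\log.txt
    println!("{}", windows_path());
    // Prints the pattern text as the regex engine would see it: \d+\.\d+
    println!("{}", decimal_pattern());
}
```

The text in the source file and the resulting string differ by one character per backslash, which is exactly the mismatch a raw string literal would remove.
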
I took a look at three programming languages to see what solutions they had: D, C++11, and Python. I've reproduced their syntax below, plus one more custom syntax, along with pros & cons. I'm hoping we can come up with a syntax that makes sense for Rust.

## Python syntax:

Python supports an "r" or "R" prefix on any string literal (both "short" strings, delimited with one quote character, and "long" strings, delimited with three). The "r" or "R" prefix denotes a "raw string", and has the effect of disabling backslash-escapes within the string. For the most part. It actually gets a bit weird: if a sequence of backslashes of odd length occurs just before a quote (of the appropriate quote type for the string), then the quote is considered to be escaped, but the backslashes are left in the string. This means r"foo\"" evaluates to the string `foo\"`, and similarly r"foo\\\"" is `foo\\\"`, but r"foo\\" is merely the string `foo\\`.

Pros:

* Simple syntax
* Allows for embedding the closing quote character in the raw string

Cons:

* Handling of backslashes is very bizarre, and the closing quote character can only be embedded if you want to have a backslash before it.

## C++11 syntax:

C++11 allows for raw strings using a sequence of the form R"seq(raw text)seq". In this construct, `seq` is any sequence of zero to 16 characters except for: space, (, ), \, \t, \v, \f, \n. The simplest form looks like R"(raw text)", which allows for anything in the raw text except for the sequence `)"`. The addition of the delimiter sequence allows for constructing a raw string containing any sequence at all (as the delimiter sequence can be adjusted based on the represented text).

Pros:

* Allows for embedding any character at all (representable in the source file encoding), including the closing quote.
* Reasonably straightforward

Cons:

* Syntax is slightly complicated

## D syntax:

D supports three different forms of raw strings. The first two are similar, being r"raw text" and `raw text`.
Besides the choice of delimiters, they behave identically, in that the raw text may contain anything except for the appropriate quote character.

The third syntax is a slightly more complicated form of C++11's syntax, and is called a delimited string. It takes two forms.

The first looks like q"(raw text)" where the ( may be any non-identifier non-whitespace character. If the character is one of [(<{ then it is a "nesting delimiter", and the close delimiter must be the matching ])>} character; otherwise the close delimiter is the same as the open. Furthermore, nesting delimiters do exactly what their name says: they nest. If the nesting delimiter is (), then any ( in the raw text must be balanced with a ) in the raw text. In other words, q"(foo(bar))" evaluates to "foo(bar)", but q"(foo(bar)" and q"(foobar))" are both illegal.

The second uses any identifier as the delimiter. In this case, the identifier must immediately be followed by a newline, and in order to close the string, the close delimiter must be preceded by a newline. This looks like

    q"delim
    this is some raw text
    delim"

It's essentially a heredoc. Note that the first newline is not part of the string, but the final newline is, so this evaluates to "this is some raw text\n".

Pros:

* Flexible
* Allows for constructing a raw string that contains any desired sequence of characters (representable in the source file's encoding)

Cons:

* Overly complicated

## Custom syntax

There's another approach that none of these three languages take, which is to merely allow for doubling up the quote character in order to embed a quote. This would look like R"raw string literal ""with embedded quotes"".", which becomes `raw string literal "with embedded quotes".`

Pros:

* Very simple
* Allows for embedding the close quote character, and therefore, any character (representable in the source file encoding)

Cons:

* Slightly odd to read

## Conclusion

Of the three existing syntaxes examined here, I think C++11's is the best.
It ties with D's syntax for being the most powerful, but is simpler than D's. The custom syntax is just as powerful though. The benefit of the C++11 syntax over the custom syntax is that it's slightly easier to read, as the raw text has a one-to-one mapping with the resulting string. The custom syntax is a bit more confusing to read, especially if you want to add multiple quotes. As a pathological case, let's try representing a Python triple-quoted docstring using both syntaxes:

C++11: R"("""this is a python docstring""")"

Custom: R"""""""this is a python docstring"""""""

Based on this examination, I'm leaning towards saying Rust should support C++11's raw string literal syntax. I welcome any comments, criticisms, or suggestions.

-Kevin
_______________________________________________
Rust-dev mailing list
Rust-dev@mozilla.org
https://mail.mozilla.org/listinfo/rust-dev