How complicated would it be to use R"" but with arbitrary paired
delimiters (the way Ruby does it, for instance)? It's very handy to
pick a delimiter you know does not appear in the string. For example,
if you had a string containing ')', you could write R{this is a string
with a ) in it} or R|this is a string with a ) in it|.
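
To make the question concrete, here is a rough sketch of the scanning side, written in Python purely for illustration. The name `scan_paired_raw` and the rule that a bracket-like opener closes with its mirror image (while any other punctuation closes with itself) are my assumptions, not an existing design.

```python
# Hypothetical sketch: scan an R<open>...<close> literal at the start of src.
PAIRS = {"(": ")", "[": "]", "{": "}", "<": ">"}


def scan_paired_raw(src):
    """Return (contents, number of characters consumed)."""
    if len(src) < 2 or src[0] != "R":
        raise ValueError("expected R followed by a delimiter")
    opener = src[1]
    if opener.isalnum() or opener.isspace():
        raise ValueError("delimiter must be a punctuation character")
    # Bracket-like openers close with their partner; anything else closes
    # with the same character (assumption made for this sketch).
    closer = PAIRS.get(opener, opener)
    end = src.find(closer, 2)
    if end == -1:
        raise ValueError("unterminated raw string")
    return src[2:end], end + 1


text, consumed = scan_paired_raw("R{this is a string with a ) in it}")
print(text)
```

This version does no nesting, so the first closing delimiter ends the string; the lexer cost looks like one extra table lookup over a fixed-delimiter form.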

martin

On Thu, Sep 19, 2013 at 1:36 PM, Kevin Ballard <ke...@sb.org> wrote:
> One feature common to many programming languages that Rust lacks is "raw" 
> string literals. Specifically, these are string literals that don't interpret 
> backslash-escapes. There are three obvious applications at the moment: 
> regular expressions, windows file paths, and format!() strings that want to 
> embed { and } chars. I'm sure there are more as well, such as large string 
> literals that contain things like HTML text.
>
> I took a look at 3 programming languages to see what solutions they had: D, 
> C++11, and Python. I've reproduced their syntax below, plus one more custom 
> syntax, along with pros & cons. I'm hoping we can come up with a syntax that 
> makes sense for Rust.
>
> ## Python syntax:
>
> Python supports an "r" or "R" prefix on any string literal (both "short" 
> strings, delimited with a single quote, and "long" strings, delimited with 3 
> quotes). The "r" or "R" prefix denotes a "raw string", and has the effect of 
> disabling backslash-escapes within the string. For the most part. It actually 
> gets a bit weird: if a sequence of backslashes of an odd length occurs prior 
> to a quote (of the appropriate quote type for the string), then the quote is 
> considered to be escaped, but the backslashes are left in the string. This 
> means r"foo\"" evaluates to the string `foo\"`, and similarly r"foo\\\"" is 
> `foo\\\"`, but r"foo\\" is merely the string `foo\\`.
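
The backslash quirk described above can be checked in Python itself. This is only a demonstration; `is_valid_literal` is a helper name made up for the example.

```python
# Raw literals from the paragraph above, compared against ordinary
# literals that spell out the same characters with explicit escapes.
one = r"foo\""      # backslash and quote are both kept
three = r"foo\\\""  # three backslashes, then a quote
two = r"foo\\"      # two backslashes, no quote

assert one == 'foo\\"' and len(one) == 5
assert three == 'foo\\\\\\"' and len(three) == 7
assert two == "foo\\\\" and len(two) == 5


def is_valid_literal(source):
    """Report whether source compiles as a Python expression."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False


# A raw string cannot end in an odd number of backslashes: the last one
# escapes the closing quote, so the literal never terminates.
assert not is_valid_literal('r"foo\\"')
assert is_valid_literal('r"foo\\\\"')
```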
>
> Pros:
> * Simple syntax
> * Allows for embedding the closing quote character in the raw string
>
> Cons:
> * Handling of backslashes is very bizarre, and the closing quote character 
> can only be embedded if you want to have a backslash before it.
>
> ## C++11 syntax:
>
> C++11 allows for raw strings using a sequence of the form R"seq(raw 
> text)seq". In this construct, `seq` is any sequence of zero to 16 
> characters except for: space, (, ), \, \t, \v, \f, \n, \r. The simplest form 
> looks like R"(raw text)", which allows for anything in the raw text except 
> for the sequence `)"`. The addition of the delimiter sequence allows for 
> constructing a raw string containing any sequence at all (as the delimiter 
> sequence can be adjusted based on the represented text).
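
A minimal sketch of how a scanner could recognise this form, written in Python for illustration only. `scan_cpp11_raw` is a hypothetical helper, not part of any compiler; its forbidden-character set also rejects form feed, and it applies the 16-character cap that C++11 places on the delimiter sequence.

```python
# Characters the delimiter sequence may not contain. '(' is omitted because
# the first '(' is what ends the sequence.
FORBIDDEN = set(" )\\\t\v\f\n\r")
MAX_SEQ_LEN = 16  # C++11 caps the delimiter sequence at 16 characters


def scan_cpp11_raw(src):
    """Return the raw text of an R"seq(raw text)seq" literal at the start of src."""
    if not src.startswith('R"'):
        raise ValueError('expected R"')
    open_paren = src.find("(", 2)
    if open_paren == -1:
        raise ValueError("missing ( after the delimiter sequence")
    seq = src[2:open_paren]
    if len(seq) > MAX_SEQ_LEN or any(c in FORBIDDEN for c in seq):
        raise ValueError("invalid delimiter sequence")
    # The literal ends at the first ')' followed by the same sequence and '"'.
    closer = ")" + seq + '"'
    end = src.find(closer, open_paren + 1)
    if end == -1:
        raise ValueError("unterminated raw string")
    return src[open_paren + 1:end]


print(scan_cpp11_raw('R"(raw text)"'))
print(scan_cpp11_raw('R"x(contains )" safely)x"'))
```

The second call shows why the delimiter sequence matters: the text contains `)"`, which would end the simplest form early, but not `)x"`.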
>
> Pros:
> * Allows for embedding any character at all (representable in the source file 
> encoding), including the closing quote.
> * Reasonably straightforward
>
> Cons:
> * Syntax is slightly complicated
>
> ## D syntax:
>
> D supports three different forms of raw strings. The first two are similar, 
> being r"raw text" and `raw text`. Besides the choice of delimiters, they 
> behave identically, in that the raw text may contain anything except for the 
> appropriate quote character. The third syntax is a slightly more complicated 
> form of C++11's syntax, and is called a delimited string. It takes two forms.
>
> The first looks like q"(raw text)" where the ( may be any non-identifier 
> non-whitespace character. If the character is one of [(<{ then it is a 
> "nesting delimiter", and the close delimiter must be the matching ])>} 
> character, otherwise the close delimiter is the same as the open. 
> Furthermore, nesting delimiters do exactly what their name says: they nest. 
> If the nesting delimiter is (), then any ( in the raw text must be balanced 
> with a ) in the raw text. In other words, q"(foo(bar))" evaluates to 
> "foo(bar)", but q"(foo(bar)" and q"(foobar))" are both illegal.
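
The nesting rule amounts to keeping a depth counter. Below is a rough Python sketch that handles only the four nesting delimiters; `scan_d_nesting` is a hypothetical name, and the non-nesting delimiters are deliberately left out.

```python
NESTING = {"(": ")", "[": "]", "<": ">", "{": "}"}


def scan_d_nesting(src):
    """Return the raw text of a q"(...)" style literal at the start of src."""
    if not src.startswith('q"') or len(src) < 3:
        raise ValueError('expected q" followed by a delimiter')
    opener = src[2]
    closer = NESTING.get(opener)
    if closer is None:
        raise ValueError("this sketch only handles nesting delimiters")
    depth = 1  # the opening delimiter itself counts as one level
    for i in range(3, len(src)):
        c = src[i]
        if c == opener:
            depth += 1
        elif c == closer:
            depth -= 1
            if depth == 0:
                # The outermost close must be followed directly by the quote.
                if src[i + 1:i + 2] != '"':
                    raise ValueError("unbalanced close delimiter")
                return src[3:i]
    raise ValueError("unbalanced open delimiter")


print(scan_d_nesting('q"(foo(bar))"'))
```

With this counter, `q"(foo(bar)"` fails because the depth never returns to zero, and `q"(foobar))"` fails because the depth reaches zero one character before the quote.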
>
> The second uses any identifier as the delimiter. In this case, the identifier 
> must immediately be followed by a newline, and in order to close the string, 
> the close delimiter must be preceded by a newline. This looks like
>
> q"delim
> this is some raw text
> delim"
>
> It's essentially a heredoc. Note that the first newline is not part of the 
> string, but the final newline is, so this evaluates to "this is some raw 
> text\n".
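
The newline handling is the subtle part, so here is a small Python sketch of it. `scan_d_heredoc` is a hypothetical helper, and using Python's `str.isidentifier` to validate the delimiter is an approximation of D's identifier rules.

```python
def scan_d_heredoc(src):
    """Return the text of a q"IDENT newline ... newline IDENT" literal."""
    if not src.startswith('q"'):
        raise ValueError('expected q"')
    newline = src.find("\n", 2)
    if newline == -1:
        raise ValueError("delimiter must be followed by a newline")
    ident = src[2:newline]
    if not ident.isidentifier():
        raise ValueError("delimiter must be an identifier")
    # The closing delimiter must sit directly after a newline. Searching
    # from the first newline also lets an empty body match.
    closer = "\n" + ident + '"'
    end = src.find(closer, newline)
    if end == -1:
        raise ValueError("unterminated heredoc")
    # Skip the first newline, keep the one that precedes the closer.
    return src[newline + 1:end + 1]


text = scan_d_heredoc('q"delim\nthis is some raw text\ndelim"')
assert text == "this is some raw text\n"
```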
>
> Pros:
> * Flexible
> * Allows for constructing a raw string that contains any desired sequence of 
> characters (representable in the source file's encoding)
>
> Cons:
> * Overly complicated
>
> ## Custom syntax
>
> There's another approach that none of these three languages takes, which is to 
> merely allow for doubling up the quote character in order to embed a quote. 
> This would look like R"raw string literal ""with embedded quotes"".", which 
> becomes `raw string literal "with embedded quotes".`.
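
A short Python sketch of how doubled quotes could be scanned, for illustration only. `scan_doubled_quote_raw` is a hypothetical name, and the rule that a lone quote ends the literal is my reading of the proposal.

```python
def scan_doubled_quote_raw(src):
    """Return (contents, characters consumed) for an R"..." literal
    in which "" stands for one embedded quote."""
    if not src.startswith('R"'):
        raise ValueError('expected R"')
    out = []
    i = 2
    while i < len(src):
        if src[i] == '"':
            if src[i + 1:i + 2] == '"':
                out.append('"')  # a doubled quote is one literal quote
                i += 2
                continue
            return "".join(out), i + 1  # a lone quote closes the literal
        out.append(src[i])
        i += 1
    raise ValueError("unterminated raw string")


text, _ = scan_doubled_quote_raw('R"raw string literal ""with embedded quotes""."')
print(text)
```

One consequence of this rule is that the scanner needs a single character of lookahead at every quote, and a run of quotes at the end of the text has to be counted to tell content from terminator.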
>
> Pros:
> * Very simple
> * Allows for embedding the close quote character, and therefore, any 
> character (representable in the source file encoding)
>
> Cons:
> * Slightly odd to read
>
> ## Conclusion
>
> Of the three existing syntaxes examined here, I think C++11's is the best. It 
> ties with D's syntax for being the most powerful, but is simpler than D's. 
> The custom syntax is just as powerful though. The benefit of the C++11 syntax 
> over the custom syntax is that it's slightly easier to read, as the raw text 
> has a one-to-one mapping with the resulting string. The custom 
> syntax is a bit more confusing to read, especially if you want to add 
> multiple quotes. As a pathological case, let's try representing a Python 
> triple-quoted docstring using both syntaxes:
>
> C++11: R"("""this is a python docstring""")"
> Custom: R"""""""this is a python docstring"""""""
>
> Based on this examination, I'm leaning towards saying Rust should support 
> C++11's raw string literal syntax.
>
> I welcome any comments, criticisms, or suggestions.
>
> -Kevin
> _______________________________________________
> Rust-dev mailing list
> Rust-dev@mozilla.org
> https://mail.mozilla.org/listinfo/rust-dev