I haven't looked at Ruby's syntax, but what you just described sounds a little
too free-form to me. I believe Ruby at least requires a % as part of the syntax,
e.g. %q{test}. But I don't think %R{test} is a good idea for Rust, as it would
conflict with the % operator, and I don't think other punctuation would work
well either.

-Kevin

On Sep 19, 2013, at 2:10 PM, Martin DeMello <martindeme...@gmail.com> wrote:

> How complicated would it be to use R"" but with arbitrary paired
> delimiters (the way, for instance, Ruby does it)? It's very handy to
> pick a delimiter you know does not appear in the string, e.g. if you
> had a string containing ')' you could use R{this is a string with a )
> in it} or R|this is a string with a ) in it|.
> 
> martin
> 
> On Thu, Sep 19, 2013 at 1:36 PM, Kevin Ballard <ke...@sb.org> wrote:
>> One feature common to many programming languages that Rust lacks is "raw" 
>> string literals. Specifically, these are string literals that don't 
>> interpret backslash-escapes. There are three obvious applications at the 
>> moment: regular expressions, Windows file paths, and format!() strings that
>> want to embed { and } chars. I'm sure there are more as well, such as large 
>> string literals that contain things like HTML text.
>> 
>> I took a look at three programming languages to see what solutions they had:
>> D, C++11, and Python. I've reproduced their syntaxes below, plus one more
>> custom syntax, along with pros & cons. I'm hoping we can come up with a syntax that
>> makes sense for Rust.
>> 
>> ## Python syntax:
>> 
>> Python supports an "r" or "R" prefix on any string literal (both "short"
>> strings, delimited with a single quote character, and "long" strings,
>> delimited with three quotes). The "r" or "R" prefix denotes a "raw string"
>> and, for the most part, disables backslash-escapes within the string. It
>> actually gets a bit weird: if a run of backslashes of odd length occurs
>> immediately before a quote (of the string's own quote type), then the quote
>> is considered to be escaped, but the backslashes are left in the string.
>> This means r"foo\"" evaluates to the string `foo\"`, and similarly
>> r"foo\\\"" is `foo\\\"`, but r"foo\\" is merely the string `foo\\`.
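
To make the Python rule concrete, here is a minimal Rust sketch of a scanner for the double-quoted short form only (no triple quotes, and it ignores Python's rule that a newline ends a short string). The function name `lex_python_raw` and its return convention (raw contents plus bytes consumed, `None` if unterminated) are hypothetical, chosen for illustration; this is not CPython's actual tokenizer.

```rust
/// Hypothetical scanner for a Python-style short raw string such as `r"foo\""`.
/// Returns the raw contents and the number of bytes consumed, or `None` if the
/// literal is unterminated.
fn lex_python_raw(src: &str) -> Option<(String, usize)> {
    // Accept either prefix case; both prefixes are two bytes long.
    let body = src.strip_prefix("r\"").or_else(|| src.strip_prefix("R\""))?;
    let mut out = String::new();
    let mut chars = body.char_indices();
    while let Some((i, c)) = chars.next() {
        match c {
            // A backslash always swallows the next character, and both are kept
            // verbatim. This is why `\"` does not end the literal, while in `\\"`
            // the quote is left unpaired and does.
            '\\' => {
                out.push(c);
                let (_, next) = chars.next()?; // trailing backslash: unterminated
                out.push(next);
            }
            // An unpaired quote ends the literal; it is not part of the value.
            '"' => return Some((out, 2 + i + 1)),
            _ => out.push(c),
        }
    }
    None
}
```

For example, scanning `r"foo\\" + x` stops at the quote right after `foo\\`, returning that text and 8 bytes consumed.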
>> 
>> Pros:
>> * Simple syntax
>> * Allows for embedding the closing quote character in the raw string
>> 
>> Cons:
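>> * Handling of backslashes is very bizarre, and the closing quote character
>> can only be embedded if it is preceded by a backslash. For the same reason,
>> a raw string cannot end in an odd number of backslashes (r"C:\dir\" is
>> unterminated), which is awkward for Windows paths.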
>> 
>> ## C++11 syntax:
>> 
>> C++11 allows for raw strings using a sequence of the form R"seq(raw
>> text)seq". In this construct, `seq` is any sequence of zero to 16
>> characters except for: space, (, ), \, \t, \v, \f, \n, \r. The simplest
>> form looks like R"(raw text)", which allows for anything in the raw text
>> except for the sequence `)"`. The delimiter sequence makes it possible to
>> construct a raw string containing any sequence at all, since `seq` can be
>> chosen so that `)seq"` does not occur in the represented text.
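
A minimal Rust sketch of how a lexer could recognize the C++11 form. `lex_cpp_raw` and its return convention are hypothetical; it rejects the excluded delimiter characters and (as C++11 does) delimiters longer than 16 characters, but it ignores other parts of the C++ grammar such as encoding prefixes like `u8R`.

```rust
/// Hypothetical scanner for a C++11-style raw string `R"seq(...)seq"`.
/// Returns the raw text and the number of bytes consumed, or `None` if the
/// literal is malformed or unterminated.
fn lex_cpp_raw(src: &str) -> Option<(String, usize)> {
    let rest = src.strip_prefix("R\"")?;
    // The delimiter sequence is everything before the first '('.
    let open = rest.find('(')?;
    let seq = &rest[..open];
    // Reject over-long delimiters and characters the delimiter may not contain.
    if seq.chars().count() > 16
        || seq
            .chars()
            .any(|c| matches!(c, ' ' | ')' | '\\' | '\t' | '\u{0B}' | '\u{0C}' | '\n' | '\r'))
    {
        return None;
    }
    let body = &rest[open + 1..];
    // The raw text ends at the first occurrence of `)seq"`; nothing inside it
    // is interpreted, so any text that avoids that one sequence can be embedded.
    let terminator = format!("){}\"", seq);
    let end = body.find(terminator.as_str())?;
    let consumed = 2 + open + 1 + end + terminator.len();
    Some((body[..end].to_string(), consumed))
}
```

For instance, `R"xy(a)"b)xy"` yields `a)"b`: the inner `)"` does not end the literal because the terminator is `)xy"`.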
>> 
>> Pros:
>> * Allows for embedding any character at all (representable in the source 
>> file encoding), including the closing quote.
>> * Reasonably straightforward
>> 
>> Cons:
>> * Syntax is slightly complicated
>> 
>> ## D syntax:
>> 
>> D supports three different forms of raw strings. The first two are similar:
>> r"raw text" and `raw text`. Apart from the choice of delimiters, they
>> behave identically, in that the raw text may contain anything except for the
>> matching quote character. The third is a slightly more complicated form of
>> C++11's syntax, called a delimited string, and it comes in two forms.
>> 
>> The first looks like q"(raw text)" where the ( may be any non-identifier 
>> non-whitespace character. If the character is one of [(<{ then it is a 
>> "nesting delimiter", and the close delimiter must be the matching ])>} 
>> character, otherwise the close delimiter is the same as the open. 
>> Furthermore, nesting delimiters do exactly what their name says: they nest. 
>> If the nesting delimiter is (), then any ( in the raw text must be balanced 
>> with a ) in the raw text. In other words, q"(foo(bar))" evaluates to 
>> "foo(bar)", but q"(foo(bar)" and q"(foobar))" are both illegal.
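
The nesting rule amounts to a depth counter: each inner opener increments it, each inner closer decrements it, and the literal ends only at a closer seen at depth zero. A minimal Rust sketch with a hypothetical name and return convention; as an assumption of this sketch, the outermost close delimiter must be followed directly by `"` and anything else is rejected, which reproduces the two illegal examples above.

```rust
/// Hypothetical scanner for D-style delimited strings such as `q"(...)"`.
/// Returns the raw text and the number of bytes consumed, or `None` if the
/// literal is malformed or unterminated.
fn lex_d_delimited(src: &str) -> Option<(String, usize)> {
    let rest = src.strip_prefix("q\"")?;
    let open = rest.chars().next()?;
    // Identifier characters start the heredoc form instead; whitespace is not allowed.
    if open.is_alphanumeric() || open == '_' || open.is_whitespace() {
        return None;
    }
    // [ ( < { are nesting delimiters closed by their partner; anything else closes itself.
    let close = match open {
        '[' => ']',
        '(' => ')',
        '<' => '>',
        '{' => '}',
        other => other,
    };
    let nests = close != open;
    let body = &rest[open.len_utf8()..];
    let mut depth = 0usize;
    for (i, c) in body.char_indices() {
        if nests && c == open {
            depth += 1; // an inner opener must be balanced by a later closer
        } else if c == close {
            if depth > 0 {
                depth -= 1;
                continue;
            }
            // Outermost closer: it must be followed directly by the closing quote.
            let after = i + c.len_utf8();
            if !body[after..].starts_with('"') {
                return None;
            }
            return Some((body[..i].to_string(), 2 + open.len_utf8() + after + 1));
        }
    }
    None
}
```

With a non-nesting delimiter such as `/`, the first `/` ends the text, so `q"/a(b/"` yields `a(b` without any balancing.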
>> 
>> The second uses any identifier as the delimiter. In this case, the
>> identifier must be immediately followed by a newline, and to close the
>> string, the closing delimiter must be preceded by a newline. This looks like:
>> 
>> q"delim
>> this is some raw text
>> delim"
>> 
>> It's essentially a heredoc. Note that the first newline is not part of the 
>> string, but the final newline is, so this evaluates to "this is some raw 
>> text\n".
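
A minimal Rust sketch of the heredoc form, again with a hypothetical name and return convention. Assumptions of this sketch: identifiers are approximated as a letter or `_` followed by letters, digits, or `_`; only `\n` line endings are handled; and an immediately closed heredoc is treated as the empty string.

```rust
/// Hypothetical scanner for a D-style heredoc `q"ident\n...\nident"`.
/// Returns the raw text and the number of bytes consumed, or `None` if the
/// literal is malformed or unterminated.
fn lex_d_heredoc(src: &str) -> Option<(String, usize)> {
    let rest = src.strip_prefix("q\"")?;
    // The identifier runs up to the first newline.
    let nl = rest.find('\n')?;
    let ident = &rest[..nl];
    let mut chars = ident.chars();
    let first = chars.next()?;
    if !(first.is_alphabetic() || first == '_')
        || !chars.all(|c| c.is_alphanumeric() || c == '_')
    {
        return None;
    }
    // Search from the opening newline so an empty heredoc also matches.
    let tail = &rest[nl..];
    // The string closes at a newline, the same identifier, then a quote.
    let terminator = format!("\n{}\"", ident);
    let end = tail.find(terminator.as_str())?;
    // Drop the opening newline but keep the one just before the closing identifier.
    let text = &tail[1..end + 1];
    Some((text.to_string(), 2 + nl + end + terminator.len()))
}
```

Run on the example above, it returns "this is some raw text\n", including the final newline.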
>> 
>> Pros:
>> * Flexible
>> * Allows for constructing a raw string that contains any desired sequence of 
>> characters (representable in the source file's encoding)
>> 
>> Cons:
>> * Overly complicated
>> 
>> ## Custom syntax
>> 
>> There's another approach that none of these three languages takes, which is
>> simply to allow doubling the quote character in order to embed a quote.
>> This would look like R"raw string literal ""with embedded quotes"".",
>> which becomes `raw string literal "with embedded quotes".` (the final
>> period is part of the literal).
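
Scanning the doubled-quote syntax needs only one character of lookahead: a `"` followed by another `"` is an embedded quote, and a lone `"` ends the literal. A minimal Rust sketch; the function name and return convention are hypothetical.

```rust
/// Hypothetical scanner for the proposed doubled-quote raw string `R"..."`,
/// where `""` inside the literal stands for a single `"`.
/// Returns the decoded text and the number of bytes consumed.
fn lex_doubled_quote(src: &str) -> Option<(String, usize)> {
    let body = src.strip_prefix("R\"")?;
    let mut out = String::new();
    let mut chars = body.char_indices().peekable();
    while let Some((i, c)) = chars.next() {
        if c != '"' {
            out.push(c);
        } else if let Some(&(_, '"')) = chars.peek() {
            // `""` encodes one literal quote.
            chars.next();
            out.push('"');
        } else {
            // A lone quote closes the literal.
            return Some((out, 2 + i + 1));
        }
    }
    None
}
```

Because the decision is purely local, the lexer never has to look further ahead than one character, which is part of what makes this syntax simple.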
>> 
>> Pros:
>> * Very simple
>> * Allows for embedding the close quote character, and therefore, any 
>> character (representable in the source file encoding)
>> 
>> Cons:
>> * Slightly odd to read
>> 
>> ## Conclusion
>> 
>> Of the three existing syntaxes examined here, I think C++11's is the best.
>> It ties with D's syntax for being the most powerful, but is simpler than
>> D's. The custom syntax is just as powerful, though. The benefit of the C++11
>> syntax over the custom syntax is readability: the raw text maps one-to-one
>> onto the resulting string. The custom syntax is a bit more confusing to
>> read, especially if you want to embed several quotes in a row. As a
>> pathological case, let's try representing a Python triple-quoted docstring
>> using both syntaxes:
>> 
>> C++11: R"("""this is a python docstring""")"
>> Custom: R"""""""this is a python docstring"""""""
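
The comparison above can be reproduced mechanically. These two hypothetical encoders (names are illustrative) turn a string into each syntax; the C++11 one grows its delimiter with `x` characters, an arbitrary choice, until `)seq"` no longer occurs in the text, and for brevity it ignores C++11's 16-character cap on the delimiter.

```rust
/// Encodes `text` as a C++11-style raw string, growing the delimiter sequence
/// (made of `x` characters) until `)seq"` no longer occurs in the text.
/// Note: does not enforce the 16-character delimiter limit.
fn encode_cpp_raw(text: &str) -> String {
    let mut seq = String::new();
    while text.contains(format!("){}\"", seq).as_str()) {
        seq.push('x');
    }
    format!("R\"{}({}){}\"", seq, text, seq)
}

/// Encodes `text` in the doubled-quote syntax by doubling every `"`.
fn encode_doubled_quote(text: &str) -> String {
    format!("R\"{}\"", text.replace('"', "\"\""))
}
```

Applied to `"""this is a python docstring"""`, the encoders produce exactly the two lines above; the C++11 form needs an `x` delimiter only when the text itself contains `)"`.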
>> 
>> Based on this examination, I'm leaning towards saying Rust should support 
>> C++11's raw string literal syntax.
>> 
>> I welcome any comments, criticisms, or suggestions.
>> 
>> -Kevin
>> _______________________________________________
>> Rust-dev mailing list
>> Rust-dev@mozilla.org
>> https://mail.mozilla.org/listinfo/rust-dev
