Ah, good point. You could fix it by having a very small whitelist of
acceptable delimiters, but that probably takes it into overcomplex
territory.

martin

On Thu, Sep 19, 2013 at 2:46 PM, Kevin Ballard <ke...@sb.org> wrote:
> As I just responded to Masklinn, this is ambiguous. How do you lex
> `do R{foo()}`?
>
> -Kevin
>
> On Sep 19, 2013, at 2:41 PM, Martin DeMello <martindeme...@gmail.com> wrote:
>
>> Yes, I figured R followed by a non-alphabetical character could serve
>> the same purpose as ruby's %<char>.
>>
>> martin
>>
>> On Thu, Sep 19, 2013 at 2:37 PM, Kevin Ballard <ke...@sb.org> wrote:
>>> I didn't look at Ruby's syntax, but what you just described sounds a little 
>>> too free-form to me. I believe Ruby at least requires a % as part of the 
>>> syntax, e.g. %q{test}. But I don't think %R{test} is a good idea for rust, 
>>> as it would conflict with the % operator. I don't think other punctuation 
>>> would work well either.
>>>
>>> -Kevin
>>>
>>> On Sep 19, 2013, at 2:10 PM, Martin DeMello <martindeme...@gmail.com> wrote:
>>>
>>>> How complicated would it be to use R"" but with arbitrary paired
>>>> delimiters (the way, for instance, ruby does it)? It's very handy to
>>>> pick a delimiter you know does not appear in the string, e.g. if you
>>>> had a string containing ')' you could use R{this is a string with a )
>>>> in it} or R|this is a string with a ) in it|.
>>>>
>>>> martin
>>>>
>>>> On Thu, Sep 19, 2013 at 1:36 PM, Kevin Ballard <ke...@sb.org> wrote:
>>>>> One feature common to many programming languages that Rust lacks is "raw" 
>>>>> string literals. Specifically, these are string literals that don't 
>>>>> interpret backslash-escapes. There are three obvious applications at the 
>>>>> moment: regular expressions, windows file paths, and format!() strings 
>>>>> that want to embed { and } chars. I'm sure there are more as well, such 
>>>>> as large string literals that contain things like HTML text.
>>>>>
>>>>> I took a look at 3 programming languages to see what solutions they had: 
>>>>> D, C++11, and Python. I've reproduced their syntax below, plus one more 
>>>>> custom syntax, along with pros & cons. I'm hoping we can come up with a 
>>>>> syntax that makes sense for Rust.
>>>>>
>>>>> ## Python syntax:
>>>>>
>>>>> Python supports an "r" or "R" prefix on any string literal (both "short" 
>>>>> strings, delimited with a single quote character, and "long" strings, 
>>>>> delimited with 3 quotes). The "r" or "R" prefix denotes a "raw string", and has the 
>>>>> effect of disabling backslash-escapes within the string. For the most 
>>>>> part. It actually gets a bit weird: if a sequence of backslashes of an 
>>>>> odd length occurs prior to a quote (of the appropriate quote type for the 
>>>>> string), then the quote is considered to be escaped, but the backslashes 
>>>>> are left in the string. This means r"foo\"" evaluates to the string 
>>>>> `foo\"`, and similarly r"foo\\\"" is `foo\\\"`, but r"foo\\" is merely 
>>>>> the string `foo\\`.
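
The three literals quoted above can be checked directly in Python. This is only a small demonstration; the variable names and the `compiles` helper are made up for illustration. It also shows a related consequence of the same rule: a raw string cannot end in an odd number of backslashes.

```python
# Raw-string literals exactly as quoted in the message above.
odd_one = r"foo\""      # one backslash before the quote: both stay in the string
odd_three = r"foo\\\""  # three backslashes before the quote: all four chars stay
even_two = r"foo\\"     # two backslashes: the following quote closes the string


def compiles(source):
    """Return True if `source` is a syntactically valid Python expression."""
    try:
        compile(source, "<demo>", "eval")
        return True
    except SyntaxError:
        return False


# The source text tested here is r"foo\" (the final quote counts as escaped,
# so the literal is never terminated).
ends_in_backslash_ok = compiles('r"foo\\"')

print(odd_one, odd_three, even_two, ends_in_backslash_ok)
```
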
>>>>>
>>>>> Pros:
>>>>> * Simple syntax
>>>>> * Allows for embedding the closing quote character in the raw string
>>>>>
>>>>> Cons:
>>>>> * Handling of backslashes is very bizarre, and the closing quote 
>>>>> character can only be embedded if you want to have a backslash before it.
>>>>>
>>>>> ## C++11 syntax:
>>>>>
>>>>> C++11 allows for raw strings using a sequence of the form R"seq(raw 
>>>>> text)seq". In this construct, `seq` is any sequence of zero to 16 
>>>>> characters except for: space, (, ), \, \t, \v, \f, \n. The simplest form 
>>>>> looks like R"(raw text)", which allows for anything in the raw text 
>>>>> except for the sequence `)"`. The addition of the delimiter sequence 
>>>>> allows for constructing a raw string containing any sequence at all (as 
>>>>> the delimiter sequence can be adjusted based on the represented text).
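
To make the C++11 rule concrete, here is a rough sketch of how a lexer might scan such a literal. It is written in Python only for brevity, and `lex_cpp11_raw` is a hypothetical name, not anything from a real compiler. It assumes the delimiter may not contain parentheses, backslashes, or whitespace, and it does not enforce any length limit.

```python
def lex_cpp11_raw(src, start=0):
    """Scan a literal of the form R"seq(raw text)seq" beginning at src[start].

    Returns (raw_text, index_just_past_the_literal).
    """
    if not src.startswith('R"', start):
        raise ValueError("not a raw string literal")
    open_paren = src.find("(", start + 2)
    if open_paren == -1:
        raise ValueError("missing '(' after the delimiter sequence")
    delim = src[start + 2:open_paren]
    # Assumption: reject ')' and '\' and any whitespace in the delimiter.
    if any(ch in ")\\" or ch.isspace() for ch in delim):
        raise ValueError("illegal character in delimiter sequence")
    # The literal ends at the first occurrence of )seq" after the '('.
    closing = ")" + delim + '"'
    end = src.find(closing, open_paren + 1)
    if end == -1:
        raise ValueError("unterminated raw string literal")
    return src[open_paren + 1:end], end + len(closing)


print(lex_cpp11_raw('R"(raw text)"'))
print(lex_cpp11_raw('R"x(a)"b)x" rest'))
```

The second call shows why the delimiter sequence matters: the raw text contains `)"`, which would end the simplest form early, but does not contain `)x"`.
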
>>>>>
>>>>> Pros:
>>>>> * Allows for embedding any character at all (representable in the source 
>>>>> file encoding), including the closing quote.
>>>>> * Reasonably straightforward
>>>>>
>>>>> Cons:
>>>>> * Syntax is slightly complicated
>>>>>
>>>>> ## D syntax:
>>>>>
>>>>> D supports three different forms of raw strings. The first two are 
>>>>> similar, being r"raw text" and `raw text`. Besides the choice of 
>>>>> delimiters, they behave identically, in that the raw text may contain 
>>>>> anything except for the appropriate quote character. The third syntax is 
>>>>> a slightly more complicated form of C++11's syntax, and is called a 
>>>>> delimited string. It takes two forms.
>>>>>
>>>>> The first looks like q"(raw text)" where the ( may be any non-identifier 
>>>>> non-whitespace character. If the character is one of [(<{ then it is a 
>>>>> "nesting delimiter", and the close delimiter must be the matching ])>} 
>>>>> character, otherwise the close delimiter is the same as the open. 
>>>>> Furthermore, nesting delimiters do exactly what their name says: they 
>>>>> nest. If the nesting delimiter is (), then any ( in the raw text must be 
>>>>> balanced with a ) in the raw text. In other words, q"(foo(bar))" 
>>>>> evaluates to "foo(bar)", but q"(foo(bar)" and q"(foobar))" are both 
>>>>> illegal.
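
The nesting behaviour is easier to see in code. Below is a rough Python sketch, with the hypothetical name `lex_d_nesting`, that covers only the four nesting delimiters as described above. Non-nesting delimiters are deliberately left out, and nothing here is taken from a real D compiler.

```python
NESTING = {"(": ")", "[": "]", "<": ">", "{": "}"}


def lex_d_nesting(src):
    """Scan q"(...)" (or [], <>, {}) and return the raw text between the delimiters."""
    if not src.startswith('q"') or len(src) < 3:
        raise ValueError("not a delimited string")
    opener = src[2]
    if opener not in NESTING:
        raise ValueError("only nesting delimiters are covered by this sketch")
    closer = NESTING[opener]
    depth = 0  # number of currently open inner delimiters
    for i in range(3, len(src)):
        ch = src[i]
        if ch == opener:
            depth += 1
        elif ch == closer:
            if depth == 0:
                # The outermost closer must be followed directly by the quote.
                if src[i + 1:i + 2] != '"':
                    raise ValueError("unbalanced close delimiter")
                return src[3:i]
            depth -= 1
    raise ValueError("unterminated delimited string")


print(lex_d_nesting('q"(foo(bar))"'))
```

With this sketch, `q"(foo(bar)"` fails as unterminated and `q"(foobar))"` fails as unbalanced, matching the two illegal examples.
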
>>>>>
>>>>> The second uses any identifier as the delimiter. In this case, the 
>>>>> identifier must immediately be followed by a newline, and in order to 
>>>>> close the string, the close delimiter must be preceded by a newline. This 
>>>>> looks like
>>>>>
>>>>> q"delim
>>>>> this is some raw text
>>>>> delim"
>>>>>
>>>>> It's essentially a heredoc. Note that the first newline is not part of 
>>>>> the string, but the final newline is, so this evaluates to "this is some 
>>>>> raw text\n".
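
The heredoc form can be sketched the same way. `lex_d_heredoc` is a hypothetical helper, and it uses Python's `str.isidentifier` as a stand-in for D's identifier rules, which is only an approximation.

```python
def lex_d_heredoc(src):
    """Scan q"IDENT<newline>...<newline>IDENT" and return the raw text."""
    if not src.startswith('q"'):
        raise ValueError("not a delimited string")
    first_newline = src.find("\n")
    if first_newline == -1:
        raise ValueError("identifier must be followed by a newline")
    ident = src[2:first_newline]
    if not ident.isidentifier():  # approximation of D's identifier rules
        raise ValueError("delimiter is not an identifier")
    # The close delimiter must be preceded by a newline.
    closing = "\n" + ident + '"'
    end = src.find(closing, first_newline)
    if end == -1:
        raise ValueError("unterminated heredoc string")
    # Skip the first newline, keep the final one.
    return src[first_newline + 1:end + 1]


print(repr(lex_d_heredoc('q"delim\nthis is some raw text\ndelim"')))
```
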
>>>>>
>>>>> Pros:
>>>>> * Flexible
>>>>> * Allows for constructing a raw string that contains any desired sequence 
>>>>> of characters (representable in the source file's encoding)
>>>>>
>>>>> Cons:
>>>>> * Overly complicated
>>>>>
>>>>> ## Custom syntax
>>>>>
>>>>> There's another approach that none of these three languages take, which 
>>>>> is to merely allow for doubling up the quote character in order to embed 
>>>>> a quote. This would look like R"raw string literal ""with embedded 
>>>>> quotes"".", which becomes `raw string literal "with embedded quotes".`.
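
For comparison, the doubled-quote rule needs very little lexer logic. A minimal Python sketch follows; `lex_doubled_quote_raw` is a made-up name, and the `R"` prefix is simply the one used in the proposal.

```python
def lex_doubled_quote_raw(src, start=0):
    """Scan R"..." where "" inside the literal stands for one quote character.

    Returns (raw_text, index_just_past_the_literal).
    """
    if not src.startswith('R"', start):
        raise ValueError("not a raw string literal")
    out = []
    i = start + 2
    while i < len(src):
        if src[i] == '"':
            if src[i + 1:i + 2] == '"':
                out.append('"')  # doubled quote: emit one quote and keep going
                i += 2
                continue
            return "".join(out), i + 1  # lone quote: end of the literal
        out.append(src[i])
        i += 1
    raise ValueError("unterminated raw string literal")


print(lex_doubled_quote_raw('R"raw string literal ""with embedded quotes""."'))
```

One thing the sketch makes visible: a quote at the very end of the text produces three quotes in a row in the source (`R"say ""hi"""`), which is part of why this form is odd to read.
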
>>>>>
>>>>> Pros:
>>>>> * Very simple
>>>>> * Allows for embedding the close quote character, and therefore, any 
>>>>> character (representable in the source file encoding)
>>>>>
>>>>> Cons:
>>>>> * Slightly odd to read
>>>>>
>>>>> ## Conclusion
>>>>>
>>>>> Of the three existing syntaxes examined here, I think C++11's is the 
>>>>> best. It ties with D's syntax for being the most powerful, but is simpler 
>>>>> than D's. The custom syntax is just as powerful though. The benefit of 
>>>>> the C++11 syntax over the custom syntax is that it's slightly easier to 
>>>>> read, as the raw text has a one-to-one mapping with the 
>>>>> resulting string. The custom syntax is a bit more confusing to read, 
>>>>> especially if you want to add multiple quotes. As a pathological case, 
>>>>> let's try representing a Python triple-quoted docstring using both 
>>>>> syntaxes:
>>>>>
>>>>> C++11: R"("""this is a python docstring""")"
>>>>> Custom: R"""""""this is a python docstring"""""""
>>>>>
>>>>> Based on this examination, I'm leaning towards saying Rust should support 
>>>>> C++11's raw string literal syntax.
>>>>>
>>>>> I welcome any comments, criticisms, or suggestions.
>>>>>
>>>>> -Kevin
>>>>> _______________________________________________
>>>>> Rust-dev mailing list
>>>>> Rust-dev@mozilla.org
>>>>> https://mail.mozilla.org/listinfo/rust-dev
>>>
>
