You always have to have some exceptional case, though, don't you?  What if
you have a string literal that contains every single character?  Or what if
you have literals in procedurally generated code that might contain any
unknown character?  There's always a possibility that a given delimiter
(sequence of) character(s) might be duplicated inside the literal.  Isn't
there?

Carl Eastlund

On Sat, Sep 21, 2013 at 7:24 AM, Felix Klock <pnkfe...@mozilla.com> wrote:

> Kevin (cc'ing rust-dev)-
>
> Of the choices listed here, I prefer the C++11 syntax.
>
> Whatever syntax we choose, I would prefer one that has user-selected
> delimiting character sequences (as illustrated by the cases of D and
> C++11).  From my point of view, that is the only way to get a "raw string"
> that really means raw string; otherwise, you end up having to select some
> exceptional case (e.g. the backslashes, doubled-up quotes, etc. of the other
> options Kevin described).
>
> Cheers,
> -Felix
>
> ----- Original Message -----
> From: "Kevin Ballard" <ke...@sb.org>
> To: rust-dev@mozilla.org
> Sent: Thursday, September 19, 2013 10:36:39 PM
> Subject: [rust-dev] RFC: Syntax for "raw" string literals
>
> One feature common to many programming languages that Rust lacks is "raw"
> string literals. Specifically, these are string literals that don't
> interpret backslash-escapes. There are three obvious applications at the
> moment: regular expressions, Windows file paths, and format!() strings that
> want to embed { and } chars. I'm sure there are more as well, such as large
> string literals that contain things like HTML text.
>
> I took a look at three programming languages to see what solutions they had:
> D, C++11, and Python. I've reproduced their syntax below, plus one more
> custom syntax, along with pros & cons. I'm hoping we can come up with a
> syntax that makes sense for Rust.
>
> ## Python syntax:
>
> Python supports an "r" or "R" prefix on any string literal (both "short"
> strings, delimited with one quote character, and "long" strings, delimited
> with three). The "r" or "R" prefix denotes a "raw string", and has the
> effect of disabling backslash-escapes within the string. For the most part.
> It actually gets a bit weird: if a sequence of backslashes of an odd length
> occurs prior to a quote (of the appropriate quote type for the string),
> then the quote is considered to be escaped, but the backslashes are left in
> the string. This means r"foo\"" evaluates to the string `foo\"`, and
> similarly r"foo\\\"" is `foo\\\"`, but r"foo\\" is merely the string
> `foo\\`.
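
To make the odd-backslash rule concrete, here is a small Python sketch of the three literals described above. The helper name is_valid_literal is hypothetical and only there to show that a raw string ending in a single backslash is rejected.

```python
# The three raw literals from the description above.
a = r"foo\""      # backslash is kept, quote is embedded
b = r"foo\\\""    # three backslashes are kept, quote is embedded
c = r"foo\\"      # two backslashes, nothing is escaped

def is_valid_literal(source):
    """Return True if `source` compiles as a Python expression.

    Hypothetical helper for illustration only.
    """
    try:
        compile(source, "<literal>", "eval")
        return True
    except SyntaxError:
        return False

lengths = (len(a), len(b), len(c))  # (5, 7, 5)

# The source text checked here is r"foo\" (one trailing backslash).
# The backslash escapes the closing quote, so the literal never ends.
single_trailing_backslash_ok = is_valid_literal('r"foo\\"')
```

In other words, there is no way to write a raw string that ends in an odd number of backslashes.
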
>
> Pros:
> * Simple syntax
> * Allows for embedding the closing quote character in the raw string
>
> Cons:
> * Handling of backslashes is very bizarre, and the closing quote character
> can only be embedded if you want to have a backslash before it.
>
> ## C++11 syntax:
>
> C++11 allows for raw strings using a sequence of the form R"seq(raw
> text)seq". In this construct, `seq` is any sequence of zero to 16
> characters except for: space, (, ), \, \t, \v, \f, \n. The simplest form
> looks like R"(raw text)", which allows for anything in the raw text except
> for the sequence `)"`. The addition of the delimiter sequence allows for
> constructing a raw string containing any sequence at all (as the delimiter
> sequence can be adjusted based on the represented text).
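
As a rough illustration of "adjusting the delimiter based on the represented text", here is a Python sketch that picks a delimiter for a given string and builds the C++11 literal. The names pick_delimiter and cpp_raw_literal are made up for this example, and it only tries delimiters made of a repeated "x", which is a simplifying assumption. It ignores source-encoding questions entirely.

```python
MAX_DELIM_LEN = 16  # C++11 caps the delimiter sequence at 16 characters

def pick_delimiter(text, fill="x"):
    """Return the shortest delimiter of the form fill * n that is safe.

    A delimiter is safe when the closing sequence `)` + delimiter + `"`
    does not occur anywhere inside `text`.
    """
    for n in range(MAX_DELIM_LEN + 1):
        delim = fill * n
        if ")" + delim + '"' not in text:
            return delim
    raise ValueError("no delimiter of the form fill * n fits this text")

def cpp_raw_literal(text):
    """Build the source text of a C++11 raw string literal for `text`."""
    delim = pick_delimiter(text)
    return 'R"' + delim + "(" + text + ")" + delim + '"'

simple = cpp_raw_literal("raw text")   # R"(raw text)"
tricky = cpp_raw_literal('a)"b')       # R"x(a)"b)x"
```

The ValueError branch is the exceptional case: with a fixed family of candidate delimiters, a sufficiently adversarial text can rule all of them out, so a generator would need a fallback.
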
>
> Pros:
> * Allows for embedding any character at all (representable in the source
> file encoding), including the closing quote.
> * Reasonably straightforward
>
> Cons:
> * Syntax is slightly complicated
>
> ## D syntax:
>
> D supports three different forms of raw strings. The first two are
> similar, being r"raw text" and `raw text`. Besides the choice of
> delimiters, they behave identically, in that the raw text may contain
> anything except for the appropriate quote character. The third syntax is a
> slightly more complicated form of C++11's syntax, and is called a delimited
> string. It takes two forms.
>
> The first looks like q"(raw text)" where the ( may be any non-identifier
> non-whitespace character. If the character is one of [(<{ then it is a
> "nesting delimiter", and the close delimiter must be the matching ])>}
> character, otherwise the close delimiter is the same as the open.
> Furthermore, nesting delimiters do exactly what their name says: they nest.
> If the nesting delimiter is (), then any ( in the raw text must be balanced
> with a ) in the raw text. In other words, q"(foo(bar))" evaluates to
> "foo(bar)", but q"(foo(bar)" and q"(foobar))" are both illegal.
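
The nesting rule can be sketched in a few lines of Python. This is only a model of the rule as described above, not D's actual lexer, and the function name is_balanced_body is hypothetical. It checks the text between the outer delimiters.

```python
def is_balanced_body(body, open_ch="(", close_ch=")"):
    """Return True if every open_ch in `body` is matched by a close_ch.

    `body` is the raw text between the outer nesting delimiters.
    """
    depth = 0
    for ch in body:
        if ch == open_ch:
            depth += 1
        elif ch == close_ch:
            depth -= 1
            if depth < 0:
                # A closer with no matching opener inside the body.
                return False
    return depth == 0

ok = is_balanced_body("foo(bar)")           # body of q"(foo(bar))"
unclosed = is_balanced_body("foo(bar")      # body of q"(foo(bar)"
extra_close = is_balanced_body("foobar)")   # body of q"(foobar))"
```
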
>
> The second uses any identifier as the delimiter. In this case, the
> identifier must immediately be followed by a newline, and in order to close
> the string, the close delimiter must be preceded by a newline. This looks
> like
>
> q"delim
> this is some raw text
> delim"
>
> It's essentially a heredoc. Note that the first newline is not part of the
> string, but the final newline is, so this evaluates to "this is some raw
> text\n".
>
> Pros:
> * Flexible
> * Allows for constructing a raw string that contains any desired sequence
> of characters (representable in the source file's encoding)
>
> Cons:
> * Overly complicated
>
> ## Custom syntax
>
> There's another approach that none of these three languages take, which is
> to merely allow for doubling up the quote character in order to embed a
> quote. This would look like R"raw string literal ""with embedded
> quotes"".", which becomes `raw string literal "with embedded quotes".`
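
Here is a minimal Python sketch of how the doubled-quote form could be encoded and decoded, assuming the R"..." spelling shown above. The function names encode_doubled and decode_doubled are hypothetical, and this is not a proposal for how a real lexer would do it.

```python
def encode_doubled(text):
    """Build a literal of the form R"..." with every quote doubled."""
    return 'R"' + text.replace('"', '""') + '"'

def decode_doubled(literal):
    """Recover the string from a literal produced by encode_doubled."""
    if len(literal) < 3 or not literal.startswith('R"') or not literal.endswith('"'):
        raise ValueError("expected a literal of the form R followed by a quoted body")
    body = literal[2:-1]
    out = []
    i = 0
    while i < len(body):
        if body[i] == '"':
            # Inside the body a quote is only legal as half of a pair.
            if i + 1 < len(body) and body[i + 1] == '"':
                out.append('"')
                i += 2
            else:
                raise ValueError("unpaired quote inside the literal body")
        else:
            out.append(body[i])
            i += 1
    return "".join(out)

decoded = decode_doubled('R"raw string literal ""with embedded quotes""."')
```
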
>
> Pros:
> * Very simple
> * Allows for embedding the close quote character, and therefore, any
> character (representable in the source file encoding)
>
> Cons:
> * Slightly odd to read
>
> ## Conclusion
>
> Of the three existing syntaxes examined here, I think C++11's is the best.
> It ties with D's syntax for being the most powerful, but is simpler than
> D's. The custom syntax is just as powerful though. The benefit of the C++11
> syntax over the custom syntax is that it's slightly easier to read, as the
> raw text has a one-to-one mapping with the resulting string.
> The custom syntax is a bit more confusing to read, especially if you want
> to add multiple quotes. As a pathological case, let's try representing a
> Python triple-quoted docstring using both syntaxes:
>
> C++11: R"("""this is a python docstring""")"
> Custom: R"""""""this is a python docstring"""""""
>
> Based on this examination, I'm leaning towards saying Rust should support
> C++11's raw string literal syntax.
>
> I welcome any comments, criticisms, or suggestions.
>
> -Kevin
> _______________________________________________
> Rust-dev mailing list
> Rust-dev@mozilla.org
> https://mail.mozilla.org/listinfo/rust-dev
