The delimiter can be whatever you want in C++11 syntax (well, with restrictions on the charset, but among that charset it's freeform). You can _always_ pick a delimiter that isn't found in the text.
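To make that concrete, here is a minimal sketch in Python of picking a delimiter for a C++11-style raw literal. It relies on the fact that such a literal ends at the first occurrence of `)` + delimiter + `"`, so a delimiter is safe when that closing sequence does not appear in the text. The names `pick_delimiter`, `cpp_raw_literal`, and `ALPHABET` are made up for illustration, the lowercase-only alphabet is an arbitrary safe subset of the allowed charset, and the 16-character cap comes from the C++11 standard rather than from anything in this thread.

```python
from itertools import product

# Assumption: a small, safe subset of the characters C++11 allows in a delimiter.
ALPHABET = "abcdefghijklmnopqrstuvwxyz"
# C++11 limits the delimiter to at most 16 characters.
MAX_DELIM_LEN = 16


def pick_delimiter(text):
    """Return a shortest delimiter whose closing sequence is absent from text."""
    for length in range(MAX_DELIM_LEN + 1):
        # Length 0 yields the empty delimiter, i.e. the plain R"(...)" form.
        for chars in product(ALPHABET, repeat=length):
            delim = "".join(chars)
            # The literal would end at the first ')' + delim + '"'.
            if ")" + delim + '"' not in text:
                return delim
    raise ValueError("no delimiter found within the length limit")


def cpp_raw_literal(text):
    """Wrap text in a C++11 raw string literal using a safe delimiter."""
    delim = pick_delimiter(text)
    return 'R"' + delim + "(" + text + ")" + delim + '"'
```

A text of length n can contain at most n distinct closing sequences of any given length, while the number of candidate delimiters grows exponentially with length, so the search finds a short delimiter long before the 16-character limit matters.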
If you're procedurally generating the text, surely you can also write an algorithm to pick a delimiter. It's not very hard to do so.

-Kevin

On Sep 21, 2013, at 1:52 PM, Carl Eastlund <c...@ccs.neu.edu> wrote:

> You always have to have some exceptional case, though, don't you? What if
> you have a string literal that contains every single character? Or what if
> you have literals in procedurally generated code that might contain any
> unknown character? There's always a possibility that a given delimiter
> (sequence of) character(s) might be duplicated inside the literal. Isn't
> there?
>
> Carl Eastlund
>
> On Sat, Sep 21, 2013 at 7:24 AM, Felix Klock <pnkfe...@mozilla.com> wrote:
> Kevin (cc'ing rust-dev)-
>
> Of the choices listed here, I prefer the C++11 syntax.
>
> Whatever syntax we choose, I would prefer one that has user-selected
> delimiting character sequences (as illustrated by the cases of D and C++11).
> From my point-of-view, that is the only way to get a "raw string" that really
> means raw string; otherwise, you end up having to select some exceptional
> case (e.g. the backslashes, doubled-up quotes, etc of the other options Kevin
> described).
>
> Cheers,
> -Felix
>
> ----- Original Message -----
> From: "Kevin Ballard" <ke...@sb.org>
> To: rust-dev@mozilla.org
> Sent: Thursday, September 19, 2013 10:36:39 PM
> Subject: [rust-dev] RFC: Syntax for "raw" string literals
>
> One feature common to many programming languages that Rust lacks is "raw"
> string literals. Specifically, these are string literals that don't interpret
> backslash-escapes. There are three obvious applications at the moment:
> regular expressions, windows file paths, and format!() strings that want to
> embed { and } chars. I'm sure there are more as well, such as large string
> literals that contain things like HTML text.
>
> I took a look at 3 programming languages to see what solutions they had: D,
> C++11, and Python. I've reproduced their syntax below, plus one more custom
> syntax, along with pros & cons. I'm hoping we can come up with a syntax that
> makes sense for Rust.
>
> ## Python syntax:
>
> Python supports an "r" or "R" prefix on any string literal (both "short"
> strings, delimited with a single quote, or "long" strings, delimited with 3
> quotes). The "r" or "R" prefix denotes a "raw string", and has the effect of
> disabling backslash-escapes within the string. For the most part. It actually
> gets a bit weird: if a sequence of backslashes of an odd length occurs prior
> to a quote (of the appropriate quote type for the string), then the quote is
> considered to be escaped, but the backslashes are left in the string. This
> means r"foo\"" evaluates to the string `foo\"`, and similarly r"foo\\\"" is
> `foo\\\"`, but r"foo\\" is merely the string `foo\\`.
>
> Pros:
> * Simple syntax
> * Allows for embedding the closing quote character in the raw string
>
> Cons:
> * Handling of backslashes is very bizarre, and the closing quote character
>   can only be embedded if you want to have a backslash before it.
>
> ## C++11 syntax:
>
> C++11 allows for raw strings using a sequence of the form R"seq(raw
> text)seq". In this construct, `seq` is any sequence of (zero or more)
> characters except for: space, (, ), \, \t, \v, \n, \r. The simplest form
> looks like R"(raw text)", which allows for anything in the raw text except
> for the sequence `)"`. The addition of the delimiter sequence allows for
> constructing a raw string containing any sequence at all (as the delimiter
> sequence can be adjusted based on the represented text).
>
> Pros:
> * Allows for embedding any character at all (representable in the source file
>   encoding), including the closing quote.
> * Reasonably straightforward
>
> Cons:
> * Syntax is slightly complicated
>
> ## D syntax:
>
> D supports three different forms of raw strings. The first two are similar,
> being r"raw text" and `raw text`. Besides the choice of delimiters, they
> behave identically, in that the raw text may contain anything except for the
> appropriate quote character. The third syntax is a slightly more complicated
> form of C++11's syntax, and is called a delimited string. It takes two forms.
>
> The first looks like q"(raw text)" where the ( may be any non-identifier
> non-whitespace character. If the character is one of [(<{ then it is a
> "nesting delimiter", and the close delimiter must be the matching ])>}
> character, otherwise the close delimiter is the same as the open.
> Furthermore, nesting delimiters do exactly what their name says: they nest.
> If the nesting delimiter is (), then any ( in the raw text must be balanced
> with a ) in the raw text. In other words, q"(foo(bar))" evaluates to
> "foo(bar)", but q"(foo(bar)" and q"(foobar))" are both illegal.
>
> The second uses any identifier as the delimiter. In this case, the identifier
> must immediately be followed by a newline, and in order to close the string,
> the close delimiter must be preceded by a newline. This looks like
>
>     q"delim
>     this is some raw text
>     delim"
>
> It's essentially a heredoc. Note that the first newline is not part of the
> string, but the final newline is, so this evaluates to "this is some raw
> text\n".
>
> Pros:
> * Flexible
> * Allows for constructing a raw string that contains any desired sequence of
>   characters (representable in the source file's encoding)
>
> Cons:
> * Overly complicated
>
> ## Custom syntax
>
> There's another approach that none of these three languages take, which is to
> merely allow for doubling up the quote character in order to embed a quote.
> This would look like R"raw string literal ""with embedded quotes"".", which
> becomes `raw string literal "with embedded quotes"`.
>
> Pros:
> * Very simple
> * Allows for embedding the close quote character, and therefore, any
>   character (representable in the source file encoding)
>
> Cons:
> * Slightly odd to read
>
> ## Conclusion
>
> Of the three existing syntaxes examined here, I think C++11's is the best. It
> ties with D's syntax for being the most powerful, but is simpler than D's.
> The custom syntax is just as powerful though. The benefit of the C++11 syntax
> over the custom syntax is it's slightly easier to read the C++11 syntax, as
> the raw text has a 1-to-one mapping with the resulting string. The custom
> syntax is a bit more confusing to read, especially if you want to add
> multiple quotes. As a pathological case, let's try representing a Python
> triple-quoted docstring using both syntaxes:
>
> C++11: R"("""this is a python docstring""")"
> Custom: R"""""""this is a python docstring"""""""
>
> Based on this examination, I'm leaning towards saying Rust should support
> C++11's raw string literal syntax.
>
> I welcome any comments, criticisms, or suggestions.
>
> -Kevin
> _______________________________________________
> Rust-dev mailing list
> Rust-dev@mozilla.org
> https://mail.mozilla.org/listinfo/rust-dev