The way that Lua does raw strings is also fairly nifty.  Check out
http://www.lua.org/manual/5.2/manual.html, section 3.1, or, in short:

- Strings are opened by "[", zero or more equals signs, and another "["
(for example "[===[").  The closing delimiter must use the same number
of equals signs ("]===]").
- No escape sequences are interpreted.
- Any kind of end-of-line sequence ("\r", "\n", "\r\n", or "\n\r") is
converted to a single newline.
- The string can span multiple lines.
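
To make the rules above concrete, here is a minimal sketch in Python of how
such a literal could be scanned. The function name read_long_string is made
up for this example, and this is not how Lua's own lexer is written. It also
drops a newline that immediately follows the opening bracket, which the Lua
manual specifies but the list above leaves out.

```python
import re

# Matches an opening long bracket: "[", zero or more "=", then "[".
_OPENER = re.compile(r"\[(=*)\[")


def read_long_string(src, pos=0):
    """Scan a Lua-style long-bracket string starting at src[pos].

    Hypothetical helper for illustration only. Returns (value, end), where
    end is the index just past the closing delimiter.
    """
    m = _OPENER.match(src, pos)
    if m is None:
        raise ValueError("no opening long bracket at position %d" % pos)

    # The closer must carry the same number of equals signs as the opener.
    closer = "]" + m.group(1) + "]"
    start = m.end()
    end = src.find(closer, start)
    if end == -1:
        raise ValueError("unterminated long string")

    # No escape processing at all: the body is taken verbatim.
    body = src[start:end]

    # Any end-of-line sequence collapses to a single newline.
    body = re.sub(r"\r\n|\n\r|\r|\n", "\n", body)

    # Per the Lua manual, a newline right after the opener is skipped.
    if body.startswith("\n"):
        body = body[1:]

    return body, end + len(closer)
```

Picking a level (number of equals signs) that does not occur in the text is
what lets the body contain "]]" or any shorter closing bracket unescaped.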

--Andrew D


On Thu, Sep 19, 2013 at 10:28 PM, Kevin Cantu <m...@kevincantu.org> wrote:

> I think designing good traits to support all these text implementations is
> far more important than whatever Hungarian notation is preferred for
> literals.
>
>
> Kevin
>
>
> On Thu, Sep 19, 2013 at 2:50 PM, Martin DeMello <martindeme...@gmail.com> wrote:
>
>> Ah, good point. You could fix it by having a very small whitelist of
>> acceptable delimiters, but that probably takes it into overcomplex
>> territory.
>>
>> martin
>>
>> On Thu, Sep 19, 2013 at 2:46 PM, Kevin Ballard <ke...@sb.org> wrote:
>> > As I just responded to Masklinn, this is ambiguous. How do you lex
>> > `do R{foo()}`?
>> >
>> > -Kevin
>> >
>> > On Sep 19, 2013, at 2:41 PM, Martin DeMello <martindeme...@gmail.com> wrote:
>> >
>> >> Yes, I figured R followed by a non-alphabetical character could serve
>> >> the same purpose as Ruby's %<char>.
>> >>
>> >> martin
>> >>
>> >> On Thu, Sep 19, 2013 at 2:37 PM, Kevin Ballard <ke...@sb.org> wrote:
>> >>> I didn't look at Ruby's syntax, but what you just described sounds a
>> >>> little too free-form to me. I believe Ruby at least requires a % as
>> >>> part of the syntax, e.g. %q{test}. But I don't think %R{test} is a
>> >>> good idea for Rust, as it would conflict with the % operator. I don't
>> >>> think other punctuation would work well either.
>> >>>
>> >>> -Kevin
>> >>>
>> >>> On Sep 19, 2013, at 2:10 PM, Martin DeMello <martindeme...@gmail.com> wrote:
>> >>>
>> >>>> How complicated would it be to use R"" but with arbitrary paired
>> >>>> delimiters (the way, for instance, Ruby does it)? It's very handy to
>> >>>> pick a delimiter you know does not appear in the string, e.g. if you
>> >>>> had a string containing ')' you could use R{this is a string with a )
>> >>>> in it} or R|this is a string with a ) in it|.
>> >>>>
>> >>>> martin
>> >>>>
>> >>>> On Thu, Sep 19, 2013 at 1:36 PM, Kevin Ballard <ke...@sb.org> wrote:
>> >>>>> One feature common to many programming languages that Rust lacks is
>> >>>>> "raw" string literals. Specifically, these are string literals that
>> >>>>> don't interpret backslash-escapes. There are three obvious
>> >>>>> applications at the moment: regular expressions, Windows file paths,
>> >>>>> and format!() strings that want to embed { and } chars. I'm sure
>> >>>>> there are more as well, such as large string literals that contain
>> >>>>> things like HTML text.
>> >>>>>
>> >>>>> I took a look at 3 programming languages to see what solutions they
>> >>>>> had: D, C++11, and Python. I've reproduced their syntax below, plus
>> >>>>> one more custom syntax, along with pros & cons. I'm hoping we can
>> >>>>> come up with a syntax that makes sense for Rust.
>> >>>>>
>> >>>>> ## Python syntax:
>> >>>>>
>> >>>>> Python supports an "r" or "R" prefix on any string literal (both
>> >>>>> "short" strings, delimited with a single quote character, and "long"
>> >>>>> strings, delimited with 3 quotes). The "r" or "R" prefix denotes a
>> >>>>> "raw string", and has the effect of disabling backslash-escapes
>> >>>>> within the string, for the most part. It actually gets a bit weird:
>> >>>>> if a sequence of backslashes of an odd length occurs prior to a quote
>> >>>>> (of the appropriate quote type for the string), then the quote is
>> >>>>> considered to be escaped, but the backslashes are left in the string.
>> >>>>> This means r"foo\"" evaluates to the string `foo\"`, and similarly
>> >>>>> r"foo\\\"" is `foo\\\"`, but r"foo\\" is merely the string `foo\\`.
>> >>>>>
>> >>>>> Pros:
>> >>>>> * Simple syntax
>> >>>>> * Allows for embedding the closing quote character in the raw string
>> >>>>>
>> >>>>> Cons:
>> >>>>> * Handling of backslashes is very bizarre, and the closing quote
>> >>>>> character can only be embedded if you want to have a backslash
>> >>>>> before it.
>> >>>>>
>> >>>>> ## C++11 syntax:
>> >>>>>
>> >>>>> C++11 allows for raw strings using a sequence of the form
>> >>>>> R"seq(raw text)seq". In this construct, `seq` is any sequence of
>> >>>>> zero to 16 characters except for: space, (, ), \, \t, \v, \f, \n, \r.
>> >>>>> The simplest form looks like R"(raw text)", which allows for anything
>> >>>>> in the raw text except for the sequence `)"`. The addition of the
>> >>>>> delimiter sequence allows for constructing a raw string containing
>> >>>>> any sequence at all (as the delimiter sequence can be adjusted based
>> >>>>> on the represented text).
>> >>>>>
>> >>>>> Pros:
>> >>>>> * Allows for embedding any character at all (representable in the
>> >>>>> source file encoding), including the closing quote.
>> >>>>> * Reasonably straightforward
>> >>>>>
>> >>>>> Cons:
>> >>>>> * Syntax is slightly complicated
>> >>>>>
>> >>>>> ## D syntax:
>> >>>>>
>> >>>>> D supports three different forms of raw strings. The first two are
>> >>>>> similar, being r"raw text" and `raw text`. Besides the choice of
>> >>>>> delimiters, they behave identically, in that the raw text may contain
>> >>>>> anything except for the appropriate quote character. The third syntax
>> >>>>> is a slightly more complicated form of C++11's syntax, and is called
>> >>>>> a delimited string. It takes two forms.
>> >>>>>
>> >>>>> The first looks like q"(raw text)" where the ( may be any
>> >>>>> non-identifier non-whitespace character. If the character is one of
>> >>>>> [(<{ then it is a "nesting delimiter", and the close delimiter must
>> >>>>> be the matching ])>} character, otherwise the close delimiter is the
>> >>>>> same as the open. Furthermore, nesting delimiters do exactly what
>> >>>>> their name says: they nest. If the nesting delimiter is (), then any
>> >>>>> ( in the raw text must be balanced with a ) in the raw text. In other
>> >>>>> words, q"(foo(bar))" evaluates to "foo(bar)", but q"(foo(bar)" and
>> >>>>> q"(foobar))" are both illegal.
>> >>>>>
>> >>>>> The second uses any identifier as the delimiter. In this case, the
>> >>>>> identifier must immediately be followed by a newline, and in order to
>> >>>>> close the string, the close delimiter must be preceded by a newline.
>> >>>>> This looks like
>> >>>>>
>> >>>>> q"delim
>> >>>>> this is some raw text
>> >>>>> delim"
>> >>>>>
>> >>>>> It's essentially a heredoc. Note that the first newline is not part
>> >>>>> of the string, but the final newline is, so this evaluates to
>> >>>>> "this is some raw text\n".
>> >>>>>
>> >>>>> Pros:
>> >>>>> * Flexible
>> >>>>> * Allows for constructing a raw string that contains any desired
>> >>>>> sequence of characters (representable in the source file's encoding)
>> >>>>>
>> >>>>> Cons:
>> >>>>> * Overly complicated
>> >>>>>
>> >>>>> ## Custom syntax
>> >>>>>
>> >>>>> There's another approach that none of these three languages takes,
>> >>>>> which is to merely allow for doubling up the quote character in order
>> >>>>> to embed a quote. This would look like
>> >>>>> R"raw string literal ""with embedded quotes"".", which becomes
>> >>>>> `raw string literal "with embedded quotes".`.
>> >>>>>
>> >>>>> Pros:
>> >>>>> * Very simple
>> >>>>> * Allows for embedding the close quote character, and therefore,
>> >>>>> any character (representable in the source file encoding)
>> >>>>>
>> >>>>> Cons:
>> >>>>> * Slightly odd to read
>> >>>>>
>> >>>>> ## Conclusion
>> >>>>>
>> >>>>> Of the three existing syntaxes examined here, I think C++11's is the
>> >>>>> best. It ties with D's syntax for being the most powerful, but is
>> >>>>> simpler than D's. The custom syntax is just as powerful though. The
>> >>>>> benefit of the C++11 syntax over the custom syntax is that it is
>> >>>>> slightly easier to read, as the raw text has a one-to-one mapping
>> >>>>> with the resulting string. The custom syntax is a bit more confusing
>> >>>>> to read, especially if you want to add multiple quotes. As a
>> >>>>> pathological case, let's try representing a Python triple-quoted
>> >>>>> docstring using both syntaxes:
>> >>>>>
>> >>>>> C++11: R"("""this is a python docstring""")"
>> >>>>> Custom: R"""""""this is a python docstring"""""""
>> >>>>>
>> >>>>> Based on this examination, I'm leaning towards saying Rust should
>> >>>>> support C++11's raw string literal syntax.
>> >>>>>
>> >>>>> I welcome any comments, criticisms, or suggestions.
>> >>>>>
>> >>>>> -Kevin
>> >>>>> _______________________________________________
>> >>>>> Rust-dev mailing list
>> >>>>> Rust-dev@mozilla.org
>> >>>>> https://mail.mozilla.org/listinfo/rust-dev