Re: [rust-dev] RFC: Syntax for raw string literals

2013-09-24 Thread Benjamin Striegel
 I think string literals should contain exactly what they contain in their
source form, without any additional processing. If you want to express
characters that are inconvenient to type, you can use control sequences and
a (standard) formatting library to produce them.

I'm actually very intrigued by the idea of eliminating escape characters
altogether in the default string literals. Would follow nicely from how we
allow newlines in string literals. We'd have to give up the optional
whitespace-chomping behavior around newlines, though, which would make me
pretty sad. And are you really willing to force everyone who wants to
include a quotation mark in a string to go through a syntax extension to do
it?

<facetious>

People, please! Using delimiters on string literals is tantamount to
checking for null to determine when you've reached the end of a string in
memory. We've graduated beyond those barbarous days by explicitly noting
the length of each string in the header, so let's just reuse that idea!
Behold, Rust's new string literals:

fn main() {
    print(#7hello);
    print(#2, );
    print(#5world);
}

</facetious>


On Sun, Sep 22, 2013 at 5:32 PM, Sebastian Sylvan 
sebastian.syl...@gmail.com wrote:




 On Thu, Sep 19, 2013 at 1:36 PM, Kevin Ballard ke...@sb.org wrote:

 One feature common to many programming languages that Rust lacks is raw
 string literals.


 This is one of those things that I feel almost all languages get wrong,
 and probably mostly for historical reasons. IMO there should *only* be raw
 string literals on the syntax level. It seems extremely weird to me that
 languages have this second-level language that gets interpreted within a
 literal. That kind of higher level processing should be part of a
 formatting library (e.g. a macro like fmt), rather than an embedded
 language inside the literal syntax. So, I think string literals should
 contain exactly what they contain in their source form, without any
 additional processing. If you want to express characters that are
 inconvenient to type, you can use control sequences and a (standard)
 formatting library to produce them.

 --
 Sebastian Sylvan

 ___
 Rust-dev mailing list
 Rust-dev@mozilla.org
 https://mail.mozilla.org/listinfo/rust-dev




Re: [rust-dev] RFC: Syntax for raw string literals

2013-09-24 Thread Sebastian Sylvan
On Tue, Sep 24, 2013 at 12:44 PM, Benjamin Striegel
ben.strie...@gmail.com wrote:

  I think string literals should contain exactly what they contain in
 their source form, without any additional processing. If you want to
 express characters that are inconvenient to type, you can use control
 sequences and a (standard) formatting library to produce them.

 I'm actually very intrigued by the idea of eliminating escape characters
 altogether in the default string literals. Would follow nicely from how we
 allow newlines in string literals. We'd have to give up the optional
 whitespace-chomping behavior around newlines, though, which would make me
 pretty sad. And are you really willing to force everyone who wants to
 include a quotation mark in a string to go through a syntax extension to do
 it?


Yes! It seems to me that many/most string literals are used in
conjunction with various formatting functions anyway, so I wouldn't think
it would be a big deal in practice. Throwing in a call to fmt isn't a big
burden, imo.


-- 
Sebastian Sylvan


Re: [rust-dev] RFC: Syntax for raw string literals

2013-09-23 Thread Steven Ashley
I also forgot to mention the possibility of putting a filename as the eos
string. I think it's kind of neat.

r##index.html##
<html>
 ...
</html>
##index.html##


Re: [rust-dev] RFC: Syntax for raw string literals

2013-09-22 Thread Steven Ashley
Hi everyone,

Have we considered syntax similar to Ruby-style heredocs? I particularly
like the light-looking syntax.

- The indentation of the block is determined by the indentation of the eos
marker, keeping code flow natural.

<<eos
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do
eiusmod tempor
incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
quis nostrud
eos

- Brackets in the eos marker are flipped to allow [[[raw]]]

- eoseos causes a literal eos to be inserted. For example a raw
string

My main concern is that << might be a common operator. Perhaps  would be
ok?

Thoughts?
On 21/09/2013 4:28 AM, Alex Crichton a...@crichton.co wrote:

  Of the 3, Lua's is probably the best, although it's a bit esoteric (with
  using [[ and nary a quote in sight).

 I think an important thing to keep in mind is that the main reason
 behind creating a new form of literal is for things like:

 * Escapes in format! strings
 * Possible regular expression syntax (this also may be a syntax extension)
 * Type literal windows paths (escaping \ is hard)
 * Otherwise long literals which may contain quotes (like html text)
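 For illustration only: this thread predates any decision on syntax, but Rust later adopted `r"..."` and `r#"..."#` raw string literals. Two of the use cases listed above (Windows paths and long literals containing quotes) can be sketched in present-day Rust as follows. The constant names and sample values are made up for the example.

```rust
// Windows path: every backslash must be doubled in an ordinary literal.
const ESCAPED_PATH: &str = "C:\\Users\\rust\\main.rs";
// The same path as a raw literal; backslashes are taken as written.
const RAW_PATH: &str = r"C:\Users\rust\main.rs";

// HTML text: embedded quotes must be escaped in an ordinary literal.
const ESCAPED_HTML: &str = "<a href=\"index.html\">home</a>";
// With `#` added to the delimiters, quotes can appear unescaped.
const RAW_HTML: &str = r#"<a href="index.html">home</a>"#;
```

 Each pair of constants holds the same string; only the source form differs.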

 With those in mind, although Lua's syntax is sufficient, is it nice to
 use? If the first thing I saw as an introduction to Rust was:

 fn main() {
   println!([[Hello, {}!]], "world");
 }

 I would be a little confused. Now the [[/]] aren't really necessary in
 this case, but I'm personally unsure of how usable [[/]] would be
 throughout the language. Raw literals in languages like C++ and Lua I
 think aren't intended to be used that often. Instead they should be
 used only when necessary, and you frequently don't see them in code.
 For rust, the use cases which are the cause of this discussion are
 actually fairly common, and I'm not sure that we'd want to see [[/]]
 all over the place, although of course that's just my opinion :)

 Skimming back, I haven't seen a suggestion of the backtick character
 as a delimiter. Go takes this approach, and I don't believe that in Go
 you can have a backtick anywhere in a backtick literal, and otherwise
 what you see is what you get. It's at least something to consider,
 though.


[rust-dev] RFC: Syntax for raw string literals

2013-09-22 Thread Steven Ashley
Oh right, that's fair enough. I think the indentation/escaping issues can
be fixed; however, the newline issues you mentioned will still exist for
strings split over multiple lines using this syntax.

Good luck!

Steven

On Monday, September 23, 2013, Kevin Ballard wrote:

 Heredocs are primarily intended for multiline strings. Raw strings are
 intended for strings that have no escapes. Raw strings typically allow
 newlines, but that is not their primary purpose (and in Rust, regular
 strings allow newlines anyway). Trying to use a heredoc syntax for raw
 strings is just a headache (because of indentation, and dealing with the
 first and/or trailing newline in the heredoc).

 -Kevin

 On Sep 22, 2013, at 11:52 AM, Artem Egorkine art...@gmail.com wrote:

 I must be missing something about Ruby heredocs, but indentation has
 always been a painful question with them (
 http://stackoverflow.com/questions/3772864/how-do-i-remove-leading-whitespace-chars-from-ruby-heredoc).
 Another thing, of course, is that they are by no means raw (which of
 course doesn't stop Rust from adopting their syntax for raw strings). I
 would just say that it would be nice to pick a syntax for raw strings
 that allows both single-line and multi-line raw strings to
 be represented easily.


[rust-dev] RFC: Syntax for raw string literals

2013-09-22 Thread Steven Ashley
I'm in favour of C++11 syntax.



Re: [rust-dev] RFC: Syntax for raw string literals

2013-09-22 Thread Sebastian Sylvan
On Thu, Sep 19, 2013 at 1:36 PM, Kevin Ballard ke...@sb.org wrote:

 One feature common to many programming languages that Rust lacks is raw
 string literals.


This is one of those things that I feel almost all languages get wrong,
and probably mostly for historical reasons. IMO there should *only* be raw
string literals on the syntax level. It seems extremely weird to me that
languages have this second-level language that gets interpreted within a
literal. That kind of higher level processing should be part of a
formatting library (e.g. a macro like fmt), rather than an embedded
language inside the literal syntax. So, I think string literals should
contain exactly what they contain in their source form, without any
additional processing. If you want to express characters that are
inconvenient to type, you can use control sequences and a (standard)
formatting library to produce them.

-- 
Sebastian Sylvan


Re: [rust-dev] RFC: Syntax for raw string literals

2013-09-22 Thread SiegeLord

On 09/22/2013 05:40 PM, Kevin Ballard wrote:

I've filed a summary of this conversation as an RFC issue on the GitHub issue 
tracker.

https://github.com/mozilla/rust/issues/9411


I've used a variation of the option 10 for my own configuration format's 
raw strings:


delim"raw text"delim

Where delim was an equivalent of an identifier.

If ` is a problem, then maybe using ' works too?

'delim"raw text"delim'

'"raw text"'

-SL


Re: [rust-dev] RFC: Syntax for raw string literals

2013-09-22 Thread Kevin Ballard
' doesn't work because 'delim is parsed as a lifetime.

-Kevin



Re: [rust-dev] RFC: Syntax for raw string literals

2013-09-22 Thread SiegeLord

On 09/22/2013 07:10 PM, Kevin Ballard wrote:

' doesn't work because 'delim is parsed as a lifetime.


The parser will have to be modified to support raw strings in any of 
their manifestations. Is it a fact that there is no possible parser that 
can differentiate between 'delim" and 'delim ? I guess it'll give 
trouble to this current syntax 'foo"blah", but it wouldn't be the first 
place in the grammar where a space was necessary to disambiguate between 
constructs (  comes to mind).


-SL


Re: [rust-dev] RFC: Syntax for raw string literals

2013-09-22 Thread Kevin Ballard
It would require changing the rules for lifetimes, with no benefit (and no 
clear new rule to use anyway). 'foo"delim" is perfectly legal today, and I see 
no reason to change that.

-Kevin



Re: [rust-dev] RFC: Syntax for raw string literals

2013-09-22 Thread SiegeLord

On 09/22/2013 07:45 PM, Kevin Ballard wrote:

It would require changing the rules for lifetimes, with no benefit (and no clear new rule to 
use anyway). 'foo"delim" is perfectly legal today, and I see no reason to 
change that.

It's not as big a change as you make it out to be, but fair enough.

Looking at the parser right now, it seems to me that implementing the 
leading 'R' in C++'s syntax will be just as difficult/easy as doing my 
delim"stuff"delim proposal so I'm sticking to that idea as my 'vote'.


If C++ way is chosen, I'd suggest the following permutation of the 
delimiters, as I think it looks lighter (by virtue of using smaller 
characters):


r'delim"raw string"delim'
r'"raw string"'

-SL


Re: [rust-dev] RFC: Syntax for raw string literals

2013-09-22 Thread Kevin Ballard
On Sep 22, 2013, at 5:27 PM, SiegeLord slab...@aim.com wrote:

 On 09/22/2013 07:45 PM, Kevin Ballard wrote:
 It would require changing the rules for lifetimes, with no benefit (and no 
 clear new rule to use anyway). 'foo"delim" is perfectly legal today, and I 
 see no reason to change that.
 It's not as big a change as you make it out to be, but fair enough.
 
 Looking at the parser right now, it seems to me that implementing the leading 
 'R' in C++'s syntax will be just as difficult/easy as doing my 
 delim"stuff"delim proposal so I'm sticking to that idea as my 'vote'.

With C++11 syntax, `R"foo` is very obviously the start of a raw string. With 
your syntax, what about `add"foo`? Is that obviously the start of a raw string, 
or did the user just forget to type the ( in their function call? They may look 
the same to a lexer, but I think that being very clear about what starts the 
raw string is beneficial for reading.

 If C++ way is chosen, I'd suggest the following permutation of the 
 delimiters, as I think it looks lighter (by virtue of using smaller 
 characters):
 
 r'delim"raw string"delim'
 r'"raw string"'

I'd really rather not overload the meaning of the ' character, if at all 
possible. Right now it's used for lifetimes, and character literals. Expanding 
it to also be used in string literals just feels like unnecessary overloading. 
We already have a perfectly good " that means string literal. I suppose you 
could flip that to r"delim'raw string'delim" or r"'raw string'". I just don't 
see why that's any better than R"delim(raw string)delim" or R"(raw string)". 
Especially in the r"'raw string'" case, having lots of little tick marks in a 
row takes more effort to visually distinguish. I suppose r"(raw string)" is an 
option, but if we're that close to C++11 we may as well just go whole hog and 
be consistent with their syntax.

-Kevin


Re: [rust-dev] RFC: Syntax for raw string literals

2013-09-21 Thread Jordi Boggiano
On 20.09.2013 22:35, Benjamin Striegel wrote:
 As usual, I'm highly resistant to use of the backtick because Markdown
 uses it pervasively. Not only would this make it very annoying to embed
 Markdown in strings, it can make it impossible to embed inline Rust code
 in Markdown editors. Let's leave the backtick as a metasyntactic symbol.

I am not so sure the markdown argument stands, because it is only an
issue in `inline code blocks` really. Blocks fenced with ``` or 4-space
indents can contain backticks just fine, and typically do in bash
scripts.

In inline blocks, you can always escape them with \` which sure isn't as
nice, but I find it rare to have much more than alpha-numeric
identifiers in inline blocks.

Cheers

-- 
Jordi Boggiano
@seldaek - http://nelm.io/jordi


Re: [rust-dev] RFC: Syntax for raw string literals

2013-09-21 Thread Felix Klock
Kevin (cc'ing rust-dev)-

Of the choices listed here, I prefer the C++11 syntax.

Whatever syntax we choose, I would prefer one that has user-selected delimiting 
character sequences (as illustrated by the cases of D and C++11).  From my 
point-of-view, that is the only way to get a raw string that really means raw 
string; otherwise, you end up having to select some exceptional case (e.g. the 
backslashes, doubled-up quotes, etc of the other options Kevin described).

Cheers,
-Felix

- Original Message -
From: Kevin Ballard ke...@sb.org
To: rust-dev@mozilla.org
Sent: Thursday, September 19, 2013 10:36:39 PM
Subject: [rust-dev] RFC: Syntax for raw string literals

One feature common to many programming languages that Rust lacks is raw 
string literals. Specifically, these are string literals that don't interpret 
backslash-escapes. There are three obvious applications at the moment: regular 
expressions, windows file paths, and format!() strings that want to embed { and 
} chars. I'm sure there are more as well, such as large string literals that 
contain things like HTML text.

I took a look at 3 programming languages to see what solutions they had: D, 
C++11, and Python. I've reproduced their syntax below, plus one more custom 
syntax, along with pros  cons. I'm hoping we can come up with a syntax that 
makes sense for Rust.

## Python syntax:

Python supports an "r" or "R" prefix on any string literal (both short 
strings, delimited with a single quote, or long strings, delimited with 3 
quotes). The "r" or "R" prefix denotes a "raw string", and has the effect of 
disabling backslash-escapes within the string. For the most part. It actually 
gets a bit weird: if a sequence of backslashes of an odd length occurs prior to 
a quote (of the appropriate quote type for the string), then the quote is 
considered to be escaped, but the backslashes are left in the string. This 
means r"foo\"" evaluates to the string `foo\"`, and similarly r"foo\\\"" is 
`foo\\\"`, but r"foo\\" is merely the string `foo\\`.

Pros:
* Simple syntax
* Allows for embedding the closing quote character in the raw string

Cons:
* Handling of backslashes is very bizarre, and the closing quote character can 
only be embedded if you want to have a backslash before it.
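The backslash rule described above can be sketched as a small scanner. This is only an illustration of the rule as stated in this message, not Python's actual lexer; the function name `scan_python_raw` is hypothetical, and it handles only the double-quoted short-string form.

```rust
/// `src` is the source text immediately after the opening `r"`.
/// Returns the literal's contents, or None if no closing quote is found.
fn scan_python_raw(src: &str) -> Option<&str> {
    let mut chars = src.char_indices();
    while let Some((i, c)) = chars.next() {
        match c {
            // A backslash stays in the string, and so does the character
            // after it; that character can never close the literal.
            '\\' => {
                chars.next();
            }
            // An unescaped quote closes the literal.
            '"' => return Some(&src[..i]),
            _ => {}
        }
    }
    None
}
```

Because backslashes are consumed in pairs, an even run of backslashes before a quote leaves the quote free to close the literal, while an odd run swallows it. That is why a raw string in this scheme cannot end in a single backslash.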

## C++11 syntax:

C++11 allows for raw strings using a sequence of the form R"seq(raw text)seq". 
In this construct, `seq` is any sequence of (zero or more) characters except 
for: space, (, ), \, \t, \v, \n, \r. The simplest form looks like R"(raw 
text)", which allows for anything in the raw text except for the sequence `)"`. 
The addition of the delimiter sequence allows for constructing a raw string 
containing any sequence at all (as the delimiter sequence can be adjusted based 
on the represented text).

Pros:
* Allows for embedding any character at all (representable in the source file 
encoding), including the closing quote.
* Reasonably straightforward

Cons:
* Syntax is slightly complicated
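As a rough sketch of how a lexer might read the C++11 form described above: take everything up to the first `(` as the delimiter, then search for `)`, the delimiter, and a closing quote. The function name `scan_cpp11_raw` is hypothetical, and the sketch does not validate the delimiter's character set or enforce any other limits C++11 places on it.

```rust
/// `src` is the source text immediately after the opening `R"`.
/// Returns the raw text, or None if the literal is malformed.
fn scan_cpp11_raw(src: &str) -> Option<&str> {
    // Everything before the first `(` is the user-chosen delimiter.
    let open = src.find('(')?;
    let delim = &src[..open];
    let body = &src[open + 1..];
    // The literal ends at the first `)` + delimiter + `"`.
    let closing = format!("){}\"", delim);
    let end = body.find(closing.as_str())?;
    Some(&body[..end])
}
```

With an empty delimiter the text may not contain `)"`; with a delimiter such as `x`, that same sequence is allowed inside the text.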

## D syntax:

D supports three different forms of raw strings. The first two are similar, 
being r"raw text" and `raw text`. Besides the choice of delimiters, they behave 
identically, in that the raw text may contain anything except for the 
appropriate quote character. The third syntax is a slightly more complicated 
form of C++11's syntax, and is called a delimited string. It takes two forms.

The first looks like q"(raw text)" where the ( may be any non-identifier 
non-whitespace character. If the character is one of [(<{ then it is a nesting 
delimiter, and the close delimiter must be the matching ])>} character, 
otherwise the close delimiter is the same as the open. Furthermore, nesting 
delimiters do exactly what their name says: they nest. If the nesting delimiter 
is (), then any ( in the raw text must be balanced with a ) in the raw text. In 
other words, q"(foo(bar))" evaluates to "foo(bar)", but q"(foo(bar)" and 
q"(foobar))" are both illegal.
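The nesting rule can be sketched with a depth counter. This is an illustration of the rule as described here, not D's actual lexer; the function name `scan_nested` is hypothetical and it covers only the nesting-delimiter case.

```rust
/// `src` is the source text immediately after the opening `q"(`
/// (or whichever nesting delimiter was used).
/// Returns the raw text, or None if the delimiters do not balance.
fn scan_nested(src: &str, open: char, close: char) -> Option<&str> {
    let mut depth = 1usize; // the opening delimiter is already consumed
    for (i, c) in src.char_indices() {
        if c == open {
            depth += 1;
        } else if c == close {
            depth -= 1;
            if depth == 0 {
                // The balancing close must be followed directly by the quote.
                let rest = &src[i + close.len_utf8()..];
                return if rest.starts_with('"') { Some(&src[..i]) } else { None };
            }
        }
    }
    None
}
```

Applied to the three examples above, the first yields `foo(bar)` and the other two are rejected: one never returns to depth zero, and the other reaches depth zero before the final `)`.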

The second uses any identifier as the delimiter. In this case, the identifier 
must immediately be followed by a newline, and in order to close the string, 
the close delimiter must be preceded by a newline. This looks like

q"delim
this is some raw text
delim"

It's essentially a heredoc. Note that the first newline is not part of the 
string, but the final newline is, so this evaluates to "this is some raw 
text\n".

Pros:
* Flexible
* Allows for constructing a raw string that contains any desired sequence of 
characters (representable in the source file's encoding)

Cons:
* Overly complicated

## Custom syntax

There's another approach that none of these three languages take, which is to 
merely allow for doubling up the quote character in order to embed a quote. 
This would look like R"raw string literal ""with embedded quotes"".", which 
becomes `raw string literal "with embedded quotes".`

Pros:
* Very simple
* Allows for embedding the close quote character, and therefore, any character 
(representable in the source file encoding)

Cons:
* Slightly odd to read

Re: [rust-dev] RFC: Syntax for raw string literals

2013-09-21 Thread Carl Eastlund
You always have to have some exceptional case, though, don't you?  What if
you have a string literal that contains every single character?  Or what if
you have literals in procedurally generated code that might contain any
unknown character?  There's always a possibility that a given delimiter
(sequence of) character(s) might be duplicated inside the literal.  Isn't
there?

Carl Eastlund

On Sat, Sep 21, 2013 at 7:24 AM, Felix Klock pnkfe...@mozilla.com wrote:

 Kevin (cc'ing rust-dev)-

 Of the choices listed here, I prefer the C++11 syntax.

 Whatever syntax we choose, I would prefer one that has user-selected
 delimiting character sequences (as illustrated by the cases of D and
 C++11).  From my point-of-view, that is the only way to get a raw string
 that really means raw string; otherwise, you end up having to select some
 exceptional case (e.g. the backslashes, doubled-up quotes, etc of the other
 options Kevin described).

 Cheers,
 -Felix

 - Original Message -
 From: Kevin Ballard ke...@sb.org
 To: rust-dev@mozilla.org
 Sent: Thursday, September 19, 2013 10:36:39 PM
 Subject: [rust-dev] RFC: Syntax for raw string literals

 One feature common to many programming languages that Rust lacks is raw
 string literals. Specifically, these are string literals that don't
 interpret backslash-escapes. There are three obvious applications at the
 moment: regular expressions, windows file paths, and format!() strings that
 want to embed { and } chars. I'm sure there are more as well, such as large
 string literals that contain things like HTML text.

 I took a look at 3 programming languages to see what solutions they had:
 D, C++11, and Python. I've reproduced their syntax below, plus one more
 custom syntax, along with pros  cons. I'm hoping we can come up with a
 syntax that makes sense for Rust.

 ## Python syntax:

 Python supports an r or R prefix on any string literal (both short
 strings, delimited with a single quote, or long strings, delimited with 3
 quotes). The r or R prefix denotes a raw string, and has the effect
 of disabling backslash-escapes within the string. For the most part. It
 actually gets a bit weird: if a sequence of backslashes of an odd length
 occurs prior to a quote (of the appropriate quote type for the string),
 then the quote is considered to be escaped, but the backslashes are left in
 the string. This means rfoo\ evaluates to the string `foo\`, and
 similarly rfoo\\\ is `foo\\\`, but rfoo\\ is merely the string
 `foo\\`.

 Pros:
 * Simple syntax
 * Allows for embedding the closing quote character in the raw string

 Cons:
 * Handling of backslashes is very bizarre, and the closing quote character
 can only be embedded if you want to have a backslash before it.

 ## C++11 syntax:

 C++11 allows for raw strings using a sequence of the form Rseq(raw
 text)seq. In this construct, `seq` is any sequence of (zero or more)
 characters except for: space, (, ), \, \t, \v, \n, \r. The simplest form
 looks like R(raw text), which allows for anything in the raw text except
 for the sequence `)`. The addition of the delimiter sequence allows for
 constructing a raw string containing any sequence at all (as the delimiter
 sequence can be adjusted based on the represented text).

 Pros:
 * Allows for embedding any character at all (representable in the source
 file encoding), including the closing quote.
 * Reasonably straightforward

 Cons:
 * Syntax is slightly complicated

 ## D syntax:

 D supports three different forms of raw strings. The first two are
 similar, being rraw text and `raw text`. Besides the choice of
 delimiters, they behave identically, in that the raw text may contain
 anything except for the appropriate quote character. The third syntax is a
 slightly more complicated form of C++11's syntax, and is called a delimited
 string. It takes two forms.

 The first looks like q(raw text) where the ( may be any non-identifier
 non-whitespace character. If the character is one of [({ then it is a
 nesting delimiter, and the close delimiter must be the matching ])}
 character, otherwise the close delimiter is the same as the open.
 Furthermore, nesting delimiters do exactly what their name says: they nest.
 If the nesting delimiter is (), then any ( in the raw text must be balanced
 with a ) in the raw text. In other words, q(foo(bar)) evaluates to
 foo(bar), but q(foo(bar) and q(foobar)) are both illegal.

 The second uses any identifier as the delimiter. In this case, the
 identifier must immediately be followed by a newline, and in order to close
 the string, the close delimiter must be preceded by a newline. This looks
 like

 q"delim
 this is some raw text
 delim"

 It's essentially a heredoc. Note that the first newline is not part of the
 string, but the final newline is, so this evaluates to "this is some raw
 text\n".

 Pros:
 * Flexible
 * Allows for constructing a raw string that contains any desired sequence
 of characters (representable in the source file's

Re: [rust-dev] RFC: Syntax for raw string literals

2013-09-21 Thread Daniel Micay
On Sat, Sep 21, 2013 at 4:52 PM, Carl Eastlund c...@ccs.neu.edu wrote:

 You always have to have some exceptional case, though, don't you?  What if
 you have a string literal that contains every single character?  Or what if
 you have literals in procedurally generated code that might contain any
 unknown character?  There's always a possibility that a given delimiter
 (sequence of) character(s) might be duplicated inside the literal.  Isn't
 there?

 Carl Eastlund


A shell script's here document or a C++11 raw string literal gives you the
ability to choose the sequence ending the literal. You can always pick an
appropriate end delimiter for a given string.
___
Rust-dev mailing list
Rust-dev@mozilla.org
https://mail.mozilla.org/listinfo/rust-dev


Re: [rust-dev] RFC: Syntax for raw string literals

2013-09-21 Thread Kevin Ballard
The delimiter can be whatever you want in C++11 syntax (well, with restrictions 
on the charset, but among that charset it's freeform). You can _always_ pick a 
delimiter that isn't found in the text.

If you're procedurally generating the text, surely you can also write an 
algorithm to pick a delimiter. It's not very hard to do so.
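
A minimal sketch of such a delimiter-picking algorithm, written in present-day Rust and targeting the C++11 form described in the RFC. The function name `to_cpp_raw_literal` and the choice of a run of `x` characters as the delimiter are illustrative assumptions, not something proposed in this thread. It also ignores C++11's cap on delimiter length (16 characters, as far as I know), so a pathological input would need a smarter search.

```rust
/// Wraps `text` in a C++11-style raw literal, R"delim(text)delim",
/// growing the delimiter until the closing sequence cannot occur in the text.
fn to_cpp_raw_literal(text: &str) -> String {
    let mut delim = String::new();
    // The literal ends at the first `)delim"`, so that exact sequence
    // must not appear anywhere inside the text itself.
    while text.contains(&format!("){}\"", delim)) {
        delim.push('x');
    }
    format!("R\"{d}({t}){d}\"", d = delim, t = text)
}
```

The loop always terminates, because a finite text can only contain finitely many of the candidate closing sequences.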

-Kevin

On Sep 21, 2013, at 1:52 PM, Carl Eastlund c...@ccs.neu.edu wrote:

 You always have to have some exceptional case, though, don't you?  What if 
 you have a string literal that contains every single character?  Or what if 
 you have literals in procedurally generated code that might contain any 
 unknown character?  There's always a possibility that a given delimiter 
 (sequence of) character(s) might be duplicated inside the literal.  Isn't 
 there?
 
 Carl Eastlund
 
 On Sat, Sep 21, 2013 at 7:24 AM, Felix Klock pnkfe...@mozilla.com wrote:
 Kevin (cc'ing rust-dev)-
 
 Of the choices listed here, I prefer the C++11 syntax.
 
 Whatever syntax we choose, I would prefer one that has user-selected 
 delimiting character sequences (as illustrated by the cases of D and C++11).  
 From my point-of-view, that is the only way to get a raw string that really 
 means raw string; otherwise, you end up having to select some exceptional 
 case (e.g. the backslashes, doubled-up quotes, etc of the other options Kevin 
 described).
 
 Cheers,
 -Felix
 
 - Original Message -
 From: Kevin Ballard ke...@sb.org
 To: rust-dev@mozilla.org
 Sent: Thursday, September 19, 2013 10:36:39 PM
 Subject: [rust-dev] RFC: Syntax for raw string literals
 
 One feature common to many programming languages that Rust lacks is raw 
 string literals. Specifically, these are string literals that don't interpret 
 backslash-escapes. There are three obvious applications at the moment: 
 regular expressions, windows file paths, and format!() strings that want to 
 embed { and } chars. I'm sure there are more as well, such as large string 
 literals that contain things like HTML text.
 
 I took a look at 3 programming languages to see what solutions they had: D, 
 C++11, and Python. I've reproduced their syntax below, plus one more custom 
 syntax, along with pros & cons. I'm hoping we can come up with a syntax that 
 makes sense for Rust.
 
 ## Python syntax:
 
 Python supports an r or R prefix on any string literal (both short 
 strings, delimited with a single quote, or long strings, delimited with 3 
 quotes). The r or R prefix denotes a raw string, and has the effect of 
 disabling backslash-escapes within the string. For the most part. It actually 
 gets a bit weird: if a sequence of backslashes of an odd length occurs prior 
 to a quote (of the appropriate quote type for the string), then the quote is 
 considered to be escaped, but the backslashes are left in the string. This 
 means r"foo\"" evaluates to the string `foo\"`, and similarly r"foo\\\"" is 
 `foo\\\"`, but r"foo\\" is merely the string `foo\\`.
 
 Pros:
 * Simple syntax
 * Allows for embedding the closing quote character in the raw string
 
 Cons:
 * Handling of backslashes is very bizarre, and the closing quote character 
 can only be embedded if you want to have a backslash before it.
 
 ## C++11 syntax:
 
 C++11 allows for raw strings using a sequence of the form R"seq(raw 
 text)seq". In this construct, `seq` is any sequence of (zero or more) 
 characters except for: space, (, ), \, \t, \v, \n, \r. The simplest form 
 looks like R"(raw text)", which allows for anything in the raw text except 
 for the sequence `)"`. The addition of the delimiter sequence allows for 
 constructing a raw string containing any sequence at all (as the delimiter 
 sequence can be adjusted based on the represented text).
 
 Pros:
 * Allows for embedding any character at all (representable in the source file 
 encoding), including the closing quote.
 * Reasonably straightforward
 
 Cons:
 * Syntax is slightly complicated
 
 ## D syntax:
 
 D supports three different forms of raw strings. The first two are similar, 
 being r"raw text" and `raw text`. Besides the choice of delimiters, they 
 behave identically, in that the raw text may contain anything except for the 
 appropriate quote character. The third syntax is a slightly more complicated 
 form of C++11's syntax, and is called a delimited string. It takes two forms.
 
 The first looks like q"(raw text)" where the ( may be any non-identifier 
 non-whitespace character. If the character is one of [({ then it is a 
 nesting delimiter, and the close delimiter must be the matching ])} 
 character, otherwise the close delimiter is the same as the open. 
 Furthermore, nesting delimiters do exactly what their name says: they nest. 
 If the nesting delimiter is (), then any ( in the raw text must be balanced 
 with a ) in the raw text. In other words, q"(foo(bar))" evaluates to 
 foo(bar), but q"(foo(bar)" and q"(foobar))" are both illegal.
 
 The second uses any identifier as the delimiter. In this case, the identifier 
 must

Re: [rust-dev] RFC: Syntax for raw string literals

2013-09-20 Thread Andrew Dunham
The way that Lua does raw strings is also fairly nifty.  Check out
http://www.lua.org/manual/5.2/manual.html, section 3.1, or, in short:

- Strings can be delimited by [===[, with any number of equals signs.
 The corresponding closing delimiter must match the original number of
equals signs.
- No escaping is done.
- Any kind of end-of-line sequence (i.e. \r and \n in any order) is
converted to just a newline.
- It can run for multiple lines.
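
To make the level-matching rule concrete, here is a rough sketch in present-day Rust of a scanner for this kind of literal. The name `parse_long_bracket` is hypothetical, and the sketch deliberately leaves out the end-of-line normalization mentioned above; it only shows how counting the equals signs determines the closing delimiter.

```rust
/// Parses a Lua-style long-bracket literal at the start of `src`.
/// Returns the raw contents and the number of bytes consumed,
/// or `None` if `src` does not start with a complete literal.
fn parse_long_bracket(src: &str) -> Option<(&str, usize)> {
    let rest = src.strip_prefix('[')?;
    // The "level" is the number of '=' between the two opening brackets.
    let level = rest.bytes().take_while(|&b| b == b'=').count();
    let body = rest[level..].strip_prefix('[')?;
    // Only a closing bracket with the same level ends the literal.
    let close = format!("]{}]", "=".repeat(level));
    let end = body.find(&close)?;
    let consumed = 1 + level + 1 + end + close.len();
    Some((&body[..end], consumed))
}
```

With level 2, a `]]` inside the text is just ordinary content, and backslashes are passed through untouched.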

--Andrew D


On Thu, Sep 19, 2013 at 10:28 PM, Kevin Cantu m...@kevincantu.org wrote:

 I think designing good traits to support all these text implementations is
 far more important than whatever hungarian notation is preferred for
 literals.


 Kevin


 On Thu, Sep 19, 2013 at 2:50 PM, Martin DeMello 
 martindeme...@gmail.com wrote:

 Ah, good point. You could fix it by having a very small whitelist of
 acceptable delimiters, but that probably takes it into overcomplex
 territory.

 martin

 On Thu, Sep 19, 2013 at 2:46 PM, Kevin Ballard ke...@sb.org wrote:
  As I just responded to Masklinn, this is ambiguous. How do you lex `do
 R{foo()}`?
 
  -Kevin
 
  On Sep 19, 2013, at 2:41 PM, Martin DeMello martindeme...@gmail.com
 wrote:
 
  Yes, I figured R followed by a non-alphabetical character could serve
  the same purpose as ruby's %char.
 
  martin
 
  On Thu, Sep 19, 2013 at 2:37 PM, Kevin Ballard ke...@sb.org wrote:
  I didn't look at Ruby's syntax, but what you just described sounds a
 little too free-form to me. I believe Ruby at least requires a % as part of
 the syntax, e.g. %q{test}. But I don't think %R{test} is a good idea for
 rust, as it would conflict with the % operator. I don't think other
 punctuation would work well either.
 
  -Kevin
 
  On Sep 19, 2013, at 2:10 PM, Martin DeMello martindeme...@gmail.com
 wrote:
 
  How complicated would it be to use R but with arbitrary paired
  delimiters (the way, for instance, ruby does it)? It's very handy to
  pick a delimiter you know does not appear in the string, e.g. if you
  had a string containing ')' you could use R{this is a string with a )
  in it} or R|this is a string with a ) in it|.
 
  martin
 
  On Thu, Sep 19, 2013 at 1:36 PM, Kevin Ballard ke...@sb.org wrote:
  One feature common to many programming languages that Rust lacks is
 raw string literals. Specifically, these are string literals that don't
 interpret backslash-escapes. There are three obvious applications at the
 moment: regular expressions, windows file paths, and format!() strings that
 want to embed { and } chars. I'm sure there are more as well, such as large
 string literals that contain things like HTML text.
 
  I took a look at 3 programming languages to see what solutions they
 had: D, C++11, and Python. I've reproduced their syntax below, plus one
 more custom syntax, along with pros & cons. I'm hoping we can come up with
 a syntax that makes sense for Rust.
 
  ## Python syntax:
 
  Python supports an r or R prefix on any string literal (both
 short strings, delimited with a single quote, or long strings,
 delimited with 3 quotes). The r or R prefix denotes a raw string, and
 has the effect of disabling backslash-escapes within the string. For the
 most part. It actually gets a bit weird: if a sequence of backslashes of an
 odd length occurs prior to a quote (of the appropriate quote type for the
 string), then the quote is considered to be escaped, but the backslashes
 are left in the string. This means r"foo\"" evaluates to the string
 `foo\"`, and similarly r"foo\\\"" is `foo\\\"`, but r"foo\\" is merely the
 string `foo\\`.
 
  Pros:
  * Simple syntax
  * Allows for embedding the closing quote character in the raw string
 
  Cons:
  * Handling of backslashes is very bizarre, and the closing quote
 character can only be embedded if you want to have a backslash before it.
 
  ## C++11 syntax:
 
  C++11 allows for raw strings using a sequence of the form R"seq(raw
 text)seq". In this construct, `seq` is any sequence of (zero or more)
 characters except for: space, (, ), \, \t, \v, \n, \r. The simplest form
 looks like R"(raw text)", which allows for anything in the raw text except
 for the sequence `)"`. The addition of the delimiter sequence allows for
 constructing a raw string containing any sequence at all (as the delimiter
 sequence can be adjusted based on the represented text).
 
  Pros:
  * Allows for embedding any character at all (representable in the
 source file encoding), including the closing quote.
  * Reasonably straightforward
 
  Cons:
  * Syntax is slightly complicated
 
  ## D syntax:
 
  D supports three different forms of raw strings. The first two are
 similar, being r"raw text" and `raw text`. Besides the choice of
 delimiters, they behave identically, in that the raw text may contain
 anything except for the appropriate quote character. The third syntax is a
 slightly more complicated form of C++11's syntax, and is called a delimited
 string. It takes two forms.
 
  The first looks like q"(raw text)" where the ( may 

Re: [rust-dev] RFC: Syntax for raw string literals

2013-09-20 Thread Masklinn
On 2013-09-19, at 23:45 , Kevin Ballard wrote:
 Yes I know, but in my (rather limited) experience with Python, triple-quoted 
 strings are typically used for docstrings. It was just an example anyway.

They're also commonly used for multiline strings as single-quoted strings don't 
require it.

 
 * The quote-escaping oddness is less of an issue in Python as you can
 also use single-quotes for delimiting, or use triple-quoted strings
 (if you need to embed both single and double quotes in rawstrings).
 
 If I need to embed both ''' and """ in a string, I'm out of luck.

The chance of that is as remote as can be. I've never seen or heard of
it happen. And mind, the issue must happen *in a rawstring* which is
even more unlikely.

 Also,
 
 windows file paths
 
 windows paths can also use forward slashes so that's not a very
 interesting justification.
 
 Not always. UNC paths must start with \\ (in my testing, //foo/bar/baz is not 
 interpreted as a UNC path by the Windows File Explorer, but \\foo/bar/baz is).

True. Do you expect writing literal UNC paths in Rust to be a common occurrence?

 There's also paths that start with the verbatim prefix \\?\, which disables 
 interpretation of forward-slashes (among other things).

That's not really relevant to a rawstrings proposal, why would a
developer embed such a path literally?

 As I am actively engaged in writing a replacement for the path module, and am 
 currently expanding the test suite for Windows paths, raw strings would be 
 extremely useful to me.

I'd have thought it a better idea to use path builders (maybe macros)
and avoid embedding literal path separators in order to avoid
portability issues.


Re: [rust-dev] RFC: Syntax for raw string literals

2013-09-20 Thread Kevin Ballard
On Sep 20, 2013, at 1:13 AM, Masklinn maskl...@masklinn.net wrote:

 Also,
 
 windows file paths
 
 windows paths can also use forward slashes so that's not a very
 interesting justification.
 
 Not always. UNC paths must start with \\ (in my testing, //foo/bar/baz is 
 not interpreted as a UNC path by the Windows File Explorer, but 
 \\foo/bar/baz is).
 
 True. Do you expect writing literal UNC paths in Rust to be a common 
 occurrence?

Maybe not for most people, but I've been writing them a _lot_ lately (I'm 
rewriting the path module).

Regular expressions is really the most common application here.

 There's also paths that start with the verbatim prefix \\?\, which disables 
 interpretation of forward-slashes (among other things).
 
 That's not really relevant to a rawstrings proposal, why would a
 developer embed such a path literally?

Perhaps they want to hard-code a path that refers to something that requires 
the \\?\ prefix (such as a path that contains / as part of a path component, or 
is longer than 255 characters).

But just in general, \ is the canonical Windows path separator. I don't think 
"use /" is particularly great advice. What if this string is intended for 
displaying?
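
For a sense of what is at stake, here is a small comparison in present-day Rust, which postdates this thread and ended up with an `r"..."` raw literal form. The path `C:\some\dir` and the function names are made up for illustration; the point is only that every backslash has to be doubled in an ordinary literal.

```rust
/// A verbatim-prefixed Windows path written with ordinary escapes.
fn verbatim_path_escaped() -> &'static str {
    "\\\\?\\C:\\some\\dir"
}

/// The same path written as a raw literal: what you see is what you get.
fn verbatim_path_raw() -> &'static str {
    r"\\?\C:\some\dir"
}
```

Both functions return the same 15-character string; only the second reads like the path it denotes.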

 As I am actively engaged in writing a replacement for the path module, and 
 am currently expanding the test suite for Windows paths, raw strings would 
 be extremely useful to me.
 
 I'd have thought it a better idea to use path builders (maybe macros)
 and avoid embedding literal path separators in order to avoid
 portability issues.

People still use literal path separators in strings all the time in languages 
that support path-building methods.

-Kevin


Re: [rust-dev] RFC: Syntax for raw string literals

2013-09-20 Thread Marijn Haverbeke
 If I need to embed both ''' and """ in a string, I'm out of luck.

 The chance of that is as remote as can be. I've never seen or heard of
 it happen. And mind, the issue must happen *in a rawstring* which is
 even more unlikely.

You should note that, as soon as you include something in the language
itself, that creates meaningful strings (programs in the language)
that include the token, which are quite likely, at some point, to need
to be written as a multiline string in the language itself.

(As a related example, as someone writing JavaScript-analyzing code in
JavaScript, I've had several bugs caused by the fact that the
nonsense, no-one-is-ever-going-to-use-this word __proto__ has a very
hard to suppress special meaning, and you *are* going to use it when
analyzing the elements in another JavaScript program.)


Re: [rust-dev] RFC: Syntax for raw string literals

2013-09-20 Thread Masklinn
On 2013-09-20, at 10:26 , Marijn Haverbeke wrote:
 If I need to embed both ''' and """ in a string, I'm out of luck.
 
 The chance of that is as remote as can be. I've never seen or heard of
 it happen. And mind, the issue must happen *in a rawstring* which is
 even more unlikely.
 
 You should note that, as soon as you include something in the language
 itself, that creates meaningful strings (programs in the language)
 that include the token, which are not likely, at some point, to need
 to be written as a multiline string in the language itself.

It's already noted, my objections are very much that this is highly
unlikely to be an issue as it only comes to a head when needing
*triple-quoted rawstrings* to include *their own* delimiters
(meaning a triple-quoted rawstring which needs to include both
triple-quoted delimiters at the same time).

Even unlikelier given python will concatenate string literals during
parsing.

On 2013-09-20, at 10:25 , Kevin Ballard wrote:
 Regular expressions is really the most common application here.

Right, which was just about all I was saying in the original message.

 People still use literal path separators in strings all the time in languages 
 that support path-building methods.

Something I don't believe should be encouraged.


Re: [rust-dev] RFC: Syntax for raw string literals

2013-09-20 Thread Andres Osinski
Out of all the mentioned syntaxes, Python's seems simple and easy (and the
corner cases appear to be fairly unlikely for the actual use cases for raw
strings), Ruby's seems very powerful and if a couple of restrictions are
added could probably fit well, and Lua's seems very well designed by
allowing delimiters of arbitrary length.

As a user of higher-level languages, all of these seem appealing to me. I
don't really feel that rawstring should be complicated to use, and I don't
really think the limitations are bad so long as they are explicitly
documented (which is how it should be).


On Fri, Sep 20, 2013 at 5:38 AM, Masklinn maskl...@masklinn.net wrote:

 On 2013-09-20, at 10:26 , Marijn Haverbeke wrote:
  If I need to embed both ''' and """ in a string, I'm out of luck.
 
  The chance of that is as remote as can be. I've never seen or heard of
  it happen. And mind, the issue must happen *in a rawstring* which is
  even more unlikely.
 
  You should note that, as soon as you include something in the language
  itself, that creates meaningful strings (programs in the language)
  that include the token, which are not likely, at some point, to need
  to be written as a multiline string in the language itself.

 It's already noted, my objections are very much that this is highly
 unlikely to be an issue as it only comes to a head when needing
 *triple-quoted rawstrings* to include *their own* delimiters
 (meaning a triple-quoted rawstring which needs to include both
 triple-quoted delimiters at the same time).

 Even unlikelier given python will concatenate string literals during
 parsing.

 On 2013-09-20, at 10:25 , Kevin Ballard wrote:
  Regular expressions is really the most common application here.

 Right, which was just about all I was saying in the original message.

  People still use literal path separators in strings all the time in
 languages that support path-building methods.

 Something I don't believe should be encouraged.




-- 
Andrés Osinski
http://www.andresosinski.com.ar/


Re: [rust-dev] RFC: Syntax for raw string literals

2013-09-20 Thread Kevin Ballard
Python's has really stupid handling of backslashes, and I really don't like how 
it cannot represent all valid strings. I'd really prefer not to make that same 
mistake.

Ruby's syntax cannot be used because % lexes as an operator.

Of the 3, Lua's is probably the best, although it's a bit esoteric (with using 
[[ and nary a quote in sight). It seems roughly equivalent to C++11's syntax 
though, both in ease of use and flexibility.

-Kevin

On Sep 20, 2013, at 1:41 AM, Andres Osinski andres.osin...@gmail.com wrote:

 Out of all the mentioned syntaxes, Python's seems simple and easy (and the 
 corner cases appear to be fairly unlikely for the actual use cases for raw 
 strings), Ruby's seems very powerful and if a couple of restrictions are 
 added could probably fit well, and Lua's seems very well designed by allowing 
 delimiters of arbitrary length.
 
 As a user of higher-level languages, all of these seem appealing to me. I 
 don't really feel that rawstring should be complicated to use, and I don't 
 really think the limitations are bad so long as they are explicitly documented 
 (which is how it should be).
 
 
 On Fri, Sep 20, 2013 at 5:38 AM, Masklinn maskl...@masklinn.net wrote:
 On 2013-09-20, at 10:26 , Marijn Haverbeke wrote:
  If I need to embed both ''' and """ in a string, I'm out of luck.
 
  The chance of that is as remote as can be. I've never seen or heard of
  it happen. And mind, the issue must happen *in a rawstring* which is
  even more unlikely.
 
  You should note that, as soon as you include something in the language
  itself, that creates meaningful strings (programs in the language)
  that include the token, which are not likely, at some point, to need
  to be written as a multiline string in the language itself.
 
 It's already noted, my objections are very much that this is highly
 unlikely to be an issue as it only comes to a head when needing
 *triple-quoted rawstrings* to include *their own* delimiters
 (meaning a triple-quoted rawstring which needs to include both
 triple-quoted delimiters at the same time).
 
 Even unlikelier given python will concatenate string literals during
 parsing.
 
 On 2013-09-20, at 10:25 , Kevin Ballard wrote:
  Regular expressions is really the most common application here.
 
 Right, which was just about all I was saying in the original message.
 
  People still use literal path separators in strings all the time in 
  languages that support path-building methods.
 
 Something I don't believe should be encouraged.
 
 
 
 -- 
 Andrés Osinski
 http://www.andresosinski.com.ar/



Re: [rust-dev] RFC: Syntax for raw string literals

2013-09-20 Thread Thad Guidry
Does it HAVE to be a single typed char seen on the English 101 keyboard ?

History Lesson:
The industry in the very early, early days of printing, storing, and
processing characters, both English and non-English, came up with a
solution around the use of Control Characters.

ASCII Char 1 is known as Start of Header, or abbreviated SOH.
ASCII Char 2 is known as Start of Text, or abbreviated STX.
ASCII Char 3 is known as End of Text, or abbreviated ETX.

It got me thinking of how various industries to this day still use Start of
Text and End of Text... what we are discussing as enclosing a String
verbatim.

Many data operations that I perform with conversion of string fields are
actually done by first wrapping with Control Chars [1] to enclose the
String LITERALLY.
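
As a rough illustration of that framing idea, here is a sketch in present-day Rust. The helper names `frame` and `unframe` are hypothetical, and nothing here is a proposed literal syntax; it only shows text enclosed between STX and ETX with no escape processing in between.

```rust
const STX: char = '\u{02}'; // ASCII Start of Text
const ETX: char = '\u{03}'; // ASCII End of Text

/// Encloses `text` between STX and ETX without touching its contents.
fn frame(text: &str) -> String {
    format!("{}{}{}", STX, text, ETX)
}

/// Returns the text between a leading STX and the first ETX, if both exist.
fn unframe(framed: &str) -> Option<&str> {
    let body = framed.strip_prefix(STX)?;
    let end = body.find(ETX)?;
    Some(&body[..end])
}
```

Quotes and backslashes pass through unchanged, but text that itself contains ETX would be cut short, which is the same kind of exceptional case raised elsewhere in this thread for fixed delimiters.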

Apple's Enterprise Partner Feed is an example that uses such basic Control
Chars to separate fields and interestingly uses multibyte EOL Control Chars
to retain even unicode contents (Foreign Language strings, that use quotes
of a different nature at times [2] and that sometimes appear in its fields
and that need to be retained inside a database field as well.)

I am wondering if doing something similar to what the industry does with
using Control Chars to represent a STX or ETX would not be even wiser to
supplant String Literal ?  i.e.  do not reinvent the fast spinning wheel
that also has built-in never go flat technology. :)

[1]
http://www.theasciicode.com.ar/ascii-control-characters/start-of-text-ascii-code-2.html
[2] http://en.wikipedia.org/wiki/Non-English_usage_of_quotation_marks

Thoughts ?

-- 
-Thad
Thad on Freebase.com http://www.freebase.com/view/en/thad_guidry
Thad on LinkedIn http://www.linkedin.com/in/thadguidry/


Re: [rust-dev] RFC: Syntax for raw string literals

2013-09-20 Thread Benjamin Striegel
As usual, I'm highly resistant to use of the backtick because Markdown uses
it pervasively. Not only would this make it very annoying to embed Markdown
in strings, it can make it impossible to embed inline Rust code in Markdown
editors. Let's leave the backtick as a metasyntactic symbol.


On Fri, Sep 20, 2013 at 3:45 PM, Kevin Ballard ke...@sb.org wrote:

 I considered backtick as well. If that approach is used, I would suggest
 that a doubled-up backtick represent a single backtick in the string, i.e.
 `error: path ``{}' failed`. This is pretty much equivalent to just using
 r as the syntax, although backtick may be a slightly nicer syntax for it.

 -Kevin

 On Sep 20, 2013, at 9:27 AM, Alex Crichton a...@crichton.co wrote:

  Of the 3, Lua's is probably the best, although it's a bit esoteric (with
  using [[ and nary a quote in sight).
 
  I think an important thing to keep in mind is that the main reason
  behind creating a new form of literal is for things like:
 
  * Escapes in format! strings
  * Possible regular expression syntax (this also may be a syntax
 extension)
  * Type literal windows paths (escaping \ is hard)
  * Otherwise long literals which may contain quotes (like html text)
 
  With those in mind, although Lua's syntax is sufficient, is it nice to
  use? If the first thing I saw as an introduction to Rust was:
 
  fn main() {
   println!([[Hello, {}!]], "world");
  }
 
  I would be a little confused. Now the [[/]] aren't really necessary in
  this case, but I'm personally unsure of how usable [[/]] would be
  throughout the language. Raw literals in languages like C++ and Lua I
  think aren't intended to be used that often. Instead they should be
  used only when necessary, and you frequently don't see them in code.
  For rust, the use cases which are the cause of this discussion are
  actually fairly common, and I'm not sure that we'd want to see [[/]]
  all over the place, although of course that's just my opinion :)
 
  Skimming back, I haven't seen a suggestion of the backtick character
  as a delimiter. Go takes this approach, and I don't believe that in Go
  you can have a backtick anywhere in a backtick literal, and otherwise
  what you see is what you get. It's at least something to consider,
  though.




[rust-dev] RFC: Syntax for raw string literals

2013-09-19 Thread Kevin Ballard
One feature common to many programming languages that Rust lacks is raw 
string literals. Specifically, these are string literals that don't interpret 
backslash-escapes. There are three obvious applications at the moment: regular 
expressions, windows file paths, and format!() strings that want to embed { and 
} chars. I'm sure there are more as well, such as large string literals that 
contain things like HTML text.

I took a look at 3 programming languages to see what solutions they had: D, 
C++11, and Python. I've reproduced their syntax below, plus one more custom 
syntax, along with pros & cons. I'm hoping we can come up with a syntax that 
makes sense for Rust.

## Python syntax:

Python supports an r or R prefix on any string literal (both short 
strings, delimited with a single quote, or long strings, delimited with 3 
quotes). The r or R prefix denotes a raw string, and has the effect of 
disabling backslash-escapes within the string. For the most part. It actually 
gets a bit weird: if a sequence of backslashes of an odd length occurs prior to 
a quote (of the appropriate quote type for the string), then the quote is 
considered to be escaped, but the backslashes are left in the string. This 
means r"foo\"" evaluates to the string `foo\"`, and similarly r"foo\\\"" is 
`foo\\\"`, but r"foo\\" is merely the string `foo\\`.

Pros:
* Simple syntax
* Allows for embedding the closing quote character in the raw string

Cons:
* Handling of backslashes is very bizarre, and the closing quote character can 
only be embedded if you want to have a backslash before it.
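
The odd-length rule falls out of a scanner that treats a backslash plus the following character as a unit while still copying both through. A sketch in present-day Rust, assuming (hypothetically) that `src` begins just after the opening `r"` of a double-quoted Python raw string; `scan_python_raw` is an illustrative name, not a real API.

```rust
/// Scans the body of a Python-style raw string and returns its contents,
/// or `None` if the literal is never closed.
fn scan_python_raw(src: &str) -> Option<&str> {
    let mut chars = src.char_indices();
    while let Some((i, c)) = chars.next() {
        match c {
            '\\' => {
                // A backslash protects the next character from ending the
                // string, but both characters stay in the result.
                chars.next()?;
            }
            '"' => return Some(&src[..i]),
            _ => {}
        }
    }
    None
}
```

Fed the body of r"foo\"" it yields `foo\"`, while the body of r"foo\" never terminates, so a raw string cannot end in an odd number of backslashes.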

## C++11 syntax:

C++11 allows for raw strings using a sequence of the form R"seq(raw text)seq". 
In this construct, `seq` is any sequence of (zero or more) characters except 
for: space, (, ), \, \t, \v, \n, \r. The simplest form looks like R"(raw 
text)", which allows for anything in the raw text except for the sequence `)"`. 
The addition of the delimiter sequence allows for constructing a raw string 
containing any sequence at all (as the delimiter sequence can be adjusted based 
on the represented text).

Pros:
* Allows for embedding any character at all (representable in the source file 
encoding), including the closing quote.
* Reasonably straightforward

Cons:
* Syntax is slightly complicated

## D syntax:

D supports three different forms of raw strings. The first two are similar, 
being r"raw text" and `raw text`. Besides the choice of delimiters, they behave 
identically, in that the raw text may contain anything except for the 
appropriate quote character. The third syntax is a slightly more complicated 
form of C++11's syntax, and is called a delimited string. It takes two forms.

The first looks like q"(raw text)" where the ( may be any non-identifier 
non-whitespace character. If the character is one of [({ then it is a nesting 
delimiter, and the close delimiter must be the matching ])} character, 
otherwise the close delimiter is the same as the open. Furthermore, nesting 
delimiters do exactly what their name says: they nest. If the nesting delimiter 
is (), then any ( in the raw text must be balanced with a ) in the raw text. In 
other words, q"(foo(bar))" evaluates to foo(bar), but q"(foo(bar)" and 
q"(foobar))" are both illegal.

The second uses any identifier as the delimiter. In this case, the identifier 
must immediately be followed by a newline, and in order to close the string, 
the close delimiter must be preceded by a newline. This looks like

q"delim
this is some raw text
delim"

It's essentially a heredoc. Note that the first newline is not part of the 
string, but the final newline is, so this evaluates to "this is some raw 
text\n".

Pros:
* Flexible
* Allows for constructing a raw string that contains any desired sequence of 
characters (representable in the source file's encoding)

Cons:
* Overly complicated
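
The nesting rule for D's delimited strings amounts to a depth counter. A rough sketch in present-day Rust, assuming (hypothetically) that `src` begins just after the opening `q"(` and that the literal must close with the matching delimiter immediately followed by `"`; `scan_nested` is an illustrative name and only the nesting-delimiter case is handled.

```rust
/// Scans a D-style nesting delimited string body. Returns the contents if
/// the delimiters balance and the final close is directly followed by `"`.
fn scan_nested(src: &str, open: char, close: char) -> Option<&str> {
    let mut depth = 1usize; // the opening delimiter has already been consumed
    for (i, c) in src.char_indices() {
        if c == open {
            depth += 1;
        } else if c == close {
            depth -= 1;
            if depth == 0 {
                let after = &src[i + close.len_utf8()..];
                return if after.starts_with('"') {
                    Some(&src[..i])
                } else {
                    None
                };
            }
        }
    }
    None // ran out of input with delimiters still open
}
```

An unbalanced open delimiter leaves the depth above zero at the end of input, and an extra close delimiter ends the body before the quote, so both are rejected.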

## Custom syntax

There's another approach that none of these three languages take, which is to 
merely allow for doubling up the quote character in order to embed a quote. 
This would look like Rraw string literal with embedded quotes., which 
becomes `raw string literal with embedded quotes`.

Pros:
* Very simple
* Allows for embedding the close quote character, and therefore, any character 
(representable in the source file encoding)

Cons:
* Slightly odd to read
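
A minimal sketch of how such a doubled-quote literal could be decoded, in present-day Rust. It assumes (hypothetically) that `src` begins just after the opening `R"`; `scan_doubled` is an illustrative name, not part of any proposal here.

```rust
/// Decodes the body of a doubled-quote raw string: `""` stands for one `"`,
/// and a lone `"` ends the literal. Returns `None` if it is never closed.
fn scan_doubled(src: &str) -> Option<String> {
    let mut out = String::new();
    let mut chars = src.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '"' {
            if chars.peek() == Some(&'"') {
                chars.next(); // consume the second quote of the pair
                out.push('"');
            } else {
                return Some(out); // a single quote closes the literal
            }
        } else {
            out.push(c); // everything else, backslashes included, is literal
        }
    }
    None
}
```

Unlike the delimiter-based forms, the decoded string has to be built up rather than sliced out of the source, since the text in the literal no longer maps one-to-one onto the result.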

## Conclusion

Of the three existing syntaxes examined here, I think C++11's is the best. It 
ties with D's syntax for being the most powerful, but is simpler than D's. The 
custom syntax is just as powerful though. The benefit of the C++11 syntax over 
the custom syntax is it's slightly easier to read the C++11 syntax, as the raw 
text has a one-to-one mapping with the resulting string. The custom syntax is a 
bit more confusing to read, especially if you want to add multiple quotes. As a 
pathological case, let's try representing a Python triple-quoted docstring 
using both syntaxes:

C++11: R"("""this is a python docstring""")"
Custom: R"""""""this is a python docstring"""""""

Based on this 

Re: [rust-dev] RFC: Syntax for raw string literals

2013-09-19 Thread Masklinn
On 2013-09-19, at 22:36 , Kevin Ballard wrote:
 
 I welcome any comments, criticisms, or suggestions.

* C# also has rawstrings, which were not looked at. C#'s rawstrings
  disable escaping entirely but add a new one: doubling quotes will insert
  a single quote in the resulting string (similar to quote-escaping in
  SQL or Smalltalk).
* The docstring comment is incorrect, a docstring is a string in the
  first position of a module, a class statement or a function statement.
  A single-quoted string at these positions will yield a docstring.

  The triple-quoting is a string syntax embedding newlines (single-quoted
  strings can not contain literal newlines in Python, only escaped ones).
  Obviously, triple-quoted python string can be raw.
* The quote-escaping oddness is less of an issue in Python as you can
  also use single-quotes for delimiting, or use triple-quoted strings
  (if you need to embed both single and double quotes in rawstrings).
* Perl's quotes and quote-like operators would certainly deserve mention.

Also,

 windows file paths

windows paths can also use forward slashes so that's not a very
interesting justification.
___
Rust-dev mailing list
Rust-dev@mozilla.org
https://mail.mozilla.org/listinfo/rust-dev


Re: [rust-dev] RFC: Syntax for raw string literals

2013-09-19 Thread Oren Ben-Kiki
Just to make sure - how does the C++ syntax behave in the presence of line
breaks? Specifically, what does it do with leading (and trailing) white
space of each line? My guess is that they would be included in the string,
is that correct?

At any rate, having some sort of here documents would be very nice. The C++
syntax is reasonable, though I really don't have a strong preference here.
It might be more Rust-ish to use a macro notation instead:
str!(delimiter.delimiter), or something like that.

BTW, I found myself creating (in several languages) an unindent string
function that would (1) if the string starts with a line break, remove it;
(2) remove the leading white space of the 1st line from all the lines.
Applying this to here documents allows indenting them together with the
code that includes them. In Rust, the downside of this approach is that the
result isn't 'static any more... Not that this warrants making such
complex functionality a built-in of the syntax, of course.
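
For reference, the two steps described above could look roughly like this in Rust. This is a sketch of the idea, not anyone's actual helper: the name `unindent` and the choice to treat only spaces and tabs as indentation are assumptions. It returns an owned `String`, which is exactly the "not 'static any more" downside mentioned.

```rust
/// Sketch of the "unindent" helper described above:
/// (1) if the string starts with a line break, remove it;
/// (2) remove the leading white space of the first line from all lines.
fn unindent(s: &str) -> String {
    // Step 1: drop a single leading line break, if present.
    let s = s
        .strip_prefix("\r\n")
        .or_else(|| s.strip_prefix('\n'))
        .unwrap_or(s);

    // The indentation to strip is whatever spaces/tabs lead the first line.
    let first = s.lines().next().unwrap_or("");
    let trimmed = first.trim_start_matches(|c: char| c == ' ' || c == '\t');
    let indent = &first[..first.len() - trimmed.len()];

    // Step 2: remove that exact prefix from every line that starts with it;
    // lines with less indentation are left untouched.
    s.split('\n')
        .map(|line| line.strip_prefix(indent).unwrap_or(line))
        .collect::<Vec<_>>()
        .join("\n")
}
```

For example, `unindent("\n    foo\n      bar\n    baz")` gives `"foo\n  bar\nbaz"`, so relative indentation inside the text is preserved.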

Oren.


Re: [rust-dev] RFC: Syntax for raw string literals

2013-09-19 Thread Kevin Ballard
I didn't look at Ruby's syntax, but what you just described sounds a little too 
free-form to me. I believe Ruby at least requires a % as part of the syntax, 
e.g. %q{test}. But I don't think %R{test} is a good idea for rust, as it would 
conflict with the % operator. I don't think other punctuation would work well 
either.

-Kevin

On Sep 19, 2013, at 2:10 PM, Martin DeMello martindeme...@gmail.com wrote:

 How complicated would it be to use R but with arbitrary paired
 delimiters (the way, for instance, ruby does it)? It's very handy to
 pick a delimiter you know does not appear in the string, e.g. if you
 had a string containing ')' you could use R{this is a string with a )
 in it} or R|this is a string with a ) in it|.
 
 martin
 
 On Thu, Sep 19, 2013 at 1:36 PM, Kevin Ballard ke...@sb.org wrote:
 One feature common to many programming languages that Rust lacks is raw 
 string literals. Specifically, these are string literals that don't 
 interpret backslash-escapes. There are three obvious applications at the 
 moment: regular expressions, windows file paths, and format!() strings that 
 want to embed { and } chars. I'm sure there are more as well, such as large 
 string literals that contain things like HTML text.
 
 I took a look at 3 programming languages to see what solutions they had: D, 
 C++11, and Python. I've reproduced their syntax below, plus one more custom 
 syntax, along with pros & cons. I'm hoping we can come up with a syntax that 
 makes sense for Rust.
 
 ## Python syntax:
 
 Python supports an "r" or "R" prefix on any string literal (both short 
 strings, delimited with a single quote, or long strings, delimited with 3 
 quotes). The "r" or "R" prefix denotes a raw string, and has the effect of 
 disabling backslash-escapes within the string. For the most part. It 
 actually gets a bit weird: if a sequence of backslashes of an odd length 
 occurs prior to a quote (of the appropriate quote type for the string), then 
 the quote is considered to be escaped, but the backslashes are left in the 
 string. This means r"foo\"" evaluates to the string `foo\"`, and similarly 
 r"foo\\\"" is `foo\\\"`, but r"foo\\" is merely the string `foo\\`.
 
 Pros:
 * Simple syntax
 * Allows for embedding the closing quote character in the raw string
 
 Cons:
 * Handling of backslashes is very bizarre, and the closing quote character 
 can only be embedded if you want to have a backslash before it.
 
 ## C++11 syntax:
 
 C++11 allows for raw strings using a sequence of the form R"seq(raw 
 text)seq". In this construct, `seq` is any sequence of (zero or more) 
 characters except for: space, (, ), \, \t, \v, \n, \r. The simplest form 
 looks like R"(raw text)", which allows for anything in the raw text except 
 for the sequence `)"`. The addition of the delimiter sequence allows for 
 constructing a raw string containing any sequence at all (as the delimiter 
 sequence can be adjusted based on the represented text).
 
 Pros:
 * Allows for embedding any character at all (representable in the source 
 file encoding), including the closing quote.
 * Reasonably straightforward
 
 Cons:
 * Syntax is slightly complicated
 
 ## D syntax:
 
 D supports three different forms of raw strings. The first two are similar, 
 being r"raw text" and `raw text`. Besides the choice of delimiters, they 
 behave identically, in that the raw text may contain anything except for the 
 appropriate quote character. The third syntax is a slightly more complicated 
 form of C++11's syntax, and is called a delimited string. It takes two forms.
 
 The first looks like q"(raw text)" where the ( may be any non-identifier 
 non-whitespace character. If the character is one of [(<{ then it is a 
 nesting delimiter, and the close delimiter must be the matching ])>} 
 character; otherwise the close delimiter is the same as the open. 
 Furthermore, nesting delimiters do exactly what their name says: they nest. 
 If the nesting delimiter is (), then any ( in the raw text must be balanced 
 with a ) in the raw text. In other words, q"(foo(bar))" evaluates to 
 "foo(bar)", but q"(foo(bar)" and q"(foobar))" are both illegal.
 
 The second uses any identifier as the delimiter. In this case, the 
 identifier must immediately be followed by a newline, and in order to close 
 the string, the close delimiter must be preceded by a newline. This looks 
 like
 
 q"delim
 this is some raw text
 delim"
 
 It's essentially a heredoc. Note that the first newline is not part of the 
 string, but the final newline is, so this evaluates to "this is some raw 
 text\n".
 
 Pros:
 * Flexible
 * Allows for constructing a raw string that contains any desired sequence of 
 characters (representable in the source file's encoding)
 
 Cons:
 * Overly complicated
 
 ## Custom syntax
 
 There's another approach that none of these three languages take, which is 
 to merely allow for doubling up the quote character in order to embed a 
 quote. This would look like R"raw string literal ""with embedded quotes"".", 
 

Re: [rust-dev] RFC: Syntax for raw string literals

2013-09-19 Thread Kevin Ballard
On Sep 19, 2013, at 1:56 PM, Oren Ben-Kiki o...@ben-kiki.org wrote:

 Just to make sure - how does the C++ syntax behave in the presence of line 
 breaks? Specifically, what does it do with leading (and trailing) white space 
 of each line? My guess is that they would be included in the string, is that 
 correct?

It includes every single character that occurs in the source between the 
delimiters. So

cout << R"(this is
	a string)";

will print "this is", newline, horizontal tab, "a string".

 At any rate, having some sort of here documents would be very nice. The C++ 
 syntax is reasonable, though I really don't have a strong preference here. It 
 might be more Rust-ish to use a macro notation instead: 
 str!(delimiter.delimiter), or something like that.

Not possible. This syntax needs to be part of the lexer, and macros/syntax 
extensions operate on token trees, not on raw source characters.

-Kevin

 BTW, I found myself creating (in several languages) an unindent string 
 function that would (1) if the string starts with a line break, remove it; 
 (2) remove the leading white space of the 1st line from all the lines. 
 Applying this to here documents allows indenting them together with the 
 code that includes them. In Rust, the downside of this approach is that the 
 result isn't 'static any more... Not that this warrants making such complex 
 functionality a built-in of the syntax, of course.
 
 Oren.



Re: [rust-dev] RFC: Syntax for raw string literals

2013-09-19 Thread Kevin Ballard
On Sep 19, 2013, at 2:13 PM, Masklinn maskl...@masklinn.net wrote:

 On 2013-09-19, at 22:36 , Kevin Ballard wrote:
 
 I welcome any comments, criticisms, or suggestions.
 
 * C# also has rawstrings, which were not looked at. C#'s rawstrings
  disable escaping entirely but add a new one: doubling quotes will insert
  a single quote in the resulting string (similar to quote-escaping in
  SQL or Smalltalk).

I've never touched C#. Your description sounds like the custom syntax I 
described. I figured there were existing languages that did this, but none came 
to mind (I should have known SQL did it though).

 * The docstring comment is incorrect, a docstring is a string in the
  first position of a module, a class statement or a function statement.
  A single-quoted string at these positions will yield a docstring.
 
  The triple-quoting is a string syntax embedding newlines (single-quoted
  strings can not contain literal newlines in Python, only escaped ones).
  Obviously, triple-quoted python string can be raw.

Yes I know, but in my (rather limited) experience with Python, triple-quoted 
strings are typically used for docstrings. It was just an example anyway.

 * The quote-escaping oddness is less of an issue in Python as you can
  also use single-quotes for delimiting, or use triple-quoted strings
  (if you need to embed both single and double quotes in rawstrings).

If I need to embed both ''' and """ in a string, I'm out of luck. For example, 
I cannot represent the following:

Triple-quoted strings in Python use the delimiters ''' and """.

 * Perl's quotes and quote-like operators would certainly deserve mention.

I'm not a Perl programmer, but IIRC they look like `q{string}`, right? I don't 
think this is suitable for Rust because how would you lex `do q{foo()}`? Is 
this the invalid construct `do some-string` or is it calling a function named q 
with a closure?

 Also,
 
 windows file paths
 
 windows paths can also use forward slashes so that's not a very
 interesting justification.

Not always. UNC paths must start with \\ (in my testing, //foo/bar/baz is not 
interpreted as a UNC path by the Windows File Explorer, but \\foo/bar/baz is). 
There are also paths that start with the verbatim prefix \\?\, which disables 
interpretation of forward-slashes (among other things).
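
As a small illustration of why this matters for test code, here is what checking for those two prefixes looks like with ordinary escaped literals. The `PrefixKind` enum and `classify` function are hypothetical names for this sketch and are not part of the path module; note that every backslash has to be doubled.

```rust
/// Hypothetical classification of the two prefixes mentioned above.
#[derive(Debug, PartialEq)]
enum PrefixKind {
    /// Starts with the verbatim prefix \\?\
    Verbatim,
    /// Starts with \\ (UNC-style)
    Unc,
    /// Anything else, including paths that start with //
    Other,
}

fn classify(path: &str) -> PrefixKind {
    // Without raw strings, the four-character prefix \\?\ needs seven
    // characters between the quotes.
    if path.starts_with("\\\\?\\") {
        PrefixKind::Verbatim
    } else if path.starts_with("\\\\") {
        PrefixKind::Unc
    } else {
        PrefixKind::Other
    }
}
```

A test suite full of such paths quickly becomes hard to read, since each literal has roughly twice as many backslashes as the path it represents.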

As I am actively engaged in writing a replacement for the path module, and am 
currently expanding the test suite for Windows paths, raw strings would be 
extremely useful to me.

-Kevin


Re: [rust-dev] RFC: Syntax for raw string literals

2013-09-19 Thread Kevin Ballard
As I just responded to Masklinn, this is ambiguous. How do you lex `do 
R{foo()}`?

-Kevin

On Sep 19, 2013, at 2:41 PM, Martin DeMello martindeme...@gmail.com wrote:

 Yes, I figured R followed by a non-alphabetical character could serve
 the same purpose as ruby's %char.
 
 martin
 

Re: [rust-dev] RFC: Syntax for raw string literals

2013-09-19 Thread Martin DeMello
Ah, good point. You could fix it by having a very small whitelist of
acceptable delimiters, but that probably takes it into overcomplex
territory.

martin

On Thu, Sep 19, 2013 at 2:46 PM, Kevin Ballard ke...@sb.org wrote:
 As I just responded to Masklinn, this is ambiguous. How do you lex `do 
 R{foo()}`?

 -Kevin

 On Sep 19, 2013, at 2:41 PM, Martin DeMello martindeme...@gmail.com wrote:

 Yes, I figured R followed by a non-alphabetical character could serve
 the same purpose as ruby's %char.

 martin

 On Thu, Sep 19, 2013 at 2:37 PM, Kevin Ballard ke...@sb.org wrote:
 I didn't look at Ruby's syntax, but what you just described sounds a little 
 too free-form to me. I believe Ruby at least requires a % as part of the 
 syntax, e.g. %q{test}. But I don't think %R{test} is a good idea for rust, 
 as it would conflict with the % operator. I don't think other punctuation 
 would work well either.

 -Kevin

 On Sep 19, 2013, at 2:10 PM, Martin DeMello martindeme...@gmail.com wrote:

 How complicated would it be to use R but with arbitrary paired
 delimiters (the way, for instance, ruby does it)? It's very handy to
 pick a delimiter you know does not appear in the string, e.g. if you
 had a string containing ')' you could use R{this is a string with a )
 in it} or R|this is a string with a ) in it|.

 martin

 On Thu, Sep 19, 2013 at 1:36 PM, Kevin Ballard ke...@sb.org wrote:
 One feature common to many programming languages that Rust lacks is raw 
 string literals. Specifically, these are string literals that don't 
 interpret backslash-escapes. There are three obvious applications at the 
 moment: regular expressions, windows file paths, and format!() strings 
 that want to embed { and } chars. I'm sure there are more as well, such 
 as large string literals that contain things like HTML text.

 I took a look at 3 programming languages to see what solutions they had: 
 D, C++11, and Python. I've reproduced their syntax below, plus one more 
 custom syntax, along with pros & cons. I'm hoping we can come up with a 
 syntax that makes sense for Rust.

 ## Python syntax:

 Python supports an "r" or "R" prefix on any string literal (both short 
 strings, delimited with a single quote, or long strings, delimited with 
 3 quotes). The "r" or "R" prefix denotes a raw string, and has the 
 effect of disabling backslash-escapes within the string. For the most 
 part. It actually gets a bit weird: if a sequence of backslashes of an 
 odd length occurs prior to a quote (of the appropriate quote type for the 
 string), then the quote is considered to be escaped, but the backslashes 
 are left in the string. This means r"foo\"" evaluates to the string 
 `foo\"`, and similarly r"foo\\\"" is `foo\\\"`, but r"foo\\" is merely 
 the string `foo\\`.

 Pros:
 * Simple syntax
 * Allows for embedding the closing quote character in the raw string

 Cons:
 * Handling of backslashes is very bizarre, and the closing quote 
 character can only be embedded if you want to have a backslash before it.

 ## C++11 syntax:

 C++11 allows for raw strings using a sequence of the form R"seq(raw 
 text)seq". In this construct, `seq` is any sequence of (zero or more) 
 characters except for: space, (, ), \, \t, \v, \n, \r. The simplest form 
 looks like R"(raw text)", which allows for anything in the raw text 
 except for the sequence `)"`. The addition of the delimiter sequence 
 allows for constructing a raw string containing any sequence at all (as 
 the delimiter sequence can be adjusted based on the represented text).

 Pros:
 * Allows for embedding any character at all (representable in the source 
 file encoding), including the closing quote.
 * Reasonably straightforward

 Cons:
 * Syntax is slightly complicated

 ## D syntax:

 D supports three different forms of raw strings. The first two are 
 similar, being r"raw text" and `raw text`. Besides the choice of 
 delimiters, they behave identically, in that the raw text may contain 
 anything except for the appropriate quote character. The third syntax is 
 a slightly more complicated form of C++11's syntax, and is called a 
 delimited string. It takes two forms.

 The first looks like q"(raw text)" where the ( may be any non-identifier 
 non-whitespace character. If the character is one of [(<{ then it is a 
 nesting delimiter, and the close delimiter must be the matching ])>} 
 character; otherwise the close delimiter is the same as the open. 
 Furthermore, nesting delimiters do exactly what their name says: they 
 nest. If the nesting delimiter is (), then any ( in the raw text must be 
 balanced with a ) in the raw text. In other words, q"(foo(bar))" 
 evaluates to "foo(bar)", but q"(foo(bar)" and q"(foobar))" are both 
 illegal.

 The second uses any identifier as the delimiter. In this case, the 
 identifier must immediately be followed by a newline, and in order to 
 close the string, the close delimiter must be preceded by a newline. This 
 looks like

 q"delim
 this is some raw text
 delim"

 It's essentially a