Re: [rust-dev] Proposed API for character encodings

2013-09-22 Thread Simon Sapin

On 21/09/2013 16:38, Olivier Renaud wrote:

I'd expect this offset to be absolute. After all, the only thing that the
programmer can do with this information at this point is to report it to the
user; if the programmer wanted to handle the error, they could have done so by
using a trap. A relative offset has no meaning outside of the processing loop,
whereas an absolute offset can still be useful even outside of the program (if
the source of the stream is a file, then an absolute offset will give the exact
location of the error in the file).

A counter is super cheap; I wouldn't worry about its cost. Actually, it just
has to be incremented once for each call to 'feed'.


Well, to get the position inside a given chunk of input you still have to 
count individual bytes. (Maybe with Iterator::enumerate?) Unless maybe 
we do dirty pointer arithmetic…


If possible, I’d rather find a way to not have to pay that cost in the 
common case where the error handling is *not* abort and DecodeError is 
never used.


This is also a bit annoying as each implementation will have to repeat 
the counting logic, but maybe it’s still worth it.
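
To make the trade-off concrete, here is a minimal sketch in present-day Rust. Everything in it is an assumption made for illustration: `AsciiDecoder`, the signature of `feed`, and the single `offset` field on `DecodeError` are hypothetical and not the API proposed in this thread. It keeps one counter that is bumped once per `feed` call and only combines it with the position inside the chunk on the error path.

```rust
/// Hypothetical error type: `offset` is absolute, counted from the start of the stream.
#[derive(Debug, PartialEq)]
struct DecodeError {
    offset: usize,
}

/// Toy ASCII decoder standing in for a real encoding implementation.
struct AsciiDecoder {
    /// Total number of bytes consumed by earlier successful `feed` calls.
    consumed: usize,
}

impl AsciiDecoder {
    fn new() -> Self {
        AsciiDecoder { consumed: 0 }
    }

    /// Decodes one chunk into `out`. On the first invalid byte, returns its absolute offset.
    fn feed(&mut self, chunk: &[u8], out: &mut String) -> Result<(), DecodeError> {
        match chunk.iter().position(|&b| b >= 0x80) {
            None => {
                // Whole chunk is valid: the stream counter moves once per call.
                out.extend(chunk.iter().map(|&b| b as char));
                self.consumed += chunk.len();
                Ok(())
            }
            Some(relative) => {
                // Keep the valid prefix, then turn the relative position into an absolute one.
                out.extend(chunk[..relative].iter().map(|&b| b as char));
                Err(DecodeError { offset: self.consumed + relative })
            }
        }
    }
}

fn main() {
    let mut decoder = AsciiDecoder::new();
    let mut out = String::new();
    assert_eq!(decoder.feed(b"hello ", &mut out), Ok(()));
    // 6 bytes were consumed by the first chunk; the bad byte sits at index 3 of the second.
    assert_eq!(decoder.feed(b"wor\xFFld", &mut out), Err(DecodeError { offset: 9 }));
    println!("{:?}", out);
}
```

The scan for the invalid byte still walks the chunk, so this does not settle the cost question; it only shows that the absolute part of the offset can live outside the per-byte loop.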




Note: for the encoder, you will have to specify whether the offset is a 'code
point' count or a 'code unit' count.


Yes. I don’t know yet. If we do [1] and make the input generic it will 
probably have to be code points.


[1] https://mail.mozilla.org/pipermail/rust-dev/2013-September/005662.html

Otherwise, it may be preferable to match Str::slice and count UTF-8 
bytes. (Which I suppose is what you call code units?)
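
The two candidate offsets can be compared side by side with a short sketch in present-day Rust. The helper name `first_outside_latin1` and the "not representable in Latin-1" condition are assumptions chosen for illustration; only `char_indices`, `enumerate` and string slicing are standard library behaviour.

```rust
/// Finds the first character above U+00FF and reports both ways of counting its position:
/// (code point index, UTF-8 byte index).
fn first_outside_latin1(input: &str) -> Option<(usize, usize)> {
    input
        .char_indices() // yields (byte offset, char)
        .enumerate() // adds a running code point counter
        .find(|&(_, (_, c))| (c as u32) > 0xFF)
        .map(|(code_point_index, (byte_index, _))| (code_point_index, byte_index))
}

fn main() {
    let text = "héllo ☃";
    // 'é' takes two bytes in UTF-8, so the two counts drift apart after it.
    assert_eq!(first_outside_latin1(text), Some((6, 7)));
    // Only the byte index can be handed directly to slicing.
    assert_eq!(&text[7..], "☃");
    assert_eq!(first_outside_latin1("abc"), None);
}
```

A byte offset lines up with slicing, while a code point offset stays meaningful if the input is a generic iterator of characters rather than a UTF-8 string.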


--
Simon Sapin
___
Rust-dev mailing list
Rust-dev@mozilla.org
https://mail.mozilla.org/listinfo/rust-dev


Re: [rust-dev] RFC: Syntax for raw string literals

2013-09-22 Thread Steven Ashley
Hi everyone,

Have we considered syntax similar to Ruby style heredocs? I particularly
like the light looking syntax.

- The indentation of the block is determined by the indentation of the eos
marker. Keeping code flow natural.

eos
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do
eiusmod tempor
incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
quis nostrud
eos

- Brackets in the eos marker are flipped to allow [[[raw]]]

- eoseos causes a literal eos to be inserted. For example a raw
string

My main concern is that  might be a common operator. Perhaps  would be
ok?

Thoughts?
On 21/09/2013 4:28 AM, Alex Crichton a...@crichton.co wrote:

  Of the 3, Lua's is probably the best, although it's a bit esoteric (with
  using [[ and nary a quote in sight).

 I think an important thing to keep in mind is that the main reason
 behind creating a new form of literal is for things like:

 * Escapes in format! strings
 * Possible regular expression syntax (this also may be a syntax extension)
 * Typing literal Windows paths (escaping \ is hard)
 * Otherwise long literals which may contain quotes (like html text)

 With those in mind, although Lua's syntax is sufficient, is it nice to
 use? If the first thing I saw as an introduction to Rust was:

 fn main() {
   println!([[Hello, {}!]], world);
 }

 I would be a little confused. Now the [[/]] aren't really necessary in
 this case, but I'm personally unsure of how usable [[/]] would be
 throughout the language. Raw literals in languages like C++ and Lua I
 think aren't intended to be used that often. Instead they should be
 used only when necessary, and you frequently don't see them in code.
 For rust, the use cases which are the cause of this discussion are
 actually fairly common, and I'm not sure that we'd want to see [[/]]
 all over the place, although of course that's just my opinion :)

 Skimming back, I haven't seen a suggestion of the backtick character
 as a delimiter. Go takes this approach, and I don't believe that in Go
 you can have a backtick anywhere in a backtick literal, and otherwise
 what you see is what you get. It's at least something to consider,
 though.
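
For readers coming to this thread later, the use cases listed above can be illustrated with the raw string syntax that Rust ships today (`r"..."`, with optional `#` marks around the quotes). This is not one of the syntaxes being weighed at this point in the discussion, and the constant names are only placeholders.

```rust
// Windows path: every backslash has to be doubled in a regular literal.
const ESCAPED_PATH: &str = "C:\\Users\\rust\\notes.txt";
const RAW_PATH: &str = r"C:\Users\rust\notes.txt";

// Text containing quotes: surrounding the literal with `#` lets `"` appear inside.
const ESCAPED_HTML: &str = "<a href=\"index.html\">home</a>";
const RAW_HTML: &str = r#"<a href="index.html">home</a>"#;

// Regular-expression-like text keeps its backslashes as written.
const PATTERN: &str = r"\d+\.\d+";

fn main() {
    assert_eq!(ESCAPED_PATH, RAW_PATH);
    assert_eq!(ESCAPED_HTML, RAW_HTML);
    assert_eq!(PATTERN.len(), 8);
    println!("{}\n{}\n{}", RAW_PATH, RAW_HTML, PATTERN);
}
```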


[rust-dev] RFC: Syntax for raw string literals

2013-09-22 Thread Steven Ashley
Oh right, that's fair enough. I think the indentation/escaping issues can
be fixed however the new line issues you mentioned will still exist for
strings split over multiple lines using this syntax.

Good luck!

Steven

On Monday, September 23, 2013, Kevin Ballard wrote:

 Heredocs are primarily intended for multiline strings. Raw strings are
 intended for strings that have no escapes. Raw strings typically allow
 newlines, but that is not their primary purpose (and in Rust, regular
 strings allow newlines anyway). Trying to use a heredoc syntax for raw
 strings is just a headache (because of indentation, and dealing with the
 first and/or trailing newline in the heredoc).
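
The newline and indentation points can be checked against regular string literals in present-day Rust. This is a small sketch with placeholder constant names; it shows how regular literals behave and is not a proposal from the thread.

```rust
// A regular literal may span lines; the newline and the indentation are kept.
const MULTILINE: &str = "first line
    second line";

// A trailing backslash drops the newline and the leading whitespace of the next line.
const CONTINUED: &str = "first line \
    second line";

// Heredoc-like layout: the first and trailing newlines become part of the string.
const BLOCK: &str = "
    indented
";

fn main() {
    assert_eq!(MULTILINE, "first line\n    second line");
    assert_eq!(CONTINUED, "first line second line");
    assert_eq!(BLOCK, "\n    indented\n");
}
```

The `BLOCK` case is the headache described above: laying the text out on its own lines pulls in a leading newline, a trailing newline and the source indentation.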

 -Kevin

 On Sep 22, 2013, at 11:52 AM, Artem Egorkine art...@gmail.com wrote:

 I must be missing something about Ruby heredocs, but indentation has
 always been a painful question with them (
 http://stackoverflow.com/questions/3772864/how-do-i-remove-leading-whitespace-chars-from-ruby-heredoc).
 Another thing, of course, is that they are by no means raw (which of
 course doesn't stop Rust from adopting their syntax for raw strings). I
 would just say that it would be nice to pick a syntax for raw strings
 that allows both single-line and multi-line raw strings to
 be represented easily.


[rust-dev] RFC: Syntax for raw string literals

2013-09-22 Thread Steven Ashley
I'm in favour of C++11 syntax.



Re: [rust-dev] RFC: Syntax for raw string literals

2013-09-22 Thread Sebastian Sylvan
On Thu, Sep 19, 2013 at 1:36 PM, Kevin Ballard ke...@sb.org wrote:

 One feature common to many programming languages that Rust lacks is raw
 string literals.


This is one of those things that I feel almost all languages get wrong,
and probably mostly for historical reasons. IMO there should *only* be raw
string literals on the syntax level. It seems extremely weird to me that
languages have this second-level language that gets interpreted within a
literal. That kind of higher level processing should be part of a
formatting library (e.g. a macro like fmt), rather than an embedded
language inside the literal syntax. So, I think string literals should
contain exactly what they contain in their source form, without any
additional processing. If you want to express characters that are
inconvenient to type, you can use control sequences and a (standard)
formatting library to produce them.
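
One possible reading of this idea, sketched in present-day Rust: the literal is taken exactly as written and a library function expands control sequences afterwards. The `unescape` helper and the set of sequences it understands are assumptions for illustration, not an existing standard library API, and the `r"..."` literals are the raw syntax Rust uses today.

```rust
/// Hypothetical library helper: expands a few backslash sequences at run time.
fn unescape(raw: &str) -> String {
    let mut out = String::with_capacity(raw.len());
    let mut chars = raw.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.next() {
            Some('n') => out.push('\n'),
            Some('t') => out.push('\t'),
            Some('\\') => out.push('\\'),
            Some('"') => out.push('"'),
            // Unknown sequence: keep it verbatim.
            Some(other) => {
                out.push('\\');
                out.push(other);
            }
            // Lone backslash at the end of the input.
            None => out.push('\\'),
        }
    }
    out
}

fn main() {
    // The literal itself contains a backslash followed by 't', not a tab.
    let literal = r"col1\tcol2\n";
    assert_eq!(unescape(literal), "col1\tcol2\n");
    // Sequences the helper does not know are passed through untouched.
    assert_eq!(unescape(r"C:\dir"), r"C:\dir");
}
```

The cost of this design is that escape processing moves from compile time to a call the programmer has to remember to make.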

-- 
Sebastian Sylvan


Re: [rust-dev] RFC: Syntax for raw string literals

2013-09-22 Thread SiegeLord

On 09/22/2013 05:40 PM, Kevin Ballard wrote:

I've filed a summary of this conversation as an RFC issue on the GitHub issue 
tracker.

https://github.com/mozilla/rust/issues/9411


I've used a variation of option 10 for my own configuration format's 
raw strings:


delimraw textdelim

Where delim was an equivalent of an identifier.

If ` is a problem, then maybe using ' works too?

'delimraw textdelim'

'raw text'

-SL


Re: [rust-dev] RFC: Syntax for raw string literals

2013-09-22 Thread Kevin Ballard
' doesn't work because 'delim is parsed as a lifetime.
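
As a reminder of why the tick is already crowded, here is a small sketch in present-day Rust showing the two existing uses of `'` next to each other. The function `first_word` is only an illustration.

```rust
/// `'a` is a lifetime: a tick followed by an identifier, with no closing tick.
fn first_word<'a>(text: &'a str) -> &'a str {
    text.split(' ').next().unwrap_or("")
}

fn main() {
    // 'a' with a closing tick is a character literal instead.
    let letter: char = 'a';
    assert_eq!(letter as u32, 97);
    assert_eq!(first_word("raw string"), "raw");
}
```

A lexer that sees a tick followed by identifier characters has to decide between these two readings before it could consider a third one.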

-Kevin



Re: [rust-dev] RFC: Syntax for raw string literals

2013-09-22 Thread SiegeLord

On 09/22/2013 07:10 PM, Kevin Ballard wrote:

' doesn't work because 'delim is parsed as a lifetime.


The parser will have to be modified to support raw strings in any of 
their manifestations. Is it a fact that there is no possible parser that 
can differentiate between 'delim and 'delim ? I guess it'll give 
trouble to this current syntax 'fooblah, but it wouldn't be the first 
place in the grammar where a space was necessary to disambiguate between 
constructs (  comes to mind).


-SL


Re: [rust-dev] RFC: Syntax for raw string literals

2013-09-22 Thread Kevin Ballard
It would require changing the rules for lifetimes, with no benefit (and no 
clear new rule to use anyway). 'foodelim is perfectly legal today, and I see 
no reason to change that.

-Kevin



Re: [rust-dev] RFC: Syntax for raw string literals

2013-09-22 Thread SiegeLord

On 09/22/2013 07:45 PM, Kevin Ballard wrote:

It would require changing the rules for lifetimes, with no benefit (and no clear new rule to 
use anyway). 'foodelim is perfectly legal today, and I see no reason to 
change that.

It's not as big a change as you make it out to be, but fair enough.

Looking at the parser right now, it seems to me that implementing the 
leading 'R' in C++'s syntax will be just as difficult/easy as doing my 
delimstuffdelim proposal so I'm sticking to that idea as my 'vote'.


If the C++ way is chosen, I'd suggest the following permutation of the 
delimiters, as I think it looks lighter (by virtue of using smaller 
characters):


r'delim"raw string"delim'
r'"raw string"'

-SL


Re: [rust-dev] RFC: Syntax for raw string literals

2013-09-22 Thread Kevin Ballard
On Sep 22, 2013, at 5:27 PM, SiegeLord slab...@aim.com wrote:

 On 09/22/2013 07:45 PM, Kevin Ballard wrote:
 It would require changing the rules for lifetimes, with no benefit (and no 
 clear new rule to use anyway). 'foodelim is perfectly legal today, and I 
 see no reason to change that.
 It's not as big a change as you make it out to be, but fair enough.
 
 Looking at the parser right now, it seems to me that implementing the leading 
 'R' in C++'s syntax will be just as difficult/easy as doing my 
 delimstuffdelim proposal so I'm sticking to that idea as my 'vote'.

With C++11 syntax, `R"foo` is very obviously the start of a raw string. With 
your syntax, what about `add"foo`? Is that obviously the start of a raw string, 
or did the user just forget to type the ( in their function call? They may look 
the same to a lexer, but I think that being very clear about what starts the 
raw string is beneficial for reading.

 If the C++ way is chosen, I'd suggest the following permutation of the 
 delimiters, as I think it looks lighter (by virtue of using smaller 
 characters):
 
 r'delim"raw string"delim'
 r'"raw string"'

I'd really rather not overload the meaning of the ' character, if at all 
possible. Right now it's used for lifetimes and character literals. Expanding 
it to also be used in string literals just feels like unnecessary overloading. 
We already have a perfectly good " that means string literal. I suppose you 
could flip that to r"delim'raw string'delim" or r"'raw string'". I just don't 
see why that's any better than R"delim(raw string)delim" or R"(raw string)". 
Especially in the r"'raw string'" case, having lots of little tick marks in a 
row takes more effort to visually distinguish. I suppose r"(raw string)" is an 
option, but if we're that close to C++11 we may as well just go whole hog and 
be consistent with their syntax.
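
To show what "consistent with their syntax" would ask of a lexer, here is a rough sketch in present-day Rust of a scanner for a C++11-style raw literal. The function `scan_raw_literal` is hypothetical, and the delimiter check is simplified (the C++ rules also cap the delimiter length, which is ignored here). The test inputs are written with the raw syntax Rust uses today.

```rust
/// Scans a C++11-style raw string literal at the start of `src`.
/// Returns the literal's contents and the number of bytes consumed,
/// or `None` if `src` does not begin with a complete literal.
fn scan_raw_literal(src: &str) -> Option<(&str, usize)> {
    // The unambiguous opener: an `R` immediately followed by a double quote.
    let rest = src.strip_prefix("R\"")?;
    // Everything up to the first `(` is the user-chosen delimiter (may be empty).
    let open = rest.find('(')?;
    let delim = &rest[..open];
    if delim.chars().any(|c| matches!(c, '"' | ')' | '\\') || c.is_whitespace()) {
        return None;
    }
    let body = &rest[open + 1..];
    // The literal ends at the first `)`, delimiter, `"` sequence.
    let closing = format!("){}\"", delim);
    let end = body.find(closing.as_str())?;
    let consumed = 2 + open + 1 + end + closing.len();
    Some((&body[..end], consumed))
}

fn main() {
    // No delimiter: quotes and backslashes inside are taken literally.
    let plain = r#"R"(a "quoted" \n)" + 1"#;
    assert_eq!(scan_raw_literal(plain), Some((r#"a "quoted" \n"#, 18)));

    // With delimiter `x`, even `)"` may appear inside the contents.
    let delimited = r#"R"x(a)"b)x""#;
    assert_eq!(scan_raw_literal(delimited), Some((r#"a)"b"#, 11)));

    // An identifier followed by a quote is not treated as a raw literal.
    assert_eq!(scan_raw_literal(r#"add"foo""#), None);
}
```

The fixed `R"` prefix is what lets the scanner commit to a raw literal after two characters, which is the readability argument made above.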

-Kevin