Re: [rust-dev] Proposed API for character encodings

Olivier Renaud Sat, 21 Sep 2013 08:39:39 -0700

On Saturday 21 September 2013 07:59:26, Simon Sapin wrote:
> On 20/09/2013 20:07, Olivier Renaud wrote:
> > I have one more question regarding the error handling: in DecodeError,
> > what does 'input_byte_offset' mean? Is it relative to the
> > 'invalid_byte_sequence' or to the beginning of the decoded stream?
> 
> Good point. I’m not sure. (Remember, I’m making this up as we go along :).)
> If it’s counted from the start of the entire input, decoders would have to
> keep count, which is unnecessary work in cases where you don’t use it
> (e.g. with the Replace error handling).
> 
> So it could be from the beginning of the input in the last call to
> .feed() to the beginning of the invalid byte sequence, *which can be
> negative*, in case the invalid sequence started in an earlier .feed() call.
> 
> What do you think it should be?


I'd expect this offset to be absolute. After all, the only thing the
programmer can do with this information at this point is report it to the
user; a programmer who wanted to handle the error could have done so by
using a trap. A relative offset has no meaning outside of the processing loop,
whereas an absolute offset can still be useful even outside of the program (if
the source of the stream is a file, then an absolute offset gives the exact
location of the error in the file).

A counter is super cheap; I wouldn't worry about its cost. Actually, it only
has to be updated once per call to 'feed' (by the number of bytes fed), not
once per byte.
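
To make the cost argument concrete, here is a minimal sketch in present-day
Rust of a decoder that reports absolute offsets. It is only an illustration:
the names 'feed', 'DecodeError', 'input_byte_offset' and
'invalid_byte_sequence' come from this thread, while 'AsciiDecoder', the
'consumed' field, the exact signatures and the ASCII-only behaviour are
assumptions of mine, not the proposed API.

```rust
/// Hypothetical error type. The field names follow the thread; the exact
/// shape of the struct is assumed for the sake of the example.
#[derive(Debug, PartialEq)]
pub struct DecodeError {
    /// Offset of the invalid byte, counted from the start of the whole stream.
    pub input_byte_offset: usize,
    pub invalid_byte_sequence: Vec<u8>,
}

/// Toy incremental decoder that accepts only ASCII (assumed, for brevity).
pub struct AsciiDecoder {
    /// Total number of bytes received in all previous calls to `feed`.
    consumed: usize,
}

impl AsciiDecoder {
    pub fn new() -> Self {
        AsciiDecoder { consumed: 0 }
    }

    /// Decodes one chunk. On error, the reported offset is absolute.
    pub fn feed(&mut self, input: &[u8]) -> Result<String, DecodeError> {
        // Stream position at which this chunk starts.
        let base = self.consumed;
        // The only bookkeeping needed: one addition per call to `feed`.
        self.consumed += input.len();

        match input.iter().position(|&b| b >= 0x80) {
            None => Ok(input.iter().map(|&b| b as char).collect()),
            Some(i) => Err(DecodeError {
                // Chunk-relative index turned into an absolute offset.
                input_byte_offset: base + i,
                invalid_byte_sequence: vec![input[i]],
            }),
        }
    }
}
```

With this sketch, feeding b"hello " and then b"w\xFFrld" makes the second call
fail with an offset that counts the six bytes of the first chunk as well, so
the value can be shown to the user as a position in the original file.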

Note: for the encoder, you will have to specify whether the offset is a 'code
point' count or a 'code unit' count.
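
The two counts can differ for the same input, which is why the choice has to
be documented. A small sketch using only the standard library (the helper
name 'offsets' is made up for this example):

```rust
/// Returns (code points, UTF-8 code units, UTF-16 code units) for `s`.
pub fn offsets(s: &str) -> (usize, usize, usize) {
    (
        s.chars().count(),        // Unicode scalar values
        s.len(),                  // bytes, i.e. UTF-8 code units
        s.encode_utf16().count(), // 16-bit UTF-16 code units
    )
}
```

For a string such as "a\u{20AC}\u{1D11E}", the three numbers are all
different, so an offset pointing past that text depends entirely on which
unit the encoder counts.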