I really like the API you are proposing. In particular, the error handling is
close to what I was expecting from such an API.
I have some remarks, though.
Is there a reason for encoders and decoders not to be reusable? I think it
would be reasonable to specify that they return to their initial state once
the 'flush' method is called, or when a 'DecodeError' is returned.
Is a condition raised when the order of method calls is not respected? E.g.
if one calls 'flush' multiple times, or calls 'feed' and then 'decode'?
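
To make the reuse idea concrete, here is a rough sketch in present-day Rust (not the 2013 syntax used in this thread, and not the proposed API). The names 'Utf8Decoder', 'pending' and the shape of 'DecodeError' are my own assumptions. The point is only that 'flush' and any error leave the decoder in its initial state, so the same value can be fed again.

```rust
/// Hypothetical error type; only the name comes from the thread.
#[derive(Debug, PartialEq)]
pub struct DecodeError {
    /// The bytes that could not be decoded.
    pub bytes: Vec<u8>,
}

/// Illustrative incremental UTF-8 decoder (an assumption, not the proposed API).
#[derive(Default)]
pub struct Utf8Decoder {
    /// Incomplete trailing sequence carried over from earlier `feed` calls.
    pending: Vec<u8>,
}

impl Utf8Decoder {
    pub fn new() -> Self {
        Self::default()
    }

    /// Decodes as much as possible and buffers a truncated trailing sequence.
    pub fn feed(&mut self, input: &[u8]) -> Result<String, DecodeError> {
        self.pending.extend_from_slice(input);
        // Taking the buffer empties `pending`, i.e. the initial state.
        let buf = std::mem::take(&mut self.pending);
        match std::str::from_utf8(&buf) {
            Ok(s) => Ok(s.to_owned()),
            Err(e) => {
                let valid = e.valid_up_to();
                match e.error_len() {
                    // Input ended mid-sequence: keep the tail for the next call.
                    None => {
                        let out = String::from_utf8_lossy(&buf[..valid]).into_owned();
                        self.pending = buf[valid..].to_vec();
                        Ok(out)
                    }
                    // Invalid bytes: report them; `pending` stays empty.
                    // (This sketch drops the valid prefix; a real decoder
                    // would hand it back as well.)
                    Some(len) => Err(DecodeError {
                        bytes: buf[valid..valid + len].to_vec(),
                    }),
                }
            }
        }
    }

    /// Ends the stream; leftover bytes are an error. Either way the decoder
    /// is back in its initial state afterwards.
    pub fn flush(&mut self) -> Result<(), DecodeError> {
        let rest = std::mem::take(&mut self.pending);
        if rest.is_empty() {
            Ok(())
        } else {
            Err(DecodeError { bytes: rest })
        }
    }
}
```

With this shape, calling 'flush' twice in a row is harmless, and a 'feed' after an error simply starts a fresh stream.
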
It is not clear what is passed as the parameter to the 'decoding_error'
condition. I guess it's the exact subsequence of bytes that cannot be
decoded, possibly spanning multiple 'feed' calls. Is that correct? Is it
sufficient for variable-length encodings?
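
As an illustration of what "the exact subsequence" could mean for a variable-length encoding, here is a small sketch in present-day Rust. The helper 'first_invalid' is hypothetical; it only relies on the standard library's 'Utf8Error::valid_up_to' and 'Utf8Error::error_len'. It is one possible reading, not a statement of what the proposed condition receives.

```rust
/// Returns the first undecodable run of bytes in `input`, if any.
/// Hypothetical helper built on the standard UTF-8 validator.
pub fn first_invalid(input: &[u8]) -> Option<&[u8]> {
    match std::str::from_utf8(input) {
        Ok(_) => None,
        Err(e) => {
            let start = e.valid_up_to();
            let end = match e.error_len() {
                // An unexpected byte was found: the validator reports how
                // many bytes belong to the invalid sequence.
                Some(len) => start + len,
                // The input stopped in the middle of a sequence.
                None => input.len(),
            };
            Some(&input[start..end])
        }
    }
}
```

The reported span is not always a single byte: a lone 0xFF is reported alone, while a three-byte sequence cut short by an ASCII letter is reported as its two leading bytes. A condition parameter would need to carry such variable-length spans.
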
I am doubtful that the encoder is just a decoder with [u8] and str swapped. A
decoder must deal with a possibly invalid sequence of bytes, while an encoder
deals with str, which is guaranteed to be a valid UTF-8 sequence. An encoder
must handle unmappable characters, whereas a decoder doesn't (actually, it
depends on whether we consider Unicode to be universal or not...).
I think it would be a good idea to distinguish between an invalid
sequence and an unmappable character. I think there should be both an
'invalid_sequence' and an 'unmappable_char' condition.
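
A minimal sketch of that distinction in present-day Rust, using return values instead of conditions. The enum 'CodecError' and the functions 'encode_latin1' and 'decode_utf8' are names I made up for the example; the two variants correspond to the two conditions suggested above.

```rust
/// Hypothetical error type separating the two failure kinds.
#[derive(Debug, PartialEq)]
pub enum CodecError {
    /// The input bytes are not well-formed in the source encoding.
    InvalidSequence(Vec<u8>),
    /// The character is valid but has no representation in the target.
    UnmappableChar(char),
}

/// Encodes to ISO-8859-1, where only U+0000..=U+00FF are representable.
/// The input is a `&str`, so it can never be an invalid sequence.
pub fn encode_latin1(input: &str) -> Result<Vec<u8>, CodecError> {
    input
        .chars()
        .map(|c| {
            let cp = c as u32;
            if cp <= 0xFF {
                Ok(cp as u8)
            } else {
                Err(CodecError::UnmappableChar(c))
            }
        })
        .collect()
}

/// Decodes UTF-8, where the only failure is a malformed byte sequence.
pub fn decode_utf8(input: &[u8]) -> Result<&str, CodecError> {
    std::str::from_utf8(input).map_err(|e| {
        let start = e.valid_up_to();
        let end = e.error_len().map_or(input.len(), |len| start + len);
        CodecError::InvalidSequence(input[start..end].to_vec())
    })
}
```

Here the encoder can only ever produce 'UnmappableChar' and the UTF-8 decoder only 'InvalidSequence', which is the asymmetry described above.
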
Also, the 'fatal' handler is a bit scary: based on the name, I'd expect it to
result in a 'fail!'.
I propose this set of conditions and handlers:
// Decoder conditions
condition! {
    /// The byte sequence is not a valid input
    pub invalid_sequence : ~[u8] -> Option<~str>;
    /// The byte sequence cannot be represented in Unicode (rarely used)
    pub unmappable_bytes : ~[u8] -> Option<~str>;
}
// Encoder condition
condition! {
    /// The Unicode string cannot be represented in the target encoding
    /// (essential for single byte encodings)
    pub unmappable_str : ~str -> Option<~[u8]>;
}
/// Functions to be used with invalid_sequence::cond.trap
/// or unmappable_bytes::cond.trap
mod decoding_error_handlers {
    fn decoder_error(_: ~[u8]) -> Option<~str> { None }
    fn replacement(_: ~[u8]) -> Option<~str> { Some(~"\uFFFD") }
    fn ascii_substitute(_: ~[u8]) -> Option<~str> { Some(~"\u001A") }
    fn ignore(_: ~[u8]) -> Option<~str> { Some(~"") }
}
/// Functions to be used with unmappable_str::cond.trap
mod encoding_error_handlers {
    fn encoder_error(_: ~str) -> Option<~[u8]> { None }
    fn ascii_substitute(_: ~str) -> Option<~[u8]> { Some(~[0x1A]) }
    fn ignore(_: ~str) -> Option<~[u8]> { Some(~[]) }
}
I'm not sure about this substitute/replacement duality. Maybe we can have only
one function name, 'default', that would produce U+FFFD for Unicode and 0x1A
for ASCII.
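
For readers trying this in present-day Rust, where 'condition!' is not available, the handler idea can be approximated by passing a plain function to the encoder. This is only a sketch under that assumption: 'Handler', 'encode_ascii' and 'reject' are hypothetical names, while 'ascii_substitute' and 'ignore' mirror the handlers listed above.

```rust
/// A handler returns replacement bytes, or `None` to abort the encoding.
pub type Handler = fn(&str) -> Option<Vec<u8>>;

/// Abort on the first unmappable character (hypothetical name).
pub fn reject(_: &str) -> Option<Vec<u8>> {
    None
}

/// Replace the unmappable character with the ASCII SUB control byte.
pub fn ascii_substitute(_: &str) -> Option<Vec<u8>> {
    Some(vec![0x1A])
}

/// Silently drop the unmappable character.
pub fn ignore(_: &str) -> Option<Vec<u8>> {
    Some(Vec::new())
}

/// Encodes `input` as ASCII, asking `on_unmappable` what to emit for each
/// non-ASCII character. On abort, the offending character is returned.
pub fn encode_ascii(input: &str, on_unmappable: Handler) -> Result<Vec<u8>, String> {
    let mut out = Vec::with_capacity(input.len());
    let mut buf = [0u8; 4];
    for c in input.chars() {
        if c.is_ascii() {
            out.push(c as u8);
        } else {
            // Hand the handler the unmappable character as a string slice,
            // matching the `~str` parameter of 'unmappable_str'.
            let s: &str = c.encode_utf8(&mut buf);
            match on_unmappable(s) {
                Some(bytes) => out.extend_from_slice(&bytes),
                None => return Err(s.to_owned()),
            }
        }
    }
    Ok(out)
}
```

For example, encoding "a", U+00E9, "b" with 'ascii_substitute' would give the bytes 0x61 0x1A 0x62, with 'ignore' it would give 0x61 0x62, and with 'reject' it would return the offending character as an error.
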
_______________________________________________
Rust-dev mailing list
https://mail.mozilla.org/listinfo/rust-dev