Here is an updated proposal, based on email and IRC feedback. The
changes are:
* Fix .feed() and .flush() to have the self parameter they need.
* Remove the iterator stuff. I don’t find it super useful, and it’s easy
enough to build on top of the "push" API. KISS.
* Duplicate the "one shot" convenience API in Decoder so that it’s
usable without involving trait objects and dynamic dispatch.
* Make the output generic in the low-level API by using a StringWriter
instead of ~str.
* Add encoding_from_label()
* De-emphasize the Encoding trait by moving it to the end. It is only
useful together with encoding_from_label() and other dynamic-dispatch
scenarios. If the encoding to use is known at compile time, one can use
e.g. UTF8Decoder directly.
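The claim that an iterator is easy to build on top of the "push" API can be illustrated with a rough sketch. It uses present-day Rust and assumes a simplified, non-generic Decoder that writes into a String. DecodeChunks and AsciiDecoder are hypothetical names, and 7-bit ASCII stands in for a real encoding.

```rust
/// Simplified stand-ins for the proposal's types (assumed, modern Rust).
#[derive(Debug, PartialEq)]
pub struct DecodeError {
    pub input_byte_offset: usize,
    pub invalid_byte_sequence: Vec<u8>,
}

pub trait Decoder {
    fn feed(&mut self, input: &[u8], output: &mut String) -> Option<DecodeError>;
    fn flush(&mut self, output: &mut String) -> Option<DecodeError>;
}

/// Hypothetical toy decoder: bytes 0x00-0x7F are ASCII, anything else is an error.
#[derive(Default)]
pub struct AsciiDecoder {
    consumed: usize, // input bytes seen in earlier chunks
}

impl Decoder for AsciiDecoder {
    fn feed(&mut self, input: &[u8], output: &mut String) -> Option<DecodeError> {
        for (i, &b) in input.iter().enumerate() {
            if b >= 0x80 {
                return Some(DecodeError {
                    input_byte_offset: self.consumed + i,
                    invalid_byte_sequence: vec![b],
                });
            }
            output.push(b as char);
        }
        self.consumed += input.len();
        None
    }

    fn flush(&mut self, _output: &mut String) -> Option<DecodeError> {
        None // ASCII keeps no state between chunks
    }
}

/// Hypothetical iterator adapter: pulls byte chunks, pushes each one into
/// the decoder, and yields the text decoded from that chunk.
pub struct DecodeChunks<D, I> {
    decoder: D,
    chunks: I,
    done: bool,
}

impl<D: Decoder, I: Iterator> DecodeChunks<D, I>
where
    I::Item: AsRef<[u8]>,
{
    pub fn new(decoder: D, chunks: I) -> Self {
        DecodeChunks { decoder, chunks, done: false }
    }
}

impl<D: Decoder, I: Iterator> Iterator for DecodeChunks<D, I>
where
    I::Item: AsRef<[u8]>,
{
    type Item = Result<String, DecodeError>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.done {
            return None;
        }
        let mut out = String::new();
        let error = match self.chunks.next() {
            Some(chunk) => self.decoder.feed(chunk.as_ref(), &mut out),
            None => {
                // End of input: flush once, then stop.
                self.done = true;
                let error = self.decoder.flush(&mut out);
                if error.is_none() && out.is_empty() {
                    return None;
                }
                error
            }
        };
        match error {
            Some(e) => {
                self.done = true; // the decoder must be discarded after an error
                Some(Err(e))
            }
            None => Some(Ok(out)),
        }
    }
}
```

Because the iterator only wraps feed() and flush(), the decoder itself does not need to know about iteration at all.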
Again, this only covers decoders. Encoders would be basically the same, with [u8]
and str swapped. Maybe their output could just be a std::rt::io::Writer.
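The message only specifies decoders, so the following is a speculative sketch of the encoder side in present-day Rust. In this sketch, std::io::Write takes the place of std::rt::io::Writer. Encoder, EncodeError and AsciiEncoder are hypothetical names. ASCII is used only because it is the simplest encoding that can reject input.

```rust
use std::io::{self, Write};

/// Hypothetical counterpart of DecodeError for the encoding direction.
#[derive(Debug, PartialEq)]
pub struct EncodeError {
    /// Byte offset of the offending char within the current input chunk.
    pub input_byte_offset: usize,
    pub unencodable_char: char,
}

/// Mirror image of Decoder: str in, bytes out, to any io::Write.
pub trait Encoder {
    fn feed<W: Write>(&mut self, input: &str, output: &mut W) -> io::Result<Option<EncodeError>>;
    fn flush<W: Write>(&mut self, output: &mut W) -> io::Result<Option<EncodeError>>;
}

/// Hypothetical encoder for 7-bit ASCII.
pub struct AsciiEncoder;

impl Encoder for AsciiEncoder {
    fn feed<W: Write>(&mut self, input: &str, output: &mut W) -> io::Result<Option<EncodeError>> {
        for (offset, c) in input.char_indices() {
            if !c.is_ascii() {
                return Ok(Some(EncodeError { input_byte_offset: offset, unencodable_char: c }));
            }
            // Safe truncation: c is known to be below U+0080 here.
            output.write_all(&[c as u8])?;
        }
        Ok(None)
    }

    fn flush<W: Write>(&mut self, _output: &mut W) -> io::Result<Option<EncodeError>> {
        Ok(None) // stateless: nothing left to emit
    }
}
```

The nested return type keeps I/O failures, which come from the writer, separate from encoding failures, which come from the input text.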
/// Each implementation of Encoding has one corresponding implementation
/// of Decoder (and one of Encoder).
///
/// A new Decoder instance should be used for every input.
/// A Decoder instance should be discarded
/// after a DecodeError has been returned.
trait Decoder {
/// Simple, "one shot" API.
/// Decode a single byte string that is entirely in memory.
/// May raise the decoding_error condition.
fn decode(input: &[u8]) -> Result<~str, DecodeError> {
// Implementation left out.
// This is a default method, but not meant to be overridden.
}
/// Returns a fresh decoder, ready for a new input.
fn new() -> Self;
/// Call this repeatedly, with one chunk of input bytes at a time.
/// As much of the decoded text as possible is appended to output.
/// Takes &mut self because the decoder may need to keep state,
/// such as an incomplete multi-byte sequence, between chunks.
/// May raise the decoding_error condition.
fn feed<W: StringWriter>(&mut self, input: &[u8], output: &mut W)
-> Option<DecodeError>;
/// Call this to indicate the end of the input.
/// The Decoder instance should be discarded afterwards.
/// Some encodings may append some final output at this point.
/// May raise the decoding_error condition.
fn flush<W: StringWriter>(&mut self, output: &mut W)
-> Option<DecodeError>;
}
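The next sketch is a rough illustration of how feed() and flush() could cooperate across chunk boundaries. It implements a UTF-8 decoder in present-day Rust, where ~str becomes String and the one-shot decode() is a working default method. Utf8Decoder is a hypothetical name. The sketch relies on std::str::from_utf8 and Utf8Error::valid_up_to()/error_len() rather than a hand-written state machine, and it returns errors instead of raising a condition.

```rust
/// The proposal's StringWriter, in modern Rust.
pub trait StringWriter {
    fn write_char(&mut self, c: char);
    fn write_str(&mut self, s: &str);
}

impl StringWriter for String {
    fn write_char(&mut self, c: char) { self.push(c); }
    fn write_str(&mut self, s: &str) { self.push_str(s); }
}

#[derive(Debug, PartialEq)]
pub struct DecodeError {
    pub input_byte_offset: usize,
    pub invalid_byte_sequence: Vec<u8>,
}

pub trait Decoder: Sized {
    fn new() -> Self;
    fn feed<W: StringWriter>(&mut self, input: &[u8], output: &mut W) -> Option<DecodeError>;
    fn flush<W: StringWriter>(&mut self, output: &mut W) -> Option<DecodeError>;

    /// One-shot API as a default method built on feed() and flush().
    fn decode(input: &[u8]) -> Result<String, DecodeError> {
        let mut decoder = Self::new();
        let mut output = String::new();
        if let Some(err) = decoder.feed(input, &mut output) {
            return Err(err);
        }
        match decoder.flush(&mut output) {
            Some(err) => Err(err),
            None => Ok(output),
        }
    }
}

/// Hypothetical UTF-8 decoder that carries incomplete sequences across chunks.
pub struct Utf8Decoder {
    pending: Vec<u8>, // tail of an incomplete sequence from the last chunk
    consumed: usize,  // stream offset of the first byte not yet decoded
}

impl Decoder for Utf8Decoder {
    fn new() -> Self {
        Utf8Decoder { pending: Vec::new(), consumed: 0 }
    }

    fn feed<W: StringWriter>(&mut self, input: &[u8], output: &mut W) -> Option<DecodeError> {
        // Prepend any incomplete sequence left over from the previous chunk.
        let mut bytes = std::mem::take(&mut self.pending);
        bytes.extend_from_slice(input);
        let err = match std::str::from_utf8(&bytes) {
            Ok(s) => {
                output.write_str(s);
                self.consumed += bytes.len();
                return None;
            }
            Err(err) => err,
        };
        // Everything before valid_up_to() is well-formed UTF-8.
        let valid = err.valid_up_to();
        output.write_str(std::str::from_utf8(&bytes[..valid]).unwrap());
        match err.error_len() {
            // Truncated sequence at the end: keep it for the next chunk.
            None => {
                self.consumed += valid;
                self.pending = bytes[valid..].to_vec();
                None
            }
            // Genuinely invalid bytes.
            Some(len) => Some(DecodeError {
                input_byte_offset: self.consumed + valid,
                invalid_byte_sequence: bytes[valid..valid + len].to_vec(),
            }),
        }
    }

    fn flush<W: StringWriter>(&mut self, _output: &mut W) -> Option<DecodeError> {
        if self.pending.is_empty() {
            return None;
        }
        // The input ended in the middle of a multi-byte sequence.
        Some(DecodeError {
            input_byte_offset: self.consumed,
            invalid_byte_sequence: std::mem::take(&mut self.pending),
        })
    }
}
```

For example, feeding b"caf\xC3" and then b"\xA9!" produces "café!", because the lone 0xC3 byte waits in `pending` until its continuation byte arrives.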
/// The handler takes the invalid byte sequence and returns either
/// a replacement string, or None to abort with a DecodeError.
condition! {
pub decoding_error : ~[u8] -> Option<~str>;
}
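/// Functions to be used with decoding_error::cond.trap().
/// They need to be pub to be usable outside this module.
pub mod decoding_error_handlers {
/// Abort decoding with a DecodeError.
pub fn fatal(_: ~[u8]) -> Option<~str> { None }
/// Substitute U+FFFD REPLACEMENT CHARACTER for the invalid bytes.
pub fn replacement(_: ~[u8]) -> Option<~str> { Some(~"\uFFFD") }
}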
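Conditions were later removed from Rust. A present-day analogue is to pass the handler in as a closure with the same shape as decoding_error: invalid bytes in, optional replacement out. This sketch is one possible version of that idea. decode_utf8_with() is a hypothetical name, and UTF-8 via std::str::from_utf8 stands in for whichever decoder would actually run.

```rust
#[derive(Debug, PartialEq)]
pub struct DecodeError {
    pub input_byte_offset: usize,
    pub invalid_byte_sequence: Vec<u8>,
}

/// Modern counterparts of the proposal's handlers (hypothetical).
pub mod decoding_error_handlers {
    /// Abort decoding with a DecodeError.
    pub fn fatal(_: &[u8]) -> Option<String> {
        None
    }
    /// Substitute U+FFFD REPLACEMENT CHARACTER for the invalid bytes.
    pub fn replacement(_: &[u8]) -> Option<String> {
        Some("\u{FFFD}".to_string())
    }
}

/// Hypothetical one-shot UTF-8 decode where the error handler is a closure
/// instead of a condition.
pub fn decode_utf8_with<F>(input: &[u8], mut handler: F) -> Result<String, DecodeError>
where
    F: FnMut(&[u8]) -> Option<String>,
{
    let mut output = String::new();
    let mut pos = 0;
    while pos < input.len() {
        let err = match std::str::from_utf8(&input[pos..]) {
            Ok(s) => {
                output.push_str(s);
                break;
            }
            Err(err) => err,
        };
        let valid = err.valid_up_to();
        output.push_str(std::str::from_utf8(&input[pos..pos + valid]).unwrap());
        let start = pos + valid;
        // error_len() is None for a sequence cut off by the end of input;
        // in one-shot mode that whole tail is invalid.
        let len = err.error_len().unwrap_or(input.len() - start);
        let bad = &input[start..start + len];
        match handler(bad) {
            Some(replacement) => output.push_str(&replacement),
            None => {
                return Err(DecodeError {
                    input_byte_offset: start,
                    invalid_byte_sequence: bad.to_vec(),
                })
            }
        }
        pos = start + len;
    }
    Ok(output)
}
```

Unlike conditions, a closure can also carry state. One example is a handler that counts how many replacements it made.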
struct DecodeError {
input_byte_offset: uint,
invalid_byte_sequence: ~[u8],
}
trait StringWriter {
fn write_char(&mut self, c: char);
fn write_str(&mut self, s: &str);
}
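One benefit of a generic StringWriter is that the output does not have to be a heap string. This present-day Rust sketch implements the trait for String and for a hypothetical CharCounter that only counts characters. latin1_feed() is a hypothetical toy decoding step in which each byte maps to the code point with the same value. That is plain ISO-8859-1, not the WHATWG mapping, which treats the "iso-8859-1" label as windows-1252.

```rust
pub trait StringWriter {
    fn write_char(&mut self, c: char);
    fn write_str(&mut self, s: &str);
}

impl StringWriter for String {
    fn write_char(&mut self, c: char) { self.push(c); }
    fn write_str(&mut self, s: &str) { self.push_str(s); }
}

/// Hypothetical writer that counts characters without allocating.
#[derive(Default)]
pub struct CharCounter {
    pub count: usize,
}

impl StringWriter for CharCounter {
    fn write_char(&mut self, _c: char) { self.count += 1; }
    fn write_str(&mut self, s: &str) { self.count += s.chars().count(); }
}

/// Hypothetical toy decoding step: bytes 0x00-0xFF map one-to-one to U+0000-U+00FF.
pub fn latin1_feed<W: StringWriter>(input: &[u8], output: &mut W) {
    for &b in input {
        output.write_char(b as char);
    }
}
```

The same latin1_feed() call is monomorphized separately for each writer type, so there is no dynamic dispatch on the output side.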
/// Only supports the set of labels defined in the spec
/// http://encoding.spec.whatwg.org/#encodings
/// Such a label can come, e.g., from an HTTP header:
/// Content-Type: text/plain; charset=<label>
/// Returns None if the label is not one of those.
fn encoding_from_label(label: &str) -> Option<&'static Encoding> {
// Implementation left out
}
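The following is a minimal sketch of the lookup. It follows the spec's rule of stripping leading and trailing ASCII whitespace and then comparing labels ASCII case-insensitively. As a stand-in for &'static Encoding, it returns a canonical name. It covers only a small subset of the spec's label table, and encoding_name_from_label() is a hypothetical name.

```rust
/// Hypothetical lookup; returns the canonical encoding name for a label,
/// or None for labels it does not know.
pub fn encoding_name_from_label(label: &str) -> Option<&'static str> {
    // Strip ASCII whitespace (TAB, LF, FF, CR, SPACE), then lowercase ASCII only.
    let label = label
        .trim_matches(|c: char| c.is_ascii_whitespace())
        .to_ascii_lowercase();
    match label.as_str() {
        // A small subset of the labels listed in the spec.
        "unicode-1-1-utf-8" | "utf-8" | "utf8" => Some("UTF-8"),
        "utf-16" | "utf-16le" => Some("UTF-16LE"),
        "utf-16be" => Some("UTF-16BE"),
        "ascii" | "iso-8859-1" | "latin1" | "us-ascii" | "windows-1252" => Some("windows-1252"),
        _ => None,
    }
}
```

For example, the value " UTF8\n" taken from a charset parameter resolves to "UTF-8", while an unknown label yields None instead of a default encoding.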
/// Types implementing this trait are "algorithms"
/// such as UTF8, UTF16, SingleByteEncoding, etc.
/// Values of these types are "encodings" as defined in the WHATWG spec:
/// UTF-8, UTF-16-LE, Windows-1252, etc.
trait Encoding {
/// Could become an associated type with a ::new() constructor
/// when the language supports that.
fn new_decoder(&self) -> ~Decoder;
/// Simple, "one shot" API.
/// Decode a single byte string that is entirely in memory.
/// May raise the decoding_error condition.
fn decode(&self, input: &[u8]) -> Result<~str, DecodeError> {
// Implementation (using a Decoder) left out.
// This is a default method, but not meant to be overridden.
}
}
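This sketch shows the dynamic-dispatch side in present-day Rust: an Encoding trait whose decode() default method is built on new_decoder(). One assumption departs from the proposal. Generic methods like feed<W: StringWriter> cannot be called through a trait object, so the hypothetical DynDecoder writes into a concrete String instead. SingleByteEncoding plays the "algorithm" type and each static value plays an "encoding". LATIN1 and ASCII here are toy tables, not the WHATWG definitions.

```rust
#[derive(Debug, PartialEq)]
pub struct DecodeError {
    pub input_byte_offset: usize,
    pub invalid_byte_sequence: Vec<u8>,
}

/// Object-safe decoder: concrete String output instead of a generic writer.
pub trait DynDecoder {
    fn feed(&mut self, input: &[u8], output: &mut String) -> Option<DecodeError>;
    fn flush(&mut self, output: &mut String) -> Option<DecodeError>;
}

pub trait Encoding {
    fn name(&self) -> &'static str;
    fn new_decoder(&self) -> Box<dyn DynDecoder>;

    /// One-shot API as a default method, built on new_decoder().
    fn decode(&self, input: &[u8]) -> Result<String, DecodeError> {
        let mut decoder = self.new_decoder();
        let mut output = String::new();
        if let Some(err) = decoder.feed(input, &mut output) {
            return Err(err);
        }
        match decoder.flush(&mut output) {
            Some(err) => Err(err),
            None => Ok(output),
        }
    }
}

/// The "algorithm": one type, with each static value being an "encoding".
pub struct SingleByteEncoding {
    pub name: &'static str,
    /// Maps bytes 0x80..=0xFF to a char, or None if unmapped.
    pub upper_half: fn(u8) -> Option<char>,
}

struct SingleByteDecoder {
    upper_half: fn(u8) -> Option<char>,
    consumed: usize, // input bytes seen in earlier chunks
}

impl DynDecoder for SingleByteDecoder {
    fn feed(&mut self, input: &[u8], output: &mut String) -> Option<DecodeError> {
        for (i, &b) in input.iter().enumerate() {
            let c = if b < 0x80 { Some(b as char) } else { (self.upper_half)(b) };
            match c {
                Some(c) => output.push(c),
                None => {
                    return Some(DecodeError {
                        input_byte_offset: self.consumed + i,
                        invalid_byte_sequence: vec![b],
                    })
                }
            }
        }
        self.consumed += input.len();
        None
    }

    fn flush(&mut self, _output: &mut String) -> Option<DecodeError> {
        None // single-byte decoders keep no state between chunks
    }
}

impl Encoding for SingleByteEncoding {
    fn name(&self) -> &'static str {
        self.name
    }
    fn new_decoder(&self) -> Box<dyn DynDecoder> {
        Box::new(SingleByteDecoder { upper_half: self.upper_half, consumed: 0 })
    }
}

fn latin1_upper(b: u8) -> Option<char> { Some(b as char) }
fn ascii_upper(_: u8) -> Option<char> { None }

/// Toy encodings (hypothetical; not the WHATWG definitions).
pub static LATIN1: SingleByteEncoding = SingleByteEncoding { name: "latin1-toy", upper_half: latin1_upper };
pub static ASCII: SingleByteEncoding = SingleByteEncoding { name: "ascii-toy", upper_half: ascii_upper };
```

With this design, a label lookup could return one of these statics as a `&'static dyn Encoding`, and callers could decode without knowing the concrete type.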
What do you think?
--
Simon Sapin
_______________________________________________
Rust-dev mailing list
https://mail.mozilla.org/listinfo/rust-dev