Re: [rust-dev] Proposed API for character encodings

Jeffery Olson Thu, 19 Sep 2013 05:41:01 -0700

On Thu, Sep 19, 2013 at 1:05 AM, Simon Sapin <[email protected]> wrote:


> On 18/09/2013 23:31, Brian Anderson wrote:
>
>> On 09/10/2013 08:47 AM, Simon Sapin wrote:
>>
>>> Iterator<u8> and Iterator<char> are tempting, but we may need to work
>>> on big chunks at a time for efficiency: Iterator<~[u8]> and
>>> Iterator<~str>. Or could single-byte/char iterators be reliably
>>> inlined to achieve similar efficiency?
>>>
>>
>> Can Iterator<&[u8]> work if the iterator itself contains a fixed-sized
>> or preallocated buffer? For I/O purposes, allocating a bunch of buffers
>> just to write them out to a stream sounds wasteful.
>>
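The buffer-reuse question above can be illustrated with a minimal sketch in present-day Rust (not the 2013 dialect). `ChunkReader` and `next_chunk` are hypothetical names. One buffer is allocated up front, refilled from any `std::io::Read` source, and lent out as `&[u8]`. It is an ordinary method rather than an `Iterator` impl, because the standard `Iterator::next` cannot return a slice that borrows from the iterator's own buffer.

```rust
use std::io::{self, Read};

/// Hypothetical chunk reader: refills one fixed-size buffer and lends a
/// slice of it on each call, so no per-chunk allocation happens.
pub struct ChunkReader<R: Read> {
    inner: R,
    buf: Box<[u8]>,
}

impl<R: Read> ChunkReader<R> {
    pub fn new(inner: R, capacity: usize) -> Self {
        ChunkReader {
            inner,
            buf: vec![0; capacity].into_boxed_slice(),
        }
    }

    /// Returns the next chunk, or `Ok(None)` at end of input.
    /// The slice is only valid until the next call.
    pub fn next_chunk(&mut self) -> io::Result<Option<&[u8]>> {
        let n = self.inner.read(&mut self.buf)?;
        if n == 0 {
            Ok(None)
        } else {
            Ok(Some(&self.buf[..n]))
        }
    }
}
```

A caller loops with `while let Some(chunk) = reader.next_chunk()? { ... }`. Each slice must be consumed before the next call, which is the price of avoiding an allocation per chunk.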
>> [..]
>>
>>
>> I don't understand this iterator. I'm guessing it calls `concat` on the
>> `DecoderIterator` during each call to `next`, but `concat` consumes
>> `DecoderIterator`'s inner `Iterator`, so subsequent calls to `concat`
>> won't work.
>>
>
> Valid points, but please see the rest of this thread for the updated
> proposal.
>
> https://mail.mozilla.org/pipermail/rust-dev/2013-September/005556.html
>

Yes, I would like to hear opinions on the implications of using a StringWriter
as output for the lower-level API, as mentioned in the proposal.
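To make the StringWriter idea concrete, here is a minimal sketch in present-day Rust of a push-based UTF-8 decoder that writes into a caller-supplied sink. The trait shape of `StringWriter`, the `write_chunk` method, `Utf8Decoder`, `feed`, `finish`, and the choice to substitute U+FFFD for invalid input are all assumptions for illustration, not the proposal's actual signatures. What it shows is that a push-based decoder must carry an incomplete multi-byte sequence from one push to the next.

```rust
/// Hypothetical output sink modeled on the `StringWriter` idea; the exact
/// trait shape is an assumption for this sketch.
pub trait StringWriter {
    fn write_chunk(&mut self, s: &str);
}

/// Appending to an in-memory `String` is the simplest sink.
impl StringWriter for String {
    fn write_chunk(&mut self, s: &str) {
        self.push_str(s);
    }
}

/// Push-based UTF-8 decoder. Bytes arrive in arbitrary chunks; an incomplete
/// multi-byte sequence at the end of a chunk is held until the next push.
/// Invalid sequences are replaced with U+FFFD (an assumed error policy).
#[derive(Default)]
pub struct Utf8Decoder {
    pending: Vec<u8>, // at most 3 bytes of an unfinished sequence
}

impl Utf8Decoder {
    pub fn feed<W: StringWriter>(&mut self, input: &[u8], out: &mut W) {
        // Prepend leftover bytes from the previous call.
        let mut buf = std::mem::take(&mut self.pending);
        buf.extend_from_slice(input);
        let mut rest: &[u8] = &buf;
        loop {
            match std::str::from_utf8(rest) {
                Ok(s) => {
                    out.write_chunk(s);
                    return;
                }
                Err(e) => {
                    let valid = e.valid_up_to();
                    // The prefix up to `valid` is known to be valid UTF-8.
                    out.write_chunk(std::str::from_utf8(&rest[..valid]).unwrap());
                    match e.error_len() {
                        // A definitely invalid sequence: replace and continue.
                        Some(len) => {
                            out.write_chunk("\u{FFFD}");
                            rest = &rest[valid + len..];
                        }
                        // Input ends mid-sequence: keep it for the next push.
                        None => {
                            self.pending = rest[valid..].to_vec();
                            return;
                        }
                    }
                }
            }
        }
    }

    /// Flushes at end of input; a dangling partial sequence becomes U+FFFD.
    pub fn finish<W: StringWriter>(&mut self, out: &mut W) {
        if !self.pending.is_empty() {
            self.pending.clear();
            out.write_chunk("\u{FFFD}");
        }
    }
}
```

Because the sink is a trait, the same decoder could append to a `String` or forward text elsewhere without building an intermediate string. For simplicity this sketch copies each pushed chunk once; a tuned version would avoid that copy.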


>
> I just removed the iterator stuff as it’s relatively easy to build on top
> of the "push"-based API, and there are many variations of it, so we don’t
> need to figure out the details in the first iteration.
>
>
>
>> This API only deals with Decoding. What about Encoding?
>>
>
> As noted in the proposals, it’s basically the same but with [u8] and str
> swapped. I did not include it to keep the size of the email manageable.
>
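As a rough illustration of the "same API with [u8] and str swapped" idea, this sketch mirrors a push-based decoder on the encoding side, in present-day Rust. `ByteWriter`, `Utf16LeEncoder`, and `feed` are hypothetical names, and UTF-16LE is only a convenient target. Since a `&str` chunk is always complete, valid Unicode, a Unicode encoder needs no state between calls; encoders for legacy charsets would still need an error path for characters they cannot represent.

```rust
/// Hypothetical byte sink: the encoding-side mirror of a string sink.
pub trait ByteWriter {
    fn write_bytes(&mut self, bytes: &[u8]);
}

impl ByteWriter for Vec<u8> {
    fn write_bytes(&mut self, bytes: &[u8]) {
        self.extend_from_slice(bytes);
    }
}

/// Push-based UTF-16LE encoder. No state is carried between calls because
/// every `&str` chunk is already complete, valid Unicode.
pub struct Utf16LeEncoder;

impl Utf16LeEncoder {
    pub fn feed<W: ByteWriter>(&mut self, input: &str, out: &mut W) {
        // encode_utf16 yields code units, emitting surrogate pairs as needed.
        for unit in input.encode_utf16() {
            out.write_bytes(&unit.to_le_bytes());
        }
    }
}
```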
>
>
>> I don't see the utility of the `Encoding` factory type here, especially
>> of instantiating it to get a `Decoder`. As you indicate, its instance
>> methods may want to be static methods.
>>
>
> It’s not useful if you know the encoding to use at compile time (e.g. this
> format always uses UTF-8). It’s only useful for code that switches
> encodings at run time, based e.g. on the charset parameter of a Content-Type
> HTTP header. It’s used with a function that I forgot to include in the
> first proposal:
>
>
> fn get_encoding_from_label(label: &str) -> ~Encoding { /* ... */ }
>
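For run-time selection, here is a minimal sketch in present-day Rust, where `~Encoding` roughly corresponds to `Box<dyn Encoding>`. The `Encoding` trait's methods, the two concrete types, and the label lists are assumptions kept deliberately small; a real implementation would follow a complete label registry. Returning `Option` is also a choice made here so that an unrecognized charset label is not a hard failure.

```rust
/// Hypothetical run-time encoding object.
pub trait Encoding {
    fn name(&self) -> &'static str;
    /// One-shot lossy decode, enough for this sketch.
    fn decode(&self, input: &[u8]) -> String;
}

struct Utf8;
struct Utf16Le;

impl Encoding for Utf8 {
    fn name(&self) -> &'static str {
        "utf-8"
    }
    fn decode(&self, input: &[u8]) -> String {
        String::from_utf8_lossy(input).into_owned()
    }
}

impl Encoding for Utf16Le {
    fn name(&self) -> &'static str {
        "utf-16le"
    }
    fn decode(&self, input: &[u8]) -> String {
        // A trailing odd byte is ignored here for brevity.
        let units: Vec<u16> = input
            .chunks_exact(2)
            .map(|p| u16::from_le_bytes([p[0], p[1]]))
            .collect();
        String::from_utf16_lossy(&units)
    }
}

/// Maps a label, such as a Content-Type `charset` value, to an encoding.
/// Matching is case-insensitive after trimming ASCII whitespace; the label
/// lists are a small illustrative subset, not a complete registry.
pub fn get_encoding_from_label(label: &str) -> Option<Box<dyn Encoding>> {
    let label = label
        .trim_matches(|c: char| c.is_ascii_whitespace())
        .to_ascii_lowercase();
    match label.as_str() {
        "utf-8" | "utf8" => Some(Box::new(Utf8)),
        "utf-16le" => Some(Box::new(Utf16Le)),
        _ => None,
    }
}
```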
>
>> Looks like a fine start to me. Let's do it.
>>
>
> I’m also looking for feedback on the error handling. Do you think
> conditions are the right approach? How much power exactly should the
> handlers have?
>
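One alternative to conditions worth comparing, sketched in present-day Rust (conditions were later removed from the language): the caller passes an explicit error policy and gets back a `Result`. `ErrorPolicy`, `DecodeError`, and `decode_utf8` are hypothetical names. The sketch delegates validation to `std::str::from_utf8`, using `Utf8Error::valid_up_to` and `error_len` to locate each invalid sequence.

```rust
/// Hypothetical error policy chosen by the caller, standing in for a
/// condition handler.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ErrorPolicy {
    Strict,  // stop at the first invalid sequence and report it
    Replace, // substitute U+FFFD for each invalid sequence
    Ignore,  // silently drop invalid bytes
}

#[derive(Debug, PartialEq)]
pub struct DecodeError {
    pub position: usize, // byte offset of the first invalid sequence
}

pub fn decode_utf8(mut input: &[u8], policy: ErrorPolicy) -> Result<String, DecodeError> {
    let mut out = String::with_capacity(input.len());
    let mut offset = 0; // bytes consumed so far, for error positions
    loop {
        match std::str::from_utf8(input) {
            Ok(s) => {
                out.push_str(s);
                return Ok(out);
            }
            Err(e) => {
                let valid = e.valid_up_to();
                out.push_str(std::str::from_utf8(&input[..valid]).unwrap());
                // error_len() is None when the input ends mid-sequence;
                // treat the truncated tail as one invalid sequence.
                let bad = e.error_len().unwrap_or(input.len() - valid);
                match policy {
                    ErrorPolicy::Strict => {
                        return Err(DecodeError { position: offset + valid })
                    }
                    ErrorPolicy::Replace => out.push('\u{FFFD}'),
                    ErrorPolicy::Ignore => {}
                }
                input = &input[valid + bad..];
                offset += valid + bad;
            }
        }
    }
}
```

If handlers need more power than a fixed set of policies, a caller-supplied closure could take the enum's place, at the cost of a more complex signature.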
> As to the implementation: rust-encoding has a lot that could be adapted.
> https://github.com/lifthrasiir/rust-encoding
>

Can someone comment on whether we should look at adapting what's in
str::from_utf8 (really, str::raw::from_buf_len is where the action is) and
str::from_utf16 for this? Everyone I ask on IRC says that they are
"correct", and they're also highly optimized. Are they appropriate for this
API? And if not, are we comfortable having two totally separate paths for
string decoding?
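On reusing the existing routines, here is a sketch of what a single shared path could look like in present-day Rust: a UTF-16LE decoder for the new API that delegates validation to the standard library's `char::decode_utf16` instead of re-implementing surrogate handling. The function name `decode_utf16le` and the `Option` return type are assumptions; a real API would report which error occurred and where.

```rust
/// Decodes UTF-16LE bytes by delegating to the standard library's
/// `char::decode_utf16`, so surrogate validation lives in one place.
/// Returns `None` for an odd byte count or an unpaired surrogate.
pub fn decode_utf16le(input: &[u8]) -> Option<String> {
    if input.len() % 2 != 0 {
        return None; // a dangling half code unit
    }
    let units = input
        .chunks_exact(2)
        .map(|pair| u16::from_le_bytes([pair[0], pair[1]]));
    // Collecting Result<char, _> items stops at the first error.
    char::decode_utf16(units).collect::<Result<String, _>>().ok()
}
```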


>
> I may start doing the work at some point, but I’m not making any promise
> on when. In the meantime, anyone interested feel free to take this up.
>

Thanks again for taking the time to look at this issue, Simon. I don't
think there's a huge rush, as I imagine we're too close to 0.8 to get it
in.


>
> Cheers,
> --
> Simon Sapin
>
> _______________________________________________
> Rust-dev mailing list
> [email protected]
> https://mail.mozilla.org/listinfo/rust-dev
>

