Re: [rust-dev] Meeting-weekly-2013-12-03, str.from_utf8

Simon Sapin Wed, 04 Dec 2013 06:02:57 -0800

Hi,

In response to:
https://github.com/mozilla/rust/wiki/Meeting-weekly-2013-12-03#strfrom_utf8

Yes, error handling other than strict/fail requires allocation. I suggest taking the pull request for the special case of non-allocating strict UTF-8, and keeping error handling for a future, larger API that also handles other encodings (and incremental processing):


https://github.com/mozilla/rust/pull/10701
https://github.com/mozilla/rust/wiki/Proposal-for-character-encoding-API
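
To illustrate the allocation point, here is a minimal sketch in present-day Rust (not the 2013 API under discussion). The function names decode_strict and decode_replace are hypothetical; std::str::from_utf8 and String::from_utf8_lossy are the current standard library calls.

```rust
use std::borrow::Cow;

/// Strict decoding: validates the bytes in place and returns a borrowed
/// slice of the input, so nothing is allocated.
fn decode_strict(bytes: &[u8]) -> Option<&str> {
    std::str::from_utf8(bytes).ok()
}

/// Replacement decoding: when the input contains invalid sequences, the
/// output differs from the input, so a new string has to be built.
fn decode_replace(bytes: &[u8]) -> Cow<'_, str> {
    String::from_utf8_lossy(bytes)
}
```

The strict version can hand back a view of the caller's buffer; the replacing version only avoids allocating when the input happens to be valid already.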


[On invalid UTF-8 bytes]

brson: One has a condition that lets you replace a bad character

I believe this is not implemented. The current not_utf8 condition lets you do the entire decoding yourself.

acrichto: We could truncate by default.

I am very much opposed to this. Truncating silently loses data (potentially lots of it!) It should not be implemented, let alone be the default.

jack: In python, you have to specify how you want it transformed.
Truncate vs. replace with '?', etc. Maybe there should be an
alternate version that takes the transform.
pnkfelix: But doesn't work with slices...
jack: There's truncate, replace, and fail.

Python does not have truncate. It has ignore (skip invalid byte sequences but continue with the rest of the input), strict (fail), and replace (with � U+FFFD REPLACEMENT CHARACTER). You don’t have to specify an error handler; strict is the default.

Ignore is bad IMO as it silently loses data (although it’s not as bad as truncate), though it could have uses I’m not thinking of right now.
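
To make the difference between these behaviors concrete, here is a rough sketch in present-day Rust. The OnError enum and the decode function are hypothetical names for illustration only, and this is not an exact reimplementation of Python's handlers; Utf8Error::valid_up_to and Utf8Error::error_len are real standard library methods.

```rust
use std::str;

/// Hypothetical error-handling modes: Python's three, plus truncate.
#[derive(Clone, Copy)]
enum OnError {
    Strict,
    Ignore,
    Replace,
    Truncate,
}

/// Decodes `input`, returning None only when Strict meets invalid bytes.
fn decode(mut input: &[u8], on_error: OnError) -> Option<String> {
    let mut out = String::new();
    loop {
        match str::from_utf8(input) {
            Ok(valid) => {
                out.push_str(valid);
                return Some(out);
            }
            Err(err) => {
                // Everything before valid_up_to() is known to be valid.
                let (valid, rest) = input.split_at(err.valid_up_to());
                out.push_str(str::from_utf8(valid).expect("validated prefix"));
                match on_error {
                    OnError::Strict => return None,
                    // Truncate drops everything after the first error.
                    OnError::Truncate => return Some(out),
                    OnError::Ignore | OnError::Replace => {
                        if let OnError::Replace = on_error {
                            out.push('\u{FFFD}');
                        }
                        match err.error_len() {
                            // Skip the invalid sequence and keep going.
                            Some(len) => input = &rest[len..],
                            // Incomplete sequence at the end of the input.
                            None => return Some(out),
                        }
                    }
                }
            }
        }
    }
}
```

For the input b"ab\xFFcd", truncate keeps only "ab", ignore yields "abcd", and replace yields "ab�cd". Only replace leaves a visible trace that something was wrong.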



Side note:

Regarding failing vs. returning an Option or Result: I’d be in favor of only having the latter. Having two versions of the same API (foo() and foo_opt()) is ugly, and it’s easy to get "value or fail" from an Option with .unwrap()
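
As a small sketch of that point in present-day Rust (try_decode, must_decode and decode_or are hypothetical names), a single fallible function is enough, and each caller picks its own policy:

```rust
use std::str::Utf8Error;

/// The only API that needs to exist: it reports failure as a value.
fn try_decode(bytes: &[u8]) -> Result<&str, Utf8Error> {
    std::str::from_utf8(bytes)
}

/// "Value or fail" is one .unwrap() away; this panics on invalid input.
fn must_decode(bytes: &[u8]) -> &str {
    try_decode(bytes).unwrap()
}

/// Other policies are just as short, e.g. falling back to a default.
fn decode_or<'a>(bytes: &'a [u8], fallback: &'a str) -> &'a str {
    try_decode(bytes).unwrap_or(fallback)
}
```

The wrappers are shown only to make the call sites visible; in practice a caller would write .unwrap() or .unwrap_or(...) inline, with no second library function needed.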


--
Simon Sapin