Hi,

In response to:
https://github.com/mozilla/rust/wiki/Meeting-weekly-2013-12-03#strfrom_utf8

Yes, error handling other than strict/fail requires allocation. I suggest taking the pull request for the special case of non-allocating strict UTF-8, and keeping error handling for a future, larger API that also handles other encodings (and incremental processing):

https://github.com/mozilla/rust/pull/10701
https://github.com/mozilla/rust/wiki/Proposal-for-character-encoding-API
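
To make the allocation point concrete, here is a minimal sketch. It assumes today's standard library (`std::str::from_utf8` and `String::from_utf8_lossy`), not the 2013 API touched by the pull request; the names `decode_strict` and `decode_replace` are hypothetical. Strict decoding can hand back a slice borrowed from the input, while replacing has to build a new string as soon as one invalid byte shows up.

```rust
use std::borrow::Cow;
use std::str;

/// Strict decoding: the result borrows from `bytes`, so nothing is allocated.
/// Returns None if the input is not valid UTF-8.
fn decode_strict(bytes: &[u8]) -> Option<&str> {
    str::from_utf8(bytes).ok()
}

/// Replacing decoding: valid input can still be borrowed, but invalid input
/// forces an owned (allocated) string containing U+FFFD.
fn decode_replace(bytes: &[u8]) -> Cow<'_, str> {
    String::from_utf8_lossy(bytes)
}
```

With valid input `decode_replace` returns the borrowed variant; with invalid input it returns the owned variant, which is where the allocation comes from.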


[On invalid UTF-8 bytes]
brson: One has a condition that lets you replace a bad character

I believe this is not implemented. The current not_utf8 condition lets you do the entire decoding yourself.

acrichto: We could truncate by default.

I am very much opposed to this. Truncating silently loses data (potentially lots of it!). It should not be implemented, let alone be the default.

jack: In python, you have to specify how you want it transformed.
Truncate vs. replace with '?', etc. Maybe there should be an
alternate version that takes the transform.
pnkfelix: But doesn't work with slices...
jack: There's truncate, replace, and fail.

Python does not have truncate. It has ignore (skip invalid byte sequences but continue with the rest of the input), strict (fail), and replace (with � U+FFFD REPLACEMENT CHARACTER). You don’t have to specify an error handler: strict is the default.

Ignore is bad IMO, as it silently loses data (although it’s not as bad as truncate), though it could have uses I’m not thinking of right now.
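
To show how these policies differ on the same input, here is a rough sketch in Rust. The `OnError` enum and the `decode` function are hypothetical names made up for illustration; the sketch relies on today's `std::str::from_utf8` and `Utf8Error`, and it only loosely mirrors Python's handlers rather than reproducing them exactly.

```rust
use std::str;

/// Hypothetical error-handling policies, named after the ones discussed above.
#[derive(Clone, Copy)]
enum OnError {
    Strict,   // give up on the first invalid sequence
    Ignore,   // drop the invalid sequence, keep decoding
    Replace,  // emit U+FFFD for the invalid sequence, keep decoding
    Truncate, // keep only what came before the first invalid sequence
}

/// Decode `input` as UTF-8 under the given policy.
/// Returns None only for `Strict` on invalid input.
fn decode(mut input: &[u8], on_error: OnError) -> Option<String> {
    let mut out = String::new();
    loop {
        match str::from_utf8(input) {
            Ok(s) => {
                out.push_str(s);
                return Some(out);
            }
            Err(e) => {
                // Everything before `valid_up_to()` is guaranteed valid UTF-8.
                let (valid, rest) = input.split_at(e.valid_up_to());
                out.push_str(str::from_utf8(valid).unwrap());
                match on_error {
                    OnError::Strict => return None,
                    OnError::Truncate => return Some(out),
                    OnError::Ignore => {}
                    OnError::Replace => out.push('\u{FFFD}'),
                }
                // `error_len()` is None when the input ends mid-sequence;
                // in that case there is nothing left to decode.
                let skip = e.error_len().unwrap_or(rest.len());
                input = &rest[skip..];
            }
        }
    }
}
```

For the input `b"ab\xFFcd"`, strict fails, truncate keeps only `ab`, ignore yields `abcd`, and replace yields `ab`, U+FFFD, `cd`. Only replace leaves a visible trace that something was wrong while still keeping the rest of the data.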


Side note:

Regarding failing vs. returning an Option or Result: I’d be in favor of only having the latter. Having two versions of the same API (foo() and foo_opt()) is ugly, and it’s easy to get "value or fail" from an Option with .unwrap().
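
As a small illustration, written against today's standard library and using the hypothetical names `parse_utf8`, `parse_or_fail` and `parse_or_empty`: a single Option-returning function is enough, and each caller chooses between failing and handling the error.

```rust
/// Hypothetical single entry point: None means the input was not valid UTF-8.
fn parse_utf8(bytes: &[u8]) -> Option<&str> {
    std::str::from_utf8(bytes).ok()
}

/// A caller that wants "value or fail" just unwraps; this panics on None.
fn parse_or_fail(bytes: &[u8]) -> &str {
    parse_utf8(bytes).unwrap()
}

/// A caller that wants to recover supplies a fallback instead.
fn parse_or_empty(bytes: &[u8]) -> &str {
    parse_utf8(bytes).unwrap_or("")
}
```

No second, failing variant of `parse_utf8` is needed: the failing behaviour is one method call away at the call site.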

--
Simon Sapin
_______________________________________________
Rust-dev mailing list
[email protected]
https://mail.mozilla.org/listinfo/rust-dev
