Hi,
I've been poking around at the issue of Unicode support, which I think
is very important in the language, but which carries a bit of a tension in
terms of what we include and at what level of the libraries.
Namely: the canonical "Unicode implementation" in the world is libicu,
in terms of completeness and correctness and such, and it's *huge*. A
full build of it is around 16 MB. So while I'd love to expose all of
libicu as core::char and core::str -- to take the needs of
Unicode-text-as-advertised seriously -- part of me thinks that making
such a physically large dependency part of the "minimum library for
nontrivial Rust programs" might offend as many people as it pleases.
I see a few possible solutions to this:
1. Manually convert (using helper scripts or such) a small "important"
chunk of the Unicode data tables into Rust code and include that in
core, with a binding to the "full" libicu as a component of std or
even an external crate in cargo.
2. Integrate libicu as a submodule but make a minimalist build of it
part of our core library. There are a lot of optional parts; leave
out the less-important parts.
3. Integrate libicu as a submodule and build the whole thing; Unicode
is too important to skimp on.
4. ??? some other option
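To make option 1 a bit more concrete, here is a rough sketch of what a
script-generated table might look like once it lands in Rust code. This
is only an illustration: the names ALPHABETIC_RANGES, in_table and
is_alphabetic_sketch are made up for this example, and the table is a
tiny hand-picked excerpt, not real generated output. It assumes the
script emits sorted, non-overlapping, inclusive code point ranges, so
that a lookup is a binary search.

```rust
use std::cmp::Ordering;

/// Hypothetical excerpt of a generated table: inclusive code point
/// ranges, sorted and non-overlapping. A real table would be produced
/// by a script from the Unicode data files and would be far longer.
const ALPHABETIC_RANGES: &[(u32, u32)] = &[
    (0x0041, 0x005A), // Latin capital letters A..Z
    (0x0061, 0x007A), // Latin small letters a..z
    (0x00C0, 0x00D6), // Latin-1 letters before the multiplication sign
    (0x00D8, 0x00F6), // Latin-1 letters before the division sign
    (0x0391, 0x03A1), // Greek capital letters Alpha..Rho
    (0x03A3, 0x03A9), // Greek capital letters Sigma..Omega
];

/// Binary-search a sorted range table for the code point of `c`.
fn in_table(table: &[(u32, u32)], c: char) -> bool {
    let cp = c as u32;
    table
        .binary_search_by(|&(lo, hi)| {
            if hi < cp {
                Ordering::Less // range lies entirely below cp
            } else if lo > cp {
                Ordering::Greater // range lies entirely above cp
            } else {
                Ordering::Equal // cp falls inside this range
            }
        })
        .is_ok()
}

/// Illustrative property check backed by the excerpt table above.
fn is_alphabetic_sketch(c: char) -> bool {
    in_table(ALPHABETIC_RANGES, c)
}
```

With this excerpt, is_alphabetic_sketch('a') and
is_alphabetic_sketch('Ω') come out true, while '7' and '×' come out
false. The point is just that a range table plus a binary search keeps
both the data and the lookup code small, which is the whole appeal of
option 1.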
I'm going to be exploring #2 today in the sense of finding out how small
a build I can make while still including "the important parts". I'll
report further here when I find out. But in the meantime I'm curious
what people think about this issue.
I'm also curious what people think are "the important parts" of Unicode.
This is more of a solicitation for input from actual Unicode experts.
It's a giant standard and I'm curious who, if anyone on this list, has a
clear picture of sensible ways to divide it up.
-Graydon
_______________________________________________
Rust-dev mailing list
[email protected]
https://mail.mozilla.org/listinfo/rust-dev