Hi,

I've been poking around with the issue of unicode support, which I think is very important in the language, but carries a bit of a ... tension in terms of what we include and at what level of the libraries.

Namely: the canonical "unicode implementation" in the world is libicu, in terms of completeness and correctness and such, and it's *huge*. A full build of it is like 16 MB or so. So while I'd love to expose all of libicu as core::char and core::str -- to take the needs of unicode-text-as-advertised seriously -- part of me thinks that making such a physically-large dependency part of the "minimum library for nontrivial rust programs" might offend as many people as it pleases.

I see a few possible solutions to this:

 1. Manually convert (using helper scripts or such) a small "important"
    chunk of the unicode data tables into rust code and include that in
    core, with a binding to the "full" libicu as a component of std or
    even an external crate in cargo.

 2. Integrate libicu as a submodule but make a minimalist build of it
    part of our core library. There are a lot of optional parts; leave
    out the less-important parts.

 3. Integrate libicu as a submodule and build the whole thing; unicode
    is too important to skimp on.

 4. ??? some other option
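
To make option 1 a bit more concrete, here is a rough sketch of what a script-generated table and its lookup might look like. This is only an illustration, not a proposed API: the names WHITE_SPACE, in_table and is_white_space are made up for the example, the code uses present-day Rust syntax, and the only data shown is the Unicode White_Space property, written as sorted inclusive code point ranges.

```rust
use std::cmp::Ordering;

// Hypothetical generated table: sorted, non-overlapping, inclusive
// code point ranges for the Unicode White_Space property.
const WHITE_SPACE: &[(u32, u32)] = &[
    (0x0009, 0x000D),
    (0x0020, 0x0020),
    (0x0085, 0x0085),
    (0x00A0, 0x00A0),
    (0x1680, 0x1680),
    (0x2000, 0x200A),
    (0x2028, 0x2029),
    (0x202F, 0x202F),
    (0x205F, 0x205F),
    (0x3000, 0x3000),
];

// Binary search over a range table. The comparator reports where each
// range lies relative to the code point being looked up.
fn in_table(table: &[(u32, u32)], c: char) -> bool {
    let cp = c as u32;
    table
        .binary_search_by(|&(lo, hi)| {
            if hi < cp {
                Ordering::Less
            } else if lo > cp {
                Ordering::Greater
            } else {
                Ordering::Equal
            }
        })
        .is_ok()
}

// Hypothetical helper of the sort core::char might expose.
pub fn is_white_space(c: char) -> bool {
    in_table(WHITE_SPACE, c)
}
```

Calling is_white_space on a candidate character is the whole interface. Each property costs a handful of integer pairs, so a small "important" set of tables would stay tiny next to a full libicu build, at the price of regenerating them whenever the Unicode data changes.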

I'm going to be exploring #2 today in the sense of finding out how small a build I can make while still including "the important parts". I'll report further here when I find out. But in the meantime I'm curious what people think about this issue.

I'm also curious what people think are "the important parts" of unicode. This is more of a solicitation for input from actual unicode experts. It's a giant standard and I'm curious who, if anyone on this list, has a clear picture of sensible ways to divide it up.

-Graydon
_______________________________________________
Rust-dev mailing list
[email protected]
https://mail.mozilla.org/listinfo/rust-dev
