HEADSUP: UFS2 patch coming...

2002-06-19 Thread Poul-Henning Kamp


Kirk is loading and aiming his committatron with the UFS2 patch;
expect to see it hit -current any day now.

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED] | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.




Re: PATCH: wchar_t is already defined in libstd++

2002-06-19 Thread Terry Lambert

OK, this has turned into a long essay, so unless questions are really
addressed to me explicitly, I will try to avoid writing anything else
on this subject.

Here's my jeremiad on Unicode.  Take it for what it's worth.


"Johny Mattsson (EPA)" wrote:

| If we settle on wchar_t being 16 bits, then we will still be forced to do
| UTF-7/8/16 to properly handle a random Unicode (or ISO/IEC 10646) string,
| since we must deal with that charming thing known as "surrogate pairs" (see
| section 3.7 of the Unicode standard v3.0). This again breaks the "one
| wchar_t == one character" rule. When being forced to deal with Unicode, I much
| prefer working with 32 bits, since that guarantees that I get a fixed length
| for each character. Admittedly, it is space inefficient to the Nth degree,
| but speedwise it is better.
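
(A minimal C sketch of the surrogate-pair point above; it is not from the
original mail, and the function name and sample code point are illustrative.
A code point above U+FFFF needs two 16-bit units under UTF-16, so a 16-bit
wchar_t cannot keep the "one wchar_t == one character" invariant, while a
32-bit wchar_t holds any ISO/IEC 10646 code point in a single unit.)

#include <stdint.h>
#include <stdio.h>

/* Encode one code point as UTF-16; returns the number of 16-bit units. */
static int
encode_utf16(uint32_t cp, uint16_t out[2])
{
        if (cp < 0x10000) {
                out[0] = (uint16_t)cp;          /* BMP: one unit suffices */
                return (1);
        }
        cp -= 0x10000;                          /* 20 bits remain */
        out[0] = 0xD800 | (cp >> 10);           /* high surrogate */
        out[1] = 0xDC00 | (cp & 0x3FF);         /* low surrogate */
        return (2);
}

int
main(void)
{
        uint16_t u[2];
        int i, n;

        /* U+1D11E (musical G clef) lies above the 16-bit range. */
        n = encode_utf16(0x1D11E, u);
        printf("%d unit(s):", n);
        for (i = 0; i < n; i++)
                printf(" 0x%04X", u[i]);
        printf("\n");                           /* 2 unit(s): 0xD834 0xDD1E */
        return (0);
}

(Decoding is the reverse operation; either way, code that assumes one unit
per character breaks as soon as such a pair shows up.)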

ISO/IEC 10646-1 doesn't have any code points allocated above the
low 16 bits.  It's the same as the Unicode 1.1 standard.

Unicode 3.0 throws a whole lot of dead languages into the mix,
or it tries to allocate separate code points for non-existent
character sets, whose glyphs should be, according to the Unicode
Philosophy that resulted in the controversial CJK unification,
unified with existing glyphs within the character set.  Unicode,
after all, is a character set standard, not a font encoding
standard.

Unicode 3.x has not been ratified as an ISO/IEC standard, and it
may not ever be.  So Unicode 3.x incursions above 16 bits are not
really a valid argument until Unicode 3.x is standardized in some
way other than administrative fiat by the Unicode Consortium
having published a new version to sell more books and justify its
continued existence to the people funding it.

--

Historically, I've really had a love/hate relationship with
Unicode.

When Unicode was originally designed, it was intentionally
designed to exclude fixed-cell rendering technologies: if
the font was pre-rendered, you could not render characters
with ligatures intact.

Personally, I blame this on the fact that Taligent, the real
driving force behind the first Unicode standard, was an IBM
and Apple joint venture, and owed its pocket books to rendering
technologies like Display PostScript, which were direct
competitors with X Windows... and X Windows uses fixed cell
rendering technology, even when it's using TrueType fonts.

So when Unicode first came out, the "private use" areas were
not large enough, nor sufficiently near to, or interleaved with,
those of ligatured languages, like Tamil and Devanagari, or
even Arabic and Hebrew.

There was a fundamental assumption that the rendering technology
would be disjoint from the encoding technology, and that the
cost, due to the arrangement of the "private use" areas, was to
be borne in the rendering engine.  And rendering engines where
that was not possible (e.g. X Windows) would just have to paint
pixels and eat the overhead in the applications (and they did;
you can install "xtamil" from ports and see how it works).

The Japanese *hate* Unicode.  The primary reason for this hate
is, to be blunt, that Unicode is not a superset of JIS-208 or
JIS-208 + JIS-212; the secondary reason is that Japanese is
nearly as protectionist as French, and the CJK unification used
the Chinese dictionary order.  There is a good reason for this,
however: Chinese dictionary order is capable of classifying
Japanese Ideograms.  A simplification of this is that Chinese
dictionary classification is in "stroke, radical" order; thus
it is capable of classifying ideograms that "look like" they
are Chinese ideograms.  The Japanese classification system is
not capable of doing this, and the Japanese have two widely
recognized classification systems for lexical ordering internal
to Japan, so it's not even possible to pick a "right order" if
you were to say "all the Japanese characters, *then* all the
Chinese characters".

In practice, this is a subject for academics who care about the
number of angels which can dance on the head of a pin.  But it
has a slightly deeper protectionist agenda, as well.  The
Japanese computer market, for a very long time, was not a
commoditized market.  Perhaps the largest market share went to
the NEC-PC98 (indeed, there's explicit support in FreeBSD for
this machine).  In such a market, it's possible to create
products which are non-commodity, and end up "owning" customers.
In addition, things like EUC encoding and XPG/4 are rarely
supported by non-Japanese software titles, which protects the
local software production market.  MITI, in fact, has as one of
its mandates, the protection of a market for locally produced
software.

Microsoft's introduction of Unicode, and the subsequent ability
of third party software written solely to support Microsoft
interfaces that used "oleString" and other wchar_t types
natively, meant that there was immediate support for Japanese
in these products.  Microsoft broke down the wall that had
been built in order to protect local