Erik Ostermueller asked:

> We have a large amount of C++ that currently has Unicode 2.0 support.
>
> Could you all help me figure out what types of operations will fail
> if we attempt to pass Unicode 3.0 thru this code?
>
> I can start the list off with
>
> -sorting
> -searching for text

This depends greatly on how you implemented sorting and searching, and on how your Unicode 2.0 code handles unassigned code points. If the code was designed to be forward compatible, it should do reasonable things with unassigned code points, and Unicode 3.0 data that actually uses those code points should not disturb your existing code. On the other hand, if you have built in a bunch of range checks, or have used tables that cannot gracefully handle previously unassigned code points showing up in your data, then it could well blow up.

The Unicode Collation Algorithm was not defined until after Unicode 2.0, and was first synched with Unicode 2.1. It has also been considerably updated since then -- the current version is aimed at Unicode 3.1. You should take a look at the current version to check for gotchas you may have in your current code.

> -text comparison

I assume here you are not talking about language-specific collation comparisons, but just Unicode analogs of strcmp() and the like. If so, those should behave well -- they aren't usually programmed in ways that make them sensitive to particular code point assignments.

> -other character classification (isSpace, isDigit, etc...).

Again, these depend on what kinds of forward compatibility assumptions your original code made. If it returns meaningful results for code points that are unassigned in Unicode 2.0, then tossing Unicode 3.0 data at such APIs shouldn't cause any problem for existing code. The only cost is that you won't get the right results for the Unicode 3.0 additions until you have updated your property tables.

>
> I understand that these operations probably won't work in ALL cases.
> But how about basic plumbing code -- creating and copying strings?

Constructors and copy constructors ought to work fine, unless you've done something odd.

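To make the forward-compatibility point about classification concrete, here is a minimal sketch of a table-driven lookup that degrades gracefully. Everything in it is hypothetical and for illustration only: the `Category` enum, the tiny `kRanges` table, and the function names are not taken from your code or from any particular library. The idea is simply that a code point missing from the table gets a well-defined "unassigned" answer instead of tripping a range check or indexing past the end of an array.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>

// Hypothetical, heavily abbreviated property table. A real one would be
// generated from the Unicode Character Database for the supported version.
enum class Category { Unassigned, Letter, Digit, Space };

struct Range {
    std::uint32_t first;
    std::uint32_t last;
    Category category;
};

// Must stay sorted by `first`, with non-overlapping ranges.
static const Range kRanges[] = {
    {0x0020, 0x0020, Category::Space},
    {0x0030, 0x0039, Category::Digit},
    {0x0041, 0x005A, Category::Letter},
    {0x0061, 0x007A, Category::Letter},
};

// Returns the category of cp, or Category::Unassigned for any code point
// the table does not cover. It never asserts and never reads out of bounds.
Category categoryOf(std::uint32_t cp) {
    // Find the first range that starts after cp, then step back one range.
    const Range* it = std::upper_bound(
        std::begin(kRanges), std::end(kRanges), cp,
        [](std::uint32_t value, const Range& r) { return value < r.first; });
    if (it == std::begin(kRanges)) {
        return Category::Unassigned;  // cp lies before the first range
    }
    --it;
    return (cp <= it->last) ? it->category : Category::Unassigned;
}

bool isDigit(std::uint32_t cp) { return categoryOf(cp) == Category::Digit; }
bool isSpace(std::uint32_t cp) { return categoryOf(cp) == Category::Space; }
```

With a lookup shaped like this, newer data only produces "unassigned" answers for the newer characters until the table is regenerated; nothing crashes in the meantime.
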
What you should be more concerned about, however, is how your code is going to get from Unicode 3.0 to Unicode 3.1 (or higher), because then you will have to deal with supplementary characters. Any assumptions that characters don't lie outside the range U+0000..U+FFFF will be broken. Whether this will be a small problem or a big problem for your code depends on whether you are effectively processing Unicode in UTF-8, UTF-16, or UTF-32 (or combinations of those). The biggest hit, when moving from Unicode 3.0 to Unicode 3.1 (or higher), is for UTF-16 APIs.

See Unicode Technical Note #7, Migrating Software to Supplementary Characters, for some ideas:

http://www.unicode.org/notes/tn7/

--Ken

>
> As I mentioned in my last post, I've enjoyed
> listening in on this forum -- I've learned a whole lot.
>
> Thanks,
>
> --Erik Ostermueller
>
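
On the UTF-16 point above, the usual casualty is code that treats one 16-bit unit as one character. The following is a rough sketch of surrogate-aware iteration, not a recommendation of any particular API. The function names are made up for illustration, it assumes a compiler with `std::u16string` and `char32_t`, and returning U+FFFD for an unpaired surrogate is just one possible policy.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Decodes the code point that starts at index i of a UTF-16 string and
// advances i past it. Precondition: i < s.size().
// Unpaired surrogates are reported as U+FFFD (an assumed policy choice).
char32_t nextCodePoint(const std::u16string& s, std::size_t& i) {
    const char16_t lead = s[i++];
    if (lead >= 0xD800 && lead <= 0xDBFF) {            // high (lead) surrogate
        if (i < s.size()) {
            const char16_t trail = s[i];
            if (trail >= 0xDC00 && trail <= 0xDFFF) {  // low (trail) surrogate
                ++i;
                return 0x10000 + ((char32_t(lead) - 0xD800) << 10) +
                       (char32_t(trail) - 0xDC00);
            }
        }
        return 0xFFFD;                                 // lead without a trail
    }
    if (lead >= 0xDC00 && lead <= 0xDFFF) {
        return 0xFFFD;                                 // trail without a lead
    }
    return lead;                                       // ordinary BMP code point
}

// Counts code points rather than 16-bit code units.
std::size_t countCodePoints(const std::u16string& s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++n) {
        nextCodePoint(s, i);
    }
    return n;
}
```

For a string holding "a", U+10400, and "b", `size()` reports 4 code units while `countCodePoints` reports 3. Any existing loop that indexes by code unit and assumes the two numbers agree is a place to audit.
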

