Again - 'invalid data' and 'garbage'. That is because you're thinking of old data with the old definition. How about new data and old software?
Your approach means that if a new character is defined in, say, ISO 8859-8, then all old software should report it as an error, and all users must upgrade, when (and if!) an update is available. My approach would mean that old software would not properly display (or collate) the new character, but would not reject the data either. Recognizing what the character actually was is not that hard and is something that many of us did for years. And if the data is eventually converted back to the same codeset, using the same (old) mapping table, the original data is preserved.

There are two approaches:
A - detecting the errors as early as possible, and
B - gracefully handling the data as long as possible.

Both have their benefits. I am very much in favor of the first, but sometimes it is simply not possible to use that approach. Once you admit that people are entitled to choose the second approach (depending on their needs), then it is useful to have the behavior defined for it.

OK, another way of looking at all this. I believe you would accept three options:
A - Reject the stream.
B - Drop the invalid data.
C - Replace the invalid characters with U+FFFD (the replacement character).

Then my proposal could be viewed as an addition to option C, with one difference. Instead of one replacement character, I propose to have 256 (though in most cases only 128 would be used).

Now, what does that violate?

Lars Kristan

> -----Original Message-----
> From: Doug Ewell [mailto:[EMAIL PROTECTED]]
> Sent: Saturday, March 16, 2002 06:59
> To: [EMAIL PROTECTED]
> Cc: Lars Kristan
> Subject: Re: Missing values in mapping-tables?
>
> Lars Kristan <[EMAIL PROTECTED]> wrote:
>
> > Suppose ISO 8859-8 is ever upgraded (even if not likely, but - for the
> > sake of argument). One might say that it would be bad to change an
> > existing definition in the table e.g. for 0xBF from 0x2DBF to 0x20AC.
> > But how is that worse than changing it from <undefined> to 0x20AC ?
> > I think it is actually better, since you can never guess what will be
> > implemented for <undefined>. "Throw an exception" is what I keep seeing
> > in these discussions. Who will catch it? The secretary on the third
> > floor?
>
> "Defining" undefined code points to be something they aren't is not a
> Good Thing. Even if ISO 8859-8 were updated at some time in the future,
> with new code points being added, the old data that was created with the
> old 8859-8 would still contain invalid data.
>
> > If mapping for undefined values would be 0xhh -> 0x2Dhh, then there
> > would be a consistent definition of what to do if somebody wants to do
> > something else than throw things out the window. Consequently, there
> > would be a better chance of being able to repair inadvertently
> > processed data at some later time.
>
> It's not repairable, because it contained garbage.
>
> -Doug Ewell
> Fullerton, California
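
For illustration only, here is a minimal Python sketch of the round trip discussed in this thread: bytes that the old table leaves undefined are mapped 0xhh -> U+2Dhh rather than rejected, and converting back restores the original bytes. The names OLD_TABLE, PLACEHOLDER_BASE, decode and encode are hypothetical, and the table is a toy one (ASCII plus a single Hebrew letter), not the real ISO 8859-8 mapping. The U+2Dhh range is simply the one used in the thread.

```python
# Toy mapping table: ASCII plus one Hebrew letter.
# This is an assumption for the example, NOT the real ISO 8859-8 table.
OLD_TABLE = {b: chr(b) for b in range(0x80)}
OLD_TABLE[0xE0] = "\u05D0"  # HEBREW LETTER ALEF

# Placeholder range from the thread: undefined byte 0xhh -> U+2Dhh.
PLACEHOLDER_BASE = 0x2D00


def decode(data: bytes, table: dict) -> str:
    """Map each byte through the table; undefined bytes become U+2Dhh."""
    return "".join(table.get(b, chr(PLACEHOLDER_BASE + b)) for b in data)


def encode(text: str, table: dict) -> bytes:
    """Reverse the mapping; placeholders turn back into their original bytes."""
    reverse = {ch: b for b, ch in table.items()}
    out = bytearray()
    for ch in text:
        if ch in reverse:
            out.append(reverse[ch])
        elif PLACEHOLDER_BASE <= ord(ch) <= PLACEHOLDER_BASE + 0xFF:
            out.append(ord(ch) - PLACEHOLDER_BASE)
        else:
            raise ValueError(f"cannot encode U+{ord(ch):04X}")
    return bytes(out)


# 0xBF is undefined in the toy table, so it survives as U+2DBF.
data = bytes([0x41, 0xE0, 0xBF])
text = decode(data, OLD_TABLE)
restored = encode(text, OLD_TABLE)
```

With the old table on both ends, restored equals data, so nothing is lost even though the old software cannot display or collate the byte 0xBF correctly.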

