Re: I don't want to be facing 8-bit bugs in 2013
You want to be facing 8-bit bugs in 2002? I recommend reconsideration of priorities. -- James W. Meritt CISSP, CISA Booz | Allen | Hamilton phone: (410) 684-6566
Re: [idn] Re: I don't want to be facing 8-bit bugs in 2013
> Unicode is not usable in international context.
> ...

It would not be worth replying to these threadworn and repeated assertions by Mr. Ohta, except that some members of this list may not be that familiar with Unicode. Clearly, Unicode is being used successfully in a huge variety of products in international contexts. For more information on the CJK repertoire (which is the part Mr. Ohta objects to), see http://www.unicode.org/unicode/faq/han_cjk.html.

Mark
Γνῶθι σαυτόν -- Θαλῆς
[For transliteration, see http://oss.software.ibm.com/cgi-bin/icu/tr]
http://www.macchiato.com

----- Original Message -----
From: Masataka Ohta [EMAIL PROTECTED]
To: Robert Elz [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Wednesday, March 20, 2002 07:58
Subject: [idn] Re: I don't want to be facing 8-bit bugs in 2013

Kre;

> | IDNA does _not_ work, because Unicode does not work in International
> | context.
>
> This argument is bogus, and always has been. If (and where) unicode is defective, the right thing to do is to fix unicode.

Unicode is not usable in international context. There is no unicode implementation work in international context.

Unicode is usable in some local context. There is some unicode implementation work in local contexts.

However, the context information must be supplied out of band. And, the out of band information is equivalent to charset information, regardless of whether you call it charset or not.

> So, stop arguing against unicode (10646) - just fix any problems it has.

Fix is to supply context information out of band to specify which Unicode-based local character set to use.

With MIME, it is doable by using different charset names for different local character set. See, for example, RFC1815.

As for IDN, it can't just say use charset of utf-7 or use charset of utf-8. IDN can say for Japanese, use charset of utf-7-japanese.

Or, if you insist not to distinguish different local character sets by MIME charset names, IDN can say use charset of utf-7, but, for Japanese, use Japan local version of utf-7 and somehow specify how a name is used for Japanese.

Anyway, with the fix, there is no reason to prefer Unicode-based local character sets, which is not widely used today, than existing local character sets already used world wide.

Masataka Ohta
Re: I don't want to be facing 8-bit bugs in 2013
Date: Thu, 21 Mar 2002 00:57:18 +0859 ()
From: Masataka Ohta [EMAIL PROTECTED]
Message-ID: [EMAIL PROTECTED]

Ohta-san,

| Anyway, with the fix, there is no reason to prefer Unicode-based
| local character sets, which is not widely used today, than existing
| local character sets already used world wide.

Let's assume that local char sets, plus an explicit indication of which to use, are adequate for this purpose (as Harald has said, and I agree, that's not sufficient for all purposes, but for stuff like domain names, file names, etc., I suspect it is).

Then, let's take all the MIME names of those charsets and number them 0, 1, 2, 3, ... (as many as it takes). That's a 1::1 mapping; after all, the MIME charset names are in ascii, and we wouldn't want to keep that.

Then, rather than labelling whole names with a charset, we'll label every individual character, so if, by some strange chance, ascii (or 8859-1) happened to be allocated number 0, then 'A' would be 0-65.

This way we can mix and match names with characters from any random character set (it may be a little less efficient than one label per name but that's OK, assume we'll pay that price for the flexibility).

Now, we'll just list all the possible characters in all the char sets, with their char set numbers attached, in one big table.

What we have at that point is (essentially) 10646 - unicode. Just up above you said this works (throwing in some redundant labelling cannot cause it to stop working, nor can altering the labels from ascii to binary numerals).

Of course, a large fraction of the char sets that we just listed in a big table contain ascii as a subset, so we end up with dozens of versions of A (it's in 8859-1, 8859-2, ... tis-620 (Thai) ...). All of those A's are in all respects identical; keeping them will do no more than cause interoperability problems. So let's go through and squash all the duplicates, and then renumber everything to be a bit more compact (the renumbering is a 1::1 operation that alters nothing important).

Having done that, we have exactly 10646 (or can have, if we picked the right character sets for our initial big list of everything, gave them the appropriate char set numbers, and compressed the number spaces in just the right way). Again, you said above, this works.

The only place in all of this where there's any possibility of a problem is in the "squash all the duplicates" step - if some characters that are duplicates aren't removed, or some that aren't are treated as if they are. If this happened, then there'd be a bug in the actual character tables (too many, or too few) - but this doesn't alter the principle of the thing. If there is a bug like this (which I am not able to judge) then someone should get it fixed.

Whether or not there is a bug, the unicode/10646 approach is clearly the most flexible way of, and is perfectly adequate for, labelling things using whatever characters anyone wants to use - internationally or locally.

There is simply no way to rationally claim that a local char set, plus label, is adequate and unicode is not.

kre
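The construction in the message above (label each character with a charset number, pool everything in one table, squash the duplicates, renumber) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four charsets are an arbitrary sample, the function names are made up for this sketch, and "identical character" is decided by Python's own codec tables, which already map to Unicode, so the example shows the shape of the argument rather than proving it.

```python
# Assumption: these four charsets are an arbitrary illustrative sample.
CHARSETS = ["ascii", "latin-1", "iso8859-2", "tis-620"]


def labelled_table(charsets):
    """List every (charset number, byte value) label with its character."""
    table = {}
    for number, name in enumerate(charsets):
        for byte in range(256):
            try:
                char = bytes([byte]).decode(name)
            except UnicodeDecodeError:
                continue  # this byte value is unassigned in this charset
            table[(number, byte)] = char
    return table


def squash_and_renumber(table):
    """Merge identical characters, then number the survivors compactly."""
    unique = sorted(set(table.values()))
    return {char: index for index, char in enumerate(unique)}


table = labelled_table(CHARSETS)
unified = squash_and_renumber(table)

# 'A' carries one label per charset but ends up with a single unified number.
labels_for_A = [label for label, char in table.items() if char == "A"]
print(labels_for_A)
print(len(table), "labelled entries ->", len(unified), "unified characters")
```

The only step that involves a judgement is the one the message singles out: deciding which labelled entries count as duplicates. Here that decision is delegated to the codec tables.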
Re: I don't want to be facing 8-bit bugs in 2013
Kre;

> | IDNA does _not_ work, because Unicode does not work in International
> | context.
>
> This argument is bogus, and always has been. If (and where) unicode is defective, the right thing to do is to fix unicode.

Unicode is not usable in an international context. There is no Unicode implementation that works in an international context.

Unicode is usable in some local contexts. There are some Unicode implementations that work in local contexts.

However, the context information must be supplied out of band. And the out-of-band information is equivalent to charset information, regardless of whether you call it charset or not.

> So, stop arguing against unicode (10646) - just fix any problems it has.

The fix is to supply context information out of band to specify which Unicode-based local character set to use.

With MIME, this is doable by using different charset names for different local character sets. See, for example, RFC1815.

As for IDN, it can't just say "use charset of utf-7" or "use charset of utf-8". IDN can say "for Japanese, use charset of utf-7-japanese".

Or, if you insist on not distinguishing different local character sets by MIME charset names, IDN can say "use charset of utf-7, but, for Japanese, use the Japan-local version of utf-7" and somehow specify how a name is used for Japanese.

Anyway, with the fix, there is no reason to prefer Unicode-based local character sets, which are not widely used today, over existing local character sets already used worldwide.

Masataka Ohta
Re: I don't want to be facing 8-bit bugs in 2013
> Anyway, with the fix, there is no reason to prefer Unicode-based local character sets, which is not widely used today, than existing local character sets already used world wide.

Of course there is. What do you do when someone wants to combine charsets from different nations? For example, say a Japanese man named Ohta married a Mexican woman whose paternal surname was Colón. Their child's full surname, if they lived in Mexico, would be Ohta y Colón. If that child wants to spell their surname correctly, they can't use a just-European or just-Japanese character set; they probably need Unicode.

--
John Stracke                     | Principal Engineer
[EMAIL PROTECTED]                | Incentive Systems, Inc.
http://www.incentivesystems.com  | My opinions are my own.
I imagine the wages of sin *are* death, but by the time they take
taxes out it's just sort of a tired feeling. --Paula Poundstone
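The mixed-script surname in the message above can be checked mechanically. The sketch below is an assumption-laden illustration, not part of the original post: it writes "Ohta" in kanji as "太田" (the message only gives the romanised form), picks Shift_JIS and Latin-1 as stand-ins for "just-Japanese" and "just-European" character sets, and uses a made-up helper name, `can_encode`.

```python
# Assumption: "太田" is one common kanji spelling of "Ohta"; the message
# itself only gives the romanised form.
surname = "太田 y Colón"


def can_encode(text, charset):
    """Return True if every character of text exists in the given charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


# Shift_JIS lacks the accented "ó"; Latin-1 lacks the kanji; UTF-8 has both.
for charset in ("shift_jis", "latin-1", "utf-8"):
    print(charset, can_encode(surname, charset))
```

A per-name charset label does not help here, because no single one of the two local sets contains all the characters of the name.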
Re: I don't want to be facing 8-bit bugs in 2013
On Mar 20, D. J. Bernstein [EMAIL PROTECTED] wrote:

> False. IDNA does _not_ work. IDNA causes interoperability failures.

Mail ... with the current DNS resolvers in place... OK, others have pointed out failures with things like SSL/HTTPS (which is broken in several interesting ways anyway from the point of view of scalability and long-term usefulness), but even then I can see workarounds. All of them are preferable to throwing away every serial console in the world because a Greek delta doesn't display properly on it and therefore has a 'display failure'.

> That assumption is false. Consider, for example, an MTA configured to accept mail for pi.cr.yp.to, with a Greek pi. The MTA compares the incoming domain name to pi.cr.yp.to. That doesn't involve the resolver.

Well, funnily enough I have to tell my MTA about .co.uk domains AND .com domains as well, even though the bit in front of them is IDENTICAL. If I register the .net I have to tell it about that as well. They're different domains. If I buy an IDN with a Greek letter in it, I'm going to have to tell my MTA about that. If you think that by adding a new domain to your config you shouldn't have to tell your MTA about that, you're obviously not a very experienced admin.

> Now, please explain why the same user should prefer a domain name that's _occasionally_ displayed with the desired delta but _usually_ displayed as incomprehensible gobbledygook.

Until he fixes his machine to have proper Greek alphabet support. However, when it comes over to my PDP-8 it appears as gobbledygook. But it still works. Sure, that's REAL broken.

> In short, you're looking at the long-term IDNA benefits (never mind the interoperability failures and all the other problems) but refusing to look at the long-term UTF-8 benefits. Inconsistent once again.

The IETF is about the long term. It's not about the next 2 years. Even I know that. You should certainly know that.

> * interoperability failures;

That don't exist. Or the ones that do can be given workarounds far simpler than replacing every piece of equipment in the known universe and updating every piece of software ever written. Ever.

> * inconsistent displays of the same name;

That don't matter.

> * unnecessary implementation and deployment costs;

The fact you're suggesting that what *I'm* proposing and what the IDNA is proposing requires unnecessary implementation and deployment costs, quite frankly, makes me snort my coffee up my nose.

> * multiple semantically similar names;

OK, let's not allow uppercase Greek Alpha. Does that fix one of your problems? Sheesh.

> * identical displays of different names; and

Not a problem for the IETF. Not even a problem for users. In Greece A means Alpha. In the US it means 'Ay'. If you're a Greek in New York, you might have problems. You're going to have problems anyway, but at least with IDNA PunyCode is your friend.

> * typing failures.

We get those anyway. It's why slahdot.org exists.

> False. Every step in http://cr.yp.to/proto/idnc3.html preserves interoperability.

Fixing half a dozen pieces of software and then just allowing every man and his dog to register special Bernsteinised-domains? Right. I see.

> There are several options. One option is to work around the hardware limitations in software, displaying something like
>
> | /\/ /\ | /^ /\ /\ /\ \/\ \/ | * \_ \/ | | |

*ROFL*

# cat Mail/inbox | utf8decode | figlet

That's how I like to read my mail in the morning!

Seriously man, I'm sat here on a customer's site in the middle of the South Atlantic looking forward to an 18 hour RAF flight home to the UK, and I'm crying with laughter. You really have cheered me up. That was so off-the-wall and unexpected - to propose IDNs to be displayed as ASCII art - that I'm going to be smiling and giggling to myself for days. You have confirmed what I should have known - you're stark raving insane. Quite, quite, quite mad. Fantastic.
> Another, much more popular, option is to move your email reading, web browsing, etc. from your 1970s-vintage VT100 to a graphics terminal. Have you considered the VT340, for example? Or an IBM PC, model 5150?

You're proposing the abandonment of legacy hardware when there is no need. That costs money. Serious money. If I have my hostname set up as pi.alpha-ol.com and I need to gain access over the serial port at boot with a VT100, the option of anything else is not available.

>> a good 20% of sites out there will just have to shut down ops permanently
>
> Get a grip, Paul.

I'm not the one proposing we turn the world into one great big ASCII art factory.

Anyway, I'm going to be away from mail for a few days due to travelling around - don't think I'm giving up on you yet though. :-)

Seriously... ascii art... hehehe...

--
Paul Robinson
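The "gobbledygook, but it still works" argument in the message above can be made concrete with a small Python sketch. It rests on assumptions that are not in the thread: Python's built-in "idna" codec (which implements RFC 3490, published after this exchange) stands in for the ASCII-compatible encoding under discussion, and `to_ascii` and `accepts_mail_for` are hypothetical helpers, not functions of any real MTA.

```python
# Assumption: Python's built-in "idna" codec (RFC 3490) stands in for the
# ASCII-compatible encoding being debated; the thread predates that RFC.

def to_ascii(domain):
    """Convert a possibly non-ASCII domain name to its ASCII-compatible form."""
    return domain.encode("idna").decode("ascii")


def accepts_mail_for(configured, incoming):
    """Hypothetical MTA check: compare both names in ASCII-compatible form."""
    return to_ascii(configured).lower() == to_ascii(incoming).lower()


greek_name = "\u03c0.cr.yp.to"  # Greek small letter pi, as in the quoted example
ascii_name = to_ascii(greek_name)

print(ascii_name)  # what a terminal without Greek support would display
print(accepts_mail_for(greek_name, ascii_name))
print(ascii_name.encode("ascii").decode("idna") == greek_name)
```

The sketch only works because the administrator's configuration and the incoming name are both normalised before comparison. An MTA that compares raw strings without that step is the case the quoted objection is about.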
Re: I don't want to be facing 8-bit bugs in 2013
Harald;

[EMAIL PROTECTED] wrote:

>> Unicode is usable in some local context.
>
> Agreed. Note that some is changing to many as time goes on.

Irrelevant, because some contexts are not compatible. The point here is that there can be no universal context.

>> There is some unicode implementation work in local contexts. However, the context information must be supplied out of band.
>
> Agreed.

Then, how can you provide the information with IDN?

>> And, the out of band information is equivalent to charset information, regardless of whether you call it charset or not.
>
> Do not agree. for most values of what we currently have registered as charset, it is not sufficient to identify the context.

That is a problem of current and past charset reviewers, including you.

> Therefore, depending on charset to identify context is not only useless, but actively harmful.

Agreed. The ISO 2022 escape sequence is the way to go.

> My opinion, which I stated in RFC 1766, and have found no reason to change.

It was already refuted by real-world examples.

Masataka Ohta
Re: I don't want to be facing 8-bit bugs in 2013
Erkki I. Kolehmainen;

> The use of local character sets (encoding) is doomed for particularly ww information interchange.

Interestingly enough, ww information interchange is working very well with local character sets.

The reason is that only people sharing a language, a script system and a character encoding system join each exchange, regardless of whether it is ww or intranational.

For example, ww IETF communication uses English, Latin script and ASCII. Introduction of ISO-8859-1 or Unicode does not make the IETF use Finnish.

Your attempt to include ISO-8859-1 characters is not acceptable to me, and your mail is filtered to pure ASCII by my mailer, which is fair because many of us have no way to input non-ASCII ISO-8859-1 characters.

Masataka Ohta
Re: I don't want to be facing 8-bit bugs in 2013
While the discussion of the use of various character sets is an interesting topic, and one which is also of interest to the IDN WG, such prolonged discussions are better carried out in a forum dedicated to them, such as [EMAIL PROTECTED], a list formed to talk about the generic problem of I18N and L10N in the IETF, and not IDN.

Please bring it over to the other list, and when/if there is a conclusion, please keep the IDN WG informed. Thanks.

-James Seng

----- Original Message -----
From: Masataka Ohta [EMAIL PROTECTED]
To: Erkki Kolehmainen [EMAIL PROTECTED]
Cc: D. J. Bernstein [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Thursday, March 21, 2002 7:44 AM
Subject: Re: I don't want to be facing 8-bit bugs in 2013

Erkki I. Kolehmainen;

> The use of local character sets (encoding) is doomed for particularly ww information interchange.

Interestingly enough, ww information interchange is working very well with local character sets.

The reason is because only people sharing a language, a scripting system and a character encoding system join each exchange, regardless of whether it is ww or intranational.

For example, ww IETF communication is with English, Latin script and ASCII. Introduction of ISO-8859-1 or Unicode does not make IETF use Finnish.

Your attempt to put ISO-8859-1 characters is not acceptable for me and your mail is filtered to be pure ASCII by my mailer, which is fair because many of us have no way to input non-ASCII ISO-8859-1 characters.

Masataka Ohta
Re: I don't want to be facing 8-bit bugs in 2013
D. J. Bernstein;

>> Paul Robinson writes:
>> You tell him that although it's gobbledygook to people without greek alphabet support, it will still work. It's not convenient, but it WILL work. Guaranteed.
>
> False. IDNA does _not_ work. IDNA causes interoperability failures.

IDNA does _not_ work, because Unicode does not work in International context.

People who say that IDN is purely a DNS issue are confused. It's purely a cultural issue.

> In fact, the cost of fixing UTF-8 displays is much _smaller_ than the cost of fixing IDNA displays. UTF-8 has been around for many years, has built up incredible momentum (as illustrated by RFC 2277), and already works in a huge number of programs.

In an international context, it is technically impossible to properly display Unicode characters. No such implementation exists.

While some implementations work in some localized contexts, a local character set serves that context better.

Masataka Ohta
Re: I don't want to be facing 8-bit bugs in 2013
Date: Wed, 20 Mar 2002 14:32:41 +0859 ()
From: Masataka Ohta [EMAIL PROTECTED]
Message-ID: [EMAIL PROTECTED]

| IDNA does _not_ work, because Unicode does not work in International
| context.

This argument is bogus, and always has been. If (and where) unicode is defective, the right thing to do is to fix unicode.

That is, it isn't the principle of a single encoding of all characters that anyone is objecting to here; it is that some specific characters have been implemented incorrectly (merged with others), as I understand it.

I'm not competent to decide how important this problem is, and this is not the forum to debate it anyway (so please don't reply just to tell me how significant the problem is, nor why). Do that with whoever maintains unicode. If you can't get enough of the unicode experts to agree that there's a problem that needs fixing, then by definition, there isn't.

That's just the same way as the IETF works (whether the unicode group actually works this way or not - if not, they should...) - that a few people believe something is broken is irrelevant if they can't demonstrate that well enough to sway others to agree with them.

So, stop arguing against unicode (10646) - just fix any problems it has.

kre