Re: I don't want to be facing 8-bit bugs in 2013

2002-03-21 Thread Meritt James

You want to be facing 8-bit bugs in 2002?  I recommend reconsideration
of priorities.

-- 
James W. Meritt CISSP, CISA
Booz | Allen | Hamilton
phone: (410) 684-6566




Re: [idn] Re: I don't want to be facing 8-bit bugs in 2013

2002-03-21 Thread Mark Davis

 Unicode is not usable in international context.
...

It would not be worth replying to these threadworn and repeated
assertions by Mr. Ohta, except that some members of this list may not
be that familiar with Unicode. Clearly Unicode is being used
successfully in a huge variety of products in international contexts.

For more information on the CJK repertoire (which is the part Mr. Ohta
objects to), see http://www.unicode.org/unicode/faq/han_cjk.html.

Mark
—

Γνῶθι σαυτόν — Θαλῆς
[For transliteration, see http://oss.software.ibm.com/cgi-bin/icu/tr]

http://www.macchiato.com


Re: I don't want to be facing 8-bit bugs in 2013

2002-03-21 Thread Robert Elz

Date:        Thu, 21 Mar 2002 00:57:18 +0859 ()
From:        Masataka Ohta [EMAIL PROTECTED]
Message-ID:  [EMAIL PROTECTED]

Ohta-san

  | Anyway, with the fix, there is no reason to prefer Unicode-based
  | local character sets, which is not widely used today, than existing
  | local character sets already used world wide.

Let's assume that local char sets, plus an explicit indication of which one
to use, are adequate for this purpose (as Harald has said, and I agree,
that's not sufficient for all purposes, but for stuff like domain names,
file names, etc., I suspect it is).

Then let's take all the MIME names of those charsets and number them
0, 1, 2, 3, ... (as many as it takes).  That's a 1::1 mapping; after all,
the MIME charset names are in ascii, and we wouldn't want to keep that.

Then, rather than labelling whole names with a charset, we'll label
every individual character, so if, by some strange chance, ascii (or 8859-1)
happened to be allocated number 0, then 'A' would be 0-65.
This way we can mix and match names with characters from any random
character set (it may be a little less efficient than one label per name
but that's OK, assume we'll pay that price for the flexibility).

Now, we'll just list all the possible characters in all the char sets,
with their char set numbers attached, in one big table.

What we have at that point is (essentially) 10646, i.e. unicode.   Just up
above you said this works (throwing in some redundant labelling cannot
cause it to stop working, nor can altering the labels from ascii to binary
numerals).

Of course, a large fraction of the char sets that we just listed in a big
table contain ascii as a subset, so we end up with dozens of versions of
A (it's in 8859-1, 8859-2, ... tis-620 (Thai) ...).  All of those A's are
in all respects identical; keeping them will do no more than cause
interoperability problems.   So let's go through and squash all the duplicates,
and then renumber everything to be a bit more compact (the renumbering is
a 1::1 operation that alters nothing important).

Having done that, we have exactly 10646 (or can have, if we picked the
right character sets for our initial big list of everything, gave them
the appropriate char set numbers, and compressed the number spaces in
just the right way).

Again, you said above, this works.
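
In code, the construction just sketched amounts to something like the
following (a toy Python sketch, with a handful of charsets standing in for
the full list; this illustrates the argument, it is not how ISO 10646 was
actually compiled):

    # Toy sketch: number some charsets, tag every character with its
    # (charset number, code point) pair, squash characters that are
    # identical across charsets, then renumber compactly.
    # The charsets below are stand-ins; any registered MIME charsets would do.
    charsets = ["ascii", "iso-8859-1", "iso-8859-7", "tis-620"]

    # Steps 1 and 2: one big table of (charset number, code point) -> character.
    big_table = {}
    for num, cs in enumerate(charsets):
        for byte in range(256):
            try:
                ch = bytes([byte]).decode(cs)
            except (UnicodeDecodeError, LookupError):
                continue            # code point unused in this charset
            big_table[(num, byte)] = ch

    # Step 3: squash the duplicates -- the 'A' in ascii, 8859-1, 8859-7 and
    # tis-620 is the same character, so keep one copy of each distinct one.
    distinct = sorted(set(big_table.values()))

    # Step 4: renumber compactly (a 1::1 relabelling; nothing important changes).
    code_of = {ch: i for i, ch in enumerate(distinct)}

    print(len(big_table), "labelled characters collapse to", len(distinct))
    print("'A' is number", code_of["A"])    # one 'A', not four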

The only place in all of this where there's any possibility of a problem is
in the squash-all-the-duplicates step - if some characters that are duplicates
aren't removed, or some that aren't are treated as if they are.  If this
happened, then there'd be a bug in the actual character tables (too many, or
too few) - but this doesn't alter the principle of the thing.

If there is a bug like this (which I am not able to judge) then someone
should get it fixed.

Whether or not there is a bug, the unicode/10646 approach is clearly the most
flexible way of, and is perfectly adequate for, labelling things using
whatever characters anyone wants to use - internationally or locally.

There is simply no way to rationally claim that a local char set, plus label,
is adequate and unicode is not.

kre






Re: I don't want to be facing 8-bit bugs in 2013

2002-03-20 Thread Masataka Ohta

Kre;

   | IDNA does _not_ work, because Unicode does not work in International
   | context.
 
 This argument is bogus, and always has been.   If (and where) unicode
 is defective, the right thing to do is to fix unicode.

Unicode is not usable in international context.

There is no unicode implementation work in international context.

Unicode is usable in some local context.

There is some unicode implementation work in local contexts.

However, the context information must be supplied out of band.

And, the out of band information is equivalent to charset
information, regardless of whether you call it charset or
not.

 So, stop arguing against unicode (10646) - just fix any problems it has.

The fix is to supply context information out of band to specify which
Unicode-based local character set to use.

With MIME, it is doable by using different charset names for
different local character sets.

See, for example, RFC1815.

As for IDN, it can't just say "use charset of utf-7" or "use charset
of utf-8".

IDN can say "for Japanese, use charset of utf-7-japanese".

Or, if you insist not to distinguish different local character sets by
MIME charset names, IDN can say "use charset of utf-7", but, for
Japanese, use the Japan-local version of utf-7 and somehow specify
how a name is used for Japanese.
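
To make the mechanics concrete, such a label would just ride in the existing
MIME charset parameter.  A rough Python sketch (the utf-7-japanese token is
the hypothetical name proposed above, not a registered charset, so nothing
will actually decode by that name):

    # The bytes on the wire can stay the same; only the charset token in the
    # header carries the extra local context. "utf-7-japanese" is hypothetical.
    generic  = 'Content-Type: text/plain; charset="utf-7"'
    japanese = 'Content-Type: text/plain; charset="utf-7-japanese"'

    def charset_of(content_type):
        # Minimal extraction of the charset parameter, for illustration only.
        return content_type.split('charset=', 1)[1].strip().strip('"')

    print(charset_of(generic))     # -> utf-7
    print(charset_of(japanese))    # -> utf-7-japanese (receiver applies the
                                   #    Japan-local interpretation)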

Anyway, with the fix, there is no reason to prefer Unicode-based
local character sets, which are not widely used today, over existing
local character sets already used world wide.

Masataka Ohta




Re: I don't want to be facing 8-bit bugs in 2013

2002-03-20 Thread John Stracke

Anyway, with the fix, there is no reason to prefer Unicode-based
local character sets, which is not widely used today, than existing
local character sets already used world wide.

Of course there is.  What do you do when someone wants to combine charsets
from different nations?  For example, say a Japanese man named Ohta married
a Mexican woman whose paternal surname was Colón.  Their child's full
surname, if they lived in Mexico, would be Ohta y Colón.  If that child
wants to spell their surname correctly, they can't use a just-European or
just-Japanese character set; they probably need Unicode.
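
To see the point concretely, a quick Python sketch (writing the Ohta surname
in kanji as 太田 is just an assumption for the sake of the mixed-script
example):

    # UTF-8 (or any Unicode encoding) covers the whole name; the single-nation
    # charsets each reject the part that comes from the other script.
    name = "太田 y Colón"

    name.encode("utf-8")          # works: both scripts are covered
    try:
        name.encode("latin-1")    # fails: no kanji in ISO 8859-1
    except UnicodeEncodeError as err:
        print("latin-1:", err)
    try:
        name.encode("shift_jis")  # fails: ó is not in the Japanese charset
    except UnicodeEncodeError as err:
        print("shift_jis:", err)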

/=\
|John Stracke|Principal Engineer  |
|[EMAIL PROTECTED]   |Incentive Systems, Inc. |
|http://www.incentivesystems.com |My opinions are my own. |
|=|
|I imagine the wages of sin *are* death, but by the time they take|
|taxes out it's just sort of a tired feeling. --Paula Poundstone  |
\=/




Re: I don't want to be facing 8-bit bugs in 2013

2002-03-20 Thread Paul Robinson

On Mar 20, D. J. Bernstein [EMAIL PROTECTED] wrote:

 False. IDNA does _not_ work. IDNA causes interoperability failures. Mail

... with the current DNS resolvers in place...

OK, others have pointed out failures with things like SSL/HTTPS (which is 
broken in several interesting ways anyway from the point of view of 
scalability and long-term usefulness), but even then I can see work-arounds. 
All of them are preferable to throwing away every serial 
console in the world because a Greek delta doesn't display properly on it 
and therefore has a 'display failure'.

 That assumption is false. Consider, for example, an MTA configured to
 accept mail for pi.cr.yp.to, with a Greek pi. The MTA compares the
 incoming domain name to pi.cr.yp.to. That doesn't involve the resolver.

Well, funnily enough I have to tell my MTA about .co.uk domains AND .com 
domains as well, even though the bit in front of them is IDENTICAL. If I 
register the .net I have to tell it about that as well. They're different 
domains. If I buy an IDN with a Greek letter in it, I'm going to have to 
tell my MTA about that. If you think you can add a new domain to your config 
without having to tell your MTA about it, you're obviously not a very 
experienced admin.
 
 Now, please explain why the same user should prefer a domain name that's
 _occasionally_ displayed with the desired delta but _usually_ displayed
 as incomprehensible gobbledygook.

Until he fixes his machine to have proper Greek alphabet support. However, 
when it comes over to my PDP-8 it appears as gobbledygook. But it still 
works. Sure, that's REAL broken.
 
 In short, you're looking at the long-term IDNA benefits (never mind the
 interoperability failures and all the other problems) but refusing to
 look at the long-term UTF-8 benefits. Inconsistent once again.

The IETF is about the long term. It's not about the next 2 years. Even I know 
that. You should certainly know that.
 
* interoperability failures;

That don't exist. Or the ones that do can be given work-arounds far simpler 
than replacing every piece of equipment in the known universe and updating 
every piece of software ever written. Ever.

* inconsistent displays of the same name;

That don't matter.

* unnecessary implementation and deployment costs;

The fact you're suggesting that what *I'm* proposing and what the IDNA is 
proposing requires unnecessary implementation and deployment costs, quite 
frankly, makes me snort my coffee up my nose.

* multiple semantically similar names;

OK, let's not allow uppercase Greek Alpha. Does that fix one of your 
problems? Sheesh.

* identical displays of different names; and

Not a problem for the IETF. Not even a problem for users. In Greece A means 
Alpha. In the US it means 'Ay'. If you're a Greek in New York, you might 
have problems. You're going to have problems anyway, but at least with IDNA 
PunyCode is your friend.
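
For what it's worth, the ACE conversion is purely mechanical.  A rough sketch
with Python's built-in idna codec (the name bücher.example is just a stock
illustration, nothing from this thread):

    # How an internationalized label ends up as the ASCII-compatible form
    # that IDNA actually puts into the DNS.
    name = "bücher.example"
    ace = name.encode("idna")       # ASCII-compatible encoding
    print(ace)                      # b'xn--bcher-kva.example'
    print(ace.decode("idna"))       # round-trips back to 'bücher.example'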

* typing failures.

We get those anyway. It's why slahdot.org exists.
 
 False. Every step in http://cr.yp.to/proto/idnc3.html preserves
 interoperability.

Fixing half a dozen pieces of software and then just allowing every man and 
his dog to register special Bernsteinised domains? Right. I see.
 
 There are several options. One option is to work around the hardware
 limitations in software, displaying something like
 
   |
/\/ /\ |   /^ /\ /\ /\
\/\ \/ | * \_ \/ | | |

*ROFL*

# cat Mail/inbox | utf8decode | figlet

That's how I like to read my mail in the morning!

Seriously man, I'm sat here on a customer's site in the middle of the South 
Atlantic looking forward to an 18 hour RAF flight home to the UK, and I'm 
crying with laughter. You really have cheered me up. That was so 
off-the-wall and unexpected - to propose IDNs to be displayed as ASCII art - 
that I'm going to be smiling and giggling to myself for days. You have 
confirmed what I should have known - you're stark raving insane. Quite, 
quite, quite mad. Fantastic.
 
 Another, much more popular, option is to move your email reading, web
 browsing, etc. from your 1970s-vintage VT100 to a graphics terminal.
 Have you considered the VT340, for example? Or an IBM PC, model 5150?

You're proposing the abandonment of legacy hardware when there is no need. 
That costs money. Serious money. If I have my hostname set up as 
pi.alpha-ol.com and I need to gain access over the serial port at boot with 
a VT100, the option of anything else is not available. 
 
  a good 20% of sites out there will just have to shut down ops permanently
 
 Get a grip, Paul.

I'm not the one proposing we turn the world into one great big ASCII art 
factory. 

Anyway, I'm going to be away from mail for a few days due to travelling 
around - don't think I'm giving up on you yet though. :-)

Seriously... ascii art... hehehe...

-- 
Paul Robinson




Re: I don't want to be facing 8-bit bugs in 2013

2002-03-20 Thread Masataka Ohta

Harald;

 [EMAIL PROTECTED] wrote:
 
  Unicode is usable in some local context.
 
 Agreed. Note that some is changing to many as time goes on.

Irrelevant, because some contexts are not compatible.

The point here is that there can be no universal context.

  There is some unicode implementaion work in local contexts.
 
  However, the context information must be supplied out of band.
 
 Agreed.

Then, how can you provide the information with IDN?

  And, the out of band information is equivalent to charset
  information, regardless of whether you call it charset or
  not.
 
 Do not agree.
 for most values of what we currently have registered as charset,  it is 
 not sufficient to identify the context.

That is a problem of current and past charset reviewers including you.

 Therefore, depending on charset to identify context is not only useless, 
 but actively harmful.

Agreed. ISO 2022 escape sequences are the way to go.

 My opinion, which I stated in RFC 1766, and have found no reason to change.

It has already been contradicted by real-world examples.

Masataka Ohta




Re: I don't want to be facing 8-bit bugs in 2013

2002-03-20 Thread Masataka Ohta

Erkki I. Kolehmainen;

 The use of local character sets (encoding) is doomed for particularly ww
 information interchange.

Interestingly enough, ww information interchange is working very
well with local character sets.

The reason is that only people sharing a language, a scripting
system and a character encoding system join each exchange, regardless
of whether it is ww or intranational.

For example, ww IETF communication is with English, Latin script and
ASCII. Introduction of ISO-8859-1 or Unicode does not make IETF use
Finnish.

Your attempt to put in ISO-8859-1 characters is not acceptable to me,
and your mail is filtered down to pure ASCII by my mailer, which is
fair because many of us have no way to input non-ASCII ISO-8859-1
characters.

Masataka Ohta




Re: I don't want to be facing 8-bit bugs in 2013

2002-03-20 Thread James Seng

While the discussion of the use of various character sets is an interesting
topic, one which is also of interest to the IDN WG, such prolonged discussions
are better carried out in a forum which is dedicated to this, such as
[EMAIL PROTECTED], a list which was formed to talk about the
generic problems of I18N and L10N in the IETF, and not IDN.

Please bring it over to the other list and, when/if there is a conclusion,
please keep the IDN WG informed.

Thanks.

-James Seng





Re: I don't want to be facing 8-bit bugs in 2013

2002-03-19 Thread Masataka Ohta

D. J. Bernstein;

 Paul Robinson writes:
  You tell him that although it's gobbledygook to people without greek
  alphabet support, it will still work. It's not convenient, but it WILL
  work. Guaranteed.
 
 False. IDNA does _not_ work. IDNA causes interoperability failures.

IDNA does _not_ work, because Unicode does not work in International
context.

 People who say that IDN is purely a DNS issue are confused.

It's purely a cultural issue.

 In fact, the cost of fixing UTF-8 displays is much _smaller_ than the
 cost of fixing IDNA displays. UTF-8 has been around for many years, has
 built up incredible momentum (as illustrated by RFC 2277), and already
 works in a huge number of programs.

In an international context, it is technically impossible to properly
display Unicode characters.

No such implementation exists.

While some implementations work in some localized contexts, a local
character set serves that context better.

Masataka Ohta




Re: I don't want to be facing 8-bit bugs in 2013

2002-03-19 Thread Robert Elz

Date:        Wed, 20 Mar 2002 14:32:41 +0859 ()
From:        Masataka Ohta [EMAIL PROTECTED]
Message-ID:  [EMAIL PROTECTED]

  | IDNA does _not_ work, because Unicode does not work in International
  | context.

This argument is bogus, and always has been.   If (and where) unicode
is defective, the right thing to do is to fix unicode.

That is, it isn't the principle of a single encoding of all characters
that anyone is objecting to here, it is that some specific characters
have been implemented incorrectly (merged with others) as I understand it.

I'm not competent to decide how important this problem is, and this is
not the forum to debate it anyway (so please don't reply just to tell me
how significant the problem is, nor why).   Do that with whoever maintains
unicode.

If you can't get enough of the unicode experts to agree that there's a
problem that needs fixing, then by definition, there isn't.  That's just
the same way the IETF works (whether the unicode group actually works
this way or not - if not, they should...) - that a few people believe
something is broken is irrelevant if they can't demonstrate that well
enough to sway others to agree with them.

So, stop arguing against unicode (10646) - just fix any problems it has.

kre