Re: japanese xml

2001-09-04 Thread John Cowan
Marco Cimarosti scripsit: > > [...] Definition: A character is an atomic unit of text as > > specified by ISO/IEC 10646 [ISO/IEC 10646] [...] > > I should not try to interpret XML specs. Anyway, my understanding is that > the XML legislators are simply saying that they adopt Unicode definition

RE: japanese xml

2001-09-04 Thread Marco Cimarosti
MAIL PROTECTED]>, <[EMAIL PROTECTED]> Cc: <[EMAIL PROTECTED]>, <[EMAIL PROTECTED]> References: <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> Subject: Re: japanese xml Date: Mon, 3 Sep 2001 22:29:51 -0700 Organization: Trigeminal Softw

Re: japanese xml

2001-09-04 Thread Michael \(michka\) Kaplan
From: "KUSANO Takayuki" <[EMAIL PROTECTED]> > > This is only a problem for people who do not want to use Unicode. > > But, most people can't live without 'legacy' encodings, because > there are many documents, data in 'legacy' encodings and there are > still many applications/terminals that ca

Re: japanese xml

2001-09-04 Thread KUSANO Takayuki
At Mon, 3 Sep 2001 22:29:51 -0700, Michael (michka) Kaplan wrote: > > This is only a problem for people who do not want to use Unicode. But, most people can't live without 'legacy' encodings, because there are many documents, data in 'legacy' encodings and there are still many applications/te

Re: japanese xml

2001-09-03 Thread David Starner
On Mon, Sep 03, 2001 at 10:59:26PM -0700, Michael (michka) Kaplan wrote: > Actually, I would be (happens now with CP-1252 vs. ISO-8859-1). Where? What characters? I glanced at a local copy of the Unicode charts for them, and both were the identity function for characters in ASCII. I'm not talkin
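To make the "what characters?" question concrete, here is a small sketch that compares Python's built-in `cp1252` and `latin-1` codecs byte by byte. It assumes Python's tables are representative; other converters may treat the five bytes Python leaves unassigned in CP-1252 differently.

```python
def differing_bytes() -> list[int]:
    """Byte values whose CP-1252 and ISO-8859-1 decodings (as Python implements them) differ."""
    diffs = []
    for b in range(256):
        latin1 = bytes([b]).decode("latin-1")   # ISO-8859-1: byte N maps to U+00NN
        try:
            cp1252 = bytes([b]).decode("cp1252")
        except UnicodeDecodeError:
            cp1252 = None                       # unassigned in Python's cp1252 table
        if cp1252 != latin1:
            diffs.append(b)
    return diffs

diffs = differing_bytes()
print(len(diffs), hex(diffs[0]), hex(diffs[-1]))
```

On CPython this reports 32 bytes, 0x80 through 0x9F: both encodings agree on ASCII and on 0xA0 to 0xFF, and disagree only in the C1 range.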

Re: japanese xml

2001-09-03 Thread Michael \(michka\) Kaplan
From: "David Starner" <[EMAIL PROTECTED]> > Frankly, the attitude of "Forget all the stuff that you have working; > just throw it all away and move to Unicode" is not one that wins many > converts. Backward compatibility and the ability to interface with > other systems running different stuff is

Re: japanese xml

2001-09-03 Thread David Starner
On Mon, Sep 03, 2001 at 10:29:51PM -0700, Michael (michka) Kaplan wrote: > This is only a problem for people who do not want to use Unicode. No! It's a problem with anyone who has to interoperate with people using non-Unicode systems or needs to use legacy data. Would you be that dismissive abou

Re: japanese xml

2001-09-03 Thread Michael \(michka\) Kaplan
From: "David Starner" <[EMAIL PROTECTED]> > > This is only a problem for people who do not want to use Unicode. > > No! It's a problem with anyone who has to interoperate with people using > non-Unicode systems or needs to use legacy data. Would you be that dismissive > about it if each ISO-8859-

Re: japanese xml

2001-09-03 Thread Michael \(michka\) Kaplan
ED]>; <[EMAIL PROTECTED]> Sent: Monday, September 03, 2001 9:54 PM Subject: Re: japanese xml > On Mon, Sep 03, 2001 at 11:31:31PM -0400, [EMAIL PROTECTED] wrote: > > If there are two or more different mappings between Unicode/10646 and some > > other encoding -- say, JIS X0

Re: japanese xml

2001-09-03 Thread David Starner
On Mon, Sep 03, 2001 at 11:31:31PM -0400, [EMAIL PROTECTED] wrote: > If there are two or more different mappings between Unicode/10646 and some > other encoding -- say, JIS X0208 -- then different XML processors certainly > may emit different outputs. That is not XML's fault, and it is not Unic
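A well-known instance of "two or more different mappings" for JIS X 0208 is the character at JIS code 0x2141 (row 1, cell 33, WAVE DASH in the JIS standard). The sketch below uses Python's codecs only as examples of two mapping tables; other converters may choose differently.

```python
# EUC-JP bytes A1 C1 and Shift_JIS/CP932 bytes 81 60 both encode
# JIS X 0208 code 0x2141 (row 1, cell 33).
euc = b"\xa1\xc1".decode("euc_jp")       # JIS-style mapping
sjis = b"\x81\x60".decode("shift_jis")   # JIS-style mapping
cp932 = b"\x81\x60".decode("cp932")      # Microsoft's mapping

for label, ch in [("euc_jp", euc), ("shift_jis", sjis), ("cp932", cp932)]:
    print(f"{label:10} U+{ord(ch):04X}")
```

Python's `euc_jp` and `shift_jis` produce U+301C WAVE DASH, while `cp932` produces U+FF5E FULLWIDTH TILDE, so two processors reading "the same" JIS text can emit different Unicode.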

Re: japanese xml

2001-09-03 Thread DougEwell2
In a message dated 2001-09-03 18:02:09 Pacific Daylight Time, [EMAIL PROTECTED] writes: > [XML], however, provides little information on existing CESs already > in use for the interchange of Japanese characters. Such CESs are > allowed as mere options among many others. Furthermore, [XML]

Re: japanese xml

2001-09-03 Thread 'Viranga Ratnaike'
of characters. They are not > that they mandate one of Unicode forms as the only encoding for a XML source > file. Hi Marco, this is just a followup to the thread; I thought people might be interested. This email refers to a description in htt

ICU conversion of codepage data (Was: japanese xml)

2001-09-01 Thread Carl W. Brown
Misha, > case of Japanese) may cover all the characters you require, in which > Additionally, if you are thinking of XML (or > HTML) then you can encode *all* Unicode characters in an EUC-encoded > document, by employing numeric character references for characters > outside the EUC character repe

RE: japanese xml

2001-08-31 Thread Misha . Wolf
On 31/08/2001 17:16:23 Marco Cimarosti wrote: [...] > (Misha, I hope I finally succeeded figuring out what you were meaning!) > > Ciao. > _ Marco I agree 100% :-) Regarding Viranga's question about inventing one's own encoding (in the sense of Internet "charset"), anyone is free to design an e

RE: japanese xml

2001-08-31 Thread Marco Cimarosti
Viranga Ratnaike wrote: >[...] And apologies for my previously vague questions. As you have seen, both Misha and I thought that your question was very clear, nevertheless we understood two totally different things. Moreover, even after all the attempts of explanations, we both are still convince

Re: japanese xml

2001-08-31 Thread Viranga Ratnaike
Does '-jp' (or "euc-jp" collectively) imply JIS ? If so, does this violate section 2.2 from the XML 1.0 standard? Can you have a document that simultaneously satisfies Unicode and JIS? Or (as is more likely : ) is my understanding

Re: japanese xml

2001-08-30 Thread Martin Duerst
Hello David, What you say is true, but it affects only a very small set of codepoints, mainly symbols. For more documentation, I recommend reading http://www.w3.org/TR/japanese-xml/. Regards, Martin. At 13:13 01/08/30 -0500, David Starner wrote: >On Thu, Aug 30, 2001 at 09:51:24AM -0

RE: japanese xml

2001-08-30 Thread Martin Duerst
At 10:39 01/08/30 +0100, [EMAIL PROTECTED] wrote: >Additionally, if you are thinking of XML (or >HTML) then you can encode *all* Unicode characters in an EUC-encoded >document, by employing numeric character references for characters >outside the EUC character repertoire. Using the same techniqu
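The technique described here, an EUC-encoded XML document with numeric character references for anything outside the EUC repertoire, can be sketched with Python's `xmlcharrefreplace` error handler. The function name `to_euc_jp_xml` and the `<p>` wrapper are illustrative assumptions, not part of any standard API.

```python
from xml.sax.saxutils import escape

def to_euc_jp_xml(text: str) -> bytes:
    """Hypothetical helper: wrap text in a tiny EUC-JP XML document."""
    # Escape markup-significant characters first, so '&' and '<' in the text stay data.
    body = escape(text)
    # Characters EUC-JP cannot represent become decimal NCRs such as &#54620;.
    encoded = body.encode("euc_jp", errors="xmlcharrefreplace")
    return b'<?xml version="1.0" encoding="EUC-JP"?>\n<p>' + encoded + b"</p>\n"

doc = to_euc_jp_xml("日本語と한국어")
print(doc)
```

The Japanese text is stored as EUC-JP bytes, while the Hangul syllables (absent from JIS X 0208 and JIS X 0212) appear as `&#54620;&#44397;&#50612;`, which any conforming XML parser expands back to U+D55C, U+AD6D and U+C5B4.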

RE: japanese xml

2001-08-30 Thread Peter_Constable
On 08/30/2001 12:14:49 PM Marco Cimarosti wrote: >Yes, yes. XML documents can represent characters in at least two ways: >2) By representing them with numeric references in the form "&#1234;" etc. >The numeric references themselves are sequences of characters ("&" + "#" + >one or more of "0".."9
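Because a numeric reference is just a sequence of ordinary characters, a parser can expand it after decoding the bytes. A minimal sketch, assuming only the XML 1.0 `CharRef` forms `&#NNNN;` and `&#xHHHH;` (lowercase `x` only); it does not handle named entities, CDATA sections, or check that the result is a legal XML `Char`.

```python
import re

# XML 1.0 CharRef: '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'
NCR = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")

def expand_ncrs(s: str) -> str:
    """Replace each numeric character reference with the character it names."""
    def repl(m: re.Match) -> str:
        code = int(m.group(1), 16) if m.group(1) else int(m.group(2))
        return chr(code)
    return NCR.sub(repl, s)

print(expand_ncrs("&#1234; and &#x4D2;"))
```

Both references name U+04D2, so the call prints `Ӓ and Ӓ`.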

RE: japanese xml

2001-08-30 Thread Peter_Constable
Marco: >> Furthermore, Viranga's context appears to be XML, in which >> case it *is* possible to encode *all* Unicode code points >> using EUC (or ISO-8859-1 or ASCII or ...) > >Yes, yes. XML documents can represent characters in at least two ways: >2) By representing them with numeric refere

RE: japanese xml

2001-08-30 Thread Carl W. Brown
David, > -Original Message- > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On > Behalf Of David Starner > Sent: Thursday, August 30, 2001 11:13 AM > To: [EMAIL PROTECTED] > Subject: Re: japanese xml > > > On Thu, Aug 30, 2001 at 09:51:24AM -0700, Addison P

RE: japanese xml

2001-08-30 Thread Jungshik Shin
On Wed, 29 Aug 2001, Marco Cimarosti wrote: > "euc-jp" means the Japanese character set (JIS) serialized in EUC ("Extended > Unix Code"). I'm afraid this is slightly misleading because EUC-JP encodes NOT a *single* coded character set BUT *three* coded character sets, US-ASCII/JIS X 0201, JIS X
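The point that EUC-JP multiplexes several coded character sets is visible in the byte structure: the lead byte tells you which set a character belongs to. A sketch of that classification, assuming the usual EUC-JP layout (SS2 = 0x8E for JIS X 0201 katakana, SS3 = 0x8F for JIS X 0212); it does not validate trail bytes.

```python
def euc_jp_charsets(data: bytes) -> list[tuple[str, bytes]]:
    """Split EUC-JP bytes into characters, labelling the coded character set of each."""
    result = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            charset, length = "ASCII / JIS X 0201 Roman", 1   # code set 0: one byte
        elif lead == 0x8E:
            charset, length = "JIS X 0201 katakana", 2        # SS2 + one byte
        elif lead == 0x8F:
            charset, length = "JIS X 0212", 3                 # SS3 + two bytes
        elif 0xA1 <= lead <= 0xFE:
            charset, length = "JIS X 0208", 2                 # two bytes, high bit set
        else:
            raise ValueError(f"invalid EUC-JP lead byte 0x{lead:02X} at offset {i}")
        if i + length > len(data):
            raise ValueError(f"truncated character at offset {i}")
        result.append((charset, data[i:i + length]))
        i += length
    return result

sample = b"A\x8e\xb1\xb0\xa1\x8f\xb0\xa1"
for charset, raw in euc_jp_charsets(sample):
    print(charset, raw.hex())
```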

Re: japanese xml

2001-08-30 Thread David Starner
On Thu, Aug 30, 2001 at 09:51:24AM -0700, Addison Phillips [wM] wrote: > And it is worth mentioning, because, in fact, > EUC-JP (and many other encodings) are perfectly interoperable for the > subset of characters that they represent. One of the big complaints I hear in trying to Unicodize Li

RE: japanese xml

2001-08-30 Thread Misha . Wolf
s outside EUC-JP > represented as NCRs", and our parser handles that quite well... > > Addison > > -Original Message- > From: Ayers, Mike [mailto:[EMAIL PROTECTED]] > Sent: Thursday, August 30, 2001 10:00 AM > To: 'Addison Phillips [wM]' > Cc: [EMA

RE: japanese xml

2001-08-30 Thread Ayers, Mike
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] > Sent: Thursday, August 30, 2001 10:42 AM > Interesting. My original reply is pasted in below. Please > tell me how you managed to arrive at your interpretation. As I mentioned already, I misread your original reply, partially bec

RE: japanese xml

2001-08-30 Thread Misha . Wolf
On 30/08/2001 18:00:22 Mike Ayers wrote: [...] >Misha was not talking about EUC-JP, rather EUC-unicode (or some name > like that), which encodes unicode scalar values using the EUC method, and > uses character references for those values (most of them) that are outside > of the EUC encoding

RE: japanese xml

2001-08-30 Thread Misha . Wolf
On 30/08/2001 18:16:45 "Ayers, Mike" wrote: [...] >Ah, yes - rereading carefully I see that you are not proposing what > I thought you were proposing. You are also not answering the OP's question, > which was: > > > >Is it ok for Unicode code points to be encoded/serialized using EUC?

RE: japanese xml

2001-08-30 Thread Addison Phillips [wM]
coding. That's "EUC-JP with characters outside EUC-JP represented as NCRs", and our parser handles that quite well... Addison -Original Message- From: Ayers, Mike [mailto:[EMAIL PROTECTED]] Sent: Thursday, August 30, 2001 10:00 AM To: 'Addison Phillips [wM]' Cc:

RE: japanese xml

2001-08-30 Thread Misha . Wolf
use for an XML > file would be a Unicode encoding such as UTF-8 or UTF-16. > 4. However, you can use any other encoding, provided you tag the file > appropriately (so that the parser knows what the encoding is and can > translate it to its internal representation). > 5. You are not requ

RE: japanese xml

2001-08-30 Thread Marco Cimarosti
Misha Wolf wrote: > You seem to be implying that Viranga's question was: > "Can one encode all Unicode code points using EUC?" > > That is a strange interpretation of: > "Is it ok for Unicode code points to be encoded/serialized > using EUC?" In fact, that is exactly my interpretation. I think t

RE: japanese xml

2001-08-30 Thread Ayers, Mike
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] > Sent: Thursday, August 30, 2001 10:07 AM > > On 30/08/2001 17:53:06 "Ayers, Mike" wrote: > > > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] > > > Sent: Thursday, August 30, 2001 08:36 AM > > > > > Furthermore, Viranga's context appear

RE: japanese xml

2001-08-30 Thread Misha . Wolf
On 30/08/2001 17:53:06 "Ayers, Mike" wrote: > > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] > > Sent: Thursday, August 30, 2001 08:36 AM > > > Furthermore, Viranga's context appears to be XML, in which > > case it *is* possible to encode *all* Unicode code points > > using EUC (or ISO-8859

RE: japanese xml

2001-08-30 Thread Carl W. Brown
Marco > > > Is the conversion from euc-jp to utf-8/utf-16 simple; are there > > algorithms and/or converters, out there, that I can access? > > Such a conversion requires three steps: > > 1) decode EUC byte sequences into JIS code points (i.e. get one > integer for > each character); > 2)
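The snippet cuts off after step 1. The sketch below assumes the remaining steps are the usual ones (map each JIS code to Unicode, then serialize as UTF-8). It handles only ASCII and JIS X 0208, and borrows Python's ISO-2022-JP codec as a stand-in mapping table; a real converter would ship its own table.

```python
def euc_to_jis0208(b1: int, b2: int) -> int:
    """Step 1: strip the high bit EUC-JP sets on each byte of a JIS X 0208 code."""
    return ((b1 & 0x7F) << 8) | (b2 & 0x7F)

def jis0208_to_unicode(jis: int) -> str:
    """Step 2 (assumed): map a JIS X 0208 code to Unicode via the ISO-2022-JP codec."""
    raw = b"\x1b$B" + bytes([jis >> 8, jis & 0xFF]) + b"\x1b(B"
    return raw.decode("iso2022_jp")

def euc_jp_to_utf8(data: bytes) -> bytes:
    chars = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            chars.append(chr(b))   # code set 0, treated as ASCII here
            i += 1
        elif 0xA1 <= b <= 0xFE and i + 1 < len(data) and 0xA1 <= data[i + 1] <= 0xFE:
            chars.append(jis0208_to_unicode(euc_to_jis0208(b, data[i + 1])))
            i += 2
        else:
            # SS2 (JIS X 0201 katakana) and SS3 (JIS X 0212) are left out of this sketch
            raise ValueError(f"unsupported or invalid byte 0x{b:02X} at offset {i}")
    return "".join(chars).encode("utf-8")   # Step 3 (assumed): serialize as UTF-8

sample = "XML 日本語".encode("euc_jp")
print(euc_jp_to_utf8(sample).decode("utf-8"))
```

For example, the EUC-JP bytes A4 A2 become JIS code 0x2422 in step 1, which maps to U+3042 (あ) in step 2.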

RE: japanese xml

2001-08-30 Thread Ayers, Mike
representation). Slight but relevant correction: you can use any encoding of which the parser is aware. > 5. You are not required to use EUC-JP for your Japanese XML > files: you can > use the Unicode encodings directly. In some cases, though, your file > editing software may
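Both points (tag the file, and the parser must actually know the encoding) can be sketched as a toy front end that reads the `encoding` pseudo-attribute and decodes to Unicode, failing if it has no codec for the name. This assumes an ASCII-compatible encoding; it ignores byte order marks and UTF-16/UTF-32 autodetection, which real XML parsers also handle. `decode_xml` is a hypothetical name.

```python
import codecs
import re

# Matches the encoding pseudo-attribute of an XML declaration at the start of the data.
# XML EncName: [A-Za-z] ([A-Za-z0-9._] | '-')*
DECL = re.compile(rb'<\?xml[^>]*?\sencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']')

def decode_xml(data: bytes) -> str:
    """Decode with the declared encoding (default UTF-8) into the parser's Unicode form."""
    m = DECL.match(data)
    name = m.group(1).decode("ascii") if m else "utf-8"
    try:
        codecs.lookup(name)
    except LookupError:
        raise ValueError(f"this parser does not know the encoding {name!r}") from None
    return data.decode(name)

doc = '<?xml version="1.0" encoding="EUC-JP"?>\n<p>日本語</p>'.encode("euc_jp")
print(decode_xml(doc))
```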

RE: japanese xml

2001-08-30 Thread Addison Phillips [wM]
ded you tag the file appropriately (so that the parser knows what the encoding is and can translate it to its internal representation). 5. You are not required to use EUC-JP for your Japanese XML files: you can use the Unicode encodings directly. In some cases, though, your file editing software may

RE: japanese xml

2001-08-30 Thread Ayers, Mike
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] > Sent: Thursday, August 30, 2001 08:36 AM > Furthermore, Viranga's context appears to be XML, in which > case it *is* possible to encode *all* Unicode code points > using EUC (or ISO-8859-1 or ASCII or ...) I ask again - where's the

RE: japanese xml

2001-08-30 Thread Misha . Wolf
I have no idea of what you're talking about. Misha On 30/08/2001 16:11:14 "Ayers, Mike" wrote: > > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] > > Sent: Thursday, August 30, 2001 06:06 AM > > > IMO, I correctly replied to Viranga's question and I've > > no idea what you're talking about

RE: japanese xml

2001-08-30 Thread Misha . Wolf
You seem to be implying that Viranga's question was: "Can one encode all Unicode code points using EUC?" That is a strange interpretation of: "Is it ok for Unicode code points to be encoded/serialized using EUC?" Furthermore, Viranga's context appears to be XML, in which case it *is* possible t

RE: japanese xml

2001-08-30 Thread Ayers, Mike
I have no idea what kind of stunt you're trying to pull. /|/|ike > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] > Sent: Thursday, August 30, 2001 08:37 AM > > I have no idea of what you're talking about. > > Misha > > > On 30/08/2001 16:11:14 "Ayers, Mike" wrote: > > > From:

RE: japanese xml

2001-08-30 Thread Ayers, Mike
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] > Sent: Thursday, August 30, 2001 06:06 AM > IMO, I correctly replied to Viranga's question and I've > no idea what you're talking about below. Let me try to put it another way. What you said may have been technically correct, but i

RE: japanese xml

2001-08-30 Thread Marco Cimarosti
Misha Wolf wrote: > IMO, I correctly replied to Viranga's question and I've > no idea what you're talking about below. Viranga's short question was: "Is it ok for Unicode code points to be encoded/serialized using EUC?" My (Marco's) short answer was: "EUC size simply doesn't fit Unicode." Your

RE: japanese xml

2001-08-30 Thread Misha . Wolf
IMO, I correctly replied to Viranga's question and I've no idea what you're talking about below. Misha On 30/08/2001 13:46:57 Marco Cimarosti wrote: > Misha Wolf wrote: > > On 30/08/2001 09:16:21 Marco Cimarosti wrote: > > > Viranga Ratnaike wrote: > > > > Is it ok for Unicode code points to b

RE: japanese xml

2001-08-30 Thread Marco Cimarosti
Misha Wolf wrote: > On 30/08/2001 09:16:21 Marco Cimarosti wrote: > > Viranga Ratnaike wrote: > > > Is it ok for Unicode code points to be > > > encoded/serialized using EUC? > [...] > > > > EUC size simply doesn't fit Unicode. > > > [...] > That is, IMO, quite a misleading reply. It would be mo

RE: japanese xml

2001-08-30 Thread Misha . Wolf
On 30/08/2001 09:16:21 Marco Cimarosti wrote: > Viranga Ratnaike wrote: > > Is it ok for Unicode code points to be > > encoded/serialized using EUC? > > I'm not planning on doing this; just wondering what (?if any?) > > restrictions, there are on choice of transformation format. > > EUC size simp

RE: japanese xml

2001-08-30 Thread Marco Cimarosti
Viranga Ratnaike wrote: > Is it ok for Unicode code points to be > encoded/serialized using EUC? > I'm not planning on doing this; just wondering what (?if any?) > restrictions, there are on choice of transformation format. EUC size simply doesn't fit Unicode. Each EUC-encoded character is eith
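The size argument can be made concrete with a back-of-the-envelope count of EUC-JP's code space. This assumes the EUC-JP layout (other EUC variants, such as EUC-TW, use longer sequences) and counts positions, not assigned characters.

```python
# Code positions available in EUC-JP's four code sets (assumed EUC-JP layout).
G0 = 128        # single bytes 0x00-0x7F (ASCII / JIS X 0201 Roman)
G1 = 94 * 94    # two bytes, each 0xA1-0xFE (JIS X 0208)
G2 = 94         # SS2 (0x8E) + one byte 0xA1-0xFE (JIS X 0201 katakana)
G3 = 94 * 94    # SS3 (0x8F) + two bytes 0xA1-0xFE (JIS X 0212)

euc_jp_capacity = G0 + G1 + G2 + G3
unicode_capacity = 0x110000   # code points U+0000 through U+10FFFF

print(euc_jp_capacity, unicode_capacity)   # 17894 1114112
```

Fewer than 18,000 positions cannot hold over a million code points, which is why characters outside the EUC repertoire need some escape mechanism such as XML numeric character references.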

Re: japanese xml

2001-08-29 Thread 'Viranga Ratnaike'
ought I'd ask. Is there much interest, for Unicode, in Japan? Most documents, I find, use JIS. Regards, Viranga On Wed, Aug 29, 2001 at 11:21:43AM +0200, Marco Cimarosti wrote: > Viranga Ratnaike wrote: > >I was hunting for examples of japanese xml

Re: japanese xml

2001-08-29 Thread Shigemichi Yazawa
At Wed, 29 Aug 2001 18:13:41 +1000, Viranga Ratnaike <[EMAIL PROTECTED]> wrote: > I was hunting for examples of japanese xml and came across the > following, which looks rather cool. Except that it doesn't seem > to actually be unicode. I thought XML

Re: japanese xml

2001-08-29 Thread Martin Duerst
ents can be in other encodings. Regards, Martin. At 18:13 01/08/29 +1000, Viranga Ratnaike wrote: >Hi, > > I was hunting for examples of japanese xml and came across the > following, which looks rather cool. Except that it doesn't seem > to actually

RE: japanese xml

2001-08-29 Thread Marco Cimarosti
Viranga Ratnaike wrote: >I was hunting for examples of japanese xml and came across the >following, which looks rather cool. Except that it doesn't seem >to actually be unicode. I thought XML had mandated unicode? > http://java.sun.com/xml/jaxp-1.1/examples/samples/weekly-euc-j

japanese xml

2001-08-29 Thread Viranga Ratnaike
Hi, I was hunting for examples of japanese xml and came across the following, which looks rather cool. Except that it doesn't seem to actually be unicode. I thought XML had mandated unicode? http://java.sun.com/xml/jaxp-1.1/examples/samples/weekly-euc-j