Marco Cimarosti scripsit:
> > [...] Definition: A character is an atomic unit of text as
> > specified by ISO/IEC 10646 [ISO/IEC 10646] [...]
>
> I should not try to interpret XML specs. Anyway, my understanding is that
> the XML legislators are simply saying that they adopt Unicode definition
MAIL PROTECTED]>, <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>, <[EMAIL PROTECTED]>
References: <[EMAIL PROTECTED]>
<[EMAIL PROTECTED]>
Subject: Re: japanese xml
Date: Mon, 3 Sep 2001 22:29:51 -0700
Organization: Trigeminal Softw
From: "KUSANO Takayuki" <[EMAIL PROTECTED]>
> > This is only a problem for people who do not want to use Unicode.
>
> But, most people can't live without 'legacy' encodings, because
> there are many documents, data in 'legacy' encodings and there are
> still many applications/terminals that ca
At Mon, 3 Sep 2001 22:29:51 -0700,
Michael (michka) Kaplan wrote:
>
> This is only a problem for people who do not want to use Unicode.
But, most people can't live without 'legacy' encodings, because
there are many documents, data in 'legacy' encodings and there are
still many applications/te
On Mon, Sep 03, 2001 at 10:59:26PM -0700, Michael (michka) Kaplan wrote:
> Actually, I would be (happens now with CP-1252 vs. ISO-8859-1).
Where? What characters? I glanced at a local copy of the Unicode charts
for them, and both were the identity function for characters in ASCII.
I'm not talkin
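A quick way to check where two single-byte mappings disagree is to decode every byte value with both and compare. The sketch below is only an illustration: it assumes Python's built-in "cp1252" and "latin-1" codecs as the two mappings, and the helper name differing_bytes is made up here, not taken from the thread.

```python
def differing_bytes(enc_a="cp1252", enc_b="latin-1"):
    """Return the byte values that the two single-byte codecs decode differently."""
    diffs = []
    for value in range(256):
        raw = bytes([value])
        decoded = []
        for enc in (enc_a, enc_b):
            try:
                decoded.append(raw.decode(enc))
            except UnicodeDecodeError:
                decoded.append(None)  # byte is unassigned in this codec
        if decoded[0] != decoded[1]:
            diffs.append(value)
    return diffs


diffs = differing_bytes()
print([hex(b) for b in diffs])
```

With these two codecs every reported byte lies in the 0x80..0x9F range; the ASCII range is identical in both.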
From: "David Starner" <[EMAIL PROTECTED]>
> Frankly, the attitude of "Forget all the stuff that you have working;
> just throw it all away and move to Unicode" is not one that wins many
> converts. Backward compatibility and the ability to interface with
> other systems running different stuff is
On Mon, Sep 03, 2001 at 10:29:51PM -0700, Michael (michka) Kaplan wrote:
> This is only a problem for people who do not want to use Unicode.
No! It's a problem with anyone who has to interoperate with people using
non-Unicode systems or needs to use legacy data. Would you be that dismissive
abou
From: "David Starner" <[EMAIL PROTECTED]>
> > This is only a problem for people who do not want to use Unicode.
>
> No! It's a problem with anyone who has to interoperate with people using
> non-Unicode systems or needs to use legacy data. Would you be that dismissive
> about it if each ISO-8859-
ED]>; <[EMAIL PROTECTED]>
Sent: Monday, September 03, 2001 9:54 PM
Subject: Re: japanese xml
> On Mon, Sep 03, 2001 at 11:31:31PM -0400, [EMAIL PROTECTED] wrote:
> > If there are two or more different mappings between Unicode/10646 and some
> > other encoding -- say, JIS X0
On Mon, Sep 03, 2001 at 11:31:31PM -0400, [EMAIL PROTECTED] wrote:
> If there are two or more different mappings between Unicode/10646 and some
> other encoding -- say, JIS X0208 -- then different XML processors certainly
> may emit different outputs. That is not XML's fault, and it is not Unic
In a message dated 2001-09-03 18:02:09 Pacific Daylight Time,
[EMAIL PROTECTED] writes:
> [XML], however, provides little information on existing CESs already
> in use for the interchange of Japanese characters. Such CESs are
> allowed as mere options among many others. Furthermore, [XML]
of characters. They are not
> that they mandate one of Unicode forms as the only encoding for a XML source
> file.
Hi Marco,
this is just a followup to the thread; I thought people might be
interested.
This email refers to a description in
htt
Misha,
> case of Japanese) may cover all the characters you require, in which
> Additionally, if you are thinking of XML (or
> HTML) then you can encode *all* Unicode characters in an EUC-encoded
> document, by employing numeric character references for characters
> outside the EUC character repe
On 31/08/2001 17:16:23 Marco Cimarosti wrote:
[...]
> (Misha, I hope I finally succeeded figuring out what you were meaning!)
>
> Ciao.
> _ Marco
I agree 100% :-)
Regarding Viranga's question about inventing one's own encoding (in the
sense of Internet "charset"), anyone is free to design an e
Viranga Ratnaike wrote:
>[...] And apologies for my previously vague questions.
As you have seen, both Misha and I thought that your question was very
clear, nevertheless we understood two totally different things.
Moreover, even after all the attempts at explanation, we both are still
convince
Does '-jp' (or "euc-jp" collectively) imply JIS ?
If so, does this violate section 2.2 from the XML 1.0 standard?
Can you have a document that simultaneously satisfies Unicode and
JIS? Or (as is more likely : ) is my understanding
Hello David,
What you say is true, but it affects only a very small set of
codepoints, mainly symbols. For more documentation, I recommend
reading http://www.w3.org/TR/japanese-xml/.
Regards, Martin.
At 13:13 01/08/30 -0500, David Starner wrote:
>On Thu, Aug 30, 2001 at 09:51:24AM -0
At 10:39 01/08/30 +0100, [EMAIL PROTECTED] wrote:
>Additionally, if you are thinking of XML (or
>HTML) then you can encode *all* Unicode characters in an EUC-encoded
>document, by employing numeric character references for characters
>outside the EUC character repertoire. Using the same techniqu
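The numeric character reference technique described here can be tried out in a few lines. This is a minimal sketch, assuming Python's built-in "euc_jp" codec and its "xmlcharrefreplace" error handler; the helper names encode_with_ncrs and decode_with_ncrs are hypothetical, and html.unescape is only a rough stand-in for what an XML parser does with references (it also expands named entities).

```python
import html


def encode_with_ncrs(text, encoding="euc_jp"):
    """Encode text, writing characters the encoding lacks as &#N; references."""
    return text.encode(encoding, errors="xmlcharrefreplace")


def decode_with_ncrs(data, encoding="euc_jp"):
    """Decode the bytes, then expand character references back to characters."""
    return html.unescape(data.decode(encoding))


# Three kanji (inside the EUC-JP repertoire) plus DEVANAGARI LETTER KA (outside it).
sample = "\u65e5\u672c\u8a9e \u0915"
encoded = encode_with_ncrs(sample)
print(encoded)
print(decode_with_ncrs(encoded) == sample)
```

The kanji come out as ordinary EUC-JP byte pairs, while U+0915 is written as the ASCII sequence &#2325;, so the document stays valid EUC-JP and still carries the full character.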
On 08/30/2001 12:14:49 PM Marco Cimarosti wrote:
>Yes, yes. XML documents can represent characters in at least two ways:
>2) By representing them with numeric references in the form "&#1234;" etc.
>The numeric references themselves are sequences of characters ("&" + "#" +
>one or more of "0".."9
Marco:
>> Furthermore, Viranga's context appears to be XML, in which
>> case it *is* possible to encode *all* Unicode code points
>> using EUC (or ISO-8859-1 or ASCII or ...)
>
>Yes, yes. XML documents can represent characters in at least two ways:
>2) By representing them with numeric refere
David,
> -Original Message-
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On
> Behalf Of David Starner
> Sent: Thursday, August 30, 2001 11:13 AM
> To: [EMAIL PROTECTED]
> Subject: Re: japanese xml
>
>
> On Thu, Aug 30, 2001 at 09:51:24AM -0700, Addison P
On Wed, 29 Aug 2001, Marco Cimarosti wrote:
> "euc-jp" means the Japanese character set (JIS) serialized in EUC ("Extended
> Unix Code").
I'm afraid this is slightly misleading because EUC-JP encodes NOT
a *single* coded character set BUT *three* coded character sets,
US-ASCII/JIS X 0201, JIS X
On Thu, Aug 30, 2001 at 09:51:24AM -0700, Addison Phillips [wM] wrote:
> And it is worth mentioning, because, in fact,
> EUC-JP (and many other encodings) are perfectly interoperable for the
> subset of characters that they represent.
One of the big complaints I hear in trying to Unicodize Li
s outside EUC-JP
> represented as NCRs", and our parser handles that quite well...
>
> Addison
>
> -Original Message-
> From: Ayers, Mike [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, August 30, 2001 10:00 AM
> To: 'Addison Phillips [wM]'
> Cc: [EMA
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, August 30, 2001 10:42 AM
> Interesting. My original reply is pasted in below. Please
> tell me how you managed to arrive at your interpretation.
As I mentioned already, I misread your original reply, partially
bec
On 30/08/2001 18:00:22 Mike Ayers wrote:
[...]
>Misha was not talking about EUC-JP, rather EUC-unicode (or some name
> like that), which encodes unicode scalar values using the EUC method, and
> uses character references for those values (most of them) that are outside
> of the EUC encoding
On 30/08/2001 18:16:45 "Ayers, Mike" wrote:
[...]
>Ah, yes - rereading carefully I see that you are not proposing what
> I thought you were proposing. You are also not answering the OP's question,
> which was:
>
>
>
>Is it ok for Unicode code points to be encoded/serialized using EUC?
coding. That's "EUC-JP with characters outside EUC-JP
represented as NCRs", and our parser handles that quite well...
Addison
-Original Message-
From: Ayers, Mike [mailto:[EMAIL PROTECTED]]
Sent: Thursday, August 30, 2001 10:00 AM
To: 'Addison Phillips [wM]'
Cc:
use for an XML
> file would be a Unicode encoding such as UTF-8 or UTF-16.
> 4. However, you can use any other encoding, provided you tag the file
> appropriately (so that the parser knows what the encoding is and can
> translate it to its internal representation).
> 5 You are not requ
Misha Wolf wrote:
> You seem to be implying that Viranga's question was:
> "Can one encode all Unicode code points using EUC?"
>
> That is a strange interpretation of:
> "Is it ok for Unicode code points to be encoded/serialized
> using EUC?"
In fact, that is exactly my interpretation. I think t
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, August 30, 2001 10:07 AM
>
> On 30/08/2001 17:53:06 "Ayers, Mike" wrote:
> > > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
> > > Sent: Thursday, August 30, 2001 08:36 AM
> >
> > > Furthermore, Viranga's context appear
On 30/08/2001 17:53:06 "Ayers, Mike" wrote:
> > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
> > Sent: Thursday, August 30, 2001 08:36 AM
>
> > Furthermore, Viranga's context appears to be XML, in which
> > case it *is* possible to encode *all* Unicode code points
> > using EUC (or ISO-8859
Marco
>
> > Is the conversion from euc-jp to utf-8/utf-16 simple; are there
> > algorithms and/or converters, out there, that I can access?
>
> Such a conversion requires three steps:
>
> 1) decode EUC byte sequences into JIS code points (i.e. get one
> integer for
> each character);
> 2)
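In practice a converter library usually hides these steps behind a single decode/encode pair. The following is a minimal sketch, assuming Python's built-in "euc_jp" codec; the function name euc_jp_to_utf8 is hypothetical, and the codec's mapping table is only one of the possible JIS-to-Unicode mappings discussed in this thread.

```python
def euc_jp_to_utf8(data):
    """Transcode EUC-JP bytes to UTF-8 bytes via Python's Unicode str type."""
    text = data.decode("euc_jp")   # EUC-JP bytes -> Unicode code points
    return text.encode("utf-8")    # Unicode code points -> UTF-8 bytes


# Build a small EUC-JP sample from three kanji, then convert it.
euc_bytes = "\u65e5\u672c\u8a9e".encode("euc_jp")
utf8_bytes = euc_jp_to_utf8(euc_bytes)
print(len(euc_bytes), len(utf8_bytes))
```

For an XML file, the encoding declaration would also have to be updated (or removed) after such a conversion so that it matches the new bytes.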
representation).
Slight but relevant correction: you can use any encoding of which
the parser is aware.
> 5 You are not required to use EUC-JP for your Japanese XML
> files: you can
> use the Unicode encodings directly. In some cases, though, your file
> editing software may
ded you tag the file
appropriately (so that the parser knows what the encoding is and can
translate it to its internal representation).
5 You are not required to use EUC-JP for your Japanese XML files: you can
use the Unicode encodings directly. In some cases, though, your file
editing software may
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, August 30, 2001 08:36 AM
> Furthermore, Viranga's context appears to be XML, in which
> case it *is* possible to encode *all* Unicode code points
> using EUC (or ISO-8859-1 or ASCII or ...)
I ask again - where's the
I have no idea of what you're talking about.
Misha
On 30/08/2001 16:11:14 "Ayers, Mike" wrote:
> > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
> > Sent: Thursday, August 30, 2001 06:06 AM
>
> > IMO, I correctly replied to Viranga's question and I've
> > no idea what you're talking about
You seem to be implying that Viranga's question was:
"Can one encode all Unicode code points using EUC?"
That is a strange interpretation of:
"Is it ok for Unicode code points to be encoded/serialized
using EUC?"
Furthermore, Viranga's context appears to be XML, in which
case it *is* possible t
I have no idea what kind of stunt you're trying to pull.
/|/|ike
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, August 30, 2001 08:37 AM
>
> I have no idea of what you're talking about.
>
> Misha
>
>
> On 30/08/2001 16:11:14 "Ayers, Mike" wrote:
> > > From:
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, August 30, 2001 06:06 AM
> IMO, I correctly replied to Viranga's question and I've
> no idea what you're talking about below.
Let me try to put it another way. What you said may have been
technically correct, but i
Misha Wolf wrote:
> IMO, I correctly replied to Viranga's question and I've
> no idea what you're talking about below.
Viranga's short question was: "Is it ok for Unicode code points to be
encoded/serialized using EUC?"
My (Marco's) short answer was: "EUC size simply doesn't fit Unicode."
Your
IMO, I correctly replied to Viranga's question and I've
no idea what you're talking about below.
Misha
On 30/08/2001 13:46:57 Marco Cimarosti wrote:
> Misha Wolf wrote:
> > On 30/08/2001 09:16:21 Marco Cimarosti wrote:
> > > Viranga Ratnaike wrote:
> > > > Is it ok for Unicode code points to b
Misha Wolf wrote:
> On 30/08/2001 09:16:21 Marco Cimarosti wrote:
> > Viranga Ratnaike wrote:
> > > Is it ok for Unicode code points to be
> > > encoded/serialized using EUC?
> [...]
> >
> > EUC size simply doesn't fit Unicode.
> >
> [...]
> That is, IMO, quite a misleading reply. It would be mo
On 30/08/2001 09:16:21 Marco Cimarosti wrote:
> Viranga Ratnaike wrote:
> > Is it ok for Unicode code points to be
> > encoded/serialized using EUC?
> > I'm not planning on doing this; just wondering what (?if any?)
> > restrictions, there are on choice of transformation format.
>
> EUC size simp
Viranga Ratnaike wrote:
> Is it ok for Unicode code points to be
> encoded/serialized using EUC?
> I'm not planning on doing this; just wondering what (?if any?)
> restrictions, there are on choice of transformation format.
EUC size simply doesn't fit Unicode.
Each EUC-encoded character is eith
ought I'd ask.
Is there much interest, for Unicode, in Japan? Most documents,
I find, use JIS.
Regards,
Viranga
On Wed, Aug 29, 2001 at 11:21:43AM +0200, Marco Cimarosti wrote:
> Viranga Ratnaike wrote:
> >I was hunting for examples of japanese xml
At Wed, 29 Aug 2001 18:13:41 +1000,
Viranga Ratnaike <[EMAIL PROTECTED]> wrote:
> I was hunting for examples of japanese xml and came across the
> following, which looks rather cool. Except that it doesn't seem
> to actually be unicode. I thought XML
ents can be in other encodings.
Regards, Martin.
At 18:13 01/08/29 +1000, Viranga Ratnaike wrote:
>Hi,
>
> I was hunting for examples of japanese xml and came across the
> following, which looks rather cool. Except that it doesn't seem
> to actually
Viranga Ratnaike wrote:
>I was hunting for examples of japanese xml and came across the
>following, which looks rather cool. Except that it doesn't seem
>to actually be unicode. I thought XML had mandated unicode?
> http://java.sun.com/xml/jaxp-1.1/examples/samples/weekly-euc-j
Hi,
I was hunting for examples of japanese xml and came across the
following, which looks rather cool. Except that it doesn't seem
to actually be unicode. I thought XML had mandated unicode?
http://java.sun.com/xml/jaxp-1.1/examples/samples/weekly-euc-j