At 9:35 PM +0100 6/20/01, [EMAIL PROTECTED] wrote:
| In addition, XML 1.0 attempts to adapt to the line-end conventions of
| various modern operating systems, but discriminates against the
| convention used on IBM and IBM-compatible mainframes. XML 1.0 documents
| generated on mainframes must [...]
This is going out to three mailing lists. I'd like to add a fourth
and suggest that future discussion take place on xml-dev, which
probably has the broadest reach of interested parties.
Starting in Unicode 3.0, a number of new characters have been added, both for new scripts that were previously [...]
On 21/06/2001 14:37:59 Elliotte Rusty Harold wrote:
This is going out to three mailing lists. I'd like to add a fourth
and suggest that future discussion take place on xml-dev, which
probably has the broadest reach of interested parties.
[...]
The Blueberry requirements [1] are very [...]
At 3:20 PM +0100 6/21/01, [EMAIL PROTECTED] wrote:
The Blueberry requirements [1] are very thoughtfully written and
do *not* make any of the errors you describe. I suggest a second
reading.
I don't think I said the Blueberry requirements were in error, just
that they're wrong-headed. The [...]
We have a specific requirement of converting Latin-1 character set (ISO 8859-1) text
to the ASCII character set (a set of only 128 characters). Is there any special set
of utilities available, or are there service providers who can do that type of job?
Look into recode (a GNU package). It performs the [...]
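For a rough sense of what such a down-conversion involves, here is a small
illustrative Python sketch (my own, not part of recode; it is lossy by design,
stripping accents via decomposition and silently dropping anything with no
ASCII equivalent):

    import unicodedata

    def latin1_to_ascii(text: str) -> str:
        # Decompose accented letters (e.g. 'e' + combining acute for 'e-acute),
        # then drop everything that cannot be encoded as ASCII.
        decomposed = unicodedata.normalize("NFKD", text)
        return decomposed.encode("ascii", "ignore").decode("ascii")

    # "input.txt" is a hypothetical file name for the example.
    with open("input.txt", encoding="iso-8859-1") as f:
        print(latin1_to_ascii(f.read()))

GNU recode can do the same kind of transliteration in one shot on the command line.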
On Thu, 21 Jun 2001 09:40:22 -0400, Elliotte Rusty Harold wrote:
At 9:35 PM +0100 6/20/01, [EMAIL PROTECTED] wrote:
| In addition, XML 1.0 attempts to adapt to the line-end conventions of
| various modern operating systems, but discriminates against the
| convention used on IBM and [...]
Misha Wolf wrote:
In addition, XML 1.0 attempts to adapt to the line-end conventions of
various modern operating systems, but discriminates against the
convention used on IBM and IBM-compatible mainframes. XML 1.0 documents
generated on mainframes must either violate the local line-end conventions [...]
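To make the mainframe problem concrete: in most EBCDIC code pages the native
line ending is NL, byte 0x15, which the standard mapping tables carry to U+0085.
A small illustrative Python snippet (my own; it relies on Python's cp037 codec
following that mapping):

    # On an EBCDIC system (code page 037 here), the native line ending is
    # NL, byte 0x15. The standard mapping tables translate it to U+0085,
    # which XML 1.0 does not recognize as a line break.
    ebcdic_record = "<doc/>".encode("cp037") + b"\x15"
    print(repr(ebcdic_record.decode("cp037")))  # '<doc/>\x85'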
I only have one question. What do blueberries have to do with XML?
Rick
The only reason there's a problem here at all is that IBM
tried to go it alone as a monopoly and set standards by fiat for years
rather than working with the rest of the industry. Consequently their
mainframe character sets don't really interoperate well with everybody
else's character sets. [...]
Abolish all in-process Unicode encodings except UTF-16.
If everyone uses the same encoding form, then there is no problem with different
string lengths, results of binary comparisons, etc.
Once we are there, abolish all little-endian UTF-16 implementations. This will save
a lot of byte swapping [...]
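To see what the byte swapping amounts to, a small illustrative Python snippet
(my own, not from the thread):

    text = "A\u00e9"  # U+0041, U+00E9

    le = text.encode("utf-16-le")  # 41 00 e9 00
    be = text.encode("utf-16-be")  # 00 41 00 e9
    print(le.hex(" "), "|", be.hex(" "))

    # Converting between the two orders is a swap of every byte pair.
    assert bytes(le[i ^ 1] for i in range(len(le))) == be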
In the way of solutions seeking a problem, I would like to
propose a new UTF: UTF-17.
UTF-17 converts each Unicode code point to a sequence of
1 synchronizing byte followed by 7 further bytes, for a total
of 8 bytes per character. Each code point in the range
0..10FFFF is treated as a 21-bit number [...]
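Judging from the byte sequence quoted later in the thread (38 30 30 30 30 30 30 30,
i.e. ASCII '8' followed by seven '0's), the scheme seems to be a sync byte '8'
followed by the 21-bit code point as seven ASCII octal digits. A sketch of an
encoder under that assumption (my own reconstruction, 3 bits per trailing byte):

    def utf17_encode(s: str) -> bytes:
        # Assumed format: sync byte '8', then seven octal digits '0'..'7'
        # carrying the 21-bit code point, 3 bits per byte.
        out = bytearray()
        for ch in s:
            out += b"8" + format(ord(ch), "07o").encode("ascii")
        return bytes(out)

    print(utf17_encode("\u0000").hex(" "))  # 38 30 30 30 30 30 30 30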
At 4:43 PM -0400 6/21/01, John Cowan wrote:
Let me also note that it is only *parsers* that are affected by
this particular change. It does *not* require change at any level
above the parser. U+0085 (and hopefully U+2028 as well), like the
existing CR and LF and CR/LF sequences, would be normalized to a
single LF by the parser. [...]
John,
I think that for XML they should use U+0085
(NEXT LINE) and U+2028 (LINE SEPARATOR) only, since CR and LF are subject to
interpretation.
If you use a line feed it should be exactly that: it advances one line but
does not affect your horizontal positioning. It only positions vertically.
This [...]
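A sketch of that parser-side normalization (my own illustration; XML 1.0
already folds CR and CR/LF to LF, and the proposal adds NEL and LINE SEPARATOR
to the set):

    import re

    # CRLF must be tried before the lone CR so the pair folds to one LF.
    LINE_ENDS = re.compile("\r\n|\r|\u0085|\u2028")

    def normalize_line_ends(text: str) -> str:
        return LINE_ENDS.sub("\n", text)

    assert normalize_line_ends("a\r\nb\u0085c\u2028d") == "a\nb\nc\nd"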
Nice, but you have the same kind of shortest-form problem as in UTF-8:
38 30 30 30 30 30 30 30 could be mis-interpreted by a lenient decoder as U+.
Tsk, tsk...
At least it sorts binary in code point order.
markus
Markus,
Thank you for your comment.
Nice, but you have the same kind of shortest-form problem as in UTF-8:
38 30 30 30 30 30 30 30 could be mis-interpreted by a lenient decoder as U+.
Well, actually, that is not technically a shortest-form problem. All
UTF-17 forms are exactly 8 bytes. [...]
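Continuing the reconstruction sketched above: because every form is a fixed
8 bytes, a strict decoder has nothing overlong to accept, only malformed units
to reject (again my own illustration of the assumed scheme):

    def utf17_decode(data: bytes) -> str:
        if len(data) % 8:
            raise ValueError("length must be a multiple of 8")
        chars = []
        for i in range(0, len(data), 8):
            unit = data[i:i + 8]
            # Exactly one sync byte '8' and seven octal digits per unit.
            if unit[:1] != b"8" or not all(0x30 <= b <= 0x37 for b in unit[1:]):
                raise ValueError("malformed unit at offset %d" % i)
            cp = int(unit[1:].decode("ascii"), 8)
            if cp > 0x10FFFF:
                raise ValueError("code point out of range at offset %d" % i)
            chars.append(chr(cp))
        return "".join(chars)

    assert utf17_decode(bytes.fromhex("38 30 30 30 30 30 30 30")) == "\u0000"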