Re: XML Blueberry Requirements

2001-06-21 Thread Elliotte Rusty Harold
At 9:35 PM +0100 6/20/01, [EMAIL PROTECTED] wrote: | In addition, XML 1.0 attempts to adapt to the line-end conventions of | various modern operating systems, but discriminates against the | convention used on IBM and IBM-compatible mainframes. XML 1.0 documents | generated on mainframes must

Re: XML Blueberry Requirements

2001-06-21 Thread Elliotte Rusty Harold
This is going out to three mailing lists. I'd like to add a fourth and suggest that future discussion take place on xml-dev, which probably has the broadest reach of interested parties. Starting in Unicode 3.0 a number of new characters have been added both for new scripts that were previously

Re: XML Blueberry Requirements

2001-06-21 Thread Misha Wolf
On 21/06/2001 14:37:59 Elliotte Rusty Harold wrote: This is going out to three mailing lists. I'd like to add a fourth and suggest that future discussion take place on xml-dev, which probably has the broadest reach of interested parties. [...] The Blueberry requirements [1] are very

Re: XML Blueberry Requirements

2001-06-21 Thread Elliotte Rusty Harold
At 3:20 PM +0100 6/21/01, [EMAIL PROTECTED] wrote: The Blueberry requirements [1] are very thoughtfully written and do *not* make any of the errors you describe. I suggest a second reading. I don't think I said the Blueberry requirements were in error, just that they're wrong-headed. The

Re: converting ISO 8859-1 character set text to ASCII (128) character set

2001-06-21 Thread Antoine Leca
We have a specific requirement of converting Latin-1 character set (ISO 8859-1) text to ASCII character set (a set of only 128 characters). Is there any special set of utilities available or service providers who can do that type of job? Look at recode (a GNU package). It performs the

Re: XML Blueberry Requirements

2001-06-21 Thread From Net Link
On Thu, 21 Jun 2001 09:40:22 -0400, Elliotte Rusty Harold wrote: At 9:35 PM +0100 6/20/01, [EMAIL PROTECTED] wrote: | In addition, XML 1.0 attempts to adapt to the line-end conventions of | various modern operating systems, but discriminates against the | convention used on IBM and

Re: XML Blueberry Requirements

2001-06-21 Thread Otto Stolz
Misha Wolf wrote: In addition, XML 1.0 attempts to adapt to the line-end conventions of various modern operating systems, but discriminates against the convention used on IBM and IBM-compatible mainframes. XML 1.0 documents generated on mainframes must either violate the local line-end

RE: XML Blueberry Requirements

2001-06-21 Thread Rick McGowan
I only have one question. What do blueberries have to do with XML? Rick

RE: XML Blueberry Requirements

2001-06-21 Thread Carl W. Brown
The only reason there's a problem here at all is because IBM tried to go it alone as a monopoly and set standards by fiat for years rather than working with the rest of the industry. Consequently their mainframe character sets don't really interoperate well with everybody else's character

The perfect solution for the UTF-8/16 discussion

2001-06-21 Thread Markus Scherer
Abolish all in-process Unicode encodings except UTF-16. If everyone uses the same encoding form then there is no problem with different string lengths, results of binary comparisons, etc. Once we are here, abolish all little-endian UTF-16 implementations. This will save a lot of byte swapping,

UTF-17

2001-06-21 Thread Kenneth Whistler
In the way of solutions seeking a problem, I would like to propose a new UTF: UTF-17. UTF-17 converts each Unicode code point to a sequence of 1 synchronizing byte followed by 7 further bytes, for a total of 8 bytes per character. Each code point in the range 0..10FFFF is treated as a 21-bit

Re: XML Blueberry Requirements

2001-06-21 Thread Elliotte Rusty Harold
At 4:43 PM -0400 6/21/01, John Cowan wrote: Let me also note that it is only *parsers* that are affected by this particular change. It does *not* require change at any level above the parser. U+0085 (and hopefully U+2028 as well), like the existing CR and LF and CR/LF sequences, would be

RE: XML Blueberry Requirements

2001-06-21 Thread Carl W. Brown
John, I think that for XML they should use U+0085 (NEXT LINE) and U+2028 (LINE SEPARATOR) only, since CR LF are subject to interpretation. If you use a line feed it should be exactly that. It advances one line but does not affect your horizontal positioning. It only positions vertically. This

Re: UTF-17

2001-06-21 Thread Markus Scherer
Nice, but you have the same kind of shortest-form problem as in UTF-8: 38 30 30 30 30 30 30 30 could be mis-interpreted by a lenient decoder as U+. Ts, ts... At least it sorts binary in code point order. markus

Re: UTF-17

2001-06-21 Thread Kenneth Whistler
Markus, Thank you for your comment. Nice, but you have the same kind of shortest-form problem as in UTF-8: 38 30 30 30 30 30 30 30 could be mis-interpreted by a lenient decoder as U+. Well, actually, that is not technically a shortest-form problem. All UTF-17 forms are exactly 8 bytes