Re: [Q2] (Re: The strings design document)

2004-04-30 Thread Jeff Clites
On Apr 28, 2004, at 5:01 AM, Dan Sugalski wrote: At 3:17 AM -0700 4/28/04, Jeff Clites wrote: On Apr 23, 2004, at 2:43 PM, Dan Sugalski wrote: For example, consider the following: use Unicode; open FOO, foo.txt, :charset(latin-3); open BAR, bar.txt, :charset(big5); $filehandle = 0;

Re: [Q2] (Re: The strings design document)

2004-04-30 Thread Larry Wall
On Fri, Apr 30, 2004 at 08:38:18AM -0700, Jeff Clites wrote: : On Apr 28, 2004, at 5:01 AM, Dan Sugalski wrote: : : At 3:17 AM -0700 4/28/04, Jeff Clites wrote: : On Apr 23, 2004, at 2:43 PM, Dan Sugalski wrote: : : For example, consider the following: : : use Unicode; : open FOO, foo.txt,

Re: [Q2] (Re: The strings design document)

2004-04-30 Thread Jeff Clites
On Apr 30, 2004, at 9:02 AM, Larry Wall wrote: On Fri, Apr 30, 2004 at 08:38:18AM -0700, Jeff Clites wrote: : On Apr 28, 2004, at 5:01 AM, Dan Sugalski wrote: : : At 3:17 AM -0700 4/28/04, Jeff Clites wrote: : On Apr 23, 2004, at 2:43 PM, Dan Sugalski wrote: : : For example, consider the

Re: [Q1] (Re: The strings design document)

2004-04-28 Thread Jeff Clites
On Apr 27, 2004, at 10:25 AM, Dan Sugalski wrote: At 9:40 AM -0700 4/27/04, Jeff Clites wrote: On Apr 23, 2004, at 2:43 PM, Dan Sugalski wrote: CHARACTER SET - Contains meta-information about code points. This includes both the meaning of individual code points

Re: [Q1] (Re: The strings design document)

2004-04-28 Thread Jarkko Hietaniemi
I think you're basically forcing this concept onto national standards which lack it. I don't think that most of the national standards actually define the semantics of the characters they encode (categorizations, case mapping, sort order), and although they assign byte sequences to

[Q1] (Re: The strings design document)

2004-04-27 Thread Jeff Clites
On Apr 23, 2004, at 2:43 PM, Dan Sugalski wrote: CHARACTER SET - Contains meta-information about code points. This includes both the meaning of individual code points (65 is capital A, 776 is a combining diaeresis) as well as a set of categorizations
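
As an illustration of the kind of per-code-point meta-information the quoted definition describes, here is a minimal sketch using Python's standard `unicodedata` module. It is not Parrot's design or API; the `describe` helper is a hypothetical name introduced only for this example, and it covers Unicode only, not the national character sets the thread goes on to discuss.

```python
import unicodedata

def describe(code_point):
    """Return a few pieces of Unicode metadata for one code point.

    `describe` is a hypothetical helper for illustration; it only wraps
    lookups that the standard unicodedata module already provides.
    """
    ch = chr(code_point)
    return {
        "code_point": code_point,
        "name": unicodedata.name(ch),            # the "meaning" of the code point
        "category": unicodedata.category(ch),    # general category, e.g. Lu or Mn
        "combining": unicodedata.combining(ch),  # canonical combining class
    }

capital_a = describe(65)    # the "65 is capital A" example from the quote
diaeresis = describe(776)   # the "776 is a combining diaeresis" example

print(capital_a)
print(diaeresis)
```

The point of the sketch is only that name and categorization are properties attached to the code point itself, separate from how the code point is stored as bytes.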

Re: [Q1] (Re: The strings design document)

2004-04-27 Thread Jarkko Hietaniemi
1) ISO-8859-1 is used to represent text in several different languages, including German and Swedish. German and Swedish differ in their sort order, even for things they have in common. (For example, ö (o-with-diaeresis) is considered a separate letter in Swedish, but is just an accented o
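
The sort-order point can be sketched in a few lines. The example below is an assumption-laden simplification, not real locale collation: `german_key` folds umlauts onto their base letters (roughly the dictionary-style German convention), and `swedish_key` uses the Swedish alphabet, in which å, ä and ö follow z. Both function names and the word list are hypothetical and exist only for this illustration.

```python
# Simplified sort keys; real collation (e.g. the Unicode Collation Algorithm
# with locale tailorings) is considerably more involved.

def german_key(word):
    # Assumption: dictionary-style German ordering, umlauts sort as base letters.
    table = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})
    return word.lower().translate(table)

SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"

def swedish_key(word):
    # Assumption: every character is in the 29-letter Swedish alphabet;
    # anything else raises ValueError in this sketch.
    return [SWEDISH_ALPHABET.index(ch) for ch in word.lower()]

words = ["öl", "zebra", "ost"]

german_sorted = sorted(words, key=german_key)
swedish_sorted = sorted(words, key=swedish_key)

print(german_sorted)   # ['öl', 'ost', 'zebra']
print(swedish_sorted)  # ['ost', 'zebra', 'öl']
```

The same three strings, encodable identically in ISO-8859-1, come out in different orders, which is why sort order cannot be derived from the character set or encoding alone.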

Re: [Q1] (Re: The strings design document)

2004-04-27 Thread Dan Sugalski
At 7:57 PM +0300 4/27/04, Jarkko Hietaniemi wrote: 1) ISO-8859-1 is used to represent text in several different languages, including German and Swedish. German and Swedish differ in their sort order, even for things they have in common. (For example, ö (o-with-diaeresis) is considered a

Re: [Q1] (Re: The strings design document)

2004-04-27 Thread Jarkko Hietaniemi
Dan Sugalski wrote: At 7:57 PM +0300 4/27/04, Jarkko Hietaniemi wrote: 1) ISO-8859-1 is used to represent text in several different languages, including German and Swedish. German and Swedish differ in their sort order, even for things they have in common. (For example, ö

The strings design document

2004-04-23 Thread Dan Sugalski
Is tacked on. Note that we *do* have to support as core languages which don't force Unicode universally (Perl 5, Python, and Ruby) *and* we have to support the writing of stream filters in pure Parrot, so the goal of 100% pure unadulterated Unicode except at the very edge isn't attainable, no