Re: Supercomma! (was Re: UTF-8 and Unicode FAQ, demos)

2002-11-04 Thread Damian Conway
Larry wrote: But at the moment I'm thinking there's something wrong about any approach that requires a special character on the signature side. I'm starting to think that all the convolving should be specified on the left. So in this: for parallel(x, y, z) - $x, $y, $z { ... } the

Re: UTF-8 and Unicode FAQ, demos

2002-11-04 Thread Austin Hastings
--- Matthew Zimmerman [EMAIL PROTECTED] wrote: On Sun, Nov 03, 2002 at 09:41:44AM -, Rafael Garcia-Suarez wrote: Matthew Zimmerman wrote in perl.perl6.language : So let me make my original question a little more general: are Perl 6 source files encoded in Latin-1, UTF-8, or

Re: UTF-8 and Unicode FAQ, demos

2002-11-04 Thread Larry Wall
, everybody's doing it! First one's free, kid... ;-) People who believe slippery slope arguments should never go skiing. On the other hand, even the useful slippery slopes have beginner slopes. I think one advantage of using Unicode for advanced features is that it *looks* scary. So in general we

Re: UTF-8 and Unicode FAQ, demos

2002-11-04 Thread Larry Wall
will be the default. Actually, Unicode will be the default. 8859-1 can probably also be handled without declaration. If you want trigraph support, you'll have to put use encoding 'ugly-american'; at the top of your files. ;-) ;-) ;-) Otherwise, it'll be one-character ?fancyops? all the way. Mmm, I

Re: UTF-8 and Unicode FAQ, demos

2002-11-04 Thread Austin Hastings
--- [EMAIL PROTECTED] wrote: Mmm, I view one-character Unicode operators as more of an escape hatch for the future, not as something to be made mandatory. But then, I'm one of those ugly Americans. EBCDIC didn't support brackets, originally, so

Re: UTF-8 and Unicode FAQ, demos

2002-11-04 Thread Mark J. Reed
On 2002-11-04 at 12:26:56, Austin Hastings wrote: 1- ? and ? are really useful in my context. Okay. Now can you get your mailer to send them properly? :)

Re: UTF-8 and Unicode FAQ, demos

2002-11-04 Thread Me
After all, there's gotta be some advantage to being the Fearless Leader... Larry Thousands will cry for the blood of the Perl 6 design team. As Leader, you can draw their ire. Because you are Fearless, you won't mind... -- ralph

Re: UTF-8 and Unicode FAQ, demos

2002-11-04 Thread Damian Conway
Ken Fox wrote: I know I'm just another sample point in a sea of samples, but my embedded symbol parser seems optimized for alphabetic symbols. The cool non-alphabetic Unicode symbols are beautiful to look at, but they don't help me read or write faster. Once again: we're only talking about

Re: UTF-8 and Unicode FAQ, demos

2002-11-04 Thread Damian Conway
Garrett Goebel wrote: Can't we have our cake and eat it too? Give ASCII digraph or trigraph alternatives for the incoming tide of Perl6 Unicode? Allow both * and »*«? I'd really prefer we didn't. I'd much rather keep and for other things. Or something similar '*', [*], etc... Much as I

Re: UTF-8 and Unicode FAQ, demos

2002-11-04 Thread Me
people on the list who can't be bothered to read the documentation for their own keyboard IO system. Most of this discussion seems to focus on keyboarding. But that's of little consequence. This will always be spotted before it does much harm and will affect just one person and their software

Re: UTF-8 and Unicode FAQ, demos

2002-11-04 Thread Austin Hastings
--- Me [EMAIL PROTECTED] wrote: people on the list who can't be bothered to read the documentation for their own keyboard IO system. Most of this discussion seems to focus on keyboarding. But that's of little consequence. This will always be spotted before it does much harm and will

Re: UTF-8 and Unicode FAQ, demos

2002-11-04 Thread Adam D. Lopresto
of us to use them. So you're one of the very few people who bothered to set up unicode, and now you want to force the rest of us into your own little leet group. Given the choice between learning how to reconfigure their keyboard, editor, terminal, fonts, and everything else, or just not learning

Re: UTF-8 and Unicode FAQ, demos

2002-11-04 Thread Paul Johnson
On Mon, Nov 04, 2002 at 12:26:56PM -0800, Austin Hastings wrote: In short: 1- ? and ? are really useful in my context. 2- I can make my work environment generate them in one (modified) keystroke. 3- I can make my home environment do likewise. 4- The ascii-only version isn't faster and

Re: UTF-8 and Unicode FAQ, demos

2002-11-04 Thread Simon Cozens
character sets to come and play with the big boys. And eventually, the old trigraphs died out because everyone caught up with the decent (for the era) character sets. That's assuming we have to have Unicode operators. I would, however, like to hear a passionate argument in favour of this, because

Re: UTF-8 and Unicode FAQ, demos

2002-11-04 Thread Austin Hastings
on the differences in UTF8 vs. Latin-1 handling among pine, elm, and other mailers. Not only the MUA level. Usually source code is written in a lowest common denominator of ascii, even for languages that allow unicode identifiers (Java) or markup. That's because source code is handled

Re: UTF-8 and Unicode FAQ, demos

2002-11-04 Thread Simon Cozens
[EMAIL PROTECTED] (Austin Hastings) writes: Yeah, but ActiveState does Perl, and Microsoft owns ActiveState To what extent are *either* of those statements true? :) -- All the good ones are taken.

Re: UTF-8 and Unicode FAQ, demos

2002-11-04 Thread Ken Fox
Damian Conway wrote: Larry Wall wrote: That suggests to me that the circumlocution could be *. A five character multiple symbol??? I guess that's the penalty for not upgrading to something that can handle unicode. Unless this is subtle humor, the Huffman encoding idea is getting seriously

RE: UTF-8 and Unicode FAQ, demos

2002-11-04 Thread Garrett Goebel
Ken Fox wrote: Damian Conway wrote: Larry Wall wrote: That suggests to me that the circumlocution could be *. A five character multiple symbol??? I guess that's the penalty for not upgrading to something that can handle unicode. Unless this is subtle humor, the Huffman encoding

RE: UTF-8 and Unicode FAQ, demos

2002-11-04 Thread Brent Dax
Garrett Goebel: # Ken Fox wrote: # Unless this is subtle humor, the Huffman encoding idea is getting # seriously out of hand. That 5 char ASCII sequence is *identically* # encoded when read by the human eye. Humans can probably type the 5 # char sequence faster too. How does Unicode win

Re: UTF-8 and Unicode FAQ, demos

2002-11-04 Thread Michael Lazzaro
On Monday, November 4, 2002, at 08:55 AM, Brent Dax wrote: # Can't we have our cake and eat it too? Give ASCII digraph or # trigraph alternatives for the incoming tide of Perl6 Unicode? The Unicode version is more typing than the non-Unicode version, so what's the advantage? It's prettier

Supercomma! (was Re: UTF-8 and Unicode FAQ, demos)

2002-11-04 Thread Michael Lazzaro
On Monday, November 4, 2002, at 11:58 AM, Larry Wall wrote: You know, separate streams in a for loop are not going to be that common in practice, so maybe we should look around a little harder for a supercomma that isn't a semicolon. Now *that* would be a big step in reducing ambiguity... Or

Re: UTF-8 and Unicode FAQ, demos

2002-11-04 Thread Austin Hastings
character-sets through Unicode. As part of the agreement, ActiveState will add features previously missing from Windows ports of Perl, as well as full support for Unicode - a key feature to users dealing with Asian character sets. blah blah blah ... =Austin

Re: UTF-8 and Unicode FAQ, demos

2002-11-04 Thread Austin Hastings
scorin' script, Von der bis an die all(), Von der any() bis an den - Uniperl, Uniperl uber alles, Uber alles in der welt! So you're one of the very few people who bothered to set up unicode, and now you want to force the rest of us into your own little leet group. Nerp. Hadn't given

Re: UTF-8 and Unicode FAQ, demos

2002-11-04 Thread Brian Ingerson
characters as US-ASCII. Several people the other day reported on the differences in UTF8 vs. Latin-1 handling among pine, elm, and other mailers. Not only the MUA level. Usually source code is written in a lowest common denominator of ascii, even for languages that allow unicode

Re: UTF-8 and Unicode FAQ, demos

2002-11-04 Thread Simon Cozens
[EMAIL PROTECTED] (Austin Hastings) writes: If @a [*=] @b; doesn't scan like rats chewing their way into your cable, what does? This is why God gave us functions as well as operators. -- I _am_ pragmatic. That which works, works, and theory can go screw itself. - Linus Torvalds

Re: UTF-8 and Unicode FAQ, demos

2002-11-03 Thread Rafael Garcia-Suarez
Matthew Zimmerman wrote in perl.perl6.language : So let me make my original question a little more general: are Perl 6 source files encoded in Latin-1, UTF-8, or will Perl 6 provide some sort of translation mechanism, like specifying the charset on the command line? I expect probably

Unicode Checker

2002-11-03 Thread David Wheeler
For all you Mac OS X fans out there: http://www.earthlingsoft.net/UnicodeChecker/ Regards, David -- David Wheeler AIM: dwTheory [EMAIL PROTECTED] ICQ: 15726394 http://david.wheeler.net/ Yahoo!: dew7e

Re: UTF-8 and Unicode FAQ, demos

2002-11-02 Thread Simon Cozens
[EMAIL PROTECTED] (Matthew Zimmerman) writes: Larry has been consistently using 0xAB op 0xBB in his messages to represent a (French quote) hyperop, (corresponding to the Unicode characters 0x00AB and 0x00BB) More and more conversations like this, (and how many have we seen here already

Re: UTF-8 and Unicode FAQ, demos

2002-11-02 Thread Bart Schuller
On Sat, Nov 02, 2002 at 06:07:34AM -0700, Luke Palmer wrote: I do most of my work over an ssh connection to my favorite server, through gnome-terminal. gnome-terminal does not support unicode, so this whole thread has been filled with ?'s and \251's. I can't see a thing... gnome-terminal

Re: UTF-8 and Unicode FAQ, demos

2002-11-02 Thread Paul Johnson
to the Unicode characters 0x00AB and 0x00BB) More and more conversations like this, (and how many have we seen here already?) about characters sets, encodings, mail quoting issues, in fact, anything other than Perl, will be rife on every Perl-related mailing list if we persist

Re: UTF-8 and Unicode FAQ, demos

2002-11-02 Thread Markus Laire
On 2 Nov 2002 at 0:06, Simon Cozens wrote: [EMAIL PROTECTED] (Matthew Zimmerman) writes: Larry has been consistently using 0xAB op 0xBB in his messages to represent a (French quote) hyperop, (corresponding to the Unicode characters 0x00AB and 0x00BB) More and more conversations

Re: UTF-8 and Unicode FAQ, demos

2002-11-02 Thread Luke Palmer
, will be rife on every Perl-related mailing list if we persist with this idiotic idea of having Unicode operators. It may seem idiotic to the egocentric people who only needs chars a-z in his language. But for all others (think about Chinese), Unicode is real asset. I don't think anyone's

Re: UTF-8 and Unicode FAQ, demos

2002-11-02 Thread David Wheeler
with this idiotic idea of having Unicode operators. You keep saying or suggesting that the idea of using Unicode operators is idiotic. Perhaps you could make an argument in support that assertion (as Luke and Paul have done). I for one would be interested to hear your reasoning. Regards

Re: UTF-8 and Unicode FAQ, demos

2002-11-02 Thread Larry Wall
with this idiotic idea of having Unicode operators. There will certainly be some pain in breaking out of ASCII. It might well be idiotic now, but I don't think it will be idiotic in ten years. And I am quite willing to deal with a certain amount of short-term crap on behalf of the future. Larry

Re: UTF-8 and Unicode FAQ, demos

2002-11-02 Thread Matthew Zimmerman
idea of having Unicode operators. I don't really want Unicode operators either, but if it is decided that there will be such operators, I would still _want_to_know_how_to_use_them_. So let me make my original question a little more general: are Perl 6 source files encoded in Latin-1, UTF-8

Re: UTF-8 and Unicode FAQ, demos

2002-11-02 Thread Simon Cozens
with this idiotic idea of having Unicode operators. I live in Switzerland and regularly deal with three languages which have various diacritics and special characters. Personally, I would be very happy with Unicode operators, but I fear that Simon's prediction would be accurate and I would much rather

Re: UTF-8 and Unicode FAQ, demos

2002-11-02 Thread Simon Cozens
[EMAIL PROTECTED] (Markus Laire) writes: It may seem idiotic to the egocentric people who only needs chars a-z in his language. But for all others (think about Chinese), Unicode is real asset. I don't often think about Chinese. Chinese is hard. But I think about Japanese a lot of the time

Re: UTF-8 and Unicode FAQ, demos

2002-11-02 Thread Simon Cozens
[EMAIL PROTECTED] (David Wheeler) writes: You keep saying I didn't think I was doing it habitually. or suggesting that the idea of using Unicode operators is idiotic. Perhaps you could make an argument in support that assertion (as Luke and Paul have done). Sure: More and more

Re: UTF-8 and Unicode FAQ, demos

2002-11-02 Thread Damian Conway
Simon Cozens wrote: On the other hand, maybe I'm being as shortsighted as Thomas J Watson [1] and that once the various operating systems do get their Unicode support together and we see the introduction of the 50,000 key keyboard, Of course, scary 50K keyboards aren't really necessary. All we

Re: UTF-8 and Unicode FAQ, demos

2002-11-02 Thread Simon Cozens
[EMAIL PROTECTED] (Damian Conway) writes: Of course, scary 50K keyboards aren't really necessary. All we really need is a keyboard with configurable keys. That is, each key has an LED, or OLED, or digital plastic surface, and an index key that allows you to select the Unicode block

Re: UTF-8 and Unicode FAQ, demos

2002-11-02 Thread Damian Conway
multiple symbol??? I guess that's the penalty for not upgrading to something that can handle unicode. Actually, we could use foo bar baz for qw too if here-docs always have to be ' or . Yes. I thought we'd pretty much decided that anyway, hadn't we? Hmmm...I wonder if one could then write: $str

Re: UTF-8 and Unicode FAQ, demos

2002-11-02 Thread Damian Conway
Simon Cozens wrote: Of course, scary 50K keyboards aren't really necessary. All we really need is a keyboard with configurable keys. That is, each key has an LED, or OLED, or digital plastic surface, and an index key that allows you to select the Unicode block to be currently mapped onto

Re: UTF-8 and Unicode FAQ, demos

2002-11-02 Thread David Wheeler
don't see much of an argument there. That a discussion leads to discussions on other mail lists is not a reason not to use Unicode operators. Or so it seems to me. Regards, David -- David Wheeler AIM: dwTheory [EMAIL PROTECTED

Re: UTF-8 and Unicode FAQ, demos

2002-11-01 Thread John Williams
On Thu, 31 Oct 2002, Luke Palmer wrote: now *theres* some brackets! Ooh! Let's use 2AF7 and 2AF8 for qw! Actually, I wanted to suggest »German quotes« instead of French for qw. :) ~ John Williams

Re: UTF-8 and Unicode FAQ, demos

2002-11-01 Thread Larry Wall
On Fri, Nov 01, 2002 at 10:05:27AM -0700, John Williams wrote: On Thu, 31 Oct 2002, Luke Palmer wrote: now *theres* some brackets! Ooh! Let's use 2AF7 and 2AF8 for qw! Actually, I wanted to suggest »German quotes« instead of French for qw. :) Well, the other guys are

Re: UTF-8 and Unicode FAQ, demos

2002-11-01 Thread Matthew Zimmerman
Larry has been consistently using 0xAB op 0xBB in his messages to represent a (French quote) hyperop, (corresponding to the Unicode characters 0x00AB and 0x00BB) which is consistent with the iso-8859-1 encoding (despite the fact that my mailserver or his mailer insists on labelling those
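
A minimal Python sketch of the byte-level point made in this message, added for illustration and not part of the thread. It assumes a message body carrying the raw bytes 0xAB and 0xBB; the helper name `decodes_as_utf8` is hypothetical.

```python
# Bytes 0xAB and 0xBB as they would appear in an ISO-8859-1 message body.
raw = bytes([0xAB, 0xBB])

# In ISO-8859-1 each byte maps directly to the code point of the same value,
# so these two bytes are U+00AB and U+00BB (the French quotes).
as_latin1 = raw.decode("iso-8859-1")

# The same two characters need two bytes each when encoded as UTF-8.
as_utf8 = as_latin1.encode("utf-8")


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if the bytes form valid UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print([hex(ord(ch)) for ch in as_latin1])
print(as_utf8)
print(decodes_as_utf8(raw))
```

The last check shows why a mislabelled charset matters: the Latin-1 bytes are not valid UTF-8 on their own, so a reader that trusts the wrong label cannot display them.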

UTF-8 and Unicode FAQ, demos

2002-10-31 Thread Michael Lazzaro
Here is an extensive FAQ for Unicode and UTF-8: http://www.cl.cam.ac.uk/~mgk25/unicode.html and here is a test file that will show you how many of the most common glyphs (WGL4, via Microsoft) you are capable of displaying in your current setup: http://www.cl.cam.ac.uk/~mgk25/ucs/wgl4

Re: UTF-8 and Unicode FAQ, demos

2002-10-31 Thread Michael Lazzaro
And if you really want to drool at all the neat glyphs that the wonderful, magical world of math has given us, check out: http://www.unicode.org/charts/PDF/U2A00.pdf now *theres* some brackets! MikeL

Re: UTF-8 and Unicode FAQ, demos

2002-10-31 Thread Luke Palmer
Mailing-List: contact [EMAIL PROTECTED]; run by ezmlm Date: Thu, 31 Oct 2002 10:11:00 -0800 From: Michael Lazzaro [EMAIL PROTECTED] X-SMTPD: qpsmtpd/0.12, http://develooper.com/code/qpsmtpd/ And if you really want to drool at all the neat glyphs that the wonderful, magical world of math

Re: UTF-8 and Unicode FAQ, demos

2002-10-31 Thread Austin Hastings
--- Luke Palmer [EMAIL PROTECTED] wrote: And if you really want to drool at all the neat glyphs that the wonderful, magical world of math has given us, check out: http://www.unicode.org/charts/PDF/U2A00.pdf now *theres* some brackets! Ooh! Let's use 2AF7 and 2AF8 for

RE: Unicode thoughts...

2002-03-30 Thread Dan Sugalski
At 4:32 PM -0800 3/25/02, Brent Dax wrote: I *really* strongly suggest we include ICU in the distribution. I recently had to turn off mod_ssl in the Apache 2 distro because I couldn't get OpenSSL downloaded and configured. FWIW, ICU in the distribution is a given if we use it. Parrot will

Re: Unicode thoughts...

2002-03-30 Thread Josh Wilmes
Someone said that ICU requires a C++ compiler. That's concerning to me, as is the issue of how we bootstrap our build process. We were planning on a platform-neutral miniparrot, and IMHO that can't include ICU (as i'm sure it's not going to be written in pure ansi C) --Josh At 8:45 on

Re: Unicode thoughts...

2002-03-30 Thread Dan Sugalski
At 10:07 AM -0500 3/30/02, Josh Wilmes wrote: Someone said that ICU requires a C++ compiler. That's concerning to me, as is the issue of how we bootstrap our build process. We were planning on a platform-neutral miniparrot, and IMHO that can't include ICU (as i'm sure it's not going to be

Re: Unicode thoughts...

2002-03-30 Thread Jeff
Dan Sugalski wrote: At 10:07 AM -0500 3/30/02, Josh Wilmes wrote: Someone said that ICU requires a C++ compiler. That's concerning to me, as is the issue of how we bootstrap our build process. We were planning on a platform-neutral miniparrot, and IMHO that can't include ICU (as i'm sure

RE: Unicode thoughts...

2002-03-25 Thread Brent Dax
Jeff: # This will likely open yet another can of worms, but Unicode has been # delayed for too long, I think. It's time to add the Unicode libraries # (In our case, the ICU libraries at http://oss.software.ibm.com/icu/, # which Larry has now blessed) to Parrot. string.c already has # (admittedly

RE: Unicode thoughts...

2002-03-25 Thread Charles Bunders
We also need to make sure ICU will work everywhere. And I do mean *everywhere*. Will it work on VMS? Palm OS? Crays? Nope, nope, and nope. From their site - Operating system | Compiler | Testing frequency: Windows 98/NT/2000 | Microsoft Visual C++ 6.0 | Reference

Re: Unicode thoughts...

2002-03-25 Thread Josh Wilmes
This is rather concerning to me. As I understand it, one of the goals for parrot was to be able to have a usable subset of it which is totally platform-neutral (pure ANSI C). If we start to depend too much on another library which may not share that goal, we could have trouble with the

RE: Unicode thoughts...

2002-03-25 Thread Hong Zhang
I think it will be relative easy to deal with different compiler and different operating system. However, ICU does contain some C++ code. It will make life much harder, since current Parrot only assume ANSI C (even a subset of it). Hong This is rather concerning to me. As I understand it,

Re: Unicode thoughts...

2002-03-25 Thread Jeff
consisting of basic Unicode utilities we'll need, such as Unicode_isdigit(). This can be a simple wrapper around isdigit() for the moment, until I sort out which files we need from the Unicode database, and what support functions/data structures will be required. Given that we're dedicated

Re: Unicode thoughts...

2002-03-25 Thread Jeff
that, I think an interim solution consisting of basic Unicode utilities we'll need, such as Unicode_isdigit(). This can be a simple wrapper around isdigit() for the moment, until I sort out which files we need from the Unicode database, and what support functions/data structures will be required

Unicode thoughts...

2002-03-24 Thread Jeff
This will likely open yet another can of worms, but Unicode has been delayed for too long, I think. It's time to add the Unicode libraries (In our case, the ICU libraries at http://oss.software.ibm.com/icu/, which Larry has now blessed) to Parrot. string.c already has (admittedly unavoidable, due

Re: [ID 20020130.001] Unicode broken for 0x10FFFF

2002-01-30 Thread Larry Wall
, character strings are simply sequences of integers. The internal representation must be optimized for this concept, not for any particular Unicode representation, whether UTF-8 or UTF-16 or UTF-32. Any of these could be used as underlying representations, but the abstraction of sequences of integers must
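
A small Python sketch of the "sequence of integers" abstraction described here, added for illustration and not taken from the thread. The sample string is an assumption; it includes U+10FFFF because that is the code point named in the subject line.

```python
# A string viewed as a sequence of integers (code points), independent of
# how it happens to be stored.
text = "a\u00ab\u20ac\U0010FFFF"
code_points = [ord(ch) for ch in text]

# The same sequence survives a round trip through each encoding form;
# only the number of bytes used to store it differs.
byte_lengths = {}
for encoding in ("utf-8", "utf-16-le", "utf-32-le"):
    encoded = text.encode(encoding)
    assert [ord(ch) for ch in encoded.decode(encoding)] == code_points
    byte_lengths[encoding] = len(encoded)

print(code_points)
print(byte_lengths)
```

The abstraction stays the same in all three cases; only the storage cost changes, which is the distinction the message draws between the concept and the underlying representation.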

RE: [ID 20020130.001] Unicode broken for 0x10FFFF

2002-01-30 Thread Brent Dax
Larry Wall: # For various reasons, some of which relate to the sequence-of-integer # abstraction, and some of which relate to infinite strings # and arrays, # I think Perl 6 strings are likely to be represented by a list of # chunks, where each chunk is a sequence of integers of the same size or

Python on Unicode etc.

2001-06-22 Thread Nathan Torkington
This is from the latest python-dev summary. It might be of interest to folks considering how to store strings. * Adding .decode() method to Unicode * Marc-Andre Lemburg asked for opinions on adding a .decode method to unicode objects: http://mail.python.org/pipermail/python-dev/2001

RE: Should we care much about this Unicode-ish criticism?

2001-06-11 Thread Hong Zhang
for formatting. It has nothing to do with the lowercase/uppercase in roman language. I believe Unicode has many font characters. Is this Uppercase? Is this Lowercase? I believe the Unicode already defines character categories, such as L, Lu, Ll, Lo. I prefer we just use unicode term instead of extending
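
A short Python sketch of the general categories mentioned here (Lu, Ll, Lo, plus Lt for titlecase), using the standard `unicodedata` module. It is an illustration only; the sample characters and the helper `is_caseless_letter` are assumptions, not anything proposed in the thread.

```python
import unicodedata

# Sample characters chosen for illustration.
samples = {
    "A": "uppercase Latin letter",
    "a": "lowercase Latin letter",
    "\u01c5": "titlecase digraph (D with small z with caron)",
    "\u4e2d": "CJK ideograph, which has no case",
    "5": "decimal digit",
}

# Look up the Unicode general category of each sample.
categories = {ch: unicodedata.category(ch) for ch in samples}


def is_caseless_letter(ch: str) -> bool:
    """Hypothetical helper: a letter that is neither upper, lower, nor titlecase."""
    cat = unicodedata.category(ch)
    return cat.startswith("L") and cat not in ("Lu", "Ll", "Lt")


for ch, description in samples.items():
    print(f"U+{ord(ch):04X} {categories[ch]} {description}")
```

Using the category directly avoids inventing a third truth value: a caseless letter simply reports Lo instead of Lu or Ll.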

RE: Should we care much about this Unicode-ish criticism?

2001-06-09 Thread NeonEdge
overloading of the locale-specific functionality. I think that's actually more than most other implementations do. Ok, that said, the way I see it (and I'm probably wrong), is that Perl may need to know the following things about each character in Unicode, based upon locale (correct me if I'm wrong

Re: Should we care much about this Unicode-ish criticism?

2001-06-09 Thread Bryan C . Warnock
On Saturday 09 June 2001 06:24 am, NeonEdge wrote: Is this Uppercase? Is this Lowercase (is this 'half-digits', as Hong mentioned?) (if 'Caseless' needed, just !Upper !Lower?) Titlecase. Is this Punctuation? Is this a digit? Is this a word character? Is this Whitespace? Maps to

Re: Should we care much about this Unicode-ish criticism?

2001-06-09 Thread Bryan C . Warnock
/. rebuttal of the original article at http://slashdot.org/features/01/06/06/0132203.shtml, for those that haven't seen it yet. -- Bryan C. Warnock [EMAIL PROTECTED]

RE: Unicode sorting...

2001-06-08 Thread Hong Zhang
understanding is there is NO general unicode sorting, period. The most useful one must be locale-sensitive, as defined by unicode collation. In practice, the story is even worse. For example, how do you sort strings coming from different locales, say I have an address book with names from all over

RE: Unicode sorting...

2001-06-08 Thread NeonEdge
Another example is the chinese has no definite sorting order, period. The commonly used scheme are phonetic-based or stroke-based. Since many characters have more than one pronunciations (context sensitive) and more than one forms (simplified and traditional). So if we have a mix content

RE: Unicode sorting...

2001-06-08 Thread Hong Zhang
If this is the case, how would a regex like ^[a-zA-Z] work (or other, more sensitive characters)? If just about anything can come between A and Z, and letters that might be there in a particular locale aren't in another locale, then how will regex engine make the distinction? This syntax
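
A Python sketch of the question raised here, added for illustration. It assumes the range `[a-zA-Z]` is interpreted by code point, as Python's `re` module does; the sample words are made up.

```python
import re

# A range interpreted by code point covers only the 52 ASCII letters.
ascii_letter = re.compile(r"^[a-zA-Z]")

# Sample words: plain ASCII, German, French, Japanese.
words = ["Zebra", "\u00c4rger", "\u00e9lan", "\u65e5\u672c"]

# Which words start with a character inside the code point range?
range_matches = {w: bool(ascii_letter.match(w)) for w in words}

# Which words start with a character that Unicode classifies as a letter?
letter_matches = {w: w[0].isalpha() for w in words}

for w in words:
    print(w, range_matches[w], letter_matches[w])
```

Under that assumption the range is stable across locales but misses every non-ASCII letter, while a property test accepts all four words.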

Re: Unicode sorting...

2001-06-08 Thread Jarkko Hietaniemi
If this is the case, how would a regex like ^[a-zA-Z] work (or other, more sensitive characters)? If just about anything can come between A and Z, and letters that might be there in a particular locale aren't in another locale, then how will regex engine make the distinction? This

Re: Should we care much about this Unicode-ish criticism?

2001-06-08 Thread Nicholas Clark
|FALSE, if the extended boolean attributes allow transbinary truth values. Well, UNKNOWN isn't accurate either; the case *is* known. It's just neither upper nor lowercase. (I wonder what it should return for titlecase characters too, for that matter.) What happens if unicode supported

Re: Should we care much about this Unicode-ish criticism?

2001-06-08 Thread Russ Allbery
Dan Sugalski [EMAIL PROTECTED] writes: At 05:20 PM 6/7/2001 +, Nick Ing-Simmons wrote: One reason perl5.7.1+'s Encode does not do asian encodings yet is that the tables I have found so far (Mainly Unicode 3.0 based) are lossy. Joy. Hopefully by the time we're done there'll be a full

Re: Should we care much about this Unicode-ish criticism?

2001-06-08 Thread Russ Allbery
Nicholas Clark [EMAIL PROTECTED] writes: What happens if unicode supported uppercase and lowercase numbers? [I had a dig about, and it doesn't seem to mention lowercase or uppercase digits. Are they just a typography distinction, and hence not enough to be worthy of codepoints?] Damned

RE: Should we care much about this Unicode-ish criticism?

2001-06-08 Thread Hong Zhang
What happens if unicode supported uppercase and lowercase numbers? [I had a dig about, and it doesn't seem to mention lowercase or uppercase digits. Are they just a typography distinction, and hence not enough to be worthy of codepoints?] Damned if I know; I didn't know there even

Re: Unicode sorting...

2001-06-08 Thread Jarkko Hietaniemi
Z (not immediately, but doesn't matter here) in the latter. Similarly for all the accented alphabetic characters, the rules how they are sorted differ from one place to another , and many languages have special combinations like ch, ss, ij that require special attention. Unicode defines
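
A toy Python sketch of why code point order and a language's sort order can disagree, added for illustration. The function `toy_german_key` is a hypothetical hand-written key that only folds a-umlaut onto a; it is not the Unicode Collation Algorithm and not a real locale's rules.

```python
# Sample words; the second starts with a-umlaut (U+00E4).
words = ["zebra", "\u00e4pfel", "apfel"]

# Plain sorting compares code points, so U+00E4 (228) lands after "z" (122).
by_code_point = sorted(words)


def toy_german_key(word: str):
    """Hypothetical key: treat a-umlaut like 'a', break ties on the original."""
    return (word.replace("\u00e4", "a"), word)


# With the toy key the accented word sorts next to its unaccented neighbour.
by_toy_key = sorted(words, key=toy_german_key)

print(by_code_point)
print(by_toy_key)
```

A language that places the accented letter after z would want the first ordering, and one that groups it with a would want the second, so no single ordering serves every locale.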

Re: Should we care much about this Unicode-ish criticism?

2001-06-08 Thread Dan Sugalski
At 05:20 PM 6/7/2001 +, Nick Ing-Simmons wrote: Dan Sugalski [EMAIL PROTECTED] writes: It does bring up a deeper issue, however. Unicode is, at the moment, apparently inadequate to represent at least some part of the asian languages. Are the encodings currently in use less inadequate

RE: Should we care much about this Unicode-ish criticism?

2001-06-08 Thread Dan Sugalski
alphabets. That wasn't the issue, though--it was whether Unicode should treat what are essentially antique or artistic representations of glyphs as separate glyphs or not. (Not that it's really for us to decide, but still...) I really don't think that's our problem--it's a font issue

Unicode sorting...

2001-06-08 Thread NeonEdge
I can't really believe that this would be a problem, but if they're integrated alphabets from different locales, will there be issues with sorting (if we're not planning to use the locale)? Are there instances where like characters were combined that will affect the sort orders? Grant M.

RE: Unicode sorting...

2001-06-08 Thread Dan Sugalski
At 11:29 AM 6/8/2001 -0700, Hong Zhang wrote: If this is the case, how would a regex like ^[a-zA-Z] work (or other, more sensitive characters)? If just about anything can come between A and Z, and letters that might be there in a particular locale aren't in another locale, then how will

Re: Unicode sorting...

2001-06-08 Thread Jarkko Hietaniemi
groups. We just need to make sure there's a named group for the different languages we know of--things like [[:kanji]] or [[:hiragana]] for example. It's spelled \p{...} (after I fixed a silly typo in bleadperl) $ ./perl -Ilib -wle 'print a if \x{30a1} =~ /\p{InKatakana}/' a $ grep 30A1 lib/unicode
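
A rough Python counterpart to the block test in the one-liner above, added for illustration. Python's standard `re` module has no `\p{...}` classes, so the sketch uses a hand-written range check; the function name `in_katakana_block` is hypothetical, and the range U+30A0 to U+30FF is the Katakana block as listed in the Unicode block table.

```python
import unicodedata

# Katakana block boundaries, per the Unicode block table.
KATAKANA_BLOCK = (0x30A0, 0x30FF)


def in_katakana_block(ch: str) -> bool:
    """Hypothetical stand-in for a block test such as \\p{InKatakana}."""
    low, high = KATAKANA_BLOCK
    return low <= ord(ch) <= high


small_a = "\u30a1"      # the character tested in the one-liner
hiragana_a = "\u3042"   # a Hiragana letter, outside the Katakana block

print(unicodedata.name(small_a), in_katakana_block(small_a))
print(unicodedata.name(hiragana_a), in_katakana_block(hiragana_a))
```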

Re: Should we care much about this Unicode-ish criticism?

2001-06-07 Thread Nick Ing-Simmons
Dan Sugalski [EMAIL PROTECTED] writes: It does bring up a deeper issue, however. Unicode is, at the moment, apparently inadequate to represent at least some part of the asian languages. Are the encodings currently in use less inadequate? I've been assuming that an Anything-Unicode translation

RE: Should we care much about this Unicode-ish criticism?

2001-06-07 Thread Garrett Goebel
From: David L. Nicol [mailto:[EMAIL PROTECTED]] Russ Allbery wrote: a caseless character wouldn't show up in either IsLower or IsUpper. maybe an IsCaseless is warranted -- or Is[Upper|Lower] could return UNKNOWN instead of TRUE|FALSE, if the extended boolean attributes allow

RE: Should we care much about this Unicode-ish criticism?

2001-06-07 Thread Nick Ing-Simmons
Dan Sugalski [EMAIL PROTECTED] writes: I think I'd agree there. Different versions of a glyph are more a matter of art and handwriting styles, and that's not really something we ought to get involved in. But the human sitting in front of the machine cannot see the bit pattern, they can only

RE: Should we care much about this Unicode-ish criticism?

2001-06-06 Thread NeonEdge
Before people get their panties in a bunch, I'm not dissing Unicode. The point that I am trying to make is that Unicode will probably never make everyone happy. It WILL likely become widely accepted, and should offer the best solution yet to integrating the major character sets into one

Re: Should we care much about this Unicode-ish criticism?

2001-06-06 Thread Simon Cozens
On Wed, Jun 06, 2001 at 07:28:45AM -0400, NeonEdge wrote: If that was the goal, then they failed. Oh, for heaven's sake, don't be silly. Our goal is to write Perl 6. We haven't done that yet. That was our goal, so we failed? -- IT support will, from 1 October 2000, be provided by college and

Re: Should we care much about this Unicode-ish criticism?

2001-06-06 Thread Dan Sugalski
about such characters, I'd rather see an optional output discipline that enforces strict Unicode output. Fair enough. On the other hand, maybe there's some use for a data structure that is a sequence of integers of various sizes, where the representation of different chunks of the array/string

Re: Should we care much about this Unicode-ish criticism?

2001-06-06 Thread Larry Wall
arbitrary 2**n limits to gain performance.) I'm much more interested in the clean abstraction of a string is a sequence of integers than I am in the fact that those integers happen to represent particular characters under Unicode. To be sure, it's quite handy that those integers do represent

Re: Should we care much about this Unicode-ish criticism?

2001-06-06 Thread Larry Wall
: that, and I have serious reservations about the speed of dealing with : variable length characters instead of fixed-length ones. Whether you buy it or not, I wasn't offering it as a mere conjecture. That is precisely what Perl 5.6+ is already doing for Unicode data. It's not a big deal unless

Re: Should we care much about this Unicode-ish criticism?

2001-06-06 Thread David L. Nicol
Dan Sugalski wrote: I've been assuming that an Anything-Unicode translation will be lossless, but this makes me wonder whether that assumption is correct. I seem to recall from reading articles on this issue that the issue is encoding of arrangement: Even with an unlimited number

Re: Should we care much about this Unicode-ish criticism?

2001-06-06 Thread David L. Nicol
Russ Allbery wrote: a caseless character wouldn't show up in either IsLower or IsUpper. maybe an IsCaseless is warranted -- or Is[Upper|Lower] could return UNKNOWN instead of TRUE|FALSE, if the extended boolean attributes allow transbinary truth values.

Re: Should we care much about this Unicode-ish criticism?

2001-06-06 Thread Simon Cozens
On Wed, Jun 06, 2001 at 09:57:58AM -0400, NeonEdge wrote: Perl 6 cannot assume that Unicode is done. Don't tell anyone, but it never did. -- Thus spake the master programmer: After three days without programming, life becomes meaningless. -- Geoffrey James, The Tao

RE: Should we care much about this Unicode-ish criticism?

2001-06-06 Thread NeonEdge
Oh, for heaven's sake, don't be silly. Our goal is to write Perl 6. We haven't done that yet. That was our goal, so we failed? Don't be ridiculous. With that as our goal, the ONLY way we could fail is to NEVER write Perl 6. Unicode, on the other hand, was originally released for public

Re: Should we care much about this Unicode-ish criticism?

2001-06-06 Thread Simon Cozens
. Exactly. Same goes for Unicode. And pretty much every other standard, in fact. These things evolve. They're *never* done. I don't see anyone claiming that they *are* done, but I see you telling us that they should be regarded as a failure if they're not done. Uh, great. So they're incomplete. So what

Re: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Simon Cozens
:) : Unicode itself is, like the JIS standard, simply an enumeration of characters with their orderings; it says nothing about how the data is represented to the computer, and must be supplemented by one of several Unicode Transformation Formats which describe the encoding. However

RE: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Hong Zhang
Courtesy of Slashdot, http://www.hastingsresearch.com/net/04-unicode-limitations.shtml I'm not sure if this is an issue for us or not, as we're generally language-neutral, and I don't see any technical issues with any of the UTF-* encodings having headroom problems. I think the author

RE: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Hong Zhang
Firstly, the JIS standard defines, along with the ordering and enumeration of its characters, their glyph shape. Unicode, on the other hand does not. This means that as far as Unicode is concerned, there is literally no distinction between two distinct shapes and hence no way to specify

Re: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Simon Cozens
On Tue, Jun 05, 2001 at 10:17:08AM -0700, Russ Allbery wrote: Is it just me, or does this entire article reduce not to Unicode doesn't work but Unicode should assign more characters? Yes. And Unicode has assigned more characters; it's factually challenged. -- And it should be the law: If you

Re: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Dan Sugalski
At 06:22 PM 6/5/2001 +0100, Simon Cozens wrote: On Tue, Jun 05, 2001 at 10:17:08AM -0700, Russ Allbery wrote: Is it just me, or does this entire article reduce not to Unicode doesn't work but Unicode should assign more characters? Yes. And Unicode has assigned more characters; it's factually
