Larry wrote:
But at the moment I'm thinking there's something wrong about any
approach that requires a special character on the signature side.
I'm starting to think that all the convolving should be specified
on the left. So in this:
for parallel(x, y, z) -> $x, $y, $z { ... }
the
--- Matthew Zimmerman [EMAIL PROTECTED] wrote:
On Sun, Nov 03, 2002 at 09:41:44AM -, Rafael Garcia-Suarez wrote:
Matthew Zimmerman wrote in perl.perl6.language :
So let me make my original question a little more
general: are Perl 6 source files encoded in Latin-1,
UTF-8, or
, everybody's doing it! First one's free, kid... ;-)
People who believe slippery slope arguments should never go skiing.
On the other hand, even the useful slippery slopes have beginner
slopes. I think one advantage of using Unicode for advanced features
is that it *looks* scary. So in general we
will be the default.
Actually, Unicode will be the default. 8859-1 can probably also be
handled without declaration.
If you want trigraph support, you'll have to put
use encoding 'ugly-american';
at the top of your files. ;-) ;-) ;-)
Otherwise, it'll be one-character «fancyops» all the way.
Mmm, I
--- [EMAIL PROTECTED] wrote:
Mmm, I view one-character Unicode operators as more of an escape
hatch
for the future, not as something to be made mandatory. But then,
I'm one of those ugly Americans.
EBCDIC didn't support brackets, originally, so
On 2002-11-04 at 12:26:56, Austin Hastings wrote:
1- « and » are really useful in my context.
Okay. Now can you get your mailer to send them properly? :)
After all, there's gotta be some advantage to
being the Fearless Leader...
Larry
Thousands will cry for the blood of the Perl 6
design team. As Leader, you can draw their ire.
Because you are Fearless, you won't mind...
--
ralph
Ken Fox wrote:
I know I'm just another sample point in a sea of samples, but
my embedded symbol parser seems optimized for alphabetic symbols.
The cool non-alphabetic Unicode symbols are beautiful to look at,
but they don't help me read or write faster.
Once again: we're only talking about
Garrett Goebel wrote:
Can't we have our cake and eat it too? Give ASCII digraph or trigraph
alternatives for the incoming tide of Perl6 Unicode?
Allow both * and »*«?
I'd really prefer we didn't. I'd much rather keep « and » for other
things.
Or something similar '*', [*], etc...
Much as I
people on the list who can't be bothered to read
the documentation for their own keyboard IO system.
Most of this discussion seems to focus on keyboarding.
But that's of little consequence. This will always be
spotted before it does much harm and will affect just
one person and their software
--- Me [EMAIL PROTECTED] wrote:
people on the list who can't be bothered to read
the documentation for their own keyboard IO system.
Most of this discussion seems to focus on keyboarding.
But that's of little consequence. This will always be
spotted before it does much harm and will
of us to use them.
So you're one of the very few people who bothered to set up unicode, and now
you want to force the rest of us into your own little leet group. Given the
choice between learning how to reconfigure their keyboard, editor, terminal,
fonts, and everything else, or just not learning
On Mon, Nov 04, 2002 at 12:26:56PM -0800, Austin Hastings wrote:
In short:
1- « and » are really useful in my context.
2- I can make my work environment generate them in one (modified)
keystroke.
3- I can make my home environment do likewise.
4- The ascii-only version isn't faster and
character
sets to come and play with the big boys. And eventually, the old trigraphs
died out because everyone caught up with the decent (for the era) character
sets.
That's assuming we have to have Unicode operators.
I would, however, like to hear a passionate argument in favour of
this, because
on the differences
in
UTF8 vs. Latin-1 handling among pine, elm, and other mailers.
Not only the MUA level. Usually source code is written in a lowest
common denominator of ascii, even for languages that allow unicode
identifiers (Java) or markup. That's because source code is handled
[EMAIL PROTECTED] (Austin Hastings) writes:
Yeah, but ActiveState does Perl, and Microsoft owns ActiveState
To what extent are *either* of those statements true? :)
--
All the good ones are taken.
Damian Conway wrote:
Larry Wall wrote:
That suggests to me that the circumlocution could be >>*<<.
A five character multiple symbol??? I guess that's the penalty for not
upgrading to something that can handle unicode.
Unless this is subtle humor, the Huffman encoding idea is getting
seriously
Ken Fox wrote:
Damian Conway wrote:
Larry Wall wrote:
That suggests to me that the circumlocution could be >>*<<.
A five character multiple symbol??? I guess that's the
penalty for not upgrading to something that can handle
unicode.
Unless this is subtle humor, the Huffman encoding
Garrett Goebel:
# Ken Fox wrote:
# Unless this is subtle humor, the Huffman encoding idea is getting
# seriously out of hand. That 5 char ASCII sequence is *identically*
# encoded when read by the human eye. Humans can probably type the 5
# char sequence faster too. How does Unicode win
On Monday, November 4, 2002, at 08:55 AM, Brent Dax wrote:
# Can't we have our cake and eat it too? Give ASCII digraph or
# trigraph alternatives for the incoming tide of Perl6 Unicode?
The Unicode version is more typing than the non-Unicode version, so
what's the advantage? It's prettier
On Monday, November 4, 2002, at 11:58 AM, Larry Wall wrote:
You know, separate streams in a for loop are not going to be that
common in practice, so maybe we should look around a little harder for
a supercomma that isn't a semicolon. Now *that* would be a big step
in reducing ambiguity...
Or
character-sets through Unicode.
As part of the agreement, ActiveState will add features
previously missing from Windows ports of Perl, as well as full support
for Unicode - a key feature to users dealing with Asian character sets.
blah blah blah ...
=Austin
scorin' script,
Von der bis an die all(),
Von der any() bis an den -
Uniperl, Uniperl über alles,
Über alles in der Welt!
So you're one of the very few people who bothered to set up unicode,
and now you want to force the rest of us into your own little
leet group.
Nerp. Hadn't given
characters
as
US-ASCII. Several people the other day reported on the differences
in
UTF8 vs. Latin-1 handling among pine, elm, and other mailers.
Not only the MUA level. Usually source code is written in a lowest
common denominator of ascii, even for languages that allow unicode
[EMAIL PROTECTED] (Austin Hastings) writes:
If @a [*=] @b; doesn't scan like rats chewing their way into your
cable, what does?
This is why God gave us functions as well as operators.
--
I _am_ pragmatic. That which works, works, and theory can go screw
itself.
- Linus Torvalds
Matthew Zimmerman wrote in perl.perl6.language :
So let me make my original question a little more general: are Perl 6 source
files encoded in Latin-1, UTF-8, or will Perl 6 provide some sort of
translation mechanism, like specifying the charset on the command line?
I expect probably
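The mangling this thread keeps running into can be sketched in a few lines of Python (a hypothetical illustration, not from the original messages): the same bytes mean different things under Latin-1 and UTF-8, which is exactly why the source-encoding question matters.

```python
# The guillemet operator «*» as UTF-8 bytes.
utf8_bytes = "«*»".encode("utf-8")         # b'\xc2\xab*\xc2\xbb'

# A mailer (or compiler) that assumes Latin-1 sees mojibake...
as_latin1 = utf8_bytes.decode("latin-1")   # 'Â«*Â»'

# ...while a UTF-8 reader round-trips cleanly.
as_utf8 = utf8_bytes.decode("utf-8")       # '«*»'
print(as_latin1, as_utf8)
```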
For all you Mac OS X fans out there:
http://www.earthlingsoft.net/UnicodeChecker/
Regards,
David
--
David Wheeler AIM: dwTheory
[EMAIL PROTECTED] ICQ: 15726394
http://david.wheeler.net/ Yahoo!: dew7e
[EMAIL PROTECTED] (Matthew Zimmerman) writes:
Larry has been consistently using
0xAB op 0xBB
in his messages to represent a (French quote) hyperop,
(corresponding to the Unicode characters 0x00AB and 0x00BB)
More and more conversations like this, (and how many have we seen here
already
On Sat, Nov 02, 2002 at 06:07:34AM -0700, Luke Palmer wrote:
I do most of my work over an ssh connection to my favorite server,
through gnome-terminal. gnome-terminal does not support unicode, so
this whole thread has been filled with ?'s and \251's. I can't see a
thing...
gnome-terminal
to the Unicode characters 0x00AB and 0x00BB)
More and more conversations like this, (and how many have we seen here
already?) about characters sets, encodings, mail quoting issues, in
fact, anything other than Perl, will be rife on every Perl-related
mailing list if we persist
On 2 Nov 2002 at 0:06, Simon Cozens wrote:
[EMAIL PROTECTED] (Matthew Zimmerman) writes:
Larry has been consistently using
0xAB op 0xBB
in his messages to represent a (French quote) hyperop,
(corresponding to the Unicode characters 0x00AB and 0x00BB)
More and more conversations
, will be rife on every Perl-related
mailing list if we persist with this idiotic idea of having Unicode
operators.
It may seem idiotic to the egocentric people who only need chars a-z
in their language. But for all others (think about Chinese), Unicode is
a real asset.
I don't think anyone's
with this idiotic idea of having Unicode
operators.
You keep saying or suggesting that the idea of using Unicode operators
is idiotic. Perhaps you could make an argument in support that
assertion (as Luke and Paul have done). I for one would be interested
to hear your reasoning.
Regards
with this idiotic idea of having Unicode
operators.
There will certainly be some pain in breaking out of ASCII. It might
well be idiotic now, but I don't think it will be idiotic in ten years.
And I am quite willing to deal with a certain amount of short-term crap
on behalf of the future.
Larry
idea of having Unicode
operators.
I don't really want Unicode operators either, but if it is decided that
there will be such operators, I would still _want_to_know_how_to_use_them_.
So let me make my original question a little more general: are Perl 6 source
files encoded in Latin-1, UTF-8
with this idiotic idea of having Unicode
operators.
I live in Switzerland and regularly deal with three languages which have
various diacritics and special characters. Personally, I would be very
happy with Unicode operators, but I fear that Simon's prediction would
be accurate and I would much rather
[EMAIL PROTECTED] (Markus Laire) writes:
It may seem idiotic to the egocentric people who only need chars a-z
in their language. But for all others (think about Chinese), Unicode is
a real asset.
I don't often think about Chinese. Chinese is hard. But I think about
Japanese a lot of the time
[EMAIL PROTECTED] (David Wheeler) writes:
You keep saying
I didn't think I was doing it habitually.
or suggesting that the idea of using Unicode operators
is idiotic. Perhaps you could make an argument in support that
assertion (as Luke and Paul have done).
Sure:
More and more
Simon Cozens wrote:
On the other hand, maybe I'm being as shortsighted as Thomas J Watson
[1] and that once the various operating systems do get their Unicode
support together and we see the introduction of the 50,000 key keyboard,
Of course, scary 50K keyboards aren't really necessary. All we
[EMAIL PROTECTED] (Damian Conway) writes:
Of course, scary 50K keyboards aren't really necessary. All we really need is
a keyboard with configurable keys. That is, each key has an LED, or OLED,
or digital plastic surface, and an index key that allows you to select the
Unicode block
multiple symbol??? I guess that's the penalty for not
upgrading to something that can handle unicode.
Actually, we could use «foo bar baz» for qw too if here-docs always
have to be ' or .
Yes. I thought we'd pretty much decided that anyway, hadn't we?
Hmmm...I wonder if one could then write:
$str
Simon Cozens wrote:
Of course, scary 50K keyboards aren't really necessary. All we really need is
a keyboard with configurable keys. That is, each key has an LED, or OLED,
or digital plastic surface, and an index key that allows you to select the
Unicode block to be currently mapped onto
don't see much of an argument there. That a discussion leads
to discussions on other mail lists is not a reason not to use Unicode
operators. Or so it seems to me.
Regards,
David
--
David Wheeler AIM: dwTheory
[EMAIL PROTECTED
On Thu, 31 Oct 2002, Luke Palmer wrote:
now *there's* some brackets!
Ooh! Let's use 2AF7 and 2AF8 for qw!
Actually, I wanted to suggest »German quotes« instead of French for qw.
:)
~ John Williams
On Fri, Nov 01, 2002 at 10:05:27AM -0700, John Williams wrote:
On Thu, 31 Oct 2002, Luke Palmer wrote:
now *there's* some brackets!
Ooh! Let's use 2AF7 and 2AF8 for qw!
Actually, I wanted to suggest »German quotes« instead of French for qw.
:)
Well, the other guys are
Larry has been consistently using
0xAB op 0xBB
in his messages to represent a (French quote) hyperop,
(corresponding to the Unicode characters 0x00AB and 0x00BB)
which is consistent with the iso-8859-1 encoding (despite
the fact that my mailserver or his mailer insists on
labelling those
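The codepoint arithmetic behind Larry's 0xAB/0xBB convention is easy to verify; a small Python check (illustrative, not part of the thread):

```python
# 0xAB and 0xBB are the French quotes; single bytes in iso-8859-1,
# two bytes each in UTF-8 -- hence the mislabelling problems.
assert chr(0xAB) == "«" and chr(0xBB) == "»"
assert "«".encode("latin-1") == b"\xab"
assert "«".encode("utf-8") == b"\xc2\xab"
```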
Here is an extensive FAQ for Unicode and UTF-8:
http://www.cl.cam.ac.uk/~mgk25/unicode.html
and here is a test file that will show you how many of the most common
glyphs (WGL4, via Microsoft) you are capable of displaying in your
current setup:
http://www.cl.cam.ac.uk/~mgk25/ucs/wgl4
And if you really want to drool at all the neat glyphs that the
wonderful, magical world of math has given us, check out:
http://www.unicode.org/charts/PDF/U2A00.pdf
now *there's* some brackets!
MikeL
Date: Thu, 31 Oct 2002 10:11:00 -0800
From: Michael Lazzaro [EMAIL PROTECTED]
And if you really want to drool at all the neat glyphs that the
wonderful, magical world of math
--- Luke Palmer [EMAIL PROTECTED] wrote:
And if you really want to drool at all the neat glyphs that the
wonderful, magical world of math has given us, check out:
http://www.unicode.org/charts/PDF/U2A00.pdf
now *there's* some brackets!
Ooh! Let's use 2AF7 and 2AF8 for
At 4:32 PM -0800 3/25/02, Brent Dax wrote:
I *really* strongly suggest we include ICU in the distribution. I
recently had to turn off mod_ssl in the Apache 2 distro because I
couldn't get OpenSSL downloaded and configured.
FWIW, ICU in the distribution is a given if we use it.
Parrot will
Someone said that ICU requires a C++ compiler. That's concerning to me,
as is the issue of how we bootstrap our build process. We were planning
on a platform-neutral miniparrot, and IMHO that can't include ICU (as i'm
sure it's not going to be written in pure ansi C)
--Josh
At 8:45 on
At 10:07 AM -0500 3/30/02, Josh Wilmes wrote:
Someone said that ICU requires a C++ compiler. That's concerning to me,
as is the issue of how we bootstrap our build process. We were planning
on a platform-neutral miniparrot, and IMHO that can't include ICU (as i'm
sure it's not going to be
Dan Sugalski wrote:
At 10:07 AM -0500 3/30/02, Josh Wilmes wrote:
Someone said that ICU requires a C++ compiler. That's concerning to me,
as is the issue of how we bootstrap our build process. We were planning
on a platform-neutral miniparrot, and IMHO that can't include ICU (as i'm
sure
Jeff:
# This will likely open yet another can of worms, but Unicode has been
# delayed for too long, I think. It's time to add the Unicode libraries
# (In our case, the ICU libraries at http://oss.software.ibm.com/icu/,
# which Larry has now blessed) to Parrot. string.c already has
# (admittedly
We also need to make sure ICU will work everywhere. And I do mean
*everywhere*. Will it work on VMS? Palm OS? Crays?
Nope, nope, and nope.
From their site -
Operating system   | Compiler                 | Testing frequency
Windows 98/NT/2000 | Microsoft Visual C++ 6.0 | Reference
This is rather concerning to me. As I understand it, one of the goals for
parrot was to be able to have a usable subset of it which is totally
platform-neutral (pure ANSI C). If we start to depend too much on
another library which may not share that goal, we could have trouble
with the
I think it will be relatively easy to deal with different compilers
and different operating systems. However, ICU does contain some
C++ code. That will make life much harder, since current Parrot
only assumes ANSI C (even a subset of it).
Hong
This is rather concerning to me. As I understand it,
consisting of basic Unicode
utilities we'll need, such as Unicode_isdigit(). This can be a simple
wrapper around isdigit() for the moment, until I sort out which files we
need from the Unicode database, and what support functions/data
structures will be required.
Given that we're dedicated
that, I think an interim solution consisting of basic Unicode
utilities we'll need, such as Unicode_isdigit(). This can be a simple
wrapper around isdigit() for the moment, until I sort out which files we
need from the Unicode database, and what support functions/data
structures will be required
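Why a plain isdigit() wrapper is only an interim step can be shown with Python's Unicode database (a sketch of the general idea, not Parrot code): ASCII logic misses non-Latin digits that the Unicode data classifies correctly.

```python
import unicodedata

# ASCII-only digit logic misses the Arabic-Indic digit three (U+0663)...
assert "٣" not in "0123456789"

# ...but the Unicode character database knows it is a decimal digit.
assert unicodedata.category("٣") == "Nd"
assert unicodedata.digit("٣") == 3
```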
This will likely open yet another can of worms, but Unicode has been
delayed for too long, I think. It's time to add the Unicode libraries
(In our case, the ICU libraries at http://oss.software.ibm.com/icu/,
which Larry has now blessed) to Parrot. string.c already has (admittedly
unavoidable, due
, character strings are simply sequences of integers.
The internal representation must be optimized for this concept, not for
any particular Unicode representation, whether UTF-8 or UTF-16 or
UTF-32. Any of these could be used as underlying representations, but
the abstraction of sequences of integers must
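The sequence-of-integers abstraction argued for here can be sketched in Python (illustrative values; any language with first-class codepoints behaves the same way):

```python
s = "Aé中"

# A string is a sequence of integers (codepoints), independent of
# whether the bytes underneath are UTF-8, UTF-16, or UTF-32.
codepoints = [ord(c) for c in s]
assert codepoints == [0x41, 0xE9, 0x4E2D]

# The round-trip never touches an encoding at all.
assert "".join(chr(n) for n in codepoints) == s
```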
Larry Wall:
# For various reasons, some of which relate to the sequence-of-integer
# abstraction, and some of which relate to infinite strings
# and arrays,
# I think Perl 6 strings are likely to be represented by a list of
# chunks, where each chunk is a sequence of integers of the same size or
This is from the latest python-dev summary. It might be of interest
to folks considering how to store strings.
* Adding .decode() method to Unicode * Marc-Andre Lemburg asked for
opinions on adding a .decode method to unicode objects:
http://mail.python.org/pipermail/python-dev/2001
for formatting. It has nothing
to do with the lowercase/uppercase in Roman languages. I believe Unicode
has many font characters.
Is this Uppercase?
Is this Lowercase?
I believe the Unicode already defines character categories, such as
L, Lu, Ll, Lo. I prefer we just use unicode term instead of extending
overloading of the
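The L/Lu/Ll/Lo terms mentioned above are the Unicode general categories; a quick Python sketch of what they classify:

```python
import unicodedata

assert unicodedata.category("A") == "Lu"   # letter, uppercase
assert unicodedata.category("a") == "Ll"   # letter, lowercase
assert unicodedata.category("中") == "Lo"  # letter, other (caseless)
assert unicodedata.category("ǅ") == "Lt"   # letter, titlecase
```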
locale-specific functionality. I think that's actually more than most other
implementations do.
Ok, that said, the way I see it (and I'm probably wrong), is that Perl may need
to know the following things about each character in Unicode, based upon locale
(correct me if I'm wrong
On Saturday 09 June 2001 06:24 am, NeonEdge wrote:
Is this Uppercase?
Is this Lowercase (is this 'half-digits', as Hong mentioned?)
(if 'Caseless' needed, just !Upper !Lower?)
Titlecase.
Is this Punctuation?
Is this a digit?
Is this a word character?
Is this Whitespace?
Maps to
/. rebuttal of the original article at
http://slashdot.org/features/01/06/06/0132203.shtml, for those that haven't
seen it yet.
--
Bryan C. Warnock
[EMAIL PROTECTED]
understanding is there is NO general unicode sorting, period.
The most useful one must be locale-sensitive, as defined by unicode
collation. In practice, the story is even worse. For example, how do
you sort strings comming from different locales, say I have an address
book with names from all over
Another example is the chinese has no definite
sorting order, period. The commonly used scheme are
phonetic-based or stroke-based. Since many characters
have more than one pronunciation (context sensitive)
and more than one forms (simplified and traditional).
So if we have a mix content
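The "no general Unicode sorting" point is easy to demonstrate; a minimal Python sketch (example words are mine, not from the thread):

```python
# Naive codepoint order puts 'é' (U+00E9) after 'z' (U+007A),
# which no human locale wants.
words = ["zebra", "apple", "éclair"]
assert sorted(words) == ["apple", "zebra", "éclair"]

# A correct sort needs a collation layer (e.g. locale.strxfrm or ICU),
# and the right answer still differs from locale to locale.
```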
If this is the case, how would a regex like ^[a-zA-Z] work (or other, more
sensitive characters)? If just about anything can come between A and Z, and
letters that might be there in a particular locale aren't in another locale,
then how will the regex engine make the distinction?
This syntax
If this is the case, how would a regex like ^[a-zA-Z] work (or other, more
sensitive characters)? If just about anything can come between A and Z, and
letters that might be there in a particular locale aren't in another locale,
then how will the regex engine make the distinction?
This
|FALSE, if the extended boolean attributes allow
transbinary truth values.
Well, UNKNOWN isn't accurate either; the case *is* known. It's just
neither upper nor lowercase.
(I wonder what it should return for titlecase characters too, for that
matter.)
What happens if unicode supported
Dan Sugalski [EMAIL PROTECTED] writes:
At 05:20 PM 6/7/2001 +, Nick Ing-Simmons wrote:
One reason perl5.7.1+'s Encode does not do asian encodings yet is that
the tables I have found so far (Mainly Unicode 3.0 based) are lossy.
Joy. Hopefully by the time we're done there'll be a full
Nicholas Clark [EMAIL PROTECTED] writes:
What happens if unicode supported uppercase and lowercase numbers?
[I had a dig about, and it doesn't seem to mention lowercase or
uppercase digits. Are they just a typography distinction, and hence not
enough to be worthy of codepoints?]
Damned
What happens if unicode supported uppercase and lowercase numbers?
[I had a dig about, and it doesn't seem to mention lowercase or
uppercase digits. Are they just a typography distinction,
and hence not
enough to be worthy of codepoints?]
Damned if I know; I didn't know there even
Z (not immediately, but
doesn't matter here) in the latter. Similarly for all the accented
alphabetic characters, the rules how they are sorted differ from one
place to another , and many languages have special combinations like
ch, ss, ij that require special attention.
Unicode defines
At 05:20 PM 6/7/2001 +, Nick Ing-Simmons wrote:
Dan Sugalski [EMAIL PROTECTED] writes:
It does bring up a deeper issue, however. Unicode is, at the moment,
apparently inadequate to represent at least some part of the asian
languages. Are the encodings currently in use less inadequate
alphabets.
That wasn't the issue, though--it was whether Unicode should treat what are
essentially antique or artistic representations of glyphs as separate
glyphs or not. (Not that it's really for us to decide, but still...) I
really don't think that's our problem--it's a font issue
I can't really believe that this would be a problem, but if they're integrated
alphabets from different locales, will there be issues with sorting (if we're
not planning to use the locale)? Are there instances where like characters were
combined that will affect the sort orders?
Grant M.
At 11:29 AM 6/8/2001 -0700, Hong Zhang wrote:
If this is the case, how would a regex like ^[a-zA-Z] work (or other, more
sensitive characters)? If just about anything can come between A and Z, and
letters that might be there in a particular locale aren't in another locale,
then how will
groups. We just need to make sure there's a named group for the different
languages we know of--things like [[:kanji]] or [[:hiragana]] for example.
It's spelled \p{...} (after I fixed a silly typo in bleadperl)
$ ./perl -Ilib -wle 'print "a" if "\x{30a1}" =~ /\p{InKatakana}/'
a
$ grep 30A1 lib/unicode
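A rough Python analogue of the \p{InKatakana} block test above, assuming the standard Katakana block range (the helper name is hypothetical):

```python
def in_katakana(ch: str) -> bool:
    """Block test: the Unicode Katakana block is U+30A0..U+30FF."""
    return 0x30A0 <= ord(ch) <= 0x30FF

assert in_katakana("\u30a1")   # ァ, the character from the one-liner
assert not in_katakana("a")
```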
Dan Sugalski [EMAIL PROTECTED] writes:
It does bring up a deeper issue, however. Unicode is, at the moment,
apparently inadequate to represent at least some part of the asian
languages. Are the encodings currently in use less inadequate? I've been
assuming that an Anything-Unicode translation
From: David L. Nicol [mailto:[EMAIL PROTECTED]]
Russ Allbery wrote:
a caseless character wouldn't show up in
either IsLower or IsUpper.
maybe an IsCaseless is warranted -- or Is[Upper|Lower]
could return UNKNOWN instead of TRUE|FALSE, if the
extended boolean attributes allow
Dan Sugalski [EMAIL PROTECTED] writes:
I think I'd agree there. Different versions of a glyph are more a matter of
art and handwriting styles, and that's not really something we ought to get
involved in.
But the human sitting in front of the machine cannot see the bit pattern,
they can only
Before people get their panties in a bunch, I'm not dissing Unicode. The point
that I am trying to make is that Unicode will probably never make everyone
happy. It WILL likely become widely accepted, and should offer the best
solution yet to integrating the major character sets into one
On Wed, Jun 06, 2001 at 07:28:45AM -0400, NeonEdge wrote:
If that was the goal, then they failed.
Oh, for heaven's sake, don't be silly. Our goal is to write Perl 6.
We haven't done that yet. That was our goal, so we failed?
--
IT support will, from 1 October 2000, be provided by college and
about such characters, I'd rather see an optional output
discipline that enforces strict Unicode output.
Fair enough.
On the other hand, maybe there's some use for a data structure that is
a sequence of integers of various sizes, where the representation of
different chunks of the array/string
arbitrary
2**n limits to gain performance.)
I'm much more interested in the clean abstraction of a string is a
sequence of integers than I am in the fact that those integers happen
to represent particular characters under Unicode. To be sure, it's
quite handy that those integers do represent
: that, and I have serious reservations about the speed of dealing with
: variable length characters instead of fixed-length ones.
Whether you buy it or not, I wasn't offering it as a mere conjecture.
That is precisely what Perl 5.6+ is already doing for Unicode data.
It's not a big deal unless
Dan Sugalski wrote:
I've been
assuming that an Anything-Unicode translation will be lossless, but this
makes me wonder whether that assumption is correct.
I seem to recall from reading articles on this issue that the issue is
encoding of arrangement: Even with an unlimited number
Russ Allbery wrote:
a caseless character wouldn't show up in
either IsLower or IsUpper.
maybe an IsCaseless is warranted -- or Is[Upper|Lower]
could return UNKNOWN instead of TRUE|FALSE, if the
extended boolean attributes allow transbinary truth values.
On Wed, Jun 06, 2001 at 09:57:58AM -0400, NeonEdge wrote:
Perl 6 cannot assume that Unicode is done.
Don't tell anyone, but it never did.
--
Thus spake the master programmer:
After three days without programming, life becomes meaningless.
-- Geoffrey James, The Tao
Oh, for heaven's sake, don't be silly. Our goal is to write Perl 6.
We haven't done that yet. That was our goal, so we failed?
Don't be ridiculous. With that as our goal, the ONLY way we could fail is to
NEVER write Perl 6. Unicode, on the other hand, was originally released for
public
Exactly. Same goes for Unicode. And pretty much every other standard, in
fact. These things evolve. They're *never* done. I don't see anyone
claiming that they *are* done, but I see you telling us that they should
be regarded as a failure if they're not done. Uh, great. So they're
incomplete. So what
Unicode itself is, like the JIS standard, simply an enumeration of
characters with their orderings; it says nothing about how the data is
represented to the computer, and must be supplemented by one of several
Unicode Transformation Formats which describe the encoding.
However
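The enumeration-versus-encoding split described above is directly observable; a Python sketch (illustrative string, standard codecs):

```python
s = "a«中"

# The codepoints never change; only the transformation format does.
assert s.encode("utf-8") == b"a\xc2\xab\xe4\xb8\xad"        # 1+2+3 bytes
assert s.encode("utf-16-be") == b"\x00a\x00\xabN-"          # 2 bytes each
assert len(s.encode("utf-32-be")) == 12                     # 4 bytes each
```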
Courtesy of Slashdot,
http://www.hastingsresearch.com/net/04-unicode-limitations.shtml
I'm not sure if this is an issue for us or not, as we're generally
language-neutral, and I don't see any technical issues with any of the
UTF-* encodings having headroom problems.
I think the author
Firstly, the JIS standard defines, along with the ordering and
enumeration of its characters, their glyph shape. Unicode, on the other
hand does not. This means that as far as Unicode is concerned, there is
literally no distinction between two distinct shapes and hence no way to
specify
On Tue, Jun 05, 2001 at 10:17:08AM -0700, Russ Allbery wrote:
Is it just me, or does this entire article reduce not to Unicode doesn't
work but Unicode should assign more characters?
Yes. And Unicode has assigned more characters; it's factually challenged.
--
And it should be the law: If you
At 06:22 PM 6/5/2001 +0100, Simon Cozens wrote:
On Tue, Jun 05, 2001 at 10:17:08AM -0700, Russ Allbery wrote:
Is it just me, or does this entire article reduce not to Unicode doesn't
work but Unicode should assign more characters?
Yes. And Unicode has assigned more characters; it's factually