[unicode] Mail loop at China.com

2001-03-23 Thread John Wilcock

 Received: from china.com (TCE-E-7-182-12.bta.net.cn [202.106.182.12])
   by unicode.org (8.9.3/8.9.3) with SMTP id AAA03398
   for [EMAIL PROTECTED]; Fri, 23 Mar 2001 00:11:40 -0500
 Received: from china.com([10.1.7.101]) by china.com(JetMail 2.5.3.0)
   with SMTP id jm93abafcca; Fri, 23 Mar 2001 06:06:05 -
 Received: from unicode.org([209.235.17.55]) by china.com(JetMail 2.5.3.0)
   with SMTP id jm533aba555b; Thu, 22 Mar 2001 14:58:28 -

China.com seems to have a mail loop. If recent experience of the same
problem on another list is anything to go by, there's little hope of
them fixing the problem.

Is there a function in LISTAR to prevent duplicate messages getting
through? If not, could Sarasvati in her wisdom please track down the
offending subscriber(s) at china.com (or a domain hosted by them) and
unsubscribe them...


John.

-- 
-- Over 1500 webcams from ski resorts around the world - http://www.snoweye.com/
-- Translate your technical documents and web pages- http://www.tradoc.fr/en/




[unicode] Malay (Latin) characters in Unicode?

2001-03-23 Thread dvdeug

[Feed another to the shubnet . . .]

I have a copy of Shellabear's Practical Malay Grammar that I'm preparing 
to transcribe for Project Gutenberg. Unfortunately, he represents the 
Malaysian alphabet in a Latin transliteration that includes ng as a 
single ligatured form, and I don't know how to transcribe it in Unicode. 
Some ideas: 

(1) Use a private use character. Not feasible, because it needs to be readable 
by the average person, not just someone who has patience to set up their 
computer for this one file.

(2) Use a ZWJ between n and g. If I'm not mistaken, most current systems 
will show the ZWJ as a little black box, and there are going to be very 
few systems any time soon that would actually display the ng ligature.
Still, a good Unicode system will elide the ZWJ, displaying an acceptable 
ng with the real information still in the file.

(3) Petition Unicode for a new character. Right. I'd be arguing 
for a character used in two books (that I know of), one that bears 
an annoying similarity to the ng (non-ligatured) flame wars, and 
in the best of cases I'd wait a couple of years for it to be accepted.

(4) Resort to ASCII trickery to distinguish between ng (ligatured) and 
ng (non-ligatured). Marking the ng (ligatured) would be ugly; marking
the unligatured would be also ugly, although a lot rarer - I don't know 
if Malay (in this transliteration) uses ng (non-ligatured). 

(5) Just use ng. A simple, just ASCII solution. I don't know if it's 
information preserving though.
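
Options (2) and (5) can be compared at the code point level. The following is only a rough Python sketch, assuming the ZWJ meant here is U+200D ZERO WIDTH JOINER; the names `plain`, `joined` and `strip_zwj` are made up for illustration.

```python
ZWJ = "\u200d"  # ZERO WIDTH JOINER (assumed to be the ZWJ meant in option 2)

plain = "ng"              # option (5): two code points, no ligature information
joined = "n" + ZWJ + "g"  # option (2): the ligature request stays in the text

def strip_zwj(text):
    """Drop every ZWJ, as a plain-ASCII export of the file might do."""
    return text.replace(ZWJ, "")

# The ZWJ form is three code points; removing the ZWJ gives back plain "ng".
print([f"U+{ord(c):04X}" for c in joined])
print(strip_zwj(joined) == plain)
```

In other words, the ZWJ form can always be reduced to option (5), but plain ng cannot be turned back into the ligatured form without extra knowledge.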

Any suggestions?

-- 
David Starner - [EMAIL PROTECTED]
Gutenberg stuff - http://dvdeug.dhis.org/guten/ (down for the week)

Free, encrypted, secure Web-based email at www.hushmail.com


[unicode] Re: removing compromises from unicode (WCode)

2001-03-23 Thread dvdeug

[Hoping the shubnet doesn't get this one too . . .]

WTF-8 could potentially be as compact or more compact than UTF-8 (for 
Greek, Arabic ...), since much of the Latin-1 and Latin Extended A blocks 
aren't needed in WCode. If you moved the other characters down to
fill that space, you might win what you lost to C1 compatibility. 

I've considered writing up my own WCode (just for the heck of it) before. 
My big fix would be losing ASCII compatibility(!), which allows us to 
remove redundant and ill-defined controls and characters (ASCII 
apostrophe! CR-LF!). Move the basic set of controls (LS, PS, ZWJ, etc.) 
and the basic set of script-neutral punctuation and characters 
(.,:;?!; possibly the Indo-European (Arabic?) digits 0-9) into the 
bottom 128, followed by the combining characters and then 
the decomposed Latin and so on. Losing ASCII compatibility is
much more radical than you've proposed, though.

-- 
David Starner - [EMAIL PROTECTED]
Pointless (and temporarily down) webpage: http://dvdeug.dhis.org


[unicode] Re: Poll of the day

2001-03-23 Thread Michael Everson

At 20:56 -0500 2001-03-22, Sarasvati wrote:
Here by popular demand is the poll of the day...

   http://www.unicode.org/~sarasvati/poll.html


Not Found
The requested URL /~sarasvati/Democratic-Process was not found on this server.


Apache/1.3.14 Server at www.unicode.org Port 80




[unicode] Re: Helpful info

2001-03-23 Thread Marco Cimarosti

Sarasvati wrote:
 TO FIND OUT WHO IS ON A LIST:
   Send a message to the listar account on the server with a subject
   of "who [listname]".  You will receive a list of people subscribed
   to the list who are not hidden.  Admins will be able to see
   everybody, including those hidden.

O, thank you! This is a great great utility for spammers!

A wise postmaster --clearly not the case of Sarasvati-- would have set all
subscribers to "hidden", before enabling this command.

Probably, by now, all our addresses have already been harvested and archived
by all sellers of virility extenders, cable TV descramblers, home working
schemes, hoaxes, etc.

So now they won't need to post to the Unicode list to reach us.

The next thing to do for all of us is to close our Internet mail accounts
and open new ones.

The easiest way to do this for people like me, who imprudently subscribed
their business or academic addresses, is to resign from their job or
university.

The only good thing is that employers can now also bypass the (absurd)
prohibition of sending recruitment postings -- so we'll have an opportunity
of finding another job.

Oh, by the way, and you can bypass any other prohibition. So, people who
need to post GIFs or lengthy quotes now know how to do it: just download the
list and paste it in your "To" field.

Brilliant move, Sarasvati.

_ Marco




[unicode] Re: Moving mail lists

2001-03-23 Thread Sean O Seaghdha

On 23 Mar 2001, at 1:44, Sarasvati wrote
on the subject "Re: Moving mail lists":

 At this moment, there are 691 addresses subscribed to
 the Unicode mail list. At least 24 of those entities
 are points of further fan-out to local lists elsewhere.
 If you can gather a list of at least 346 current
 subscribers who respond affirmatively to the question
 "should Sarasvati remove the [unicode] tag in the
 subject header?", then let it be considered that
 popular opinion is in your favor, and the tag will be
 removed.

Notwithstanding that

(a) I still think this is a stupid, unnecessary and pernicious "innovation" 
and as such should not be considered because of its inconvenience to those 
who don't require it

and

(b) no such act of democracy was required to *institute* this unnecessary 
change

I'll take the above private response as a request for everyone opposed to or 
in favour of this piece of nonsense to e-mail Sarasvati [EMAIL PROTECTED] 
with your opinion.  I trust Sarasvati will keep us apprised of the tally, the 
proportion of non-voters, etc., etc. and is prepared to arrange for the 
necessary hand-counts and legal actions that will ensue.

`~:.,.:'^`~:.,.:'^`~:.,.:'^`~:.,.:'^`~:.,.:'^`~:.,.:'^`~:.,.:'^`~:.,.:'^`~
 S e á n   Ó   S é a g h d h a   [EMAIL PROTECTED]

There is only one thing in the world worse than being talked about, 
and that is not being talked about.   Oscar Wilde.




[unicode] Re: UCS-2 Files

2001-03-23 Thread Otto Stolz

On 2001-03-22 at 14:31 CET, Tomas McGuinness wrote:
 I am currently developing a product that will support UCS-2

For a new project, it would be better to support UTF-16, rather
than UCS-2, from the very beginning. There are already characters
accepted for standardization that can not be encoded in UCS-2.
Cf.
- http://www.unicode.org/unicode/faq/,
- http://www.unicode.org/unicode/faq/utf_bom.html#5,
- http://www.unicode.org/unicode/alloc/Pipeline.html#Characters and
  Scripts Accepted for Unicode.
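
The practical difference shows up for code points above U+FFFF: UTF-16 represents them as a pair of 16-bit surrogate code units, while UCS-2 has no way to encode them. A minimal Python sketch of that arithmetic follows; the function name `to_utf16_units` is hypothetical, U+10400 is just an arbitrary sample code point, and no validation of the input is attempted.

```python
def to_utf16_units(code_point):
    """Return the 16-bit code units UTF-16 uses for one code point."""
    if code_point < 0x10000:
        return [code_point]          # one unit, identical to UCS-2
    offset = code_point - 0x10000    # 20 bits are left to distribute
    high = 0xD800 + (offset >> 10)   # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits -> low surrogate
    return [high, low]

print([hex(u) for u in to_utf16_units(0x0041)])
print([hex(u) for u in to_utf16_units(0x10400)])
```

A reader that treats the data as UCS-2 would see the two surrogates as two separate, meaningless 16-bit values.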

Best wishes,
  Otto Stolz




[unicode] Re: Moving mail lists

2001-03-23 Thread Roozbeh Pournader


On Fri, 23 Mar 2001, Sean O Seaghdha wrote:

 I'll take the above private response as a request for everyone opposed to or 
 in favour of this piece of nonsense to e-mail Sarasvati [EMAIL PROTECTED] 
 with your opinion.  I trust Sarasvati will keep us apprised of the tally, the 
 proportion of non-voters, etc., etc. and is prepared to arrange for the 
 necessary hand-counts and legal actions that will ensue.

I don't think so. The kind of reply you had received simply means that
Sarasvati will not change it, without even being sorry for people like us
who consider it a pain :(

--roozbeh





[unicode] Re: bytes bits

2001-03-23 Thread Jeff Guevin

Touché by all of you who've corrected my reliance on dictionaries for tech
definitions.





[unicode] Re: UCS-2 Files

2001-03-23 Thread Carl W. Brown

Jeff,

A byte is the least addressable portion of memory.  The IBM 1401 for example
has 6 bit bytes + a word mark.  Parity bits don't count.  A lot of systems
in the 50's and early 60's had 6 bit bytes.  That is why octal became so
popular.

Bytes were not used for systems like the IBM 1620 which was a scientific
system.  Memory was an array of number registers and was not character
based.  Instead the least addressable memory unit was a word.

A byte may be 8 bits now but it was not always 8 bits.

Carl


-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On
Behalf Of Jeff Guevin
Sent: Thursday, March 22, 2001 12:01 PM
To: [EMAIL PROTECTED]
Subject: [unicode] Re: UCS-2 Files



 On Thu, 22 Mar 2001, [EMAIL PROTECTED] wrote:

 Better if you also keep the distinction between "octet" (a series of
 8 bits) and "byte" (a series of n bits, where n is often but NOT
 always 8).

 When is a byte not eight bits?


The Web version of the Oxford English Dictionary (http://dictionary.oed.com)
says a byte is always eight bits:

"A group of eight consecutive bits operated on as a unit in a computer."

1964 BLAAUW & BROOKS in IBM Systems Jrnl. III. 122 An 8-bit unit of
information is fundamental to most of the formats [of the System/360]. A
consecutive group of n such units constitutes a field of length n.
Fixed-length fields of length one, two, four, and eight are termed bytes,
halfwords, words, and double words respectively. 1964 IBM Jrnl. Res. &
Developm. VIII. 97/1 When a byte of data appears from an I/O device, the CPU
is seized, dumped, used and restored. 1967 P. A. STARK Digital Computer
Programming xix. 351 The normal operations in fixed point are done on four
bytes at a time. 1968 Dataweek 24 Jan. 1/1 Tape reading and writing is at
from 34,160 to 192,000 bytes per second.

 --
 Gaute Strokkenes                        http://www.srcf.ucam.org/~gs234/
 PEGGY FLEMING is stealing BASKET BALLS to feed the babies in VERMONT.









[unicode] Re: UCS-2 Files

2001-03-23 Thread Carl W. Brown

Marco,

I find that people often understand it better when you get away from bytes,
octets etc. and describe Unicode strings as an array of unsigned short (16
bit unsigned integers) in the same manner as single byte characters are an
array of 8 bit integers.  This way the only time you have to deal with
endian issues is when you deal with the memory or transmission layout of the
data.  This also helps when you get into null terminated strings.  You can
not terminate a Unicode string with a byte null, it has to be a full 16 bit
character.
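
To make this concrete, here is a small Python sketch (not taken from any particular library's string handling) that treats a string as a list of 16-bit units with a 16-bit null terminator, and only then chooses a byte order. It assumes every character fits in one 16-bit unit, which holds for the sample text "Hi".

```python
import struct

text = "Hi"
# One 16-bit unit per character, followed by a full 16-bit null terminator.
units = [ord(c) for c in text] + [0x0000]

# Byte order only matters once the units are laid out as octets.
little = struct.pack("<%dH" % len(units), *units)
big = struct.pack(">%dH" % len(units), *units)

print(little.hex())            # 480069000000
print(big.hex())               # 004800690000
print(little.index(b"\x00"))   # 1
```

The last line shows why a byte null cannot terminate such a string: the first zero octet already appears inside the first character.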

Carl

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On
Behalf Of Marco Cimarosti
Sent: Thursday, March 22, 2001 7:03 AM
To: 'Tomas McGuinness'; [EMAIL PROTECTED]
Subject: [unicode] Re: UCS-2 Files



Tomas McGuinness wrote:
 I have a question relating to UCS-2. I am currently
 developing a product
 that will support UCS-2 and I have been sent several
 documents encoded in
 UCS-2. I have no reader or writer for UCS-2 but I have
 performed Hexdumps in
 UNIX. At the beginning of the UCS-2 characters there are two rogue
 characters 0xFF and 0xFE. Have these characters any importance?

They are quite important, yes. See
http://www.unicode.org/unicode/faq/utf_bom.html#24 for details.

But, beware that they are NOT characters: they are OCTETS (also known as
"bytes")!

The first thing that I'd suggest you do when starting to work with
Unicode and other character sets is to carefully separate the terms "byte"
and "character". Better if you also keep the distinction between "octet" (a
series of 8 bits) and "byte" (a series of n bits, where n is often but NOT
always 8).

In brief, those two octets tell you that:

1.  It is a Unicode text file.

2.  It is in format UCS-2, UTF-16, or UTF-32 (to determine whether it is
UTF-32 you need to read the next two octets: if they are 0x00 0x00, then it
is UTF-32. Else it is either UCS-2 or UTF-16, which basically you don't need
to distinguish).

3.  The 16-bit units are little endian, so you have to interpret these
two octets as (0xFF + 0xFE * 256), which yields 0xFEFF, the code of the
"BOM".

4.  All subsequent pairs of octets a,b are interpreted the same way: (a
+ b * 256).
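
The four points above can be sketched in a few lines of Python. This is only an illustration under the assumptions stated in the message (a file starting with 0xFF 0xFE, little-endian units); the function names `sniff_bom` and `units_le` are invented for the example, and big-endian files are not handled.

```python
def sniff_bom(data):
    """Guess the encoding form from the leading octets (little endian only)."""
    if data[:4] == b"\xff\xfe\x00\x00":
        return "UTF-32, little endian"
    if data[:2] == b"\xff\xfe":
        return "UCS-2 or UTF-16, little endian"
    return "no little-endian BOM found"

def units_le(data):
    """Combine each octet pair a, b into one 16-bit unit as a + b * 256."""
    return [data[i] + data[i + 1] * 256 for i in range(0, len(data) - 1, 2)]

sample = b"\xff\xfe\x48\x00\x69\x00"  # BOM followed by the letters "Hi"
print(sniff_bom(sample))
print([hex(u) for u in units_le(sample)])
```

The first unit comes out as 0xFEFF, the BOM itself, and the remaining units are the text.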

Regards.
_ Marco





[unicode] CJK dictionary

2001-03-23 Thread grozsa11

Do you know of a CJK (kanji) dictionary with Unicode code values?

(This range's English meaning:
4E00  CJK Ideograph, First
9FA5  CJK Ideograph, Last
F900  CJK Compatibility Ideograph, First
FA2D  CJK Compatibility Ideograph, Last)

Thanks. 
Géza

-
Rozsa Geza
432-8279, (30)996-0007;
SMSre a subject: [EMAIL PROTECTED]




[unicode] Re: UCS-2 Files

2001-03-23 Thread J M Sykes


 A byte may be 8 bits now but it was not always 8 bits.

Au contraire!

It was the designers of System/360 who invented the word "byte" to mean the
smallest addressable unit of storage, in their case 8 bits. It is others who
have appropriated the word for their own purposes, as has happened with so
many words since language was invented.

Remember Humpty Dumpty!

Mike.







[unicode] Re: Poll of the day

2001-03-23 Thread Michael Everson

At 09:21 -0800 2001-03-23, Michael (michka) Kaplan wrote:

After all, no one at all is claiming incompetence on the part of our
ever-vigilant Bubble Queen of the River Ganga, but some people are 
talking about how much they preferred the way our effervescent but 
bitwise conservative used to do things.

If it ain't broke, don't fix it. I fail to see any utility in this 
newfangled "[unicode]" appendage.
-- 
Michael Everson  **  Everson Gunn Teoranta  **   http://www.egt.ie
15 Port Chaeimhghein Íochtarach; Baile Átha Cliath 2; Éire/Ireland
Mob +353 86 807 9169 ** Fax +353 1 478 2597 ** Vox +353 1 478 2597
27 Páirc an Fhéithlinn;  Baile an Bhóthair;  Co. Átha Cliath; Éire




[unicode] Re: Helpful info

2001-03-23 Thread Otto Stolz

Sarasvati had written:
 TO FIND OUT WHO IS ON A LIST...

Marco had written:
 O, thank you! This is a great great utility for spammers!

[EMAIL PROTECTED] wrote:
 Marco: you evidently missed this line in her message:
  The list of subscribers is not available.

Because of this very sentence, I have tested the Who command, and
guess what? On Fri, 23 Mar 2001 08:45:18 -0500 (EST),  Listar
[EMAIL PROTECTED] happily sent me a list of  686 subscribers
to the Unicode list.

Now I am a subscriber myself, and the Who command may well be
restricted to subscribers. But even this restriction would not be
safe, as the recently detected spamming technique shows.

So, it would be a good idea to disable the Who command for the
Unicode list.

Best wishes,
  Otto Stolz




[unicode] Re: Helpful info

2001-03-23 Thread Sarasvati

Marco,

Thank you for your valuable and candid opinion.
I appreciate your confidence in my intelligence.

Let me point out, however, that I specifically
wrote in my NOTES about the helpful info:

 The list of subscribers is not available.

as Peter Constable has already pointed out.

Cheery regards from your effervescent,
 -- Sarasvati




[unicode] Re: Poll of the day

2001-03-23 Thread Michael (michka) Kaplan

Unfortunately, anyone who felt strongly about things could easily (and
reasonably) take it another way.

After all, no one at all is claiming incompetence on the part of our
ever-vigilant Bubble Queen of the River Ganga, but some people are talking
about how much they preferred the way our effervescent but bitwise
conservative used to do things.

Unfortunately, our cheery (and occasionally quite cheeky, as that "poll"
clearly proved!) but effervescent Sarasvati has decided that the way things
used to be is some way not preferred, and that the list server of our
forward looking group must cater to the needs of older systems (which
apparently were never working adequately up till now?).

At least they still allow people to vote at UTC meetings. :-)

michka


- Original Message -
From: "Carl W. Brown" [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Friday, March 23, 2001 8:13 AM
Subject: [unicode] Re: Poll of the day


 Adam,

 I think that the poll was not arrogant but a little fun to break the
 tension.

 Carl

 -Original Message-
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On
 Behalf Of G. Adam Stanislav
 Sent: Friday, March 23, 2001 4:38 AM
 To: Michael Everson; [EMAIL PROTECTED]
 Cc: [EMAIL PROTECTED]
 Subject: [unicode] Re: Poll of the day


 At 09:48 23-03-2001 +, Michael Everson wrote:
  http://www.unicode.org/~sarasvati/poll.html
 
 
 Not Found
 The requested URL /~sarasvati/Democratic-Process was not found on this
 server.

 That was the whole point, I believe. Since there were only two choices,
 and both identical, there is no need to actually process the form.

 The poll offered what we used to call during the Communist era the
 Paradise Choice: God brought Eve to Adam and said: "Choose one."

 That Sarasvati wants to do things her way is fine by me. That she made
 me log into the Internet, launch my browser, just to get to the mockery
 of the poll.html and no Democratic-Process, is a slap in the face. The
 Consortium is as arrogant as ever.

 Adam
 ---
 Whiz Kid Technomagic - brand name computers for less.
 See http://www.whizkidtech.net/pcwarehouse/ for details.








[unicode] Re: Helpful info

2001-03-23 Thread Ayers, Mike


 Because of this very sentence, I have tested the Who command, and
 guess what? On Fri, 23 Mar 2001 08:45:18 -0500 (EST),  Listar
 [EMAIL PROTECTED] happily sent me a list of  686 subscribers
 to the Unicode list.

Paranoid, I just tried the same and got:

SNIP
List context changed to 'unicode' by following command.
 who unicode
List membership is only viewable by list admins.

Valid command was found in subject field, body won't be checked for further
commands.

---
Listar v1.0.0 - job execution complete.
/SNIP

Otto, could you try again, please?


/|/|ike




[unicode] Re: What is Unicode?

2001-03-23 Thread Richard Cook

Another web page, for your collective amusement:

http://linguistics.berkeley.edu/~rscook/html/Unicode-tetralog.html




[unicode] Re: Moving mail lists

2001-03-23 Thread Sean O Seaghdha

On 21 Mar 2001, at 11:58, [EMAIL PROTECTED] wrote
on the subject "[unicode] Re: Moving mail lists":

 Those who can filter their mail can also alter the subject line easily
 with, for example, a small Perl script. 

Since this is so easy, could you send me one?

`~:.,.:'^`~:.,.:'^`~:.,.:'^`~:.,.:'^`~:.,.:'^`~:.,.:'^`~:.,.:'^`~:.,.:'^`~
 S e á n   Ó   S é a g h d h a   [EMAIL PROTECTED]

The only way to get rid of a temptation is to yield to it.Oscar Wilde.




[unicode] Re: Poll of the day

2001-03-23 Thread G. Adam Stanislav

At 08:13 23-03-2001 -0800, Carl W. Brown wrote:
Adam,

I think that the poll was not arrogant but a little fun to break the
tension.

I'd have probably been more amused had I not been logged off the Internet
at the time I was reading the message announcing the poll: I logged on,
loaded the browser, etc, just to participate in the poll.

Adam




[unicode] Re: Moving mail lists

2001-03-23 Thread Mark Leisher


 Those who can filter their mail can also alter the subject line easily
 with, for example, a small Perl script.

Sean> Since this is so easy, could you send me one?

% perl -ne 's/\[unicode\]// if (/^Subject:/);' messagefile
-
Mark Leisher                  Times are bad.  Children no longer obey
Computing Research Lab        their parents, and everyone is writing
New Mexico State University   a book.
Box 30001, Dept. 3CRL             -- Marcus Tullius Cicero
Las Cruces, NM  88003




[unicode] Re: Moving mail lists

2001-03-23 Thread Mark Leisher

Oops.  I missed a spot.

% perl -ne 's/\[unicode\]// if (/^Subject:/); print;' messagefile
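
For readers without Perl at hand, roughly the same filter can be sketched in Python. This is an illustrative equivalent, not part of the original one-liner; the function name `strip_tag` and its `tag` parameter are invented here. Like the Perl version, it touches every line that starts with "Subject:", whether in the headers or in the body, and removes only the first occurrence of the tag on that line.

```python
def strip_tag(lines, tag="[unicode]"):
    """Remove the first occurrence of tag from each line starting with 'Subject:'."""
    out = []
    for line in lines:
        if line.startswith("Subject:"):
            line = line.replace(tag, "", 1)
        out.append(line)
    return out

message = [
    "Subject: [unicode] Re: Moving mail lists\n",
    "\n",
    "The [unicode] tag in this body line is left alone.\n",
]
print("".join(strip_tag(message)), end="")
```

As with the Perl substitution, the space that followed the tag remains, so the subject ends up with two spaces after the colon.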
-
Mark Leisher                  Times are bad.  Children no longer obey
Computing Research Lab        their parents, and everyone is writing
New Mexico State University   a book.
Box 30001, Dept. 3CRL             -- Marcus Tullius Cicero
Las Cruces, NM  88003




[unicode] Re: removing compromises from unicode (WCode)

2001-03-23 Thread Michael (michka) Kaplan

From: "Jonathan Coxhead" [EMAIL PROTECTED]

It would be very entertaining to do the same job with the ideographs (down
 to the radical level) and count the number of atoms. I suspect the resulting
 "character set" would contain less than 2000 atoms altogether.

More than just entertaining, one would definitely find the space saved to be
about 1000 times the work of the other decompositions. Addressing the
non-CJK and ignoring the CJK is like fixing app performance at boot and
ignoring the entire rest of the app's lifetime!

MichKa

Michael Kaplan
Trigeminal Software, Inc.
http://www.trigeminal.com/






[unicode] Re: removing compromises from unicode (WCode)

2001-03-23 Thread Jonathan Coxhead

   It would be very entertaining to do the same job with the ideographs (down
to the radical level) and count the number of atoms. I suspect the resulting
"character set" would contain less than 2000 atoms altogether.

   MichKa replied ...

 More than just entertaining, one would definitely find the space saved to be
 about 1000 times the work of the other decompositions. Addressing the
 non-CJK and ignoring the CJK is like fixing app performance at boot and
 ignoring the entire rest of the app's lifetime!

   "Space saved"? Good heavens, I never had anything as mundane as "disc space" 
(or "performance" for that matter) in mind. It was purely an exercise in 
concepts and relationships, with no other goal in mind.

   Fun, for some value of "fun" :-)





[unicode] Re: Malay (Latin) characters in Unicode?

2001-03-23 Thread dvdeug

At Fri, 23 Mar 2001 00:13:33 -0800, Rick McGowan [EMAIL PROTECTED] wrote:
David Starner wrote:

 I have a copy of Shellabear's Practical Malay Grammar that I'm preparing
 to transcribe for Project Gutenberg. Unfortunately, he represents the
 Malaysian alphabet in a Latin transliteration that includes ng as a
 single ligatured form, and I don't know how to transcribe in Unicode.

Could you perhaps post or point to a picture of what it looks like?  I
suppose it's an "N" with a loopy tail of some type.

More like rg. A picture is attached.

The character you are looking for is probably U+014B in lowercase or
U+014A in uppercase.  I would be rather surprised if that's not what you're
looking for.

It's not exactly what I was looking for. I may just use it and make a
note that the glyph is probably not exactly right.

BTW, a bit off topic here but: I think it's high time that Project
Gutenberg adopted some very clear character encoding guidelines now that
they're expanding so widely.  Or have they already adopted them and I've
just missed the policy statement...?  They're in for a real mess if they
don't specify character encodings in a very controlled way.

At some points, they are already a real mess. You can dig 
through Gutenberg archives and find various (unlabeled) 
encodings for the Latin-1 coverage. There's at least one 
Japanese document that just says "you need a Japanese 
OS to read this." 8-bit documents are usually labeled as
8-bit, without any indication of encoding.

OTOH, the policy of doing everything possible in ASCII has
saved Gutenberg some problems. They're moving towards
Unicode for any files that need it. The Bulgarian files are 
clearly labeled windows-1251, which is at least a start.

See
ftp://metalab.unc.edu/pub/docs/books/gutenberg/GUTINDEX.02
and GUTINDEX.01 for recent examples. Most of the unmarked
stuff is ASCII, but there's a number of clearly Unicode marked
and "8-bit German" marked files.

-- 
David Starner - [EMAIL PROTECTED]
 R_T_malay_ng.png


[unicode] Reading mojibake

2001-03-23 Thread 11digitboy

I taught myself to read a bit of SJIS mojibake, partly
from studying the scrambled output of my clock program
with fullwidth digits. (glitch plus O = 0. Glitch plus
P = 1, etc., I think.) Can anyone else here read mojibake?
What is the English word for mojibake?
Isn't Unicode mojibake three mojibake per character,
rather than just two like in SJIS? How do you fix this,
anyway? Like if you have a lot of Unicode text, so
you don't need the extra byte.
Maybe 2 1/2 bytes per character would be good. I mean
for the extra planes and all. You guys could make a
script for this in 10 minutes.
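
The byte counts behind this can be checked with a short Python sketch. It uses U+FF10 FULLWIDTH DIGIT ZERO as the sample character and Python's built-in `shift_jis`, `utf-8` and `utf-16-be` codecs; reading the Shift JIS bytes as Latin-1 is only one assumed way of producing the garbled view, and the variable names are made up for the example.

```python
digit = "\uff10"  # FULLWIDTH DIGIT ZERO

# Number of bytes the same character takes in each encoding.
sizes = {codec: len(digit.encode(codec))
         for codec in ("shift_jis", "utf-8", "utf-16-be")}
print(sizes)

# Viewing the Shift JIS bytes through a single-byte code page gives
# one unprintable "glitch" character followed by the letter O.
garbled = digit.encode("shift_jis").decode("latin-1")
print(repr(garbled))
```

So the three-bytes-per-character effect comes from UTF-8; the same text stored as UTF-16 takes two bytes per character, like SJIS.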
*** JUUICHIKETAJIN ***



