UCS-2 and UTF-16 [was Re: Encode, take five]

2000-09-13 Thread Mark Leisher
e the Unicode Standard 3.0 page 19). Combining surrogates constitutes a UCS-4 encoding (or UTF-32 until unavailable 10646 private use regions are removed). ----- Mark Leisher, Computing Research Lab. Cinema, radio, telev

Re: Encode, take two

2000-09-13 Thread Mark Leisher
ter. Then chars_to_utf8() and utf8_to_chars() don't need an encoding parameter because they simply convert between Unicode characters and UTF-8. Or is there some other factor I've missed in all the confusion? -----

Re: Encode, take two

2000-09-13 Thread Mark Leisher
le answer :-) ----- Mark Leisher. "Cinema, radio, television, magazines are a school of inattention: people look without seeing, listen without hearing." Computing Research Lab, New Mexico State University, Box 30001, Dept. 3CRL, Las Cruces,

Re: Encode, take two

2000-09-12 Thread Mark Leisher
xtraneous.

Re: Encode, take two

2000-09-12 Thread Mark Leisher
Perl is free to burp on us. Quite right. Sorry.

Re: Encode, take three

2000-09-12 Thread Mark Leisher
My only comment would be that the functions which assume 8859-1 should be removed to avoid the inevitable confusion. Or, as someone else suggested earlier, changed to use the active system encoding.

Re: Encode, take three

2000-09-12 Thread Mark Leisher
; (an actual complaint I received more than once).

Re: UCS-2 and UTF-16 [was Re: Encode, take five]

2000-09-14 Thread Mark Leisher
ed, the term UTF-32 will be deprecated and the term UCS-4 will be used instead.

Re: Encode's .enc files and a question

2000-10-25 Thread Mark Leisher
happen as well. Although complicated on the surface, I highly recommend using Tech Report #22 on the Unicode website as a guideline for designing future mapping tables.

.enc docs comments [was Re: Encode's .enc files and a question]

2000-10-26 Thread Mark Leisher
unknown characters in the source text or change the 0x's to 0x.

Re: .enc docs comments [was Re: Encode's .enc files and a question]

2000-10-26 Thread Mark Leisher
ly unrecognized Peter> characters? I don't know. I last used Tcl/Tk in the days of tcl7.?/tk4.? and haven't had time to play with anything newer. I do prefer Perl :-)

Re: Encode's .enc files and a question

2000-10-25 Thread Mark Leisher
Unicode and probably in ISO10646.

Re: Encode's .enc files and a question

2000-10-25 Thread Mark Leisher
e a while now. But like many of us, I've got a handful of critical projects with hard deadlines to meet.

Re: .enc docs comments [was Re: Encode's .enc files and a question]

2000-10-27 Thread Mark Leisher
Philip> On Thu, 26 Oct 2000, Mark Leisher wrote: >> Following the first page will be all the other pages, each in the same >> format as the first: one number identifying the page followed by 256 >> double-byte Unicode (UCS-2) characters. If a char

Csets 1.7 released

2000-11-03 Thread Mark Leisher
/csets.tar.gz ftp://crl.nmsu.edu/CLR/multiling/character-sets/csets.zip

Re: Encode's .enc files and a question

2000-10-26 Thread Mark Leisher
Peter> Mark Leisher then replied: >> If the converted string contains 0x, it will be pretty clear the >> source text had bogus characters the moment you display it. Peter> According to Nick's translated doc the first character on the third Peter

Re: Encode's .enc files and a question

2000-10-26 Thread Mark Leisher
Philip> On Wed, 25 Oct 2000, Mark Leisher wrote: >> There may some day be a use for the Unicode codepoint 0x. It might >> be better to make this 0x, which is a guaranteed non-character in >> Unicode and probably in ISO10646. Philip> Isn'

Armenian encoding tables updated

2000-11-13 Thread Mark Leisher
web page. http://crl.nmsu.edu/~mleisher/csets.html These tables will be part of the CSets 1.8 distribution.

NISO standards now free

2000-11-06 Thread Mark Leisher
ional Standards Institute (ANSI).

RE: Source data for perl encodings

2001-01-08 Thread Mark Leisher
asonable conversion capability at about 1/16 the size of ICU.

Re: ICU's uconv vs Linux iconv and UTF-8

2002-02-01 Thread Mark Leisher
ill get an answer. ----- Mark Leisher, Computing Research Lab, New Mexico State University, Box 30001, Dept. 3CRL. "Orthodoxy, of whatever color, seems to demand a lifeless, imitative style." -- Politics and the English Languag

Re: ICU's uconv vs Linux iconv and UTF-8

2002-02-01 Thread Mark Leisher
abase. August 1, 2001."

Re: ICU's uconv vs Linux iconv and UTF-8

2002-02-01 Thread Mark Leisher
irectly available from ftp://www.unicode.org/Public/3.1-Update1/Unihan-3.1.1.txt.gz.

Re: [Encode] Farsi is Okay. The problem is in Indics!

2002-04-05 Thread Mark Leisher
the general problems that come up in Indic encodings. ----- Mark Leisher. "Television has raised writing to a new low." Computing Research Lab, New Mexico State University, Box 30001, Dept. 3

Re: [Encode] Farsi is Okay. The problem is in Indics!

2002-04-05 Thread Mark Leisher
he Persian Hamshahri visual encoding.

Re: Unicode::Collate 0.23 Released

2002-09-05 Thread Mark Leisher
Tomoyuki> Unicode::Collate 0.23 is released. Could you remind us where to find it again? Thanks!

CSets 1.8 released

2005-05-04 Thread Mark Leisher
relevant information. http://crl.nmsu.edu/~mleisher/csets.html -- Mark Leisher. "All political parties die at last of swallowing their own lies." Computing Research Lab, New Mexico State University, Box 30001, MSC

CSets 1.9 released

2005-07-25 Thread Mark Leisher
these encodings is not likely to be included in the most popular character set conversion tools (i.e. iconv), so this package was put together to ease conversion of text in obscure character encodings to Unicode and for historical curiosity. Mark Leisher

CSets 2.0 released

2005-10-28 Thread Mark Leisher
welcome. -- Mark Leisher. "A sneer is the weapon of the weak." -- James Russell Lowell (1819-1891). Computing Research Lab, New Mexico State University, Box 30001, MSC 3CRL, Las Cruces, NM 88003

CSets 2.1 released

2007-06-14 Thread Mark Leisher
e mappings not typically found in character set conversion tools available today." As always, I am happy to accept mapping tables/conversion program source code for any other obscure or under-represented encodings. -- Mark Leisher

CSets 2.1 released

2008-05-30 Thread Mark Leisher
ns there, and I will always notify these lists of updates as well. As always, corrections, new mapping tables, information about mappings, and even pointers to things like fonts or texts with odd encodings are gladly accepted. -- Mark Leisher