Re: Unicode Public Review Issues update (braille)

2003-10-07 Thread Mark E. Shoulson
Kent Karlsson wrote: The original model for these was that your text processing is done in non-Braille, and on the last leg to a device, you would transcode the regular text to a Braille sequence using a domain and language specific mapping. Having the codes in Unicode allows you to preserve

Re: Cursor movement in Hebrew, was: Non-ascii string processing?

2003-10-09 Thread Mark E. Shoulson
Peter Kirk wrote: On 08/10/2003 21:55, Jungshik Shin wrote: ... I've got a question about the cursor movement and selection in Hebrew text with such a grapheme (made up of 6 Unicode characters). What would be ordinary users' expectation when delete, backspace, and arrow keys (for cursor

Public Review Issue #23

2003-10-09 Thread Mark E. Shoulson
Looking over the Public Review Issues... trying to scramble up the learning curve and make sense of some of what it's talking about... Here's a comment. I think U+05C3 HEBREW PUNCTUATION SOF PASUQ should probably also be in Sentence_Terminal. I suppose it's true that there are Biblical verses

Re: Beyond 17 planes, was: Java char and Unicode 3.0+

2003-10-16 Thread Mark E. Shoulson
Philippe Verdy wrote: Due to that, there's a big risk that PUAs start being permanently assigned as part of an OS core charset, and that data created on distinct systems become mutually incompatible as they are using colliding subsets of PUAs (this is already the case in core fonts and script

Re: Klingons and their allies - Beyond 17 planes

2003-10-17 Thread Mark E. Shoulson
John Cowan wrote: Jill Ramonsky scripsit: It seems a simple enough case to argue - EITHER the 0x11 character space is amply big enough for everyone, as John Cowan asserts. Big enough for everyone, but not for everything. Encoding Klingon has a cost beyond the allocation of
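For readers following the "0x11 planes" arithmetic in this thread, here is a minimal sketch of the size of the code space. It relies only on Python's built-in `chr` and `sys.maxunicode`; the helper name `is_valid_code_point` is made up for illustration.

```python
import sys

PLANES = 0x11          # 17 planes, numbered 0 through 16
PLANE_SIZE = 0x10000   # 65,536 code points per plane
total = PLANES * PLANE_SIZE

print(hex(total))            # size of the whole code space
print(hex(sys.maxunicode))   # highest code point Python will accept


def is_valid_code_point(cp: int) -> bool:
    """Hypothetical helper: True if cp falls inside the 17-plane code space."""
    try:
        chr(cp)
        return True
    except (ValueError, OverflowError):
        return False
```

The last code point of plane 16 is accepted, while the first value past it is rejected, which is the boundary the thread is arguing about.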

Re: Klingons and their allies - Beyond 17 planes

2003-10-17 Thread Mark E. Shoulson
Rick McGowan wrote: Jill Ramonsky wrote... It seems to me that if 0x11 codepoints isn't a big enough space to fit in the Klingon alphabet (and other alphabets which were similarly rejected) then we need more codepoints. Simple as that. Rejection of Klingon has *absolutely* nothing

Re: Klingons and their allies - Beyond 17 planes

2003-10-17 Thread Mark E. Shoulson
That's what I mean. We'd better shut down the list. ~mark Peter Constable wrote: -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Mark E. Shoulson Doesn't *everything* take time from other proposals? You mean, discussions like

Re: Klingon vs. Ogham

2003-10-17 Thread Mark E. Shoulson
Michael Everson wrote: It strikes me that the controversy about Klingon has more to do with its fictional origins than number of users. Is this not true? I don't think so. We will certainly encode Tengwar and Cirth, which have corpora of documents in them. Klingonists universally prefer

Re: Klingons and their allies - Beyond 17 planes

2003-10-17 Thread Mark E. Shoulson
John Cowan wrote: Mark E. Shoulson scripsit: I'm attaching a screenshot of http://www.kli.org/QQ/QQ0202.html?mode=UTF which SHOULD be a Unicode encoding. This is with Mozilla 1.4 and Code2000. Even people who can read pIqaD can't read this. The qapla' page works okay, but note that only

Re: Klingon vs. Ogham

2003-10-17 Thread Mark E. Shoulson
John Hudson wrote: At 11:59 AM 10/17/2003, Mark E. Shoulson wrote: And of course, there's all that discussion on Tolkien languages online... all of which used Latin transliteration (with slightly varying standards too: accented vowels, doubled vowels, tripled sometimes, etc etc. More than

Re: Klingon vs. Ogham

2003-10-19 Thread Mark E. Shoulson
Michael Everson wrote: The example of Shavian might eventually be precedent for Klingon to be encoded, but for the present one web page on the KLI's own web site does not seem to me to be sufficient evidence to meet the usage requirements. Press on, Mark. I plan to. I've been collecting

Re: Merging combining classes, was: New contribution N2676

2003-10-26 Thread Mark E. Shoulson
Jony Rosenne wrote: While the current combining classes may cause some difficulties for Biblical scholars (and this isn't cut and dried yet - it isn't certain whether these are Unicode problems, implementation problems, missing characters or mis-identified characters), I have yet to see a claimed

Re: Merging combining classes, was: New contribution N2676

2003-10-26 Thread Mark E. Shoulson
I remembered there was a lot of discussion about this case, which is why I brought it up. Can someone remind me why ZWNBSP would be Bad for this? Wrong RTL coding? (possibly, but it's weak, isn't it) Wrongly indicates a word-break? (this is probably a problem.) ~mark John Hudson wrote: At

Unicode Filk

2003-10-28 Thread Mark E. Shoulson
Message contents removed due to DMCA takedown letter. See http://www.chillingeffects.org/ for more information about the DMCA.

Re: [hebrew] Re: Hebrew composition model, with cantillation marks

2003-10-30 Thread Mark E. Shoulson
Peter Kirk wrote: On 28/10/2003 18:49, Philippe Verdy wrote: I just finished an Excel spreadsheet that shows the Hebrew composition model, and all the problems caused by the canonical order of Hebrew diacritics. In summary, most problems come from consonant modifiers which have a combining
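The reordering complained about here can be seen with the standard `unicodedata` module. This is only a sketch of canonical ordering on one base letter with two marks; the choice of shin, dagesh and patah is an illustrative assumption and is not taken from the spreadsheet mentioned above.

```python
import unicodedata

SHIN = "\u05E9"     # HEBREW LETTER SHIN
DAGESH = "\u05BC"   # HEBREW POINT DAGESH OR MAPIQ
PATAH = "\u05B7"    # HEBREW POINT PATAH

# Marks in the order a user might type them: consonant modifier first.
typed = SHIN + DAGESH + PATAH
normalized = unicodedata.normalize("NFD", typed)

# Show each mark with its canonical combining class.
for ch in (DAGESH, PATAH):
    print(unicodedata.name(ch), unicodedata.combining(ch))

# Canonical ordering sorts marks by combining class, so the vowel
# (lower class) is moved in front of the dagesh (higher class).
print([unicodedata.name(ch) for ch in normalized])
```

Because the vowel point has the lower combining class, normalization places it before the dagesh, regardless of the order in which the marks were entered.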

Re: Hexadecimal digits?

2003-11-08 Thread Mark E. Shoulson
When I first heard about hexadecimal, I thought that using A-F for digits lacked imagination, and risked confusion with letters besides. I made up a set of digits, as I recall, and even names for them. I'm not completely convinced this is a bad idea. But it's likely. ~mark Michael Everson

Re: Berber/Tifinagh

2003-11-09 Thread Mark E. Shoulson
Philippe Verdy wrote: From: Michael Everson [EMAIL PROTECTED] At 17:54 +0100 2003-11-09, Philippe Verdy wrote: From: Michael Everson [EMAIL PROTECTED] When we encode Tifinagh we will encode Tifinagh. We will not meta-encode it for ease of transliteration to other scripts.

Re: Transliterating font

2003-11-09 Thread Mark E. Shoulson
Chris Jacobs wrote: As long as the font is explicitly advertized as a 'font with built-in transliterator', as long as the people know that what you see is not what is in the text, this seems to me indeed a good idea. Would be nice for Klingon too :-) Got one already. Several, really.

Re: Berber/Tifinagh

2003-11-10 Thread Mark E. Shoulson
Michael Everson wrote: At 10:14 -0800 2003-11-10, Curtis Clark wrote: Why isn't Latin Serbian just Cyrillic Serbian with funny glyphs? Because Latin and Serbian are self-evidently different scripts. I'm not trying to be intentionally dense here; Theban English and Serbian are different in

Re: Berber/Tifinagh (was: Swahili Banthu)

2003-11-10 Thread Mark E. Shoulson
Kenneth Whistler wrote: Philippe Verdy wrote: You seem to forget that Tifinagh is not a unified script, but a set of separate scripts where the same glyphs are used with distinct semantic functions. I think Philippe is running off the rails here. Tifinagh is a script. It comes in a

Re: Ciphers (Was: Berber/Tifinagh)

2003-11-11 Thread Mark E. Shoulson
Doug Ewell wrote: I think such a collection of symbols A becomes a cipher for a true script B when it replicates the usage of symbols in B, irregularities and all. In the Pigpen cipher, there is a symbol for C and one for T and one for H, and C+H and T+H are slapped together *exactly* as they

Re: Hexadecimal digits?

2003-11-11 Thread Mark E. Shoulson
Jill Ramonsky wrote: ...the original issue of _whether or not there should exist Unicode characters for which IsDigit() returns true and for which GetDigitValue() returns values in the range ten to fifteen_. If/when Tengwar gets coded, it will have digits for 10 and 11, as it uses base-12. I
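As a rough illustration of the question, Python's `str.isdigit` and `unicodedata.digit` can stand in for the `IsDigit()` and `GetDigitValue()` functions named above; treating them as equivalents is an assumption made for this sketch only.

```python
import unicodedata

for ch in "9AF":
    # unicodedata.digit returns the default (None here) when the
    # character carries no digit value in the Unicode database.
    print(ch, ch.isdigit(), unicodedata.digit(ch, None))

# Hexadecimal values of A-F come from the parsing convention,
# not from any character property.
print(int("A", 16), int("F", 16))
```

The letters A to F have no digit property of their own; their values of ten to fifteen exist only by the convention that `int(..., 16)` implements.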

Re: Hexadecimal digits?

2003-11-11 Thread Mark E. Shoulson
Doug Ewell wrote: jameskass at att dot net wrote: ... and not one which somehow converted James' UTF-8 into Mojibake as above. This may be the fault of my ISP, the illustrious ATT's Webmail. It may not properly tag my outgoing messages as UTF-8. A colleague has written privately to
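A small sketch of how this kind of mojibake typically arises: UTF-8 bytes are sent, but the receiving side decodes them as a single-byte charset. Latin-1 is assumed here as the wrong charset purely for illustration; the thread does not say which one was actually applied.

```python
text = "caf\u00e9"                 # "café", with U+00E9

wire = text.encode("utf-8")        # bytes actually sent
misread = wire.decode("latin-1")   # receiver assumes the wrong charset

print(misread)                     # the two UTF-8 bytes show up as two characters

# As long as nothing was lost, reversing the wrong step repairs the text.
repaired = misread.encode("latin-1").decode("utf-8")
print(repaired == text)
```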

Re: creating a test font w/ CJKV Extension B characters.

2003-11-20 Thread Mark E. Shoulson
I haven't tested this myself, but from a look at the source code, it appears that pfaedit (pfaedit.sourceforge.net) can generate format12 TTFs. (Open Source, for UNIX). ~mark On 11/20/03 03:12, Arcane Jill wrote: Is anyone able to answer this? I for one would really like to know. Thanks

Re: Compression through normalization

2003-11-24 Thread Mark E. Shoulson
On 11/24/03 01:26, Doug Ewell wrote: So the question becomes: Is it legitimate for a Unicode compression engine -- SCSU, BOCU-1, or other -- to convert text such as Hangul into another (canonically equivalent) normalization form to improve its compressibility? OK, this *is* a fascinating
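To make the question concrete, the sketch below compares one Hangul syllable in its precomposed and decomposed forms using `unicodedata.normalize`. The sample syllable is an arbitrary choice, and UTF-8 byte counts are used only as a crude stand-in for what a real compressor such as SCSU or BOCU-1 would see.

```python
import unicodedata

syllable = "\uD55C"                           # one precomposed Hangul syllable
jamo = unicodedata.normalize("NFD", syllable)  # conjoining jamo sequence

print(len(syllable), len(syllable.encode("utf-8")))
print(len(jamo), len(jamo.encode("utf-8")))

# The two forms are canonically equivalent: normalizing either one
# to the same form yields identical strings.
print(unicodedata.normalize("NFC", jamo) == syllable)
```

A decompressor that returned either form would hand back canonically equivalent text, but not the same code point sequence, which is exactly the legitimacy question being asked.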

Re: Unicode 4.0 Poster

2003-11-24 Thread Mark E. Shoulson
Neat. Not only did the version of Opera I happened to have fail to open it, but my whole X server crashed. (Guess someone has a fragile setup). ~mark, in a fresh session On 11/24/03 21:53, Mark Davis wrote: I remembered that I had done something with making a Unicode Poster some time ago.

Re: Korean compression (was: Re: Ternary search trees for Unicode dictionaries)

2003-11-24 Thread Mark E. Shoulson
On 11/24/03 20:56, Christopher John Fynn wrote: Peter Kirk [EMAIL PROTECTED] wrote: This approach would certainly have simplified pointed Hebrew a lot, so much so that it could well be serious. After all, Ethiopic was encoded as a syllabary just because the vowel points happen to have become

Re: The Chart

2003-11-25 Thread Mark E. Shoulson
I tried the chart where I teach, on RedHat Linux 9 and Mozilla 1.2 or 1.4 (I forget which) and it came through fine, if small. ~mark On 11/25/03 15:05, John Cowan wrote: Mozilla Firebird 0.7/WinXP had no problem with the Chart, though it was a little slow to open and even slower to print it.

Re: MS Windows and Unicode 4.0 ?

2003-12-01 Thread Mark E. Shoulson
Shouldn't it permit assa and aßa to co-exist? It isn't like ß is canonically equivalent to ss (if I read the file aright, it isn't even compatibility equivalent). It's a language-dependent choice to regard them as equivalent. I'd guess that should be the responsibility of the de_DE
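The distinction drawn here can be checked with Python's `unicodedata` module and `str.casefold`. This is a sketch of the Unicode properties only; it says nothing about what any particular filesystem does.

```python
import unicodedata

a, b = "assa", "a\u00DFa"   # "assa" and "aßa"

# ß has no decomposition mapping, canonical or compatibility.
print(repr(unicodedata.decomposition("\u00DF")))
print(unicodedata.normalize("NFC", b) == a)
print(unicodedata.normalize("NFKC", b) == a)

# Simple lowercasing keeps the two names apart, while full case folding does not.
print(a.lower() == b.lower())
print(a.casefold() == b.casefold())
```

Normalization alone never merges the two names; they collide only once full case folding is applied.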

Re: MS Windows and Unicode 4.0 ?

2003-12-01 Thread Mark E. Shoulson
On 12/01/03 09:57, Arcane Jill wrote: I believe that A is not canonically equivalent to a, but you still can't have filenames A and a coexisting in the same Windows folder. This is a consequence of having a case-insensitive filesystem. As to whether or not the case-equivalence of ss and ß

Re: MS Windows and Unicode 4.0 ?

2003-12-01 Thread Mark E. Shoulson
On 12/01/03 11:46, Mark Davis wrote: It is useful to read the standard before asserting something about it. If you don't have a hard-copy of the standard, you can always consult the online version. In this case, see 3.13 Default Case Operations in

Re: MS Windows and Unicode 4.0 ?

2003-12-02 Thread Mark E. Shoulson
On 12/02/03 18:32, Philippe Verdy wrote: One way to achieve this is to only allow embedding of embeddable fonts within unmodifiable documents. This means an export for publication function in word processors, which should be the only way to create first an unmodifiable and signed document content,

Re: MS Windows and Unicode 4.0 ?

2003-12-02 Thread Mark E. Shoulson
I particularly like the use of U+E631 SEUSS LETTER WUM for the PUA. ~mark On 12/02/03 14:03, Michael Everson wrote: At 10:35 -0800 2003-12-02, John Hudson wrote: Have you looked at the Apple Last Resort font? Knowing from what character block an unsupported character comes is handy, but I

Re: Glottal stops (bis) (was RE: Missing African Latin letters (bis))

2003-12-06 Thread Mark E. Shoulson
On 12/05/03 21:00, Michael Everson wrote: At 17:39 -0800 2003-12-05, Kenneth Whistler wrote: Peter, For those situations in which unmarked-case glottal has been used, I think it would cause the least confusion to leave 0294 as a cap-height glyph, and call it upper case. I don't have time

Re: Fwd: Re: Transcoding Tamil in the presence of markup

2003-12-07 Thread Mark E. Shoulson
On 12/07/03 07:25, Peter Jacobi wrote: [..] Display engines need to do a better job of applying style to individual reordrant glyphs, that's all. I fully agree with this, Do you know any display engine which is capable of this? I also agree, but I point out that the sufficiently perverse

Re: Fwd: Re: Transcoding Tamil in the presence of markup

2003-12-07 Thread Mark E. Shoulson
On 12/07/03 08:55, Peter Jacobi wrote: Hi Mark, All, I also agree, but I point out that the sufficiently perverse could come up with some pretty tough examples. Applying color is a pretty benign style, but what if I wanted a boldface circumflex on a normal letter? Or even more obnoxious,

Re: New symbols (was Qumran Greek)

2003-12-08 Thread Mark E. Shoulson
On 12/08/03 18:21, Kenneth Whistler wrote: And not complete. That is simply the draft for the PDAM (preliminary draft amendment) to 10646. It will be subject to national ballot comments, which will, no doubt, result in further additions, as well as some minor modifications to what is currently

Re: Fwd: Re: Transcoding Tamil in the presence of markup

2003-12-08 Thread Mark E. Shoulson
On 12/08/03 14:16, Peter Constable wrote: From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Mark E. Shoulson (and now I contradict myself with a counterexample. In http://omega.enstb.org/yannis/pdf/biblical-hebrew94.pdf, Yannis Haralambous notes--correctly--that when

Re: Glottal stops (bis) (was RE: Missing African Latin letters (bis))

2003-12-09 Thread Mark E. Shoulson
On 12/09/03 02:26, Peter Constable wrote: From: [EMAIL PROTECTED] on behalf of Kenneth Whistler Nobody is agitating for an uppercase apostrophe. Not in Canada, that I know of. (I've seen indication of languages in Russia that have a case distinction for ' and possible also .) Early

Re: Swastika to be banned by Microsoft?

2003-12-14 Thread Mark E. Shoulson
On 12/14/03 07:26, Michael Everson wrote: The following story was forwarded to me. The offending characters in question are, I take it, the left-facing and right-facing swastika symbols, often used in Tibetan, found among the Chinese ideographs at U+534D (yung-drung-chi-khor) and U+5350

[Fwd: Re: Swastika to be banned by Microsoft?]

2003-12-14 Thread Mark E. Shoulson
? Steven Heller, ISBN 1-588115-041-5 ~mark On 12/14/03 11:11, Elliotte Rusty Harold wrote: At 10:28 AM -0500 12/14/03, Mark E. Shoulson wrote: I'm embarrassed to admit it, but I find myself thinking that the swastika, THE Nazi swastika, right-facing, tilted 45, proper ratio of stroke-thickness

Re: [Fwd: Re: Swastika to be banned by Microsoft?]

2003-12-15 Thread Mark E. Shoulson
On 12/15/03 08:42, [EMAIL PROTECTED] wrote: Holocaust scholars wanting to encode German documents from the 1930s and 1940s would want the double runic S encoded, since this was a specific character found on type-writers of the era and saw regular use. Would U+16CB U+16CB be a reasonable

Re: [Fwd: Re: Swastika to be banned by Microsoft?]

2003-12-15 Thread Mark E. Shoulson
On 12/15/03 07:54, Tom Emerson wrote: Holocaust scholars wanting to encode German documents from the 1930s and 1940s would want the double runic S encoded, since this was a specific character found on type-writers of the era and saw regular use. A proposal to encode this was shot down a few years

Re: [Fwd: Re: Swastika to be banned by Microsoft?]

2003-12-15 Thread Mark E. Shoulson
On 12/15/03 09:43, Mark E. Shoulson wrote: Is this like baseball scoreboards showing the third consecutive strikeout symbol (which is a K) reversed? Is that to avoid KKK or is it for another reason? Which of course begs the question of whether we should encode a LATIN CAPITAL REVERSED K

Re: [OT reversing letters to avoid offence] Re: [Fwd: Re: Swastika to be banned by Microsoft?]

2003-12-15 Thread Mark E. Shoulson
On 12/15/03 11:28, [EMAIL PROTECTED] wrote: With the runes though it isn't just double sigels that have the second mirrored, but all double letters. FWIW not only are the sources I learnt this from not reliable on the history of the Futhark, being concerned only with the modern occult use, but

Re: Swastika to be banned by Microsoft?

2003-12-15 Thread Mark E. Shoulson
On 12/15/03 12:10, Michael Everson wrote: I am not certain that the existing code position is satisfactory for non-CJK use. That is, Tibetan, Norse, Native American, Scouting use, and so on. Those NEVER show Han brush-stroke shapes. I would like to see some discussion about whether the

Re: Case mapping of dotless lowercase letters

2003-12-18 Thread Mark E. Shoulson
On 12/18/03 06:54, Peter Kirk wrote: You will find that the spellings mill (19,600 Google matches) and milli (709,000 matches, but not all are Turkish) are interchangeable, but mill is rare (52 matches) and so probably an error. Wouldn't ?mill be a violation of Turkish vowel-harmony rules?
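For context on the case-mapping side of this thread, the sketch below shows what Python's default (language-neutral) case mappings do with the dotted and dotless i. The `turkish_upper` and `turkish_lower` helpers are hypothetical names, written here only to show the language-specific tailoring; they are not part of any library.

```python
DOTLESS_I = "\u0131"    # LATIN SMALL LETTER DOTLESS I
DOTTED_CAP = "\u0130"   # LATIN CAPITAL LETTER I WITH DOT ABOVE

# Default mappings know nothing about Turkish.
print("I".lower() == "i")
print(DOTLESS_I.upper() == "I")
print(DOTTED_CAP.lower() == "i\u0307")   # i followed by COMBINING DOT ABOVE


def turkish_upper(s: str) -> str:
    """Hypothetical helper: uppercase with the Turkish i/I pairing."""
    return s.replace("i", DOTTED_CAP).replace(DOTLESS_I, "I").upper()


def turkish_lower(s: str) -> str:
    """Hypothetical helper: lowercase with the Turkish i/I pairing."""
    return s.replace(DOTTED_CAP, "i").replace("I", DOTLESS_I).lower()
```

With the default mappings a round trip through uppercase loses the dotted/dotless distinction, which is why Turkish text needs the tailored pairing.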

Re: [OT] Keyboards (was: American English translation of character names)

2003-12-19 Thread Mark E. Shoulson
On 12/19/03 03:05, Arcane Jill wrote: Another minor US/UK difference is that shift 2 is double quotes in England, not @. I remember my old TRS-80 had double-quotes on shift-2 as well. I half-remember that it had something to do with the bit-patterns, so the shift key could work by applying a
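The bit-pattern recollection can be illustrated with plain ASCII arithmetic. The sketch assumes the usual description of bit-paired keyboards, in which shift flips one bit of the code; the masks 0x10 and 0x20 are that assumption, not something stated in the message.

```python
# On the digit row, flipping bit 0x10 turns a digit into a punctuation mark.
for digit in "123":
    print(digit, "->", chr(ord(digit) ^ 0x10))

# For letters, flipping bit 0x20 switches between lower and upper case.
print("a", "->", chr(ord("a") ^ 0x20))
```

Under that scheme shift-2 lands on the double quote rather than @, since `2` is 0x32 and `"` is 0x22.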

Re: Aramaic unification and information retrieval

2003-12-24 Thread Mark E. Shoulson
On 12/23/03 19:40, Philippe Verdy wrote: Could you instead take the time to work on the missing Latin letters for African languages? Why isn't there any serious work about these living languages that don't have a lot of university support and nearly no computer resources in Africa to make this

Re: why Aramaic now

2003-12-24 Thread Mark E. Shoulson
On 12/24/03 15:02, Elaine Keown wrote: Some of the sets of symbols I found---which I simply assumed could be added to Hebrew--are innately controversial because of the Roadmap. What sorts of things do you mean, Elaine? Innately controversial sounds like a pretty strong term, and while I

Re: why Aramaic now

2003-12-25 Thread Mark E. Shoulson
On 12/25/03 16:46, Elaine Keown wrote: Elaine Keown Dear Mark and List: Some of the sets of symbols I found--- snip --are innately controversial because of the Roadmap. Examples of innately controversial for Mark: I think Hebrew's been written since 1,150 B.C. But at

Re: Ancient Northwest Semitic Script (was Re: why Aramaic now)

2003-12-26 Thread Mark E. Shoulson
On 12/26/03 09:57, Michael Everson wrote: Every historian of writing describes the various scripts *as* scripts, and recognizes them differently. We have bilinguals where people are distinguishing the scripts in text; we have discussion, for instance in the Babylonian Talmud, specifically

Re: Aramaic unification and information retrieval

2003-12-27 Thread Mark E. Shoulson
On 12/26/03 15:27, Michael Everson wrote: At 17:46 + 2003-12-26, Christopher John Fynn wrote: (Though the Roman style and Fraktur style of Latin script are probably more different from each other than some of the separately encoded Indic scripts [e.g. Kannada / Telugu]) Sorry, Chris, this is

Re: [hebrew] Re: Ancient Northwest Semitic Script

2003-12-28 Thread Mark E. Shoulson
On 12/28/03 18:34, Peter Kirk wrote: It is very interesting to me that there does seem to have been a glyph distinction (though a very subtle one) between sin and shin, in the serech example (http://orion.mscc.huji.ac.il/orion/programs/Altman/serech.jpg) of what is undoubtedly (in Unicode

Re: unicode Digest V4 #3

2004-01-05 Thread Mark E. Shoulson
On 01/05/04 08:04, Philippe Verdy wrote: Regarding dotless-i-with hook... and case mappings with each other. Both solutions maintains the distinction with Latin oi (gha) and with the latin soft sign (small b). Can we leave OI/gha out of this? Near as I can tell the *only* relevance it has to

Re: Detecting encoding in Plain text

2004-01-13 Thread Mark E. Shoulson
On 01/13/04 05:40, Marco Cimarosti wrote: Peter Kirk wrote: This one also looks dangerous. What do you mean by dangerous? This is a heuristic algorithm, so it is not supposed to work always, but only in some lucky cases. If lucky cases average to, say, 20% or less then it is a bad and

Re: Cuneiform - Dynamic vs. Static

2004-01-14 Thread Mark E. Shoulson
I had a problem with this too, for a while (previous discussion on this list helped clear it up). Klingon letters had been placed in the PUA by the CSUR (ConsScript Unicode Registry, an unofficial allocation of PUA space to constructed alphabets), based on the PUA assignment of the Linux

Re: Klingon

2004-01-15 Thread Mark E. Shoulson
No, not because the font uses the PUA. Non-conformant because the font does *NOT* use the PUA. Lawrence Schoen (who made the font) put the Klingon letters as used for tlhIngan-Hol on the uppercase latin letters (with some modifications to deal with the digraph and trigraph letters, and the

Re: ConScript

2004-01-15 Thread Mark E. Shoulson
On 01/15/04 10:43, Doug Ewell wrote: I'm hoping that future ConScript assignments to the BMP PUA continue to start at the low end and work their way up, as almost all do already, instead of starting from the top like Klingon and Aiha. I do understand the assignment of Klingon from U+F8D0 to
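A quick way to see where U+F8D0 sits relative to the BMP Private Use Area is sketched below. The range constants are the standard BMP PUA bounds (U+E000 to U+F8FF); `is_bmp_pua` is a hypothetical helper name.

```python
import unicodedata

BMP_PUA_FIRST = 0xE000
BMP_PUA_LAST = 0xF8FF


def is_bmp_pua(cp: int) -> bool:
    """Hypothetical helper: True if cp is in the BMP Private Use Area."""
    return BMP_PUA_FIRST <= cp <= BMP_PUA_LAST


klingon_start = 0xF8D0
print(is_bmp_pua(klingon_start))
print(unicodedata.category(chr(klingon_start)))    # general category of a PUA code point
print(unicodedata.name(chr(klingon_start), None))  # PUA code points carry no name
print(BMP_PUA_LAST - klingon_start + 1)            # code points from U+F8D0 to the end of the area
```

The Unicode database itself only records that these code points are private use; any meaning such as a ConScript assignment lives entirely outside the standard.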

Re: Klingon

2004-01-15 Thread Mark E. Shoulson
On 01/15/04 12:27, Philippe Verdy wrote: From: [EMAIL PROTECTED] Michael Everson scripsit: At 14:53 +0100 2004-01-15, Chris Jacobs wrote: WHY THEN DISTRIBUTES THE KLI SUCH A BLATANTLY UNCONFORMANT FONT? yIjachQo'. vItlhob. {{{:-) Demonstrating once again that the

Re: Samaritan shan symbol

2004-01-16 Thread Mark E. Shoulson
On 01/16/04 07:33, Peter Kirk wrote: Michael, you seem to have written shan rather than shin twice independently in the subject line, so presumably this is not a typo. Do you actually hold that the letter is called shan rather than shin? Do you have any evidence for this? Are you basing this

Latin Theta?

2004-01-28 Thread Mark E. Shoulson
I was playing around with making my very own IPA keyboard, and I discovered to my surprise that Unicode has no Latin Small Theta (for IPA). We have LATIN SMALL LETTER ALPHA (U+0251), LATIN SMALL LETTER GAMMA (U+0263), LATIN SMALL LETTER EPSILON (U+025B, though that's its old name), LATIN
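The letters listed here can be looked up by name with `unicodedata`, which avoids hard-coding code points. This is a sketch only: `has_name` is a hypothetical helper, and what it reports for any given name depends on the Unicode version bundled with the Python in use.

```python
import unicodedata


def has_name(name: str) -> bool:
    """Hypothetical helper: True if the bundled Unicode data defines this name."""
    try:
        unicodedata.lookup(name)
        return True
    except KeyError:
        return False


for name in ("LATIN SMALL LETTER ALPHA",
             "LATIN SMALL LETTER GAMMA",
             "LATIN SMALL LETTER OPEN E",   # current name of the IPA epsilon
             "GREEK SMALL LETTER THETA"):
    ch = unicodedata.lookup(name)
    print(f"U+{ord(ch):04X}", name)

# Whether a Latin theta exists can be checked the same way.
print(has_name("LATIN SMALL LETTER THETA"))
```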

Re: Latin Theta?

2004-01-28 Thread Mark E. Shoulson
Oh yeah, and Chi also. ~mark Mark E. Shoulson wrote: I was playing around with making my very own IPA keyboard, and I discovered to my surprise that Unicode has no Latin Small Theta (for IPA). We have LATIN SMALL LETTER ALPHA (U+0251), LATIN SMALL LETTER GAMMA (U+0263), LATIN SMALL LETTER

Re: in the NEW YORK TIMES today, report of a USA patent for a method to make the Arabic language easier to read/write/typeset

2004-03-15 Thread Mark E. Shoulson
And see http://www.arabetics.com/ for the official site. (Me, I think it's a cool idea, but I'm notorious for being fascinated by shiny new things.) ~mark Mike Ayers wrote: From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of Michael Everson Sent: Monday, March 15, 2004 9:40 AM In

Re: Irish dotless I

2004-03-16 Thread Mark E. Shoulson
Peter Kirk wrote: On 16/03/2004 07:35, Carl W. Brown wrote: ... I suspect that just changing the font to eliminate the dot will be easier. Software won't have to be changed, existing code pages will not have to be changed, searches will work, etc. It has the disadvantage of making these

Re: [OT] Freedom and organization (was RE: help needed with adding new character)

2004-03-19 Thread Mark E. Shoulson
(perhaps ever so slightly closer to on-topic)... Marco Cimarosti wrote: Kenneth Whistler wrote: Why is an Anarchist asking to standardize something? Why not!? Can you elaborate on this? Myself, I am an anarchist sympathizer, and I have been deeply interested in a character encoding

Re: [Slightly OT] Font examiner program/utility?

2004-03-25 Thread Mark E. Shoulson
Mike Ayers wrote: Does anyone know of a good program for examining fonts? What I am looking for is some way to, given a font, find out both the glyphs contained and the code points (bad term?) at which those glyphs are situated. Ability to read hinting/shaping tables a bonus.

Re: What is the principle?

2004-03-31 Thread Mark E. Shoulson
[Original Message] From: Kenneth Whistler [EMAIL PROTECTED] To: [EMAIL PROTECTED] Scenario: The UTC listens to you and defines some section of the PUA as strong right-to-left by default for use in PUA-defined bidirectional scripts. Somebody else is *already* using that section of the PUA

Re: Doing Markup in Plain Text: A Modest Proposal for Planes 4-B of Unicode

2004-03-31 Thread Mark E. Shoulson
[EMAIL PROTECTED] wrote: XML has become the de facto standard for fancy text. It is therefore useful to explore ways and means of bringing XML into plain text, since obviously plain text is simpler than, and superior to, fancy text. The current method involving < and > and & and / and who knows

Re: Doing Markup in Plain Text: A Modest Proposal for Planes 4-B of Unicode

2004-04-01 Thread Mark E. Shoulson
, being careful not to replace the contents of the HTML tags. The things you have to go through... ~mark John Cowan wrote: Mark E. Shoulson scripsit: Heh... I've occasionally caught myself almost wishing for this kind of setup, ridiculous though it be. It would be nice to be able to get just

Re: New Currency sign in Unicode

2004-04-01 Thread Mark E. Shoulson
Jim Allan wrote: Peter Kirk wrote: I wonder if Kyekyeku is finding it rather offensive that all we westerners are claiming to know better than he does what the cedi sign looks like. He says it is different from a cent sign. Let's stop speculating that he might be wrong and wait for him to

Re: U+0140

2004-04-15 Thread Mark E. Shoulson
Kenneth Whistler wrote: Philippe opined: If there's something really missing for Catalan, it's a middle-dot letter with general category Lo, and combining class 0 (i.e. NOT combining). The one thing for sure is that the Unicode Standard does not need to encode more middle dots:

Re: GB18030 and super font

2004-04-22 Thread Mark E. Shoulson
Raymond Mercier wrote: I am intrigued by GB18030 encoding. There is a table of equivalences in http://oss.software.ibm.com/cvs/icu/~checkout~/charset/data/xml/gb-18030-2000.xml No doubt Unihan will at some stage include these 2 and 4 byte values. I enquired about the 'super font' created by a
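The 2-byte and 4-byte forms mentioned here can be seen with Python's built-in `gb18030` codec. This is a sketch using arbitrarily chosen sample characters; it does not consult the ICU mapping table linked above.

```python
samples = {
    "ASCII letter": "a",
    "common ideograph": "\u4E2D",
    "Extension B ideograph": "\U00020000",
}

for label, ch in samples.items():
    encoded = ch.encode("gb18030")
    # Every Unicode scalar value has a GB18030 form, so decoding round-trips.
    assert encoded.decode("gb18030") == ch
    print(f"{label}: U+{ord(ch):04X} -> {encoded.hex()} ({len(encoded)} bytes)")
```

Characters outside the BMP, such as those in CJK Extension B, always take the 4-byte form.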

Re: interleaved ordering (was RE: Phoenician)

2004-05-13 Thread Mark E. Shoulson
D. Starner wrote: If the input is in multiple (Indic) scripts, and let's assume that the audience (which may be a single person just asking for an sorted list of his/her files) can read the Indic scripts used, it may be helpful to interleave. (But I will not push this.) Now let's assume

Re: Coptic/Greek (Re: Phoenician)

2004-05-12 Thread Mark E. Shoulson
E. Keown wrote: Elaine Keown Tucson Dear Kenneth Whistler: down, but even for this, the edge cases result in irreconcilable arguments: is Etruscan left-to-right or right-to-left or both? A lot of the really early Greek (on the true edge between Phoenician and Greek) seems

Re: Phoenician

2004-05-11 Thread Mark E. Shoulson
Peter Kirk wrote: But have the others agreed with his judgments because they are convinced of their correctness? Or is it more that the others have trusted the judgments of the one they consider to be an expert, and have either not dared to stand up to him or have simply been unqualified to do

Re: Script vs Writing System

2004-05-13 Thread Mark E. Shoulson
Peter Constable wrote: In addition, traditional Chinese zither notation (qin pu) is also laid out in ideographic-like square blocks. However, as this is a notational system rather than a script, the constituent elements of each block represent string, finger and plucking

Re: Script vs Writing System

2004-05-13 Thread Mark E. Shoulson
Peter Constable wrote: Peter Constable wrote: I was already after the first paragraph going to mention another writing system, and I'm even more strongly reminded of it by this second paragraph: Sign Writing... And there's also Visible Speech, by Alexander Melville

Re: interleaved ordering (was RE: Phoenician)

2004-05-13 Thread Mark E. Shoulson
[EMAIL PROTECTED] wrote: Dean A. Snyder asks, Why make something we do all the time more difficult and non-standard, when what we do now works very well? Please, one thing to remember about default collation is that it's default. It's only there when no other instructions exist.

Re: Phoenician

2004-05-07 Thread Mark E. Shoulson
Dean Snyder wrote: This is ALL I am trying to do here - just presenting some perspectives that may not be apparent to non-specialists, in the hopes it will make for a better informed decision. Good enough. But didn't we also hear from some specialists who say they *do* need the

Re: Phoenician

2004-05-07 Thread Mark E. Shoulson
Dean Snyder wrote: Mark E. Shoulson wrote at 9:42 AM on Friday, May 7, 2004: Dean Snyder wrote: We need EXPLICIT reasons to justify a new encoding. Just saying that somebody wants it in XML because their font won't show up is insufficient justification, especially when the repercussions

Re: Hooks and Curls and Bars, oh my (was: New contribution)

2004-05-07 Thread Mark E. Shoulson
Ernest Cline wrote: I never said IPA wasn't useful, I just think it would have been better if it had been defined as separate script and when an IPA symbol turned into a cased Latin letter pair, to have added two letters instead of one. Viva Visible Speech! (We're working on the proposal...)

Re: Phoenician

2004-05-07 Thread Mark E. Shoulson
John Hudson wrote: Mark E. Shoulson wrote: Obviously one can find experts on both sides of this debate. Experts that need something should not be told You can't have it because we have other experts who don't like it. Indeed not, but they might actually want to be told 'Other experts have

Re: Phoenician

2004-05-07 Thread Mark E. Shoulson
Oh, this is ridiculous. They're the same script. It's shown they're not. Scholars don't want it. It's shown they do. Then ask more scholars. That way lieth madness; you can always say the *next* people we talk to will *really* put us in our place... There's always some further problem

Re: Proposal to add QAMATS QATAN to the BMP of the UCS

2004-05-06 Thread Mark E. Shoulson
Peter Constable wrote: From: Mark E. Shoulson [mailto:[EMAIL PROTECTED] Actually, no: some accents go on unstressed syllables. For example, a dehi could coexist with a qamats-qatan. Psalms 4:2 has a qamats-qatan on the same letter as GERESH MUQDAM, as do others. Psalms 9:14 has one

Re: Proposal to add QAMATS QATAN to the BMP of the UCS

2004-05-06 Thread Mark E. Shoulson
Simon Montagu wrote: Mark E. Shoulson wrote: However, since qamats-qatans only occur in unstressed syllables, such a thing would be rare. Actually, no: some accents go on unstressed syllables. For example, a dehi could coexist with a qamats-qatan. Psalms 4:2 has a qamats-qatan on the same

Re: Phoenician

2004-05-06 Thread Mark E. Shoulson
Dean Snyder wrote: The issue is not whether this particular proposal represents Phoenician script adequately, it does; the real issue is whether Phoenician should be separately encoded at all. I thought we had pretty much thrashed this one out by now. We've demonstrated that users of Hebrew

Re: New contribution

2004-05-06 Thread Mark E. Shoulson
Jony Rosenne wrote: Cursive Hebrew, Rashi and Square Hebrew are only font variations and should not be separately encoded. Definitely. If you tried my experiment with examples from these or other Hebrew fonts, people would have no trouble reading them. Even Rashi script with

Re: New contribution

2004-05-06 Thread Mark E. Shoulson
Peter Kirk wrote: OK, maybe not such a good example. So let's go back to Suetterlin. I would expect a much higher rate of recognition among German users of normal Latin script than among American users of normal Latin script. So a test of recognition in America might seem to indicate that

Re: New contribution

2004-05-06 Thread Mark E. Shoulson
E. Keown wrote: What Semitists do varies -- within a Ph.D. class, where they are teaching students to recognize many older variant glyphs, they may give many handouts with sets of glyphs... Within publications, which are not for specialists in early Canaanite, they do usually use square

Re: New contribution

2004-05-06 Thread Mark E. Shoulson
Playing hide and seek on the graveyards wrote: Are those mere Italian pounds or Israeli pounds of 100 agora? For the value of the agora see 1 Sam. 2:36 Israel stopped using Israeli pounds in 1980. (well, they started using Sheqels then; pounds (lira) were still legal tender until 1984.) ~mark

Re: Proposal to add QAMATS QATAN to the BMP of the UCS

2004-05-06 Thread Mark E. Shoulson
Peter Constable wrote: Yeah, whatever. Just make sure nobody is going to come along later and say, We've discovered we need to distinguish two orderings for qamats qatan and athnah (or tipha, tevir, munah, mahapakh, merkha, merkha kefula, darga or yerah ben yomo). (Of course, if they do, they can

Re: Interleaved collation of related scripts

2004-05-14 Thread Mark E. Shoulson
Anto'nio Martins-Tuva'lkin wrote: Peter Kirk wrote: PS Multi-language bibliographies are common in Russian books. They are usually printed with the Latin script entries following the Cyrillic script ones, but I have seen interleaved ones. Check also

Re: Multiple Directions (was: Re: Coptic/Greek (Re: Phoenician))

2004-05-14 Thread Mark E. Shoulson
E. Keown wrote: Elaine Keown Tucson Dear Peter, *plain text* standard is the bidirectional algorithm, which sorts out how a (horizontal) *line* of text is laid out when text of opposite directions In the 'old' Unicode 3.0 there was a one-line note on doing boustrophedon

Re: Multiple Directions (was: Re: Coptic/Greek (Re: Phoenician))

2004-05-14 Thread Mark E. Shoulson
Philippe Verdy wrote: Mark wrote: to put the various marks. The bidi algorithm is enough of a headache as it stands, just trying to deal with RTL and LTR scripts and their possible coexistence on a single line. Boustrophedon is far too complex for it. May be not. [...example deleted...]
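For readers unfamiliar with what the bidi algorithm has to work from, here is a minimal sketch (not part of the original thread) that prints the bidirectional class Unicode assigns to each character of a mixed LTR/RTL line. It uses Python's standard `unicodedata` module; the helper name `bidi_classes` and the sample string are purely illustrative. It does not implement the bidi algorithm itself, only the per-character input the algorithm resolves.

```python
import unicodedata


def bidi_classes(text):
    """Return the Unicode bidirectional class of each character in text."""
    return [unicodedata.bidirectional(ch) for ch in text]


# Latin letters, a space, two Hebrew letters (alef, bet), a space, two digits.
mixed = "ab \u05d0\u05d1 12"

for ch, cls in zip(mixed, bidi_classes(mixed)):
    print(f"U+{ord(ch):04X} {cls}")
```

Even this short line mixes strong left-to-right (L), strong right-to-left (R), whitespace (WS) and European-number (EN) classes, all of which have to be resolved into one visual order for a single horizontal line.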

Re: Archaic-Greek/Palaeo-Hebrew (was, interleaved ordering; was, Phoenician)

2004-05-14 Thread Mark E. Shoulson
Dean Snyder wrote: My question is, do you really care what ANYBODY says about encoding or not encoding Phoenician, or has your mind been made up for 10 years and nothing can change it now? But they *DID* listen to what people had to say about it. Some said one thing, some said the other. A

Re: New contribution

2004-05-04 Thread Mark E. Shoulson
Peter Kirk wrote: On 03/05/2004 19:03, Michael Everson wrote: Wedding invitations are routinely set in Blackletter and Gaelic typefaces. I bet you 20 that if an ordinary Hebrew speaker sent out a wedding invitation in Palaeo-Hebrew no one would turn up on the day. And I bet you 20 that is an

Re: Drumming them out

2004-05-04 Thread Mark E. Shoulson
Michael Everson wrote: At 10:00 -0700 2004-05-04, Peter Kirk wrote: Out of interest, are there any dictionaries e.g. of the Phoenician language which use both Phoenician and Hebrew script, with a plain text distinction? James Kass presented a non-dictionary text the other day. I considered it

Re: New contribution

2004-05-04 Thread Mark E. Shoulson
Peter Kirk wrote: On 03/05/2004 05:19, Michael Everson wrote: ... Germans who don't read Sütterlin recognize it as what it is -- a hard-to-read way that everyone used to write German not so long ago. And modern Hebrews recognise paleo-Hebrew as a now hard-to-read way that everyone used to write

Re: New contribution

2004-05-04 Thread Mark E. Shoulson
D. Starner wrote: Hebrew has the same 22 characters, with the same character properties. But Hebrew has 27 letters. Five appear in 2 forms which are recognized both by the users and by Unicode as distinct. ~mark
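The 22-versus-27 count can be checked against the Unicode character database. The following is a small sketch (not part of the original message) using Python's standard `unicodedata` module. It assumes that "letters" means the basic block U+05D0 through U+05EA and that the five final forms are the ones whose character names contain "FINAL"; the variable names are illustrative only.

```python
import unicodedata

# Basic Hebrew letters occupy U+05D0 (ALEF) through U+05EA (TAV).
letters = [chr(cp) for cp in range(0x05D0, 0x05EA + 1)]

# The separately encoded final forms carry "FINAL" in their character names.
finals = [ch for ch in letters if "FINAL" in unicodedata.name(ch)]

print(len(letters), "encoded letters")
print(len(finals), "final forms")
print(len(letters) - len(finals), "letters once finals are folded in")
for ch in finals:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

On this reading, both figures in the exchange are right: 22 letters if final forms are counted with their base letters, 27 if each encoded character is counted separately.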
