Egyptian Demotic

2014-01-22 Thread Stephan Stiller
Hi all, Is Egyptian Demotic on somebody's roadmap for Unicode? (Egyptian Demotic is what's on the middle third of the Rosetta Stone.) Stephan

Re: Representation of neutral tone in pinyin and bopomofo

2013-11-23 Thread Stephan Stiller
[CR:] in now exotic styles where the letters /ĉ, ŝ, ẑ, ŋ/ were used as well Interesting. ẑ, ĉ, ŝ (but not ŋ) have been part of most pinyin descriptions at the end of dictionaries; ẑ, ĉ, ŝ are still listed in Xiàndài Hànyǔ Cídiǎn's 6th edition. But de facto no one uses them, and I'd regard them

Re: Representation of neutral tone in pinyin and bopomofo

2013-11-22 Thread Stephan Stiller
Hi Eric, [We met at the UTC meeting.] I. Is it correct that: in bopomofo, the neutral (or light) tone is represented by U+02D9 ˙ DOT ABOVE, and in the text representation, that character follows the bopomofo characters of the syllable (just like all the other characters for tones) 1.

Re: letters that complete the rectangle in Indic scripts

2013-09-20 Thread Stephan Stiller
On 9/19/2013 2:35 AM, Stephan Stiller wrote: As far as I am aware, a proper 'null consonant' has only arisen when it actually represents a glottal stop. There's ㅇ in hangeul (Hangul; Korean). Hebrew ע was supposedly first pharyngeal [ʕ], though it's nowadays standardly a glottal stop [ʔ

Re: letters that complete the rectangle in Indic scripts

2013-09-19 Thread Stephan Stiller
As far as I am aware, a proper 'null consonant' has only arisen when it actually represents a glottal stop. There's ㅇ in hangeul (Hangul; Korean). Hebrew ע was supposedly first pharyngeal [ʕ], though it's nowadays standardly a glottal stop [ʔ] or null ∅ (and you don't even need a hiatus

Re: Code point vs. scalar value

2013-09-18 Thread Stephan Stiller
On 9/17/2013 10:54 PM, Asmus Freytag wrote: On 9/17/2013 8:40 PM, Philippe Verdy wrote: In what way does UTF-16 use surrogate code /points/? An encoding form is a mapping. Let's look at this mapping: * One _inputs_ scalar values (not surrogate code points). In fact the input is

Re: Code point vs. scalar value

2013-09-18 Thread Stephan Stiller
On 9/18/2013 12:02 AM, Stephan Stiller wrote: That still doesn't mean surrogates are used by UTF-16 = 'That still doesn't mean surrogate_code point_s are used by UTF-16'

Re: Code point vs. scalar value

2013-09-18 Thread Stephan Stiller
On 9/18/2013 2:42 AM, Philippe Verdy wrote: There are scalar values used in so many other unrelated domains [...] There is no risk for confusion with vectors or complex numbers or reals or whatnot. On 9/18/2013 8:34 AM, Asmus Freytag wrote: I concur. Codepoint is the accepted way of referring

Re: Code point vs. scalar value

2013-09-18 Thread Stephan Stiller
Instead of selectively agreeing with Philippe's writing, it would be good to tell us why Glossary claims that surrogate code points are [r]eserved for use by UTF-16 and why there are similar statements in the Unicode book if [AF:] [o]nce you add the UTF-prefix, you are, by force, speaking of

Re: Code point vs. scalar value

2013-09-17 Thread Stephan Stiller
[AF:] It is the wording in your posts that adds to the confusion. My fundamental point is, has been, and continues to be that whenever people use the more general word code point instead of the more appropriate scalar value, that will add to the confusion. If you make the presupposition

letters that complete the rectangle in Indic scripts

2013-09-17 Thread Stephan Stiller
I have been told that Devanagari contains letters (or a letter) that were invented merely to complete the rectangular C-V table; not sure to what extent they (or it) were used subsequently. Wiki http://en.wikipedia.org/wiki/Devanagari tells me about the letter ॡ (signifying ḹ, I assume

Re: Code point vs. scalar value

2013-09-17 Thread Stephan Stiller
On 9/17/2013 5:27 PM, Asmus Freytag wrote: On 9/17/2013 2:55 PM, Stephan Stiller wrote: [AF:] It is the wording in your posts that adds to the confusion. My fundamental point is, has been, and continues to be that whenever people use the more general word code point instead of the more

Re: letters that complete the rectangle in Indic scripts

2013-09-17 Thread Stephan Stiller
I have been told that Devanagari contains letters (or a letter) that were invented merely to complete the rectangular C-V table; not sure to what extent they (or it) were used subsequently. In which reference is this mentioned? I was referring to oral communication (above I wrote I have been

Re: Code point vs. scalar value

2013-09-17 Thread Stephan Stiller
In what way does UTF-16 use surrogate code /points/? An encoding form is a mapping. Let's look at this mapping: * One _inputs_ scalar values (not surrogate code points). * The encoding form will _output_ a short sequence of encoding form–specific code units. (Various voices on this list
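A minimal sketch of the mapping described in this post, assuming the standard UTF-16 definition: the input is a scalar value, and the output is one or two 16-bit code units. The function name `utf16_code_units` is made up for illustration; nothing here comes from the thread itself.

```python
def utf16_code_units(scalar: int) -> list[int]:
    """Map one Unicode scalar value to its UTF-16 code unit sequence."""
    # Surrogate code points (U+D800..U+DFFF) are code points but not
    # scalar values, so they are rejected as *input* to the encoding form.
    if not 0 <= scalar <= 0x10FFFF or 0xD800 <= scalar <= 0xDFFF:
        raise ValueError(f"not a scalar value: {scalar:#x}")
    if scalar < 0x10000:
        # BMP scalar values map to a single code unit of the same number.
        return [scalar]
    # Supplementary scalar values map to a pair of code units whose
    # numeric values fall in the surrogate range.
    offset = scalar - 0x10000
    return [0xD800 | (offset >> 10), 0xDC00 | (offset & 0x3FF)]


if __name__ == "__main__":
    for value in (0x0041, 0x2026, 0x1F600):
        units = " ".join(f"{u:04X}" for u in utf16_code_units(value))
        print(f"U+{value:04X} -> {units}")
```

The surrogate range shows up only on the output side, as code unit values; whether that counts as UTF-16 "using surrogate code points" is the terminological question argued in the thread.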

Re: Origin of Ellipsis

2013-09-16 Thread Stephan Stiller
Twitter - Until recently, characters outside the BMP resulted in a Counter decrement of 2 and BMP characters gave a decrement of 1. Not sure when the change happened but now both BMP and non-BMP characters result in a decrement of 1 Yes!! How might that have happened? ;-) And the date line of

Re: Origin of Ellipsis

2013-09-16 Thread Stephan Stiller
① Twitter - [...] ② Sina Weibo - [...] About a year ago I blogged about it http://schappo.blogspot.co.uk/2012/10/weibo-character-count.html And your post on Twitter is this one: http://schappo.blogspot.co.uk/2012/10/twitter-character-count.html Stephan

Re: Origin of Ellipsis

2013-09-16 Thread Stephan Stiller
them up. The latter interpretation seemed to derive from terminological imprecision at first, but my concern and suspicion turned out to be spot-on with what Twitter did historically. On 9/16/2013 7:19 AM, Philippe Verdy wrote: 2013/9/16 Stephan Stiller stephan.stil...@gmail.com

Re: Origin of Ellipsis

2013-09-16 Thread Stephan Stiller
On 9/16/2013 7:48 AM, Stephan Stiller wrote: or count code points corresponding to code units because, well, you can match them up = or count code points corresponding to UTF-16 code units; those happen to be BMP code points. Twitter has been claiming since /at least/ April 2012 that they're

Re: Origin of Ellipsis and double spacing after a sentence.

2013-09-15 Thread Stephan Stiller
On 9/14/2013 6:24 AM, Michael Everson wrote: It facilitates comment by those who are reviewing the text. If you add proofreaders' marks to an especially difficult manuscript, maybe. I've barely seen annotated papers with comments that would not have fit into the margins, and there's still the

Re: Origin of Ellipsis

2013-09-15 Thread Stephan Stiller
On 9/15/2013 1:04 PM, Doug Ewell wrote: André Schappo wrote: U+2026 is useful for microblogs when one is looking to save characters Not if the microblog is in UTF-8, as almost all are. That's an astute observation, but André was talking about input limits
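The two viewpoints in this exchange can be checked with a small sketch: U+2026 saves characters when a limit is counted in code points, but not when it is counted in UTF-8 bytes. The helper name `sizes` is hypothetical.

```python
ELLIPSIS = "\u2026"   # HORIZONTAL ELLIPSIS, one code point
THREE_DOTS = "..."    # three U+002E FULL STOP characters


def sizes(text: str) -> tuple[int, int]:
    """Return (number of code points, number of UTF-8 bytes)."""
    return len(text), len(text.encode("utf-8"))


if __name__ == "__main__":
    print("U+2026   :", sizes(ELLIPSIS))
    print("three '.':", sizes(THREE_DOTS))
```

Both strings occupy three bytes in UTF-8, so the saving only materializes for an input limit expressed in characters.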

Re: Origin of Ellipsis (was: RE: Empty set)

2013-09-15 Thread Stephan Stiller
On 9/15/2013 3:07 PM, Phillips, Addison wrote: Not if the limit is counted in characters and not in bytes. Twitter, for example, counts code points in the NFC representation of a tweet. character, code point – these are confusing words :-) From the link it isn't entirely clear whether they (a)
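To make the distinction between the possible counting methods concrete, here is a rough sketch that measures one string three ways after NFC normalization. It is not Twitter's implementation; the function name `tweet_lengths` and the dictionary keys are assumptions made for illustration.

```python
import unicodedata


def tweet_lengths(text: str) -> dict[str, int]:
    """Measure a string after NFC normalization in three different units."""
    nfc = unicodedata.normalize("NFC", text)
    return {
        # Python strings are sequences of code points.
        "code_points": len(nfc),
        # Each UTF-16 code unit is two bytes in the little-endian form.
        "utf16_code_units": len(nfc.encode("utf-16-le")) // 2,
        "utf8_bytes": len(nfc.encode("utf-8")),
    }


if __name__ == "__main__":
    for sample in ("e\u0301", "\U0001F600"):
        print(repr(sample), tweet_lengths(sample))
```

A supplementary-plane character is one code point but two UTF-16 code units, which is the kind of discrepancy reported elsewhere in this thread.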

Re: Origin of Ellipsis (was: RE: Empty set)

2013-09-15 Thread Stephan Stiller
Stephan Stiller wrote: From the link it isn't entirely clear whether they (a) count scalar values of NFC or (b) count code points of NFC. Are they not the same thing, except for surrogates? Conceptually no, but numerically yes – you are right in that regard, and I wasn't precise in my
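A short sketch of the numerical side of this point, assuming the standard ranges: every scalar value is a code point, and the only code points that are not scalar values are the surrogates. The helper names are hypothetical.

```python
def is_code_point(n: int) -> bool:
    """Code points cover the whole codespace, U+0000..U+10FFFF."""
    return 0 <= n <= 0x10FFFF


def is_surrogate(n: int) -> bool:
    return 0xD800 <= n <= 0xDFFF


def is_scalar_value(n: int) -> bool:
    """Scalar values are code points minus the surrogate range."""
    return is_code_point(n) and not is_surrogate(n)


def can_encode(text: str, encoding: str = "utf-8") -> bool:
    """Report whether Python's codec accepts the string for this encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    print(is_code_point(0xD800), is_scalar_value(0xD800))
    print(can_encode("\ud800"), can_encode("\U0001F600"))
```

A Python `str` can hold a lone surrogate code point, but the UTF-8 and UTF-16 codecs refuse to encode it, which matches the view that encoding forms take scalar values as input.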

Re: Origin of Ellipsis (was: RE: Empty set)

2013-09-15 Thread Stephan Stiller
Doug wrote me: You're not confusing code point with code unit, are you? Thanks for the note. I think what you say is that I thought (or meant to write) by first representing the sequence of scalar values in an encoding form and then counting [code points typecast from] code _units_. I think

Re: Origin of Ellipsis

2013-09-14 Thread Stephan Stiller
You've quoted the sentence out of its context (note the then word which indicates this context). I do not support this practice. Philippe, within my message you quote here isn't exactly precise about context, is it :-) I think there's a misunderstanding. My annoyance isn't in principle with

Re: Origin of Ellipsis

2013-09-14 Thread Stephan Stiller
[ME:] Books never used it. The tradition in typing was developed to assist typesetters to navigate the typewritten text they were setting. The typesetters never put two spaces after a full stop. I'm looking at what looks like a US edition/printing (1902) of the US-American novel Moby-Dick:

Re: Origin of Ellipsis

2013-09-14 Thread Stephan Stiller
On 9/14/2013 3:42 AM, Michael Everson wrote: On 14 Sep 2013, at 02:30, Stephan Stiller stephan.stil...@gmail.com wrote: This means that this dot will then need to be followed by two spaces when it is used as a sentence-ending period. This tradition is no longer current in the US. Though it's

Re: Empty set

2013-09-13 Thread Stephan Stiller
Hi Philippe, i.e. (...). at end of a truncated sentence or . (...) at start of the next truncated sentence Well, for citations in German I've generally seen [...], and for English I've seen both [...] and ..., but not (...). I included them in my sentence (parentheses or

Re: Empty set

2013-09-13 Thread Stephan Stiller
I've never seen it in math proper, is what I meant, but ... The { [ ( ) ] } hierarchy is used in chemical nomenclature. It is specified by IUPAC (International Union of Pure and Applied Chemistry). For example: acetone (/R/)-/O/-{2-[4-(α,α,α-trifluoro-/p/-tolyloxy)phenoxy]propionyl}oxime

Re: Empty set

2013-09-13 Thread Stephan Stiller
I did not speak about inter-word spacing (this cannot affect the rendering of ellipsis itself) but about inter-letter spacing. But the context I provided was that some people ask for . . .[ .], as ugly as it is :-) And, again, the precise ideal spacing is a matter of typographic design; you can

Re: Empty set

2013-09-13 Thread Stephan Stiller
Once you've increased the width of these interword spaces to their maximum, all the characters (and these increased spaces) should be justified using interletter spacing, and this extra interletter spacing should be applied as well between the dots of the ellipsis (showing that they are

Re: Empty set

2013-09-13 Thread Stephan Stiller
Once you've increased the width of these interword spaces to their maximum, all the characters (and these increased spaces) should be justified using interletter spacing, and this extra interletter spacing should be applied as well between the dots of the

Re: Empty set

2013-09-13 Thread Stephan Stiller
[PV:] But then the existing ellipsis is not a good candidate because it has the incorrect metrics where it should use the sinographic metrics. [...] But the encoded ELLIPSIS does not fit correctly there. But I think Chinese fonts take care of that. Stephan

Re: Origin of Ellipsis (was: RE: Empty set)

2013-09-13 Thread Stephan Stiller
Exactly my thoughts: In fonts commonly used for word processing and desktop publishing, HORIZONTAL ELLIPSIS is usually not that well designed. To me the dots appear too close in plenty of fonts. But I think that the most common cause of the appearance of HORIZONTAL ELLIPSIS is that Microsoft

Re: Origin of Ellipsis

2013-09-13 Thread Stephan Stiller
Hi Philippe, This means that this dot will then need to be followed by two spaces when it is used as a sentence-ending period. This tradition is no longer current in the US. Though it's obvious there are still plenty of middle and high school–level teachers and college-level writing

Re: Origin of Ellipsis

2013-09-13 Thread Stephan Stiller
:-) Lots of people still do this. I did until a year or two ago. I also use non-standard punctuation, but I tend to know what majority practice is, and when I deviate it's intentional. I don't know about you, but nearly everyone who tells me that you should use two spaces (should? says who?)

Re: Origin of Ellipsis

2013-09-13 Thread Stephan Stiller
This tradition is persistant. Persistent where? Lots of people Lots of people who and how many? Go to a bookstore or library, pick 100 items randomly, and report. If you want to make a case that it's majority or significant usage in personal correspondence or outside of professional

Re: Origin of Ellipsis

2013-09-13 Thread Stephan Stiller
This tradition is persistant. Persistent where? This is already replied within my message you quote here. Lots of people Lots of people who Same remark. So there are many contributors, on the English Wikipedia. What does many mean? I doubt double spacing of sentences is

Re: Why blackletter letters?

2013-09-12 Thread Stephan Stiller
Talking about which ... I confess I usually type a Danish Ø for convenience when I'm using this, though for publication I would tend to substitute the proper ∅. Whenever I saw the empty set symbol in printed math literature in Germany, it closely resembled Ø; I don't think I ever saw a

Re: Empty set

2013-09-12 Thread Stephan Stiller
Regarding the empty set, the page http://jeff560.tripod.com/set.html rather convincingly attributes the symbol to André Weil, who says that it was inspired by the Norwegian letter “Ø”. Well, if one looks at earlier editions of the Éléments, the symbol is clearly not printed as

Re: Why blackletter letters?

2013-09-12 Thread Stephan Stiller
I confess I usually type a Danish Ø for convenience when I'm using this, though for publication I would tend to substitute the proper ∅. Whenever I saw the empty set symbol in printed math literature in Germany, it closely resembled Ø; I don't think I ever saw a

Re: Empty set

2013-09-12 Thread Stephan Stiller
The notation { } is quite correct. It just isn’t an atomic symbol for the empty set but an expression consisting of the two characters “{” and “}”, with a list (here, an empty list) of elements between them. Reminds me of typographically composite stuff that has its own scalar value (code

Re: Empty set

2013-09-12 Thread Stephan Stiller
The situation with {} is very similar to the situation with 0̸ for the empty set and with \ for set subtraction. Knuth's version of TeX was designed for typesetting his books, and he (probably) did not encounter situations where the meaning of these symbols is ambiguous. When AMS was

Re: Empty set

2013-09-12 Thread Stephan Stiller
Hi Philippe, I disagree. For me your spaced-out ellipsis (. . .) is not an ellipsis but are horizontal rulers (typically used in tables or input forms) to facilitate the reading of tabular data. I disagree with CMOS prescription in this case, just as you do, but the prescription exists,

Re: Why blackletter letters?

2013-09-11 Thread Stephan Stiller
Hi Gerrit, I have been aiming at creating a blackletter font (http://unifraktur.sourceforge.net/maguntia.html) Cool! • The four “required” ligatures ch, ck, ſt and tz, which were never separated in typesetting. These can be realised in the very same way as antiqua ligatures. Your page draws

Re: Why blackletter letters?

2013-09-11 Thread Stephan Stiller
On 9/11/2013 5:56 AM, Gerrit Ansmann wrote: That’s correct, but that did not seem to stop people from using a long s in Antiqua from time to time. There are a lot of post-1901 Antiqua display fonts that contain a long s as well as examples from normal text. This very rarely happens even today:

Re: Can a single text document use multiple character encodings?

2013-08-28 Thread Stephan Stiller
For Web formats (HTML, etc.), the answer is no. The obvious follow-up to the list: It'd be interesting to know where the answer is yes. People will occasionally mention ISO/IEC 2022, which can be thought of as a meta-encoding or encoding template or encoding constructor, but in the normal

Re: What to backup after corruption of code units?

2013-08-28 Thread Stephan Stiller
confusion isn't exactly rampant I guess so. But while we're splitting hairs: There simply are two meanings for the word backup, which in and of itself is nothing unusual, especially where one of them is the ordinary sense of the term (not really a technical term). In the IT domain, the to

Re: Can a single text document use multiple character encodings?

2013-08-28 Thread Stephan Stiller
On 8/28/2013 3:35 PM, Asmus Freytag wrote: The original question was about combining UTF-8 and UTF-16 in the same document. /Not quite./ Hint: The original question is in the original email.

Re: What to backup after corruption of code units?

2013-08-27 Thread Stephan Stiller
All good replies It means the program needs to go back (a.k.a. back up) but I'd say backtracking would make for better wording in TUS. Stephan

Re: polytonic Greek: diacritics above long vowels ᾱ, ῑ, ῡ

2013-08-05 Thread Stephan Stiller
On 8/5/2013 11:26 AM, Whistler, Ken wrote: Inclusion of the precomposed characters now seen in the U+1FXX block was part of the price of the merger. What was included was precisely the repertoire requested by Greece, and no attempt was made to further rationalize forms including macrons for

Re: polytonic Greek: diacritics above long vowels ᾱ, ῑ, ῡ

2013-08-04 Thread Stephan Stiller
[from RW:] /For metrical purposes/, we don't know whether the syllable is open or closed until we know what comes next. [emphasis added] About that you are right, and it was an oversight on both our parts. But the dictionary also contains πράσσω with ᾱ in an annotation, and the weight of

Re: polytonic Greek: diacritics above long vowels ᾱ, ῑ, ῡ

2013-08-04 Thread Stephan Stiller
On 8/4/2013 2:59 PM, Richard Wordingham wrote: The CLDR does not yet support Ancient Greek! [...] Vowels with plain COMBINING BREVE and COMBINING MACRON don't make it to the list of auxiliary exemplar characters for Modern Greek. This is a non-sequitur; why would they for Modern Greek (Dimotiki),

Re: polytonic Greek: diacritics above long vowels ᾱ, ῑ, ῡ

2013-08-04 Thread Stephan Stiller
Most of the polytonic precomposed vowels are in the auxiliary exemplars for Modern Greek. I don't know – probably because of the Katharevousa legacy and the fact that Ancient Greek lives on in literary idioms, for which you ordinarily don't use a macron for reasons of orthographic

Re: polytonic Greek: diacritics above long vowels ᾱ, ῑ, ῡ

2013-08-04 Thread Stephan Stiller
Please bear in mind that polytonic vowels ARE used in the language called Modern Greek. /Because/ of the Ancient/Attic heritage living on via Katharevousa or the occasional person persisting in polytonic orthography. In any case, modern writing has traditionally not used macrons (and certain

Re: polytonic Greek: diacritics above long vowels ᾱ, ῑ, ῡ

2013-08-03 Thread Stephan Stiller
I've seen information concerning this we can no longer encode new precomposed characters for grapheme clusters that are already encoded in any existing standard form many times, though I'm not in a position to verify all of your content. I'm also not proposing to add precomposed

Re: polytonic Greek: diacritics above long vowels ᾱ, ῑ, ῡ

2013-08-03 Thread Stephan Stiller
/[One consequence of the string policy is that ]/we can no longer encode new precomposed characters for grapheme clusters that are already encoded in any existing standard form/[.]/ And you've truncated the end of my sentence Well, I have not, unless you really want to count that

Re: symbols/codepoints for necessity and possibility in modal logic

2013-08-02 Thread Stephan Stiller
There are a number of box characters in the vicinity of U+27FB You mean U+25FB. U+25A1 [I think: maybe] [and] For diamond [...] U+25CA [I think: no] Have you read my previous discussion and looked at UTR 25 (p. 20 and also Ideal Sizes on p. 19)? U+25AB [and] U+25FD Definitely too

polytonic Greek: diacritics above long vowels ᾱ, ῑ, ῡ

2013-08-02 Thread Stephan Stiller
Hi, If one wants to indicate vowel length for the length-ambiguous vowels α, ι, υ in Ancient Greek, one writes ᾱ, ῑ, ῡ. Is there a reason for why there are no diacritic-precomposed characters? I guess it's because macron usage is rare in orthographic practice, even though vowel length here
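The snippet is cut off, so reading the question as being about a macron combined with a further diacritic is an assumption. Under that reading, a small sketch with Python's `unicodedata` shows what NFC does: the plain macron forms compose to a single code point, while macron plus acute keeps a combining mark. The helper name `nfc_code_points` is hypothetical.

```python
import unicodedata

ALPHA = "\u03B1"    # GREEK SMALL LETTER ALPHA
MACRON = "\u0304"   # COMBINING MACRON
ACUTE = "\u0301"    # COMBINING ACUTE ACCENT


def nfc_code_points(text: str) -> list[str]:
    """List the code points of the NFC form in U+XXXX notation."""
    nfc = unicodedata.normalize("NFC", text)
    return [f"U+{ord(ch):04X}" for ch in nfc]


if __name__ == "__main__":
    print(nfc_code_points(ALPHA + MACRON))          # alpha with macron
    print(nfc_code_points(ALPHA + MACRON + ACUTE))  # alpha with macron and acute
```

The second sequence stays two code points long after normalization, since no single precomposed character covers that combination.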

Re: polytonic Greek: diacritics above long vowels ᾱ, ῑ, ῡ

2013-08-02 Thread Stephan Stiller
Characters restricted to dictionaries are generally not well supported. And modern textbooks in a modern world :-) The practice in Scott and Liddell is to reserve ᾱ, ῑ and ῡ for a note after the dictionary entry. Liddell & Scott is old, just like Lewis & Short. We've moved on since then, and

Re: polytonic Greek: diacritics above long vowels ᾱ, ῑ, ῡ

2013-08-02 Thread Stephan Stiller
The practice in Scott and Liddell is to reserve ᾱ, ῑ and ῡ for a note after the dictionary entry. I'm looking at Liddell-Scott-/Jones/ here http://www.tlg.uci.edu/lsj/ and at old PDFs of Liddell & Scott [only] by Google, and I cannot easily confirm your statement. Perhaps it holds for

Re: Unicode code page and ☃.net

2013-07-30 Thread Stephan Stiller
On 7/30/2013 3:27 PM, Asmus Freytag wrote: architectures that depended on swapping character sets (code pages) in mid stream I thought systems were usually married to a particular code page. I'm wondering where (historically) you'd actually change to a different code page mid-stream.

Re: Unicode code page and ☃.net

2013-07-29 Thread Stephan Stiller
I have a question regarding the supported Unicode code page. There are no Unicode code pages. I guess there is the question of what exactly a codepage is when you consider complicated encodings, esp stateful ones. But I always think of Unicode as one giant abstract codepage, and Unicode

Re: symbols/codepoints for necessity and possibility in modal logic

2013-07-19 Thread Stephan Stiller
What is wrong with using DIAMOND OPERATOR? wrong is strong wording and goes beyond what I suggested or implied, but it's not clear to a user of Unicode that it's the right fit either. There are a couple of indicators factoring in: * The charts mention modal logic in conjunction with ◻

Re: symbols/codepoints for necessity and possibility in modal logic

2013-07-19 Thread Stephan Stiller
Why not contact the relevant publishers and find out what they are using? Why not contact the relevant governments and find out what they're using in order to solve /_*all*_/ encoding issues for /_*all*_/ languages and writing systems within a day? :-) Publishers use metal type (or various

Re: symbols/codepoints for necessity and possibility in modal logic

2013-07-19 Thread Stephan Stiller
Hi Jörg, Thanks for the info! U+25C7 WHITE DIAMOND is the best choice I'm with you in that for now I'll go with ⟨◻ (U+25FB), ◇ (U+25C7)⟩ as the pair of choice, pending further decisions; see also what I'm writing further down. Or objections from experts stating that the symbol
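For readers comparing the candidates named in this thread, a small sketch that prints each code point with its formal Unicode name, using Python's `unicodedata`. The grouping into "box" and "diamond" candidates is only a convenience for this example and takes no position on which character is the right choice.

```python
import unicodedata

# Code points mentioned in the thread as possible box/diamond symbols.
CANDIDATES = {
    "box": ["\u25FB", "\u25A1"],
    "diamond": ["\u25C7", "\u22C4", "\u25CA"],
}


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


if __name__ == "__main__":
    for role, chars in CANDIDATES.items():
        for ch in chars:
            print(f"{role:8} {ch}  {describe(ch)}")
```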

symbols/codepoints for necessity and possibility in modal logic

2013-07-18 Thread Stephan Stiller
Hi all, Modal logic uses a box and a diamond (this is how they're informally called) as operators (accepting one formula and returning another) to denote necessity and possibility, resp. Older texts might use the letters L and M (resp). Which Unicode codepoints do modal box and diamond

Re: writing in an alphabet with fewer letters: letter replacements

2013-07-05 Thread Stephan Stiller
Hi Richard, I know of standards for transcribing foreign alphabets (by /target/ locale – Are they relevant here? If so, which?) [...] This may well depend on both source and target locale! How often will locale have to be broken down on a non-local basis? Different newspapers in the same

Re: writing in an alphabet with fewer letters: letter replacements

2013-07-05 Thread Stephan Stiller
Hi Jonathan, I definitely appreciate the partial datapoints from your links, but Google is your friend by itself doesn't lead us closer to a real answer, and in this case I think that there are at least some good answers, and in any case some answers will be better than others. This

Re: writing in an alphabet with fewer letters: letter replacements

2013-07-05 Thread Stephan Stiller
My impression is that US customs officials are either quite knowledgeable or quite tolerant on such issues (or a mixture of both). The same applies to customs officials in other countries I have traveled to, and other people at airports and such. Thanks. (And, I don't have the knowledge to

Re: writing in an alphabet with fewer letters: letter replacements

2013-07-05 Thread Stephan Stiller
Hey Jonathan, The official transliteration for Hebrew to the Latin script is obsolete What is the latest recommended scheme? and the situation in this country is a mess Let me guess: it has to do with the number of spelling variants in names of /aliyah/ immigrants? I've always been

Re: writing in an alphabet with fewer letters: letter replacements

2013-07-05 Thread Stephan Stiller
See http://www.icao.int/publications/Documents/9303_p1_v1_cons_fr.pdf , especially Appendice 8 (p IV-50). The English version is available as http://www.icao.int/publications/Documents/9303_p1_v1_cons_en.pdf , especially Appendix 8 (p IV-47). I suppose you can't go wrong with what your own

Re: writing in an alphabet with fewer letters: letter replacements

2013-07-05 Thread Stephan Stiller
I suppose you can't go wrong with what your own passport says On second thought ... * disallowed: Ä↛A , Ö↛O , Ü↛U (as are: Å↛A , Ø↛O) ... I have a Turkish friend for whom it is Ö→O, not OE. This calls into question the general applicability of these rules. A few years ago he also told

writing in an alphabet with fewer letters: letter replacements

2013-07-04 Thread Stephan Stiller
Hi folks, For languages whose alphabets aren't too far apart (I'm thinking mostly of the set of Latin-derived alphabets), what is a good place for finding out how letter replacements for letters that are missing in a different country/locale are done? For example, how will an Icelander

Re: Arabic quoting characters

2013-06-14 Thread Stephan Stiller
On Fri, Jun 14, 2013 at 10:45 AM, Michael Fayez michaelfa...@hotmail.com wrote: I noticed that double small parentheses that are used in professional printing in Arabic presses are not encoded in Unicode. [...] http://i.imgur.com/aAgRDq1.jpg So

Re: interaction of Arabic ligatures with vowel marks

2013-06-12 Thread Stephan Stiller
Thank you, خالد and Richard. there is only one Indic mark I can think of for which the issue of component association arises, and that is the nukta That is good to know, given the complexity of the Indic scripts. Other thoughts: * One could simply break up Arabic ligatures in need of

interaction of Arabic ligatures with vowel marks

2013-06-11 Thread Stephan Stiller
Hi, How is the placement of vowel marks around ligatures handled in Arabic text? Does anyone have good pointers on this topic? My guess is that this does not come up often (just like the topic of pointing for handwritten Hebrew), as vowel marks are mostly not added in ordinary text.

Re: Hanzi trad-simp folding and z-variants

2013-06-09 Thread Stephan Stiller
Familiarity with a writing system makes the non-obvious parts comprehensible, as can context. The work is a thorough listing of usage instances that the authors could encounter in the wild. My informants can't recall ever having seen many of these characters. They wouldn't use them, and that

Re: Hanzi trad-simp folding and z-variants

2013-06-09 Thread Stephan Stiller
For me 'non-standardized' means there is not one recognized standard, this does not mean that things are completely unstable, nor that there are no traditions of what character is used for what word that have been passed down for many generations. /As I stated/, for a decent number of

Re: Hanzi trad-simp folding and z-variants

2013-06-09 Thread Stephan Stiller
The way the Cheung-Bauer list was compiled, it is certainly hard to see how most of the characters would be widely known. I'd need to look at CB again for accurate numbers, but to some extent it's simply because some syllable-morphemes are listed with many different attested possibilities. So

Re: Hanzi trad-simp folding and z-variants

2013-06-08 Thread Stephan Stiller
http://www.unicode.org/reports/tr38/ does a good summary of the possibilities. Which and where? Trying to fold from one locale to another, which is what folding from traditional to simplified would be, is not a good idea, best practice is to bear in mind the locale being used, and do

Re: Hanzi trad-simp folding and z-variants

2013-06-08 Thread Stephan Stiller
The situation also tends to be complex once one steps outside of Putonghua. Given that the situation there is a lack of standardization (and a lack of tables laying out variant spellings), I don't think anything other than radical, hand-tuned folding to cover all possibilities is sensible to

Re: Hanzi trad-simp folding and z-variants

2013-06-08 Thread Stephan Stiller
As far as general folding is concerned, performing conversion (whether it's word-based or not and even if it's locale-tailored) and then a strict search will let you miss out on the z-variation you find in the wild (because of true variation or of misspellings), and a more generous inclusion

Re: Hanzi trad-simp folding and z-variants

2013-06-08 Thread Stephan Stiller
I. Which and where? Section 3.7.1 Simplified and Traditional Chinese Variants talks about converting between Simplified and Traditional Chinese. You wrote this http://www.unicode.org/reports/tr38/ does a good summary of the possibilities. in response to my inquiry

Re: Hanzi trad-simp folding and z-variants

2013-06-08 Thread Stephan Stiller
better word choice: lexical variation - orthographic variation (in my prev email)

Re: Hanzi trad-simp folding and z-variants

2013-06-08 Thread Stephan Stiller
So we both agree that Unihan is not designed to tell people how to convert between traditional and simplified characters. Yep. Though some confusion as to what other questions are being discussed here. I think I misused the expression folding at some point. But the original query explicitly

Re: Hanzi trad-simp folding and z-variants

2013-06-07 Thread Stephan Stiller
Hi John, This is one of those questions that I've been wondering about as well ... my guess would be yes that should work (and dealing with z-variants is something you'll likely need to do anyways), but there *must* be some published algorithm out there that specifically addresses the issue

Re: Hanzi trad-simp folding and z-variants

2013-06-07 Thread Stephan Stiller
simplified [is] better thought of as abbreviated Part of this is a terminological argument. The historical situation is indeed more complicated than many people know, but the truth is also that irrespective of eg people's past or present usage in handwriting there have (in the past and esp

Re: Suggestion for new dingbats/symbols

2013-05-31 Thread Stephan Stiller
Excellent question and points from Albrecht Dreiheller. [AD:] So the _receptive vocabulary_ might be pretty big for many people. [...] So the _productive vocabulary_ of symbols will always be very, very small. I was thinking a similar thing, and I'm inclined to agree. But I know of

Re: Suggestion for new dingbats/symbols

2013-05-28 Thread Stephan Stiller
The Noun Project seem determined to create a pictogram for every noun, and many short phrases: See http://blog.thenounproject.com/ Huh. What are the constraints on the symbols; eg: what resolution can the symbols be (so that we don't simply use detailed high-res pictures)? Are there any

Re: SignWriting

2013-04-22 Thread Stephan Stiller
Sign-Writing has both a normative form, to be generated by computer programs, and a handwriting form allowing more freedom. It has been developed using signs that are not so complicated to reproduce in a meaningful way. Could you provide a link with signwritten sentences in the

Re: Encoding localizable sentences (was: RE: UTC Document Register Now Public)

2013-04-22 Thread Stephan Stiller
[Charlie Ruland:] The Unicode Consortium is prepared to encode all characters that can be shown to be in actual use. Are you sure there is a precedent for what is essentially markup for a system of (alpha)numerical IDs? Stephan

Re: SignWriting

2013-04-22 Thread Stephan Stiller
what the western world knows as „calligraphie“, e.g., in Germany elementary school kids become graded for the prettiness of their handwriting. I've only ever encountered the word Kalligraphie (now preferred: Kalligrafie) in the meaning of artistic writing in Germany. If the word is also used

SignWriting (was: Encoding localizable sentences)

2013-04-21 Thread Stephan Stiller
sign-writing SignWriting is also difficult to write. naturally evolved I will be very curious to see the result after a bit of evolution (I hope there will be some), with a system that can actually be written easily by hand (or at least input quickly with the right input method) and that

Re: Encoding localizable sentences (was: RE: UTC Document Register Now Public)

2013-04-21 Thread Stephan Stiller
In India you could have telegrams containing such sentences delivered in any of the major Indian regional languages. This was a good idea in the days of the low-bandwidth telegraph And it was a domain-restricted application. Stephan

Re: SignWriting

2013-04-21 Thread Stephan Stiller
SignWriting is also difficult to write. Not necessarily more than those that learn writing Chinese. Learning how to write Chinese is difficult. It only takes like 6.5 years of schooling, and when students go abroad for college, they quickly forget how to write many characters. In fact,

Re: SignWriting

2013-04-21 Thread Stephan Stiller
Sign-Writing has both a normative form, to be generated by computer programs, and a handwriting form allowing more freedom. It has been developed using signs that are not so complicated to reproduce in a meaningful way. Could you provide a link with signwritten sentences in the /latest/

Re: Encoding localizable sentences (was: RE: UTC Document Register Now Public)

2013-04-20 Thread Stephan Stiller
I am wondering whether it would be a good idea for there to be a list of numbered preset sentences that are an international standard and then if Google chose to front end Google Translate with precise translations of that list of sentences made by professional linguists who are native

Re: Encoding localizable sentences (was: RE: UTC Document Register Now Public)

2013-04-19 Thread Stephan Stiller
Not perfect, perhaps, but perfectly comprehensible. And the application will even do a very decent job of text to speech for you. and The quality of the translation for these kinds of applications has rapidly improved in recent years Not that the ability of MT to deal with

Re: Encoding localizable sentences

2013-04-19 Thread Stephan Stiller
As regards any possible case for encoding localizable sentences *as characters*, in my opinion, the train long ago left the station for that one. Indeed, people have been devising systems for representing words and sentences via ordinary numbers that worked just fine for at least 170

Re: In 2013, there are still programs with huge Unicode bugs :-(

2013-03-22 Thread Stephan Stiller
This one is incredible: https://bugzilla.redhat.com/show_bug.cgi?id=922433 This sort of failure to perform input validation and/or escaping is also a sign of bad software engineering in general. I recall an important CGI form of my university refusing to let me submit because I input an
