Re: Another take on the English apostrophe in Unicode

2015-06-05 Thread John D. Burger
On Jun 4, 2015, at 17:34 , Markus Scherer markus@gmail.com wrote: Looks all wrong to me. don’t is a contraction of two words, it is not one word. Yes it is. Is keyboard two words? How about newspaper? If don't is two words, please tell me what two words make up won't? (Hint, neither

Re: Usage stats?

2015-03-27 Thread John D. Burger
On Mar 27, 2015, at 15:57 , Michael Norton michaelanortons...@gmail.com wrote: Why wouldn't Unicode itself have it? Because as Ken explained, acquiring (and constantly updating) such statistics would require roughly the effort that Google puts into its

Re: Unicode block for programming related symbols and codepoints?

2015-02-09 Thread John D Burger
- Indentation codepoint, with no fixed defined graphical representation. For indentation based programming languages. That wouldn’t be compliant with existing languages and future languages might use any existing character. Because: -- specific clients may want to show it different

Hanzi trad-simp folding and z-variants

2013-06-06 Thread John D. Burger
Hi there - I'm working on an information retrieval application for a collection of Chinese documents, which appear to use a mix of traditional and simplified characters. My intuition is that it makes sense to do traditional to simplified folding for indexing and query processing (when the
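
A minimal sketch of the folding idea raised in the snippet above, applied identically at index time and query time. The names `TRAD_TO_SIMP` and `fold` are hypothetical, and the three-character mapping is a hand-picked illustration only; a real application would need a full mapping table (for instance one derived from the Unihan `kSimplifiedVariant` field), which is not shown here.

```python
# Illustrative only: a tiny hand-picked traditional -> simplified table.
# A real system would load a complete mapping from an external source.
TRAD_TO_SIMP = str.maketrans({"漢": "汉", "語": "语", "書": "书"})


def fold(text: str) -> str:
    """Fold traditional characters to simplified; other characters pass through."""
    return text.translate(TRAD_TO_SIMP)


# Apply the same folding to documents and to queries so both meet in one form.
index_key = fold("漢語書")   # term as it appears in a traditional-script document
query_key = fold("汉语书")   # term as typed in a simplified-script query
```

With both sides folded, `index_key` and `query_key` compare equal, so a simplified query can match a traditional document.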

Re: Public Review Issue 232 Proposed Update UAX #9, Unicode Bidirectional Algorithm (Copy of email sent to the list; also posted by me to unicode feedback/public review issue-- but this has not yet po

2013-01-31 Thread John D. Burger
Stephan Stiller wrote: I sometimes have a closing dash and sometimes not And let's not forget that one often has what is semantically a pair of parenthetical dashes, either the opening or the closing component of which is eaten up by the beginning or the end of the sentence, resp. These

Re: First known use of the word, email (1978)

2012-11-27 Thread John D. Burger
What has this to do with Unicode??? - John Burger MITRE On Nov 27, 2012, at 05:14 , N. Ganesan wrote: There are interviews in Tamil and English language media about V. A. Shiva Ayyadurai and his work in high school and later with respect to electronic mail. A statement issued by MIT

Re: Romanized Singhala - Think about it again

2012-07-05 Thread John D Burger
Naena Guru wrote: I know you do not care about a language of a 15 million people, but it matters to them. These kinds of straw man arguments are rude and counter-productive. Such a characterization is highly unlikely to be true for anyone on this list, and you've just ensured that few of

Re: Unicode for words?

2004-12-05 Thread John D. Burger
So here is the idea: why not use the unused part (2^31 - 2^21 = 2,145,386,496) to encode all the words of all the languages as well. You could then send any word with a few bytes. This would reduce the bandwidth necessary to send text. (You need at most six bytes to address all 2^31 code points,
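
A quick check of the figures quoted in the snippet above, as a sketch. It assumes the "six bytes" remark refers to the original UTF-8 layout (before it was restricted to four bytes), where an n-byte sequence has a lead byte of n one-bits and a zero followed by payload bits, plus six payload bits per continuation byte. The helper name `payload_bits` is hypothetical.

```python
# Size of the range the poster calls "unused": 31-bit space minus a 21-bit space.
unused = 2 ** 31 - 2 ** 21


def payload_bits(n_bytes: int) -> int:
    """Payload bits carried by an n-byte sequence in the original UTF-8 layout (1..6 bytes)."""
    if not 1 <= n_bytes <= 6:
        raise ValueError("original UTF-8 sequences are 1 to 6 bytes long")
    if n_bytes == 1:
        return 7  # 0xxxxxxx
    lead_bits = 7 - n_bytes               # e.g. 110xxxxx carries 5, 1111110x carries 1
    return lead_bits + 6 * (n_bytes - 1)  # each continuation byte 10xxxxxx carries 6


print(f"{unused:,}")
print(payload_bits(4), payload_bits(6))
```

Under that assumption a six-byte sequence carries exactly 31 bits and a four-byte sequence carries 21, which is consistent with both numbers in the quoted proposal.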

Re: base16k - Efficient Binary Data Encoding in Unicode Text

2004-05-29 Thread John D. Burger
Interesting, but perhaps more compelling if the demo worked. For me, the demo sequence: 01 23 45 67 89 ab cd ef aa 55 a5 5a becomes: 12? which round-trips to: c0 ff 03 20 12 34 56 78 9a bc de fa I assume that, instead, the original hex sequence is expected. - John Burger MITRE

OT: Notice of Change to Unicode mail list posting

2004-05-28 Thread John D. Burger
James Kass wrote: The final freebie concerns the practice of copying the author and others when replying to a posting. Many people have indicated that they dislike getting multiple copies, while others have stated that they actually like getting copied. ... But, I don't really need to get two

Re: FW: New version of TR29:

2002-08-20 Thread John D. Burger
John Cowan wrote: What I've never understood is why Unicode is so adamant that the ' of English words is a punctuation mark, not a letter; why when disambiguating U+0027, English apostrophe is to be mapped to U+2019 and not U+02BC. It's true that historically isn't is derived from is not,

Re: Saying characters out loud (derives from hash, pound,octothorpe?)

2002-07-11 Thread John D. Burger
Suzanne M. Topping wrote: There was a comedian in the 1970's (I remember him from the children's public television show Electric Company) who pronounced punctuation phonetically while reading various passages. So it wasn't words for the symbols, it was sounds. Victor Borge - very funny bit.