On Jun 4, 2015, at 17:34 , Markus Scherer markus@gmail.com wrote:
Looks all wrong to me.
don’t is a contraction of two words, it is not one word.
Yes it is. Is keyboard two words? How about newspaper?
If don't is two words, please tell me what two words make up won't? (Hint,
neither
On Mar 27, 2015, at 15:57 , Michael Norton michaelanortons...@gmail.com wrote:
Why wouldn't Unicode itself have it?
Because as Ken explained, acquiring (and constantly updating) such statistics
would require roughly the effort that Google puts into its
- Indentation codepoint, with no fixed defined graphical representation. For
indentation based programming languages.
That wouldn’t be compliant with existing languages and future languages might
use any existing character.
Because:
-- specific clients may want to show it differently
Hi there -
I'm working on an information retrieval application for a collection of Chinese
documents, which appear to use a mix of traditional and simplified characters.
My intuition is that it makes sense to do traditional to simplified folding for
indexing and query processing (when the
Stephan Stiller wrote:
I sometimes have a closing dash and sometimes not
And let's not forget that one often has what is semantically a pair of
parenthetical dashes, either the opening or the closing component of which is
eaten up by the beginning or the end of the sentence, resp. These
What has this to do with Unicode???
- John Burger
MITRE
On Nov 27, 2012, at 05:14 , N. Ganesan wrote:
There are interviews in Tamil and English language media about
V. A. Shiva Ayyadurai and his work in high school
and later with respect to electronic mail.
A statement issued by MIT
Naena Guru wrote:
I know you do not care about a language of 15 million people, but it
matters to them.
These kinds of straw man arguments are rude and counter-productive. Such a
characterization is highly unlikely to be true for anyone on this list, and
you've just ensured that few of
So here is the idea: why not use the unused part (2^31 - 2^21 =
2,145,386,496) to encode all the words of all the languages as well. You
could then send any word with a few bytes. This would reduce the
bandwidth necessary to send text. (You need at most six bytes to address
all 2^31 code points,
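The arithmetic in this message can be checked with a short sketch, reading the two figures as 2^31 and 2^21 (the only reading that gives 2,145,386,496). The six-byte figure matches the original UTF-8 design, which allowed sequences of up to six bytes for 31-bit values before the encoding was restricted to four bytes. The function name utf8_style_length is made up for this illustration and is not part of any library.

```python
# Sizes taken from the message: a 31-bit space versus a 21-bit range.
FULL_SPACE = 2 ** 31
UNICODE_RANGE = 2 ** 21

# The "unused part" the poster proposes to fill with whole words.
unused = FULL_SPACE - UNICODE_RANGE


def utf8_style_length(value: int) -> int:
    """Byte count for a value under the original six-byte UTF-8 scheme.

    Hypothetical helper for illustration only. Sequences of 1 to 6 bytes
    carry 7, 11, 16, 21, 26 and 31 payload bits respectively.
    """
    if value < 0 or value >= FULL_SPACE:
        raise ValueError("value outside the 31-bit space")
    for length, bits in enumerate((7, 11, 16, 21, 26, 31), start=1):
        if value < 2 ** bits:
            return length
    raise AssertionError("unreachable")


print(unused)                             # size of the unused part
print(utf8_style_length(0x10FFFF))        # highest Unicode code point
print(utf8_style_length(FULL_SPACE - 1))  # highest 31-bit value
```

Under these assumptions, every value in the proposed extra range needs five or six bytes, which is the cost to weigh against spelling the same word out in one to four bytes per character.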
Interesting, but perhaps more compelling if the demo worked. For me,
the demo sequence:
01 23 45 67 89 ab cd ef aa 55 a5 5a
becomes:
12?
which round-trips to:
c0 ff 03 20 12 34 56 78 9a bc de fa
I assume that, instead, the original hex sequence is expected.
- John Burger
MITRE
James Kass wrote:
The final freebie concerns the practice of copying the author and
others when replying to a posting. Many people have indicated that
they dislike getting multiple copies, while others have stated that
they actually like getting copied.
...
But, I don't really need to get two
John Cowan wrote:
What I've never understood is why Unicode is so adamant that the ' of
English words is a punctuation mark, not a letter; why when disambiguating
U+0027, English apostrophe is to be mapped to U+2019 and not U+02BC.
It's true that historically isn't is derived from is not,
Suzanne M. Topping wrote:
There was a comedian in the 1970's (I remember him from the children's
public television show Electric Company) who pronounced punctuation
phonetically while reading various passages. So it wasn't words for
the symbols, it was sounds.
Victor Borge - very funny bit.
12 matches