Re: A research idea for entering characters

2013-04-06 Thread Jon Hanna
On 04/06/2013 09:36 AM, William_J_G Overington wrote: Text is for reading by humans. QR codes are for reading by computers. I wondered if it would be possible to have images that could be read by both humans and computers. Sure. Just set the error-correction high, and write over the top

Re: (Informational only: UTF-8 BOM and the real life)

2012-07-30 Thread Jon Hanna
On 07/30/2012 02:12 PM, Doug Ewell wrote: Please, no more conspiracy theories. Yes. If this goes on, I'll find it impossible to refrain from telling you all my theories about the ANSI-INCITS 154-1988 (R1999) keyboard. And nobody wants that.

Re: name change

2011-11-29 Thread Jon Hanna
On 2011-11-23 10:38, Jeremie Hornus wrote: I was thinking the ID being the code point value itself, and the name a human readable description of it. They are both IDs. One is from the range of numbers from 0 to 1114111 (10FFFF base 16), the other is from the range of strings of characters
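
The two kinds of identifier in this message can be put side by side. This is a minimal Python sketch rather than anything from the thread: `is_code_point` is a hypothetical helper, and the standard `unicodedata` module supplies the character name.

```python
import unicodedata

# Upper bound of the Unicode code space: 1114111 decimal, 10FFFF hexadecimal.
MAX_CODE_POINT = 0x10FFFF


def is_code_point(value: int) -> bool:
    """Hypothetical helper: True if value lies in the range 0..0x10FFFF."""
    return 0 <= value <= MAX_CODE_POINT


# Both identifiers for the same character: the number and the name.
ch = "A"
print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```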

Re: charset parameter in Google Groups

2010-06-30 Thread Jon Hanna
António MARTINS-Tuválkin wrote: If the EU can tell Britain that it can't sell eggs by the dozen any more, Yesterday I bought a dozen eggs (2 racks of 6, set 2×3) here in Portugal. This must be an incredibly new regulation. The Daily Mail isn't as easily available in Portugal. It's one of

RE: outside decomposed, inside precomposed

2004-10-13 Thread Jon Hanna
, but will not export UTF-8. That's odd, isn't it? It'll only export UTF-16 (its internal storage form). Odd indeed. Regards, Jon Hanna http://www.selkieweb.com/

RE: Saudi-Arabian Copyright sign

2004-09-19 Thread Jon Hanna
For a sample, see http://www.uni-mainz.de/~knappen/saudi.gif Looks like {U+062D, U+20DD}

RE: Saudi-Arabian Copyright sign

2004-09-19 Thread Jon Hanna
For a sample, see http://www.uni-mainz.de/~knappen/saudi.gif Looks like {U+062D, U+20DD} Yes, it does look like that. But it forms a separate entity, just like its precedents COPYRIGHT SIGN or SOUND RECORDING COPYRIGHT SIGN or REGISTERED. All of which were in existing standards, so

Re: Combining across markup? (Was: RE: sign for anti-neutrino - greek nu with diacritical line above workaround ?)

2004-08-10 Thread Jon Hanna
The W3C Character Model does not, or will not since it's not yet a Recommendation, allow text nodes or attribute values to begin with defective combining character sequences. -- Jon Hanna http://www.hackcraft.net/ What's a false move? Is it very different from a real one?

Re: Combining across markup? (Was: RE: sign for anti-neutrino - greek nu with diacritical line above workaround ?)

2004-08-10 Thread Jon Hanna
Quoting Philipp Reichmuth [EMAIL PROTECTED]: Jon Hanna wrote: The W3C Character Model does not, or will not since it's not yet a Recommendation, allow text nodes or attribute values to begin with defective combining character sequences. What am I supposed to do when I need a black

RE: Combining across markup? (Was: RE: sign for anti-neutrino - greek nu with diacritical line above workaround ?)

2004-08-10 Thread Jon Hanna
as the character U+226F. By the rules of XML replacing &#x338; with U+226F would mean the document was no longer well-formed. So even without an explicit spec saying otherwise the above would be problematic. -- Jon Hanna http://www.hackcraft.net/ …it has been truly said that hackers have even more words

Re: Looking for transcription or transliteration standards latin- arabic

2004-07-09 Thread Jon Hanna
typewriter keyboard inventor in their line, but no famous composers. -- Jon Hanna http://www.hackcraft.net/ Write a wise saying and your name will live forever - Anonymous

Re: alphabetic sorting of IPA and other derived letters

2004-07-08 Thread Jon Hanna
on the sort dialog. -- Jon Hanna http://www.hackcraft.net/ It is the most shattering experience of a young man's life when he awakes and quite reasonably says to himself, 'I will never play The Dane.'

Re: Latin long vowels

2004-06-22 Thread Jon Hanna
WITH MACRON AND DIAERESIS U+1E7B LATIN SMALL LETTER U WITH MACRON AND DIAERESIS If so, would anyone know from where a Windows XP font containing these five characters could be downloaded? Arial Unicode has at least some of them. -- Jon Hanna http://www.hackcraft.net/ …it has been truly said

Re: Proposal to encode dominoes and other game symbols

2004-06-02 Thread Jon Hanna
of their similar use in O'Reilly & Associates publications. But sure, go and look for examples (not in driver's testing materials - the point there is to represent what one would see while driving, so they're clearly pictures in that context). -- Jon Hanna http://www.hackcraft.net/ …it has been truly

Re: Proposal to encode dominoes and other game symbols

2004-06-02 Thread Jon Hanna
demonstration that the Gods laugh at all plans. -- Jon Hanna http://www.hackcraft.net/ …it has been truly said that hackers have even more words for equipment failures than Yiddish has for obnoxious people. - jargon.txt

Re: Proposal to encode dominoes and other game symbols

2004-05-25 Thread Jon Hanna
is variable. Are they very variable? I can only think of the one substitution suggested by Crowley. Are there others, outside of toy decks? Plain text is going to end up a lot less plain... -- Jon Hanna http://www.hackcraft.net/ …it has been truly said that hackers have even more words for equipment

RE: Proposal to encode dominoes and other game symbols

2004-05-25 Thread Jon Hanna
encryption algorithm. http://www.schneier.com/solitaire.html -- Jon Hanna http://www.hackcraft.net/ …it has been truly said that hackers have even more words for equipment failures than Yiddish has for obnoxious people. - jargon.txt

Zip vs. Non Zipped and ISO 15924 draft fixes

2004-05-21 Thread Jon Hanna
they are finalised. -- Jon Hanna http://www.hackcraft.net/ …it has been truly said that hackers have even more words for equipment failures than Yiddish has for obnoxious people. - jargon.txt

Re: Zip vs. Non Zipped and ISO 15924 draft fixes

2004-05-21 Thread Jon Hanna
Quoting Michael Everson [EMAIL PROTECTED]: At 15:39 +0100 2004-05-21, Jon Hanna wrote: Were the headers correct? It is plain text. HTTP has headers separate to the content (the headers come first and the content comes next). These headers can contain encoding information and other details

Re: Zip vs. Non Zipped and ISO 15924 draft fixes

2004-05-21 Thread Jon Hanna
Quoting [EMAIL PROTECTED] [EMAIL PROTECTED]: Jon Hanna scripsit: [T]he default encoding on the server (which really should be utf-8 on www.unicode.org at this stage). Currently it is, but there are sticky issues: in particular, a default encoding overrides information in HTML meta

Re: Multiple Directions (was: Re: Coptic/Greek (Re: Phoenician))

2004-05-18 Thread Jon Hanna
entirety, to the bottom, not as something starting at the top and continuing towards the bottom. In summary, TTB, not T2B, please. -- Jon Hanna http://www.hackcraft.net/ …it has been truly said that hackers have even more words for equipment failures than Yiddish has for obnoxious people. - jargon.txt

Re: Multiple Directions (was: Re: Coptic/Greek (Re: Phoenician))

2004-05-17 Thread Jon Hanna
composed of a BTT passage, a LTR passage and a TTB passage, but of a single passage which follows a path which changes through those three directions. Paths are not a plain text matter. -- Jon Hanna http://www.hackcraft.net/ …it has been truly said that hackers have even more words for equipment

Re: any unicode conversion tools?

2004-05-07 Thread Jon Hanna
+002F SOLIDUS. [1] Indeed, the format of UTF-8 would make it possible to unambiguously encode any value up to 0xFF but this exceeds the ISO 10646 codepoint space and it would break one of UTF-8's design goals in requiring the use of the octet FE. -- Jon Hanna http://www.hackcraft.net/ …it has
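
The arithmetic behind the footnote's remark about the octet FE can be sketched as follows. This assumes the original UTF-8 bit layout: a lead byte of n one-bits followed by a zero, with each continuation byte carrying six payload bits. The seven-octet form shown here is hypothetical and was never part of any UTF-8 definition, and both function names are made up for illustration.

```python
def payload_bits(length: int) -> int:
    """Payload bits carried by a UTF-8-style sequence of the given length.

    Assumes the classic layout: n leading one-bits and a zero in the lead
    byte, then (n - 1) continuation bytes carrying six bits each.
    """
    if length == 1:
        return 7  # 0xxxxxxx
    if not 2 <= length <= 7:
        raise ValueError("length must be between 1 and 7")
    return (7 - length) + 6 * (length - 1)


def lead_byte_prefix(length: int) -> int:
    """Lead byte of a multi-octet sequence with all payload bits clear."""
    if not 2 <= length <= 7:
        raise ValueError("length must be between 2 and 7")
    return (0xFF << (8 - length)) & 0xFF


for n in range(2, 8):
    print(n, hex(lead_byte_prefix(n)), payload_bits(n))
```

Under these assumptions a six-octet sequence tops out at 31 bits with lead bytes FC and FD, and only a seven-octet sequence needs FE as its lead byte.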

RE: Just if and where is the then?

2004-05-06 Thread Jon Hanna
of a custom encoding to do what they want. If you think of the users of an encoding as a social network then we would expect something like Metcalfe's or Reed's law to affect it. The bigger the network the better off they'll be. Unicode has the biggest network. -- Jon Hanna http://www.hackcraft.net

Re: Just if and where is the then?

2004-05-05 Thread Jon Hanna
for European languages, never mind any others) but those problems are considerably less than existed previously and ISO-8859-17+ is always going to be inferior to UTF-8 or UTF-16. -- Jon Hanna http://www.hackcraft.net/ …it has been truly said that hackers have even more words for equipment failures

Re: Just if and where is the then?

2004-05-05 Thread Jon Hanna
, never mind any other use of that encoding. Do you really think the same would be true of ISO 8859-17? -- Jon Hanna http://www.hackcraft.net/ …it has been truly said that hackers have even more words for equipment failures than Yiddish has for obnoxious people. - jargon.txt

Re: Just if and where is the then?

2004-05-05 Thread Jon Hanna
in developing and using a new 8-bit encoding. -- Jon Hanna http://www.hackcraft.net/ …it has been truly said that hackers have even more words for equipment failures than Yiddish has for obnoxious people. - jargon.txt

RE: [OT] Even viruses are now i18n!

2004-04-23 Thread Jon Hanna
as possible, anything that gets more than 50% accuracy should be considered a successful approach in that context. If the authorities find the author I doubt the robustness of the content-language heuristic will be top of the list of things they want to discuss. -- Jon Hanna http://www.hackcraft.net

ZX80 (was: Fixed Width Spaces (was: Printing and Displaying DependentVowels))

2004-04-01 Thread Jon Hanna
, that brings me back. All those characters that were BASIC keywords compressed into one octet. How could we have neglected to encode such important legacy characters, this unnecessarily complicates round-trip conversion between ZX80s and Unicode. -- Jon Hanna http://www.hackcraft.net/ …it has been truly

Re: ZX80 (was: Fixed Width Spaces (was: Printing and Displaying DependentVowels))

2004-04-01 Thread Jon Hanna
forward to reading it. -- Jon Hanna http://www.hackcraft.net/ …it has been truly said that hackers have even more words for equipment failures than Yiddish has for obnoxious people. - jargon.txt

Re: [OT] C-sharp

2004-03-23 Thread Jon Hanna
made into mountains. -- Jon Hanna http://www.hackcraft.net/ …it has been truly said that hackers have even more words for equipment failures than Yiddish has for obnoxious people. - jargon.txt

Re: [OT] C-sharp

2004-03-23 Thread Jon Hanna
This clause is informative. (...) The name C# is pronounced C Sharp. The name C# is written as the LATIN CAPITAL LETTER C (U+0043) followed by the NUMBER SIGN # (U+000D). End of informative text. Gotta love a language with a carriage return in its name :) -- Jon
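
The code points named in the quoted clause can be checked against the Unicode character database. A small Python sketch, assuming only the standard `unicodedata` module; `describe` is a hypothetical helper name.

```python
import unicodedata


def describe(code_point: int) -> str:
    """Hypothetical helper: report a code point's name and general category."""
    ch = chr(code_point)
    # Control characters have no formal name, so supply a fallback label.
    name = unicodedata.name(ch, "<unnamed control character>")
    return f"U+{code_point:04X} {name} (category {unicodedata.category(ch)})"


# The three code points relevant to the quoted clause.
for cp in (0x0043, 0x0023, 0x000D):
    print(describe(cp))
```

NUMBER SIGN turns out to be U+0023, while U+000D falls in the control category (Cc), which is the carriage return the message jokes about.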

Re: Irish dotless I (was: Languages with letters that always take diacriticals

2004-03-19 Thread Jon Hanna
an Irish person writes an i without a dot, an English person writes it with a dot, or a 12 year old girl penning a valentine card writes it with a heart it is still the letter i. -- Jon Hanna http://www.hackcraft.net/ …it has been truly said that hackers have even more words for equipment failures

Re: Irish dotless I

2004-03-19 Thread Jon Hanna
Fine. I concede that this is the case. Therefore, let's change the underlying form of 0069 to a dotless i and let English speakers change it to a dotted i with the font. I am happy to inform you that the underlying form doesn't have a dot. -- Jon Hanna http://www.hackcraft.net/ …it has

(no subject)

2004-03-18 Thread Jon Hanna
Quoting Marion Gunn [EMAIL PROTECTED]: how to guarantee continuance, in the specific context of Irish text computing, of the traditional restriction of the Irish diacritic dot (having only one single function in Irish) to the consonants to which it belongs? A spell checker. -- Jon Hanna

Re: OT? Languages with letters that always take diacriticals

2004-03-16 Thread Jon Hanna
in The Hunt for Red October or my bad handwriting. I agree with you that the pseudo-Irish script is unsightly, and i is not the most abused, though it does run the risk of being confused with í. However I suspect that a large number are not non-native, but were in fact created here. -- Jon Hanna

Re: OT? Languages with letters that always take diacriticals

2004-03-16 Thread Jon Hanna
can only bring the language-independent ones to mind right now. There is a language-independent decomposition of LATIN CAPITAL LETTER I WITH DOT ABOVE to LATIN CAPITAL LETTER I and COMBINING DOT ABOVE. -- Jon Hanna http://www.hackcraft.net/ …it has been truly said that hackers have even more words
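
The language-independent decomposition mentioned here is the canonical one, so it can be observed with any normalizer. A minimal Python sketch using the standard `unicodedata` module:

```python
import unicodedata

# U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE
dotted_capital_i = "\u0130"

# NFD applies canonical decompositions, which do not depend on language.
decomposed = unicodedata.normalize("NFD", dotted_capital_i)

for ch in decomposed:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```

Recomposing with NFC gives back the single precomposed character.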

Re: &#37;

2004-03-15 Thread Jon Hanna
be safely placed straight into the source. -- Jon Hanna http://www.hackcraft.net/ …it has been truly said that hackers have even more words for equipment failures than Yiddish has for obnoxious people. - jargon.txt

RE: websites

2004-02-24 Thread Jon Hanna
to it as Unicode and Unicode (Big Endian) depending on which of the two pages I viewed. -- Jon Hanna http://www.hackcraft.net/ *Thought provoking quote goes here*

Re: websites

2004-02-23 Thread Jon Hanna
ISO 8859-1 and even a few that get downright confused by anything that isn't ASCII. Who knows, maybe there are even people using them! In any case, browsers that don't support UTF-8 and UTF-16 are now a very small minority. -- Jon Hanna http://www.hackcraft.net/ *Thought provoking quote goes

Re: websites

2004-02-23 Thread Jon Hanna
of this very small minority which don't support UTF-8 _and_ UTF-16 ? Or it might just be that it's relatively hard to mis-identify UTF-16, and hence it doesn't need to be given as a user-override. Have you tested with it? -- Jon Hanna http://www.hackcraft.net/ *Thought provoking quote goes here*

Re: inconsistent behaviour in windows

2004-02-19 Thread Jon Hanna
of sharing data rather than passing the data directly as a parameter. Neither of these are ideal, if something better occurs to me I'll let you know. -- Jon Hanna http://www.hackcraft.net/ *Thought provoking quote goes here*

Re: extracting code page of current locale

2004-02-12 Thread Jon Hanna
) returns the code page of the locale set by setlocale I'm not sure, but GetLocaleInfo seems to allow you to obtain codepage info if you know the locale id. http://msdn.microsoft.com/library/en-us/intl/nls_34rz.asp -- Jon Hanna http://www.hackcraft.net/ *Thought provoking quote goes here*

Re: Astrological symbols

2004-02-05 Thread Jon Hanna
other features which individual astrologers have invented symbols for). Though it has made me think that it would be nice to gloss U+260A ASCENDING NODE with Dragon's Head and U+260B DESCENDING NODE with Dragon's Tail, if only because the terms are so poetic. -- Jon Hanna http://www.hackcraft.net

RE: Panther PUA behavior

2004-02-03 Thread Jon Hanna
dealt with bureaucracies using such a system in the past. It's all become clear now. -- Jon Hanna http://www.hackcraft.net/ *Thought provoking quote goes here*

Re: Unicode forms for internal storage - BOCU-1 speed

2004-01-23 Thread Jon Hanna
By the way, I don't think that there's an official reference that attributes the acronym UTF-9 to any of these encoding forms. I think that if UTF-9 is used it should be agreed by Unicode as being an official unique representation. I refuse to rename my UTF-81920! -- Jon Hanna http

[OT] UTF-81920 was RE: Unicode forms for internal storage - BOCU-1 speed

2004-01-23 Thread Jon Hanna
Quoting Marco Cimarosti [EMAIL PROTECTED]: Jon Hanna wrote: I refuse to rename my UTF-81920! Doug, Shlomi, there's a new one out there! Jon, would you mind describing it? There are two different UTF-81920s (the resultant ambiguity is very much in the spirit of UTF-81920). The first

Re: [OT] UTF-81920 was RE: Unicode forms for internal storage - BOCU-1 speed

2004-01-23 Thread Jon Hanna
Quoting Philippe Verdy [EMAIL PROTECTED]: From: Jon Hanna [EMAIL PROTECTED] Quoting Marco Cimarosti [EMAIL PROTECTED]: Jon Hanna wrote: I refuse to rename my UTF-81920! Doug, Shlomi, there's a new one out there! Jon, would you mind describing it? There are two different

Re: Unicode forms for internal storage

2004-01-21 Thread Jon Hanna
in the 1.1 spec if they appear as character references - so this no longer holds (unless you store them as references or otherwise escaped, which would bring its own issues). -- Jon Hanna http://www.hackcraft.net/ *Thought provoking quote goes here*

Re: Cuneiform Free Variation Selectors

2004-01-20 Thread Jon Hanna
it to be on the safe side. -- Jon Hanna http://www.hackcraft.net/ *Thought provoking quote goes here*

Re: UTF8 locale shell encoding

2004-01-16 Thread Jon Hanna
will be UTF-8 in the default locale. -- Jon Hanna http://www.hackcraft.net/ *Thought provoking quote goes here*

Re: UTF8 locale shell encoding

2004-01-16 Thread Jon Hanna
The Windows name for en_US.UTF8 is English_United States.65001; .65001 will be UTF-8 in the default locale. More on this in the MS documentation for setlocale http://msdn.microsoft.com/library/en-us/vclib/html/_crt_setlocale.2c_._wsetlocale.asp -- Jon Hanna http://www.hackcraft.net/ *Thought

Re: UTF8 locale shell encoding

2004-01-16 Thread Jon Hanna
this is so beyond the names of the locales. -- Jon Hanna http://www.hackcraft.net/ *Thought provoking quote goes here*

Re: Klingon

2004-01-15 Thread Jon Hanna
about having the word ghoti for fish isn't as funny. -- Jon Hanna http://www.hackcraft.net/ *Thought provoking quote goes here*

Re: Klingon

2004-01-15 Thread Jon Hanna
to it in the Klingon lexicon is funny (now if it was spelt ghoti but pronounced fish then it would be silly). -- Jon Hanna http://www.hackcraft.net/ *Thought provoking quote goes here*

[Still OT] RE: UTC vs GMT (was [way OT] Beer measurement...)

2003-08-20 Thread Jon Hanna
I have no idea whether that's the same conference, but in the early 1970s it was also decided that the abbreviation 'GMT' would be deprecated and 'UTC' should be used in its place. ... There are two subtly different definitions of GMT, one which is synonymous with UTC and one which differs from

RE: Hexadecimal never again

2003-08-20 Thread Jon Hanna
From a practical standpoint, I think it is more likely that the base will change rather than the hex characters. After all, digits have been constant for a long time, but the base has changed. Initially it was binary, then it was octal, and now hex arithmetic is common. No, first it was

RE: Hexadecimal never again

2003-08-20 Thread Jon Hanna
Jon I was mostly being tongue in cheek and contrasting that relative to needing new hex digits, a base change was more likely. However, I wasn't saying that a base change is likely. And I was being tongue in cheek (and ignorant of Ethiopian script) in suggesting the use of base 256. However we

RE: Questions on ZWNBS - for line initial holam plus alef

2003-08-14 Thread Jon Hanna
OK, it's safe, but it is a misuse of Unicode. As space plus combining character is a unit in Unicode, it should be treated as a unit by higher level protocols. If higher level protocols are allowed to do arbitrary things within Unicode units, there is no end to the possible confusion. See for

RE: Questions on ZWNBS - for line initial holam plus alef

2003-08-14 Thread Jon Hanna
the solution with SPACE is really tricky due to the special treatment of SPACE notably in HTML, SGML, XML I disagree. There are a few different things that happen with whitespace in such technologies. Some of these only apply to elements that do not allow any character data apart from

RE: Conflicting principles

2003-08-14 Thread Jon Hanna
what code are we talking about that has to work from the positions of the combining marks back to the underlying representation? Such code is not just common and widespread, it is practically ubiquitous. The principle of base characters always coming first is used: Whenever you need to

RE: Questions on ZWNBS - for line initial holam plus alef

2003-08-14 Thread Jon Hanna
3) In attribute values that have a declared type other than CDATA, multiple spaces are compressed to a single space, and leading and trailing spaces are removed. After this is done, there can be no spaces in attributes of type ID, IDREF, ENTITY, NMTOKEN, NOTATION, or enumerated
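
The space handling described in point 3 can be sketched in Python. This is an illustration under stated assumptions, not an XML processor: `normalize_tokenized_attribute` is a hypothetical name, and the earlier normalization steps (expanding references, turning tabs and line breaks into spaces) are assumed to have already run.

```python
def normalize_tokenized_attribute(value: str) -> str:
    """Hypothetical helper for attributes whose declared type is not CDATA.

    Collapses runs of U+0020 to a single space and drops leading and
    trailing spaces. Only U+0020 is treated as a separator here.
    """
    return " ".join(token for token in value.split(" ") if token)


print(repr(normalize_tokenized_attribute("  alpha   beta ")))
```

Because only U+0020 is split on, other characters such as U+00A0 or U+200B pass through untouched.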

RE: ADO, SQL-Server and VB6

2003-08-14 Thread Jon Hanna
I might be able to help. Two questions: 1. How firmly have you tracked down the point at which this conversion happens? 2. What is the datatype in the database? (text BLOB?, ntext BLOB? varchar?)

RE: Questions on ZWNBS - for line initial holam plus alef

2003-08-14 Thread Jon Hanna
The only way to bypass this would be to use entity references to encode the base space needed by the Unicode convention, so this is related to what Unicode defines as a higher level protocol, needed here to bypass the limitations of basic text. However it still creates a problem within CDATA

RE: Display of Isolated Nonspacing Marks (was Re: Questions on ZWNBS...)

2003-08-14 Thread Jon Hanna
(provided that the whitespace normalization algorithm will not include ZWSP in the whitespace sequence and treat it in isolation, something that a conforming HTML or XML processor should not do, as it should unify only sequences of SPACE, TAB, CR, LF, and only according to the context of the

RE: Questions on ZWNBS - for line initial holam plus alef

2003-08-14 Thread Jon Hanna
Of course one is not required to build an actual DOM tree, however XML, HTML and the like are now defined in terms of the DOM, where the text/xml syntax is just a serialization, which is the only place where whitespace normalization is defined (such normalization does not occur at the DOM

RE: Questions on ZWNBS - for line initial holam plus alef

2003-08-14 Thread Jon Hanna
For me the term difficult is inappropriate. In fact it is invalid for interoperability (even though it is valid, not forbidden, for ISO10646/Unicode, as a string fragment for intermediate processing), and such a sequence should not occur in actual documents, out of any external processing

RE: Yerushala(y)im - or Biblical Hebrew

2003-07-23 Thread Jon Hanna
should "should" be taken as giving an obligation or only a recommendation? I like the way that RFCs have a well defined meaning for "should" or "recommended" in certain contexts as defined by RFC 2119. In such contexts these words are taken to mean that, while there might be a valid reason not to do

RE: [OT] French Government Bans the Term 'E-Mail'

2003-07-21 Thread Jon Hanna
eBook, e-mail, eBay, e-money, and all that gunk. I suppose we could do without them. Even Apple's gone weird about it. I don't know what the i in the iLifestyle suite (iChat, iPhoto, iBook, iThis, iThat) means. e-jit, iDiot, iMbecile.

RE: Combining diacriticals and Cyrillic

2003-07-11 Thread Jon Hanna
The Win32 Text APIs (such as TextOut) actually DO support UniScribe transparently on Windows XP... In most applications, this means that the UniScribe support works without requiring explicit calls to the Uniscribe API. And Windows 2000. However, some ways of using the Text APIs will mean that

RE: Deprecated vs. strongly discouraged?

2003-07-09 Thread Jon Hanna
Discouraged = We think this is a bad thing. Strongly discouraged = We think this is a very bad thing. Deprecated = We think this is a bad thing, see no reason to continue using it, and wish it would go away, but it won't so we have to leave it in the standard/spec/table/system/format/programming

RE: UTF-8 to UTF-16LE

2003-07-08 Thread Jon Hanna
According to XML the default encoding scheme is UTF-8. Not strictly true. The default encoding scheme is UTF-8 *or* UTF-16LE *or* UTF-16BE; it's trivial to tell which of these an XML document is in by looking at the first few bytes, as described in Appendix F of the XML Spec
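
Telling these encodings apart from the first few bytes can be sketched as follows. This is a partial Python illustration of the idea behind the autodetection appendix, covering only the UTF-8 and UTF-16 cases discussed here; UCS-4, EBCDIC and the encoding declaration itself are deliberately left out, and `sniff_xml_encoding` is a hypothetical name.

```python
def sniff_xml_encoding(data: bytes) -> str:
    """Hypothetical helper: guess UTF-8 vs UTF-16 from the leading bytes."""
    # Byte order marks come first because they are unambiguous here.
    if data.startswith(b"\xef\xbb\xbf"):
        return "UTF-8"
    if data.startswith(b"\xfe\xff"):
        return "UTF-16BE"
    if data.startswith(b"\xff\xfe"):
        return "UTF-16LE"
    # No BOM: look at how the characters '<' and '?' are laid out.
    if data.startswith(b"\x00<\x00?"):
        return "UTF-16BE"
    if data.startswith(b"<\x00?\x00"):
        return "UTF-16LE"
    # Anything else is treated as UTF-8 in this sketch.
    return "UTF-8"


declaration = '<?xml version="1.0"?><root/>'
for codec in ("utf-8", "utf-16-le", "utf-16-be"):
    print(codec, "->", sniff_xml_encoding(declaration.encode(codec)))
```

A real parser would go on to read the encoding declaration, if present, using the guessed encoding family.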

RE: UTF-8 to UTF-16LE

2003-07-08 Thread Jon Hanna
On Tuesday, July 08, 2003 2:22 PM, Jon Hanna [EMAIL PROTECTED] wrote: According to XML the default encoding scheme is UTF-8. Not strictly true. The default encoding scheme is UTF-8 *or* UTF-16LE *or* UTF-16BE, Wrong also: UTF-16LE and UTF-16BE are not in the default encoding

RE: UTF-8 to UTF-16LE

2003-07-08 Thread Jon Hanna
And cannot in the first few characters (legally), since these must be <?xml . Wrong: the XML declaration is NOT mandatory, only recommended. So an XML document can directly start with its actual content, which may be whitespace, an XML comment (starting with <!--), or the start tag of the root