On 04/06/2013 09:36 AM, William_J_G Overington wrote:
Text is for reading by humans.
QR codes are for reading by computers.
I wondered if it would be possible to have images that could be read by both
humans and computers.
Sure. Just set the error-correction high, and write over the top
On 07/30/2012 02:12 PM, Doug Ewell wrote:
Please, no more conspiracy theories.
Yes. If this goes on, I'll find it impossible to refrain from telling
you all my theories about the ANSI-INCITS 154-1988 (R1999) keyboard.
And nobody wants that.
On 2011-11-23 10:38, Jeremie Hornus wrote:
I was thinking the ID being the code point value itself, and the name
a human readable description of it.
They are both IDs. One is from the range of numbers from 0 to 1114111
(10FFFF base 16), the other is from the range of strings of characters
António MARTINS-Tuválkin wrote:
If the EU can tell Britain that it can't sell eggs by the dozen any
more,
Yesterday I bought a dozen eggs (2 racks of 6, set 2×3) here in Portugal.
This must be an incredibly new regulation.
The Daily Mail isn't as easily available in Portugal. It's one of
, but will not export
UTF-8. That's odd, isn't it? It'll only export UTF-16 (its internal
storage form).
Odd indeed.
Regards,
Jon Hanna
http://www.selkieweb.com/
For a sample, see http://www.uni-mainz.de/~knappen/saudi.gif
Looks like {U+062D, U+20DD}
Yes, it does look like that. But it forms a separate entity,
just like its precedents COPYRIGHT SIGN or SOUND RECORDING
COPYRIGHT SIGN or REGISTERED.
All of which were in existing standards, so
The W3C Character Model does not, or will not since it's not yet a
Recommendation, allow text nodes or attribute values to begin with defective
combining character sequences.
--
Jon Hanna
http://www.hackcraft.net/
What's a false move? Is it very different from a real one?
Quoting Philipp Reichmuth [EMAIL PROTECTED]:
Jon Hanna schrieb:
The W3C Character Model does not, or will not since it's not yet a
Recommendation, allow text nodes or attribute values to begin with
defective
combining character sequences.
What am I supposed to do when I need a black
as the
character U+226F.
By the rules of XML replacing &#x338; with U+226F would mean the document was
no longer well-formed.
So even without an explicit spec saying otherwise the above would be
problematic.
--
Jon Hanna
http://www.hackcraft.net/
it has been truly said that hackers have even more words
typewriter keyboard inventor in their line, but no famous composers.
--
Jon Hanna
http://www.hackcraft.net/
Write a wise saying and your name will live forever - Anonymous
on the sort dialog.
--
Jon Hanna
http://www.hackcraft.net/
It is the most shattering experience of a young man's life when he awakes
and quite reasonably says to himself, 'I will never play The Dane.'
WITH MACRON AND DIAERESIS
U+1E7B LATIN SMALL LETTER U WITH MACRON AND DIAERESIS
If so, would anyone know from where a Windows XP font containing these five
characters could be downloaded?
Arial Unicode has at least some of them.
--
Jon Hanna
http://www.hackcraft.net/
it has been truly said
of
their similar use in O'Reilly Associates publications. But sure, go and look
for examples (not in driver's testing materials - the point there is to
represent what one would see while driving, so they're clearly pictures in that
context).
--
Jon Hanna
http://www.hackcraft.net/
it has been truly
demonstration that the
Gods laugh at all plans.
--
Jon Hanna
http://www.hackcraft.net/
it has been truly said that hackers have even more words for
equipment failures than Yiddish has for obnoxious people. - jargon.txt
is variable.
Are they very variable? I can only think of the one substitution suggested by
Crowley. Are there others, outside of toy decks?
Plain text is going to end up a lot less plain...
--
Jon Hanna
http://www.hackcraft.net/
it has been truly said that hackers have even more words for
equipment
encryption algorithm. http://www.schneier.com/solitaire.html
--
Jon Hanna
http://www.hackcraft.net/
it has been truly said that hackers have even more words for
equipment failures than Yiddish has for obnoxious people. - jargon.txt
they are finalised.
--
Jon Hanna
http://www.hackcraft.net/
it has been truly said that hackers have even more words for
equipment failures than Yiddish has for obnoxious people. - jargon.txt
Quoting Michael Everson [EMAIL PROTECTED]:
At 15:39 +0100 2004-05-21, Jon Hanna wrote:
Were the headers correct?
It is plain text.
HTTP has headers separate to the content (the headers come first and the content
comes next). These headers can contain encoding information and other details
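
As a rough illustration of the point above, the encoding can be read from the Content-Type header without looking at the content at all. This is only a sketch: the function name `charset_from_content_type` is made up for the example, and it leans on Python's standard `email.message.Message` class to parse the header value.

```python
from email.message import Message


def charset_from_content_type(header_value, default=None):
    """Return the charset parameter of a Content-Type header value.

    The result is lower-cased; `default` is returned when the header
    carries no charset parameter.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset(default)


if __name__ == "__main__":
    print(charset_from_content_type("text/plain; charset=UTF-8"))
    print(charset_from_content_type("text/html"))
```

When the header names no charset, the caller has to fall back on something else, which is where server defaults and in-document declarations come into it.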
Quoting [EMAIL PROTECTED] [EMAIL PROTECTED]:
Jon Hanna scripsit:
[T]he default encoding on the server (which really should be utf-8
on www.unicode.org at this stage).
Currently it is, but there are sticky issues: in particular, a default
encoding
overrides information in HTML meta
entirety, to the bottom, not as something starting at the top and continuing
towards the bottom.
In summary, TTB, not T2B, please.
--
Jon Hanna
http://www.hackcraft.net/
it has been truly said that hackers have even more words for
equipment failures than Yiddish has for obnoxious people. - jargon.txt
composed of a BTT
passage, a LTR passage and a TTB passage, but of a single passage which follows
a path which changes through those three directions.
Paths are not a plain text matter.
--
Jon Hanna
http://www.hackcraft.net/
it has been truly said that hackers have even more words for
equipment
+002F SOLIDUS.
[1]Indeed the format of UTF-8 would make it possible to unambiguously encode
any
value up to 0xFF but this exceeds the ISO 10646 codepoint space and it
would break one of UTF-8's design goals in requiring the use of the octet FE.
--
Jon Hanna
http://www.hackcraft.net/
it has
of a custom encoding to do what they want.
If you think of the users of an encoding as a social network then we would
expect something like Metcalfe's or Reed's law to affect it. The bigger the
network the better off they'll be. Unicode has the biggest network.
--
Jon Hanna
http://www.hackcraft.net
for European languages, never mind any others) but those
problems are considerably less than existed previously and ISO-8859-17+ is
always going to be inferior to UTF-8 or UTF-16.
--
Jon Hanna
http://www.hackcraft.net/
it has been truly said that hackers have even more words for
equipment failures
, never mind any other use of that encoding.
Do you really think the same would be true of ISO 8859-17?
--
Jon Hanna
http://www.hackcraft.net/
it has been truly said that hackers have even more words for
equipment failures than Yiddish has for obnoxious people. - jargon.txt
in developing and using a
new 8-bit encoding.
--
Jon Hanna
http://www.hackcraft.net/
it has been truly said that hackers have even more words for
equipment failures than Yiddish has for obnoxious people. - jargon.txt
as possible, anything that gets more than 50% accuracy
should be considered a successful approach in that context.
If the authorities find the author I doubt the robustness of the
content-language heuristic will be top of the list of things they want to
discuss.
--
Jon Hanna
http://www.hackcraft.net
, that brings me back. All those characters that were BASIC keywords
compressed into one octet. How could we have neglected to encode such important
legacy characters? This unnecessarily complicates round-trip conversion between
ZX80s and Unicode.
--
Jon Hanna
http://www.hackcraft.net/
it has been truly
forward to reading it.
--
Jon Hanna
http://www.hackcraft.net/
it has been truly said that hackers have even more words for
equipment failures than Yiddish has for obnoxious people. - jargon.txt
made into mountains.
--
Jon Hanna
http://www.hackcraft.net/
it has been truly said that hackers have even more words for
equipment failures than Yiddish has for obnoxious people. - jargon.txt
This clause is informative.
(...)
The name C# is pronounced C Sharp.
The name C# is written as the LATIN CAPITAL LETTER C (U+0043) followed
by the NUMBER SIGN # (U+000D).
End of informative text.
Gotta love a language with a carriage return in its name :)
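
For anyone who wants to check the code points in the quoted clause, here is a small sketch using Python's standard `unicodedata` module. The helper name `describe` and the `<control>` placeholder are just for this example.

```python
import unicodedata


def describe(ch):
    """Return (code point, character name, general category) for one character."""
    code_point = f"U+{ord(ch):04X}"
    # Control characters have no name in the database, so supply a placeholder.
    name = unicodedata.name(ch, "<control>")
    return code_point, name, unicodedata.category(ch)


if __name__ == "__main__":
    for ch in ("C", "#", "\u000D"):
        print(describe(ch))
```

NUMBER SIGN comes out as U+0023, while U+000D has no name and general category Cc (a control character).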
--
Jon
an Irish person writes an i without a dot, an English person writes it
with a dot, or a 12 year old girl penning a valentine card writes it with a
heart it is still the letter i.
--
Jon Hanna
http://www.hackcraft.net/
it has been truly said that hackers have even more words for
equipment failures
Fine. I concede that this is the case. Therefore, let's change the
underlying
form of 0069 to a dotless i and let English speakers change it to a
dotted
i with the font.
I am happy to inform you that the underlying form doesn't have a dot.
--
Jon Hanna
http://www.hackcraft.net/
it has
Quoting Marion Gunn [EMAIL PROTECTED]:
how to guarantee continuance,
in the specific context of Irish text computing, of the traditional
restriction of the Irish diacritic dot (having only one single function in
Irish) to the consonants to which it belongs?
A spell checker.
--
Jon Hanna
in The Hunt
for Red October or my bad handwriting.
I agree with you that the pseudo-Irish script is unsightly, and i is not the
most abused, though it does run the risk of being confused with í. However I
suspect that a large number are not non-native, but were in fact created
here.
--
Jon Hanna
can only bring the
language-independent ones to mind right now.
There is a language-independent decomposition of LATIN CAPITAL LETTER I WITH DOT
ABOVE to LATIN CAPITAL LETTER I and COMBINING DOT ABOVE.
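
A quick way to see that decomposition, sketched with Python's standard `unicodedata` module (the function name `decompose_names` is only for illustration):

```python
import unicodedata


def decompose_names(text):
    """Return the character names of the canonical decomposition (NFD) of text."""
    return [unicodedata.name(ch) for ch in unicodedata.normalize("NFD", text)]


if __name__ == "__main__":
    # U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE
    print(decompose_names("\u0130"))
```

No locale or language setting is involved in the call, which is what makes it language-independent.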
--
Jon Hanna
http://www.hackcraft.net/
it has been truly said that hackers have even more words
be safely placed
straight into the source.
--
Jon Hanna
http://www.hackcraft.net/
it has been truly said that hackers have even more words for
equipment failures than Yiddish has for obnoxious people. - jargon.txt
to it as Unicode and Unicode (Big Endian) depending on which of
the two pages I viewed.
--
Jon Hanna
http://www.hackcraft.net/
*Thought provoking quote goes here*
ISO 8859-1 and even a few that get downright confused by
anything that isn't ASCII. Who knows, maybe there are even people using them!
In any case, browsers that don't support UTF-8 and UTF-16 are now a very small
minority.
--
Jon Hanna
http://www.hackcraft.net/
*Thought provoking quote goes
of this very small minority which don't support
UTF-8 _and_ UTF-16 ?
Or it might just be that it's relatively hard to mis-identify UTF-16, and hence
it doesn't need to be given as a user-override.
Have you tested with it?
--
Jon Hanna
http://www.hackcraft.net/
*Thought provoking quote goes here*
of
sharing data rather than passing the data directly as a parameter.
Neither of these is ideal; if something better occurs to me I'll let you know.
--
Jon Hanna
http://www.hackcraft.net/
*Thought provoking quote goes here*
) returns the code page of the locale set by
setlocale
I'm not sure, but GetLocaleInfo seems to allow you to obtain codepage info if
you know the locale id.
http://msdn.microsoft.com/library/en-us/intl/nls_34rz.asp
--
Jon Hanna
http://www.hackcraft.net/
*Thought provoking quote goes here*
other features which individual astrologers have invented
symbols for). Though it has made me think that it would be nice to gloss U+260A
ASCENDING NODE with Dragon's Head and U+260B DESCENDING NODE with Dragon's
Tail, if only because the terms are so poetic.
--
Jon Hanna
http://www.hackcraft.net
dealt with bureaucracies using such a system in the past.
It's all become clear now.
--
Jon Hanna
http://www.hackcraft.net/
*Thought provoking quote goes here*
By the way, I don't think that there's an official reference that attributes
the acronym UTF-9 to any of these encoding forms. I think that if UTF-9
is used it should be agreed by Unicode as being an official unique
representation.
I refuse to rename my UTF-81920!
--
Jon Hanna
http
Quoting Marco Cimarosti [EMAIL PROTECTED]:
Jon Hanna wrote:
I refuse to rename my UTF-81920!
Doug, Shlomi, there's a new one out there!
Jon, would you mind describing it?
There are two different UTF-81920s (the resultant ambiguity is very much in the
spirit of UTF-81920).
The first
Quoting Philippe Verdy [EMAIL PROTECTED]:
From: Jon Hanna [EMAIL PROTECTED]
Quoting Marco Cimarosti [EMAIL PROTECTED]:
Jon Hanna wrote:
I refuse to rename my UTF-81920!
Doug, Shlomi, there's a new one out there!
Jon, would you mind describing it?
There are two different
in the 1.1 spec if they appear as character references - so
this no longer holds (unless you store them as references or otherwise escaped,
which would bring its own issues).
--
Jon Hanna
http://www.hackcraft.net/
*Thought provoking quote goes here*
it to be on the
safe side.
--
Jon Hanna
http://www.hackcraft.net/
*Thought provoking quote goes here*
will be UTF-8 in the default locale.
--
Jon Hanna
http://www.hackcraft.net/
*Thought provoking quote goes here*
The Windows name for en_US.UTF8 is English_United States.65001; .65001
will be UTF-8 in the default locale.
More on this at the MS documentation for setlocale
http://msdn.microsoft.com/library/en-us/vclib/html/_crt_setlocale.2c_._wsetlocale.asp
--
Jon Hanna
http://www.hackcraft.net/
*Thought
this is so beyond the names of the locales.
--
Jon Hanna
http://www.hackcraft.net/
*Thought provoking quote goes here*
about having the word ghoti for
fish isn't as funny.
--
Jon Hanna
http://www.hackcraft.net/
*Thought provoking quote goes here*
to it in the Klingon
lexicon is funny (now if it was spelt ghoti but pronounced fish then it
would be silly).
--
Jon Hanna
http://www.hackcraft.net/
*Thought provoking quote goes here*
I have no idea whether that's the same conference, but in the early 1970s
it was also decided that the abbreviation 'GMT' would be deprecated
and 'UTC' should be used in its place. ...
There are two subtly different definitions of GMT, one which is synonymous with UTC
and one which differs from
From a practical standpoint, I think it is more likely that the base will
change rather than the hex characters.
After all, digits have been constant for a long time, but the base has
changed. Initially it was binary, then it was octal, and now hex
arithmetic is
common.
No, first it was
Jon I was mostly being tongue in cheek and contrasting that relative to
needing new hex digits, a base change was more likely. However, I wasn't
saying that a base change is likely.
And I was being tongue in cheek (and ignorant of Ethiopian script) in
suggesting the use of base 256. However we
OK, it's safe, but it is a misuse of Unicode. As space plus combining
character is a unit in Unicode, it should be treated as a unit by higher
level protocols. If higher level protocols are allowed to do arbitrary
things within Unicode units, there is no end to the possible confusion.
See for
the
solution with
SPACE is really tricky due to the special treatment of SPACE notably
in HTML, SGML, XML
I disagree. There are a few different things that happen with whitespace in
such technologies. Some of these only apply to elements that do not allow
any character data apart from
what code are we talking about that has to work from the
positions of the combining marks back to the underlying representation?
Such code is not just common and widespread, it is practically ubiquitous.
The principle of base characters always coming first is used:
Whenever you need to
3) In attribute values that have a declared type other than
CDATA, multiple
spaces are compressed to a single space, and leading and
trailing spaces
are removed. After this is done, there can be no spaces in attributes
of type ID, IDREF, ENTITY, NMTOKEN, NOTATION, or enumerated
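
The space handling described in point 3 can be sketched roughly as follows. This is an illustration under assumptions, not a parser: the function name is invented, and the input is assumed to have already had line ends and literal tabs turned into spaces by the earlier normalization steps, so only U+0020 is touched here.

```python
import re


def normalize_non_cdata_attribute(value):
    """Collapse runs of U+0020 to one space, then drop leading/trailing spaces.

    Assumes earlier steps have already replaced literal tabs and line
    ends with spaces; characters other than U+0020 are left alone.
    """
    collapsed = re.sub(" +", " ", value)
    return collapsed.strip(" ")


if __name__ == "__main__":
    print(repr(normalize_non_cdata_attribute("  one   two  ")))
```

Anything that is not U+0020, such as a no-break space, passes through unchanged.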
I might be able to help. Two questions:
1. How firmly have you tracked down the point at which this conversion
happens?
2. What is the datatype in the database? (text BLOB?, ntext BLOB? varchar?)
The only way to bypass this would be to use entity references to encode
the base space needed by the Unicode convention, so this is related to
what Unicode defines as a higher level protocol, needed here to bypass
the limitations of basic text. However it still creates a problem within
CDATA
(provided that the whitespace normalization algorithm will not
include ZWSP in the whitespace sequence and treat it
in isolation, something that a conforming HTML or XML processor
should not do, as it should unify only sequences of SPACE,
TAB, CR, LF, and only according to the context of the
Of course one is not required to build an actual DOM tree,
however XML, HTML
and the like are now defined in terms of the DOM, where the text/xml syntax is
just a serialization, which is the only place where whitespace
normalization is defined (such normalization does not occur at the DOM
For me the term difficult is inappropriate. In fact it is invalid for
interoperability (even though it is valid, not forbidden, for
ISO10646/Unicode, as a string fragment for intermediate processing),
and such a sequence should not occur in actual documents, out of any
external processing
should should be taken as
giving an obligation or only a recommendation?
I like the way that RFCs have a well defined meaning for should or
recommended in certain contexts as defined by RFC 2119.
In such contexts these words are taken to mean that, while there might be a
valid reason not to do
eBook, e-mail, eBay, e-money, and all that gunk.
I suppose we could do without them. Even Apple's
gone weird about it. I don't know what the i in
the iLifestyle suite (iChat, iPhoto, iBook,
iThis, iThat) means.
e-jit, iDiot, iMbecile.
The Win32 Text APIs (such as TextOut) actually DO support
UniScribe transparently on Windows XP... In most applications,
this means that the UniScribe support works without requiring
explicit calls to the Uniscribe API.
And Windows2000. However some ways of using the Text APIs will mean that
Discouraged = We think this is a bad thing.
Strongly discouraged = We think this is a very bad thing.
Deprecated = We think this is a bad thing, see no reason to continue using it, and
wish it would go away, but it won't so we have to leave it in the
standard/spec/table/system/format/programming
According
to XML the
default encoding scheme is UTF-8.
Not strictly true. The default encoding scheme is UTF-8 *or* UTF-16LE *or*
UTF-16BE; it's trivial to tell which of these an XML document is in by
looking at the first few bytes, as described in Appendix F of the XML Spec
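
To make the "first few bytes" idea concrete, here is a simplified sketch of that kind of sniffing. It is my own cut-down reading of the approach, not the full table from the spec: it ignores UCS-4 and EBCDIC, the function name and returned labels are assumptions, and anything unrecognised is simply treated as UTF-8.

```python
def sniff_xml_encoding(data):
    """Guess the encoding family of an XML document from its first bytes.

    Simplified: handles only the UTF-8 and UTF-16 cases, with or
    without a byte order mark.
    """
    if data.startswith(b"\xef\xbb\xbf"):
        return "UTF-8"       # UTF-8 byte order mark
    if data.startswith(b"\xfe\xff"):
        return "UTF-16BE"    # big-endian byte order mark
    if data.startswith(b"\xff\xfe"):
        return "UTF-16LE"    # little-endian byte order mark
    if data.startswith(b"\x00<\x00?"):
        return "UTF-16BE"    # "<?" with no BOM, big-endian
    if data.startswith(b"<\x00?\x00"):
        return "UTF-16LE"    # "<?" with no BOM, little-endian
    return "UTF-8"           # fallback when nothing else matches


if __name__ == "__main__":
    doc = "<?xml version='1.0'?><a/>"
    print(sniff_xml_encoding(doc.encode("utf-16-le")))
    print(sniff_xml_encoding(doc.encode("utf-8")))
```

The two BOM-less UTF-16 branches only work when the document opens with an XML declaration; a UTF-16 document without one needs the byte order mark.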
On Tuesday, July 08, 2003 2:22 PM, Jon Hanna [EMAIL PROTECTED] wrote:
According
to XML the
default encoding scheme is UTF-8.
Not strictly true. The default encoding scheme is UTF-8 *or*
UTF-16LE *or* UTF-16BE,
Wrong also: UTF-16LE and UTF-16BE are not in the default encoding
And cannot in the first few characters (legally), since these must be
<?xml .
Wrong: the XML declaration is NOT mandatory, only recommended.
So an XML document can directly start with its actual content
which may be whitespace, an XML comment (starting with <!--), or
the start tag of the root