On 18/05/2017 1:58 am, Alastair Houghton via Unicode wrote:
On 18 May 2017, at 07:18, Henri Sivonen via Unicode wrote:
the decision complicates U+FFFD generation when validating UTF-8 by state
machine.
It *really* doesn’t. Even if you’re hell bent on using a pure state
Here's an icosahedral die from the Ptolemaic period:
http://www.metmuseum.org/collection/the-collection-online/search/551070
I find myself idly wondering whether the identities of the characters
are all known and encoded ...
Cheers
—Jonathan
___
Just curious ... Is there a free font anywhere that contains these
characters? Either in the PUA of a Unicode font or encoded somehow in an
8-bit font?
Thanks
Jonathan Coxhead
72 Rock Harbor Ln, Foster City CA 94404
+1-650-430-6564 (m)
On 2010-06-30 9:26 am, Mark Davis ☕ wrote
. Rtl and ltr are only meaningful
in horizontal writing.
But the LTR text would be rotated clockwise, and the RTL anticlockwise,
wouldn't it? So there is *an* interaction---just not the same one.
--
/| Jonathan Coxhead
o o o (_|/
/| Sunnyvale CA
(_/
in the desert.
--
/| Jonathan Coxhead
o o o (_|/
/| Sunnyvale CA USA
(_/
titled UTF-16 should have contents
FE FF 44 37
or
FF FE 37 44
The box titled UTF-32 should have contents
00 00 FE FF
00 00 44 37
or
FF FE 00 00
37 44 00 00
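
The byte sequences above can be checked mechanically. This is only a sketch: the code point U+4437 is inferred from the quoted bytes, and `hexbytes` is a hypothetical helper name, not something from the original message.

```python
import codecs

# Assumption: the character in the boxes is U+4437, inferred from the bytes 44 37.
ch = "\u4437"

def hexbytes(data: bytes) -> str:
    """Format bytes as space-separated upper-case hex pairs."""
    return " ".join(f"{b:02X}" for b in data)

# Prepend the byte order mark explicitly so each byte order is unambiguous.
utf16_be = codecs.BOM_UTF16_BE + ch.encode("utf-16-be")
utf16_le = codecs.BOM_UTF16_LE + ch.encode("utf-16-le")
utf32_be = codecs.BOM_UTF32_BE + ch.encode("utf-32-be")
utf32_le = codecs.BOM_UTF32_LE + ch.encode("utf-32-le")

for label, data in [
    ("UTF-16 (big-endian)", utf16_be),
    ("UTF-16 (little-endian)", utf16_le),
    ("UTF-32 (big-endian)", utf32_be),
    ("UTF-32 (little-endian)", utf32_le),
]:
    print(f"{label}: {hexbytes(data)}")
```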
Cheers ...
--
/| Jonathan Coxhead
o o o (_|/
/|
(_/
Michael, especially given the minimal
examples of usage and justification for encoding provided in the proposal.
Andrew
--
/| Jonathan Coxhead
o o o (_|/
/| Kucinich for President! http://www.kucinich.us
(_/
My take on Cleanicode, the Atomic Theory of Unicode, can be found at
http://www.doves.demon.co.uk/atomic.html. It is very much a software engineer's
view of character coding.
The characters START GROUP and POP DIRECTIONAL FORMATTING are used as
brackets. Yes, it could involve arbitrary
On 22 Oct 2003, at 6:53, John Cowan wrote:
Kent Karlsson scripsit:
All of CR, LF, CRLF, NEL, LS, PS, and EOF(!). (Assuming that the
encoding of the text file is recognised.)
XML 1.0 treats CR, LF, and CRLF as line terminators and reports
them as LF.
XML 1.1 will treat CR, LF,
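
The XML 1.0 behaviour mentioned above (CR, LF, and the CR LF pair all reported as LF) can be sketched in a few lines. The function name `normalize_line_ends` is hypothetical, and the sketch covers only the XML 1.0 rule, not NEL, LS, or PS.

```python
def normalize_line_ends(text: str) -> str:
    """Report CR LF pairs and lone CRs as a single LF, as XML 1.0 does."""
    # Replace the two-character pair first so it does not become two LFs.
    return text.replace("\r\n", "\n").replace("\r", "\n")

print(repr(normalize_line_ends("a\r\nb\rc\nd")))
```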
On 21 Oct 2003, at 12:01, Jill Ramonsky wrote:
I would be more than grateful if someone could point me in the direction
of a DEFINITIVE specification which claims this is not the case, that the
interpretation of \n as anything other than LF may be considered
conformant behaviour.
The C
On 28 Jul 2003, at 16:49, Kenneth Whistler wrote:
Part of the specification of the Unicode normalization algorithm
is idempotency *across* versions, so that addition of new
characters to the standard, which require extensions of the
tables for decomposition, recomposition, and composition
That's a very long-winded way of writing it!
How about this:
#!/usr/bin/perl -pi~ -0777
# program to remove a leading UTF-8 BOM from a file
# works both STDIN - STDOUT and on the spot (with filename as argument)
s/^\xEF\xBB\xBF//s;
which uses perl's -p, -i and -0
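
For comparison, the same idea (drop one leading UTF-8 BOM, leave everything else alone) might look like this in Python. This is a sketch, not a translation of the Perl switches: `strip_utf8_bom` is a hypothetical name, and it works on bytes already read into memory rather than editing a file in place.

```python
import codecs

def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a single leading UTF-8 BOM (EF BB BF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

print(strip_utf8_bom(b"\xef\xbb\xbfhello"))
```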
On 26 Jul 2002, at 23:23, Curtis Clark wrote:
Are you saying that, even though Unicode defines U+0027 as
punctuation, other, I could use it as a glottal stop and create a locale
that would treat it as a letter (and still be Unicode compliant,
whatever that is?).
If my name is
The way these concepts were explained to me was as visual order (the order
as you see it with your eyes, as defined by the writing system) and aural
order (the order you hear it with your ears, as defined by pronunciation of
the spoken language).
Neither of these is more or less logical
On 27 Feb 2002, at 14:42, John Cowan wrote:
[EMAIL PROTECTED] wrote:
There is a point of view that says that *all* such identifiers (for
countries, languages, etc.) should just be randomly generated strings for the
kind of reasons mentioned. (Not that I'm arguing for that.)
It is
I can't resist transcribing the following, which is a quotation from
_Love_and_Sleep_ by John Crowley (Bantam Books, 1994). (It's fiction.)
|There are many Monarchs, and many Princes, but only one Emperor. Rudolf
| II, King in his own right of Hungary and Bohemia, Archduke of Austria,
On 28 Mar 01, at 12:02, Marco Cimarosti wrote:
struct MyWysiwygGlyph
{
    wchar_t GlyphCode;
    int EmbeddingLevel;
};
I think that Roozbeh had something quite similar in mind.
Yes. I was not sure whether that's enough, but after this
It would be very entertaining to do the same job with the ideographs (down
to the radical level) and count the number of atoms. I suspect the resulting
"character set" would contain less than 2000 atoms altogether.
MichKa replied ...
More than just entertaining, one would definitely find
Suzanne M Topping wrote,
In hunting around for negative opinions about Unicode, ...
Markus Scherer wrote,
Let me add one complaint to your list:
Thai is not stored/used in logical order in Unicode.
and Michael Kaplan wrote,
And your suggestion for characters that sort
UTR 19 paragraph D36c(a) contains a reference to 'UTF-32BE' that should read
'UTF-32', I think.
/|
o o o (_|/
/|
(_/
Oh, by the way, if 12 is a dozen and 144 is a gross,
what are 16 and 256?
272
In TeX, the difference is that an EM QUAD (\quad) and an EN QUAD
(\enskip) provide spaces that are legitimate breakpoints for lines within a
paragraph; while EM SPACE, EN SPACE (\enspace) and THIN SPACE (\thinspace)
produce horizontal space that cannot cause a line-break.
My assumption