Somya asked:
I have a Unicode C application. I am using the following macro
to define my strings
with 2-byte-wide characters.
#ifdef UNICODE
#define _T(x) L##x
But I see that the GCC compiler maps 'L' to wchar_t, which is 4 bytes on Linux. I
have used the -fshort-wchar option
on Linux but I
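The width difference Somya ran into can be observed even without writing C: a sketch using Python's ctypes, whose `c_wchar` reflects the platform's wchar_t (the 4-byte result is what GCC gives on Linux unless the C code is built with -fshort-wchar).

```python
# Observe the platform wchar_t width via ctypes (a cross-check only;
# it reflects the default C ABI, not any -fshort-wchar build).
import ctypes

width = ctypes.sizeof(ctypes.c_wchar)
print(width)  # typically 4 on Linux/glibc, 2 on Windows
```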
FA47 is a compatibility character, and would have a compatibility mapping.
Faulty syllogism.
FA47 is a CJK Compatibility character, which means it was encoded
for compatibility purposes -- in this case to cover the round-trip
mapping needed for JIS X 0213.
However, it has a *canonical*
Asmus replied:
On 11/15/2010 2:24 PM, Kenneth Whistler wrote:
FA47 is a compatibility character, and would have a
compatibility mapping.
Faulty syllogism.
Formally correct answer but only because of something of a design flaw
in Unicode. When the type of mapping was decided
Mark Davis wrote:
What are also tricky are the 'almost' supersets, where there are only a few
different characters. Those definitely cause problems because the difference
in data is almost undetectable.
For example, Mark is referring to cases such as ISO 8859-1 and 8859-15.
Those share all
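The near-superset problem Mark describes can be made concrete: a sketch comparing the two encodings byte by byte shows that only eight positions decode differently, which is why mislabeled data is so hard to detect.

```python
# Find the byte positions where ISO 8859-1 and ISO 8859-15 disagree.
diff = [b for b in range(0x100)
        if bytes([b]).decode("latin-1") != bytes([b]).decode("iso8859-15")]
print([f"0x{b:02X}" for b in diff])  # eight positions, 0xA4 among them
```

At 0xA4, for instance, 8859-1 has the currency sign while 8859-15 has the euro sign.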
Nagesh Chigurupati asked:
I have a question regarding some of the contextual rules in RFC5892. For
example the contextual rule in appendix A.4 Greek Lower Numeral Sign
(U+0375), states the following:
If Script(After(cp)) .eq. Greek Then True;
If the Greek Lower Numeral Sign (U+0375) is
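A rough sketch of the A.4 rule being asked about, with one loudly labeled shortcut: Python's stdlib has no Script property, so as a stand-in (an approximation, not the real RFC 5892 rule) it treats a character whose UCD name starts with "GREEK" as Script=Greek.

```python
# Approximate RFC 5892 rule A.4 for U+0375 GREEK LOWER NUMERAL SIGN:
# allowed only if Script(After(cp)) is Greek. The name-prefix test
# below is a stand-in for a real Script property lookup.
import unicodedata

def u0375_context_ok(text, i):
    """True if text[i] is U+0375 and the next character looks Greek."""
    if text[i] != "\u0375" or i + 1 >= len(text):
        return False
    return unicodedata.name(text[i + 1], "").startswith("GREEK")

print(u0375_context_ok("\u0375\u03b1", 0))  # Greek alpha follows -> True
print(u0375_context_ok("\u0375a", 0))       # Latin a follows -> False
```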
Gy. Dobner asked:
But my original question was not how to encode a combining macron in one
more possible way but how to encode a length mark that would display as
something _visually_ _distinguishable_ _from_ _a_ _macron_ (because the
macron is functionally ambiguous and hence unsuitable for
What is the position regarding the 32-bit code point space
above U+10FFFF please?
Does the Unicode Consortium and/or ISO or indeed anyone else
make any claims upon it?
Yes, the claim is that if you use it, you're generating invalid Unicode.
Don't do it, don't contemplate it,
Karl Williamson asked:
The Unicode standard only gives numeric values to rational numbers. Is
the reason for this merely because of the difficulty of representing
irrational ones?
No. Primarily it is because the Unicode Standard is a *character*
encoding standard, and not a standard for
Asmus,
I'm curious if any thought was given to this, and what code points I'm
missing in my analysis.
U+1D452 MATHEMATICAL ITALIC SMALL E (or merely U+0065 LATIN
SMALL LETTER E), also used for Euler's number. See also U+2147.
Now you are confusing Euler's constant - also depicted with
Exploring the dictionary with the search engine (which has been operational
since this morning ...) I discovered two occurrences of an unexplained
abbreviation which refers to a language in which silvir means
silver and ses means six. The name of the language is
abbreviated as Kimr.
Any ideas
I am thinking of where a poet might specify an ending version
of a glyph at the end of the last word on some lines, yet not
on others, for poetic effect. I think that it would be good
if one could specify that in plain text.
Why can't a poet find a poetic means of doing that, instead of
But an approach that abstracts the name, then tries to re-imagine a
representation from scratch is, in my view, very much misguided.
Recall that many of the emojis 1) have changed glyphs quite a lot from
the source glyphs, and 2) are to quite an extent defined from the *source*
That statement is incorrect. The UCA currently specifies that
ill-formed code unit sequences and *noncharacters* are mapped
to [....], but unassigned code points are not.
This is exactly equivalent: if you use strength level 3, they are
both [...], ...
You have
Martin,
In a discussion about a new protocol, there was some issue about how to
replace illegal bytes in UTF-8 with U+FFFD. That let me remember that
there was once a Public Review Issue about this, and that as a result, I
added something to the Ruby (programming language) codebase. I
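The substitution policy under discussion can be seen in any stdlib: a sketch of how CPython's UTF-8 decoder, for one, replaces illegal bytes with U+FFFD.

```python
# CPython's UTF-8 decoder substitutes U+FFFD for illegal bytes when
# asked to (errors="replace"); 0x80 is never a valid lead byte.
bad = b"abc\x80def"
decoded = bad.decode("utf-8", errors="replace")
print(decoded)  # abc\ufffddef
```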
Philippe Verdy said:
Implicit weights for unassigned code points and other characters that
are NOT ill-formed are suboptimal, as noted in the proposed update.
To follow up on Mark's response on this thread...
It should take into account their existing default properties, notably:
[ long
Frédéric Grosshans asked:
Why did you choose the fleur words? The questions about the
accent discussed earlier do not seem to arise here.
I was struck by the issues about space, hyphen (or lack thereof)
and alternate spellings that could be illustrated by that
stretch of topics, so used that as the
Philippe Verdy noted:
Everywhere below, the Unicode property value alias is missing an 'l'.
- In HTML table 1:
Egyp 050 Egyptian hieroglyphs hiéroglyphes égyptiens Egyptian_Hierogyphs 2009-06-01
etc.
These errors in the tables have been corrected by the Registration
C. E. Whitehead said:
I've not gone through many character charts though so I can't
really speak as an expert as you all can; sorry I've not gotten
to more; I will try to ...
For people who wish to pursue this issue further, the relevant
information is neatly summarized in the extracted
Karl Williamson asked:
Subject: Why does EULER CONSTANT not have math property and PLANCK CONSTANT
does?
They are U+2107 and U+210E respectively.
Because U+210E PLANCK CONSTANT is, to quote the standard,
simply a mathematical italic h. It serves as the filler for
the gap in the run of
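Ken's "simply a mathematical italic h" point is recorded in the UCD itself: a small check of the compatibility decomposition of U+210E.

```python
# U+210E PLANCK CONSTANT carries a <font> compatibility decomposition
# to plain U+0068 "h", i.e. it is a styled letter, not a new symbol.
import unicodedata

print(unicodedata.decomposition("\u210e"))  # <font> 0068
```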
Sharma asked:
I have a question about VS characters and the default ignorable property.
TUS 5.2 ch 16.4 clearly states that VS characters are default ignorable.
Ch 5.21 states that default ignorable characters are to be ignored in
rendering (except in specialized modes which show hidden
On this date, Unicode had received proposals for the same purpose
from non-insiders too -- as you know this is true because India
is a nation of over a billion people.
I have seen no other proposals to encode the character, submitted
either to the UTC or to WG2.
Actually, there has
Philippe Verdy said:
A side note about this preliminary proposal for allocating blocks in
the SMP for the two Pau Cin Hau scripts (including one for the large
logographic script, with 1050 signs):
http://std.dkuug.dk/JTC1/SC2/WG2/docs/n3865.pdf
(authored by Anshuman Pandey, in MIT)
If
So what do we do with all these names?
Can't we ask Mark to use a lottery to pick one and go from there? ...
So whaddya say, Mark? Have a go at the roulette wheel?
Ladies and gentlemen... step right up and place your bets!!
Bengali, Bangla, Bengalese, Bangladeshi, Bengalian, Bengalish,
Philippe Verdy said:
If we don't limit the backwards reordering, then all accents in the
full sentences will be reordered, so it is the final word that will
drive the order. Not only is this incorrect,
I understand that you think that the ordering should be done
word-by-word, with the
Philippe Verdy wrote:
Kenneth Whistler k...@sybase.com wrote:
Huh? That is just preprocessing to delete portions of strings
before calculating keys. If you want to do so, be my guest,
but building in arbitrary rules of content suppression into
the UCA algorithm itself is a non-starter
[ snipping all the word breaking discussion, which I am not going
to comment on ... ]
CE Whitehead said:
I collate as follows (note that i' is equivalent to i with accent grave):
(EXAMPLE 1 -- my sort)
di Silva, Fred,
di Silva, John
di Si'lva, Fred
di Si'lva, John
Disilva, Fred
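A sketch of the two-level comparison behind CE Whitehead's list: compare on base letters first (case- and accent-blind), and only then let accents break ties. The names and the key function are illustrative only, not a UCA implementation.

```python
# Two-level sort sketch: level 1 strips accents and case; level 2
# (the tie-break) is the accent-bearing decomposed string itself.
import unicodedata

def sort_key(s):
    decomposed = unicodedata.normalize("NFD", s)
    primary = "".join(c for c in decomposed
                      if not unicodedata.combining(c)).casefold()
    return (primary, decomposed)

names = ["Disilva, Fred", "di Sìlva, John", "di Silva, Fred",
         "di Sìlva, Fred", "di Silva, John"]
for n in sorted(names, key=sort_key):
    print(n)
```

With this key the accented "di Sìlva" entries sort immediately after their unaccented counterparts, and "Disilva" sorts after all of the "di Silva" forms because the space is significant at level 1.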
William Overington asked:
Will the Unicode Standard version 6.0 include mention of
the unification of characters from the emoji set used in
mobile telephones with earlier Unicode characters, also
including a list of those characters of the emoji set
that have been unified and where to
On Fri, 25 Jun 2010, I wrote
Even in the year 2010, the euro sign (¤) doesn't work reliably.
in both the Unicode list and in the newsgroup de.test.
unicode.org shows a euro sign:
http://www.unicode.org/mail-arch/unicode-ml/y2010-m06/0372.html
groups.google.com shows a currency
John - If I define a symbol (variable or constant) named ɸ and some
user types 'φ' or 'ϕ' instead, it won't match.
Can you please post the names for the other two, i.e., 'φ' or 'ϕ'?
John was referring to:
U+0278 LATIN SMALL LETTER PHI
U+03C6 GREEK SMALL LETTER PHI
U+03D5 GREEK PHI SYMBOL
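The three easily confused phi code points John listed can be told apart by their UCD names:

```python
# Print the formal names of the three phi look-alikes.
import unicodedata

for cp in ("\u0278", "\u03c6", "\u03d5"):
    print(f"U+{ord(cp):04X}", unicodedata.name(cp))
```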
Steve,
All of this writing can be encoded using 1280 code points. I
have a 12-bit encoding with bi-directional conversion with UTF-8 working
for planes 1, 15, or 16.
A minor point, but I suggest you not use bi-directional
in that context.
Bidirectional is a term of art in Unicode
On Friday 04 June 2010 08:51:05 am Otto Stolz wrote:
In any case, you have to know the base of every number
you are going to parse. This stems from the fact that
the same digits are used for all number systems.
Luke-Jr replied:
But you first need to know if it is a number or a word.
But again, I'm not talking about programming. My four-year-old can grasp
tonal
just as well as she could decimal had I been teaching that. Now if I were
using the a-f notation, she would be (reasonably) confused as to why *some*
numbers are unique, but *other* numbers are also letters.
Note that as of 1993, the only LAMDA or LAMBDA characters
in the standard were:
039B;GREEK CAPITAL LETTER LAMDA;Lu;0;L;N;GREEK CAPITAL LETTER
LAMBDA;;;03BB;
03BB;GREEK SMALL LETTER LAMDA;Ll;0;L;N;GREEK SMALL LETTER
LAMBDA;;039B;;039B
019B;LATIN SMALL LETTER LAMBDA WITH
I'm not sure how much longer we should continue to wait for Tengwar and
Cirth.
Three words: Squeaky wheel -- grease.
Don't expect this to just happen. The corporate members of
the Unicode Consortium are mostly concerned about economically
significant sets of characters that impact their
John Dlugosz asked:
Why does the code chart call the plain Greek letter (upper and
lower case) LAMDA rather than LAMBDA?
Because ISO 8859-7 called it LAMDA rather than LAMBDA.
Note that Unicode 1.0 called it LAMBDA, but synchronization
of names for Unicode 1.1 (in 1993) was towards ISO
Robert Abel noted:
It seems U+019B is the only instance where lambda is used. All other
instances use lamda. So it seems the slip-up is the other way around,
whatever the initial reasoning for using lamda was.
It was not a slip-up. It was deliberate at the time (1993).
Note that as of
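The deliberate 1993 spelling choice Ken describes is still visible in the character names, which are stable: a quick check against the UCD.

```python
# The Greek letters use LAMDA; U+019B kept the LAMBDA spelling.
import unicodedata

print(unicodedata.name("\u039b"))  # GREEK CAPITAL LETTER LAMDA
print(unicodedata.name("\u03bb"))  # GREEK SMALL LETTER LAMDA
print(unicodedata.name("\u019b"))  # LATIN SMALL LETTER LAMBDA WITH STROKE
```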
Why not? I thought the names of some things have changed
between versions, and other database items have changed substantially.
See Name Stability on the Unicode Character Encoding Stability Policy
page:
http://www.unicode.org/policies/stability_policy.html
--Ken
Names sometimes don't
Lars said:
According to UTC, you need to keep processing
the UNIX filenames as BINARY data. And, also according to UTC, any UTF-8
function is allowed to reject invalid sequences. Basically, you are not
supposed to use strcpy to process filenames.
This is a very misleading set of statements.
Lars Kristan stated:
I said, the choice is yours. My proposal does not prevent you from doing it
your way. You don't need to change anything and it will still work the way
it worked before. OK? I just want 128 codepoints so I can make my own
choice.
You have them: U+EE80..U+EEFF, which are
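The range Ken points to sits inside the BMP Private Use Area (U+E000..U+F8FF), so every code point in it already carries general category Co, as a quick check confirms:

```python
# Verify that U+EE80..U+EEFF are all Private Use (category Co).
import unicodedata

assert all(unicodedata.category(chr(cp)) == "Co"
           for cp in range(0xEE80, 0xEF00))
print("all 128 code points are private use")
```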
If any
criticism was present, it referred to the redundant US- prefix in
US-ASCII, not to Unicode, and even that wasn't really criticism, just my
lack of understanding /why/.
In addition to Doug's historical clarification, you need to
understand this as a perfectly normal linguistic
Tim Greenwood asked:
... a perfectly normal linguistic process of
attributive disambiguation of a term which had grown ambiguous
in usage.
Is that like the 'Please RSVP' that I see all too often? Or should
that not be excused?
*grins* Well, technically, that is not a case of
Philippe,
RSVP is a French acronym for Répondez, s'il vous plaît.
Yes, we know that.
But it is also a reanalyzed English verb which means
reply to a message (or invitation).
That it has been morphologically reanalyzed is demonstrated by the
fact that it takes regular English verb endings, as
Peter Kirk noted:
I was reviewing the Roadmap for the SMP
(http://www.unicode.org/roadmaps/smp/), in comparison with the list of
proposed new scripts, and found a few anomalies.
Hittite (Anatolian) Hieroglyphs/Luvian is listed as a proposed new
script, with a draft proposal, but seems
John Cowan responded:
Storage of UNIX filenames on Windows databases, for example,
^^
O.k., I just quoted this back from the original email, but
it really is a complete misconception of the issue for
databases. Windows databases is a
Marcin asked:
The general trouble is that numeric character references can only
encode individual code points
By design.
rather than graphemes (is this a correct
term for a non-combining code point with a sequence of combining code
points?).
No. The correct term is combining character
Lars responded:
... Whatever the solutions
for representation of corrupt data bytes or uninterpreted data
bytes on conversion to Unicode may be, that is irrelevant to the
concerns on whether an application is using UTF-8 or UTF-16
or UTF-32.
The important fact is that if you have an
Philippe stated, and I need to correct:
UTF-24 already exists as an encoding form (it is identical to UTF-32), if
you just consider that encoding forms just need to be able to represent a
valid code range within a single code unit.
This is false.
Unicode encoding forms exist by virtue of
Philippe continued:
As if Unicode had to be bound by
architectural constraints such as the requirement of representing code units
(which are architectural for a system) only as 16-bit or 32-bit units,
Yes, it does. By definition. In the standard.
ignoring the fact that technologies do
Lars,
I'm going to step in here, because this argument seems to
be generating more heat than light.
I never said it doesn't violate any existing rules. Stating that it does,
doesn't help a bit. Rules can be changed.
I ask you to step back and try to see the big picture.
First, I'm going to
John Cowan clarified the JTC1 process:
The result of a
no vote is that the process loops until all such votes are resolved.
All comments on a formal JTC1 ballot receive a *disposition*.
As far as possible, that disposition is done by committee consensus,
which usually means, in practice, the
Peter,
This was in fact my question: will the amendment be
passed automatically if there is a majority in favour, or does it go
back for further discussion until a consensus is reached? You have
clarified that the latter is true. And I am glad to hear it.
The relevant applicable clauses
Otto Stolz asked:
In German, however, a ligature must not span a syllable break.
How should I code plain text, w.r.t. hyphenation and ligatures?
- Huf + ZWNJ + lattich
- Huf + SYH + lattich
- Huf + SYH + ZWNJ + lattich
- Huf + ZWNJ + SYH + lattich
You should code it as:
Huflattich
Philippe Verdy responded to John Cowan:
From: John Cowan [EMAIL PROTECTED]
the need to encode Dutch
ij as a single character, which is neither necessary nor practical.
(U+0132 and U+0133 are encoded for compatibility only.) In cases where
ij is a digraph in Dutch text, i+ZWNJ+j will be
Mark Davis said (in reference to a long set of comments by
Philippe Verdy on this thread):
The statements below are incorrect
And Philippe asked:
Which statements? My message is mostly to be read as a question, not as an
affirmation...
And I will attempt the fact-finding...
CGJ is a
Michael Norton (a.k.a. Flarn) asked:
What's an ideograph? Also, what's a radical?
Are they the same thing?
No, they aren't.
In the Unicode context, the simplest answer is that
an ideograph or a CJK ideograph is simply to be
taken as a synonym for a Chinese character.
A radical is one of a
John Hudson responded to Jony Rosenne:
The idea that the position of such text on a page -- as a marginal
note -- somehow demotes
it from being text, is particularly nonsensical.
I think you two (Jony and John) are talking at cross-purposes
on this particular point.
The *content* of
Allen Haaheim provided some further detailed clarification:
Note that Han characters are logographic, not ideographic. That is,
they are graphemes that represent words (or at least morphemes),
not ideas.
This correctly states the situation for the normal case for
Chinese characters used
Tim Greenwood asked:
All of the spacing combining marks (general category Mc) except
musical symbols have a canonical combining class of 0. So, for example
0B95 (TAMIL LETTER KA) 0BC7 (TAMIL VOWEL SIGN EE - stands to the left
of the consonant) 0BBE (TAMIL VOWEL SIGN AA - on the right)
Harshal Trivedi asked:
How can I make sure that a UTF-8 string has terminated while
encoding it, as compared to a C program string, which ends with the '\0'
(NUL) character?
You don't need to do anything special at all when using UTF-8
in C programs, as far as string termination goes. UTF-8
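The reason nothing special is needed is a design property of UTF-8: the byte 0x00 never appears inside the encoding of any character other than NUL itself, so '\0' still terminates the string. A small demonstration:

```python
# UTF-8 bytes for non-NUL characters are always >= 0x01 (multibyte
# sequences use only bytes >= 0x80), so no embedded zero bytes occur.
text = "caf\u00e9 \u6f22\u5b57"   # café plus two CJK characters
encoded = text.encode("utf-8")
assert 0 not in encoded            # no stray NUL bytes
print(len(encoded), "bytes, none of them 0x00")
```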
Peter Kirk suggested:
I am suggesting that the best way to get the job done properly is to lay
the conceptual foundation properly first, instead of trying to build a
structure on a foundation which doesn't match...
Part of the problem that I think some people are having here,
including Peter,
Elaine Keown asked:
Supposedly this list has 600 people.
Just out of curiosity, how many of you are NOT font
designers?
And since a number of people are declaring their
backgrounds, I'll chime in, too. ;-)
I am not a font designer, although I have designed fonts
(many years ago) for
Theo,
Further following up from what Mark Davis responded...
Mark Davis wrote:
All comments are reviewed at the next UTC meeting. Due to the volume, we
don't reply to each and every one what the disposition was. If actions were
taken, they are recorded in the minutes of the meetings.
Elaine,
[Feel free to forward this on to the Hebrew lists you
copied on your original inquiry, if you think it appropriate.]
Peter Constable replied on the Unicode list:
Which items? There were three at the June meeting:
- atnah hafukh
- lower dot and nun hafukha
- qamats qatan
Jon Hanna wrote:
imported UTF-8 sequences like [U+0065][U+0303] e, tilde get
remapped
internally to [U+1ebd] LATIN SMALL LETTER E WITH TILDE.
Is this kind of behavior what one would expect?
That's conformant, if it causes problems with any other process (including
other
At 06:04 PM 9/30/2004, Michael Everson wrote:
see no reason given for us not to unify the handwritten symbol we have
seen with BREVE ABOVE.
and Asmus responded:
Functionally, the symbol is not a breve. Visually, the sample does not look
like a standard breve, and the font resource
Kent wrote:
Kenneth Whistler wrote:
Second, there is the question of cursive joining for Arabic.
I don't know anything in the Unicode Standard that states that
a combining enclosing mark breaks cursive ligation. It stands
to reason that it *should*, but I don't know anything
Asmus responded:
It's a simple combining character. Even if you can't do arbitrary circles
around characters, you can take one character sequence and map it to the
glyph in a font. Systems that can't do even that need to be fixed.
In other words, you would like to treat this as a mandatory
Michael Everson responded to Christopher Fynn's question:
At 13:46 +0100 2004-09-19, Christopher Fynn wrote:
So, am I right in assuming that were someone to put together a decent
proposal for one or more shorthand scripts, there is no particular
reason in principle why it would be rejected?
Incidentally, for those interested, the website of the National
Court Reporters Association has a brief history of
shorthand (skewed of course to the English language-based
developments):
http://www.ncraonline.org/about/history/shorthand.shtml
A summary of the development of the Stenograph
Philippe waxed lyrical about the advantages of platform-independent
development:
Isn't Java hiding most of these platform details, by providing unified
support for platform-specific look and feel? Aren't there now many PLAF and
themes manager available with automatic default selection of the
Philippe asked:
http://www.omniglot.com/writing/albanian.htm
shows two historic scripts that have been used to write Albanian (Shqip):
- the Elbasan script in the 18th century, which looks like Old Greek for the
Tosk variant of the language. However there are lots of unique letter forms, and
On 05/09/2004 18:27, John Cowan wrote:
The following links show L-shaped marks, apparently combining
characters, that indicate the change-of-pitch position in Japanese
words written in romaji. Are these novel characters, or can they
be identified with existing Unicode characters? Are
Peter Kirk wrote:
At 11:02 AM 7/13/2004, Peter Kirk wrote:
I was surprised to see that WG2 has accepted a proposal made by the
US National Body to use CGJ to distinguish between Umlaut and Tréma
in German bibliographic data.
And Asmus responded:
You raise some interesting
Peter Kirk continued:
I did read it, but it didn't deal with the issue I was concerned about,
of multiple combining marks. And I was concerned about that issue
because that was the major concern expressed in the earlier discussion
on variation selectors, and presented as the decisive
Subject: Re: Changing UCA primarly weights (bad idea)
Correcting the subject, just because it bugs me...
You are certainly right that this is not a slam-dunk; there are reasons for
and against it. And
Subject: Impotance of diacritics (was: Looking for transcription ...)
^
It's a good thing this discussion of the impotence of diacritics
from bushmanush didn't also mention \/|å.G4ä, and talked about
*tran*scription, instead of *pre*scription, or my spam filter
would
Peter Kirk said:
I made a serious point, not apparently made in the UTR draft, that
diacritic folding may be useful for spam filtering and similar
applications including finding misleading URIs.
This seems like a reasonable point to make and to add to the discussion
of folding in UTR #30.
the versions in the main Greek and
Coptic block (or has it been officially renamed just Greek?)
No, the block name won't be changed, in part because changing
block names is another destabilization in the standard that
really serves nobody well, but mostly because the existing
14 Coptic letters
I like to use the decomposed version of the Unicode characters Đ, đ, Ł and
ł (U+0110, U+0111, U+0141 and U+0142).
For example, d followed by a combining_diacritical_mark should generate
đ (d with stroke).
What combining_diacritical_mark should be used for this case?
As Michael and Clark
Elaine asked:
Quotes below from the SMP .pdf---I can't put the three
quotes below together intelligibly.
Do the quotes mean that the Linear B syllabary and Old
Italic and Ugaritic are already in permanent locations
in the SMP, or do they mean something else?
You should start with the
I have a (hopefully) short question about polytonic Greek support.
Does anyone know what the idea was behind encoding Greek vowel+acute
combinations (without aspirates, etc.) twice: first in the basic
Greek section as vowel+tonos, and a second time in the Greek
Extended section as
Peter Constable wrote,
Don't forget canonical equivalence (I forgot about this as well): the
double-width diacritics have a combining class of 234 rather than 230.
This means that 0251 0361 0302 028A is canonically equivalent to 0251
0302 0361 028A. Therefore, the first (for better or
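The canonical equivalence Peter Constable describes can be checked directly: U+0361 has combining class 234 and U+0302 has 230, so canonical ordering puts the lower class first and the two spellings normalize identically.

```python
# The two orders of U+0361 (ccc 234) and U+0302 (ccc 230) are
# canonically equivalent: normalization reorders them the same way.
import unicodedata

assert unicodedata.combining("\u0361") == 234
assert unicodedata.combining("\u0302") == 230
a = unicodedata.normalize("NFD", "\u0251\u0361\u0302\u028a")
b = unicodedata.normalize("NFD", "\u0251\u0302\u0361\u028a")
print(a == b)  # True
```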
On Jun 11, 2004, at 6:44 AM, Andrew C. West wrote:
Despite the oft-mentioned cutesy Hong Kong race horse names,
idiosyncratic
invented Han ideographs are a negligible component of the encoded CJK
repertoire. In my opinion there are thousands, possibly tens of
thousands, of
Michael,
And now you are answering arguments with irrelevancies.
But the argument in this particular case hinges on a particular,
nonce set of characters.
You use nonce very easily.
Nonce: Occurring, used, or made only once or for a special occasion.
You can, of course, quibble that this
Peter,
There is no consensus that this Phoenician proposal is necessary. I
and others have also put forward several mediating positions e.g.
separate encoding with compatibility decompositions
Which was rejected by Ken for good technical reasons.
I don't remember any technical reasons,
António noted:
Dunno about the others, but Spanish playing card suit symbols are
clearly style variations of U+2660, U+2663, U+2665 and U+2666.
(BTW, am I right in assuming that U+2660, U+2663, U+2665 and U+2666 are the
actual suit symbols, while U+2661, U+2662, U+2664 and U+2667 are
just
Peter Constable responded to Peter Kirk:
From: Peter Kirk [mailto:[EMAIL PROTECTED]
Sent: Friday, May 28, 2004 1:40 PM
Well, I understood the semantic content of a text to be the meaning of
the words...
[Kirk continuing, to provide more context...
, not the indication of which
Dean Snyder parried (and missed):
James Kass wrote at 4:37 PM on Wednesday, May 26, 2004:
Shemayah Phillips of ebionite.org
It has some
differences in representing Hebrew because square script has more
characters (e.g., shin/sin) than Palaeo.
Not a relevant argument - Spanish has more
Peter,
There is no consensus that this Phoenician proposal is necessary. I
and others have also put forward several mediating positions e.g.
separate encoding with compatibility decompositions
Which was rejected by Ken for good technical reasons.
I don't remember any technical
John Cowan asked:
Doug Ewell scripsit:
So is [VIQR] a 7-bit encoding, or a scheme layered on top of ASCII?
It's a scheme layered on top of ASCII
And what is KOI-7?
A true 7-bit encoding for Russian, in which Cyrillic letters (small and
capital respectively) were encoded in
Archaic Greek could be written right-to-left, left-to-right, or boustrophedon.
I'm asking for technical advice as to how such variability in writing
direction streams in the same script can be, and should be, handled in
Unicode, and how it should be dealt with in a Unicode proposal.
TUS
Dean Snyder asked:
Archaic Greek exhibits variable glyph stance, that is, glyphs can be
flipped horizontally or even vertically, usually dependent upon the
direction of the writing stream.
How should variable glyph stance for the same characters in the same
script be dealt with in Unicode
John Hudson asked:
I would
like to know what the presumed purpose of U+2616 and U+2617 is.
In Unicode? To map to JIS X 0213. You need to ask the JSC what *their*
intent was in adding these two characters to the Japanese standard.
Not so. Both sides have four generals: two 'gold' and two
[EMAIL PROTECTED] (James Kass) writes:
And we use language tagging in plain text how?
I seem to remember the Japanese asking that.
It wasn't the Japanese that asked for it.
And I seem to remember
Unicode encoding the Plane 14 tags for that.
Plane 14 language tags were encoded to
Philippe asked:
In fact, any existing
MCW/ASCII-encoded file of Hebrew text is, in fact, also
MCW/Unicode-encoded since the representation of Basic Latin
characters at the character encoding form and character
encoding scheme levels is exactly the same for ASCII as it is
for Unicode:
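Ken's point in a nutshell: for pure-ASCII data the UTF-8 encoding is byte-for-byte identical, so an MCW/ASCII file is already valid UTF-8. A small check (the sample bytes are illustrative ASCII only, not actual MCW text):

```python
# ASCII bytes decode to the same string under both codecs, and
# re-encoding as UTF-8 reproduces the original bytes exactly.
data = b"WBRK)T, B-M#LK"   # illustrative ASCII-only bytes
assert data.decode("ascii") == data.decode("utf-8")
assert data.decode("ascii").encode("utf-8") == data
print("byte-for-byte identical")
```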
Doug asked:
I'm sure this is a dumb question, but why would there be any pages in
non-Unicode charsets on the Unicode Web site?
Legacy, just as for many sites.
The question is whether it makes sense to go back to
older, archived material and:
a. delete it, because it is in Latin-1 or CP
Peter Kirk suggested:
Similarly, I suppose, with the proposed Phoenician script: each
character could be given a compatibility decomposition to the equivalent
Hebrew letter. This implies automatic interleaved collation. Now, while
I don't expect Michael Everson to jump at this suggestion,
Dean continued:
Or (making the missed point explicit):
I attempted to bring this thread back on track yesterday, but
since it seems to have veered off into the ditch again, we
may as well spin our wheels some more, I guess. :-(
If the UTC did consider the potential for large numbers of users
Patrick said:
In this case, I think it's important to be picky because there are
no current Unicoding practices for Phoenician.
You may mean that the Unicode book does not document how Phoenician (or
Paleo-Hebrew) may be encoded. This is not to say that no one is using
Unicode to encode
Ernest indicated:
Whether using variation sequences to separate
Phoenician from Square Hebrew would be daft
would depend upon a number of factors.
How often would both glyph repertoires appear in
the same document?
How frequently would non-Square Hebrew glyphs
be used?
How important