"inverse",
"inverted", "reversed", "rotated" etc. Also the difference between "digraph" and
"ligature", etc.
Although I've searched the FAQ files and the rest
of the unicode.org site, I haven't been able to find this info yet.
John Hudson wrote:
In the Slovak orthography, the lowercase d, l and t are normally written
with the 'apostrophe' form of the accent.
Then why does UnicodeData break them down as (e.g.) 0064 030C rather than
0064 0315?
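For reference, Python's `unicodedata` module reports the same decomposition field as UnicodeData.txt, so this is easy to check. A minimal sketch (results depend on the Unicode version bundled with the local Python):

```python
import unicodedata

# ď, ľ and ť: the Slovak letters usually drawn with an apostrophe-like caron.
for ch in "\u010f\u013e\u0165":
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {unicodedata.decomposition(ch)}")

# The mark UnicodeData actually uses, and the one asked about.
print(unicodedata.name("\u030c"))  # COMBINING CARON
print(unicodedata.name("\u0315"))  # COMBINING COMMA ABOVE RIGHT
```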
Pim Blokland
form, where their appearance depends on the font used.
Pim Blokland
good job.
For instance, the Danish ae (U+00E6) is not designated a ligature, but the
Dutch ij (U+0133) is, even though the a and e are clearly fused
together, while the i and j aren't.
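The difference shows up directly in the character data: U+0133 carries a compatibility decomposition to i + j, while U+00E6 has none. A quick check with Python's `unicodedata`:

```python
import unicodedata

for ch in "\u00e6\u0133":  # æ and ĳ
    decomp = unicodedata.decomposition(ch) or "(none)"
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}  decomposition: {decomp}")

# Compatibility normalization splits the ij ligature but leaves ae alone.
print(unicodedata.normalize("NFKC", "\u0133"))  # ij
print(unicodedata.normalize("NFKC", "\u00e6"))  # æ
```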
Pim Blokland
.
Somebody actually made a SEPARATE entry for ij?
Yes, just as there are separate entries for the Latin E, the Greek E, the
Cyrillic E, etc. Even though those look exactly alike!
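A small Python check of the point about look-alikes: the three capitals below usually render identically, yet they are separate characters and no normalization form merges them.

```python
import unicodedata

# Three visually identical capital letters from three scripts.
lookalikes = ["\u0045", "\u0395", "\u0415"]
for ch in lookalikes:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Even compatibility normalization keeps them apart.
print(len({unicodedata.normalize("NFKC", ch) for ch in lookalikes}))  # 3
```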
Pim Blokland
.
That may be a kerning problem.
Pim Blokland
.
Not sure about other platforms.
Pim Blokland
But then where would it
end?
Pim Blokland
arfegi. However, I don't speak Icelandic,
so I've no idea if this is a combination of two subwords.
Pim Blokland
able to find any occurrences of qj though. Maybe I
was a bit hasty in suggesting qj might be needed as a ligature.
Pim Blokland
mark, allowing the i to be decomposed into U+0131 and U+0307.
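For comparison with the suggestion above: in Unicode as it stands, plain i has no decomposition at all, while capital İ (U+0130) does decompose to I + U+0307. The sketch below checks both; the i → U+0131 U+0307 mapping is only the hypothetical proposal from this post, not real Unicode data.

```python
import unicodedata

# Current behaviour: plain i has no decomposition, so NFD leaves it alone.
print(repr(unicodedata.decomposition("i")))       # ''
print(unicodedata.normalize("NFD", "i") == "i")   # True

# Capital I with dot above, by contrast, decomposes canonically.
print(unicodedata.decomposition("\u0130"))        # 0049 0307

# Hypothetical decomposition proposed in the post (NOT in Unicode).
proposed = "\u0131\u0307"  # DOTLESS I + COMBINING DOT ABOVE
print([unicodedata.name(c) for c in proposed])
print(unicodedata.normalize("NFC", proposed) == "i")  # False: no such composition
```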
Pim Blokland
the font's ligature tables are OK, it's
just as legal to have an fj at, say, U+E70B, as it is to have an fi
at U+FB01.
As long as you don't actually put a 0xE70B character in your text.
This is what Joop meant, I think.
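The distinction is visible in the character properties: U+FB01 is a named compatibility character, while U+E70B lies in the Private Use Area (general category `Co`) and has no standard name or meaning. A quick Python illustration:

```python
import unicodedata

fi = "\ufb01"  # standard LATIN SMALL LIGATURE FI
fj = "\ue70b"  # a Private Use code point; its meaning is font-specific

print(unicodedata.name(fi), unicodedata.decomposition(fi))  # LATIN SMALL LIGATURE FI <compat> 0066 0069
print(unicodedata.category(fj))                             # Co (private use)
print(unicodedata.name(fj, "<no name>"))                    # <no name>
```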
Pim Blokland
arted.
I still think everybody
would benefit if the Unicode Consortium published a list of words like
these, stating exactly what each one means. And if everybody abided by that
list, of course.
Pim Blokland
Jain, Pankaj (MED, TCS) wrote:
But I still wonder why \uFFE2\uFF80\uFF93 is giving an
ndash in HTML.
In HTML? No way! HTML can't interpret a series of hex bytes. Try
&ndash; or &#8211;.
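A likely explanation for the odd values (an assumption, since the program itself isn't shown): U+2013 EN DASH is E2 80 93 in UTF-8, and if those bytes are read as signed 8-bit values (as Java's `byte` is) and widened to 16-bit characters, every byte of 0x80 or above gets sign-extended into U+FFxx. The sketch below reproduces that in Python; `widen_signed` is a hypothetical helper written for illustration.

```python
import html

en_dash = "\u2013"
utf8 = en_dash.encode("utf-8")
print(utf8.hex(" "))  # e2 80 93

def widen_signed(b: int) -> int:
    """Hypothetical: treat b as a signed byte, then widen it to a 16-bit char."""
    signed = b - 256 if b >= 0x80 else b
    return signed & 0xFFFF

widened = [widen_signed(b) for b in utf8]
print(" ".join(f"U+{u:04X}" for u in widened))  # U+FFE2 U+FF80 U+FF93

# Masking with 0xFF recovers the original bytes, which decode to the en dash.
recovered = bytes(u & 0xFF for u in widened).decode("utf-8")
print(recovered == en_dash)  # True

# In HTML source, a character reference is the portable way to write it.
print(html.unescape("&ndash;") == html.unescape("&#8211;") == en_dash)  # True
```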
Pim Blokland
Jain, Pankaj (MED, TCS) wrote:
I modified my program as per your suggestion (modified to
byChunk127),
Sorry, I was much too hasty with my reply. First of all, I should
have written byChunk255. And secondly, solutions like the one
Markus proposes are much better thought out.
My apologies.
Pim
be positioned vertically,
relative to normal text? Etc.
The same goes for other shapes, of course. For instance, what
criteria exist for, when creating a text, choosing between U+25B6,
U+25B8 and U+25BA?
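The formal character names are one of the few hints available; they distinguish size and "pointer" shape, but say nothing about metrics or vertical position. Listing them with Python's `unicodedata`:

```python
import unicodedata

for cp in (0x25B6, 0x25B8, 0x25BA):
    print(f"U+{cp:04X} {unicodedata.name(chr(cp))}")
```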
Are there URLs available which discuss these issues?
Pim Blokland
thickness or vertical
position as the horizontal lines in the box drawings range.
Pim Blokland
character
sets?
I once wrote a conversion routine, under Windows, that can convert
from UCS-2 to any codepage Windows supports and back, by using the
info in the appropriate *.nls files.
If this is what you want, and you can't find it elsewhere, I can
upload those routines somewhere if you like.
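For readers not on Windows, or not wanting to parse *.nls files: Python ships codecs for many Windows code pages, so a round trip can be sketched as below. This swaps in Python's built-in tables rather than the .nls data described above, and `to_codepage`/`from_codepage` are hypothetical helper names.

```python
def to_codepage(text: str, codepage: int) -> bytes:
    # "replace" substitutes '?' for characters the code page lacks.
    return text.encode(f"cp{codepage}", errors="replace")

def from_codepage(data: bytes, codepage: int) -> str:
    return data.decode(f"cp{codepage}")

sample = "caf\u00e9 \u20ac5"
encoded = to_codepage(sample, 1252)
print(encoded)                                 # b'caf\xe9 \x805'
print(from_codepage(encoded, 1252) == sample)  # True
print(to_codepage("\u0416", 1252))             # b'?' (Cyrillic Zhe is not in cp1252)
```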
Pim
conversion do you need? LE byte order to BE and back?
Canonical decomposing? Fallback character substitutions? BOM
insertion? What?
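Three of those operations are small on their own. A Python sketch, assuming UTF-16 input and using only standard-library calls:

```python
import codecs
import unicodedata

text = "\u00e9t\u00e9"  # "été" with precomposed é

# LE -> BE: decode with one byte order, re-encode with the other.
le = text.encode("utf-16-le")
be = le.decode("utf-16-le").encode("utf-16-be")
print(le.hex(), be.hex())  # e9007400e900 00e9007400e9

# Canonical decomposition (NFD): each é becomes e + U+0301 COMBINING ACUTE ACCENT.
nfd = unicodedata.normalize("NFD", text)
print(len(text), len(nfd))  # 3 5

# BOM insertion: prepend U+FEFF encoded in the target byte order.
with_bom = codecs.BOM_UTF16_BE + be
print(with_bom[:2].hex())  # feff
```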
Pim Blokland
to this character and your output buffer, and
you end up with the bytes 0xE4, 0x8C, 0xA1 in your output buffer.
You can then dump this buffer to the file you mentioned.
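For the record, E4 8C A1 is the UTF-8 form of U+4321 (the character in question isn't named in the surviving text, so that code point is inferred from the bytes). A minimal hand-written encoder, shown only to make the bit layout visible; `utf8_encode` is a hypothetical helper and skips validation such as rejecting surrogates:

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point as UTF-8 (sketch; no range or surrogate checks)."""
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

out = utf8_encode(0x4321)
print(out.hex(" "))                        # e4 8c a1
print(out == chr(0x4321).encode("utf-8"))  # True
```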
However, you have said this is not what you want!
So what is it that you do want?
Pim Blokland
codepoints are not fixed, how do you display this
character? I.e., if a certain character is not fixed at U+EF00, you
cannot simply have a text file with the value 0xEF00 in it and
expect it to print.
Pim Blokland
of
characters in TrueType fonts are PostScript names, not HTML names,
so that a character like periodcentered should be addressed as
&middot;. But these are details, details...)
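To make the mismatch concrete: the font-side glyph name and the HTML entity name refer to the same code point, U+00B7. The table below is a hypothetical one-entry excerpt standing in for a full glyph-name list such as the Adobe Glyph List; `name2codepoint` is Python's real table of HTML entity names.

```python
from html.entities import name2codepoint

# Hypothetical one-entry excerpt of a PostScript glyph-name table.
GLYPH_NAME_TO_CODEPOINT = {"periodcentered": 0x00B7}

cp = GLYPH_NAME_TO_CODEPOINT["periodcentered"]
print(f"U+{cp:04X}")                   # U+00B7
print(cp == name2codepoint["middot"])  # True
print(f"&middot; or &#{cp};")          # &middot; or &#183;
```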
Pim Blokland
P.S.
, new codepoints were
introduced for the archaic koppa, in order to show both variants
and avoid confusion.
Pim Blokland
other characters) without
needing a long list of ENTITY entries in the XML.
Anyone else think this would be a good idea?
Pim Blokland
is
willing to rewrite an Internet browser, then I'll be happy to
continue this discussion by private e-mail...
Pim Blokland
glyph variants?
Pim Blokland
latin crosses
(U+271D).
Pim Blokland
or
guidelines for the situation where two files are joined, and the
second one has a BOM, but the first one hasn't. Should the resulting
file have a BOM? I.e., should a BOM be added to what were the contents
of the first file?
Pim Blokland
Note on the COPY command: some versions of Windows seem to be
BOM-aware; at least Windows 2000, when concatenating two text
files, removes the second file's BOM.
Pim Blokland
.
- If one of the files has a BOM, assume all files have the same byte
order as that one.
- Or indeed, check out the contents.
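The two strategies above can be sketched in Python. The zero-byte count is an assumed heuristic that only works for text dominated by characters below U+0100; `guess_utf16_order` and `guess_all` are hypothetical names.

```python
import codecs
from typing import List, Optional

def guess_utf16_order(data: bytes) -> Optional[str]:
    """Return 'le' or 'be' from a BOM, else guess from zero-byte positions."""
    if data.startswith(codecs.BOM_UTF16_LE):
        return "le"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "be"
    # Assumed heuristic: for mostly-Latin text the high byte of each code
    # unit is 0x00, which sits at odd offsets in LE and even offsets in BE.
    even_zeros = data[0::2].count(0)
    odd_zeros = data[1::2].count(0)
    if odd_zeros > even_zeros:
        return "le"
    if even_zeros > odd_zeros:
        return "be"
    return None  # undecidable from the contents alone

def guess_all(files: List[bytes]) -> List[Optional[str]]:
    """If any file has a BOM, assume all files share its byte order."""
    for data in files:
        if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
            return [guess_utf16_order(data)] * len(files)
    return [guess_utf16_order(data) for data in files]

files = ["Hi".encode("utf-16-le"), codecs.BOM_UTF16_LE + "there".encode("utf-16-le")]
print(guess_all(files))                              # ['le', 'le']
print(guess_utf16_order("Hi".encode("utf-16-be")))   # be
```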
Pim Blokland
, removing the BOM that would end up somewhere in the
middle is the natural thing to do, just as removing the EOF marker
at the end of the first file is.
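A minimal sketch of BOM-aware concatenation, assuming all inputs share one encoding and byte order: the first file is kept as-is (with or without its BOM) and the BOMs of later files are stripped so none lands mid-stream. Whether to add a BOM when the first file lacks one is left open here, as in the question above; `strip_bom` and `concatenate` are hypothetical names.

```python
import codecs

# BOMs recognized here (UTF-8 and UTF-16 only, an assumption for brevity).
BOMS = (codecs.BOM_UTF8, codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)

def strip_bom(data: bytes) -> bytes:
    """Remove one leading BOM, if present."""
    for bom in BOMS:
        if data.startswith(bom):
            return data[len(bom):]
    return data

def concatenate(first: bytes, *rest: bytes) -> bytes:
    """Keep the first file unchanged; drop the BOMs of the files that follow."""
    return first + b"".join(strip_bom(part) for part in rest)

a = "A".encode("utf-16-le")                        # no BOM
b = codecs.BOM_UTF16_LE + "B".encode("utf-16-le")  # with BOM
print(concatenate(a, b) == "AB".encode("utf-16-le"))  # True
```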
I'm not going into the implementation part; just pointing out that
this issue is not something an operating system can ignore.
Pim Blokland
would be easy; there would be only two variants,
UTF-24LE and UTF-24BE, and that's it. No juggling with bits like in
UTF-8 and UTF-16 or anything complicated like that. Just the plain
character values, just like in UTF-32, but needing only 75% of the
storage.
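To illustrate the proposal: "UTF-24" is not a standard encoding, so the codec below is purely hypothetical. It stores each code point in exactly three bytes in the chosen byte order; since U+10FFFF needs only 21 bits, three bytes always suffice.

```python
def encode_utf24(text: str, byteorder: str = "little") -> bytes:
    """Hypothetical UTF-24: every code point stored in exactly 3 bytes."""
    return b"".join(ord(ch).to_bytes(3, byteorder) for ch in text)

def decode_utf24(data: bytes, byteorder: str = "little") -> str:
    if len(data) % 3:
        raise ValueError("length is not a multiple of 3")
    return "".join(chr(int.from_bytes(data[i:i + 3], byteorder))
                   for i in range(0, len(data), 3))

sample = "A\u20ac\U0001F600"
le = encode_utf24(sample)
print(le.hex(" "))                                                 # 41 00 00 ac 20 00 00 f6 01
print(len(le), len(sample.encode("utf-32-le")))                    # 9 12
print(decode_utf24(encode_utf24(sample, "big"), "big") == sample)  # True
```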
Comments anyone?
Pim Blokland
that characters such as these are ALWAYS frowned
upon in plain text, and are only meant to be used in specialized
environments such as scientific formula editors?
Pim Blokland
, bijectie).
Pim Blokland
Peter Constable wrote:
Whatever happened to CGJ?
Too new, probably.
People (and software applications) aren't used to this one yet.
Pim Blokland
citations would be
entered ''like this'' instead of "like this". That's two characters
each.
And there is of course the colloquial habit of speaking the words
quote and unquote to delimit a citation. These words are 5 and
7 characters long, respectively.
Pim Blokland
; it is identical to
Unicode (that is, the words ISO 10646 and Unicode are
interchangeable); it is a paper describing a standard.
So where can I find the formal definition, how can I tell that it
is the formal definition, and why doesn't everybody agree?
Pim Blokland
no
control over what font makers make their fonts look like.
Pim Blokland
coding, just as using greek Iotas or combinations of U+2160
and U+0049 would be.
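One way to see why such combinations amount to "coding": under compatibility normalization (NFKC) the Roman numeral characters fold back into plain Latin letters. A short Python check:

```python
import unicodedata

roman_one = "\u2160"  # ROMAN NUMERAL ONE
print(unicodedata.name(roman_one))                          # ROMAN NUMERAL ONE
print(unicodedata.decomposition(roman_one))                 # <compat> 0049
print(unicodedata.normalize("NFKC", roman_one + "\u0049"))  # II
print(unicodedata.normalize("NFKC", "\u2161"))              # II (ROMAN NUMERAL TWO)
```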
Pim Blokland
and dotless j and a single dot above,
centered between them. Can you give examples?
Pim Blokland
, but now that it's there, there's no reason to ignore it when
refining the rules and so practically deprecate it.
Pim Blokland
something?
Pim Blokland
bit!
Pim Blokland
-
no more messing around with felt tip pens, just a simple, clean,
hardware solution. People would love that. Or is that just me?
Pim Blokland
probably disagree, saying that it's alright with Dutch
being another language and such, and as we're extremely off topic by
now, I propose not continuing this thread on this list.
If you do want to argue, you know my email address.
Pim Blokland
which was
intended to ensure clean typography. Sigh.
I have absolutely no idea what you are talking about.
Pim Blokland
it a letter rather than a symbol. I'd expect that
if it was put in for completeness, to complement degree
Fahrenheit and degree Celsius, it would have had the same category
as those two?
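Assuming the character under discussion is U+212A KELVIN SIGN (the surviving text doesn't name it), the category mismatch is easy to see:

```python
import unicodedata

for cp in (0x2103, 0x2109, 0x212A):  # DEGREE CELSIUS, DEGREE FAHRENHEIT, KELVIN SIGN
    ch = chr(cp)
    print(f"U+{cp:04X} {unicodedata.name(ch)}: category {unicodedata.category(ch)}")

# Kelvin's canonical singleton decomposition means NFC turns it into plain K.
print(unicodedata.normalize("NFC", "\u212a"))  # K
```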
Pim Blokland
metres...
4.28 actually.
But are you serious about lengthening the yard to be the same size
as the meter?
Ha! Fat chance! You might as well suggest we abolish the yard
altogether!
Pim Blokland
depending on which mile one is talking about!
Well, not all of those measurements are the same size. Disregarding
nautical miles, there's still the matter of the yards. Did I mention
that my front yard is not the same size as my back yard?
Pim Blokland
; if so, where?
Pim Blokland
, currency and
language codes free of charge.
Pim Blokland
does Unicode support?)
Pim Blokland
this into UTF-16 would yield U+DB7A U+DC0D, which is what
you got in your output.
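The pair can be checked with the standard UTF-16 surrogate arithmetic; `to_surrogates` and `from_surrogates` are hypothetical names for a minimal sketch:

```python
from typing import Tuple

def to_surrogates(cp: int) -> Tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into a UTF-16 pair."""
    v = cp - 0x10000
    return 0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)

def from_surrogates(hi: int, lo: int) -> int:
    """Combine a high and low surrogate back into one code point."""
    return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)

cp = from_surrogates(0xDB7A, 0xDC0D)
print(f"U+{cp:X}")                                 # U+EE80D
print([f"U+{u:04X}" for u in to_surrogates(cp)])   # ['U+DB7A', 'U+DC0D']
print(chr(cp).encode("utf-16-be").hex())           # db7adc0d
```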
Pim Blokland
No-Break Space].
Is this something that has slipped by the editors? Or am I missing
something?
Pim Blokland
54 matches
Mail list logo