Looking for information on the UnicodeData file

2003-03-05 Thread Pim Blokland
, "inverse", "inverted", "reversed", "rotated" etc. Also the difference between "digraph" and "ligature", etc. Although I've searched the FAQ files and the rest of the unicode.org site, I haven't been able to find this info as yet. T

Re: Caron / Hacek?

2003-03-05 Thread Pim Blokland
John Hudson wrote: In the Slovak orthography, the lowercase d, l and t are normally written with the 'apostrophe' form of the accent. Then why does UnicodeData break them down as (e.g.) 0064 030C rather than 0064 0315? Pim Blokland
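
The decomposition quoted above can be checked against the Unicode Character Database. The following is a minimal sketch, assuming Python's standard `unicodedata` module; the helper name `decomposition_of` is only illustrative and is not taken from the thread.

```python
import unicodedata

def decomposition_of(ch: str) -> str:
    """Return the decomposition field that UnicodeData records for one character."""
    return unicodedata.decomposition(ch)

# Slovak d, l and t with caron: U+010F, U+013E, U+0165
for ch in ("\u010f", "\u013e", "\u0165"):
    print(f"U+{ord(ch):04X} -> {decomposition_of(ch)}")

# NFD normalization applies the same canonical mapping
assert unicodedata.normalize("NFD", "\u010f") == "d\u030c"
```

All three letters decompose to the base letter followed by U+030C COMBINING CARON, whatever shape a font gives the accent.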

Re: Caron / Hacek?

2003-03-07 Thread Pim Blokland
form, where their appearance depends on the font used. Pim Blokland

Re: FAQ entry (was: Looking for information on the UnicodeData file)

2003-03-07 Thread Pim Blokland
good job. For instance, the Danish ae (U+00E6) is not designated a ligature, but the Dutch ij (U+0133) is, even though the a and e are clearly fused together, while the i and j aren't. Pim Blokland

Re: FAQ entry

2003-03-07 Thread Pim Blokland
. Somebody actually made a SEPARATE entry for ij? Yes, just as there are separate entries for the Latin E, the Greek E, the Cyrillic E, etc. Even though those look exactly alike! Pim Blokland

Re: FAQ entry (was: Looking for information on the UnicodeData file)

2003-03-07 Thread Pim Blokland
. That may be a kerning problem. Pim Blokland

Re: Caron / Hacek?

2003-03-07 Thread Pim Blokland
. Not sure about other platforms. Pim Blokland

Ligatures (was: FAQ entry)

2003-03-09 Thread Pim Blokland
. But then where would it end? Pim Blokland

Re: Ligatures (was: FAQ entry)

2003-03-10 Thread Pim Blokland
arfegi. However I don't speak Icelandic, so I've no idea if this is a combination of two subwords. Pim Blokland

Re: Ligatures (was: FAQ entry)

2003-03-10 Thread Pim Blokland
able to find any occurrences of qj though. Maybe I was a bit hasty in suggesting qj might be needed as a ligature. Pim Blokland

Re: Ligatures (was: FAQ entry)

2003-03-10 Thread Pim Blokland
mark, allowing the i to be decomposed into U+0131 and U+0307. Pim Blokland

Re: Ligatures (qj)

2003-03-11 Thread Pim Blokland
the font's ligature tables are OK, it's just as legal to have an fj at, say, U+E70B, as it is to have an fi at U+FB01. As long as you don't actually put a 0xE70B character in your text. This is what Joop meant, I think. Pim Blokland

Re: Ligatures

2003-03-11 Thread Pim Blokland
arted. I still think everybody would benefit if the Unicode Consortium published a list of words like these and what exactly is meant by them. And if everybody would abide by that list, of course. Pim Blokland

Re: Unicode character transformation through XSLT

2003-03-11 Thread Pim Blokland
Jain, Pankaj (MED, TCS) wrote: But still I have a doubt that why \uFFE2\uFF80\uFF93 is giving ndash in html. In html? No way! Html can't interpret series of hex bytes. Try &ndash; or &#8211;. Pim Blokland
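
One possible reading of the values quoted above, offered only as an assumption: the low bytes E2 80 93 are the UTF-8 encoding of U+2013 EN DASH, and the leading FF could come from sign-extending each byte to 16 bits. A minimal Python sketch under that assumption (the helper `strip_sign_extension` is hypothetical, not from the thread):

```python
EN_DASH = "\u2013"

# The UTF-8 encoding of the en dash is the three bytes E2 80 93.
utf8_bytes = EN_DASH.encode("utf-8")

def strip_sign_extension(units):
    """Keep only the low byte of each 16-bit unit (assumes the high FF is sign extension)."""
    return bytes(u & 0xFF for u in units)

garbled = [0xFFE2, 0xFF80, 0xFF93]   # the values quoted in the message
recovered = strip_sign_extension(garbled).decode("utf-8")

print(utf8_bytes.hex(" "), repr(recovered))
```

If the assumption holds, the three units carry one character, not three, and the numeric character reference for it is 8211 (0x2013).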

Re: Unicode character transformation through XSLT

2003-03-13 Thread Pim Blokland
Jain, Pankaj (MED, TCS) wrote: I modified my program as per your suggestion(modified to byChunk127) , Sorry, I was much too hasty with my reply. First of all, I should have written byChunk255. And secondly, solutions like the one Markus proposes are much better thought out. My apologies. Pim

geometric shapes

2003-03-13 Thread Pim Blokland
be positioned vertically, relative to normal text? Etc. The same goes for other shapes, of course. For instance, what criteria exist for, when creating a text, choosing between U+25B6, U+25B8 and U+25BA? Are there URLs available which discuss these issues? Pim Blokland

Re: geometric shapes

2003-03-13 Thread Pim Blokland
thickness or vertical position as the horizontal lines in the box drawings range. Pim Blokland

Re: Need encoding conversion routines

2003-03-14 Thread Pim Blokland
character sets? I once wrote a conversion routine, under Windows, that can convert from UCS-2 to any codepage Windows supports and back, by using the info in the appropriate *.nls files. If this is what you want, and you can't find it elsewhere, I can upload those routines somewhere if you like. Pim

Re: Need encoding conversion routines

2003-03-14 Thread Pim Blokland
conversion do you need? LE byte order to BE and back? Canonical decomposing? Fallback character substitutions? BOM insertion? What? Pim Blokland

Re: Need encoding conversion routines

2003-03-14 Thread Pim Blokland
to this character and your output buffer, and you end up with the bytes 0xE4, 0x8C, 0xA1 in your output buffer. You can then dump this buffer to the file you mentioned. However, you have said this is not what you want! So what is it that you do want? Pim Blokland
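
The bytes 0xE4, 0x8C, 0xA1 are the UTF-8 form of the code point U+4321. A minimal sketch of the three-byte UTF-8 bit layout in Python follows; the function name `encode_utf8_3byte` is illustrative only and is not the routine discussed in the thread.

```python
def encode_utf8_3byte(cp: int) -> bytes:
    """Encode a code point in the range U+0800..U+FFFF as three UTF-8 bytes."""
    if not (0x0800 <= cp <= 0xFFFF) or (0xD800 <= cp <= 0xDFFF):
        raise ValueError("not a three-byte UTF-8 code point")
    return bytes([
        0xE0 | (cp >> 12),          # 1110xxxx: top 4 bits
        0x80 | ((cp >> 6) & 0x3F),  # 10xxxxxx: middle 6 bits
        0x80 | (cp & 0x3F),         # 10xxxxxx: low 6 bits
    ])

encoded = encode_utf8_3byte(0x4321)
print(encoded.hex(" "))

# The built-in codec agrees with the manual bit layout.
assert encoded == "\u4321".encode("utf-8")
```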

Custom fonts (was: Tolkien wanta-be)

2003-03-15 Thread Pim Blokland
codepoints are not fixed, how do you display this character? I.E. if a certain character is not fixed at U+EF00, you cannot simply have a text file with the value 0xEF00 in it and expect it to print. Pim Blokland

Re: Custom fonts (was: Tolkien wanta-be)

2003-03-16 Thread Pim Blokland
of characters in TrueType fonts are PostScript names, not HTML names, so that a character like periodcentered should be addressed as &middot;. But these are details, details...) Pim Blokland P.S.

Re: U+00D0, U+01b7 -- variants or distinct chars?

2003-03-18 Thread Pim Blokland
, new codepoints were introduced for the archaic koppa, in order to show both variants and avoid confusion. Pim Blokland

Re: Custom fonts (was: Tolkien wanta-be)

2003-03-18 Thread Pim Blokland
other characters) without needing a long list of ENTITY entries in the XML. Anyone else think this would be a good idea? Pim Blokland

Re: Custom fonts

2003-03-19 Thread Pim Blokland
is willing to rewrite an Internet browser, then I'll be happy to continue this discussion by private e-mail... Pim Blokland

Re: Help needed with Davanagari glyph

2003-03-21 Thread Pim Blokland
glyph variants? Pim Blokland

Re: Which cross is this?

2003-03-22 Thread Pim Blokland
latin crosses (U+271D). Pim Blokland

Re: Several BOMs in the same file

2003-03-23 Thread Pim Blokland
or guidelines for the situation where two files are joined, and the second one has a BOM, but the first one hasn't. Should the resulting file have a BOM? I.E. should a BOM be added to what was the contents of the first file? Pim Blokland

Re: Several BOMs in the same file

2003-03-23 Thread Pim Blokland
Note on the COPY command: it seems some versions of Windows are BOM-aware; at least Windows 2000, when concatenating two text files, does remove the second's BOM. Pim Blokland

Re: Several BOMs in the same file

2003-03-23 Thread Pim Blokland
. - If one of the files has a BOM, assume all files have the same byte order as that one. - Or indeed, check out the contents. Pim Blokland

Re: Several BOMs in the same file

2003-03-25 Thread Pim Blokland
, removing the BOM that would end up somewhere in the middle is the natural thing to do, just as removing the EOF marker at the end of the first file is. I'm not going into the implementation part; just pointing out that this issue is not something an operating system can ignore. Pim Blokland
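
The behaviour discussed in this thread, dropping a BOM that would otherwise land in the middle of the joined file, can be sketched as follows. This is a minimal illustration in Python, not the COPY command's actual implementation; the function names are hypothetical, and it assumes all parts share one encoding and ignores UTF-32, whose little-endian BOM begins with the same two bytes as the UTF-16LE one.

```python
import codecs

# BOMs recognised by this sketch (UTF-32 deliberately left out, see above).
BOMS = (codecs.BOM_UTF8, codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)

def strip_bom(data: bytes) -> bytes:
    """Remove one leading BOM, if present."""
    for bom in BOMS:
        if data.startswith(bom):
            return data[len(bom):]
    return data

def concatenate(parts) -> bytes:
    """Join byte strings, keeping only the first part's BOM (if it has one)."""
    parts = list(parts)
    if not parts:
        return b""
    return parts[0] + b"".join(strip_bom(p) for p in parts[1:])

first = codecs.BOM_UTF16_LE + "a".encode("utf-16-le")
second = codecs.BOM_UTF16_LE + "b".encode("utf-16-le")
print(concatenate([first, second]).hex(" "))
```

The sketch leaves open the question raised earlier in the thread: whether a BOM should be added when only the second file has one.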

UTF-24

2003-04-03 Thread Pim Blokland
would be easy; there would be only two variants, UTF-24LE and UTF-24BE, and that's it. No juggling with bits like in UTF-8 and UTF-16 or anything complicated like that. Just the plain character values, just like in UTF-32, only with 75% of the storage needed. Comments anyone? Pim Blokland
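
The proposal can be illustrated with a short sketch. UTF-24 is not a Unicode encoding form; the functions below are hypothetical and simply store each code point as three bytes in the chosen byte order, as the message describes.

```python
def encode_utf24(text: str, byteorder: str = "big") -> bytes:
    """Store every code point as exactly three bytes ('big' = BE, 'little' = LE)."""
    return b"".join(ord(ch).to_bytes(3, byteorder) for ch in text)

def decode_utf24(data: bytes, byteorder: str = "big") -> str:
    """Inverse of encode_utf24; rejects input that is not a multiple of 3 bytes."""
    if len(data) % 3:
        raise ValueError("length is not a multiple of 3 bytes")
    return "".join(
        chr(int.from_bytes(data[i:i + 3], byteorder))
        for i in range(0, len(data), 3)
    )

sample = "A\u00e6\U0001F600"
packed = encode_utf24(sample, "little")
assert decode_utf24(packed, "little") == sample

# Three bytes per code point, against four in UTF-32: 75% of the storage.
print(len(packed), len(sample.encode("utf-32-le")))
```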

Re: Exciting new software release!

2003-04-03 Thread Pim Blokland
that characters such as these are ALWAYS frowned upon in plain text; are only meant to be used in specialized environments such as scientific formula editors? Pim Blokland

Re: Dutch IJ, again

2003-05-28 Thread Pim Blokland
, bijectie). Pim Blokland

Re: Dutch IJ, again

2003-05-29 Thread Pim Blokland
Peter Constable wrote: Whatever happened to CGJ? Too new, probably. People (and software applications) aren't used to this one yet. Pim Blokland

Re: book end or enclosing characters in most languages?

2003-05-30 Thread Pim Blokland
citations would be entered ''like this'' instead of like this. That's two characters each. And there is of course the colloquial habit of speaking the words quote and unquote to delimit a citation. These words make up 5 and 7 characters, respectively. Pim Blokland

Stupid question: ISO 10646

2003-06-03 Thread Pim Blokland
; it is identical to Unicode (that is, the words ISO 10646 and Unicode are interchangeable); it is a paper describing a standard. So where can I find the formal definition and how can I tell that is the formal definition and why doesn't everybody agree? Pim Blokland

Re: Caron / Hacek?

2003-06-12 Thread Pim Blokland
no control over what font makers make their fonts look like. Pim Blokland

Re: Roman numerals in non-latin text

2003-06-12 Thread Pim Blokland
coding, just as using greek Iotas or combinations of U+2160 and U+0049 would be. Pim Blokland

Accented ij ligatures (was: Unicode Public Review Issues update)

2003-06-30 Thread Pim Blokland
and dotless j and a single dot above, centered between them. Can you give examples? Pim Blokland

Re: Accented ij ligatures (was: Unicode Public Review Issues update)

2003-07-01 Thread Pim Blokland
, but now it's there, there's no reason to ignore it when refining the rules, to deprecate it practically. Pim Blokland

Re: [OT] When is a character a currency sign?

2003-07-08 Thread Pim Blokland
something? Pim Blokland

Re: [OT] French Government Bans the Term 'E-Mail'

2003-07-21 Thread Pim Blokland
bit! Pim Blokland

Re: [OT?] LCD/LED Keyboard

2003-07-25 Thread Pim Blokland
- no more messing around with felt tip pens, just a simple, clean, hardware solution. People would love that. Or is that just me? Pim Blokland

Re: Handwritten EURO sign (off topic?)

2003-08-10 Thread Pim Blokland
probably disagree, saying that it's alright with Dutch being another language and such, and as we're extremely off topic by now, I propose not continuing this thread on this list. If you do want to argue, you know my email address. Pim Blokland

Re: Handwritten EURO sign

2003-08-14 Thread Pim Blokland
which was intended to ensure clean typography. Sigh. I have absolutely no idea what you are talking about. Pim Blokland

Re: Hexadecimal

2003-08-16 Thread Pim Blokland
it a letter rather than a symbol. I'd expect if it was put in for completeness, to complement the degrees Fahrenheit and degree Celsius, it would have had the same category as those two? Pim Blokland

Re: [Way OT] Beer measurements (was: Re: Handwritten EURO sign)

2003-08-19 Thread Pim Blokland
metres... 4.28 actually. But are you serious about lengthening the yard to be the same size as the meter? Ha! Fat chance! You might as well suggest we abolish the yard altogether! Pim Blokland

Re: [Way OT] Beer measurements (was: Re: Handwritten EURO sign)

2003-08-19 Thread Pim Blokland
depending on which mile one is talking about! Well, not all of those measurements are the same size. Disregarding nautical miles, there's still the matter of the yards. Did I mention that my front yard is not the same size as my back yard? Pim Blokland

Mailing lists

2003-08-23 Thread Pim Blokland
; if so, where? Pim Blokland

Re: ISO pulls back

2003-10-01 Thread Pim Blokland
, currency and language codes free of charge. Pim Blokland

Re: Web Form: Other Question: British pound sign - U+00A3

2003-10-01 Thread Pim Blokland
does Unicode support?) Pim Blokland

Re: Problems encoding the spanish o

2003-11-17 Thread Pim Blokland
this into UTF-16 would yield U+DB7A U+DC0D, which is what you got in your output. Pim Blokland
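
The surrogate pair named above follows from the standard UTF-16 arithmetic: the pair DB7A DC0D corresponds to the supplementary code point U+EE80D. A minimal Python sketch follows; the function names are illustrative, and the original input that produced this value is not shown in the excerpt.

```python
def split_surrogates(cp: int):
    """Split a supplementary code point (U+10000..U+10FFFF) into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    cp -= 0x10000
    return 0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF)

def combine_surrogates(high: int, low: int) -> int:
    """Recombine a high/low surrogate pair into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

cp = combine_surrogates(0xDB7A, 0xDC0D)
print(f"U+{cp:X}")

# Cross-check against the built-in UTF-16 codec.
assert chr(cp).encode("utf-16-be") == bytes([0xDB, 0x7A, 0xDC, 0x0D])
```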

BOM as WJ?

2003-11-19 Thread Pim Blokland
No-Break Space]. Is this something that has slipped by the editors? Or am I missing something? Pim Blokland