Suggestions for next print edition
1. Unicode code points are NUMBERS. Numbers can be written in ANY base. Knowing the decimal values of code points is sometimes useful, so please print them in the next edition of the Unicode book.

2. There was a Shift-JIS index for kanji. I don't know much about kanji, but it seems to me that they are arranged in a-i-u-e-o order of on'yomi. Why not print little hiragana letters at the top to aid people searching for a kanji? Remember how I could not find the "ran" of "randamu" before? Let's see this time... Aha! There it is! I knew it was somewhere between mo(kuyoubi) and (fu)ro. Better than stroke/radical, I wonder?

* Disclaimer: From what I hear, the Japanese do NOT write "randamu" as U+4E71 U+3060 U+3080. They use U+30E9 U+30F3 U+30C0 U+30E0. But the first is cuter. ^_^

--
___
Get your free email from http://www.ranmamail.com
Powered by Outblaze
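Since a code point is just an integer, moving between the U+ hexadecimal notation and decimal is a one-line operation in most languages. A minimal Python 3 sketch, assuming nothing beyond the code points listed in the disclaimer; the helper name `describe` is illustrative and not from the message:

```python
# A code point is just an integer; "U+4E71" is hexadecimal notation for it.
def describe(text: str) -> list[tuple[str, int]]:
    """Return (U+ notation, decimal value) for each code point in `text`."""
    return [(f"U+{ord(ch):04X}", ord(ch)) for ch in text]

# The two spellings of "randamu" from the disclaimer.
kanji_hiragana = "\u4e71\u3060\u3080"    # U+4E71 U+3060 U+3080
katakana = "\u30e9\u30f3\u30c0\u30e0"    # U+30E9 U+30F3 U+30C0 U+30E0

for hex_form, decimal in describe(kanji_hiragana) + describe(katakana):
    print(f"{hex_form} = {decimal}")

# Going the other way: int(..., 16) parses hex, chr() turns a number into a character.
print(int("4E71", 16))  # 20081
```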
Re: C with bar for with
- Original Message -
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: 2 December 2001 02:16
Subject: C with bar for with

> Someone said that in English, c-with-underbar means "with". My mom writes this as c-with-overline.

Well, then I suppose this is a glyph variant of the c with underbar...

Stefan
_
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com
Writing/finding a UTF8, UTF16, UTF32 converter
Hi Unicode list,

I am dealing with Unicode for XML. I'm sorry if this bothers a few people, but reading the technical information is not very easy. The crossings-out and underlinings don't help, the information seems a bit scattered, and the usually interesting information is not linked to from easy-to-find places. I think I have finally found what I wanted, the table "Table 3.1. UTF-8 Bit Distribution" on http://www.unicode.org/unicode/reports/tr27/

Basically, I want to write some code that can convert UTF8, UTF16, and UTF32 to either of the other two formats. I suppose I could use UTF32 as a go-between to reduce the number of conversion paths. Anyhow, does anyone know of any existing source code that does this transformation?

I don't feel like using Apple's Unicode converter because it seems so complex that it will probably take MORE work for me to access it than to just write the conversion code myself. And even then, I hear it doesn't do UTF32, so it is no use. And I have to compile my code for Win32 as well, so it's of even less use.

If anyone knows of some existing code that does the transformation, that would help. I might end up rewriting it myself and just using the code as a working example. All that bit-shifting and bit-masking should slow down my UTF8/UTF16 processing; is there any accepted good way to speed this up? Some form of table, perhaps?

--
This email was probably cleaned with Email Cleaner, by:
Theodore H. Smith - Macintosh Consultant / Contractor.
My website: www.elfdata.com/
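The UTF32-as-go-between idea can be sketched as follows. This is a minimal Python sketch, not the Unicode consortium's code and not Apple's converter: each format is decoded to a list of code point integers (standing in for UTF-32) and re-encoded from there. The bit layouts follow the UTF-8 bit distribution table and the usual UTF-16 surrogate arithmetic. The lead-byte lookup table is one common answer to the "some form of table" question; whether it actually beats a chain of comparisons depends on language and compiler, so treat it as an option rather than a guaranteed speed-up. All function and table names are hypothetical, and error handling is reduced to raising `ValueError`.

```python
# UTF-32 as a go-between: decode any format to code points, re-encode from them.
# Python ints stand in for UTF-32 code units. All names are illustrative.

# Table: sequence length for each possible UTF-8 lead byte (0 = never valid as
# a lead byte: continuation bytes 80..BF, and C0, C1, F5..FF).
UTF8_LEN = [0] * 256
for b in range(0x00, 0x80):
    UTF8_LEN[b] = 1
for b in range(0xC2, 0xE0):
    UTF8_LEN[b] = 2
for b in range(0xE0, 0xF0):
    UTF8_LEN[b] = 3
for b in range(0xF0, 0xF5):
    UTF8_LEN[b] = 4

LEAD_MASK = {1: 0x7F, 2: 0x1F, 3: 0x0F, 4: 0x07}     # payload bits in the lead byte
MIN_VALUE = {1: 0x0, 2: 0x80, 3: 0x800, 4: 0x10000}  # smallest non-overlong value

def utf8_to_utf32(data: bytes) -> list[int]:
    out, i = [], 0
    while i < len(data):
        n = UTF8_LEN[data[i]]
        if n == 0 or i + n > len(data):
            raise ValueError(f"invalid or truncated UTF-8 at byte {i}")
        cp = data[i] & LEAD_MASK[n]
        for b in data[i + 1:i + n]:
            if (b & 0xC0) != 0x80:  # every continuation byte must be 10xxxxxx
                raise ValueError(f"bad continuation byte after offset {i}")
            cp = (cp << 6) | (b & 0x3F)
        if cp < MIN_VALUE[n] or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"overlong, surrogate, or out-of-range value at byte {i}")
        out.append(cp)
        i += n
    return out

def utf32_to_utf8(cps: list[int]) -> bytes:
    """Assumes every value is a valid Unicode scalar value."""
    out = bytearray()
    for cp in cps:
        if cp < 0x80:
            out.append(cp)
        elif cp < 0x800:
            out += bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
        elif cp < 0x10000:
            out += bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F),
                          0x80 | (cp & 0x3F)])
        else:
            out += bytes([0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                          0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])
    return bytes(out)

def utf32_to_utf16(cps: list[int]) -> list[int]:
    """Return 16-bit code units; code points above U+FFFF become surrogate pairs."""
    units = []
    for cp in cps:
        if cp < 0x10000:
            units.append(cp)
        else:
            v = cp - 0x10000  # 20 bits remain, split 10/10
            units += [0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)]
    return units

def utf16_to_utf32(units: list[int]) -> list[int]:
    out, i = [], 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:  # high surrogate: a low surrogate must follow
            if i + 1 >= len(units) or not 0xDC00 <= units[i + 1] <= 0xDFFF:
                raise ValueError(f"unpaired high surrogate at unit {i}")
            out.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        elif 0xDC00 <= u <= 0xDFFF:
            raise ValueError(f"unpaired low surrogate at unit {i}")
        else:
            out.append(u)
            i += 1
    return out
```

Converting UTF-8 to UTF-16 is then `utf32_to_utf16(utf8_to_utf32(data))`, and the other directions compose the remaining functions the same way. Writing the code units out as bytes (and choosing a byte order) is a separate step not shown here.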
RE: C with bar for with
It may even be a glyph variant of the w with forward slash...

YA

-Original Message-
From: Stefan Persson [mailto:[EMAIL PROTECTED]]
Sent: Sunday, December 02, 2001 3:19 AM
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: C with bar for with

> - Original Message -
> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> Sent: 2 December 2001 02:16
> Subject: C with bar for with
>
>> Someone said that in English, c-with-underbar means "with". My mom writes this as c-with-overline.
>
> Well, then I suppose this is a glyph variant of the c with underbar...
>
> Stefan
Re: Are these characters encoded?
- Original Message -
From: John Hudson [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: 1 December 2001 21:01
Subject: Re: Are these characters encoded?

>> 1.) Swedish ampersand (see .bmp). It's an o (for "och", i.e. "and") with a line below. In handwritten text it is almost always used instead of &, in machine-written text I don't think I've ever seen it.
>
> This is, as your analysis suggests, a glyph variant, not a distinct character.

Well, this character is *only* used in Swedish, while & is used in most (all?) languages using Roman letters, so it has a partially different usage! Using this character in, for example, an English text would be *wrong*! Or is α a glyph variant of a and あ?

Stefan
Re: Are these characters encoded?
- Original Message -
From: John Hudson [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: 1 December 2001 21:01
Subject: Re: Are these characters encoded?

>> 1.) Swedish ampersand (see .bmp). It's an o (for "och", i.e. "and") with a line below. In handwritten text it is almost always used instead of &, in machine-written text I don't think I've ever seen it.
>
> This is, as your analysis suggests, a glyph variant, not a distinct character.

Well, this character is *only* used in Swedish, while & is used in most (all?) languages using Roman letters, so it has a partially different usage! Using this character in, for example, an English text would be *wrong*! Or is α a glyph variant of a and あ? Or even better, what about A and Α?

Stefan
RE: Are these characters encoded?
>> 1.) Swedish ampersand (see .bmp). It's an o (for "och", i.e. "and") with a line below. In handwritten text it is almost always used instead of &, in machine-written text I don't think I've ever seen it.
>
> This might be a character in its own right, as different from the ampersand as U+204A TIRONIAN SIGN ET. Or it might be simply a glyph variant of the ampersand.

No.

> If you have never seen o-underbar in machine-written text, I doubt that this will help your cause much. You might try U+006F U+0332,

Yes. (But some write "o.", esp. in the rare event this is typed.) Similarly, COMBINING OVERLINE and COMBINING LOW LINE should be used, together with ordinary I, V etc. (when possible), to get lined roman numerals.

> though this will probably not give you the vertical spacing you expect.

> It is certainly not a glyph variant of an ampersand. An ampersand is a ligature of e and t.

True (both). (ampersand is somewhat of a misnomer.)

> This is certainly an abbreviation of och. That both mean "and" is NOT a reason for unifying different signs. Having said that, it seems to me that U+00B0 would represent Stefan's character easily enough.

No. It's not a degree sign. Nor is 00BA appropriate: the underlined o is not superscripted/raised (much, if at all).

Kind regards
/kent k
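The combining-sequence approach needs no new character in plain text: a base letter is simply followed by the mark. A small Python sketch using the standard `unicodedata` module; the helper names `with_mark` and `code_points` are hypothetical. Whether the line actually sits at a sensible height is left to the font and rendering engine, which is the vertical-spacing caveat raised above.

```python
import unicodedata

# Combining marks mentioned in the thread (the code points are standard;
# the helper names below are illustrative only).
COMBINING_OVERLINE = "\u0305"
COMBINING_LOW_LINE = "\u0332"

def with_mark(text: str, mark: str) -> str:
    """Follow every base character in `text` with the given combining mark."""
    return "".join(ch + mark for ch in text)

def code_points(s: str) -> list[str]:
    """List each code point as 'U+XXXX NAME'."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in s]

och_sign = with_mark("o", COMBINING_LOW_LINE)        # U+006F U+0332
lined_numeral = with_mark("XV", COMBINING_OVERLINE)  # X, overline, V, overline

for line in code_points(och_sign) + code_points(lined_numeral):
    print(line)
```

Each letter of a multi-letter numeral carries its own mark, so a continuous line across "XV" depends on the font joining adjacent overlines.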
RE: Are these characters encoded?
At 17:12 +0100 2001-12-02, Kent Karlsson wrote:
> Similarly, COMBINING OVERLINE and COMBINING LOW LINE should be used, together with ordinary I, V etc. (when possible), to get lined roman numerals.

What? Surely this is a font matter, and using combining characters is a hack here. In Quark one might just draw a line and align it with the font.

>> It is certainly not a glyph variant of an ampersand. An ampersand is a ligature of e and t.
>
> True (both). (ampersand is somewhat of a misnomer.)

It derives from "and per se and", apparently.

>> This is certainly an abbreviation of och. That both mean "and" is NOT a reason for unifying different signs. Having said that, it seems to me that U+00B0 would represent Stefan's character easily enough.
>
> No. It's not a degree sign. Nor is 00BA appropriate: the underlined o is not superscripted/raised (much, if at all).

Sorry, I did mean U+00BA, and subscription or superscription of the glyph in that character is a matter of glyph choice.

--
Michael Everson *** Everson Typography *** http://www.evertype.com
Re: Are these characters encoded?
At 06:17 12/2/2001, Stefan Persson wrote:
> Well, this character is *only* used in Swedish, while & is used in most (all?) languages using Roman letters, so it has a partially different usage! Using this character in, for example, an English text would be *wrong*!

Which is why I went on to suggest that the Swedish manuscript ampersand form (the 'och' abbreviation) might be substituted 'in Swedish text'. The OpenType glyph substitution model, for example, associates lookups with particular script and language system combinations, so it is possible to have something like this:

    Latin latn
      Swedish SWE
        Stylistic Alternates salt
          ampersand -> ampersand.swe

This substitution would only be applied in Swedish text. Now, this particular aspect of OpenType is not well supported yet, but it is a viable mechanism for the kind of substitution that the 'och' glyph requires.

Please note that I am not saying that the 'och' should not be encoded, only that there may well be good reasons to consider this form a glyph variant, and that existing technologies can deal with it as such. In order to make a case for encoding the 'och' ampersand, I think you will need to demonstrate a need to distinguish it from the regular ampersand in plain text documents.

John Hudson
Tiro Typeworks www.tiro.com
Vancouver, BC [EMAIL PROTECTED]

... es ist ein unwiederbringliches Bild der Vergangenheit, das mit jeder Gegenwart zu verschwinden droht, die sich nicht in ihm gemeint erkannte.
... every image of the past that is not recognized by the present as one of its own concerns threatens to disappear irretrievably.
Walter Benjamin
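The script/language/feature hierarchy described above can be modelled as a lookup table keyed by three tags. The following Python sketch is a toy model for illustration only, not a font library or shaping-engine API. The tags are copied from the message; the exact registered OpenType language tag for Swedish should be checked against the OpenType language system tag registry before it is used in a real font.

```python
# Toy model only: this is not a real font or shaping-engine API.
# Substitutions are registered per (script tag, language tag, feature tag),
# mirroring the hierarchy in the message; tags are copied from the message.
SUBSTITUTIONS: dict[tuple[str, str, str], dict[str, str]] = {
    ("latn", "SWE", "salt"): {"ampersand": "ampersand.swe"},
}

def apply_features(glyphs: list[str], script: str, language: str,
                   features: list[str]) -> list[str]:
    """Apply only the lookups registered for this script/language pair."""
    out = list(glyphs)
    for feature in features:
        table = SUBSTITUTIONS.get((script, language, feature), {})
        out = [table.get(g, g) for g in out]
    return out

run = ["A", "space", "ampersand", "space", "B"]
print(apply_features(run, "latn", "SWE", ["salt"]))  # ampersand becomes ampersand.swe
print(apply_features(run, "latn", "ENG", ["salt"]))  # no Swedish lookup: unchanged
```

A real font would express this in its GSUB table; the point of the sketch is only that the same character stream is shaped differently depending on the language system the text is tagged with, while the underlying character stays U+0026.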
Re: C with bar for with
The lower-case 'c' with either an overscore or an underscore is used in medical terminology. It means "with" and comes from the Latin "cum". The English version is a lower-case 'w' with a solidus: "w/".

Seán
Re: Writing/finding a UTF8, UTF16, UTF32 converter
There is code for doing UTF8/16/32 conversions:
ftp://www.unicode.org/Public/PROGRAMS/CVTUTF

Rick
Re: Are these characters encoded?
At 10:05 -0800 2001-12-02, John Hudson wrote:
> At 14:14 12/1/2001, Michael Everson wrote:
>> It is certainly not a glyph variant of an ampersand. An ampersand is a ligature of e and t. This is certainly an abbreviation of och. That both mean "and" is NOT a reason for unifying different signs.
>
> The fact that & is accepted by Swedish readers as a substitute for the 'och' sign, and that the latter seems to be limited to manuscript, suggests a glyph variant. I do not consider the fact that both mean 'and' to be a reason for unifying different signs. I ponder whether two different signs that are apparently used *interchangeably* might be unified?

Um, I accept "etc." and "&c." and "7c." (the last with a Tironian et, admittedly peculiar to most readers of English) as meaning the same thing, but that doesn't mean that & and 7 are the same character. They have different origins which are well known. You don't unify that kind of thing.

In Irish many people accept "srl" and "&rl" and "7rl" as meaning the same thing as well. The form with the actual & is considered peculiar.

"o." and o-with-underscore are NOT glyph variants of a ligature of e and t (at a character level), no matter what they mean.

--
Michael Everson *** Everson Typography *** http://www.evertype.com
Re: Are these characters encoded?
Then why not unify DIGIT THREE with HAN DIGIT THREE?

-Original Message-
From: John Hudson [EMAIL PROTECTED]
Date: Sun, 02 Dec 2001 10:05:36 -0800
To: Michael Everson [EMAIL PROTECTED]
Subject: Re: Are these characters encoded?

> At 14:14 12/1/2001, Michael Everson wrote:
>> It is certainly not a glyph variant of an ampersand. An ampersand is a ligature of e and t. This is certainly an abbreviation of och. That both mean "and" is NOT a reason for unifying different signs.
>
> The fact that & is accepted by Swedish readers as a substitute for the 'och' sign, and that the latter seems to be limited to manuscript, suggests a glyph variant. I do not consider the fact that both mean 'and' to be a reason for unifying different signs. I ponder whether two different signs that are apparently used *interchangeably* might be unified?
>
> John Hudson
> Tiro Typeworks www.tiro.com
> Vancouver, BC [EMAIL PROTECTED]
Re: Are these characters encoded?
In a message dated 2001-12-02 11:00:32 Pacific Standard Time, [EMAIL PROTECTED] writes:
> "o." and o-with-underscore are NOT glyph variants of a ligature of e and t (at a character level), no matter what they mean.

I suggested that Stefan's o-underscore "and" might OR might not be a variation of the ampersand, in all its many existing glyph variants.

The glyph-variant side is bolstered by the argument that it's a symbol, just like &, used to mean "and" without any translation necessarily taking place; that it's only used in Swedish; and that users consider it equivalent to & and use different forms depending on whether the text is handwritten or typed.

The separate-character side can point to the fact that its derivation is completely different from that of &; that it looks nothing like any of the existing forms of & (like TIRONIAN SIGN ET); and that it's only used in Swedish (cf. GREEK QUESTION MARK).

I don't think there is one obvious answer to this.

I will say this, however: the majority of posts stating that some character or other is not in Unicode turn out to be bogus; the proposed character is really a glyph variant or presentation form. Stefan's original post had the following three points:

1. Swedish o-underscore -- maybe, maybe not
2. Fraction slash -- already encoded
3. Roman numerals -- overextension of compatibility forms; rendering issue

When two of three proposals can be quickly blown off, it is human nature that sometimes it is difficult to see the potential virtue in the third.

I also want to say that, although Michael is of course correct that & was originally a ligature of e and t, many, many of the glyphs seen today do not even remotely resemble such a ligature. Consider the top three glyphs in the attached GIF (only 290 bytes). The first is obviously still an e-t ligature, the second is one with centuries of typographical evolution applied to it (and today more closely resembles a treble clef), and the third does not resemble one at all.
If traceability to the original Latin "et" were what made these characters the same or different, then that might have spoken against the separate encoding of TIRONIAN SIGN ET. I never think of & as meaning "et", even the glyph variants that do look like an e-t ligature. I assume that practically all users of this symbol treat it as a logograph meaning "and" in the language of the surrounding text. (I have, rarely, seen & used in Spanish text, which strikes me as funny since the Spanish words for "and" (y and e) would not seem to need abbreviating.) So the question might be posed: do Swedish users think of o-underscore as a logograph meaning "och" or as an abbreviation for the spelled-out word "och"?

In a message dated 2001-12-02 9:23:51 Pacific Standard Time, [EMAIL PROTECTED] writes:
>>> Having said that, it seems to me that U+00B0 would represent Stefan's character easily enough.
>>
>> No. It's not a degree sign. Nor is 00BA appropriate: the underlined o is not superscripted/raised (much, if at all).
>
> Sorry, I did mean U+00BA, and subscription or superscription of the glyph in that character is a matter of glyph choice.

I think, though, that use of U+00BA MASCULINE ORDINAL INDICATOR would be a classic example of hijacking a character for an unintended and inappropriate purpose simply because its glyph looks close enough. This would be like using U+003B at the end of a Greek question.

I stick to my original suggestion of U+006F U+0332, crossing my fingers that rendering engines will handle this correctly.

-Doug Ewell
Fullerton, California
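Unicode normalization makes the difference between U+006F U+0332 and the U+00BA proposal concrete. A short Python sketch using the standard `unicodedata` module: U+00BA has a compatibility decomposition to a superscript o, so NFKC folds it to a plain "o", whereas U+006F U+0332 survives both NFC and NFKC unchanged. For comparison, U+037E GREEK QUESTION MARK has a canonical (singleton) decomposition to U+003B, so even NFC replaces it with the ordinary semicolon. The `show` helper and the `samples` labels are illustrative only.

```python
import unicodedata

def show(label: str, s: str) -> str:
    """Format a string as its sequence of code points, e.g. 'U+006F U+0332'."""
    return label + ": " + " ".join(f"U+{ord(c):04X}" for c in s)

samples = {
    "o + COMBINING LOW LINE": "o\u0332",      # U+006F U+0332
    "MASCULINE ORDINAL INDICATOR": "\u00ba",  # U+00BA
    "GREEK QUESTION MARK": "\u037e",          # U+037E
}

for label, s in samples.items():
    for form in ("NFC", "NFKC"):
        print(show(f"{label} [{form}]", unicodedata.normalize(form, s)))
```

In practice this means text that is NFKC-normalized (for searching or identifier matching, say) would lose any distinction carried by U+00BA, while the combining sequence keeps its underline.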
Re: Are these characters encoded?
At 15:16 12/2/2001, [EMAIL PROTECTED] wrote:
> Then why not unify DIGIT THREE with HAN DIGIT THREE?

I don't know enough about the Han encoding to answer that. Because they are distinguished in existing character sets? Because someone has a need to distinguish them in plain text?

I'm not saying that the Swedish och sign should automatically be unified with the ampersand. I'm simply pointing out that, as described to date on this list, it is not clear that this sign needs to be separately encoded. We know that it can be treated as a language-specific glyph variant because Swedish readers apparently accept both forms to mean exactly the same thing. Whether such treatment is sufficient depends on whether there is also a need to distinguish the two forms, and to do so in plain text. I think Michael Everson made a strong case for separate encoding of the Tironian et sign, and I think a similarly strong case would need to be made for separately encoding the Swedish och sign.

I'm perfectly happy to include the och sign in my fonts, whether it is encoded or not, and to provide mechanisms to access the glyph. At the moment, though, I don't think it is clear whether it is best for this sign to be encoded or not. What might be the impact on Swedish keyboard drivers? Is the intention that a new och sign character should replace the ampersand character in Swedish text processing, or should both be used? What is the impact on existing documents?

John Hudson
Tiro Typeworks www.tiro.com
Vancouver, BC [EMAIL PROTECTED]
Re: Are these characters encoded?
Perhaps they should be. I wonder: when transcribing a foreign name (like a business name) that includes the ampersand, would a Swede use the och sign? I can't answer that. In other words, does there exist a case where the ampersand and the och sign are not interchangeable?

-Original Message-
From: John Hudson [EMAIL PROTECTED]
Date: Sun, 02 Dec 2001 16:33:04 -0800
To: [EMAIL PROTECTED]
Subject: Re: Are these characters encoded?

> At 15:16 12/2/2001, [EMAIL PROTECTED] wrote:
>> Then why not unify DIGIT THREE with HAN DIGIT THREE?
>
> I don't know enough about the Han encoding to answer that. Because they are distinguished in existing character sets? Because someone has a need to distinguish them in plain text?
>
> I'm not saying that the Swedish och sign should automatically be unified with the ampersand. I'm simply pointing out that, as described to date on this list, it is not clear that this sign needs to be separately encoded. We know that it can be treated as a language-specific glyph variant because Swedish readers apparently accept both forms to mean exactly the same thing. Whether such treatment is sufficient depends on whether there is also a need to distinguish the two forms, and to do so in plain text. I think Michael Everson made a strong case for separate encoding of the Tironian et sign, and I think a similarly strong case would need to be made for separately encoding the Swedish och sign.
>
> I'm perfectly happy to include the och sign in my fonts, whether it is encoded or not, and to provide mechanisms to access the glyph. At the moment, though, I don't think it is clear whether it is best for this sign to be encoded or not. What might be the impact on Swedish keyboard drivers? Is the intention that a new och sign character should replace the ampersand character in Swedish text processing, or should both be used? What is the impact on existing documents?
>
> John Hudson
> Tiro Typeworks www.tiro.com
> Vancouver, BC [EMAIL PROTECTED]
Re: Are these characters encoded?
At 21:33 12/1/2001, Asmus Freytag wrote:
> If the character can be shown to have as much justification for existence as a coded character as similar characters in the standard, i.e. if it's ever used in printed handwriting, etc., etc., then we will have a tough time coming up with a unification that's not (far) worse than just adding it by itself.

Indeed. If it is not suitable to treat the och sign as a variant form of the ampersand, it would be better to give it its own code point rather than try to unify it with some other character(s) that would require more convoluted rendering.

John Hudson
Tiro Typeworks www.tiro.com
Vancouver, BC [EMAIL PROTECTED]