Suggestions for next print edition

2001-12-02 Thread juuichiketajin

1. Unicode points are NUMBERS. Numbers can be written in ANY base. Knowing decimal 
values of codepoints is sometimes useful, so please print them in the next edition of 
the Unicode book.

2. There was a Shift-JIS index for kanji. I don't know much about kanji, but it seems 
to me that they are arranged in a-i-u-e-o order of on'yomi. Why not print little 
hiragana letters at the top to aid people searching for a kanji?

Remember how I could not find the ran of randamu before? Let's see this time... 
Aha! There it is!
I know it was somewhere between mo(kuyoubi) and (fu)ro. Better than stroke / 
radical, I wonder?
* Disclaimer: From what I hear, the Japanese do NOT write randamu as U+4E71 U+3060 
U+3080. They use U+30E9 U+30F3 U+30C0 U+30E0. But the first is cuter. ^_^
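Both points above (code points are just numbers, and the two spellings of randamu) can be checked mechanically. Here is a minimal Python sketch; the helper name `describe` is made up for illustration, and the code point values are the ones quoted in this message.

```python
def describe(text):
    """Return (character, U+hex form, decimal value) for each code point in text."""
    return [(ch, f"U+{ord(ch):04X}", ord(ch)) for ch in text]

# The two spellings mentioned in the disclaimer above.
kanji_spelling = "\u4E71\u3060\u3080"            # kanji + hiragana
katakana_spelling = "\u30E9\u30F3\u30C0\u30E0"   # all katakana

for spelling in (kanji_spelling, katakana_spelling):
    for ch, hex_form, decimal in describe(spelling):
        # Same number, two bases: hexadecimal and decimal.
        print(ch, hex_form, decimal)
    print()
```

The hex and decimal columns name the same number, which is the whole argument for printing both.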
-- 

___
Get your free email from http://www.ranmamail.com

Powered by Outblaze




Re: C with bar for with

2001-12-02 Thread Stefan Persson

- Original Message -
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: den 2 december 2001 02:16
Subject: C with bar for with


 Someone said that in English, c-with-underbar means "with". My mom writes
this as c-with-overline.

Well, then I suppose this is a glyph variant of the c with underbar…

Stefan


_
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com





Writing/finding a UTF8, UTF16, UTF32 converter

2001-12-02 Thread Theo

Hi UniCode list,

I am dealing with unicode for XML. I'm sorry if this bothers a few
people, but reading the technical information is not very easy. The
crossings out and underlinings don't help, the information seems a bit
scattered, and the usually interesting information is not linked to in
easy to find places.

I think I have finally found what I wanted, the table:

Table 3.1. UTF-8 Bit Distribution

on http://www.unicode.org/unicode/reports/tr27/

Basically, I want to write some code that can convert UTF8, UTF16, and
UTF32 to any of the other two formats. I suppose I could use UTF32 as a
go-between to reduce the conversion possibilities.
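The go-between idea can be sketched in a few lines of Python, where a `str` is already a sequence of code points and so plays the role of the UTF-32 pivot. This is only an illustration of the design, not the converter being asked for: the helper name `convert`, the `ENCODINGS` table, and the choice of little-endian byte order without a BOM are all assumptions.

```python
# Assumed byte order: little-endian, no byte order mark.
ENCODINGS = {"UTF-8": "utf-8", "UTF-16": "utf-16-le", "UTF-32": "utf-32-le"}

def convert(data: bytes, source: str, target: str) -> bytes:
    """Convert between UTF-8/16/32 by going through code points."""
    # Step 1: any source form -> code points (the pivot).
    text = data.decode(ENCODINGS[source], errors="strict")
    # Step 2: code points -> any target form.
    return text.encode(ENCODINGS[target], errors="strict")

sample = "A\u00E9\U00010400".encode("utf-8")
for target in ENCODINGS:
    print(target, convert(sample, "UTF-8", target).hex())
```

With a pivot, three formats need three decoders and three encoders instead of six direct converters.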

Anyhow, does anyone know of any existing source code that does this
transformation?

I don't feel like using Apple's UniCode converter because it seems so
complex it will probably take MORE work for me to access it, than just
write the conversion code myself. And even then I hear it doesn't do
UTF32, so there is no use. And even then I have to compile my code for
Win32 also, so it's even more no use.

If anyone knows of some existing code that does the transformation,
that would help. I might end up re-writing it myself and just use the
code as a working example.

All that bitshifting, bitmasking and such should slow down my UTF8/UTF16
processing. Is there any accepted good way to speed this up? Some form
of table perhaps?
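One table-based approach is to look up the sequence length from the lead byte instead of testing its bits one at a time. The sketch below is an assumption about what such a table could look like, not a reference implementation; the names `LENGTH`, `LEAD_MASK`, `MIN_VALUE` and `utf8_to_code_points` are invented here. Whether it actually beats plain shifts and masks on a given machine would have to be measured.

```python
# Sequence length for each of the 256 possible lead bytes.
# 0 marks bytes that cannot start a sequence (continuation bytes, 0xF8-0xFF).
LENGTH = [1] * 0x80 + [0] * 0x40 + [2] * 0x20 + [3] * 0x10 + [4] * 0x08 + [0] * 0x08
# Payload bits of the lead byte, and the smallest code point allowed, per length.
LEAD_MASK = {1: 0x7F, 2: 0x1F, 3: 0x0F, 4: 0x07}
MIN_VALUE = {1: 0x0, 2: 0x80, 3: 0x800, 4: 0x10000}

def utf8_to_code_points(data: bytes) -> list:
    """Decode UTF-8 bytes into a list of integer code points."""
    out = []
    i = 0
    while i < len(data):
        n = LENGTH[data[i]]
        if n == 0 or i + n > len(data):
            raise ValueError(f"bad lead byte or truncated sequence at offset {i}")
        cp = data[i] & LEAD_MASK[n]
        for b in data[i + 1:i + n]:
            if b & 0xC0 != 0x80:
                raise ValueError(f"bad continuation byte near offset {i}")
            cp = (cp << 6) | (b & 0x3F)   # each continuation byte carries 6 bits
        # Reject overlong forms, surrogates, and values beyond U+10FFFF.
        if cp < MIN_VALUE[n] or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"overlong or out-of-range sequence at offset {i}")
        out.append(cp)
        i += n
    return out

print([f"U+{cp:04X}" for cp in utf8_to_code_points("a\u00E9\u20AC".encode("utf-8"))])
```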

--
This email was probably cleaned with Email Cleaner, by:
Theodore H. Smith - Macintosh Consultant / Contractor.
My website: www.elfdata.com/





RE: C with bar for with

2001-12-02 Thread Yves Arrouye

It may even be a glyph variant of the w with forward slash...
YA

 -Original Message-
 From: Stefan Persson [mailto:[EMAIL PROTECTED]]
 Sent: Sunday, December 02, 2001 3:19 AM
 To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
 Subject: Re: C with bar for with
 
 - Original Message -
 From: [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 Sent: den 2 december 2001 02:16
 Subject: C with bar for with
 
 
  Someone said that in English, c-with-underbar means "with". My mom writes
  this as c-with-overline.
 
 Well, then I suppose this is a glyph variant of the c with underbar...
 
 Stefan
 
 





Re: Are these characters encoded?

2001-12-02 Thread Stefan Persson

- Original Message -
From: John Hudson [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: den 1 december 2001 21:01
Subject: Re: Are these characters encoded?


 1.) Swedish ampersand (see .bmp). It's an "o" (for "och", i.e. "and")
 with a line below. In handwritten text it is almost always used instead of
 &, in machine-written text I don't think I've ever seen it.

 This is, as your analysis suggests, a glyph variant, not a distinct
 character.

Well, this character is *only* used in Swedish, while & is used in most
(all?) languages using Roman letters, so it has a partially different usage!
Using this character in, for example, an English text would be *wrong*! Or
is α a glyph variant of a and あ?

Stefan







Re: Are these characters encoded?

2001-12-02 Thread Stefan Persson

- Original Message -
From: John Hudson [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: den 1 december 2001 21:01
Subject: Re: Are these characters encoded?


 1.) Swedish ampersand (see .bmp). It's an "o" (for "och", i.e. "and")
 with a line below. In handwritten text it is almost always used instead of
 &, in machine-written text I don't think I've ever seen it.

 This is, as your analysis suggests, a glyph variant, not a distinct
 character.

Well, this character is *only* used in Swedish, while & is used in most
(all?) languages using Roman letters, so it has a partially different usage!
Using this character in, for example, an English text would be *wrong*! Or
is α a glyph variant of a and あ? Or even better, what about A and
Α?

Stefan







RE: Are these characters encoded?

2001-12-02 Thread Kent Karlsson


   1.) Swedish ampersand (see .bmp). It's an "o" (for "och", i.e. "and")
   with a line below. In handwritten text it is almost always used instead of
   &, in machine-written text I don't think I've ever seen it.
 
 This might be a character in its own right, as different from the ampersand
 as U+204A TIRONIAN SIGN ET.  Or it might be simply a glyph variant of the
 ampersand.

No.

 If you have never seen o-underbar in machine-written text, I
 doubt that this will help your cause much.  You might try 
 U+006F U+0332,

Yes. (But some write o., esp. in the rare event this is typed.)

Similarly, COMBINING OVERLINE and COMBINING LOW LINE
should be used, together with ordinary I, V etc. (when possible)
to get lined roman numerals.
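As a rough illustration of that suggestion, the sketch below attaches the two combining marks to ordinary Latin letters. The helper name `lined` is hypothetical, and whether the result renders acceptably depends entirely on the font and rendering engine, which this code cannot check.

```python
import unicodedata

COMBINING_OVERLINE = "\u0305"
COMBINING_LOW_LINE = "\u0332"

def lined(numeral, over=True, under=True):
    """Append combining lines to every letter of a Roman numeral."""
    # Low line first, then overline: this matches canonical ordering
    # (marks below the base sort before marks above it).
    marks = (COMBINING_LOW_LINE if under else "") + (COMBINING_OVERLINE if over else "")
    return "".join(ch + marks for ch in numeral)

text = lined("XIV")
print(text)
print([unicodedata.name(ch) for ch in text[:3]])
```

The plain-text content stays searchable as "XIV" once the marks are stripped, which is one argument for this encoding over drawing the lines separately.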

 though this will probably not give you the vertical spacing you expect.
 
 It is certainly not a glyph variant of an ampersand. An ampersand is 
 a ligature of e and t. 

True (both). (ampersand is somewhat of a misnomer.)

 This is certainly an abbreviation of "och". That
 both mean "and" is NOT a reason for unifying different signs.
 
 Having said that, it seems to me that U+00B0 would represent Stefan's 
 character easily enough.

No. It's not a degree sign.  Nor is 00BA appropriate: the underlined o is
not superscripted/raised (much, if at all).

Kind regards
/kent k





RE: Are these characters encoded?

2001-12-02 Thread Michael Everson

At 17:12 +0100 2001-12-02, Kent Karlsson wrote:

Similarly, COMBINING OVERLINE and COMBINING LOW LINE
should be used, together with ordinary I, V etc. (when possible)
to get lined roman numerals.

What? Surely this is a font matter, and using combining characters is a 
hack here. In Quark one might just draw a line and align it with the 
font.

   It is certainly not a glyph variant of an ampersand. An ampersand is
   a ligature of e and t.

True (both). (ampersand is somewhat of a misnomer.)

It derives from "and per se and", apparently.

   This is certainly an abbreviation of "och". That
   both mean "and" is NOT a reason for unifying different signs.
  
   Having said that, it seems to me that U+00B0 would represent Stefan's
   character easily enough.

No. It's not a degree sign.  Nor is 00BA appropriate: the underlined o is
not superscripted/raised (much, if at all).

Sorry, I did mean U+00BA, and subscription or superscription of the 
glyph in that character is a matter of glyph choice.
-- 
Michael Everson *** Everson Typography *** http://www.evertype.com




Re: Are these characters encoded?

2001-12-02 Thread John Hudson

At 06:17 12/2/2001, Stefan Persson wrote:

Well, this character is *only* used in Swedish, while & is used in most
(all?) languages using Roman letters, so it has a partially different usage!
Using this character in, for example, an English text would be *wrong*!

Which is why I went on to suggest that the Swedish manuscript ampersand 
form (the 'och' abbreviation) might be substituted 'in Swedish text'. The 
OpenType glyph substitution model, for example, associates lookups with 
particular script and language system combinations, so it is possible to 
have something like this:

 Latin latn
 Swedish SWE
 Stylistic Alternates salt
 ampersand -> ampersand.swe

This substitution would only be applied in Swedish text. Now, this 
particular aspect of OpenType is not well supported yet, but it is a viable 
mechanism for the kind of substitution that the 'och' glyph requires.
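The idea of a lookup keyed on script, language system and feature can be modelled as a toy in Python. This is not how any real OpenType engine is implemented; the `LOOKUPS` table and the `substitute` function are invented for illustration, and the tags and glyph names are copied from the listing above rather than checked against the OpenType tag registry.

```python
# Hypothetical lookup table: (script, language system, feature) -> glyph mapping.
LOOKUPS = {
    ("latn", "SWE", "salt"): {"ampersand": "ampersand.swe"},
}

def substitute(glyphs, script, language, features):
    """Apply the substitutions registered for this script/language/feature set."""
    out = list(glyphs)
    for feature in features:
        mapping = LOOKUPS.get((script, language, feature), {})
        # Glyphs without an entry pass through unchanged.
        out = [mapping.get(g, g) for g in out]
    return out

run = ["k", "ampersand", "k"]
print(substitute(run, "latn", "SWE", ["salt"]))   # Swedish text: variant glyph
print(substitute(run, "latn", "ENG", ["salt"]))   # other languages: unchanged
```

The character code in the text stays the ampersand in both cases; only the displayed glyph differs.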

Please note that I am not saying that the 'och' should not be encoded, only 
that there may well be good reasons to consider this form as a glyph 
variant and existing technologies for dealing with it as such. In order to 
make a case for encoding the 'och' ampersand, I think you will need to 
demonstrate a need to distinguish it from the regular ampersand in plain 
text documents.

John Hudson

Tiro Typeworks  www.tiro.com
Vancouver, BC   [EMAIL PROTECTED]

... es ist ein unwiederbringliches Bild der Vergangenheit,
das mit jeder Gegenwart zu verschwinden droht, die sich
nicht in ihm gemeint erkannte.

... every image of the past that is not recognized by the
present as one of its own concerns threatens to disappear
irretrievably.
   Walter Benjamin





Re: C with bar for with

2001-12-02 Thread Wm Seán Glen



The lower case 'c' with either an overscore or an underscore 
is used in medical terminology. It means "with" and comes from the Latin "cum". 
The English version is lower case 'w' with a solidus: "w/".
Seán


Re: Writing/finding a UTF8, UTF16, UTF32 converter

2001-12-02 Thread Rick McGowan

There is code for doing UTF8/16/32 conversions:

ftp://www.unicode.org/Public/PROGRAMS/CVTUTF

Rick





Re: Are these characters encoded?

2001-12-02 Thread Michael Everson

At 10:05 -0800 2001-12-02, John Hudson wrote:
At 14:14 12/1/2001, Michael Everson wrote:

It is certainly not a glyph variant of an ampersand. An ampersand 
is a ligature of e and t. This is certainly an abbreviation of "och". 
That both mean "and" is NOT a reason for unifying different signs.

The fact that & is accepted by Swedish readers as a substitute for 
the 'och' sign, and that the latter seems to be limited to 
manuscript, suggests a glyph variant. I do not consider the fact 
that both mean 'and' to be a reason for unifying different signs. I 
ponder whether two different signs that are apparently used 
*interchangeably* might be unified?

Um, I accept "etc." and "&c." and "7c." (the last with a Tironian et, 
admittedly peculiar to most readers of English) as meaning the same 
thing but that doesn't mean that & and 7 are the same character. They 
have different origins which are well known. You don't unify that 
kind of thing.

In Irish many people accept "srl" and "&rl" and "7rl" as meaning the 
same thing as well. The form with the actual & is considered peculiar.

o. and o-with-underscore are NOT glyph variants of a ligature of 
e and t (at a character level), no matter what they mean.
-- 
Michael Everson *** Everson Typography *** http://www.evertype.com




Re: Are these characters encoded?

2001-12-02 Thread juuichiketajin

Then why not unify DIGIT THREE with HAN DIGIT THREE?


-Original Message-
From: John Hudson [EMAIL PROTECTED]
Date: Sun, 02 Dec 2001 10:05:36 -0800
To: Michael Everson [EMAIL PROTECTED]
Subject: Re: Are these characters encoded?


 At 14:14 12/1/2001, Michael Everson wrote:
 
 It is certainly not a glyph variant of an ampersand. An ampersand is a 
 ligature of e and t. This is certainly an abbreviation of "och". That both 
 mean "and" is NOT a reason for unifying different signs.
 
 The fact that & is accepted by Swedish readers as a substitute for the 
 'och' sign, and that the latter seems to be limited to manuscript, suggests 
 a glyph variant. I do not consider the fact that both mean 'and' to be a 
 reason for unifying different signs. I ponder whether two different signs 
 that are apparently used *interchangeably* might be unified?
 
 John Hudson
 
 Tiro Typeworkswww.tiro.com
 Vancouver, BC [EMAIL PROTECTED]
 
 ... es ist ein unwiederbringliches Bild der Vergangenheit,
 das mit jeder Gegenwart zu verschwinden droht, die sich
 nicht in ihm gemeint erkannte.
 
 ... every image of the past that is not recognized by the
 present as one of its own concerns threatens to disappear
 irretrievably.
Walter Benjamin
 
 
 





Re: Are these characters encoded?

2001-12-02 Thread DougEwell2

In a message dated 2001-12-02 11:00:32 Pacific Standard Time, 
[EMAIL PROTECTED] writes:

 o. and o-with-underscore are NOT glyph variants of a ligature of 
 e and t (at a character level), no matter what they mean.

I suggested that Stefan's o-underscore "and" might OR might not be a 
variation of the ampersand, in all its many existing glyph variants.

The "glyph variant" side is bolstered by the argument that it's a symbol, 
just like &, used to mean "and" without any translation necessarily taking 
place; that it's only used in Swedish; and that users consider it equivalent 
to & and use different forms depending on whether the text is handwritten or 
typed.

The "separate character" side can point to the fact that its derivation is 
completely different from that of &; that it looks nothing like any of the 
existing forms of & (like TIRONIAN SIGN ET); and that it's only used in 
Swedish (cf. GREEK QUESTION MARK).

I don't think there is one obvious answer to this.  I will say this, however: 
The majority of posts stating that some character or other is not in 
Unicode turn out to be bogus; the proposed character is really a glyph 
variant or presentation form.  Stefan's original post had the following three 
points:

1.  Swedish o-underscore -- maybe, maybe not
2.  Fraction slash -- already encoded
3.  Roman numerals -- overextension of compatibility forms; rendering issue

When two of three proposals can be quickly blown off, it is human nature that 
sometimes it is difficult to see the potential virtue in the third.

I also want to say that, although Michael is of course correct that & was 
originally a ligature of e and t, many, many of the & glyphs seen today do 
not even remotely resemble such a ligature.  Consider the top three glyphs in 
the attached GIF (only 290 bytes).  The first is obviously still an e-t 
ligature, the second is one with centuries of typographical evolution applied 
to it (and today more closely resembles a treble clef), the third is not at 
all.  If traceability to the original Latin "et" were what made these 
characters the same or different, then that might have spoken against the 
separate encoding of TIRONIAN SIGN ET.

I never think of & as meaning "et", even the glyph variants that do look like 
an e-t ligature.  I assume that practically all users of this symbol treat it 
as a logograph meaning "and" in the language of the surrounding text.  (I 
have, rarely, seen & used in Spanish text, which strikes me as funny since 
the Spanish words for "and" ("y" and "e") would not seem to need 
abbreviating.)

So the question might be posed, do Swedish users think of o-underscore as a 
logograph meaning "och" or as an abbreviation for the spelled-out word "och"?

In a message dated 2001-12-02 9:23:51 Pacific Standard Time, 
[EMAIL PROTECTED] writes:

 Having said that, it seems to me that U+00B0 would represent Stefan's
 character easily enough.

 No. It's not a degree sign.  Nor is 00BA appropriate: the underlined o is
 not superscripted/raised (much, if at all).

 Sorry, I did mean U+00BA, and subscription or superscription of the 
 glyph in that character is a matter of glyph choice.

I think, though, that use of U+00BA MASCULINE ORDINAL INDICATOR would be a 
classic example of hijacking a character for an unintended and inappropriate 
purpose simply because its glyph looks close enough.  This would be like 
using U+003B at the end of a Greek question.  I stick to my original 
suggestion of U+006F U+0332, crossing my fingers that rendering engines will 
handle this correctly.
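The character properties behind this argument can be inspected with Python's standard `unicodedata` module. A small sketch follows; the helper `names` is made up here, and the results depend on the Unicode data shipped with the Python version in use. It shows only what the character data says, not how any rendering engine will draw the sequence.

```python
import unicodedata

def names(text):
    """Unicode character names for each code point in text."""
    return [unicodedata.name(ch) for ch in text]

och_sign = "\u006F\u0332"   # the suggested sequence
print(names(och_sign))
# No precomposed form exists, so NFC leaves the sequence as two code points.
print(unicodedata.normalize("NFC", och_sign) == och_sign)

# U+00BA is a compatibility (superscript) form of "o": NFKC folds it away.
print(names("\u00BA"), repr(unicodedata.normalize("NFKC", "\u00BA")))

# U+037E GREEK QUESTION MARK is canonically equivalent to U+003B.
print(names("\u037E"), names(unicodedata.normalize("NFC", "\u037E")))
```

The Greek case shows that a look-alike pair can carry a formal equivalence in the standard, while U+00BA and the underlined o have none beyond visual similarity.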

-Doug Ewell
 Fullerton, California




Re: Are these characters encoded?

2001-12-02 Thread John Hudson

At 15:16 12/2/2001, [EMAIL PROTECTED] wrote:

Then why not unify DIGIT THREE with HAN DIGIT THREE?

I don't know enough about the Han encoding to answer that. Because they are 
distinguished in existing character sets? Because someone has a need to 
distinguish them in plain text?
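For what it is worth, the character data already treats the two differently. The sketch below assumes that "HAN DIGIT THREE" refers to the ideograph U+4E09 (there is no character with that exact name) and uses Python's `unicodedata` module to compare it with U+0033.

```python
import unicodedata

# Assumption: "HAN DIGIT THREE" means the ideograph U+4E09.
for ch in ("\u0033", "\u4E09"):
    print(
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),   # general category: decimal digit vs. letter
        unicodedata.numeric(ch),    # both carry the numeric value 3
        ch.isdigit(),               # only U+0033 counts as a digit
    )
```

Both have the numeric value three, but one is a decimal digit used in positional notation and the other is a letter-like ideograph, so they behave differently in plain text.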

I'm not saying that the Swedish och sign should automatically be unified 
with the ampersand. I'm simply pointing out that, as described to date on 
this list, it is not clear that this sign needs to be separately encoded. 
We know that it can be treated as a language-specific glyph variant because 
Swedish readers apparently accept both forms to mean exactly the same 
thing. Whether such treatment is sufficient depends on whether there is 
also need to distinguish the two forms, and to do so in plain text. I think 
Michael Everson made a strong case for separate encoding of the Tironian et 
sign, and I think a similarly strong case would need to be made for 
separately encoding the Swedish och sign.

I'm perfectly happy to include the och sign in my fonts, whether it is 
encoded or not, and to provide mechanisms to access the glyph. At the 
moment, though, I don't think it is clear whether it is best for this sign 
to be encoded or not. What might be the impact on Swedish keyboard drivers? 
Is the intention that a new och sign character should replace the ampersand 
character in Swedish text processing, or should both be used? What is the 
impact on existing documents?

John Hudson

Tiro Typeworks  www.tiro.com
Vancouver, BC   [EMAIL PROTECTED]






Re: Are these characters encoded?

2001-12-02 Thread juuichiketajin

Perhaps they should be. I wonder: When transcribing a foreign name (like a business 
name) that includes the ampersand, would a Swede use the och sign?
I can't answer that.

In other words, does there exist a case where the ampersand and the och sign are not 
interchangeable?


-Original Message-
From: John Hudson [EMAIL PROTECTED]
Date: Sun, 02 Dec 2001 16:33:04 -0800
To: [EMAIL PROTECTED]
Subject: Re: Are these characters encoded?


 At 15:16 12/2/2001, [EMAIL PROTECTED] wrote:
 
 Then why not unify DIGIT THREE with HAN DIGIT THREE?
 
 I don't know enough about the Han encoding to answer that. Because they are 
 distinguished in existing character sets? Because someone has a need to 
 distinguish them in plain text?
 
 I'm not saying that the Swedish och sign should automatically be unified 
 with the ampersand. I'm simply pointing out that, as described to date on 
 this list, it is not clear that this sign needs to be separately encoded. 
 We know that it can be treated as a language-specific glyph variant because 
 Swedish readers apparently accept both forms to mean exactly the same 
 thing. Whether such treatment is sufficient depends on whether there is 
 also need to distinguish the two forms, and to do so in plain text. I think 
 Michael Everson made a strong case for separate encoding of the Tironian et 
 sign, and I think a similarly strong case would need to be made for 
 separately encoding the Swedish och sign.
 
 I'm perfectly happy to include the och sign in my fonts, whether it is 
 encoded or not, and to provide mechanisms to access the glyph. At the 
 moment, though, I don't think it is clear whether it is best for this sign 
 to be encoded or not. What might be the impact on Swedish keyboard drivers? 
 Is the intention that a new och sign character should replace the ampersand 
 character in Swedish text processing, or should both be used? What is the 
 impact on existing documents?
 
 John Hudson
 
 Tiro Typeworkswww.tiro.com
 Vancouver, BC [EMAIL PROTECTED]
 
 ... es ist ein unwiederbringliches Bild der Vergangenheit,
 das mit jeder Gegenwart zu verschwinden droht, die sich
 nicht in ihm gemeint erkannte.
 
 ... every image of the past that is not recognized by the
 present as one of its own concerns threatens to disappear
 irretrievably.
Walter Benjamin
 
 
 





Re: Are these characters encoded?

2001-12-02 Thread John Hudson

At 21:33 12/1/2001, Asmus Freytag wrote:

If the character can be shown to have as much justification for existence
as a coded character as similar characters in the standard, i.e. if it's
ever used in printed handwriting, etc., etc., then we will have a tough
time coming up with a unification that's not (far) worse than just adding
it by itself.

Indeed. If it is not suitable to treat the och sign as a variant form of 
the ampersand, it would be better to give it its own codepoint rather than 
try to unify it with some other character(s) that would require more 
convoluted rendering.

John Hudson

Tiro Typeworks  www.tiro.com
Vancouver, BC   [EMAIL PROTECTED]
