Re: Are these characters encoded?

2001-12-03 Thread Roozbeh Pournader

On Sun, 2 Dec 2001 [EMAIL PROTECTED] wrote:

 [...] (cf. GREEK QUESTION MARK).
 
 [...] This would be like using U+003B at the end of a Greek question.

Sorry, but U+037E GREEK QUESTION MARK is canonically equivalent to U+003B
SEMICOLON. I guess it is there only because ISO 8859-7 wanted to disunify
them.
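Canonical equivalence means the two code points cannot be kept apart in normalized text: every Unicode normalization form replaces U+037E with U+003B. A minimal sketch using Python's standard `unicodedata` module:

```python
import unicodedata

GREEK_QUESTION_MARK = "\u037e"  # U+037E GREEK QUESTION MARK

# U+037E has a singleton canonical decomposition to U+003B SEMICOLON,
# so all four normalization forms turn it into a plain semicolon.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    normalized = unicodedata.normalize(form, GREEK_QUESTION_MARK)
    print(form, " ".join(f"U+{ord(c):04X}" for c in normalized))

print(unicodedata.decomposition(GREEK_QUESTION_MARK))
```

Each form prints U+003B, and the decomposition field is simply `003B`, so any process that normalizes its input will store a Greek question mark as a semicolon.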


-- 
Note: If you want me to read a message, please make sure you include my
address in the To or CC fields. I may not be able to follow all the
discussions on the mailing lists I subscribe to. Sorry. (No, I don't mind
receiving duplicates.) --roozbeh





RE: Are these characters encoded?

2001-12-03 Thread Marco Cimarosti

Asmus Freytag wrote:
 Overloading the existing 00BA º is tempting, but would likely result in
 incorrect output unless special-purpose (read: private-use) fonts are used,
 or unless it became common to have Swedish glyph overrides in fonts and
 rendering engines that applied them. Since the usage and typographic
 conventions for 'och' and the raised o for numbering are not related, this
 unification smells more of shoehorning than encoding.

Perhaps there is also a logical difference.

The Swedish o represents the *first* letter of a word (och), and can thus
be interpreted as o. (o *followed* by a dot); 00BA represents the *last*
letter of a word (it abbreviates ordinal adjectives like primero, segundo,
tercero... primo, secondo, terzo...), so it may logically be interpreted as
.o (o *preceded* by a dot).

_ Marco




Re: Indic editing (was: RE: The real solution)

2001-12-03 Thread Arjun Aggarwal

Hi Everybody

Mr. John Hudson's statement that phonetic keyboarding, while the norm for the
Indian publishing and typesetting industries, was not the norm for typewriters
is not entirely correct. It was not the norm earlier, but it has been the norm
for many years now.

Moreover, the concept of la = half la + danda may be natural for people
who are used to typewriters and typography, and those are among the people
most likely to switch to computers.

I fully agree with Mr. Marco Cimarosti in this regard.

This is the point I really wanted everybody to focus on, i.e. the
problem of encoding as well as display.

Yes, there are many easy solutions. The fact is that these are worth nothing
until Unicode officially adopts one of them.

This is the ultimate truth, and this was the main point with which I
initiated this discussion.

With Regards
Arjun Aggarwal
[EMAIL PROTECTED]







Re: Indic editing (was: RE: The real solution)

2001-12-03 Thread Michael (michka) Kaplan

From: Arjun Aggarwal [EMAIL PROTECTED]

 Moreover, the concept of la = half la + danda may be natural for people
 who are used to typewriters and typography, and those are among the people
 most likely to switch to computers.

 I fully agree with Mr. Marco Cimarosti in this regard.

 This is the point I really wanted everybody to focus on, i.e. the
 problem of encoding as well as display.

Well, you do need to understand that you could actually create input methods
that would allow people who wish to type this way to do so -- and the
underlying data could still be stored using the current encoding.
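A toy input method illustrates the idea: keystrokes follow the typewriter model (half la, then a vertical stroke), but what gets stored is ordinary Unicode. The key names `HALF_LA` and `STROKE` and the helpers `apply_keystroke`/`type_keys` are hypothetical, and storing a stroke that does not follow a half form as U+0964 DEVANAGARI DANDA is an assumption of this sketch; a real layout would define its own rules.

```python
LA = "\u0932"      # DEVANAGARI LETTER LA
VIRAMA = "\u094d"  # DEVANAGARI SIGN VIRAMA
DANDA = "\u0964"   # DEVANAGARI DANDA


def apply_keystroke(buffer: str, key: str) -> str:
    """Return the stored text after one keystroke (hypothetical key names)."""
    if key == "HALF_LA":
        # A half form is stored as the consonant followed by virama.
        return buffer + LA + VIRAMA
    if key == "STROKE":
        if buffer.endswith(VIRAMA):
            # Half form + vertical stroke = full consonant:
            # drop the virama instead of storing the stroke.
            return buffer[:-1]
        # Assumption: a stroke on its own is stored as a danda.
        return buffer + DANDA
    raise ValueError(f"unknown key: {key}")


def type_keys(keys):
    text = ""
    for key in keys:
        text = apply_keystroke(text, key)
    return text


print([f"U+{ord(c):04X}" for c in type_keys(["HALF_LA", "STROKE"])])
```

Because the stroke is absorbed at input time, typing half la followed by the stroke stores just U+0932, the same data the current encoding already expects.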

The needs of those who wish to keep their keyboards can be met without
trying to undo all the implementations that have been done.

--
MichKa

Michael Kaplan
Trigeminal Software, Inc.  -- http://www.trigeminal.com/






RE: Indic editing (was: RE: The real solution)

2001-12-03 Thread Marco Cimarosti

Arjun Aggarwal wrote:
 Moreover, the concept of la = half la + danda may be natural for people
 who are used to typewriters and typography, and those are among the people
 most likely to switch to computers.
 
 I fully agree with Mr. Marco Cimarosti in this regard.
 
 This is the point I really wanted everybody to focus on, i.e. the
 problem of encoding as well as display.

Therefore, you don't fully agree with me.

My opinion is that the encoding is OK as it is in ISCII and Unicode. I take
into consideration your way of splitting the graphemes *only* at the editing
level.

_ Marco




RE: Are these characters encoded?

2001-12-03 Thread Kent Karlsson



Summary answer to the question in the subject line: yes.

As I tried to express as succinctly as possible before:

1) & and o̲ (underlined o, sometimes used as an abbreviation for 'och', as
are 'o.' (in dictionaries), 'o', and even 'å') are definitely not a glyph
variant issue; they are not interchangeable, even though the meaning is the
same. Asmus gave an example. Further, one can use & without spaces around it
(since it is such a tightly fused ligature), but o̲ should always have spaces
around it. B.t.w., & is called et-tecken in Swedish. Getting et-tecken
rendered as o̲ (underlined o) would be surprising indeed.

2) o̲ (underlined o; it even displays passably, though not well, in the font
I'm using right now) is already perfectly well available in Unicode. There is
no need to encode it again. Raising it a little bit (not much) over the
baseline (as some do in handwriting) would be fine tuning that is not
appropriate for a character encoding, but might be for a font imitating
handwriting, or for typographic fine-tuning markup.
3) The following ones are all inappropriate:

00B0;DEGREE SIGN;So;0;ET;;;;;N;;;;;
00BA;MASCULINE ORDINAL INDICATOR;Ll;0;L;<super> 006F;;;;N;;;;;
2070;SUPERSCRIPT ZERO;No;0;EN;<super> 0030;0;0;0;N;SUPERSCRIPT DIGIT ZERO;;;;

The first and last are obviously(?) wrong. Why not 00BA? There are two
reasons: the glyph for 00BA is not always underlined (even though a plain o
can be used for 'och' in sloppy handwriting or (rare) "spell as you speak"
texts), and the glyph for 00BA is (always) raised too much for the o̲
(underlined o for 'och') usage. (But for "numero", which is also used here,
I would use Nº (004E, 00BA) rather than № (2116) or No̲ (004E, 006F, 0332).)
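Kent's point about 00BA and his three "numero" spellings can be checked against the character data with Python's standard `unicodedata` module. Note that this module ships data from a much later Unicode version than the 3.2 beta discussed here, so this is an illustration, not a statement about that release. The sketch shows that U+00BA is only a superscript compatibility variant of o, and how compatibility normalization (NFKC) treats each spelling:

```python
import unicodedata

# U+00BA carries a compatibility decomposition to a superscript o.
print(unicodedata.decomposition("\u00ba"))  # <super> 006F

candidates = {
    "N + 00BA": "N\u00ba",       # U+004E U+00BA
    "2116": "\u2116",            # U+2116 NUMERO SIGN
    "N + o + 0332": "No\u0332",  # U+004E U+006F U+0332 COMBINING LOW LINE
}

# NFKC removes compatibility distinctions but keeps combining marks.
for label, text in candidates.items():
    folded = unicodedata.normalize("NFKC", text)
    print(label, "->", " ".join(f"U+{ord(c):04X}" for c in folded))
```

Under NFKC, Nº and № both become plain "No" (U+004E U+006F), whereas the underlined spelling keeps U+0332 and remains a distinct sequence.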
 
Kind regards
 /kent k




RE: Indic editing (was: RE: The real solution)

2001-12-03 Thread Marco Cimarosti

Oh, by the way, I forgot this...

Arjun Aggarwal wrote:
 Yes, there are many easy solutions. The fact is that these are worth
 nothing until Unicode officially adopts one of them.
 
 This is the ultimate truth, and this was the main point with which I
 initiated this discussion.

Almost every sentence may become the ultimate truth, if you remove enough
context to make it meaningless.

I can say a lot of stupid things on my own, and I don't need anybody's help
to put more stupid things in my mouth. Thanks.

My sentence above referred to a very specific problem: finding a way of
mapping the ISCII sequence RA + HALANT + INV to Unicode.
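To make the problem concrete: any ISCII-to-Unicode converter needs some entry for RA + HALANT + INV, and without an agreed convention each converter will pick its own. The sketch below works on symbolic token names rather than ISCII byte values, and its target sequence, RA + VIRAMA + ZWJ, is only an assumption echoing the "with ZWJ, probably" suggestion in the thread (ZWNJ was also mentioned); it is not an adopted Unicode convention, and the function name is hypothetical.

```python
RA = "\u0930"      # DEVANAGARI LETTER RA
VIRAMA = "\u094d"  # DEVANAGARI SIGN VIRAMA
ZWJ = "\u200d"     # ZERO WIDTH JOINER

# Hypothetical token-level mapping; tokens stand in for ISCII codes.
SIMPLE = {"RA": RA, "HALANT": VIRAMA}

# ASSUMPTION: candidate target for an isolated repha (RA + HALANT + INV).
# Nothing has been standardized; ZWNJ is the other candidate discussed.
REPHA_IN_ISOLATION = RA + VIRAMA + ZWJ


def iscii_tokens_to_unicode(tokens):
    out = []
    i = 0
    while i < len(tokens):
        if tokens[i:i + 3] == ["RA", "HALANT", "INV"]:
            out.append(REPHA_IN_ISOLATION)
            i += 3
        elif tokens[i] in SIMPLE:
            out.append(SIMPLE[tokens[i]])
            i += 1
        else:
            # e.g. an INV outside the repha sequence: no mapping defined here.
            raise ValueError(f"no mapping for {tokens[i]!r}")
    return "".join(out)


print([f"U+{ord(c):04X}" for c in iscii_tokens_to_unicode(["RA", "HALANT", "INV"])])
```

The special case has to be matched before the token-by-token mapping; otherwise the INV would simply be dropped or rejected and the isolated repha would be lost in conversion.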

Here is the sentence in its original context:

Marco Cimarosti wrote:
 Dhrubajyoti Banerjee wrote:
[...]
  Marco Cimarosti wrote:
[...]
  I am talking again about REPHA IN ISOLATION: ISCII has a way of
  representing it, but Unicode does not. This is needed, even if only for
  encoding didactic texts, and a solution to encode it (with ZWJ, probably)
  should be found.
  
  I think the same way it is done in ISCII would be quite okay.
  In ISCII you get it by typing the INV character after ra virama.
  A similar solution may be provided for, in Unicode, by using ZW(N)J.
 
 Yes, there are many easy solutions. The fact is that these are worth
 nothing until Unicode officially adopts one of them.

_ Marco




Re: Are these characters encoded?

2001-12-03 Thread Stefan Persson

- Original Message -
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: 3 December 2001 02:35
Subject: Re: Are these characters encoded?


 Perhaps they should be.

Er... So 3 and 三 are the same character...?

 I wonder: When transcribing a foreign name (like a business name) that
 includes the ampersand, would a Swede use the och sign?

Sometimes yes, sometimes no.

 In other words, does there exist a case where the ampersand and the och
 sign are not interchangeable?

No. At least not if the text is in Swedish.

Stefan







Re: Suggestions for next print edition

2001-12-03 Thread juuichiketajin


 You can always search the big Unihan.txt file on the kJapaneseKun
 and kJapaneseOn fields, which provide whatever information we have
 on pronunciation of the characters in Japanese.
 
 If you are just stuck looking up stuff because it isn't marked up
 for Japanese, try getting Sanseido's Unicode Kanji
 Information Dictionary, which has the first 20,902 kanji in Unicode
 (the most useful set) all marked up with all the Japanese pronunciations
 (where they have any). 

The first suggestion is useless to me: the file is too freaking big. So maybe
I'll go with the second. Thanks.
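For what it's worth, the size of Unihan.txt matters less if the file is scanned one line at a time instead of being opened in an editor. A minimal sketch, assuming the tab-separated `U+XXXX<TAB>field<TAB>value` line layout with `#` comment lines; the function name and the file path in the usage comment are hypothetical.

```python
def japanese_readings(lines, codepoint):
    """Collect kJapaneseKun/kJapaneseOn values for one character.

    `lines` may be any iterable of text lines, e.g. an open file, so the
    file is streamed line by line and never held in memory as a whole.
    """
    wanted = f"U+{codepoint:04X}"
    readings = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3:
            continue  # not a data line in the assumed format
        cp, key, value = fields
        if cp == wanted and key in ("kJapaneseKun", "kJapaneseOn"):
            readings[key] = value.split()
    return readings


# Usage (path is hypothetical):
# with open("Unihan.txt", encoding="utf-8") as f:
#     print(japanese_readings(f, ord("\u4e00")))
```

The same loop works for any other Unihan field by changing the key filter.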





Re: Are these characters encoded?

2001-12-03 Thread Tom Gewecke

When I've seen the c-underbar in print, it has always meant circa, as
in circa 1800.
Jim

At 10:14 PM 2001-12-01 + Saturday, Michael Everson wrote:
(As a side note, this o-underbar form reminds me of the c-underbar which
is sometimes used in handwritten English to mean with.  Does anyone know
the origin of this symbol?  Is it possibly derived from the Latin word cum,
meaning with?  Does it have any claim to being a character in its own
right?)

Perhaps a corruption of c-overbar, which is a medical abbreviation for
with, sometimes used by nurses, doctors, and pharmacies?






FW: Question about some MS IE options

2001-12-03 Thread Magda Danish (Unicode)





-Original Message-
From: Robert M. Gerlach [mailto:[EMAIL PROTECTED]]
Sent: Monday, December 03, 2001 3:24 PM
To: [EMAIL PROTECTED]
Subject: Question
Hi,

When saving a webpage from within Microsoft Internet Explorer, there are a
few notable options... and I'm really unsure as to what the differences are,
which is "better," etc. I know you're not Microsoft or technical support, for
that matter, but I'm betting that you guys would know better than they
would[...]
Here they are:

Unicode
Unicode (UTF-8)
Western European (ISO)
Western European (Windows)
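The four options are character encodings, and they differ in how (and whether) each character becomes bytes. A short Python illustration, assuming the usual correspondence between these menu labels and encodings: "Unicode" as UTF-16 little-endian, "Unicode (UTF-8)" as UTF-8, "Western European (ISO)" as ISO-8859-1, and "Western European (Windows)" as windows-1252. The dictionary and helper names are hypothetical.

```python
# Assumed mapping from IE's menu labels to Python codec names.
ENCODINGS = {
    "Unicode": "utf-16-le",
    "Unicode (UTF-8)": "utf-8",
    "Western European (ISO)": "latin-1",
    "Western European (Windows)": "cp1252",
}


def encode_or_none(text, codec):
    """Return the encoded bytes, or None if the codec cannot represent text."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return None


for label, codec in ENCODINGS.items():
    for sample in ("\u00e9", "\u20ac", "\u2192"):  # é, euro sign, rightwards arrow
        print(f"{label:28} {sample} {encode_or_none(sample, codec)}")
```

Only the two Unicode options can represent every character. The two Western European options cover a limited repertoire and differ mainly in the 0x80 to 0x9F byte range, where windows-1252 places characters such as the euro sign (0x80) and ISO-8859-1 has control codes.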

Thanks a million!

-Rob :)


Unicode/Customizable Typing Tutors Apps?

2001-12-03 Thread Nesbitt, Gavin

I'm just curious if anyone out there has come across a typing tutor app (web
based or installed) that is customizable and Unicode savvy? It doesn't have
to be very complex so long as it can handle different Unicode scripts. 

Thanks,
-Gavin




Unicode 1.0 names for control characters

2001-12-03 Thread DougEwell2

I am surprised and puzzled by the Unicode 1.0 Name changes for some of the 
ASCII and Latin-1 control characters that were introduced in the latest beta 
version of the Unicode 3.2 data file (UnicodeData-3.2.0d5.txt):

U+0009  HORIZONTAL TABULATION  ==  CHARACTER TABULATION
U+000B  VERTICAL TABULATION  ==  LINE TABULATION
U+001C  FILE SEPARATOR  ==  INFORMATION SEPARATOR FOUR
U+001D  GROUP SEPARATOR  ==  INFORMATION SEPARATOR THREE
U+001E  RECORD SEPARATOR  ==  INFORMATION SEPARATOR TWO
U+001F  UNIT SEPARATOR  ==  INFORMATION SEPARATOR ONE
U+008B  PARTIAL LINE DOWN  ==  PARTIAL LINE FORWARD
U+008C  PARTIAL LINE UP  ==  PARTIAL LINE BACKWARD
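For readers who have not looked at the file format: the Unicode 1.0 Name is the eleventh semicolon-separated field (index 10) of each UnicodeData.txt line, and for control characters the Character Name field itself is just `<control>`. A minimal parsing sketch; the helper name is hypothetical, and the sample line is constructed to carry the new name quoted above for U+0009, not copied from the beta file.

```python
def unicode_1_name(line):
    """Return (code point, name, Unicode 1.0 name) from one UnicodeData.txt line."""
    fields = line.rstrip("\n").split(";")
    if len(fields) != 15:
        raise ValueError("expected 15 semicolon-separated fields")
    return int(fields[0], 16), fields[1], fields[10]


sample = "0009;<control>;Cc;0;S;;;;;N;CHARACTER TABULATION;;;;"
print(unicode_1_name(sample))  # (9, '<control>', 'CHARACTER TABULATION')
```

Any tool that displays names for control characters by reading this field will pick up the changed names as soon as it is fed the new data file.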

Were these new names (e.g. CHARACTER TABULATION) really the original 
Unicode 1.0 names?  I don't have my 1.0 book close at hand, but I know that 
they were *not* the names used in 1.1, according to the file namesall.lst 
from that version.  (Aha, didn't think anyone still had that dusty old thing 
lying around?)

IMHO, the new names CHARACTER TABULATION and LINE TABULATION are much less 
intuitive than HORIZONTAL TABULATION and VERTICAL TABULATION.  Sometimes you 
even see the abbreviations HT and VT for these two characters.  The new names 
appear to have been invented by someone who imagined a lack of clarity in the 
old names.

I have seen the names IS4, IS3, IS2, and IS1 before, but they do not convey 
the same information as FS, GS, RS, and US.  The latter names are more 
specific.

The old names for these six control characters were used as far back as the 
original 1963 version of ASCII, according to Mackenzie (pp. 245-247).

I don't know about the history of U+008B and U+008C, but again it seems 
strange that the Unicode 1.0 name for these characters is being changed at 
this late date.

I know this 1.0 name field is not subject to the same rule of "no changes, 
ever" that applies to the regular Character Name field, but why should these 
names be changed at all?

On this same topic, parenthesized abbreviations have been added to the 1.0 
names for U+000A LINE FEED (LF), U+000C FORM FEED (FF), U+000D CARRIAGE 
RETURN (CR), and U+0085 NEXT LINE (NEL).  Does the addition of these 
abbreviations mean that they are now part of the official 1.0 name, and if 
so, why?  Other characters typically don't have abbreviations as part of 
their names, even if they are as meaningful and as commonly used as these, 
and again it is a change from the 1.0 name we have seen for a decade.
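One practical consequence of the added abbreviations: software that reads the 1.0 name field can no longer treat its contents as a bare name. A hypothetical helper that separates a trailing parenthesized abbreviation, assuming abbreviations consist of uppercase letters and digits only:

```python
import re

# Matches "NAME (ABBR)" where ABBR is uppercase letters or digits.
_ABBREV = re.compile(r"^(?P<name>.*?) \((?P<abbr>[A-Z0-9]+)\)$")


def split_abbreviation(name_1_0):
    """Split 'LINE FEED (LF)' into ('LINE FEED', 'LF'); abbr is None if absent."""
    match = _ABBREV.match(name_1_0)
    if match is None:
        return name_1_0, None
    return match.group("name"), match.group("abbr")


for field in ("LINE FEED (LF)", "NEXT LINE (NEL)", "CHARACTER TABULATION"):
    print(split_abbreviation(field))
```

A field without parentheses passes through unchanged, so the helper also copes with the entries that did not gain an abbreviation.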

Perhaps I've been checking the beta files a bit TOO carefully.

-Doug Ewell
 Fullerton, California




Re: Are these characters encoded?

2001-12-03 Thread DougEwell2

In a message dated 2001-12-03 12:20:46 Pacific Standard Time, [EMAIL PROTECTED] 
writes:

  Perhaps a corruption of c-overbar, which is a medical abbreviation for
  with, sometimes used by nurses, doctors, and pharmacies?

Thanks to everyone who, directly or indirectly, corrected me on this 
character.  Yes, you are all right: the character used in (as it turns out) 
the medical field to mean with is, in fact, c-overbar and not c-underbar.  
In Unicode we would say U+0063 U+0305.

So to get back to my original questions about this thing, (a) is it a 
character in its own right, (b) if so, is there any justification in encoding 
it separately rather than using a combining sequence, and (c) is this not 
*exactly* the same set of issues as the question of encoding the Swedish 
o-underbar?
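Part of question (b) can be checked mechanically: when no precomposed character exists, normalization leaves the combining sequence as two code points, and that is the situation for both c-overbar and the Swedish o-underbar. A minimal sketch with Python's standard `unicodedata` module:

```python
import unicodedata

sequences = {
    "c-overbar": "c\u0305",   # U+0063 + U+0305 COMBINING OVERLINE
    "o-underbar": "o\u0332",  # U+006F + U+0332 COMBINING LOW LINE
}

for label, seq in sequences.items():
    nfc = unicodedata.normalize("NFC", seq)
    names = [unicodedata.name(ch) for ch in nfc]
    # Still two code points after NFC: no precomposed form was found.
    print(label, len(nfc), names)
```

This only shows that Unicode currently has no single code point for either; whether it should have one is exactly what the questions ask.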

-Doug Ewell
 Fullerton, California