Re: N4106

2011-11-07 Thread Szelp A. Szabolcs
I welcome that the combining parentheses are finally encoded as such,
and not as precomposed diacritics, especially as I had evidence for
8-10 additional parenthesized diacritics (from linguistic material, of
course; where else?) beyond those presented in the earlier
Teuthonista proposals.

I was wondering whether COMBINING DOUBLE PARENTHESES *BELOW* shouldn't
be added for symmetry. I am fully aware that analogy is usually not
taken into account (e.g. combining small letters above are still an
open set), but I also seem to remember Teuthonista material showing
double parentheses below in a chart. Is its absence an editorial
oversight, or is my memory playing tricks on me?

Concerning the explanatory paragraph dealing with the fused carons, I
have some reservations.
In fact, the current behaviour, in which carons change to apostrophes
unconditionally (i.e. without requiring language codes of the kind
needed for Russian vs. Serbian Cyrillic italics), already poses some
problems for encoding {d, t, l, L}+caron with an actual (detached)
caron to display. Such a representation is not unheard of. While this
can be solved at the font level in high-quality typography, the main
question concerning the fused caron is indeed whether the fused carons
are distinct from the detached carons themselves.
In fact, most attached combining marks are considered
_letter_modifications_ in Unicode, as far as I can see, be it b with
right hook or the IPA symbols. This could warrant encoding what is
called d/k/t with attached caron as {D, K, T} WITH SPLIT TOP STEM or
something similar, as single code points. Of course, if the mentioned
additional fused carons in Landsmålsalfabetet yield an order of
magnitude more letters, a combining mark solution would become more
desirable.
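
To make the encoding side of this concrete, here is a minimal sketch using Python's standard unicodedata module (the variable names are mine, not from N4106). It shows that the existing precomposed letters canonically decompose to base letter plus U+030C COMBINING CARON, so the apostrophe-like shape is purely a glyph choice and plain text carries no separate way to ask for a detached caron:

```python
import unicodedata

# Precomposed d, t, l, L with caron, as already encoded in Unicode.
letters = ["\u010F", "\u0165", "\u013E", "\u013D"]

decomposed = {}
for ch in letters:
    # Canonical decomposition: base letter followed by the combining mark.
    nfd = unicodedata.normalize("NFD", ch)
    decomposed[ch] = nfd
    code_points = " ".join(f"U+{ord(c):04X}" for c in nfd)
    print(f"{unicodedata.name(ch)} -> {code_points}")
```

Every entry ends in U+030C, whichever shape a font chooses to draw for it.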

Szabolcs

--
Szelp, André Szabolcs

+43 (650) 79 22 400



On Sun, Nov 6, 2011 at 22:46, Kent Karlsson kent.karlsso...@telia.com wrote:

 Den 2011-11-05 04:23, skrev António Martins-Tuválkin tuval...@gmail.com:

 I'm going through N4106 ( http://std.dkuug.dk/jtc1/sc2/wg2/docs/n4106.pdf
 ),
 ...

 I see the following characters being put forward for proposing to be
 encoded:

 1ABB COMBINING PARENTHESES ABOVE
 1ABC COMBINING DOUBLE PARENTHESES ABOVE
 1ABD COMBINING PARENTHESES BELOW
 1ABE COMBINING PARENTHESES OVERLAY

 Well, COMBINING DOUBLE PARENTHESES ABOVE seems to be the same as COMBINING
 PARENTHESES ABOVE, COMBINING PARENTHESES ABOVE. And COMBINING PARENTHESES
 OVERLAY seems to be just a tiny parenthesis before and a tiny parenthesis
 after; no need for a combining mark, especially one with a splitting
 behaviour.

 Otherwise, I think COMBINING ((DOUBLE)) PARENTHESES ABOVE/BELOW are an
 entirely new brand of characters in Unicode (if accepted as proposed).
 They are supposed to split (ok, we have split vowels in some Indic
 scripts, more on that below), but these split around *another combining
 mark*. So despite being given (as proposed) vanilla above/below mark
 properties, they do not stack the way such characters normally do, but
 are supposed to invoke an entirely new behaviour.

 Split vowels are not new, but they split around base characters (or more
 generally, around combining sequences), not around (a) combining
 character(s) only. Indeed, one can split these vowels into two
 characters (sometimes by canonical decomposition, when done right;
 sometimes by cheating a bit and splitting into another character and the
 supposedly split vowel character, the latter not interpreted as the
 second part of the decomposition; in principle one may need to cheat even
 more and use PUA characters in order to do this at the character level,
 but then that is really bad).
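
The "canonical decomposition, when done right" case can be illustrated with a small sketch using Python's standard unicodedata module. The choice of Tamil as the example script is mine, not from the thread. TAMIL VOWEL SIGN O is drawn with one piece on each side of the consonant, and its canonical decomposition gives exactly those two pieces as separate characters:

```python
import unicodedata

# TAMIL VOWEL SIGN O: a two-part vowel sign that surrounds the consonant.
split_vowel = "\u0BCA"

# NFD splits it into the left-side and right-side vowel signs.
parts = unicodedata.normalize("NFD", split_vowel)
for c in parts:
    print(f"U+{ord(c):04X} {unicodedata.name(c)}")

# NFC puts the two halves back together.
recomposed = unicodedata.normalize("NFC", parts)
```

No comparable pair of characters exists for the two halves of the proposed combining parentheses, which is the gap described in the quoted text.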

 That supposedly stacking combining marks *sometimes* (more a font
 dependence than a character dependence) don't stack but instead are laid
 out linearly is not new. But to *require* non-stacking behaviour for
 certain characters is new.

 So we have a combination of:

 1. Splitting. (Normally only used for some Indic scripts).

 2. Indeed splitting with no other characters to use for the decomposition,
    thus requiring the use of PUA characters, to stay compliant, for
    representing the result of the split at the character level.
    (This is entirely new, as far as I can tell.)

 3. The split is entirely *within* the sequence of combining characters
    (except for COMBINING PARENTHESES OVERLAY, which behaves as split
    vowels normally do, but still with issue 2), not around the combining
    sequence including the base. (This is entirely new.)

 4. Requiring (if at all supported) to use linear layout of combining
    characters instead of stacking. (This is entirely new.)

 This makes these proposed characters entirely unique in their display
 behaviour, IMO.

 This could be alleviated by encoding COMBINING BEGIN/END PARENTHESIS
 ABOVE/BELOW.
 That way the issues with split, as listed above, can be avoided. There is
 still the issue of requiring
 (when at all 

Re: N4106

2011-11-07 Thread vanisaac
From: Kent Karlsson kent.karlsson14_at_telia.com
Den 2011-11-05 04:23, skrev António Martins-Tuválkin tuvalkin_at_gmail.com:

  I'm going through N4106 ( http://std.dkuug.dk/jtc1/sc2/wg2/docs/n4106.pdf ),
 ...
 
 I see the following characters being put forward for proposing to be
 encoded:
 
 1ABB COMBINING PARENTHESES ABOVE
 1ABC COMBINING DOUBLE PARENTHESES ABOVE
 1ABD COMBINING PARENTHESES BELOW
 1ABE COMBINING PARENTHESES OVERLAY
 
 Well, COMBINING DOUBLE PARENTHESES ABOVE seems to be the same as COMBINING
 PARENTHESES
 ABOVE, COMBINING PARENTHESES ABOVE. And COMBINING PARENTHESES OVERLAY seems
 to be just
 a tiny parenthesis before and a tiny parenthesis after; no need for a
 combining mark, especially one with
 a splitting behaviour.
 
 Otherwise, I think COMBINING ((DOUBLE)) PARENTHESES ABOVE/BELOW are an
 entirely new brand of
 characters in Unicode (if accepted as proposed). They are supposed to split
 (ok, we have split
 vowels in some Indic scripts, more on that below), but these split around
 *another combining mark*.
 So despite being given (as proposed) vanilla above/below mark properties,
 they do not stack the way such characters normally do, but are supposed
 to invoke an entirely new behaviour.

I agree, except that if we give them any but a ccc=220/230, then canonical 
reordering will separate them from the modifier letters that they are attached 
to. I think this is one of those cases where a definition needs to expand in 
order to accommodate architecture. We do already have some non-stacking 
behaviour defined for these characters in order to accommodate polytonic Greek, 
so we do have some experience with disparate appearances of consecutive marks.
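
The reordering concern can be checked with a short sketch using Python's standard unicodedata module. The marks and the base letter "x" are arbitrary choices of mine for illustration. Marks with different combining classes get sorted during normalization, while marks sharing a class keep the order in which they were entered:

```python
import unicodedata

dot_below = "\u0323"   # canonical combining class 220
acute = "\u0301"       # canonical combining class 230
diaeresis = "\u0308"   # canonical combining class 230

ccc = {unicodedata.name(m): unicodedata.combining(m)
       for m in (dot_below, acute, diaeresis)}
print(ccc)

# Different classes: the lower class (220) is moved in front of 230.
reordered = unicodedata.normalize("NFD", "x" + acute + dot_below)

# Same class: the original order survives normalization.
kept = unicodedata.normalize("NFD", "x" + acute + diaeresis)
```

So a parentheses mark whose class differs from that of the mark it encloses could be moved away from it, whereas one sharing ccc=230 (above) or ccc=220 (below) stays where it was typed.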

 That supposedly stacking combining marks *sometimes* (more a font dependence
 than a character
 dependence) don't stack but instead are laid out linearly is not new. But to
 *require* non-stacking
 behaviour for certain characters is new.

Then think of it as the non-spacing version of stacking behaviour.

 So we have a combination of:
 
 1. Splitting. (Normally only used for some Indic scripts).
 
 2. Indeed splitting with no other characters to use for the decomposition,
 thus requiring the use of
PUA characters, to stay compliant, for representing the result of the
 split at the character level.
(This is entirely new, as far as I can tell.)

I cannot imagine in any way how this requires PUA characters.

 3. The split is entirely *within* the sequence of combining characters
 (except for COMBINING
PARENTHESES OVERLAY, which behaves as split vowels normally do, but still
 with issue 2), not
around the combining sequence including the base. (This is entirely new.)
 
 4. Requiring (if at all supported) to use linear layout of combining
 characters instead of stacking.
(This is entirely new.)

If I were designing a font, I would simply make the in/out mark attachment 
point near the top/middle of the parentheses, so that it drops down around the 
base mark, and then attaches any subsequent marks as if the parentheses 
weren't there. I think you're making this too complicated.

 This makes these proposed characters entirely unique in their display
 behaviour, IMO.

I do, however, agree totally with this assessment, I just believe it is more 
manageable than you paint it.

[snip]
 /Kent K 

I do, myself, have a couple of concerns in regards to several proposed 
characters in N4106 as well. Namely, I believe that U+1DF2, U+1DF3, and U+1DF4 
should require significant justification as to why they should not be encoded 
as U+0363 + U+0308, U+0366 + U+0308, and U+0367 + U+0308. I have similar 
concerns about U+A799, U+AB30, U+AB33, U+AB38, U+AB3E, U+AB3F, etc.
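
For reference, a minimal sketch (Python's standard unicodedata module; the placeholder base "x" is my own choice) that inspects the already-encoded pieces of those sequences. Each small letter above and the diaeresis share combining class 230, so canonical ordering leaves the sequences as entered:

```python
import unicodedata

DIAERESIS = "\u0308"
small_letters = ["\u0363", "\u0366", "\u0367"]

report = []
for mark in small_letters:
    seq = "x" + mark + DIAERESIS  # "x" is just a placeholder base letter
    classes = [unicodedata.combining(c) for c in (mark, DIAERESIS)]
    # True when canonical ordering does not move the two marks.
    unchanged = unicodedata.normalize("NFD", seq) == seq
    report.append((unicodedata.name(mark), classes, unchanged))
    print(report[-1])
```

This only shows that the sequences are technically stable; it says nothing about whether they are the right representation.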

Van A




Re: N4106

2011-11-07 Thread Kent Karlsson

Den 2011-11-07 10:34, skrev vanis...@boil.afraid.org
vanis...@boil.afraid.org:

 So despite being given (as proposed) vanilla above/below mark properties,
 they do not stack the way such characters normally do, but are supposed
 to invoke an entirely new behaviour.
 
 I agree, except that if we give them any but a ccc=220/230, then canonical
 reordering will separate them from the modifier letters that they are attached

Nit: modifier letters (as that term is used in Unicode) are not combining
marks; here you mean combining marks.
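
The distinction shows up directly in the character properties. A small sketch with Python's standard unicodedata module (the two sample characters are my own picks):

```python
import unicodedata

modifier_letter = "\u02B0"   # MODIFIER LETTER SMALL H
combining_mark = "\u0301"    # COMBINING ACUTE ACCENT

info = {}
for ch in (modifier_letter, combining_mark):
    # General category and canonical combining class of each character.
    info[unicodedata.name(ch)] = (unicodedata.category(ch),
                                  unicodedata.combining(ch))
print(info)
```

The modifier letter is a spacing letter (category Lm, class 0) and never takes part in canonical reordering; the combining mark is a nonspacing mark (category Mn) with a non-zero class.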

 to. I think this is one of those cases where a definition needs to expand in
 order to accommodate architecture. We do already have some non-stacking
 behaviour defined for these characters in order to accommodate polytonic
 Greek, 
 so we do have some experience with disparate appearances of consecutive marks.

Yes, but that they have special behavior needs to be made explicit.

 That supposedly stacking combining marks *sometimes* (more a font dependence
 than a character
 dependence) don't stack but instead are laid out linearly is not new. But to
 *require* non-stacking
 behaviour for certain characters is new.
 
 Then think of it as the non-spacing version of stacking behaviour.

Would not be sufficient. See below.

 So we have a combination of:
 
 1. Splitting. (Normally only used for some Indic scripts).
 
 2. Indeed splitting with no other characters to use for the decomposition,
 thus requiring the use of
PUA characters, to stay compliant, for representing the result of the
 split at the character level.
(This is entirely new, as far as I can tell.)
 
 I cannot imagine in any way how this requires PUA characters.

Splitting is usually done at the character level... I know, some say that
this should always be done at the glyph level (somehow), but IIUC that is
not so in practice. And I think it is preferable to do it at the character
level, so that it is not just handwaved away (oh, the font should do
this...) and left to each and every font designer as an odd-ball extra
(which then won't be done most of the time, even if the font framework may
support it). Laying out linearly instead of stacking is quite enough of an
odd-ball extra.

 3. The split is entirely *within* the sequence of combining characters
 (except for COMBINING
PARENTHESES OVERLAY, which behaves as split vowels normally do, but still
 with issue 2), not
around the combining sequence including the base. (This is entirely new.)
 
 4. Requiring (if at all supported) to use linear layout of combining
 characters instead of stacking.
(This is entirely new.)
 
 If I were designing a font, I would simply make the in/out mark attachment
 point near the top/middle of the parentheses, so that it drops down around the
 base mark, and then attaches any subsequent marks as if the parentheses
 weren't there. I think you're making this too complicated.

But glyphs for combining marks may be of different widths, for example a
(glyph for a) dot below is much narrower than a (proposed) wiggly line
below. Or, consider LENIS MARK and DOUBLE LENIS MARK (both for Teuthonista,
and both apparently used together with parentheses). The usual, and general,
way of handling that is to actually split the
character-that-goes-on-both-sides of something that may have different
widths in different instances. Of course you also need width info for
combining marks. I would still consider splitting to be a needless
complication here, and would rather encode begin/end pairs of combining
parentheses than what is in N4106.
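
The width dependence can be sketched as follows. This is purely an illustration under my own assumptions: the glyph names, the widths in font units, the gap, and the place_parens helper are all hypothetical and not taken from N4106 or any font format.

```python
from dataclasses import dataclass

@dataclass
class MarkGlyph:
    name: str    # hypothetical glyph name
    width: int   # advance width in font units (made-up values below)

def place_parens(mark: MarkGlyph, paren_width: int = 60, gap: int = 20):
    """Return x offsets (opening, closing) relative to the mark's centre."""
    half = mark.width // 2
    left_x = -(half + gap + paren_width)   # left edge of the opening parenthesis
    right_x = half + gap                   # left edge of the closing parenthesis
    return left_x, right_x

dot = MarkGlyph("dotbelow", 100)
wiggle = MarkGlyph("wigglylinebelow", 500)

for glyph in (dot, wiggle):
    print(glyph.name, place_parens(glyph))
```

The two halves end up at different distances for each enclosed mark, which is why a single parentheses glyph with one fixed opening cannot fit both a dot and a wiggly line.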

 
 This makes these proposed characters entirely unique in their display
 behaviour, IMO.
 
 I do, however, agree totally with this assessment, I just believe it is more
 manageable than you paint it.
 
 [snip]
 /Kent K 
 
 I do, myself, have a couple of concerns in regards to several proposed
 characters in N4106 as well. Namely, I believe that U+1DF2, U+1DF3, and U+1DF4
 should require significant justification as to why they should not be encoded
 as U+0363 + U+0308, U+0366 + U+0308, and U+0367 + U+0308.

There is the issue of whether the diaeresis applies to the base letter (plus
something) or if it applies to the combining mark just under the diaeresis.
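
With the existing characters, the two readings correspond to two different character orders, which normalize differently. A minimal sketch with Python's standard unicodedata module (base letter "a" chosen by me for illustration):

```python
import unicodedata

base, small_a, diaeresis = "a", "\u0363", "\u0308"

# Diaeresis entered after the small letter above: by the usual stacking
# convention it sits on top of that mark, and NFC leaves the sequence alone.
on_mark = unicodedata.normalize("NFC", base + small_a + diaeresis)

# Diaeresis entered first: it belongs to the base and composes with it.
on_base = unicodedata.normalize("NFC", base + diaeresis + small_a)

print([f"U+{ord(c):04X}" for c in on_mark])
print([f"U+{ord(c):04X}" for c in on_base])
```

The second sequence becomes U+00E4 followed by U+0363, so the two orders are not canonically equivalent.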

/Kent K


 I have similar 
 concerns about U+A799, U+AB30, U+AB33, U+AB38, U+AB3E, U+AB3F, etc.
 
 Van A
 
 





Re: [indic] Re: Lack of Complex script rendering support on Android

2011-11-07 Thread anbu
Hi! Unicode List

Code2000 supports most BMP code points of Unicode 5.2. It has been open
source since September:

http://code2000.sourceforge.net/

Regards,

Anbu7

On Mon, 7 Nov 2011 13:45:47 +0600, Christopher Fynn
chris.f...@gmail.com wrote:
 Hard to keep track of these things - but shouldn't affect the fact
 that one can safely implement OpenType rendering without a licence
 from Adobe or Microsoft.
 
 
 On 7 November 2011 13:21, Philippe Verdy verd...@wanadoo.fr wrote:
 2011/11/7 Christopher Fynn chris.f...@gmail.com:
 I'm sure people like RedHat, Debian, and Sun/Oracle (who use it in
 OpenOffice) - have satisfied themselves that the open type rendering
 they use is unencumbered.

 Actually now, this (OpenOffice) should no longer be Sun/Oracle but
 Apache. Oracle has donated OpenOffice to Apache, which accepted it.
 Since the acquisition of Sun by Oracle, some OpenOffice developers
 were unhappy with Oracle and split the project off as LibreOffice; not
 sure that both projects will merge again now that OpenOffice goes to
 Apache, but Apache has stated that both projects could live on (there
 are some differences in the GUI, but many developers are already
 trying to make modules that work on both projects). For now
 OpenOffice is still branded by Oracle in the current distribution;
 this may change in the next major release showing the Apache branch,
 once all IPR issues are solved between Oracle and Apache.





Re: [indic] Re: Lack of Complex script rendering support on Android

2011-11-07 Thread Ed Trager
On Mon, Nov 7, 2011 at 12:50 AM, Christopher Fynn chris.f...@gmail.com wrote:
 On 6 November 2011 15:33, Mahesh T. Pai paiva...@gmail.com wrote:
 ...
 You can update the firmware yourselves - go the custom ROM
 way. (errr... I am afraid the carriers' representatives may do
 something to me for advising this). :-D

 You can continue to be with the existing carrier.

 Look around. If all else fails, ask google - the search engine; not
 the customer support. (what a paradox). There are ways to root and
 de-root the phone for warranty purposes.

 We can probably all do things to get our own phones to work - but what
 is needed is that if *any* person buys *any* Android phone in India it
 just works with Indic scripts (even if the UI is not localized into
 all regional languages)


Not just if you buy a phone in India.  In this modern world, there are
plenty of students and immigrants all over the world who might enjoy
communicating with their friends and families in their native languages.

So I would say, if anyone buys *any* Android phone *anywhere* in the world,
it should just automatically support complex text rendering for all
Indic and Indic-derived scripts ...

 I don't think it is an unreasonable expectation for consumers in these
 countries, when they shell out a lot of money for the latest phone, to
 be able to send and receive messages in their script.

 Another problem is that reviews of phones, even those published or
 broadcast in India, rarely mention whether the phone supports Indic
 scripts - so it is difficult for a consumer to take this into account
 when deciding which phone to buy. A lot of consumers just expect it to
 be there, since even much cheaper phones have offered this for years.


I wonder if perhaps India needs to legislate script support on computing devices
and cell phones?

 I hope the latest Windows 7 phones don't have this limitation when we
 start to see them.

 - C






Code2000 on SourceForge (was Re: [indic] Re: Lack of Complex script rendering support on Android)

2011-11-07 Thread Andrew West
On 7 November 2011 08:34,  a...@peoplestring.com wrote:

 Code2000 supports most BMP code points of Unicode 5.2. It is open sourced
 from September:

 http://code2000.sourceforge.net/

I have doubts as to whether this project was actually created by James
Kass.  The project comprises the last public version of code2000.ttf
and a 210MB code2000.asm file which turns out to be a dump of the
ttf file in human-readable form, both of which could easily have been
put onto SourceForge in contravention of copyright and license by
someone pretending to be James who wants the font to be open source
now that the official Code2000 site has disappeared.  James once told
me that Code2000 was maintained as a 66 megabyte dBASE III database
file which is not what is on SourceForge.

Andrew



Re: [indic] Re: Lack of Complex script rendering support on Android

2011-11-07 Thread Andrew Cunningham
I'd agree with Ed, it's a broader problem than just India, and a problem not
just based on market segments. I use Android devices often, but cannot use
them as a serious tool for work because of what I would classify as serious
limitations in the OS and its internationalisation model. At the moment mobile
devices are still toys unable to deal with most community languages I need
to work with or support in Australia, including many Latin script languages.

This is a limitation in all mobile OSes, not just Android. I just have to
look through my Facebook and Google+ accounts to see many messages in
African and South East Asian languages that will not display.

I do love the irony of Android devices not being able to display all
content available through Google services.

Andrew


Re: [indic] Re: Lack of Complex script rendering support on Android

2011-11-07 Thread Mahesh T. Pai
Ed Trager said on Mon, Nov 07, 2011 at 11:11:19AM -0500,:
  Not just if you buy a phone in India.  In this modern world, there are 
  plenty of
  students and immigrants all over the world who might enjoy communicating with
  their friends and families in their native languages.
  
  So I would say, if anyone buys *any* Android phone *anywhere* in the world,
  it should just automatically support complex text rendering for all
  Indic and Indic-derived
  scripts ...

Arrey Bhai!!!

Aap samjah karo; hum hindustani log pura Indic bhasha ASCII mein likha
hai. Humko Unicode ya UTF-8 mumbu jumbo nahin chahiye. Sirf ek bhasha;
sirf ek keyboard lay out - ASCII.

(translation - ASCII is enough for us Indians to communicate in any
Indic language - only one Keyboard, only one language)

/just venting my frustration at all those device makers, and I am not
 very fluent in HI.

And the second paragraph you quoted above - those are almost the exact
words I used in one of my comments on one issue over at Android's
issue tracker site.

  I wonder if perhaps India needs to legislate script support on
  computing devices and cell phones?

May help. 

OTOH, when I got my last Nokia phone, the sales lady simply handed it
over to me, after fiddling with it for some time. After I got out
of the shop, I realised that the GUI was in some other language (either
MR or GJ). It was an XpressMusic series phone. She apparently could
not get out of the unfamiliar script. I finally looked up the user
manual (one of those rare events in my life when I read a user manual
before using a product), and followed the logos to get out. The X2
too has the same problem.  Otherwise, both were very good phones. But
ML rendering was very poor.

To be fair, when I flash one of the EU firmwares on my Android device,
it boots up into Russian. I tried the Asian firmwares, only once - and
it booted up into Chinese/ Korean / Japanese / one of those East Asian
languages!!!  ;-D

-- 
Mahesh T. Pai   ||
End Users are just friends who haven't submitted a patch yet.



Re: N4106

2011-11-07 Thread Szelp A. Szabolcs

  If I were designing a font, I would simply make the in/out mark
 attachment
  point near the top/middle of the parentheses, so that it drops down
 around the
  base mark, and then attaches any subsequent marks as if the parentheses
  weren't there. I think you're making this too complicated.

 But glyphs for combining marks may be of different widths, for example a
 (glyph for a) dot below is much narrower than a (proposed) wiggly line
 below. Or, consider LENIS MARK and DOUBLE LENIS MARK (both for Teuthonista,
 and both apparently used together with parentheses). The usual, and
 general,
 way of handling that is to actually split the
 character-that-goes-on-both-sides of something that may have different
 widths in different instances. Of course you also need width info for
 combining marks. I would still consider splitting to be a needless
 complication here, and instead encode begin/end pairs of combining
 parentheses instead of what is in N4106.


No, the usual and general way of handling this, if a uniform width for the
parentheses is not desirable for aesthetic reasons, is for the font
designer to create precomposed _glyphs_ of the parenthesised diacritics,
which are mapped to a character sequence.
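
As a rough sketch of what such a mapping amounts to (a toy model in Python, not an actual font table; the glyph names, the two sample marks, the shape_marks helper and the assumption that the parentheses character follows the mark it encloses are all mine):

```python
PARENS_ABOVE = "\u1ABB"  # COMBINING PARENTHESES ABOVE, code point as listed in N4106

# Hypothetical precomposed glyphs, keyed by (mark, parentheses) sequence.
LIGATURES = {
    ("\u0307", PARENS_ABOVE): "dotabove_parens",
    ("\u0303", PARENS_ABOVE): "tilde_parens",
}

def shape_marks(marks):
    """Map a run of combining marks to glyph names, preferring precomposed glyphs."""
    glyphs = []
    i = 0
    while i < len(marks):
        pair = tuple(marks[i:i + 2])
        if pair in LIGATURES:
            glyphs.append(LIGATURES[pair])             # one glyph for the pair
            i += 2
        else:
            glyphs.append(f"uni{ord(marks[i]):04X}")   # generic fallback glyph
            i += 1
    return glyphs

print(shape_marks("\u0307" + PARENS_ABOVE))
print(shape_marks("\u0301" + PARENS_ABOVE))
```

Pairs the designer has drawn get a purpose-built glyph; anything else falls back to the generic mark followed by uniform-width parentheses.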

Szabolcs


Re: [indic] Re: Lack of Complex script rendering support on Android

2011-11-07 Thread Ed Trager
On Mon, Nov 7, 2011 at 12:07 PM, Mahesh T. Pai paiva...@gmail.com wrote:

...


 To be fair, when I flash one of the EU firmwares on my Android device,
 it boots up into Russian. I tried the Asian firmwares, only once - and
 it booted up into Chinese/ Korean / Japanese / one of those East Asian
 languages!!!  ;-D


As Andrew Cunningham noted, the internationalization model --or really
what I would call LACK of an intelligent I18N model-- on a lot of
so-called smart phones just goes to show how immature these industry
segments and markets still remain.

For a while I used a Chinese-made smart phone that was capable of
displaying more languages than the phone I received from my American
cell phone carrier --and thus I conclude it was designed for sale in
many export markets-- BUT it had horribly annoying shortcomings.  For
example, one could only type an SMS message in Chinese if it was set to
a Chinese locale.  But even then the Chinese input methods were limited
and were inseparably tied to specific Chinese locales (i.e., pinyin was
available for the Mainland locale, but pinyin was *not* available for a
Traditional Chinese locale).  There were similar problems for Greek,
Arabic, and other locale choices.

This model is completely wrong and ignores the realities of most major
urban centers in the world today where numerous immigrant communities
not only exist but are also in constant contact with
friends/families/merchants/suppliers/other service providers from all
over the world.

I should be able to set my locale and display language to anything I
want; and still be able to use and switch-on-the-fly to any input
method for any language I want independent of the current locale and
display language.

- Ed

 --
 Mahesh T. Pai   ||
 End Users are just friends who haven't submitted a patch yet.