Re: Proposal for BiDi in terminal emulators
On Fri, Feb 01, 2019 at 06:57:43PM +, Richard Wordingham via Unicode wrote:
> On Fri, 1 Feb 2019 13:02:45 +0200 Khaled Hosny via Unicode wrote:
> > On Thu, Jan 31, 2019 at 11:17:19PM +, Richard Wordingham via Unicode wrote:
> > > On Thu, 31 Jan 2019 12:46:48 +0100 Egmont Koblinger wrote:
> > >
> > > No. How many cells do CJK ideographs occupy? We've had a strong
> > > hint that a medial BEH should occupy one cell, while an isolated
> > > BEH should occupy two.
> >
> > Monospaced Arabic fonts (there are not that many of them) are designed
> > so that all forms occupy just one cell (most even including the
> > mandatory lam-alef ligatures), unlike CJK fonts.
> >
> > I can imagine the terminal restricting itself to monospaced fonts,
> > disabling the “liga” feature just in case, and expecting the font to
> > behave well. Any other magic is likely to fail.
>
> Of course, strictly speaking, a monospaced font cannot support harakat
> as Egmont has proposed.

There are two approaches to handling them in monospaced fonts: combining them with base characters as usual, or treating them as spacing characters placed next to their bases. The latter approach is a bit unusual, but it makes editing heavily vowelled text a bit more pleasant. It requires good OpenType support, though, so virtually no terminal supports it.

Regards,
Khaled
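The two approaches can be told apart programmatically: a combining haraka is a nonspacing mark (General Category Mn) with a fixed canonical combining class, while a spacing presentation of it would occupy a cell of its own. A minimal sketch using Python's unicodedata (the character choices are mine, not from the thread):

```python
import unicodedata

FATHA = "\u064E"  # ARABIC FATHA, a haraka (vowel mark)
BEH = "\u0628"    # ARABIC LETTER BEH, a base letter

def is_combining_mark(ch):
    """True for nonspacing marks, which render on top of their base."""
    return unicodedata.category(ch) == "Mn"

# In the first approach the mark adds no cell of its own:
print(is_combining_mark(FATHA), is_combining_mark(BEH))  # True False
print(unicodedata.combining(FATHA))  # Arabic fixed combining class: 30
```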
Re: Proposal for BiDi in terminal emulators
On Thu, Jan 31, 2019 at 11:17:19PM +, Richard Wordingham via Unicode wrote:
> On Thu, 31 Jan 2019 12:46:48 +0100 Egmont Koblinger wrote:
>
> No. How many cells do CJK ideographs occupy? We've had a strong hint
> that a medial BEH should occupy one cell, while an isolated BEH should
> occupy two.

Monospaced Arabic fonts (there are not that many of them) are designed so that all forms occupy just one cell (most even including the mandatory lam-alef ligatures), unlike CJK fonts.

I can imagine the terminal restricting itself to monospaced fonts, disabling the “liga” feature just in case, and expecting the font to behave well. Any other magic is likely to fail.

Regards,
Khaled
Re: Encoding italic
On Thu, Jan 24, 2019 at 10:42:59PM +, Richard Wordingham via Unicode wrote:
> On Thu, 24 Jan 2019 18:24:07 +0200 Khaled Hosny via Unicode wrote:
> > On Thu, Jan 24, 2019 at 03:54:29PM +, Andrew West via Unicode wrote:
> > > On Thu, 24 Jan 2019 at 15:42, James Kass wrote:
> > > > Going off topic a little, I saw this tweet from Marijn van Putten
> > > > today which shows examples of Arabic script from early Quranic
> > > > manuscripts with phonetic information indicated by the use of red
> > > > and green dots:
> > > >
> > > > https://twitter.com/PhDniX/status/1088171783461703682
> > >
> > > I would be interested to know how those should be represented in
> > > Unicode.
> >
> > It is possible to represent this by use of color fonts.
>
> The limitations of rendering technology should not be an argument
> against an encoding. We have characters that differ only in their
> properties, such as word-breaking and line-breaking.

They are already encoded, in their modern uncolored form. Some of the modern forms, like U+06E5 ARABIC SMALL WAW, etc., were even specifically “invented” in the previous century to overcome the impracticality of printing in multiple colors, so the colored and uncolored forms are different representations of the same underlying characters.

> In this case, it may be argued that their colours apply only to their
> 'plain' colouring. Who determines what their colour should be in blue
> text? (Font technology seems to dictate that their colour is
> unaffected by the choice of foreground colour.)

The colors don’t change: the vowel marks are always red, the hamza is always green/yellow.
Re: Encoding italic (was: A last missing link)
On Thu, Jan 24, 2019 at 03:54:29PM +, Andrew West via Unicode wrote:
> On Thu, 24 Jan 2019 at 15:42, James Kass wrote:
> >
> > Here's a very polite reply from John Hudson from 2000,
> > http://unicode.org/mail-arch/unicode-ml/Archives-Old/UML024/1042.html
> > ...and, over time, many of the replies to William Overington's colorful
> > suggestions were less than polite. But it was clear that colors were
> > out-of-scope for a computer plain-text encoding standard.
>
> Going off topic a little, I saw this tweet from Marijn van Putten
> today which shows examples of Arabic script from early Quranic
> manuscripts with phonetic information indicated by the use of red and
> green dots:
>
> https://twitter.com/PhDniX/status/1088171783461703682
>
> I would be interested to know how those should be represented in Unicode.

It is possible to represent this by use of color fonts. The green (sometimes golden) dots are the hamza; the red ones are various vowel marks. A color font would use colored glyphs for these instead of the modern shapes. I made a color font that does a similar thing (but still uses the modern forms), and it is on my to-do list to do one using archaic Kufi forms.

Regards,
Khaled
Re: A last missing link for interoperable representation
On Sun, Jan 13, 2019 at 04:52:25PM +, Julian Bradfield via Unicode wrote: > On 2019-01-12, James Kass via Unicode wrote: > > This is an italicized word: > > 푘푎푘푖푠푡표푐푟푎푐푦 > > ... where the "geek" hacker used Latin italics letters from the math > > alphanumeric range as though they were Latin italics letters. > > It's a sequence of question marks unless you have an up to date > Unicode font set up (which, as it happens, I don't for the terminal in > which I read this mailing list). Since actual mathematicians don't use > the Unicode math alphabets, there's no strong incentive to get updated > fonts. They do, but not necessarily by directly inputting them. LaTeX with the “unicode-math” package will translate ASCII + font switches to the respective Unicode math alphanumeric characters. Word will do the same. Even browsers rendering MathML will do the same (though most likely the MathML source will have the math alphanumeric characters already). Regards, Khaled
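The font-switch-to-codepoint translation these systems perform is mostly a fixed offset into the Mathematical Alphanumeric Symbols block, with a handful of holes that were encoded earlier elsewhere. A sketch of the italic mapping (an illustration of the idea, not unicode-math's actual implementation):

```python
def math_italic(ch):
    """Map an ASCII letter to its Mathematical Italic counterpart."""
    if ch == "h":
        # The italic-h slot (U+1D455) is reserved: h was already encoded
        # as U+210E PLANCK CONSTANT, one of the block's famous holes.
        return "\u210e"
    if "A" <= ch <= "Z":
        return chr(0x1D434 + ord(ch) - ord("A"))
    if "a" <= ch <= "z":
        return chr(0x1D44E + ord(ch) - ord("a"))
    return ch  # leave non-letters alone

print("".join(map(math_italic, "kakistocracy")))
```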
Re: Excessive emoji usage and TTS (was Re: A last missing link)
On Thu, Jan 10, 2019 at 09:54:59PM +0530, Shriramana Sharma via Unicode wrote: > On Thu 10 Jan, 2019, 20:49 Arthur Reutenauer via Unicode < > unicode@unicode.org wrote: > > > > > On this topic, I was just pointed to > > > > https://twitter.com/kentcdodds/status/1083073242330361856 > > > > “You 혵혩혪혯혬 it's 풸퓊퓉ℯ to 현헿헶혁헲 your tweets and usernames > > 햙햍햎햘 햜햆햞. But > > have you 홡홞홨황홚홣홚홙 to what it 혴혰혶혯혥혴 혭혪혬혦 with assistive > > technologies > > like 퓥퓸퓲퓬퓮퓞퓿퓮퓻?” > > > Something similar: > > https://twitter.com/aaronreynolds/status/1083098920132071424?s=20 > > "This is what it’s like to get texts from my fourteen year old while > driving." > > https://t.co/s8949bmgZI That is pretty good actually and even a positive point for emoji (if these were mere images you would get nothing out of it without extra tagging, and it would still lack the standardization). Nothing like what one gets from the math symbols abuse. Regards, Khaled
Re: A sign/abbreviation for "magister"
On Wed, Oct 31, 2018 at 03:32:09PM -0700, Asmus Freytag via Unicode wrote:
> On 10/31/2018 9:03 AM, Khaled Hosny via Unicode wrote:
> > A while ago I was localizing some application to Arabic and the developer
> > “helpfully” used m² for square meter, but that does not work for Arabic
> > because there is no superscript ٢ in Unicode, so I had to contact the
> > developer and ask for markup to be used for the superscript so that I
> > can use it as well.
>
> This just pushes the issue down one level.
>
> Because it assumes that the presence/absence of markup is locale-independent.
>
> For translation of general text I know this is not true. There are instances
> where some words in certain languages are customarily italicized in a way that
> is not lexical, therefore not something where the source language would ever
> supply markup.

That was a while ago, but IIRC the markup was enabled for that particular widget unconditionally. The localizer is now free to use the markup or not; the string was translatable as a whole with the embedded markup. It should be possible to enable markup for any widget; it is just an option to tick in the UI designer. But my experience is that markup is seldom needed in computer UIs, though I may be biased by the kind of UIs and locales I’m most familiar with.

Regards,
Khaled
Re: A sign/abbreviation for "magister"
On Tue, Oct 30, 2018 at 10:02:43PM +0100, Marcel Schneider wrote:
> On 30/10/2018 at 21:34, Khaled Hosny via Unicode wrote:
> > On Tue, Oct 30, 2018 at 04:52:47PM +0100, Marcel Schneider via Unicode wrote:
> > > E.g. in Arabic script, superscript is considered worth
> > > encoding and using without any caveat, whereas when Latin script is on,
> > > superscripts are thrown into the same cauldron as underscoring.
> >
> > Curious, what Arabic superscripts are encoded in Unicode?
>
> First, ARABIC LETTER SUPERSCRIPT ALEPH U+0670.
> But it is a vowel sign. Many letters put above are called superscript
> when explaining in English.

As you say, this is a vowel sign, not a superscript letter, so the name is a misnomer at best. It should have been called COMBINING ARABIC LETTER ALEF ABOVE, similar to COMBINING LATIN SMALL LETTER A. In Arabic it is called small or dagger alef.

> There is the range U+FC5E..U+FC63 (presentation forms).

That is a backward compatibility block no one is supposed to use; there are many such backward compatibility presentation forms even for the Latin script (U+FB00..U+FB4F). So I don’t see what makes you think, based on this, that Unicode is favouring Arabic or other scripts over Latin.

Regards,
Khaled
Re: A sign/abbreviation for "magister"
On Tue, Oct 30, 2018 at 04:52:47PM +0100, Marcel Schneider via Unicode wrote: > E.g. in Arabic script, superscript is considered worth > encoding and using without any caveat, whereas when Latin script is on, > superscripts are thrown into the same cauldron as underscoring. Curious, what Arabic superscripts are encoded in Unicode? Regards, Khaled
Re: Unicode 11 Georgian uppercase vs. fonts
On Fri, Jul 27, 2018 at 02:02:07PM +0100, Michael Everson via Unicode wrote:
> 1) Show evidence of titlecasing in Hebrew or Arabic.

FWIW, there was a case system for Arabic used at some point in Egypt, called “crown letters”. It was introduced under the direction of King Fuad and was used in some capacity in official documents till the end of the monarchy:

https://en.wikipedia.org/wiki/Crown_Letters_and_Punctuation_and_Their_Placements
http://hibastudio.com/wp-content/uploads/2014/01/ar458.jpg

Regards,
Khaled
Re: metric for block coverage
On Sun, Feb 18, 2018 at 02:14:46AM -0800, James Kass via Unicode wrote:
> Adam Borowski wrote,
>
> > I'm looking for a way to determine a font's coverage of available scripts.
> > It's probably reasonable to do this per Unicode block. Also, it's a safe
> > assumption that a font which doesn't know a codepoint can do no complex
> > shaping of such a glyph, thus looking at just codepoints should be adequate
> > for our purposes.
>
> You probably already know that basic script coverage information is
> stored internally in OpenType fonts in the OS/2 table.
>
> https://docs.microsoft.com/en-us/typography/opentype/spec/os2
>
> Parsing the bits in the "ulUnicodeRange..." entries may be the
> simplest way to get basic script coverage info.

This might not be very reliable, though, since OpenType does not define what it means for a Unicode block to be supported: some font authoring tools use a percentage, others use the presence of any characters in the range, and fonts might even provide incorrect data for various reasons. However, I don’t think script or block coverage is that useful; what users are usually interested in is language coverage.

Regards,
Khaled
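For what it's worth, the ulUnicodeRange fields are just four 32-bit bitfields; decoding them is trivial, it is their meaning that is fuzzy. A sketch of the decoding (the bit-to-name excerpt follows the OpenType OS/2 specification; with fontTools one would read the actual values from `font["OS/2"].ulUnicodeRange1` through `...Range4`):

```python
# Excerpt of bit assignments from the OpenType OS/2 specification.
OS2_RANGE_BITS = {
    0: "Basic Latin",
    1: "Latin-1 Supplement",
    13: "Arabic",
}

def claimed_ranges(ul1, ul2=0, ul3=0, ul4=0):
    """Return the bit numbers set across the four ulUnicodeRange fields."""
    combined = ul1 | (ul2 << 32) | (ul3 << 64) | (ul4 << 96)
    return [bit for bit in range(128) if combined & (1 << bit)]

# A font claiming Basic Latin (bit 0) and Arabic (bit 13):
bits = claimed_ranges((1 << 0) | (1 << 13))
print([OS2_RANGE_BITS.get(b, b) for b in bits])  # ['Basic Latin', 'Arabic']
```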
Re: superscripts & subscripts for science/mathematics?
On Mon, Jan 22, 2018 at 07:43:34PM -0800, David Melik via Unicode wrote:
> ‘The intended use was to allow chemical and algebra formulas to be written
> without markup’ -- https://en.wikipedia.org/wiki/Unicode_subscripts_and_superscripts.
> Unless wrong, apart from disagreement, it's clear mathematics word
> processing software is useful, but not a reason to not finish
> almost-complete set of basic superscripts & subscripts ((super|sub)scripts)
> for relevant alphabets used (English, Greek, perhaps Hebrew, latter two
> which were in my original post subject line, but I likely accidentally used
> link I received to delete pre-moderated post.)

Mathematics written in Arabic notation uses Arabic-Indic numbers and Arabic letters, and they can occur in superscripts and subscripts as well.

Regards,
Khaled
Re: Algorithms for Unicode script detection
On Thu, Jul 06, 2017 at 09:43:29AM +1000, Simon Cozens via Unicode wrote:
> I want to segment a Unicode text into runs according to their script.
> I've had a look through UAX #24 in the hope of finding a standard
> algorithm for doing this, but there isn't one specified. The
> implementation section gives some good pointers for what to be careful
> with (paired punctuation, etc.) but I can't find a step-by-step
> algorithm similar to the bidi algorithm or collation algorithm.
>
> Equally, I don't see anything in ICU that segments into script-based
> runs. You can get script properties, but that doesn't help you resolve
> common characters in the context of a run.
>
> Does anyone know of an open-source algorithm for doing this?

There is source/extra/scrptrun/ in the ICU source tree (but not part of the API); apparently it is used by its ParagraphLayout library. (A copy of this code is used by Pango, and another copy is used by LibreOffice.)

Regards,
Khaled
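The usual resolution, as in ICU's scrptrun, is to carry the current script along and fold Common/Inherited characters into the surrounding run. A toy sketch of the idea (the two-range script lookup is mine, purely for illustration; a real implementation would use the full Scripts.txt data and also handle paired punctuation):

```python
def script_of(ch):
    """Toy script lookup covering just Latin, Arabic, and Common."""
    cp = ord(ch)
    if 0x0600 <= cp <= 0x06FF:
        return "Arab"
    if ("A" <= ch <= "Z") or ("a" <= ch <= "z"):
        return "Latn"
    return "Zyyy"  # Common

def script_runs(text):
    """Split text into (script, substring) runs, merging Common
    characters into the adjacent run instead of starting their own."""
    runs, current, start = [], "Zyyy", 0
    for i, ch in enumerate(text):
        s = script_of(ch)
        if s == "Zyyy" or s == current:
            continue            # Common inherits the current run's script
        if current == "Zyyy":
            current = s         # leading Common chars join the first run
            continue
        runs.append((current, text[start:i]))
        start, current = i, s
    runs.append((current, text[start:]))
    return runs

print(script_runs("abc, \u0645\u062d!"))
```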
Re: Coloured Punctuation and Annotation
On Thu, Apr 06, 2017 at 12:50:02PM +0200, Werner LEMBERG wrote:
> > This page should show colored Hamza, diacritical dots and vowel
> > marks on web browsers that support the MS color font format (currently
> > Firefox, Edge, and Internet Explorer on latest Windows 10):
> > http://www.amirifont.org/fatiha-colored.html
> >
> > No special markup has been used; the color information is embedded
> > in a regular OpenType font.
>
> Very nice! It also works with Firefox on my GNU/Linux box.

I think I worded this vaguely: it works with Firefox on all platforms (even on Android); the Windows 10 restriction is for Internet Explorer only.

Regards,
Khaled
Re: Coloured Punctuation and Annotation
On Wed, Apr 05, 2017 at 05:29:57PM -0700, Asmus Freytag wrote:
> On 4/5/2017 5:14 PM, Michael Everson wrote:
> > On 5 Apr 2017, at 23:16, Asmus Freytag wrote:
> > >
> > > Do you have any examples of plain text that is rendered with parts of
> > > characters having white (opaque) background?
> > >
> > > I'm not aware of any
> >
> > There are certainly MSS (in many languages) where some punctuation made of
> > dots have some of the dots red and some black.
>
> Agreed, those would be a challenge to reproduce with standard font
> technology and in plain text.

Not any more, thanks to Emoji! This page should show colored Hamza, diacritical dots and vowel marks on web browsers that support the MS color font format (currently Firefox, Edge, and Internet Explorer on latest Windows 10):

http://www.amirifont.org/fatiha-colored.html

No special markup has been used; the color information is embedded in a regular OpenType font.

Regards,
Khaled
Re: "A Programmer's Introduction to Unicode"
On Mon, Mar 13, 2017 at 07:18:00PM +, Alastair Houghton wrote:
> On 13 Mar 2017, at 17:55, J Decker wrote:
> >
> > I liked the Go implementation of character type - a rune type - which is a
> > codepoint, and strings that return runes by index.
> > https://blog.golang.org/strings
>
> IMO, returning code points by index is a mistake. It over-emphasises
> the importance of the code point, which helps to continue the notion
> in some developers’ minds that code points are somehow “characters”.
> It also leads to people unnecessarily using UCS-4 as an internal
> representation, which seems to have very few advantages in practice
> over UTF-16.

But there are many text operations that require access to Unicode code points. Take, for example, text layout: mapping characters to glyphs and back has to operate on code points. The idea that you never need to work with code points is too simplistic.

Regards,
Khaled
Re: "A Programmer's Introduction to Unicode"
On Fri, Mar 10, 2017 at 05:00:55PM +, Peter Constable wrote:
> FYI:
>
> http://reedbeta.com/blog/programmers-intro-to-unicode/
>
> The visuals may be the most interesting part. E.g., in the usage heat
> map, Arabic Presentation Forms-B lights up much more than I would have
> expected

I often see U+FEFB and other lam-alef ligatures used on social media (I easily spot them because my default font does not have them, so they end up using a fallback font). My guess is that this might be because some keyboard layouts (Xorg, Android?) use them for the lam-alef keys on the keyboard. (I’m guilty of doing this for the Xorg keyboard layout because it didn’t handle more than one character per key; the ligature was then decomposed back inside the XIM input method, but many people don’t use XIM and the decomposition does not happen. It was messy overall.)

Regards,
Khaled
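The decomposition mentioned above is exactly what compatibility normalization does: NFKC maps the presentation-form ligature back to the regular letters. A quick check in Python:

```python
import unicodedata

LAM_ALEF = "\ufefb"  # ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM

# NFKC folds the presentation form back to plain LAM + ALEF, which is
# what well-behaved input methods should have produced in the first place.
decomposed = unicodedata.normalize("NFKC", LAM_ALEF)
print([f"U+{ord(c):04X}" for c in decomposed])  # ['U+0644', 'U+0627']
```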
Re: Superscript and Subscript Characters in General Use
On Sat, Jan 14, 2017 at 02:18:01AM +0100, Marcel Schneider wrote:
> On Thu, 12 Jan 2017 17:01:41 +0200, Khaled Hosny wrote:
> >
> > LibreOffice indeed did not use HarfBuzz on Windows before 5.3, which is
> > not released yet.
>
> Is the integration of HarfBuzz limited to free software?

HarfBuzz has a fairly liberal license, so in theory it can be used anywhere.

> And what might be the reason of the delayed integration of HarfBuzz in the
> Windows version of LibreOffice?

Nothing specific. LibreOffice, and OpenOffice.org before it, and most likely StarOffice before them, just used whatever API the platform provides to do text layout, which is not an uncommon practice; it even seemed to be the best practice at the time. The reasons it finally switched to HarfBuzz are outlined in:

https://bugs.documentfoundation.org/show_bug.cgi?id=89870

Regards,
Khaled
Re: Superscript and Subscript Characters in General Use
On Thu, Jan 12, 2017 at 03:22:18PM +0100, Marcel Schneider wrote:
> > This is done by HarfBuzz, which automatically activates the OpenType
> > frac/dnom/numr features for such sequences, so if the font has the
> > features one gets vulgar fractions out of the box.
>
> According to Wikipedia (https://en.wikipedia.org/wiki/HarfBuzz#Major_users),
> HarfBuzz is included in LibreOffice too, but being on Windows, despite
> having just installed the brand-new version 5.2.4.2, I still donʼt get it,
> since it comes with 5.3:
> https://wiki.documentfoundation.org/ReleaseNotes/5.3#Text_Layout

LibreOffice indeed did not use HarfBuzz on Windows before 5.3, which is not released yet.

Regards,
Khaled
Re: Superscript and Subscript Characters in General Use
On Thu, Jan 12, 2017 at 12:24:29PM +0900, Martin J. Dürst wrote:
> On 2017/01/11 17:32, Richard Wordingham wrote:
>
> > The truly straight Unicode approach in HTML is to use 1⁄945.
> > Just entering those 5 characters into a text entry box in Firefox gave
> > me a properly formatted vulgar fraction. That is how vulgar fractions
> > are supposed to work. Unfortunately, one may need to avoid 'exciting
> > new fonts' in favour of those with a large, working repertoire.
>
> Just for the record: The vulgar fraction display also happened in
> Thunderbird (on Windows). Firefox and Thunderbird use the same display
> engine. I have switched HTML display off, because I prefer to read all my
> mail in plain text, but it still worked.

This is done by HarfBuzz, which automatically activates the OpenType frac/dnom/numr features for digits + fraction slash + digits sequences, so if the font has the features one gets vulgar fractions out of the box. This works in Chrome as well since it uses HarfBuzz (older versions of Chrome didn’t enable HarfBuzz by default for Latin, so the fractions might not show there).

Regards,
Khaled
Re: Why incomplete subscript/superscript alphabet ?
On Sat, Oct 01, 2016 at 03:00:50PM +0300, Jukka K. Korpela wrote:
> 1.10.2016, 11:29, Khaled Hosny wrote:
> > On Fri, Sep 30, 2016 at 07:31:58PM +0300, Jukka K. Korpela wrote:
> [...]
> > > What I was pointing at was that when using rich text or markup, it is
> > > complicated or impossible to have typographically correct glyphs used
> > > (even when they exist), whereas the use of Unicode codepoints for
> > > subscript or superscript characters may do that in a much simpler way.
> >
> > That is not generally true.
>
> It is generally true, but not without exceptions.
>
> > In TeX you get true superscript glyphs by default.
>
> I suppose you’re right, though I don’t know exactly how TeX implements
> superscripts. I suspect the fonts that TeX normally uses do not contain
> (many) superscript or subscript glyph variants, but TeX might actually map
> e.g. ^2 in math mode to a superscript glyph for 2 (identical to the
> glyph for ²).

TeX has fonts designed for use at 8pt (the size of first-level scripts) and 5pt (the size of second-level scripts), with all the optical corrections needed for them to look right when scaled down. They provide all the glyphs provided by the fonts for larger sizes, so any character can be used in super- or subscripts; no special mapping is needed.

Regards,
Khaled
Re: Why incomplete subscript/superscript alphabet ?
On Fri, Sep 30, 2016 at 07:31:58PM +0300, Jukka K. Korpela wrote:
> 30.9.2016, 19:11, Leonardo Boiko wrote:
>
> > The Unicode codepoints are not intended as a place to store
> > typographically variant glyphs (much like the Unicode "italic"
> > characters aren't designed as a way of encoding italic faces).
>
> There is no disagreement on this. What I was pointing at was that when using
> rich text or markup, it is complicated or impossible to have typographically
> correct glyphs used (even when they exist), whereas the use of Unicode
> codepoints for subscript or superscript characters may do that in a much
> simpler way.

That is not generally true. In TeX you get true superscript glyphs by default. On the web you can use font features in CSS to get them as well, provided that you are using a font that supports them.

Regards,
Khaled
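The CSS hooks for this are font-variant-position and the lower-level font-feature-settings. A sketch, assuming a font that actually implements the OpenType "sups"/"subs" features (the class names are mine):

```css
/* True superscript/subscript glyphs via OpenType features. */
sup.real {
  font-variant-position: super;     /* high-level property */
  font-feature-settings: "sups" 1;  /* low-level fallback for older engines */
}
sub.real {
  font-variant-position: sub;
  font-feature-settings: "subs" 1;
}
```

Without such a font, browsers typically fall back to synthesized (scaled and raised) glyphs, which is exactly the typographic compromise under discussion.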
Re: Unicode Bidi Algorithm – Java reference implementation
On Sat, Sep 17, 2016 at 05:01:10PM +0530, Deepak Jois wrote: > Hi > > It seems that the Java reference implementation for the Unicode Bidi > algorithm that I downloaded from the unicode.org site fails against > some test cases in the BidiCharacterTest.txt file – the ones that are > specifically meant to test for changes in Unicode 8.0. > > Has the reference implementation been updated, and does anyone have a > copy they can share? Is there a reference implementation in some other > language that I could look at, which has been updated? I think there is a C implementation that is kept up to date, and there is also a Python implementation that should pass the tests: https://github.com/behdad/pybyedie Regards, Khaled
Re: Numerical fractions written in Arabic script
On Tue, Jul 26, 2016 at 09:12:38PM -0400, Robert Wheelock wrote:
> Hello again, y’all!
>
> How do Arabs, Iranians, Afghans, Pakistanis, Urdu speakers ... all write
> their equivalents of common numerical fractions (consisting of a numerator,
> a separator character, and a denominator)? Considering that Arabic written
> script reads from right to left (like Hebrew, Syro-Aramaic, and the fantasy
> language of Tsolyáni), would they use a normal right-facing foreslash (1/2),
> a left-facing backslash (1\2), or do they align the numerator above and the
> denominator below a horizontal fraction bar?

In Arabic, beveled fractions are written from left to right with a right-facing slash. Also, the integer is written to the left of the fraction (whether it is a nut or a beveled fraction).

Regards,
Khaled
Re: non-breaking snakes
That sounds more like traditional Tibetan justification than kashida: http://rishida.net/scripts/tibetan/#justification On Wed, May 04, 2016 at 09:23:04AM +0200, Mark Davis ☕️ wrote: > Arabic has tatweel/kashida for justification; rather similar in principle. > > https://en.wikipedia.org/wiki/Kashida > > Mark > > On Wed, May 4, 2016 at 9:14 AM, Shriramana Sharmawrote: > > > Isn't there some Japanese orthography feature that already does > > something like this? > > > > -- > > Shriramana Sharma ஶ்ரீரமணஶர்மா श्रीरमणशर्मा > >
Re: Devanagari and Subscript and Superscript
On Tue, Dec 15, 2015 at 11:55:02AM +, Plug Gulp wrote:
> Please note that the teacher had to use a Circumflex Accent (Caret) to
> indicate superscript, which is an unwritten convention, in the absence
> of proper superscript support within Unicode.

If the teacher is explaining actual math to his students, then the superscript is the least of his worries. Math typesetting is two-dimensional, and is so much more complex than regular formatted text (let alone plain text) that it needs its own typesetting engines. There are various plain-text markup languages for math, if one really wants to represent complex mathematical notation in plain text.

Regards,
Khaled
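As an illustration, TeX's markup uses the same caret convention the teacher reached for, and builds genuinely two-dimensional layouts on top of it:

```latex
% Superscripts use the caret, as in the teacher's ad-hoc convention:
$x^{2} + y^{n+1}$
% Two-dimensional notation has no reasonable plain-text rendering at all:
$\frac{-b \pm \sqrt{b^{2} - 4ac}}{2a}$
```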
Re: AW: Proposal for German capital letter "ß"
On Wed, Dec 09, 2015 at 06:16:35PM +0100, Frédéric Grosshans wrote:
> * use your own casing rule and add a ZWNJ (zero width non joiner character)
>   such that ss↔SS and ß↔S+ZWNJ+S.

Wouldn’t ZWJ be a more logical choice, given that he wants to “join” both S’s into a single character?

Regards,
Khaled
Re: APL Under-bar Characters
On Sun, Aug 16, 2015 at 09:31:25AM -0700, alexwei...@alexweiner.com wrote:
> Khaled,
>
> Thank you for the link. The normalization methods were already discussed,
> specifically here:
> http://lists.gnu.org/archive/html/bug-apl/2015-08/msg00047.html

Grapheme cluster boundary detection is different from normalisation; please read the link I provided.

> Where the problem of how big ä is is discussed. The answer being that this
> is one symbol, because the Unicode Consortium decided that it is also its
> own standalone character. From the thread:
>
> > I'll give you an example. What would you want ⍴,'ä' to be? Right now,
> > that could return either 1 or 2 depending on whether the ä was using the
> > precomposed character (U+00E4) or the combining mark (U+0061, U+0308).
> > Visually, these are identical, and generally you'd expect them to
> > compare equal.

If you are counting grapheme clusters, then the answer is one in both cases.

> In Unicode, the comparison of equivalent (but with different characters)
> strings is done by performing a normalisation step prior to comparison.
> There are 4 different types of normalisation, with different behaviour.

Quoting from the link I provided:

    A key feature of default Unicode grapheme clusters (both legacy and
    extended) is that they remain unchanged across all canonically
    equivalent forms of the underlying text. Thus the boundaries remain
    unchanged whether the text is in NFC or NFD. Using a grapheme cluster
    as the fundamental unit of matching thus provides a very clear and
    easily explained basis for canonically equivalent matching. This is
    important for applications from searching to regular expressions.

See also: http://unicode.org/faq/char_combmark.html#7

> Now, the ä character has a precomposed form in Unicode, and if you couple
> that with the NFC normalisation form, you'd get the above expression to
> return 1.
> So I'm not sure why the allowance was made for ä as well as other certain
> characters, but not for other things (under-bar characters) that face
> similar representation issues.

It was encoded for compatibility with pre-existing character sets, AFAIK.

Regards,
Khaled
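The two points argued in this thread, canonical equivalence and grapheme-cluster counting, are easy to see concretely. Counting clusters needs the third-party regex module's \X, so this Python sketch shows only the normalization side:

```python
import unicodedata

precomposed = "\u00e4"   # ä as a single code point
decomposed = "a\u0308"   # a + COMBINING DIAERESIS

# Two spellings, two different code point counts, identical appearance:
print(len(precomposed), len(decomposed))  # 1 2

# NFC folds the combining sequence into the precomposed form, so after
# normalization both spellings count (and compare) the same:
nfc = unicodedata.normalize("NFC", decomposed)
print(nfc == precomposed, len(nfc))  # True 1
```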
Re: APL Under-bar Characters
On Sun, Aug 16, 2015 at 07:35:17AM -0700, alexwei...@alexweiner.com wrote: Hello Unicode Mailing List, There is significant discussion about the problems of adding capital letters with individual under-bars in this mailing list for GNU APL. http://lists.gnu.org/archive/html/bug-apl/2015-08/msg00050.html Pretty much it adds up to the following problem: The string length functionality would view an 'A' code point combined with an '_' code point as an item that has two elements, while something that looks like 'A' Should be atomic, and return a length of one. I think what you need is better “character” counting [1], rather than new precomposed characters. Regards, Khaled 1. http://unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries
Re: ZWJ as a Ligature Suppressor
This is not always true: some rendering engines (like HarfBuzz) try to follow the Unicode rules, so ZWJ does not break ligatures, except in Arabic, where the standard says it should be interpreted as ZWJ, ZWNJ, ZWJ.

Regards,
Khaled

On Mon, Aug 10, 2015 at 05:58:24PM +, Andrew Glass (WINDOWS) wrote:
> Hi Richard,
>
> To ligate or not to ligate is up to the font designer. Normally, GSUB
> lookups that perform ligation will be broken by the presence of ZWJ or
> ZWNJ. If a font designer wishes to ligate in the presence of a ZWJ or
> ZWNJ then they could choose to include appropriate glyph sequences in
> their ligation lookups. For example:
>
> glyphA glyphB -> glyphC
> glyphA ZWJ glyphB -> glyphC
>
> Cheers,
> Andrew
>
> Andrew Glass Ph.D.
> Program Manager, Shell Text Input Group | Windows | Microsoft
>
> -----Original Message-----
> From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Richard Wordingham
> Sent: Sunday, August 9, 2015 3:58 AM
> To: unicode@unicode.org
> Subject: ZWJ as a Ligature Suppressor
>
> According to the text just after TUS 7.0.0 Figure 23-3
> (http://www.unicode.org/versions/Unicode7.0.0/ch23.pdf#G25237), ZWJ
> suppresses ligatures in Arabic script. Does this rule apply to other
> normally cursive joined scripts, e.g. Syriac and Mongolian? Am I right
> in thinking that for an OpenType font for other scripts, the font
> writer must take precautions to prevent ZWJ accidentally suppressing
> ligatures that would be better suppressed by ZWNJ or ZWJ ZWNJ ZWJ?
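Andrew's example, written out as an AFDKO feature-file sketch (the glyph names are hypothetical; zwj stands for the glyph mapped from U+200D):

```
feature liga {
    # Normal ligature; a ZWJ between the components would break it.
    sub f i by f_i;
    # Explicitly ligate across ZWJ as well, if the designer wants that.
    sub f zwj i by f_i;
} liga;
```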
Re: Plain text custom fraction input
On Thu, Jul 23, 2015 at 10:25:22AM +0200, Marcel Schneider wrote:
> The remaining question would then be: What was the idea when, at font
> design, the fraction slash was given left and right kerning, so that a
> preceding superscript digit will take exactly the place it has as a part
> of a precomposed fraction, and a following subscript takes place like if
> it were a denominator in one of the precomposed fractions?

What says that this kerning is there for superscript/subscript glyphs? It can equally (and more likely) be there for the numerator and denominator glyphs.

Regards,
Khaled
Re: Plain text custom fraction input
On Wed, Jul 22, 2015 at 11:54:02PM +0100, Richard Wordingham wrote:
> On Wed, 22 Jul 2015 12:21:32 +0200 (CEST) Marcel Schneider
> charupd...@orange.fr wrote:
> > On 22 Jul 2015, at 09:52, Richard Wordingham wrote:
> >
> > We never thought of common hieroglyphs otherwise as running LTR, while
> > on monuments the great liberty of the script allows it to run in almost
> > all directions. IMO monumental transcription is always difficult to
> > deal with, whenever exact rendering is expected. However, since
> > Unicode's purpose is plain text encoding, we must stick with what I
> > consider as a convention in egyptology...
>
> Which means that Ancient Egyptian hieroglyphs are unencoded! Their
> default direction is right-to-left, but that's only the start of the
> trouble. The encoded hieroglyphs aren't Bidi-mirrored, so if I embed
> them in a right-to-left override, I should get retrograde characters.

At least in OpenType, you can have mirrored glyphs in the font (which you will need in any case) and use an “rtlm” feature, which should be applied when the text is being typeset right-to-left (naturally or forced).

Regards,
Khaled
Re: Plain text custom fraction input
On Wed, Jul 22, 2015 at 09:00:38AM +0200, Marcel Schneider wrote: On 21 Jul 2015, at 18:42, Doug Ewell wrote: As explained in TUS 7.0, §6.2 (General Punctuation), p. 273, U+2044 FRACTION SLASH is intended for use with Basic Latin digits, or other digits with General Category = Nd. The superscript and subscript presentation forms have General Category = No. That is what bugs me: this kerning fraction slash is presented to us as to be used with plain digits, which overlap the fraction slash in proportional fonts. That recommendation is inconsistent with plain text encoding. Following TUS, anybody who uses U+2044 must use a fraction formatting feature. I know this from the time I'd got the valid demo version of some Desktop Publishing software. The feature wasn't flagged by the fraction slash, and on the other hand, the feature worked with the common slash U+002F too. It's a formatting command like superscript or underline. Some layout engines, like HarfBuzz, automatically turn on the required OpenType features for proper fraction rendering when the fraction flag is used. If the font has “numr” and “dnom” features, HarfBuzz will turn them on for the digits + fraction slash + digits sequence. IMHO, that is the most Unicode-compliant approach and other engines should do the same. Regards, Khaled
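What HarfBuzz triggers around U+2044 happens at the glyph level inside the font; the following character-level sketch merely mimics the visual effect, using superscript/subscript code points as stand-ins for the font's 'numr'/'dnom' glyphs (an illustration, not the real mechanism):

```python
# Toy stand-in for 'numr'/'dnom': digits before U+2044 become
# numerator (superscript) forms, digits after become denominator
# (subscript) forms.
FRACTION_SLASH = "\u2044"
NUMR = str.maketrans("0123456789", "⁰¹²³⁴⁵⁶⁷⁸⁹")
DNOM = str.maketrans("0123456789", "₀₁₂₃₄₅₆₇₈₉")

def render_fraction(text):
    """Apply numerator/denominator forms around the fraction slash."""
    if FRACTION_SLASH not in text:
        return text
    numr, dnom = text.split(FRACTION_SLASH, 1)
    return numr.translate(NUMR) + FRACTION_SLASH + dnom.translate(DNOM)
```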
Re: WORD JOINER vs ZWNBSP
On Tue, Jun 30, 2015 at 11:02:18AM +0200, Marcel Schneider wrote: On Sun, Jun 28, 2015, Peter Constable wrote: Marcel: Can you please clarify in what way Windows 7 is not supporting U+2060. On my netbook, which is running Windows 7 Starter, U+2060 is not a part of any of the shipped fonts. It is a control character; it does not need to have a glyph in the font to be properly supported.
Re: Persian counter styles
On Feb 26, 2015 2:41 AM, Shervin Afshar shervinafs...@gmail.com wrote: On Tue, Feb 24, 2015 at 11:04 AM, Khaled Hosny khaledho...@eglug.org wrote: I don’t know about Persian, but in Arabic isolated Heh is not used in math or lists as it can be confused with Arabic-Indic digit five, and instead it is always used in initial form in such situations. I don't believe that the potential confusability between Arabic-Indic digit five and stand-alone Heh implies that it should not be used in writing math. I only stated that it is not used (i.e., the current practice); whether it should or shouldn't be used is up to the mathematicians who write that math (and for one, the Arabic Mathematical Alphabetic block does not have an isolated Heh, though its place is reserved). Regards, Khaled ___ Unicode mailing list Unicode@unicode.org http://unicode.org/mailman/listinfo/unicode
Re: Arabic percent sign and percent signs in RTL scripts
On Tue, Feb 04, 2014 at 03:43:37PM +0100, Martinho Fernandes wrote: Is the Arabic percent sign (U+066A) just a typographical variation of the normal percent sign (U+0025) or is it somehow more distinct than that? The former. It is mainly used when Arabic-Indic or Extended Arabic-Indic digits are used. What about its placement? Is it placed to the left or to the right of the digits it applies to? It should follow the digit in the input stream, and its proper visual placement should be handled by the Unicode bidirectional algorithm. Regards, Khaled
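The placement falls out of the bidi character properties: both percent signs have bidi class ET (European Number Terminator), so a sign stored after the digits in logical order stays attached to the number and lands on the correct visual side. The properties can be checked with Python's unicodedata:

```python
import unicodedata

def bidi_class(ch):
    """Bidi class of a character, from the Unicode character database."""
    return unicodedata.bidirectional(ch)

# U+0025 PERCENT SIGN and U+066A ARABIC PERCENT SIGN share class ET,
# while Arabic-Indic digits are AN (Arabic Number); the bidi algorithm
# resolves an ET adjacent to a number together with that number.
```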
Re: COMBINING OVER MARK?
On Mon, Sep 30, 2013 at 05:51:09PM -0700, Leo Broukhis wrote: Hi All, Attached is a part of page 36 of Henry Alford's *The Queen's English: a manual of idiom and usage (1888)* [ http://archive.org/details/queensenglishman00alfo] Is the way to indicate alternative s/z spellings used there plain text (arguably, if it can be done with a typewriter, it is plain text) I see a typeset book not an output of a typewriter. or rich text (ignoring the font size of letters s and z)? If it's the latter, what's the markup to achieve it? Using TeX: \def\s{${}^{\rm s}_{\rm z}$} 49. How are we to decide between {\it s} and {\it z} in such words as anathemati\s{}e, cauteri\s{}e, criti\-ci\s{}e, deodori\s{}e, dogmati\s{}e, fraterni\s{}e, and the rest? Many of these are derived from Greek \bye Regards, Khaled attachment: tex.png
Re: COMBINING OVER MARK?
Well that paragraph is rich text; different fonts (roman and upright) at different sizes (text and script size) pretty much make it formatted text to me. Regards, Khaled On Tue, Oct 01, 2013 at 10:19:24AM -0700, Leo Broukhis wrote: Khaled, On a typewriter, the same effect can be achieved as anathemati⟨half-interval up⟩s⟨BS⟩⟨1 interval down⟩z⟨half-interval up⟩e Where would the line between markup and typesetting languages be drawn? Leo On Tue, Oct 1, 2013 at 2:09 AM, Khaled Hosny khaledho...@eglug.org wrote: On Mon, Sep 30, 2013 at 05:51:09PM -0700, Leo Broukhis wrote: Hi All, Attached is a part of page 36 of Henry Alford's *The Queen's English: a manual of idiom and usage (1888)* [ http://archive.org/details/queensenglishman00alfo] Is the way to indicate alternative s/z spellings used there plain text (arguably, if it can be done with a typewriter, it is plain text) I see a typeset book not an output of a typewriter. or rich text (ignoring the font size of letters s and z)? If it's the latter, what's the markup to achieve it? Using TeX: \def\s{${}^{\rm s}_{\rm z}$} 49. How are we to decide between {\it s} and {\it z} in such words as anathemati\s{}e, cauteri\s{}e, criti\-ci\s{}e, deodori\s{}e, dogmati\s{}e, fraterni\s{}e, and the rest? Many of these are derived from Greek \bye Regards, Khaled
Re: Why blackletter letters?
On Thu, Sep 12, 2013 at 01:21:28PM +0100, Neil Harris wrote: On 12/09/13 11:26, Johan Winge wrote: On Wed, 11 Sep 2013 20:29:51 +0200, Hans Aberg haber...@telia.com wrote: ... The symbol for the empty set ∅ is originally a Greek letter phi ϕ, and some use the latter. According to the autobiography of André Weil, quoted at http://jeff560.tripod.com/set.html, the empty set symbol ∅ was inspired by the Scandinavian Ø, and would then have nothing to do with the Greek phi, except for a superficial resemblance. I'm aware that some mathematicians indeed do use Φ/φ, supposedly due to this misconception and/or lacking coverage in fonts and/or carelessness, but I find it terribly annoying. Really, it is no more correct than using ß in lieu of β. -- Johan Winge Do some mathematicians _really_ use Φ/φ instead of ∅, or does it just look like they're doing so? Seems so: http://math.stackexchange.com/q/227548 Also, when I went to school we were taught that phi denotes a group of nothing, not sure if that was supposed to be the empty set (we were taught math in Arabic, so not sure how that translates into English). Regards, Khaled
Re: Latvian and Marshallese Ad Hoc Report (cedilla and comma below)
On Fri, Jun 21, 2013 at 02:27:38PM +0100, Michael Everson wrote: On 21 Jun 2013, at 14:06, Denis Jacquerye moy...@gmail.com wrote: It is not the character model that is not reliable, it is the application. If your application doesn't support locale settings and locale specific font features, fix the application. Try this in the file system. The file system embeds visual rendering of text? You probably mean the file manager; my file manager shows me locale-dependent glyph variants without any special setup (apart from choosing a font that has the said variants, and they are available as OpenType variants, no less). Regards, Khaled
Re: Latvian and Marshallese Ad Hoc Report (cedilla and comma below)
On Fri, Jun 21, 2013 at 04:00:20PM +0100, Michael Everson wrote: Yeah, I don't believe that you can language-tag individual file names for such display as that is markup. Why do you need to? You only need one language; it is not like file names are multilingual high-quality text books where every fine typographic detail for each language has to be respected. Only the language that the user cares about matters, and this can be easily inferred from the system locale, and passed down to the text rendering stack. Regards, Khaled
Re: interaction of Arabic ligatures with vowel marks
On Tue, Jun 11, 2013 at 08:09:31PM -0700, Stephan Stiller wrote: Hi, How is the placement of vowel marks around ligatures handled in Arabic text? OpenType has special support for placing combining marks over ligatures (a subset of the general support for controlling the placement of combining marks); it is entirely handled at the text rendering level, with no difference in input whether the bases will be ligated or not. No idea about other font technologies. Regards, Khaled
Re: xkcd: LTR
Looks OK here, but that is probably FreeType doing its magic as usual. Regards, Khaled On Tue, Nov 27, 2012 at 02:29:45AM +0100, Philippe Verdy wrote: Also I really don't like the Deseret font: {font-family: CMU; src: url(CMUSerif-Roman.ttf) format(truetype);} that you have inserted in your stylesheet (da.css) which is used to display the whole text content of the page, including the English Latin text at the bottom part. This downloaded font is difficult to read as it is not hinted at all (so its rendering on screen is extremely poor, we probably don't want to print each page of this XKCD series, when the main interest is the image which is perfectly readable). Could you ask to someone in this list to help you hinting this font a minimum (even basic autohinting would be much better). 2012/11/27 Philippe Verdy verd...@wanadoo.fr Did you try add the xml:lang=en-Dsrt pseudo-attribute to the html element, as suggested by the W3C Unicorn validator ? http://validator.w3.org/unicorn/check?ucn_uri=www.xn--elqus623b.net%2FXKCD%2F1138.htmlucn_lang=frucn_task=conformance# May be this could help IE and Firefox that can't figure out the language used to properly detect the encoding if they still don't trust the XML declaration in this case, to avoid them to use an encoding guesser. It is anyay curious because this site is valid as XHTML 1.1 (not as HTML5 which uses a very different and simplified prolog, which is not matched here, so the legacy rules should apply to detect XHTML here, then legacy HTML4 if XHTML is no longer recognized by IE and Firefox). Because XHTML is properly tagged, the XML requirements should apply and the XML declaration in the prolog should be used without needing to guess the encoding from the rest of the content (starting by a meta element in the HTML head element). 2012/11/27 John H. Jenkins jenk...@apple.com That's because the domain does, in fact, use sinograms and not Deseret. (It's my Chinese name.) 
On 2012年11月26日, at 下午1:54, Philippe Verdy verd...@wanadoo.fr wrote: I wonder why this IDN link appears to me using sinograms in its domain name, instead of Deseret letters. The link works, but my browser cannot display it and its displays the Punycoded name instead without decoding it. This is strange because I do have Deseret fonts installed and I can view Unicoded HTML pages containing Deseret letters. 2012/11/26 John H. Jenkins jenk...@apple.com Or, if one prefers: http://www.井作恆.net/XKCD/1137.htmlhttp://www.xn--elqus623b.net/XKCD/1137.html On 2012年11月21日, at 上午10:22, Deborah Goldsmith golds...@apple.com wrote: http://xkcd.com/1137/ Finally, an xkcd for Unicoders. :-) Debbie
Re: Connecting overline and Connecting underline
On Fri, Nov 16, 2012 at 08:02:49PM +0100, Andreas Prilop wrote: U+0305 Combining overline U+0332 Combining low line should both connect on left and right. Which software (program and font) actually does this when you overline/underline gh? Test at http://www.user.uni-hannover.de/nhtcapri/combining-marks.html My (unreleased) Amiri Quran font handles the combining overline pretty well, but it has only Arabic characters (the code that generates the overline glyphs and OpenType rules is pretty generic, but that font has only a very specific use case). Regards, Khaled attachment: overline.png
Re: Caret
On Mon, Nov 12, 2012 at 09:35:03PM +0100, Philippe Verdy wrote: I understand then. You have a single logical position (in encoded plain text), that maps to two visual positions which may be considered AFTER depending on the direction properties of the character that you *may* type. A single vertical line assumes however that you'll type a character which will use the SAME direction as the character BEFORE the insertion point. This case remains very infrequent: it is extremely rare to start typing text in the middle between RTL and LTR text. Usually typing occurs at end of a paragraph, and most paragraphs use a single direction and when you have to insert new text in the middle of a paragraph, this is extremely rarely between a visual-LTR sequence and a visual-RTL sequence (I think the most frequent case will occur between digits and letters/symbols, in cases like currency amounts or measurements). I’m not sure where you are getting your statistics from, but I have to deal with all those “rare” and “extremely rare” situations all day. Regards, Khaled
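The split-caret situation can be made concrete with a toy model: given each logical character's visual column and direction, a logical caret index at a direction boundary yields two candidate visual positions (the trailing edge of the previous character and the leading edge of the next one). This is a simplified sketch of bidi caret placement, not any particular toolkit's algorithm.

```python
def visual_carets(logical_to_visual, directions, k):
    """logical_to_visual: visual column of each logical character;
    directions: 'L' or 'R' per logical character; k: logical caret index.
    Returns the set of candidate visual caret columns."""
    carets = set()
    if k > 0:  # trailing edge of the preceding character
        col = logical_to_visual[k - 1]
        carets.add(col + 1 if directions[k - 1] == "L" else col)
    if k < len(directions):  # leading edge of the following character
        col = logical_to_visual[k]
        carets.add(col if directions[k] == "L" else col + 1)
    return carets
```

For "ab" followed by a three-letter RTL run (displayed reversed), the caret between the runs has two visual positions, while a caret inside the LTR run has one.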
Re: Apostrophe, and DIN keyboard
On Tue, Aug 14, 2012 at 03:56:23PM -0400, Robert Wheelock wrote: ... 90º ... 45º BTW, the degree sign is ° not the masculine ordinal indicator that you are using. Regards, Khaled
Re: No appropriate code point for some Chinese punctuation marks
On Sun, Jul 22, 2012 at 09:43:29AM -0700, Asmus Freytag wrote: Especially in a multiscript environment, and those are not that rare, really, it's almost impossible to get such unifications to behave correctly without explicit font binding. And we all know that control of that is elusive in many contexts. It is quite possible actually; all that is needed is a text layout engine that does automatic script tagging (e.g. Pango and, to some extent, Firefox) and a font that provides localised, script-specific punctuation glyphs, and it should just work even with plain text. I've been doing that with Arabic and it works rather reliably. Regards, Khaled
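Automatic script tagging (script itemization) is easy to sketch: split text into runs of uniform script, attaching Common characters like punctuation to the surrounding run. The classifier below covers only a couple of Latin and Arabic ranges for brevity; real engines such as Pango use the full Scripts.txt data and more elaborate rules for Common/Inherited characters.

```python
def script_of(ch):
    """Grossly simplified script classifier (illustration only)."""
    cp = ord(ch)
    if 0x0600 <= cp <= 0x06FF or 0x0750 <= cp <= 0x077F:
        return "Arabic"
    if 0x0041 <= cp <= 0x005A or 0x0061 <= cp <= 0x007A:
        return "Latin"
    return "Common"

def script_runs(text):
    """Split text into (run, script) pairs; Common characters are
    attached to the current run."""
    runs, start, cur = [], 0, None
    for i, ch in enumerate(text):
        s = script_of(ch)
        if s == "Common":
            continue
        if cur is None:
            cur = s
        elif s != cur:
            runs.append((text[start:i], cur))
            start, cur = i, s
    runs.append((text[start:], cur or "Common"))
    return runs
```

A layout engine feeds each run to the font with the run's script tag, so a 'locl'-style substitution can pick script-specific punctuation glyphs per run.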
Re: Too narrowly defined: DIVISION SIGN COLON
On Wed, Jul 11, 2012 at 10:47:33AM +0200, Hans Aberg wrote: On 11 Jul 2012, at 03:51, Khaled Hosny wrote: It can be handled at a different level; when one types 3:5 in a Unicode-compliant TeX engine, what gets output to the output file is the ratio not the colon, and colon gets output with 3\colon{}5. Actually, TeX does it wrongly relative to Unicode: a colon : in the input file should expand to TeX $\colon$, whereas ∶ RATIO U+2236 should expand to TeX $:$. It is a kind of primitive input method, like using / for division slash and * for asterisk operator, and ratio is more frequent in math than the colon. (Original TeX handled this by having different glyphs/glyph classes in math than in text; Unicode-compliant TeX engines map them to the appropriate Unicode character.) Regards, Khaled
Re: Too narrowly defined: DIVISION SIGN COLON
On Wed, Jul 11, 2012 at 04:20:26PM +0200, Hans Aberg wrote: On 11 Jul 2012, at 15:59, Khaled Hosny wrote: On Wed, Jul 11, 2012 at 10:47:33AM +0200, Hans Aberg wrote: On 11 Jul 2012, at 03:51, Khaled Hosny wrote: It can be handled at a different level; when one types 3:5 in a Unicode-compliant TeX engine, what gets output to the output file is the ratio not the colon, and colon gets output with 3\colon{}5. Actually, TeX does it wrongly relative to Unicode: a colon : in the input file should expand to TeX $\colon$, whereas ∶ RATIO U+2236 should expand to TeX $:$. It is a kind of primitive input method, like using / for division slash and * for asterisk operator, and ratio is more frequent in math than the colon. (Original TeX handled this by having different glyphs/glyph classes in math than in text; Unicode-compliant TeX engines map them to the appropriate Unicode character.) There are a number of other incompatibilities between original TeX and Unicode: For example, ASCII letters are in TeX math mode typeset in italics, but Unicode has a mathematical italics style, so ASCII letters should be typeset upright in a strict Unicode mode. And similar for Greek letters, I gather. If I try the code below in lualatex, then the 푩 and the 퐁 both come out typeset upright. There is a “literal” mode in the unicode-math package just for that; check its manual for more details. Also, in the code there is an example where spacing produces a semantic difference: {A: B} is the set of all A satisfying the predicate B, whereas {A : B} is the set of the single element A : B. (It is more common to use | nowadays in the first case, but it is also used as an operator.) There is also an option to control colon vs. ratio behaviour, but this is getting off-topic IMO. Regards, Khaled
Re: Too narrowly defined: DIVISION SIGN COLON
They are spaced differently. Attached is how they are rendered by TeX, using its default spacing rules: the first is the ratio (which is spaced as a relational symbol), the second is the colon (which is spaced as a punctuation mark), both in math mode, and the last one is the colon in text mode. On Tue, Jul 10, 2012 at 04:22:06PM -0700, Mark Davis ☕ wrote: I would disagree about the preference for ratio; I think it is a historical accident in Unicode. What people use and have used for ratio is simply a colon. One writes 3:5, and I doubt that there was a well-established visual difference that demanded a separate code for it, so someone would need to write 3∶5 instead. Mark — Il meglio è l’inimico del bene — On Tue, Jul 10, 2012 at 3:22 PM, Asmus Freytag asm...@ix.netcom.com wrote: U+2236 RATIO * Used in preference to 003A : to denote division or scale attachment: texput.png
Re: Too narrowly defined: DIVISION SIGN COLON
It can be handled at a different level; when one types 3:5 in a Unicode-compliant TeX engine, what gets output to the output file is the ratio not the colon, and colon gets output with 3\colon{}5. Regards, Khaled On Tue, Jul 10, 2012 at 06:00:24PM -0700, Mark Davis ☕ wrote: That is, they may be spaced differently (depending on the font and environment). I'm not against pointing to RATIO for specific math contexts, but to tell Joe Smith that he should be using a different character to say that the ratio of gravel to sand should be 3:1 is artificial and pointless. ━━━ Mark — Il meglio è l’inimico del bene — On Tue, Jul 10, 2012 at 5:51 PM, Khaled Hosny khaledho...@eglug.org wrote: They are spaced differently. Attached is how they are rendered by TeX, using its default spacing rules: the first is the ratio (which is spaced as a relational symbol), the second is the colon (which is spaced as a punctuation mark), both in math mode, and the last one is the colon in text mode. On Tue, Jul 10, 2012 at 04:22:06PM -0700, Mark Davis ☕ wrote: I would disagree about the preference for ratio; I think it is a historical accident in Unicode. What people use and have used for ratio is simply a colon. One writes 3:5, and I doubt that there was a well-established visual difference that demanded a separate code for it, so someone would need to write 3∶5 instead. Mark — Il meglio è l’inimico del bene — On Tue, Jul 10, 2012 at 3:22 PM, Asmus Freytag asm...@ix.netcom.com wrote: U+2236 RATIO * Used in preference to 003A : to denote division or scale
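A minimal illustration of the two spacings in (La)TeX math mode, where ":" is a relation by default and \colon is punctuation:

```latex
% ":" in math mode is spaced as a relation, like U+2236 RATIO;
% \colon gets punctuation spacing, matching U+003A COLON usage.
$3 : 5$       % ratio spacing: medium space on both sides
$3 \colon 5$  % colon spacing: thin space after only
```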
Re: Too narrowly defined: DIVISION SIGN COLON
On Wed, Jul 11, 2012 at 03:39:23AM +0200, Philippe Verdy wrote: (Unfortunately it's still almost impossible to determine how browsers are selecting fonts and which fonts get finally used to render text in their tricky code, Firefox has an addon for that: https://addons.mozilla.org/en-US/firefox/addon/fontinfo/ Regards, Khaled
Re: [OT] Flag coding (was: Re: Tags and future new technologies [...])
On Sat, Jun 02, 2012 at 11:22:12AM +0200, Philippe Verdy wrote: 2012/6/2 William_J_G Overington wjgo_10...@btinternet.com: An interesting spin-off could be that the introduction of such an encoding could lead to the introduction of chromatic font technology by industry. I've been waiting for long for fonts embedding colorful glyphs (that also contain enough information for rendering the embedded colors with monochromatic patterns, also encoded in the font as a property of its internal colormap). Such thing is still not in OpenType, but it DOES exist in other font technologies (e.g. in SVG fonts, even though this is still an unfinished standard that does not meet the technical quality observed in OpentType, but that DOES use a much simpler and coherent design than the many incoherent tricks and deprecated items found in the OpenType family, including for such basic things such as metrics data which are a nightmare to make compatible). https://wiki.mozilla.org/SVGOpenTypeFonts Regards, Khaled
Re: Variant glyphs for mathematical symbols
On Sun, May 06, 2012 at 06:36:36PM -0700, Asmus Freytag wrote: First question: When the integral symbols were encoded in Unicode there was discussion of the fact that these were deliberately unifying an upright and a slanted style of integral. Now, I'm pretty sure that I've seen both styles in print at some point, but I can't seem to find any TrueType or OpenType fonts that support the slanted style. Or, I may just not know where to look. Is this style still in use anywhere, and do people make or maintain fonts for it? Latin Modern Math font has slanted integrals: http://www.gust.org.pl/projects/e-foundry/lm-math XITS Math has slanted integrals by default, as well as optional upright ones: https://github.com/khaledhosny/xits-math Both fonts use the new OpenType MATH table and thus need an application that supports it for proper math typesetting, namely MS Office 2007+, XeTeX and LuaTeX. STIX fonts also provide both: http://stixfonts.org/ Second question: When the mathematical relations were encoded there were variants that were unified where the sole difference was something subtle like a slant of one of the lines. However, these variants were also given Standardized Variation Sequences. Are there any fonts that contain glyphs for these variant forms? Either as replacement for the more typical forms, or as alternate glyphs? Again, I may simply not know where to look. XITS Math supports the mathematical variants using variation sequences that are listed here: http://unicode.org/reports/tr25/tr25-9.html#_Toc218 PS: should these symbols exist in non-Truetype fonts I'd be interested in pointers as well, but preferably from someone who would know how to convert them into TrueType format. Many TeX math fonts have slanted integrals. Regards, Khaled
Re: U+2018 is not RIGHT HIGH 6
On Wed, May 02, 2012 at 08:04:01AM -0700, Doug Ewell wrote: Certain font designers have made these directional for decades, leading to the hideous ``convention'' which some people seem to love, but which is a classic example of abusing character encoding to achieve typographical results. This stems from TeX and its Computer Modern fonts, AFAIK, which are older than history... Regards, Khaled
Re: Joining Arabic Letters
On Sat, Apr 07, 2012 at 08:50:18PM +0200, Escape Landsome wrote: - the browser, including version Mozilla Firefox 9.0.1 There was a bug in Firefox 9 causing the behaviour you described; it has been fixed in Firefox 10: https://bugzilla.mozilla.org/show_bug.cgi?id=714067 Regards, Khaled
Re: Joining Arabic Letters
On Sat, Mar 31, 2012 at 08:55:28AM +0200, Philippe Verdy wrote: For now I've not seen any existing Arabic font that exhibits the correct normative joining behavior for these letters such as U+063D (the Farsi Yeh with an inverted v above, which is dual-joining like the Farsi Yeh at U+06CC without the inverted v above, and in the same joining group; those fonts only map a single non-joining glyph for U+063D, but behave correctly for U+06CC). This is true even for all Arabic fonts shipped with Windows 7. Check my free Amiri font (http://amirifont.org); it has full Unicode 6.0 Arabic coverage, with 6.1 additions on the way. But if you are using a layout engine that predates the addition of that character into Unicode, even a good font will not help here since the engine will be using the older Unicode character database where the joining behaviour of this letter is undefined. Regards, Khaled
Re: Joining Arabic Letters
On Fri, Mar 30, 2012 at 07:37:53PM +0200, Andreas Prilop wrote: I come back to http://www.unicode.org/mail-arch/unicode-ml/y2012-m03/thread.html#11 A similar problem of showing non-joining, isolated Arabic glyphs can be seen in the attached file. Both Internet Explorer 8 and MS Word 2010 display isolated glyphs in some cases. I think a better idea is to have joining glyphs always even for different typefaces. At least the Unicode Standard should say what should happen when Arabic characters of different typefaces follow each other. OpenOffice/LibreOffice work around this by conditionally inserting ZWJ when there is a font switch in the middle of the word and joining is desired. Regards, Khaled
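The OpenOffice/LibreOffice workaround can be sketched as a pass over styled runs: when a font change splits an Arabic word, append a ZWJ to the first run and prepend one to the second, so each run still shapes with joining forms. A toy model (the Arabic letter-range test is a simplification):

```python
ZWJ = "\u200d"

def is_arabic_letter(ch):
    """Crude check for joining Arabic letters (simplified ranges)."""
    return "\u0621" <= ch <= "\u064a" or "\u0671" <= ch <= "\u06d3"

def join_across_font_change(runs):
    """runs: list of (text, font) in logical order.  Insert ZWJ on both
    sides of a run boundary that splits an Arabic word, so separately
    shaped runs keep their medial/final joining forms."""
    out = []
    for i, (text, font) in enumerate(runs):
        piece = text
        if (i > 0 and text and runs[i - 1][0]
                and is_arabic_letter(runs[i - 1][0][-1])
                and is_arabic_letter(text[0])):
            piece = ZWJ + text
            out[-1] = (out[-1][0] + ZWJ, out[-1][1])
        out.append((piece, font))
    return out
```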
Re: Combining Triple Diacritics (N3915) not accepted by UTC #125
On Wed, Nov 10, 2010 at 06:11:08PM +0100, Karl Pentzlin wrote: From the Pre-Preliminary minutes of UTC #125 (L2/10-416): C.4 Preliminary Proposal to enable the use of Combining Triple Diacritics in Plain Text (WG2 N3915) [Pentzlin, L2/10-353] - see http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3915.pdf [125-A13] ... UTC does not believe that either solution A or solution B represents an appropriate encoding solution for the text representation problem shown in this document. Appropriate technology involving markup should be applied to the problem of representation of text at this level. This will not happen. Linguists will continue to use their PUA code points (or even their 8-bit fonts), which employ these characters perfectly (albeit using precomposed glyphs for the used combinations). Advanced typesetting engines like TeX (which were invented 30 years ago, mind you) already support wide accents that span multiple characters: $\widehat{abcd}$ $\widetilde{abcd}$ \bye Even math formulas in new MS Office versions can do that (well it is math because, apparently, only mathematicians cared about that, but I don't see why it should not work for linguists too). Regards, Khaled -- Khaled Hosny Arabic localiser and member of Arabeyes.org team Free font developer
Re: Combining Triple Diacritics (N3915) not accepted by UTC #125
Or the other way around... On Thu, Nov 11, 2010 at 08:53:49AM +0200, Klaas Ruppel wrote: Typographic solutions (as established they ever may be) do not solve encoding matters. Best regards, __ Klaas Ruppel www.kotus.fi/?l=ens=1 Kotus www.kotus.fi Fociswww.focis.fi Tel. +358 207 813 278 Fax +358 207 813 219 Khaled Hosny kirjoitti 10.11.2010 kello 20.03: On Wed, Nov 10, 2010 at 06:11:08PM +0100, Karl Pentzlin wrote: From the Pre-Preliminary minutes of UTC #125 (L2/10-416): C.4 Preliminary Proposal to enable the use of Combining Triple Diacritics in Plain Text (WG2 N3915) [Pentzlin, L2/10-353] - see http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3915.pdf [125-A13] ... UTC does not believe that either solution A or solution B represents an appropriate encoding solution for the text representation problem shown in this document. Appropriate technology involving markup should be applied to the problem of representation of text at this level. This will not happen. Linguists will continue to use their PUA code points (or even their 8-bit fonts), which employ these characters perfectly (albeit using precomposed glyphs for the used combinations). Advanced typesetting engines like TeX (which were invented 30 years ago, mind you) already support wide accents that span multiple characters: $\widehat{abcd}$ $\widetilde{abcd}$ \bye Even math formulas in new MS Office versions can do that (well it is math because, apparently, only mathematicians cared about that, but I don't see why it should not work for linguists too). Regards, Khaled -- Khaled Hosny Arabic localiser and member of Arabeyes.org team Free font developer -- Khaled Hosny Arabic localiser and member of Arabeyes.org team Free font developer
Re: A simpler definition of the Bidi Algorithm
On Fri, Sep 10, 2010 at 05:00:21PM -0700, Asmus Freytag wrote: PS: Personally, I don't find the presentation in terms of the regular expressions any more intuitive than the original. Some people, when confronted with a problem, think ‟I know, I'll use regular expressions.” Now they have two problems. —Jamie Zawinski
Re: High dot/dot above punctuation?
On Wed, Jul 28, 2010 at 11:37:28AM -0700, Asmus Freytag wrote: On 7/28/2010 10:09 AM, Murray Sargent wrote: Contextual rendering is getting to be more common thanks to adoption of OpenType features. For example, both MS Publisher 2010 and MS Word 2010 support various contextually dependent OpenType features at the user's discretion. The choice of glyph for U+002E could be chosen according to an OpenType style. I know that the technology exists that (in principle) can overcome an early limitation of 1:1 relation between characters and glyphs in a single font. I also know that this technology has been implemented for certain (but not all) types of mappings that are not 1:1. It's worth remembering that plain text is a format that was introduced due to the limitations of early computers. Books have always been rendered with at least some degree of rich text. And due to the complexity of Unicode, even Unicode plain text often needs to be rendered with more than one font. However, the question I raised here is whether such mechanisms have been implemented to date for FULL STOP. Which implementation makes the required context analysis to determine whether 002E is part of a number during layout? If it does make this determination, which OpenType feature does it invoke? Which font supports this particular OpenType feature? I have a few fonts where I implemented a 'locl' OpenType feature that maps European to Arabic digits, and a contextual substitution feature that replaces the dot with the Arabic decimal separator when it comes between two Arabic numbers, so I think it is doable. Regards, Khaled
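A character-level sketch of the same idea (the real feature substitutes glyphs inside the font via 'locl' and contextual lookups; this Python stand-in is an assumption for illustration): map European digits to Arabic-Indic ones, then replace a dot between two such digits with U+066B ARABIC DECIMAL SEPARATOR.

```python
import re

# European -> Arabic-Indic digits (U+0660..U+0669)
ARABIC_INDIC = str.maketrans("0123456789", "٠١٢٣٤٥٦٧٨٩")

def localize_number(text):
    """Mimic the font feature at the character level: localize digits,
    then turn "." between two Arabic-Indic digits into U+066B."""
    text = text.translate(ARABIC_INDIC)
    return re.sub(r"(?<=[٠-٩])\.(?=[٠-٩])", "\u066b", text)
```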
Re: High dot/dot above punctuation?
On Thu, Jul 29, 2010 at 10:01:37AM +0200, Kent Karlsson wrote: Den 2010-07-29 08.47, skrev Khaled Hosny khaledho...@eglug.org: I have a few fonts where I implemented a 'locl' OpenType feature that maps European to Arabic digits, and a contextual substitution feature that replaces the dot with the Arabic decimal separator when it comes between two Arabic numbers, so I think it is doable. Doable is not the same thing as a good idea. Your example here is one of the not-at-all-good ideas. This was done for a GUI font; the main aim is to have Arabic numbers in Arabic contexts and vice versa. Since the numbers here are generated on the fly (dates, percentages, etc.), it is not possible (or even desirable) to change the input. Also, I don't buy into the Unicode idea of encoding different sets of decimal digits separately; they are all different graphical presentations of the same thing. Regards, Khaled
Re: Pashto yeh characters
On Wed, Jul 28, 2010 at 04:33:12PM +0200, Andreas Prilop wrote: On Tue, 27 Jul 2010, Khaled Hosny wrote: it just happens not to occur in those two positions in modern orthography, but it can be seen in Quran which is still written in the old, early Islamic orthography. If you argue with archaic spelling, then ð and þ are English letters. Except we are talking about a letter that is still in contemporary use, just not occurring at certain positions of the word. | http://www.unicode.org/mail-arch/unicode-ml/y2010-m07/att-0295/01-U_0649.jpg | http://www.unicode.org/mail-arch/unicode-ml/y2010-m07/att-0295/01-U_0649.jpg According to Grammatik des klassischen Arabisch by Wolfdietrich Fischer, page 9, the ya is written with two dots in such cases, too. Except that this is not a Yaa and not pronounced like a Yaa, it is an Alef (note the small dagger Alef above it). I doubt such questions can be solved with reference to the Quran, which originally had no dots at all. Those are two scans from contemporary prints of Quran, where the regular Yaa has dots. Just because Uyghur is still following the old orthography of placing Alef Maqsura in the middle of the word, doesn't suddenly make it a non-Arabic character. Regards, Khaled
Re: Pashto yeh characters
On Wed, Jul 28, 2010 at 05:32:21PM +0200, Andreas Prilop wrote:
> On Tue, 27 Jul 2010, Khaled Hosny wrote:
> > > According to "Grammatik des klassischen Arabisch" by Wolfdietrich
> > > Fischer, page 9, the ya is written with two dots in such cases, too.
> >
> > Except that this is not a Yaa and is not pronounced like a Yaa; it
> > is an Alef (note the small dagger Alef above it).
>
> That is exactly what I meant and exactly what is written in W. Fischer.
> My point is that there are two dots below.

No, there aren't, at least not in orthographies that differentiate between Yaa and Alef Maqsura by dots.

--
Khaled Hosny
Arabic localiser and member of Arabeyes.org team
Free font developer
Re: Pashto yeh characters
On Tue, Jul 27, 2010 at 06:43:19PM +0200, Andreas Prilop wrote:
[...]
> U+0649 has (should have) four glyphs without any dots. This is no
> Arabic letter, but an Uighur letter. Therefore you should not use
> U+0649 for Arabic, Persian, Pashto, Urdu but only U+06CC.

I'm not sure what the basis of this conclusion is, but U+0649 has no dots in its initial/medial forms in Arabic too; it just happens not to occur in those two positions in modern orthography, but it can be seen in the Quran, which is still written in the old, early Islamic orthography. See the attached image showing the words فسوىهن and ميكىل.

Regards,
Khaled

--
Khaled Hosny
Arabic localiser and member of Arabeyes.org team
Free font developer

attachment: U+0649.jpg
Re: Generic Base Letter
On Tue, Jun 29, 2010 at 09:41:58PM -0700, Michael S. Kaplan wrote:
> Speaking as an MS employee who has seen how easy it is to put
> arbitrary combining marks on scripts like Latin and Cyrillic that
> don't look very good if the font has neither combined form glyphs nor
> knowledge of attachment points, it may be the case that some of these
> situations that don't look good have more to do with the fact that
> making it look good typographically when no one put in the effort for
> the specific case may be simply the price one pays.

In the case of Arabic and Hebrew, Uniscribe inserts the dotted circle between what it considers invalid mark combinations before doing any OpenType layout, so it is impossible for a font designer to support such combinations, simply because what the layout engine sees is a mark, dotted circle, mark sequence, and the mark-to-mark layout feature, even if present in the font, will never be triggered. It is possible to hack around this by treating mark, dotted circle, mark sequences as mark, mark in the layout code, assuming nobody will ever insert the dotted circle manually, but this won't work for Arabic since the dotted circle also breaks Arabic shaping, and getting around that will be much harder.

Given how many times I've seen font designers trying to do advanced Arabic and Hebrew OpenType fonts frustrated by this, I think it is time for MS to review its decision here. Even if dropping the special handling of invalid marks is ruled out for various reasons, at least allowing font designers to tell Uniscribe "please let me handle the valid and invalid marks myself, I really know what I'm doing" would be very helpful.

Regards,
Khaled

--
Khaled Hosny
Arabic localiser and member of Arabeyes.org team
Free font developer
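[Editor's note: a minimal model of the pre-shaping validation described above, in Python. This is a deliberate simplification to show *why* the font never sees the original mark-mark sequence; actual Uniscribe cluster validation is far more involved, and the `allowed_pairs` whitelist is an invention for illustration.]

```python
import unicodedata

DOTTED_CIRCLE = '\u25CC'  # U+25CC DOTTED CIRCLE

def is_mark(ch: str) -> bool:
    # General category M* covers combining marks.
    return unicodedata.category(ch).startswith('M')

def insert_dotted_circles(text, allowed_pairs=frozenset()):
    """Insert U+25CC before any mark that follows another mark, unless
    the pair is whitelisted. Because this runs before OpenType layout,
    the font's mark-to-mark lookups never see the original sequence."""
    out = []
    prev = ''
    for ch in text:
        if is_mark(ch) and is_mark(prev) and (prev, ch) not in allowed_pairs:
            out.append(DOTTED_CIRCLE)
        out.append(ch)
        prev = ch
    return ''.join(out)

# BEH + Fatha + Fatha: the second Fatha is rejected by default,
# so the font is handed BEH, Fatha, dotted circle, Fatha.
print(insert_dotted_circles('\u0628\u064E\u064E'))
# With the pair whitelisted, the sequence reaches the font intact.
print(insert_dotted_circles('\u0628\u064E\u064E', {('\u064E', '\u064E')}))
```

The hypothetical whitelist parameter corresponds to the "let me handle the marks myself" opt-out requested above: the engine would defer to the font instead of rewriting the text.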
Re: Generic Base Letter
On Sun, Jun 27, 2010 at 10:00:18PM -0700, Asmus Freytag wrote:
> The one argument that I find convincing is that too many
> implementations seem set to disallow generic combination, relying
> instead on fixed tables of known/permissible combinations.

Only if you consider Microsoft "too many"; AFAIK, only Microsoft's Uniscribe exhibits such (stupid, in my opinion) behaviour.

> In that situation, a formally adopted character with the clearly
> stated semantic of "is expected to actually render with ANY combining
> mark from ANY script" would have an advantage. List-based
> implementations would then know that this character is expected to be
> added to the rendering tables for all marks of all scripts. Until and
> unless that is done, it couldn't be used successfully in those
> environments, but if the proposers could get buy-in from a critical
> mass of vendors of such implementations, this problem could be
> overcome. Without such a buy-in, by the way, I would be extremely
> wary of such a proposal, because the table-based nature of these
> implementations would prohibit the use of this new character in the
> intended way.

There are so many issues with the MS implementation(s); for example, you cannot combine arbitrary Arabic diacritical marks with any given base character. I don't think Unicode needs to invent workarounds for broken vendor implementations; interested parties should instead put pressure on that vendor to fix its implementation(s).

Regards,
Khaled

--
Khaled Hosny
Arabic localiser and member of Arabeyes.org team
Free font developer
Re: Generic Base Letter
On Mon, Jun 28, 2010 at 03:47:40PM +, Murray Sargent wrote:
> Khaled notes: "There are so many issues with MS implementation(s),
> for example you can not combine any arbitrary Arabic diacritical
> marks on any given base character. I don't think Unicode need to
> invent workaround broken vendor implementations, interested parties
> should instead pressure on that vendor to fix its implementation(s)."
>
> The MS Office math facility allows combining marks in the range
> U+0300..U+036F and most in the range U+20D0..U+20F0 to be applied to
> any base character(s) including complicated mathematical expressions.
> Such generality is needed in mathematics, since tildes, hats, bars,
> etc., are displayed over multiple base characters such as the
> expression a+b. Hebrew and Arabic combining marks aren't currently
> treated as valid mathematical combining marks, so the sequence U+25CC
> U+05BC U+05B8 doesn't render as Vincent desires in a math zone. It
> seems reasonable to allow all Unicode combining marks as accents in
> math zones.

That would be nice, but we were talking about combining marks in normal, non-math text. For example, it is now common practice to use two consecutive Fatha/Damma/Kasra marks for a certain form of Arabic tanwin used in the Koran; however, Uniscribe won't allow this and will always insert a dotted circle between the two marks. I know this behaviour is documented, but I fail to see the rationale behind it. Generally speaking, doing script spell checking in the rendering engine is a lousy idea IMO.

Regards,
Khaled

--
Khaled Hosny
Arabic localiser and member of Arabeyes.org team
Free font developer
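[Editor's note: the range check Murray describes can be written down directly. A rough sketch: he says "most" of U+20D0..U+20F0 is accepted without listing the exclusions, so this version treats the whole range as accepted; the function name is illustrative, not an actual Office API.]

```python
def is_math_accent(ch: str) -> bool:
    """True if a combining mark falls in the ranges the Office math
    facility accepts per the mail above: U+0300..U+036F (combining
    diacritical marks) and U+20D0..U+20F0 (combining marks for
    symbols). Arabic (U+064B..) and Hebrew (U+05B0..) marks fall
    outside both ranges, which is why the tanwin case fails."""
    cp = ord(ch)
    return 0x0300 <= cp <= 0x036F or 0x20D0 <= cp <= 0x20F0

print(is_math_accent('\u0303'))  # COMBINING TILDE: True
print(is_math_accent('\u20D7'))  # COMBINING RIGHT ARROW ABOVE: True
print(is_math_accent('\u064E'))  # ARABIC FATHA: False
print(is_math_accent('\u05B8'))  # HEBREW POINT QAMATS: False
```

This makes the asymmetry concrete: allowing all Unicode combining marks, as Murray suggests, would amount to replacing the two hard-coded ranges with a general category check.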