Re: Detecting UTF-8 Locale Question

2003-03-25 Thread Muhammad Asif
My initial plan for finding out about the current locale is that the
program will, at start up, look at the LC_CTYPE environment variable.  If
that variable is defined and contains the substring UTF-8 or regex-able
variants thereof (like utf8 on Linux), then everything is fine.  If not
present, the program prints a warning message to the user suggesting they
set the locale to a UTF-8 locale and provides an example of how to do
that.  If the locale is not set properly, the program still functions, but
of course any UTF-8 encoded data will not be displayed properly on the
terminal.
(Of course, even if a locale *is* set to a UTF-8 locale, it doesn't
guarantee that UTF-8 data will be displayed properly because (1) glyphs
still may not be available in the fonts on the system (2) the terminal may
not handle the script properly (e.g., when I last checked, xterm didn't
handle Indic or RTL scripts)).
I don't have much experience with UNIX, but for multilingual applications
this approach should work without any problem. Fortunately, in the web
environment (where I have worked most of the time) browsers support UTF-8,
so all we need to do is detect the user's language preference and render
the data in UTF-8. I believe the above approach will work fine, apart from
the limits you already mentioned regarding fonts and the capabilities of
the user's client (X terminals).
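
As a rough illustration of the start-up check described above, here is a
minimal Python sketch. It is only one possible reading of the plan: the
function names are made up for this example, and it consults LC_ALL and
LANG in addition to LC_CTYPE, because under POSIX LC_ALL overrides
LC_CTYPE and LANG is the fallback when neither is set.

```python
import os
import re

# Matches "UTF-8", "utf8", "utf_8" and similar spellings in a locale name.
_UTF8_RE = re.compile(r"utf[-_]?8", re.IGNORECASE)


def effective_ctype_locale(env):
    """Return the locale name that governs LC_CTYPE, or "" if none is set.

    POSIX precedence: LC_ALL, then LC_CTYPE, then LANG. Empty values are
    treated as unset.
    """
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name, "")
        if value:
            return value
    return ""


def is_utf8_locale(env=None):
    """True if the effective LC_CTYPE locale name mentions UTF-8."""
    env = os.environ if env is None else env
    return bool(_UTF8_RE.search(effective_ctype_locale(env)))


def startup_warning(env=None):
    """Return a warning string for non-UTF-8 locales, or None if all is well."""
    if is_utf8_locale(env):
        return None
    return ("warning: the current locale does not look like a UTF-8 locale; "
            "try something like: export LC_CTYPE=en_US.UTF-8")
```

A program would call startup_warning() once at start-up and print the
result if it is not None. The locale name en_US.UTF-8 in the message is
only an example; which UTF-8 locales exist depends on the system.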

Asif




Re: Unicode Arabic Rendering Problem

2003-02-28 Thread Muhammad Asif
It seems to be an encoding problem, as mentioned by Tang. If you choose the
Microsoft Sans Serif (or Tahoma) font and the encoding suggested by Tang,
it will work fine.

Asif



At 10:05 AM 2/28/2003 -0800, Mete Kural wrote:
Hello Folks,

I wanted to ask a question to those of you who have
Unicode Arabic knowledge. We have this website
http://www.quranreader.org where we are trying to
display the text of the Quran with accurately encoded
Unicode text rather than the traditional images. Some
of the characters in the Quran aren't rendered
correctly. We are letting the browser use its
default Unicode font on the website, which is Times
New Roman Unicode for the newer versions of Internet
Explorer I think. If we used a high-quality Unicode
font for Arabic, would this solve the problem? Or is
this a bigger problem that has to do with the
rendering engine provided by the operating system?
I would like to give you an example. In Arabic, when
you have a Lam and Alef together, they are rendered in a
unique way, instead of the regular rendering for these
letters, that looks roughly like this:
 \  /
  \/
  /\
  \/
Figure 1
In the Quran, there is sometimes this combination of
characters: Lam-Hamza-Alif
In such a case, the Lam and Alif are still rendered
the way they would be had there not been a hamza
in between, and the hamza is simply put above the alef
and lam in the middle which looks kind of like this:
  c
 \  /
  \/
  /\
  \/
Figure 2
Note that this is different than the case as
illustrated in Figure 3 where the hamza is directly
above the alef and not in between lam and alef.
c
 \  /
  \/
  /\
  \/
Figure 3
So there is a subtle difference that the hamza is not
directly above the alef but rather in between the alef
and the lam. I am attaching a small gif file named
Sample.gif that will demonstrate the subtle
difference of the positioning of the hamza. Attached
are two words from the Quran. Look for the second word
where the hamza is in between the alef and the lam
instead of directly above the alef.
When we encode this case with this combination of
Unicode characters: 0644-0627-0621
in Internet Explorer, instead of showing it like
Figure 2, it totally separates all letters and shows
it like this:
|  |
|  |
| C \__/
which is totally wrong.
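
One thing that can be checked mechanically is what kind of characters
the sequence 0644-0627-0621 consists of. The sketch below uses Python's
standard unicodedata module to list the name, general category and
combining class of each code point. U+0654 is included for comparison
only, as an example of a combining mark; whether using it would produce
the shape in Figure 2 depends on the font and the rendering engine and is
not something this snippet can show. The helper name describe is made up
for this example.

```python
import unicodedata


def describe(text):
    """Return (code point, name, general category, combining class) per char."""
    return [
        (
            "U+%04X" % ord(ch),
            unicodedata.name(ch),
            unicodedata.category(ch),
            unicodedata.combining(ch),
        )
        for ch in text
    ]


# The sequence from the message: 0644-0627-0621.
for row in describe("\u0644\u0627\u0621"):
    print(row)

# For comparison only: a combining hamza mark rather than a letter.
for row in describe("\u0654"):
    print(row)
```

All three characters in the original sequence are letters (category Lo)
with combining class 0, so U+0621 is a standalone letter and not a mark
that attaches to the preceding letters. U+0654 ARABIC HAMZA ABOVE is a
nonspacing mark (category Mn).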

Which one do you think is the problem here?

1) We are not encoding this combination of characters
in the correct way.
2) This is a font-related problem.
3) This is a bigger problem for which the rendering
engine on the operating system has to be modified.
Thank you very very much,
Mete Kural





Re: Unicode and Encoding Problems in Browsers

2003-02-20 Thread Muhammad Asif

Could it be a problem with the Uniscribe (USP10.dll) shipped with Windows
2000 and IE? That is, maybe the characters appearing as rectangles are not
supported in that particular version of Uniscribe.

But then why is Netscape rendering them properly?

Asif

At 12:56 PM 2/7/2003 +, Shlomi Tal wrote:
I'd like to mention that this problem which Muhammad Asif brings forth is 
an extant one in my circle of work. I work as PC technician, and one 
complaint I often get in tech support calls is that the user is unable to 
type Hebrew in the Search box in the MSN Israel website (msn.co.il) under 
Windows XP. At first, I told the user to set the Language for
Non-Unicode Programs (known as the System Locale in Windows 2000, which 
sets the emulated ANSI codepage), but it didn't help: the user still 
complained of seeing boxes instead of proper Hebrew letters. The encoding 
of MSN.co.il is Hebrew (Windows).

It doesn't happen on all machines. Mine at home runs XP too, but I
don't have that problem. I suspect it's not related to Unicode/encodings 
stuff at all. The fact that it appears only under XP (and not 2000 or 98, 
for instance) leads me to believe it may have something to do with the 
Java VM (which is by default lacking in XP and updates browser components 
when installed).

I hope that is of some enlightenment.

ST








Re: Unicode and Encoding Problems in Browsers

2003-02-20 Thread Muhammad Asif
Chris, thanks for your reply.
Well, I have a letter that I wrote as the label of a text box, and it
displays fine in IE. But when I type the same letter into the text box, it
appears as a rectangle. When I save the data submitted from the form into
the database and retrieve it, it again appears fine as a label (though it
was displayed as a rectangle in the text box). The problem occurs only
when I type in the text box. As you said, this seems to be a font problem,
but then why does the letter appear correctly as a label?

Also, I am using Microsoft Sans Serif, which is mentioned on the XP site
as supporting Arabic/Urdu characters.

Asif



Glyphs appearing as rectangles usually mean that the font being used to
display the document does not contain the glyphs to display those
characters (or sometimes that there are errors in the glyph outlines
which can prevent them from being properly rendered on some systems).

This problem will often occur if the character set being used to display 
the web page is wrong (misinterpreted by the browser) - since the font may 
not contain glyphs for *that* character set. When a glyph outline for a 
particular character is not present in a font then the default glyph 
defined in that font (usually an empty rectangle) is displayed for that 
character.

An OpenType font should contain glyphs for the nominal forms of all the 
Unicode characters it supports - and these nominal glyph forms should be 
mapped directly to the corresponding Unicode codepoints. Thus, even 
without Uniscribe, the nominal glyphs for those characters should be 
displayed by the basic font rendering system - though you won't get any of 
the contextual shaping relying on OpenType lookups which under Windows are 
handled by Uniscribe.
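
The fallback described above can be pictured with a toy model. This is
not how any real rendering system is written; it only treats a font's
character map as a Python dict from code point to glyph index and falls
back to glyph 0, the default ("missing") glyph, when a code point has no
entry. The glyph numbers and the name toy_cmap are invented for the
example.

```python
NOTDEF_GLYPH = 0  # glyph index 0 is the font's default ("missing") glyph


def glyph_ids(text, cmap):
    """Map each character to a glyph index, using the default glyph
    for any code point the font's character map does not cover."""
    return [cmap.get(ord(ch), NOTDEF_GLYPH) for ch in text]


# Hypothetical character map covering only two Arabic letters.
toy_cmap = {0x0627: 12, 0x0644: 34}

# LAM and ALEF are covered; U+0679 (ARABIC LETTER TTEH) is not.
print(glyph_ids("\u0644\u0627\u0679", toy_cmap))  # [34, 12, 0]
```

The third character comes out as glyph 0, which is what would be drawn
as the empty rectangle.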

- Chris





Re: Unicode and Encoding Problems in Browsers

2003-02-09 Thread Muhammad Asif
Tal,

I have checked it with Windows 2000 Server and NT; the same problem occurs
there. I also tried installing the JVM for Windows, but it made no
difference.

On Windows XP Professional I checked it with Netscape 7.0 and it works
fine. All the characters are displayed properly in text boxes.

I will check it on Windows 98 too.


Asif






Unicode and Encoding Problems in Browsers

2003-02-07 Thread Muhammad Asif
Hi,
I want to enter some Arabic text in a simple HTML text box. I set the
language in my Windows XP settings to Arabic. When I try to type, certain
characters are not displayed; rectangles are displayed instead. The
characters are from the Unicode BMP, which is supposed to be supported by
browsers. I am using IE 6.0.

Also, the default encoding of the browser is UTF-8, but if I change the
encoding to the default Windows Western European, it works fine and every
character is displayed. The problem with this is that when the form is
submitted, you do not get Unicode characters but their HTML entity
references.

How can I make the browser display Unicode characters with UTF-8
encoding, so that when the form is submitted I get Unicode data to store
in the database?
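
To make the difference between the two encodings concrete, here is a
small Python sketch. It does not involve a browser at all; it only mimics
the behaviour described above, where a character that does not exist in
the Western European code page is sent as a numeric character reference,
while UTF-8 carries it as ordinary bytes. The function names submit and
recover are made up for this example, and cp1252 is assumed to be the
code page behind "Western European (Windows)".

```python
import html


def submit(text, page_encoding):
    """Encode form text the way the page encoding forces it to be sent.

    Characters the encoding cannot represent become numeric character
    references such as &#1605;.
    """
    return text.encode(page_encoding, errors="xmlcharrefreplace")


def recover(raw, page_encoding):
    """Decode submitted bytes and turn character references back into text."""
    return html.unescape(raw.decode(page_encoding))


meem = "\u0645"  # ARABIC LETTER MEEM

print(submit(meem, "utf-8"))    # b'\xd9\x85'
print(submit(meem, "cp1252"))   # b'&#1605;'
print(recover(submit(meem, "cp1252"), "cp1252") == meem)  # True
```

Unescaping on the server side recovers the text, but it cannot tell a
reference generated this way from the same characters typed literally by
the user, which is one reason to prefer serving and submitting the form
in UTF-8.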

Thanks a lot for your time.

Asif