Re: Detecting UTF-8 Locale Question
My initial plan for finding out about the current locale is that the program will, at startup, look at the LC_CTYPE environment variable. If that variable is defined and contains the substring UTF-8 or regex-able variants thereof (like utf8 on Linux), then everything is fine. If it is not, the program prints a warning message suggesting that the user set a UTF-8 locale, and gives an example of how to do that. If the locale is not set properly, the program still functions, but of course any UTF-8 encoded data will not be displayed properly on the terminal. (Of course, even if the locale *is* set to a UTF-8 locale, that doesn't guarantee UTF-8 data will be displayed properly, because (1) glyphs may still be missing from the fonts on the system, and (2) the terminal may not handle the script properly (e.g., when I last checked, xterm didn't handle Indic or RTL scripts).)

I don't have much experience with UNIX, but for multilingual apps this approach should work without any problem. Fortunately, in a web environment (where I have done most of my work), browsers support UTF-8, so all we need to do is detect the user's language preference and render the data in UTF-8. Still, I believe the above approach will work fine, apart from the limits you already mentioned about fonts and the capabilities of the user's client (xterms).
Asif
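A minimal sketch of that startup check, in Python. It follows the plan above, with one assumption beyond the original message: it also honors POSIX precedence, where a non-empty LC_ALL overrides LC_CTYPE and LANG is the fallback. The function names and the warning text are hypothetical.

```python
import os
import re
import sys
from typing import Mapping, Optional

# Matches "UTF-8", "utf8", "UTF8", "utf-8" anywhere in the value,
# e.g. "en_US.UTF-8" or "de_DE.utf8".
_UTF8_RE = re.compile(r"utf-?8", re.IGNORECASE)


def effective_ctype(env: Mapping[str, str]) -> Optional[str]:
    """Return the value that governs LC_CTYPE.

    Assumption beyond the original plan: POSIX precedence, i.e. LC_ALL
    overrides LC_CTYPE, which overrides LANG; empty values count as unset.
    """
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            return value
    return None


def is_utf8_locale(env: Mapping[str, str]) -> bool:
    """True if the effective LC_CTYPE value names a UTF-8 locale."""
    value = effective_ctype(env)
    return bool(value and _UTF8_RE.search(value))


def warn_if_not_utf8(env: Mapping[str, str] = os.environ) -> bool:
    """Print a warning to stderr if the locale is not UTF-8.

    The program is expected to keep running either way.
    """
    if is_utf8_locale(env):
        return True
    print(
        "Warning: your locale does not appear to use UTF-8, so UTF-8 text "
        "may not display correctly.\n"
        "Try, for example: export LC_CTYPE=en_US.UTF-8",
        file=sys.stderr,
    )
    return False


if __name__ == "__main__":
    warn_if_not_utf8()
```

On a POSIX system, an alternative to pattern-matching the locale name is to call locale.setlocale(locale.LC_CTYPE, "") and then read locale.nl_langinfo(locale.CODESET), which reports the codeset the C library actually selected.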
Re: Unicode Arabic Rendering Problem
It seems to be an encoding problem, as Tang mentioned. If you choose the Microsoft Sans Serif (or Tahoma) font and the encoding Tang suggested, it will work fine.
Asif

At 10:05 AM 2/28/2003 -0800, Mete Kural wrote:
Hello Folks,
I wanted to ask a question to those of you who have knowledge of Unicode Arabic. We have this website http://www.quranreader.org where we are trying to display the text of the Quran as accurately encoded Unicode text rather than the traditional images. Some of the characters in the Quran aren't rendered correctly. We are letting the browser use its default Unicode font on the website, which I think is Times New Roman Unicode for the newer versions of Internet Explorer. If we used a high-quality Unicode font for Arabic, would this solve the problem? Or is this a bigger problem that has to do with the rendering engine provided by the operating system?

I would like to give you an example. In Arabic, when you have a Lam and an Alef together, they are rendered in a unique way instead of the regular forms of these letters, which looks something like this:
[Figure 1: ASCII sketch of the Lam-Alef ligature]
In the Quran, there is sometimes this combination of characters: Lam-Hamza-Alif. In such a case, the Lam and Alif are still rendered the way they would be if there were no hamza in between, and the hamza is simply placed above the alef and lam, in the middle, which looks something like this:
[Figure 2: Lam-Alef ligature with the hamza above, between the lam and the alef]
Note that this is different from the case illustrated in Figure 3, where the hamza is directly above the alef and not in between the lam and the alef.
[Figure 3: Lam-Alef ligature with the hamza directly above the alef]
So there is a subtle difference: the hamza is not directly above the alef but rather in between the alef and the lam. I am attaching a small GIF file named Sample.gif that demonstrates the subtle difference in the positioning of the hamza. Attached are two words from the Quran. Look at the second word, where the hamza is in between the alef and the lam instead of directly above the alef.
When we encode this case with the Unicode character sequence 0644-0627-0621, Internet Explorer, instead of showing it like Figure 2, completely separates all the letters and shows them like this:
[ASCII sketch: the lam, alef, and hamza each drawn as separate, unjoined letters]
which is totally wrong. Which of these do you think is the problem here?
1) We are not encoding this combination of characters in the correct way.
2) This is a font-related problem.
3) This is a bigger problem, for which the rendering engine in the operating system has to be modified.
Thank you very, very much,
Mete Kural
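One way to rule option 1 in or out is to dump exactly what the page stores. A minimal Python sketch using the standard unicodedata module (the function name is hypothetical). It makes the stored order explicit (lam, alef, hamza) and shows that U+0621 is a spacing letter (general category Lo, combining class 0) rather than a combining mark, so a renderer will normally treat it as a letter of its own rather than as a mark positioned over its neighbors.

```python
import unicodedata

# The sequence quoted in the message, in logical (storage) order.
SEQUENCE = "\u0644\u0627\u0621"


def describe(text: str) -> list[tuple[str, str, str, int]]:
    """Return (code point, character name, general category,
    canonical combining class) for each character in text."""
    return [
        (
            f"U+{ord(ch):04X}",
            unicodedata.name(ch),
            unicodedata.category(ch),
            unicodedata.combining(ch),
        )
        for ch in text
    ]


for row in describe(SEQUENCE):
    print(*row)
```

Running it prints one line per character: U+0644 ARABIC LETTER LAM Lo 0, U+0627 ARABIC LETTER ALEF Lo 0, and U+0621 ARABIC LETTER HAMZA Lo 0.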
Re: Unicode and Encoding Problems in Browsers
Could it be a problem with Uniscribe (USP10.dll) as shipped with Windows 2000 and IE? That is, maybe the characters appearing as rectangles are not supported by that particular version of Uniscribe. But then why does Netscape render them properly?
Asif

At 12:56 PM 2/7/2003 +, Shlomi Tal wrote:
I'd like to mention that this problem which Muhammad Asif brings up is a real one in my line of work. I work as a PC technician, and one complaint I often get in tech support calls is that the user is unable to type Hebrew in the Search box on the MSN Israel website (msn.co.il) under Windows XP. The first time, I told the user to set the Language for Non-Unicode Programs (known as the System Locale in Windows 2000, which sets the emulated ANSI codepage), but it didn't help: the user still complained of seeing boxes instead of proper Hebrew letters. The encoding of MSN.co.il is Hebrew (Windows). It doesn't happen on all machines. Mine at home runs XP too, but I don't have that problem. I suspect it's not related to Unicode/encoding issues at all. The fact that it appears only under XP (and not under 2000 or 98, for instance) leads me to believe it may have something to do with the Java VM (which is absent by default in XP and updates browser components when installed). I hope that is of some enlightenment.
ST
Re: Unicode and Encoding Problems in Browsers
Chris, thanks for your reply. Well, I have a letter that I wrote as the label of a text box, and it displays fine in IE; but when I type the same letter into the text box, it appears as a rectangle. And when I save the data submitted from the form into the database and retrieve it, it displays fine as a label (though it was displayed as a rectangle while I typed it). The problem occurs only when I type in the text box. As you said, this seems to be a font problem, but then why does it appear correctly as a label? Also, I am using MS Sans Serif, which the XP site mentions as supporting Arabic/Urdu characters.
Asif

Glyphs appearing as rectangles usually means that the font being used to display the document does not contain the glyphs for those characters (or sometimes that there are errors in the glyph outlines, which can prevent them from being rendered properly on some systems). This problem will often occur if the character set being used to display the web page is wrong (misinterpreted by the browser), since the font may not contain glyphs for *that* character set. When a glyph outline for a particular character is not present in a font, the default glyph defined in that font (usually an empty rectangle) is displayed for that character.
An OpenType font should contain glyphs for the nominal forms of all the Unicode characters it supports, and these nominal glyph forms should be mapped directly to the corresponding Unicode code points. Thus, even without Uniscribe, the nominal glyphs for those characters should be displayed by the basic font rendering system, though you won't get any of the contextual shaping that relies on OpenType lookups, which under Windows are handled by Uniscribe.
- Chris
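Chris's explanation can be checked directly: a rectangle appears when the font's character map (cmap) has no entry for a code point, so the font's default (.notdef) glyph is drawn. A minimal Python sketch, assuming the third-party fontTools package for reading the font file; the function names and any font path are hypothetical. It tests only nominal code point coverage, not the contextual shaping that Uniscribe performs.

```python
from typing import Dict, List


def missing_glyphs(text: str, cmap: Dict[int, str]) -> List[str]:
    """Return the distinct characters of text (ignoring whitespace) that
    have no entry in cmap, a mapping from Unicode code point to glyph name.

    These are the characters a renderer would draw with the font's
    default (.notdef) glyph, usually an empty rectangle.
    """
    return sorted({ch for ch in text if not ch.isspace() and ord(ch) not in cmap})


def load_cmap(font_path: str) -> Dict[int, str]:
    """Read the best Unicode cmap subtable from a TrueType/OpenType font.

    Assumes the third-party fontTools package (pip install fonttools).
    """
    from fontTools.ttLib import TTFont  # imported lazily; only this needs it

    font = TTFont(font_path)
    try:
        # getBestCmap() returns None if the font has no Unicode cmap.
        return font.getBestCmap() or {}
    finally:
        font.close()
```

For example, missing_glyphs(text, load_cmap(path)), with path pointing at a font file, lists the characters that font would show as rectangles. Because the check works on the nominal cmap only, an empty result does not guarantee correct contextual shaping.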
Re: Unicode and Encoding Problems in Browsers
Tal, I have checked it on Windows 2000 Server and NT, and the same problem occurs there. I also tried installing the JVM for Windows, but it made no difference. On Windows XP Professional I checked it with Netscape 7.0 and it works fine: all the characters are displayed properly in text boxes. I will check it on Windows 98 too.
Asif
Unicode and Encoding Problems in Browsers
Hi,
I want to enter some Arabic text in a simple HTML text box. I set the language in my Windows XP settings to Arabic. When I try to type, certain characters are not displayed; rectangles are displayed instead. The characters are from the Unicode BMP, which is supposed to be supported by browsers. I am using IE 6.0. The browser's default encoding is UTF-8, but if I change the encoding to the default Windows Western European, it works fine and every character is displayed. The problem with this is that when the form is submitted, you don't get the Unicode characters but their HTML entity references. How can I make the browser display Unicode characters with UTF-8 encoding, so that when the form is submitted I get Unicode data to store in the database?
Thanks a lot for your time.
Asif
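A minimal sketch of the server side, in Python, assuming the page containing the form is served as UTF-8 (for example with a "Content-Type: text/html; charset=utf-8" header), which is the usual way to get the browser to submit UTF-8. The function name is hypothetical. As a fallback, it also turns numeric character references such as &#1587;, the entity references described above, back into text.

```python
import html
from urllib.parse import parse_qs


def decode_form_field(body: str, field: str) -> str:
    """Decode one field from an application/x-www-form-urlencoded body.

    Assumes the form page was served as UTF-8, so the browser
    percent-encodes the submitted text as UTF-8 bytes; invalid UTF-8
    raises UnicodeDecodeError instead of being silently replaced.
    If the page used a legacy charset, characters outside it may arrive
    as numeric character references; html.unescape converts those back.
    Caveat: it also unescapes references the user typed literally.
    """
    values = parse_qs(body, encoding="utf-8", errors="strict")
    raw = values.get(field, [""])[0]
    return html.unescape(raw)
```

With a UTF-8 page, the Arabic word for "peace" arrives as q=%D8%B3%D9%84%D8%A7%D9%85 and decodes directly; from a Western European page, the same word arrives percent-encoded as &#1587;&#1604;&#1575;&#1605; and is recovered by the unescape step.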