Re: [2] Arabic Translation

Torsten Schlabach Fri, 23 Feb 2007 04:16:02 -0800

Hi Bashir and Amine,

> I think using Resource Bundle Editor plug in will solve the problem as
> Amine pointed.

Did it solve the problem? I did some experiments last night and got some interesting insights into the details. So please allow me to share what I found out, hopefully for the benefit of anyone who might run into this.


First of all, the short answer:

As we all know, Java strings are Unicode internally. (Note that Unicode != UTF-8. UTF-8 is one possible representation of Unicode among others.) When you read a properties file which contains \uXXXX escape codes, you can be quite sure you will have valid Unicode characters in your string in memory. So I doubt that the problem is anywhere on the input side.
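A minimal sketch of the input side, assuming a property named "label" (the class and method names are made up for this illustration): a pure ASCII properties file with \uXXXX escapes is loaded through Properties.load(InputStream), and the resulting string holds real Arabic characters.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

class EscapedPropertiesDemo {

    // Hypothetical helper: loads properties text the way a file would be read
    // (as an ISO 8859-1 byte stream) and returns the value of "label".
    static String loadLabel(String propertiesText) {
        Properties props = new Properties();
        byte[] fileBytes = propertiesText.getBytes(StandardCharsets.ISO_8859_1);
        try {
            props.load(new ByteArrayInputStream(fileBytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("label");
    }

    public static void main(String[] args) {
        // Pure ASCII content: every Arabic letter is written as a backslash-u escape.
        String fileContent =
                "label=\\u0627\\u0644\\u0631\\u0626\\u064a\\u0633\\u064a\\u0629";
        String label = loadLabel(fileContent);
        System.out.println(label.length());                       // 8
        System.out.println(Integer.toHexString(label.charAt(0))); // 627
    }
}
```

The eight escapes become eight Java chars, so nothing is lost between the file and the string in memory.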

But in order to render stuff properly in the browser, you need to make sure that you set the appropriate encoding on the servlet response. You usually do it like this:


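        protected void doGet(HttpServletRequest request,
                        HttpServletResponse response)
                        throws ServletException, IOException {

                // Set the charset before asking for the writer, otherwise
                // the writer has already been created with another encoding.
                response.setContentType("text/html; charset=UTF-8");

                PrintWriter pw = response.getWriter();
                // ... write the page through pw ...
        }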

This is not the same as just outputting

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>

from your servlet code, because Java knows that it has Unicode strings in memory. Unless you properly set the encoding of the output stream which writes to the browser, Java is intelligent enough to know that it cannot render certain Unicode ranges (like Chinese or Arabic) to an output device that it thinks is Latin-1 (ISO-8859-1), so it will just put a question mark '?' in there for those characters it thinks it cannot print.
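The question-mark behaviour can be seen without a servlet container. In the sketch below (assumed names; an OutputStreamWriter over a byte buffer stands in for the response writer), the same Arabic string is written once with ISO-8859-1 and once with UTF-8:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ResponseEncodingDemo {

    // Encodes text the way a writer configured with the given charset would.
    static byte[] write(String text, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // The Arabic word from this thread, eight letters.
        String arabic = "\u0627\u0644\u0631\u0626\u064a\u0633\u064a\u0629";

        // Latin-1 cannot represent Arabic, so each letter becomes '?'.
        byte[] latin1 = write(arabic, StandardCharsets.ISO_8859_1);
        System.out.println(new String(latin1, StandardCharsets.US_ASCII)); // ????????

        // UTF-8 can: each of these letters takes two bytes.
        byte[] utf8 = write(arabic, StandardCharsets.UTF_8);
        System.out.println(utf8.length); // 16
    }
}
```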


Well, the problem you showed in your screenshot at

http://www.nabble.com/file/6705/ArabicLabel.JPG

is different.

These characters are what you can expect to see if you render a UTF-8 encoded string to a Latin-1 based device without Java knowing that the string is UTF-8.

So I was able to reproduce that pattern when I wrote my properties file directly in UTF-8 (not using \uXXXX escape codes but real UTF-8). In that case, Java expected the properties file to be in Latin-1, which is in line with the spec, and thus did *not* build a proper Unicode string with Arabic characters in memory.
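A rough reproduction of that experiment, with made-up class and key names: the properties content is encoded as raw UTF-8 bytes and then read with Properties.load(InputStream), which treats every byte as one Latin-1 character.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

class Utf8PropertiesDemo {

    // Hypothetical helper: reads "label" from raw file bytes.
    // Properties.load(InputStream) assumes the bytes are ISO 8859-1.
    static String loadLabel(byte[] fileBytes) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(fileBytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("label");
    }

    public static void main(String[] args) {
        String arabic = "\u0627\u0644\u0631\u0626\u064a\u0633\u064a\u0629";

        // Simulates a properties file saved as real UTF-8, without escapes.
        byte[] utf8File = ("label=" + arabic).getBytes(StandardCharsets.UTF_8);

        String misread = loadLabel(utf8File);
        System.out.println(misread.length());               // 16, not 8
        System.out.println(misread.charAt(0) == '\u00d8');  // true
    }
}
```

Each Arabic letter turns into two Latin-1 characters, the first of which is U+00D8 or U+00D9, which matches the pattern in the screenshot.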


Take a look at

https://java.sun.com/j2se/1.4.2/docs/api/java/util/Properties.html

> When saving properties to a stream or loading them from a stream, the
> ISO 8859-1 character encoding is used. For characters that cannot be
> directly represented in this encoding, Unicode escapes  are used;
> however, only a single 'u' character is allowed in an escape sequence.
> The native2ascii tool can be used to convert property files to and
> from other character encodings.

In other words:

The kind of pattern you see in your screenshot can only come from some UTF-8 (not \uXXXX escape sequences) being misread as Latin-1 and then passed on. So did either you or that RBE plugin maybe save / convert your properties file?
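For reference, the conversion that native2ascii performs can be approximated in a few lines. This is only a sketch with an invented helper name, not the tool's actual implementation: every character outside ASCII is replaced by its \uXXXX escape, so the result is safe to store in a properties file.

```java
class EscapeDemo {

    // Hypothetical stand-in for the conversion done by native2ascii:
    // ASCII characters pass through, everything else becomes an escape.
    static String toAsciiEscapes(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                sb.append(c);
            } else {
                // Backslash, the letter u, then four hex digits.
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String line = "label=\u0627\u0644\u0631\u0626\u064a\u0633\u064a\u0629";
        // Prints the line with eight escapes in place of the Arabic letters.
        System.out.println(toAsciiEscapes(line));
    }
}
```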

On the output side, this cannot happen, even if you mess up the character encoding somewhere in the pipeline, as Java will either render it properly or use question marks.

But to be on the safe side and make sure that OFBiz will be setting the servlet response encoding to UTF-8, you probably need to set a parameter in some web.xml files.


Take a look at the code here, OFBiz does something quite interesting there:

http://svn.apache.org/repos/asf/ofbiz/trunk/framework/webapp/src/org/ofbiz/webapp/control/ControlServlet.java

        // setup DEFAULT character encoding and content type, this will be
        // overridden in the RequestHandler for view rendering
        String charset = getServletContext().getInitParameter("charset");
        if (charset == null || charset.length() == 0) charset = request.getCharacterEncoding();
        if (charset == null || charset.length() == 0) charset = "UTF-8";
        Debug.logInfo("The character encoding of the request is: [" + request.getCharacterEncoding() + "]. The character encoding we will use for the request and response is: [" + charset + "]", module);


In plain text:

- If the charset servlet init parameter is set: use that one. (It's *not* set by default. You can set it in web.xml.)
- If it's not set, use the request's encoding. (This can be the trap! I am not sure what encoding a browser would use for sending the request!)
- If neither one's the case, set a default of UTF-8.
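The three rules above can be isolated into a small function for experimenting. The class and method names below are invented for this sketch; only the two if-statements are taken from the ControlServlet snippet.

```java
class CharsetFallbackDemo {

    // Mirrors the fallback order of the snippet: init parameter first,
    // then the request's encoding, then a hard default of UTF-8.
    static String resolveCharset(String initParam, String requestEncoding) {
        String charset = initParam;
        if (charset == null || charset.length() == 0) charset = requestEncoding;
        if (charset == null || charset.length() == 0) charset = "UTF-8";
        return charset;
    }

    public static void main(String[] args) {
        // Init parameter set: it wins over whatever the request says.
        System.out.println(resolveCharset("UTF-8", "ISO-8859-1")); // UTF-8
        // Init parameter missing: the request's encoding is used (the trap).
        System.out.println(resolveCharset(null, "ISO-8859-1"));    // ISO-8859-1
        // Neither available: default.
        System.out.println(resolveCharset("", null));              // UTF-8
    }
}
```

The second case is the one to watch for: without the init parameter, a request that declares another encoding decides what the response uses.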

It would make sense to watch the debug log output on your system, as it's going to tell you what encoding it uses, but I'd bet it is using UTF-8 for one reason or another. Still, you can take out some uncertainty by explicitly forcing UTF-8 through that init parameter. You can never be sure what a user's browser does.

I will send you my sandbox standalone servlet which demonstrates some of the issues mentioned here. I cannot attach it to the mail as the list does not seem to allow this.


Hope this helps.

Note once again, this has nothing to do yet with RTL (right-to-left) support or with transcoding / transliteration. Has a Jira issue been created on that one yet? If so, I'd be happy to contribute my 2 cents to it. I can just warn from experience with an app: RTL != transliteration / transcoding, and both are far from trivial. Not technically, just to decide *what* you want to do. But that's a whole separate email.


Have a nice weekend. We're close to it already.

Regards,
Torsten

Bashir Alfetori schrieb:

Torsten,

I think using Resource Bundle Editor plug in will solve the problem as Amine
pointed.

Regards,
Bashir


Torsten Schlabach-2 wrote:

Ok, so the problem seems to be somewhere else.
Let me see if I will find some time on that subject tonight.
Regards,
Torsten

Bashir Alfetori schrieb:

Hi, Torsten!

I am using Mozilla Firefox. I tried with Internet Explorer and the same
problem occurred.

The meta tag is set properly as you stated. Here is part of the source code of the page:

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
   <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
   <title>OFBiz: Accounting Manager: Edit Agreement</title>
   <script language="javascript" src="/images/calendar1.js" type="text/javascript"></script>
   <script language="javascript" src="/images/selectall.js" type="text/javascript"></script>
   <script language="javascript" src="/images/fieldlookup.js" type="text/javascript"></script>
   <link rel="stylesheet" href="/images/maincss.css" type="text/css"/>
   <link rel="stylesheet" href="/images/tabstyles.css" type="text/css"/>

</head>

Best regards,
Bashir




Torsten Schlabach-2 wrote:

Hi Bashir!

Forget about Windows-1256. This is not what you want.
What browser are you using? Internet Explorer or Mozilla? Would you mind trying the other one, as one step?
Here's what I'd check:
- Is the page delivered to the browser as HTML or XHTML? (You can check using the "show page source" option in your browser.)
- In case it's XHTML and in case the browser is Internet Explorer, it's not enough to have <?xml version="1.0" encoding="UTF-8"?>, but you should either add an HTML meta tag like this:
<head>
...
<meta content="text/html; charset=UTF-8" http-equiv="content-type" />
...
</head>

or make sure the corresponding HTTP header is sent.
I am not familiar enough with OFBiz internals to tell you how to make this happen, but I am sure other people on the list would be able to help.
But in order to find out if this would solve your problem or not, just do this:
Save the page to the hard disk.
Add that <meta ...> tag manually to the <head> section of your HTML.
Reload the saved page from your hard disk into the browser.

I found IE wanted that extra info while Mozilla doesn't.

Regards,
Torsten


Bashir Alfetori schrieb:
Torsten,
As an example, here is a screenshot of how one label appears in the Create Agreement screen.
http://www.nabble.com/file/6705/ArabicLabel.JPG
The character encoding of the browser is set by default to Unicode (UTF-8). If changed to Arabic (Windows-1256), the second screenshot is obtained.
http://www.nabble.com/file/6706/ArabicLabel_Encoding_Windows-1256.JPG
Best regards,
Bashir



Bashir Alfetori wrote:
Torsten,

Arabic word:                             الرئيسية
ِAppears in the browser as:         Ø§Ù„Ø±Ø¦ÙŠØ³ÙŠØ©

The character encoding in the browser is set to Unicode (UTF-8)




Torsten Schlabach-2 wrote:
Bashir,

could you send a screenshot of what it looks like?

Regards,
Torsten

Bashir Alfetori schrieb:
Adrian

Yes, Arabic is a right-to-left language. I have modified the existing style sheet to reverse the direction. I noticed that not everything is OK when reversing the direction, especially in forms. For now, my first concern is displaying Arabic characters, even if the direction is still left-to-right. So far I couldn't do that, as mentioned above.

Regards,
Bashir




Adrian Crum wrote:

Bashir,

Is Arabic a right-to-left language? If yes, then you can either modify the existing style sheets to reverse the direction, or you can leave the existing style sheets alone and cascade a "right-to-left" style sheet that reverses the direction.
I have done some experiments with reversing the direction in OFBiz. Let me know if you need any help.


-Adrian


Bashir Alfetori wrote:

I am trying to start translating OFBiz to Arabic. I tried one properties file. At first I was not able to enter Arabic characters into that file until I changed the properties file to support UTF-8. It was OK then to enter Arabic characters in the properties file, but in the browser the labels were not showing in Arabic; they were displayed as garbage. I also tried to build a simple OFBiz application in Arabic like the one shown in the hello world tutorials, but the same problem existed. Also, it seems that every Arabic character was displayed in the browser as three strange characters. The direction is still left to right. I just want to display Arabic characters in the browser.
