Hi Bashir and Amine,

> I think using Resource Bundle Editor plug in will solve the problem as
> Amine pointed.

Did it solve the problem? I did some experiments last night and got some interesting insights into the details. So please allow me to share what I found out, hopefully for the benefit of anyone else who runs into this.

First of all, the short answer:

As we all know, Java strings are Unicode internally. (Note that Unicode != UTF-8; UTF-8 is one of several possible encodings of Unicode.) When you read a properties file that contains \uXXXX escapes, you can be quite sure you will have valid Unicode characters in your string in memory. So I doubt that the problem is anywhere on the input side.
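Here is a minimal sketch of that input side. It uses plain java.util.Properties rather than OFBiz's own resource loading, which is an assumption made for illustration. A pure-ASCII properties line with escape sequences for the word الرئيسية loads into exactly those eight Arabic characters. The key name "label" is made up.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class EscapedPropertiesDemo {
    // The Arabic word from this thread, written with Java escapes so that
    // the source file itself stays pure ASCII.
    static final String ARABIC =
            "\u0627\u0644\u0631\u0626\u064a\u0633\u064a\u0629";

    // Loads a one-line, pure-ASCII properties file whose value uses escape
    // sequences. The key "label" is made up for this example.
    static String loadEscapedLabel() {
        String fileContent =
                "label=\\u0627\\u0644\\u0631\\u0626\\u064a\\u0633\\u064a\\u0629\n";
        Properties props = new Properties();
        try {
            // load(InputStream) reads ISO 8859-1; an ASCII-only file is not
            // affected by that, and the escapes become real characters.
            props.load(new ByteArrayInputStream(
                    fileContent.getBytes(StandardCharsets.ISO_8859_1)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("label");
    }

    public static void main(String[] args) {
        String label = loadEscapedLabel();
        System.out.println(label.length());        // 8
        System.out.println(label.equals(ARABIC));  // true
    }
}
```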

But to render things properly in the browser, you need to set the appropriate encoding on the servlet response. You usually do it like this:

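    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        // Declare the charset before calling getWriter(): the writer is
        // created with the encoding in effect at that moment.
        response.setContentType("text/html; charset=UTF-8");

        PrintWriter pw = response.getWriter();
        // ... write the page through pw ...
    }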

This is not the same as just outputting

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>

from your servlet code. Java knows that it holds Unicode strings in memory. Unless you set the encoding of the output stream that writes to the browser, Java assumes that stream is Latin-1 (ISO-8859-1). Java also knows that certain Unicode ranges, such as Chinese or Arabic, cannot be represented in Latin-1. So it simply puts a question mark '?' in place of every character it cannot print.
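The question-mark substitution can be reproduced outside a servlet container. This sketch assumes the response writer behaves like a plain java.io.OutputStreamWriter bound to the response charset. That is a simplification, not a description of how any particular container is implemented.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class WriterEncodingDemo {
    static final String ARABIC =
            "\u0627\u0644\u0631\u0626\u064a\u0633\u064a\u0629";

    // Writes text through a Writer bound to the given charset and returns
    // the bytes that would go over the wire to the browser.
    static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, charset)) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] latin1 = encode(ARABIC, StandardCharsets.ISO_8859_1);
        byte[] utf8 = encode(ARABIC, StandardCharsets.UTF_8);
        // Characters Latin-1 cannot represent are replaced by '?'.
        System.out.println(new String(latin1, StandardCharsets.ISO_8859_1)); // ????????
        // UTF-8 needs two bytes for each of these Arabic letters.
        System.out.println(utf8.length); // 16
    }
}
```

On the servlet side, calling setContentType("text/html; charset=UTF-8") before getWriter() has the effect of the second case.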

Well, the problem you showed in your screenshot at

http://www.nabble.com/file/6705/ArabicLabel.JPG

is different.

These characters are what you can expect to see if you render a UTF-8 encoded string to a Latin-1 based device without Java knowing that the string is UTF-8.

I was able to reproduce that pattern by writing my properties file directly in UTF-8 (real UTF-8 bytes, not \uXXXX escape codes). In that case, Java read the properties file as Latin-1, which is what the spec prescribes, and therefore did *not* build a proper Unicode string with Arabic characters in memory.
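Here is a small sketch of that reproduction, again using plain java.util.Properties and a made-up key "label". The file is saved as raw UTF-8 and read through Properties.load(InputStream), which assumes ISO 8859-1. Every two-byte UTF-8 sequence comes back as two separate Latin-1 characters, some of them invisible control characters. That is the kind of garbage described in this thread.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class RawUtf8PropertiesDemo {
    static final String ARABIC =
            "\u0627\u0644\u0631\u0626\u064a\u0633\u064a\u0629";

    // Simulates a properties file saved as raw UTF-8 (no escapes) and
    // loaded with Properties.load(InputStream), which assumes ISO 8859-1.
    static String loadRawUtf8Label() {
        byte[] fileBytes = ("label=" + ARABIC + "\n").getBytes(StandardCharsets.UTF_8);
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(fileBytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("label");
    }

    public static void main(String[] args) {
        String label = loadRawUtf8Label();
        // Eight letters became sixteen Latin-1 characters.
        System.out.println(label.length());        // 16
        System.out.println(label.equals(ARABIC));  // false
        // The value is exactly the UTF-8 bytes reinterpreted as Latin-1.
        String misread = new String(ARABIC.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);
        System.out.println(label.equals(misread)); // true
    }
}
```

Newer JDKs (Java 9 and later) read properties-based resource bundles as UTF-8 by default. Properties.load(InputStream) itself still uses ISO 8859-1.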

Take a look at

https://java.sun.com/j2se/1.4.2/docs/api/java/util/Properties.html

> When saving properties to a stream or loading them from a stream, the
> ISO 8859-1 character encoding is used. For characters that cannot be
> directly represented in this encoding, Unicode escapes are used;
> however, only a single 'u' character is allowed in an escape sequence.
> The native2ascii tool can be used to convert property files to and
> from other character encodings.

In other words:

The kind of pattern you see in your screenshot can only come from UTF-8 (not \uXXXX escape sequences) being misread as Latin-1 and then passed on. So did either you or that RBE plugin perhaps save or convert your properties file as UTF-8?
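If the file did end up as raw UTF-8, converting it back to escape sequences fixes the input side. That is the job of the native2ascii tool mentioned in the Javadoc. The conversion it applies to a value can be roughly sketched as follows. toUnicodeEscapes and loadValue are hypothetical helpers, not part of any JDK tool. The sketch also ignores characters such as '=' or '\', which need their own escaping in properties files.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class EscapeDemo {
    // Hypothetical, rough equivalent of what native2ascii does to a value:
    // keep ASCII characters and write everything else as a backslash-u
    // escape with four hex digits (one per UTF-16 code unit).
    static String toUnicodeEscapes(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    // Hypothetical helper: loads "key=value" text as an ISO 8859-1 properties stream.
    static String loadValue(String fileContent, String key) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(
                    fileContent.getBytes(StandardCharsets.ISO_8859_1)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        String arabic = "\u0627\u0644\u0631\u0626\u064a\u0633\u064a\u0629";
        String line = "label=" + toUnicodeEscapes(arabic);
        System.out.println(line);
        // The escaped, pure-ASCII line loads back into the original word.
        System.out.println(loadValue(line, "label").equals(arabic)); // true
    }
}
```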

On the output side, this cannot happen. Even if you mess up the character encoding somewhere in the pipeline, Java will either render the characters properly or replace them with question marks.

But to be on the safe side and make sure that OFBiz sets the servlet response encoding to UTF-8, you probably need to set a parameter in the relevant web.xml files.

Take a look at this code; OFBiz does something quite interesting there:

http://svn.apache.org/repos/asf/ofbiz/trunk/framework/webapp/src/org/ofbiz/webapp/control/ControlServlet.java

        // setup DEFAULT character encoding and content type, this will be overridden in the RequestHandler for view rendering
        String charset = getServletContext().getInitParameter("charset");
        if (charset == null || charset.length() == 0) charset = request.getCharacterEncoding();
        if (charset == null || charset.length() == 0) charset = "UTF-8";
        Debug.logInfo("The character encoding of the request is: [" + request.getCharacterEncoding()
                + "]. The character encoding we will use for the request and response is: [" + charset + "]", module);

In plain text:

- If the "charset" context init parameter is set, use it. (It's *not* set by default. You can set it as a <context-param> in web.xml.)
- If it's not set, use the request's encoding. (This can be the trap! I am not sure what encoding a browser would use for sending the request!)
- If neither is set, fall back to UTF-8.
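The same fallback order can be pulled out into a stand-alone method and tried without a container. resolveCharset is hypothetical, not an OFBiz API. The second call shows the trap: whenever the request carries an encoding, that encoding wins over the UTF-8 default unless the context parameter is set.

```java
public class CharsetFallback {
    // Hypothetical stand-alone version of the ControlServlet fallback order.
    // initParam stands for the "charset" context parameter; requestEncoding
    // stands for whatever request.getCharacterEncoding() returned (may be null).
    static String resolveCharset(String initParam, String requestEncoding) {
        String charset = initParam;
        if (charset == null || charset.length() == 0) charset = requestEncoding;
        if (charset == null || charset.length() == 0) charset = "UTF-8";
        return charset;
    }

    public static void main(String[] args) {
        System.out.println(resolveCharset(null, null));            // UTF-8
        System.out.println(resolveCharset(null, "ISO-8859-1"));    // ISO-8859-1
        System.out.println(resolveCharset("UTF-8", "ISO-8859-1")); // UTF-8
    }
}
```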

It would make sense to watch the debug log on your system, because it tells you which encoding is used. I'd bet it is using UTF-8 for one reason or another. Still, you remove some uncertainty by explicitly forcing UTF-8 through that parameter. You can never be sure what a user's browser does.

I will send you my standalone sandbox servlet directly; it demonstrates some of the issues mentioned here. I cannot attach it to this mail because the list does not seem to allow attachments.

Hope this helps.

Note once again that none of this touches RTL (right-to-left) support or transcoding / transliteration yet. Has a Jira issue been created for that yet? If so, I'd be happy to contribute my 2 cents. From experience with one app, I can only warn you: RTL != transliteration / transcoding, and both are far from trivial. The difficulty is not technical; it is deciding *what* you want to do. But that's a whole separate email.

Have a nice weekend. We're almost there.

Regards,
Torsten

Bashir Alfetori wrote:
Torsten,
I think using the Resource Bundle Editor plugin will solve the problem, as Amine pointed out.

Regards,
Bashir


Torsten Schlabach-2 wrote:

Ok, so the problem seems to be somewhere else.
Let me see if I can find some time for that tonight.
Regards,
Torsten

Bashir Alfetori wrote:

Hi, Torsten!

I am using Mozilla Firefox. I tried with Internet Explorer and the same problem occurred.

The meta tag is set properly, as you stated. Here is part of the source code of the page:

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
   <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
   <title>OFBiz: Accounting Manager: Edit Agreement</title>
   <script language="javascript" src="/images/calendar1.js" type="text/javascript"></script>
   <script language="javascript" src="/images/selectall.js" type="text/javascript"></script>

   <script language="javascript" src="/images/fieldlookup.js" type="text/javascript"></script>
   <link rel="stylesheet" href="/images/maincss.css" type="text/css"/>
   <link rel="stylesheet" href="/images/tabstyles.css" type="text/css"/>
</head>

Best regards, Bashir



Torsten Schlabach-2 wrote:


Hi Bashir!

Forget about Windows-1256. This is not what you want.

What browser are you using? Internet Explorer or Mozilla? Could you try the other one, as a first step?

Here's what I'd check:

- Is the page delivered to the browser as HTML or XHTML? (You can check using the "show page source" option in your browser.)
- If it's XHTML and the browser is Internet Explorer, it's not enough to have <?xml version="1.0" encoding="UTF-8"?>. You should either add an HTML meta tag like this:

<head>
...
<meta content="text/html; charset=UTF-8" http-equiv="content-type" />
...
</head>

or make sure the corresponding HTTP header is sent.

I am not familiar enough with OFBiz internals to tell you how to make this happen, but I am sure other people on the list would be able to help.

But to find out whether this would solve your problem or not, just do this:

Save the page to the hard disk.
Add that <meta ...> tag manually to the <head> section of your HTML.
Reload the saved page from your hard disk into the browser.

I found that IE needs that extra info while Mozilla doesn't.

Regards,
Torsten


Bashir Alfetori wrote:


Torsten,
As an example, here is a screenshot of how one label appears in the Create Agreement screen:
http://www.nabble.com/file/6705/ArabicLabel.JPG
The character encoding of the browser is set to Unicode (UTF-8) by default. If I change it to Arabic (Windows-1256), I get the second screenshot:
http://www.nabble.com/file/6706/ArabicLabel_Encoding_Windows-1256.JPG

Best regards,
Bashir



Bashir Alfetori wrote:



Torsten,

Arabic word: الرئيسية
Appears in the browser as: الرئيسية

The character encoding in the browser is set to Unicode (UTF-8).




Torsten Schlabach-2 wrote:



Bashir,

could you send a screenshot of how it looks?

Regards,
Torsten

Bashir Alfetori wrote:



Adrian,

Yes, Arabic is a right-to-left language. I have modified the existing style sheet to reverse the direction. I noticed that not everything is OK when reversing the direction, especially in forms. For now, my first concern is displaying Arabic characters at all, even if the direction is still left-to-right. So far I haven't been able to do that, as mentioned above.


Regards, Bashir



Adrian Crum wrote:




Bashir,

Is Arabic a right-to-left language? If yes, then you can either modify the existing style sheets to reverse the direction, or you can leave the existing style sheets alone and cascade a "right-to-left" style sheet that reverses the direction.

I have done some experiments with reversing the direction in OFBiz. Let me know if you need any help.

-Adrian


Bashir Alfetori wrote:




I am trying to start translating OFBiz to Arabic, and I tried one properties file. At first I was not able to enter Arabic characters into that file until I changed the properties file to support UTF-8. After that I could enter Arabic characters in the properties file, but in the browser the labels were not shown in Arabic; they appeared as some kind of garbage words. I also tried to build a simple OFBiz application in Arabic, like the one shown in the hello world tutorials, but the same problem occurred. It also seems that every Arabic character was displayed in the browser as three strange characters. The direction is still left to right. I just want to display Arabic characters in the browser.



