Hi Bashir and Amine,
> I think using Resource Bundle Editor plug in will solve the problem as
> Amine pointed.
Did it solve the problem? I did some experiments last night and got some
interesting insights into the details. So please allow me to share what
I found out, hopefully for the benefit of anyone who might run into
this.
First of all, the short answer:
As we all know, Java strings are Unicode internally. (Note that Unicode
!= UTF-8. UTF-8 is one possible representation of Unicode among others.)
When you read a properties file that contains \uXXXX escape codes, you
can be quite sure you will have valid Unicode characters in your string
in memory. So I doubt that the problem is anywhere on the input side.
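To illustrate the input side, here is a minimal sketch of loading an escaped properties entry. The class name, the key "label" and the in-memory "file" are made up for the example; only java.util.Properties itself is real.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class EscapedPropertiesDemo {

    // Parses properties text the way Properties.load(InputStream) reads a file:
    // the bytes are taken as ISO-8859-1 and backslash-u escapes are expanded.
    static String loadLabel(String fileContent) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(
                    fileContent.getBytes(StandardCharsets.ISO_8859_1)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("label"); // "label" is a made-up key
    }

    public static void main(String[] args) {
        // The doubled backslashes keep the escapes as plain ASCII text,
        // which is exactly what an escaped properties file contains.
        String fileContent =
                "label=\\u0627\\u0644\\u0631\\u0626\\u064a\\u0633\\u064a\\u0629\n";
        String label = loadLabel(fileContent);

        System.out.println(label.length());                  // 8
        System.out.println((int) label.charAt(0) == 0x0627); // true
    }
}
```

The loaded string has eight real Arabic characters, not 48 ASCII ones, so nothing is lost on the way in.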
But in order to render stuff properly in the browser, you need to make
sure that you set the appropriate encoding to the servlet response. You
usually do it like that:
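    protected void doGet(HttpServletRequest request,
            HttpServletResponse response)
            throws ServletException, IOException {
        // Set the encoding before getWriter(); the writer is created with it.
        response.setContentType("text/html; charset=UTF-8");
        PrintWriter pw = response.getWriter();
        // ... write the page through pw ...
    }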
This is not the same as just outputting
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
from your servlet code. Java knows that it has Unicode strings in
memory. Unless you properly set the encoding of the output stream which
writes to the browser, Java assumes the output device is Latin-1
(ISO-8859-1). It is intelligent enough to know that it cannot render
certain Unicode ranges (like Chinese or Arabic) to such a device, so it
will just put a question mark '?' in place of each character it thinks
it cannot print.
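The question-mark behaviour can be seen without any servlet at all. This is only a sketch with a made-up class name; it encodes a string to Latin-1 bytes, which is roughly what a writer configured for ISO-8859-1 has to do.

```java
import java.nio.charset.StandardCharsets;

public class QuestionMarkDemo {

    // Encodes text for a Latin-1 "device" and decodes it again, so we can
    // see what such a device would actually receive.
    static String throughLatin1(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String arabic = "\u0627\u0644\u0631\u0626\u064a\u0633\u064a\u0629";
        String german = "Gr\u00fc\u00dfe"; // fits into Latin-1

        System.out.println(throughLatin1(arabic));                // ????????
        System.out.println(throughLatin1(german).equals(german)); // true
    }
}
```

Each Arabic character becomes exactly one '?', while text that fits into Latin-1 passes through unchanged.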
Well, the problem you showed in your screenshot at
http://www.nabble.com/file/6705/ArabicLabel.JPG
is different.
These characters are what you can expect to see if you render a UTF-8
encoded string to a Latin-1 based device without Java knowing that the
string is UTF-8.
So I was able to reproduce that pattern when I wrote my properties file
directly in UTF-8 (not using \uXXXX escape codes but real UTF-8). In
that case, Java expected the properties file to be in Latin-1, which is
in line with the spec and thus did *not* build a proper Unicode string
with Arabic characters in memory.
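The pattern itself is easy to produce in isolation. The following sketch (class and method names are made up) takes the UTF-8 bytes of an Arabic word and decodes them as Latin-1, then shows that the damage is reversible as long as nothing else has touched the string.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Simulates reading UTF-8 bytes while believing they are Latin-1.
    static String misreadUtf8AsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Undoes the misreading: back to the raw bytes, then decode as UTF-8.
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String arabic = "\u0627\u0644\u0631\u0626\u064a\u0633\u064a\u0629";
        String garbled = misreadUtf8AsLatin1(arabic);

        System.out.println(arabic.length());                // 8
        System.out.println(garbled.length());               // 16
        System.out.println(repair(garbled).equals(arabic)); // true
    }
}
```

Every Arabic letter takes two bytes in UTF-8, so each one turns into two unrelated Latin-1 characters instead of a question mark.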
Take a look at
https://java.sun.com/j2se/1.4.2/docs/api/java/util/Properties.html
> When saving properties to a stream or loading them from a stream, the
> ISO 8859-1 character encoding is used. For characters that cannot be
> directly represented in this encoding, Unicode escapes are used;
> however, only a single 'u' character is allowed in an escape sequence.
> The native2ascii tool can be used to convert property files to and
> from other character encodings.
In other words:
The kind of pattern you see in your screenshot can only come from some
raw UTF-8 (not \uXXXX escape sequences) being misread as Latin-1
(ISO-8859-1) and then passed on. So did either you or that RBE plugin
maybe save / convert your properties file?
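Here is a small sketch of that misreading with java.util.Properties itself. The key "label" and the class name are made up. The second method uses Properties.load(Reader), which exists since Java 6; that is an alternative I am adding for comparison, not something the quoted 1.4.2 documentation offers, where native2ascii is the way to go.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class Utf8PropertiesDemo {

    // Classic way: load(InputStream) always decodes the bytes as ISO-8859-1.
    static String loadViaStream(byte[] fileBytes) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(fileBytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("label");
    }

    // Java 6+ alternative: the caller picks the encoding through a Reader.
    static String loadViaUtf8Reader(byte[] fileBytes) {
        Properties props = new Properties();
        try {
            props.load(new InputStreamReader(
                    new ByteArrayInputStream(fileBytes), StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("label");
    }

    public static void main(String[] args) {
        String arabic = "\u0627\u0644\u0631\u0626\u064a\u0633\u064a\u0629";
        // A properties file saved as raw UTF-8, without any escapes.
        byte[] fileBytes = ("label=" + arabic + "\n").getBytes(StandardCharsets.UTF_8);

        System.out.println(loadViaStream(fileBytes).length());            // 16
        System.out.println(loadViaUtf8Reader(fileBytes).equals(arabic));  // true
    }
}
```

The stream variant hands back 16 garbage characters for an 8-letter word, which is the kind of string that ends up in the page.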
On the output side, this cannot happen, even if you mess up the
character encoding somewhere in the pipeline, as Java will either render
it properly or use question marks.
But to be on the safe side and make sure that OFBiz will be setting the
Servlet response encoding to UTF-8, you probably need to set a parameter
in some web.xml files.
Take a look at the code here, OFBiz does something quite interesting there:
http://svn.apache.org/repos/asf/ofbiz/trunk/framework/webapp/src/org/ofbiz/webapp/control/ControlServlet.java
    // setup DEFAULT chararcter encoding and content type, this will be
    // overridden in the RequestHandler for view rendering
    String charset = getServletContext().getInitParameter("charset");
    if (charset == null || charset.length() == 0)
        charset = request.getCharacterEncoding();
    if (charset == null || charset.length() == 0) charset = "UTF-8";
    Debug.logInfo("The character encoding of the request is: ["
        + request.getCharacterEncoding()
        + "]. The character encoding we will use for the request and response is: ["
        + charset + "]", module);
In plain text:
- If the charset init parameter of the servlet context is set: use that
one. (It's *not* set by default. You can set it in web.xml as a
context-param.)
- If it's not set, use the request's encoding. (This can be the trap! I
am not sure what encoding a browser would use for sending the request!)
- If neither one's the case, set a default of UTF-8.
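The three rules above boil down to a small fallback chain. The sketch below restates them as a plain function so it can be tried without a servlet container; the class and method names are made up and this is not OFBiz code.

```java
public class CharsetResolutionDemo {

    // Mirrors the fallback order: configured parameter first, then the
    // request's own encoding, then UTF-8 as the last resort.
    static String resolveCharset(String configuredCharset, String requestEncoding) {
        String charset = configuredCharset;
        if (charset == null || charset.isEmpty()) {
            charset = requestEncoding;
        }
        if (charset == null || charset.isEmpty()) {
            charset = "UTF-8";
        }
        return charset;
    }

    public static void main(String[] args) {
        System.out.println(resolveCharset("UTF-8", "ISO-8859-1")); // UTF-8
        System.out.println(resolveCharset(null, "ISO-8859-1"));    // ISO-8859-1
        System.out.println(resolveCharset("", null));              // UTF-8
    }
}
```

The middle case is the trap: without the parameter, whatever the request claims as its encoding wins.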
It would make sense to watch the debug log output on your system as it's
going to tell you what encoding it uses, but I'd bet it is using UTF-8
for one reason or another. You just take out some uncertainty by
explicitly forcing UTF-8 through that init parameter. You can never be
sure what a user's browser does.
I will send you my sandbox standalone servlet which demonstrates some of
the issues mentioned here. I cannot attach it to the mail as the list
does not seem to allow this.
Hope this helps.
Note once again, this has nothing to do yet with RTL (right-to-left)
support or with transcoding / transliteration. Has a Jira issue been
created on that one yet? If so, I'd be happy to contribute my 2 cents to
it. I can just warn you from experience with an app: RTL !=
transliteration / transcoding, and both are far from trivial. Not
technically, just in deciding *what* you want to do. But that's a whole
separate email.
Have a nice weekend. We're close to it already.
Regards,
Torsten
Bashir Alfetori wrote:
Torsten,
I think using Resource Bundle Editor plug in will solve the problem as Amine
pointed.
Regards,
Bashir
Torsten Schlabach-2 wrote:
Ok, so the problem seems to be somewhere else.
Let me see if I will find some time on that subject tonight.
Regards,
Torsten
Bashir Alfetori wrote:
Hi, Torsten!
I am using Mozilla Firefox. I tried with Internet Explorer and the same
problem occurred.
The meta tag is set properly as you stated. Here is part of the source
code of the page:
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
<title>OFBiz: Accounting Manager: Edit Agreement</title>
<script language="javascript" src="/images/calendar1.js"
type="text/javascript"></script>
<script language="javascript" src="/images/selectall.js"
type="text/javascript"></script>
<script language="javascript" src="/images/fieldlookup.js"
type="text/javascript"></script>
<link rel="stylesheet" href="/images/maincss.css" type="text/css"/>
<link rel="stylesheet" href="/images/tabstyles.css" type="text/css"/>
</head>
Best regards,
Bashir
Torsten Schlabach-2 wrote:
Hi Bashir!
Forget about Windows-1256. This is not what you want.
What browser are you using? Internet Explorer or Mozilla? Would you
mind trying the other one, as one step?
Here's what I'd check:
- Is the page delivered to the browser as HTML or XHTML? (You can check
using the "show page source" option in your browser.)
- In case it's XHTML and in case the browser is Internet Explorer, it's
not enough to have <?xml version="1.0" encoding="UTF-8"?>, but you
should either add an HTML meta tag like this:
<head>
...
<meta content="text/html; charset=UTF-8" http-equiv="content-type" />
...
</head>
or make sure the corresponding HTTP header is sent.
I am not familiar enough with OFBiz internals to tell you how to make
this happen, but I am sure other people on the list would be able to
help. But in order to find out if this would solve your problem or not,
just do this:
Save the page to the hard disk.
Add that <meta ...> tag manually to the <head> section of your HTML.
Reload the saved page from your hard disk into the browser.
I found IE wanted that extra info while Mozilla doesn't.
Regards,
Torsten
Bashir Alfetori wrote:
Torsten,
As an example, here is a screenshot of how one label appears in the
Create Agreement screen.
http://www.nabble.com/file/6705/ArabicLabel.JPG
The character encoding of the browser is set by default to Unicode
(UTF-8). If changed to Arabic (Windows-1256), the second screenshot is
obtained.
http://www.nabble.com/file/6706/ArabicLabel_Encoding_Windows-1256.JPG
Best regards,
Bashir
Bashir Alfetori wrote:
Torsten,
Arabic word: الرئيسية
Appears in the browser as: الرئيسية
The character encoding in the browser is set to Unicode (UTF-8)
Torsten Schlabach-2 wrote:
Bashir,
could you send a screenshot of what it looks like?
Regards,
Torsten
Bashir Alfetori wrote:
Adrian,
Yes, Arabic is a right-to-left language. I have modified the existing
style sheet to reverse the direction. I noticed that not everything is
OK when reversing direction, especially in forms. For now, my first
concern is displaying Arabic characters at all, even if the direction
is still left-to-right. So far I haven't been able to do that, as
mentioned above.
Regards,
Bashir
Adrian Crum wrote:
Bashir,
Is Arabic a right-to-left language? If yes, then you can either modify
the existing style sheets to reverse the direction or you can leave the
existing style sheets alone and cascade a "right-to-left" style sheet
that reverses the direction.
I have done some experiments with reversing the direction in OFBiz. Let
me know if you need any help.
-Adrian
Bashir Alfetori wrote:
I am trying to start translating OFBiz to Arabic. I tried one
properties file. At first I was not able to enter Arabic characters into
that file until I changed the properties file to support UTF-8. It was
then OK to enter Arabic characters in the properties file, but in the
browser the labels were not showing in Arabic. They were a kind of
garbage words. I also tried to build a simple OFBiz application in
Arabic like the one shown in the hello world tutorials, but the same
problem existed. Also, it seems that every Arabic character was
displayed in the browser as three strange characters. The direction is
still left-to-right. I just want to display Arabic characters in the
browser.