Re: Java source escaping

Andreas Loew Thu, 31 Jan 2013 06:40:17 -0800

Hi again, Dridi,

see below for more comments... ;-)


On 28.01.2013 15:43, Dridi Boukelmoune wrote:

Hi Andreas,

Thank you for your quick answer. I didn't want to go too deep on the
details but I guess I have to :)

I fear so...

On Mon, Jan 28, 2013 at 3:14 PM, Andreas Loew <andreas.l...@oracle.com> wrote:

Hi Dridi,

On 28.01.2013 14:57, Dridi Boukelmoune wrote:

I'm having trouble building a project on Linux because of the classes
generated by XmlBeans. The build works properly on a Solaris platform
using the same tools:

Java : Hotspot 1.5.0_16
Ant : 1.6.5
XmlBeans: 2.3.1.0

For the XmlBeans version, it comes with Weblogic 10.0:
com.bea.core.xml.beaxmlbeans_2.3.1.0.jar
com.bea.core.xquery.xmlbeans-interop_1.0.0.0.jar
com.bea.core.xquery.beaxmlbeans-interop_1.0.0.0.jar
com.bea.core.xml.xmlbeans_2.3.1.0.jar

I hope my problem is not related to Weblogic's version.

I don't think so (see below for the details) - I rather tend to think that
this is a locale/encoding issue in the XSD and/or between those platforms.

So I have an XSD file containing something like:
<simpleType name="MyType">
          <restriction base="string">
                  <enumeration value="A cliché"></enumeration>
                  ...
          </restriction>
</simpleType>

With Linux, I get this output:
static final Enum A_CLICHÉ = Enum.forString("A cliché");

On the other hand, it produces this on Solaris:
static final Enum A_CLICH\311 = Enum.forString("A clich\351");

The java source code generated on Linux doesn't compile because of an
encoding mess I can't address now, so I'm currently trying to
understand how the code is generated. I haven't found yet which option
leads to either one or the other output that could have different
defaults based on the platform.

I hope someone can help me on this one.

Most probably, sorting out your locale/character encoding issues should
solve the issue.

First and foremost: Does the XSD in question with the French é include an
encoding declaration? Such as

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema>
     (...)
</xs:schema>

Yes it states a wrong UTF-8 encoding when it's actually ISO-8859-15.

Hmm - further below you state you must not modify the source - which
definitely is needed, because the encoding given in the XSD (if present)
explicitly overrides any locales detected from the environment.

I was initially assuming that your XSD might not have specified any
encoding - but if it does, and does so wrongly, clearly this is what
needs to be changed...

If you must not modify the source, you need the people who are indeed
maintaining this code to do so. I am not aware of any other workaround
to override a wrong encoding declaration given in an XSD. Such an
override would not make real sense anyway, as typically (other than in
your very specific situation), the simple and easy solution just is to
correct the XSD to specify the proper encoding...
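
[Editor's illustration, not part of the original message.] A minimal sketch
of why a wrong declaration matters at the byte level. It does not reproduce
what XMLBeans or its XML parser actually do with such a file; it only shows
the same bytes decoded under two charsets. ISO-8859-1 stands in for
ISO-8859-15 here, since both place é at byte 0xE9. StandardCharsets needs
Java 7 or later; the class and method names are hypothetical:

```java
import java.nio.charset.StandardCharsets;

public class WrongEncodingDemo {
    // "cliché" as stored in a Latin-encoded file: é is the single byte 0xE9.
    static final byte[] LATIN_BYTES =
            "clich\u00e9".getBytes(StandardCharsets.ISO_8859_1);

    // Decoding with the (wrong) declared encoding.
    static String decodeAsUtf8() {
        return new String(LATIN_BYTES, StandardCharsets.UTF_8);
    }

    // Decoding with the encoding the file really uses.
    static String decodeAsLatin() {
        return new String(LATIN_BYTES, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // A lone 0xE9 is not valid UTF-8, so the decoder substitutes U+FFFD.
        System.out.println(decodeAsUtf8().endsWith("\uFFFD"));
        // With the matching charset the accented character survives.
        System.out.println(decodeAsLatin().equals("clich\u00e9"));
    }
}
```

Both lines print true: the text is only recovered when the charset used for
decoding matches the one the file was written in.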

If it does not, you need to explicitly mention the appropriate encoding of
that file in this place to make this portable.

If you don't, I expect the XMLBeans code generator to use the system's
default locale, which will be calculated on Unix platforms from environment
variables such as LANG, LC_CTYPE etc.?

LANG and LC_CTYPE are used, and part of the encoding mess I was
talking about. A mess because the locale names aren't the same on
Solaris and Linux (locale -a | grep fr_FR). And a mess because we have
a (complicated) shell script that drives the ant build and prepares a
proper environment depending on the project we're building.

Indeed. IMO, your build file will unfortunately need to explicitly
handle these subtle differences if you want to use it on both Solaris
and Linux. Selecting the proper platform LANG / LC_CTYPE etc. values
won't fix the "wrong encoding in the XSD" type of issue, but it will make
the rest of the build run in a reproducible way.

Do you call scomp from the command line, or do you use Ant or Maven to call
the code generator?

That would be ant.

Ant would be able to take an explicit Java encoding as a "-D" command
line parameter to the ant JVM, but the much better way to fix this is by
fixing the build file to use correct Unix NLS (LANG / LC_CTYPE etc.)
settings.
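
[Editor's illustration, not part of the original message.] To see which
encoding a JVM actually picked up from the environment, a small check like
the following can be run under the same settings as the ant build. It is
an assumption that the "-D" parameter meant above is the file.encoding
system property; the message does not name it. The class name is
hypothetical:

```java
import java.nio.charset.Charset;

public class DefaultCharsetCheck {
    // Reports the property value next to the charset the JVM really uses.
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + " defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it once per platform, with the LANG / LC_CTYPE values the build
script sets, shows whether Solaris and Linux end up with the same default.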

Also, how did you transport/copy the source code from the Solaris to the
Linux machine? Did you copy files in binary mode, or using scp (which might
have done recoding of text files on the fly based on language/encoding
settings on source and target machine)? So you should check that your XSD
files on both machines indeed are binary identical.
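
[Editor's illustration, not part of the original message.] One way to
check that the XSD files are binary identical is to compare a digest of
the raw bytes on each machine. A minimal sketch, assuming Java 7 or later
for java.nio.file; the class and method names are hypothetical:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class FileDigest {
    // Hashes the file's raw bytes, so no charset decoding is involved.
    static String sha256Hex(Path file)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        byte[] hash = md.digest(Files.readAllBytes(file));
        StringBuilder hex = new StringBuilder();
        for (byte b : hash) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        // Prints one digest per file path given on the command line.
        for (String name : args) {
            System.out.println(sha256Hex(Paths.get(name)) + "  " + name);
        }
    }
}
```

If the digests printed on Solaris and Linux match, the files are byte for
byte the same, and any difference in the generated code comes from the
environment rather than from the transfer.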

It's the same shell script that does a svn checkout after setting the
environment. So LANG and LC_CTYPE are set before the checkout and the
ant build.

Fine.

I hope that you should be able to simply add the appropriate encoding to
your XSD and be fine.

All the projects are built from ISO-8859-15 java sources. Also this is
an old legacy ant project, without any dependency management like ivy.

Fine, therefore your correct LANG value specifies fr_FR and the correct
(OS-specific) notation for ISO 8859-15.

There is one single project that is built from UTF-8 java sources. It
has a dependency on common classes generated from a bogus ISO-8859-15
XSD. So it builds on Solaris (a miracle) because the java code is
escaped, which results in a simple ASCII file, compatible with UTF-8.

This will need to be cleaned up anyway. Probably, your outer build file
should dynamically switch the NLS environment (LANG / LC_CTYPE etc.)
before calling into ant for this project.

As soon as the XSD contains the correct encoding that matches the file's
contents, then everything should start working fine... ;-)

I can't modify the source code for contractual reasons, the only thing
I can do so far is tweaking the environment. That's why I was
wondering if it could be some feature with a default value varying on
the platform (environment variable, system property, jvm flag...).

Unfortunately - as stated - not that I am aware of. You need to make the
party that created the faulty XSD file "simply" fix it... :-(


Best regards,
Andreas

Best Regards,
Dridi

Hope this helps & best regards,
Andreas


--
Andreas Loew | Senior Java Architect
Oracle Advanced Customer Services
ORACLE Germany


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@xmlbeans.apache.org
For additional commands, e-mail: user-h...@xmlbeans.apache.org
