Re: xml encodings, java

Mike Pogue 28 Jan 2000 16:39:46 -0000

Which encodings are available to you depends HEAVILY on the encoding
support in the underlying JVM.  In your case: Blackdown.


Note that Sun does NOT require JVMs to support the same encodings that
the Sun JVM does. And the Xerces-J parser does NOT do its own encoding
support -- it uses whatever is there in the JVM. So, if your JVM doesn't
support an encoding that you want to use, Xerces-J has nothing to fall
back on, and it usually won't do what you expect.
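
To see what your JVM actually offers, something like the following
should do. It is only a sketch: the class name EncodingCheck and its
isSupported method are made up for illustration, and "KOI8-R" is just an
example name. The spelling of encoding names can differ between JVMs, so
try the variants you care about.

```java
import java.io.UnsupportedEncodingException;

class EncodingCheck {
    // Returns true if this JVM can convert text to the named encoding.
    static boolean isSupported(String encodingName) {
        try {
            "test".getBytes(encodingName);
            return true;
        } catch (UnsupportedEncodingException e) {
            // The JVM has no converter registered under this name.
            return false;
        }
    }

    public static void main(String[] args) {
        // Example names only; pass the ones you need on the command line.
        String[] names = args.length > 0 ? args : new String[] { "UTF-8", "KOI8-R" };
        for (int i = 0; i < names.length; i++) {
            System.out.println(names[i] + ": "
                + (isSupported(names[i]) ? "supported" : "NOT supported"));
        }
    }
}
```

Run it under each JVM you are considering and compare the results.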

The code you have below is a clever workaround, but ultimately you want
to use a JVM that has the encoding support built in.
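
If you do keep a patched escape() for now, a middle ground is to emit a
character reference only when the output encoding cannot represent the
character, and to keep escaping the markup characters. The sketch below
is not Xerces code: EncodingAwareEscaper is a hypothetical class, and it
relies on java.nio.charset, which is not part of JDK 1.1, so it assumes
a much newer JVM (Java 5 or later).

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

class EncodingAwareEscaper {
    private final CharsetEncoder encoder;

    EncodingAwareEscaper(String encodingName) {
        // Throws UnsupportedCharsetException if this JVM lacks the encoding.
        this.encoder = Charset.forName(encodingName).newEncoder();
    }

    String escape(String source) {
        StringBuilder result = new StringBuilder(source.length());
        for (int i = 0; i < source.length(); ) {
            int cp = source.codePointAt(i);
            int len = Character.charCount(cp);
            String piece = source.substring(i, i + len);
            if (cp == '<') {
                result.append("&lt;");
            } else if (cp == '>') {
                result.append("&gt;");
            } else if (cp == '&') {
                result.append("&amp;");
            } else if (cp == '"') {
                result.append("&quot;");
            } else if (encoder.canEncode(piece)) {
                // The output encoding can represent it: write it as-is.
                result.append(piece);
            } else {
                // Otherwise fall back to a decimal character reference.
                result.append("&#").append(cp).append(';');
            }
            i += len;
        }
        return result.toString();
    }
}
```

With a koi8 output encoding, Cyrillic text would pass through untouched,
while characters the encoding lacks would still come out as references.
This only helps if the JVM supports the encoding in the first place.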

Here's my experience with encodings:

Has fewest encodings:   Microsoft JVM
                        Blackdown JVM

                        Sun JVM
Has most encodings:     IBM JVM

So, I'd suggest you try to use the IBM 1.1.8 JVM.  It's fairly reliable,
scalable, and I think it has the encoding support you are looking for. 
(Of course, I am biased in this! :-)

Mike


Dmitry Melekhov wrote:
> 
> Hello!
> 
> I'm not sure that this list is the right place
> for this question. If I am mistaken, I'm sorry!
> 
> The question is Cocoon-related and about how Xerces is
> supposed to work with encodings.
> 
> I write my XML documents in koi8 encoding,
> but whether I set the encoding or not, I always see ???? in the
> browser instead of 8-bit characters.
> Taras Shumeyko pointed out to me that this is a formatter problem and
> that the problem is in org.apache.xml.serialize.BaseMarkupSerializer,
> in the function    protected String escape( String source )
> 
> I changed it: I removed all re-encoding from it, and now
> Cocoon and Xerces work OK for me.
> Here is my variant of the function:
> 
>   protected String escape( String source )
>     {
>         StringBuffer    result;
>         int             i;
>         char            ch;
>         String          charRef;
> 
>         result = new StringBuffer( source.length() );
>         for ( i = 0 ; i < source.length() ; ++i )  {
>             ch = source.charAt( i );
>             // If the character is not printable, print as character reference.
>             // Non printables are below ASCII space but not tab or line
>             // terminator, ASCII delete, or above a certain Unicode threshold.
> //          if ( ( ch < ' ' && ch != '\t' && ch != '\n' && ch != '\r' ) ||
> //               ch > _lastPrintable || ch == 0xF7 )
> //                  result.append( "&#" ).append( Integer.toString( ch ) ).append( ';' );
> //          else {
>                     // If there is a suitable entity reference for this
>                     // character, print it. The list of available entity
>                     // references is almost but not identical between
>                     // XML and HTML.
> //                  charRef = getEntityRef( ch );
> //                  if ( charRef == null )
>                         result.append( ch );
> //                  else
> //                      result.append( '&' ).append( charRef ).append( ';' );
> //          }
>         }
>         return result.toString();
>     }
> 
> But this is a dirty hack.
> 
> I want to understand how Xerces is supposed to treat encodings and why
> it doesn't work now.
> 
> --
> Dmitry Melekhov
> http://www.aspec.ru/~dm
> 2:5050/[EMAIL PROTECTED]
> 
> P.S.
> My Java platform is Blackdown JDK 1.1.7 for Linux x86
