Which encodings are available to you depends HEAVILY on the encoding
support in the underlying JVM. In your case: Blackdown.
Note that Sun does NOT require other JVMs to support the same encodings that
the Sun JVM does.
And the Xerces-J parser does NOT do its own encoding support -- it
uses whatever is there in the JVM. So, if your JVM doesn't support an
encoding that you want to use, bad things will happen (it usually won't
do what you expect).
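As an illustration of what "bad things" can look like, here is a minimal sketch (an illustrative example, not Xerces or Cocoon code; the class name UnmappableDemo and the method roundTrip are made up for this note). It assumes a Sun-derived JVM, where String.getBytes() substitutes '?' for characters the target encoding cannot represent. Whether this is the exact cause of the ???? reported below is not established in this thread.

```java
import java.io.UnsupportedEncodingException;

class UnmappableDemo {
    // Encode text to bytes with the named encoding, then decode those bytes
    // again with the same encoding. Characters the encoding cannot represent
    // do not survive the trip.
    static String roundTrip(String text, String encodingName) {
        try {
            byte[] bytes = text.getBytes(encodingName);
            return new String(bytes, encodingName);
        } catch (UnsupportedEncodingException e) {
            // The JVM does not know this encoding at all.
            throw new IllegalArgumentException("Unsupported encoding: " + encodingName);
        }
    }

    public static void main(String[] args) {
        String cyrillic = "\u041f\u0440\u0438\u0432\u0435\u0442"; // six Cyrillic letters

        // Latin-1 has no Cyrillic letters, so each one is replaced.
        System.out.println(roundTrip(cyrillic, "ISO8859_1")); // prints "??????"

        // UTF-8 can represent every character, so the text is unchanged.
        System.out.println(roundTrip(cyrillic, "UTF8").equals(cyrillic)); // prints "true"
    }
}
```

The replacement is silent: no exception is thrown for an unmappable character, only for an encoding name the JVM does not recognize at all.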
The code you have below is a clever workaround, but ultimately you want
to use a JVM that has the encoding support built in.
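A quick way to find out whether a given JVM has the support built in is to ask it. The sketch below is only an illustration: the class name EncodingCheck and the list of encoding names are assumptions (the names follow the JDK's historical naming, and a particular JVM may use different ones). It relies only on String.getBytes(String), which throws UnsupportedEncodingException for unknown encodings.

```java
import java.io.UnsupportedEncodingException;

class EncodingCheck {
    // Returns true if this JVM recognizes the named encoding.
    static boolean isSupported(String encodingName) {
        try {
            "test".getBytes(encodingName);
            return true;
        } catch (UnsupportedEncodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Example names only; adjust to the encodings you care about.
        String[] names = { "UTF8", "ISO8859_1", "KOI8_R" };
        for (int i = 0; i < names.length; i++) {
            String verdict = isSupported(names[i]) ? "supported" : "NOT supported";
            System.out.println(names[i] + ": " + verdict);
        }
    }
}
```

Running this on each candidate JVM gives a direct comparison of what each one supports.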
Here's my experience with encodings, from fewest to most:
    Has fewest encodings:  Microsoft JVM
                           Blackdown JVM
                           Sun JVM
    Has most encodings:    IBM JVM
So, I'd suggest you try to use the IBM 1.1.8 JVM. It's fairly reliable,
scalable, and I think it has the encoding support you are looking for.
(Of course, I am biased in this! :-)
Mike
Dmitry Melekhov wrote:
>
> Hello!
>
> I'm not sure that this list is the right place
> for this question. If I am mistaken, I'm sorry!
>
> The question is Cocoon-related and is about how Xerces is supposed
> to work with encodings.
>
> I write my XML documents in koi8 encoding,
> but whether or not I set the encoding, I always see ???? in the
> browser instead of 8-bit characters.
> Taras Shumeyko pointed out to me that this is a formatter problem and
> that the problem is in org.apache.xml.serialize.BaseMarkupSerializer,
> in the function protected String escape( String source )
>
> I changed it (removed all the re-encoding from it) and now
> Cocoon and Xerces work OK for me.
> Here is my variant of the function:
>
> protected String escape( String source )
> {
>     StringBuffer result;
>     int i;
>     char ch;
>     String charRef;
>
>     result = new StringBuffer( source.length() );
>     for ( i = 0 ; i < source.length() ; ++i ) {
>         ch = source.charAt( i );
>         // If the character is not printable, print as character reference.
>         // Non printables are below ASCII space but not tab or line
>         // terminator, ASCII delete, or above a certain Unicode threshold.
>         // if ( ( ch < ' ' && ch != '\t' && ch != '\n' && ch != '\r' ) ||
>         //      ch > _lastPrintable || ch == 0xF7 )
>         //     result.append( "&#" ).append( Integer.toString( ch ) ).append( ';' );
>         // else {
>         //     If there is a suitable entity reference for this
>         //     character, print it. The list of available entity
>         //     references is almost but not identical between
>         //     XML and HTML.
>         //     charRef = getEntityRef( ch );
>         //     if ( charRef == null )
>         result.append( ch );
>         //     else
>         //         result.append( '&' ).append( charRef ).append( ';' );
>         // }
>     }
>     return result.toString();
> }
>
> But this is a dirty hack.
>
> I want to understand how Xerces is supposed to treat encodings and why
> it doesn't work now.
>
> --
> Dmitry Melekhov
> http://www.aspec.ru/~dm
> 2:5050/[EMAIL PROTECTED]
>
> P.S.
> My Java platform is Blackdown JDK 1.1.7 for Linux x86