----- Original Message -----
From: Mike Pogue <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Friday, January 28, 2000 8:33 PM
Subject: Re: xml encodings, java
> The code you have below is a clever workaround, but ultimately, you want
> to use a JVM that has the encoding support built-in.
>
> So, I'd suggest you try to use the IBM 1.1.8 JVM. It's fairly reliable,
> scalable, and I think it has the encoding support you are looking for.
> (Of course, I am biased in this! :-)
>

OK. I just tried the IBM JDK, and in this case it behaves exactly like Blackdown.

But I want to know how Xerces (or maybe this is a Cocoon problem, I don't know) is supposed to work with encodings. Why is the code that I commented out there at all? Why not treat the XML content as raw data and process only the tags? How is it supposed to work if I set the encoding in the XML document, and is that the input encoding (i.e. what I have in the XML) or the output encoding (i.e. what Cocoon sends to the browser), etc.?

I want to understand how it works! :)

Dmitry Melekhov
http://www.aspec.ru/~dm
2:5050/[EMAIL PROTECTED]

> Mike
>
>
> Dmitry Melekhov wrote:
> >
> > Hello!
> >
> > I'm not sure that this list is the right place
> > for this question. If I am mistaken, I'm sorry!
> >
> > The question is Cocoon-related and about how Xerces is supposed
> > to work with encodings.
> >
> > I write my XML documents in the KOI8 encoding, but whether I set
> > the encoding or not, I always see ???? in the browser instead of
> > 8-bit characters.
> > Taras Shumeyko pointed out to me that this is a formatter problem and
> > that the problem is in org.apache.xml.serialize.BaseMarkupSerializer,
> > in the function protected String escape( String source ).
> >
> > I changed it and removed all re-encoding from it, and now
> > Cocoon and Xerces work OK for me.
> > Here is my variant of the function:
> >
> > protected String escape( String source )
> > {
> >     StringBuffer result;
> >     int          i;
> >     char         ch;
> >     String       charRef;
> >
> >     result = new StringBuffer( source.length() );
> >     for ( i = 0 ; i < source.length() ; ++i ) {
> >         ch = source.charAt( i );
> >         // If the character is not printable, print as character reference.
> >         // Non printables are below ASCII space but not tab or line
> >         // terminator, ASCII delete, or above a certain Unicode threshold.
> >         // if ( ( ch < ' ' && ch != '\t' && ch != '\n' && ch != '\r' ) ||
> >         //      ch > _lastPrintable || ch == 0xF7 )
> >         //     result.append( "&#" ).append( Integer.toString( ch ) ).append( ';' );
> >         // else {
> >             // If there is a suitable entity reference for this
> >             // character, print it. The list of available entity
> >             // references is almost but not identical between
> >             // XML and HTML.
> >             // charRef = getEntityRef( ch );
> >             // if ( charRef == null )
> >                 result.append( ch );
> >             // else
> >             //     result.append( '&' ).append( charRef ).append( ';' );
> >         // }
> >     }
> >     return result.toString();
> > }
> >
> > But this is a dirty hack.
> >
> > I want to understand how Xerces is supposed to treat encodings and why
> > it doesn't work now.
> >
> > --
> > Dmitry Melekhov
> > http://www.aspec.ru/~dm
> > 2:5050/[EMAIL PROTECTED]
> >
> > P.S.
> > My Java platform is Blackdown JDK 1.1.7 for Linux x86.
> >
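The commented-out block in escape() decides, per character, whether to write the character itself or a character reference such as `&#1055;`, and it bases that decision on a fixed threshold (`_lastPrintable`). The sketch below is not Xerces code. It is a hypothetical `EncodingAwareEscaper` class (name and design are assumptions) that instead asks the output encoding whether it can represent each character, via `java.nio.charset.CharsetEncoder.canEncode`. That API needs JDK 1.4 or later, so it would not run on the 1.1.x JVMs discussed here. It also keeps the `&lt;` / `&amp;` escaping that the commented-out `getEntityRef()` branch appears to have supplied and that the hack above drops. The original comment lists ASCII delete (0x7F) as non-printable while the condition tests `ch == 0xF7`; the sketch assumes 0x7F was meant.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

/** Hypothetical encoding-aware escape(); a sketch, not part of Xerces. */
public class EncodingAwareEscaper {
    // CharsetEncoder is not thread-safe, so use one escaper per thread.
    private final CharsetEncoder encoder;

    public EncodingAwareEscaper(Charset outputEncoding) {
        this.encoder = outputEncoding.newEncoder();
    }

    public String escape(String source) {
        StringBuilder result = new StringBuilder(source.length());
        int i = 0;
        while (i < source.length()) {
            // Walk by code point so a surrogate pair is tested and referenced as one character.
            int cp = source.codePointAt(i);
            String unit = source.substring(i, i + Character.charCount(cp));
            i += unit.length();

            // Markup characters always get entity references.
            switch (cp) {
                case '<': result.append("&lt;"); continue;
                case '>': result.append("&gt;"); continue;
                case '&': result.append("&amp;"); continue;
                case '"': result.append("&quot;"); continue;
                default: break;
            }
            // Mirrors the original comment's rule for control characters and DEL
            // (note: XML 1.0 forbids most control characters even as references).
            boolean control = (cp < ' ' && cp != '\t' && cp != '\n' && cp != '\r') || cp == 0x7F;
            if (control || !encoder.canEncode(unit)) {
                // The output encoding cannot carry this character: write a numeric reference.
                result.append("&#").append(cp).append(';');
            } else {
                // The output encoding can carry it: write the character itself.
                result.append(unit);
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        String text = "\u041F\u0440\u0438\u0432\u0435\u0442 <b>"; // Cyrillic "Privet" plus a tag
        System.out.println(new EncodingAwareEscaper(Charset.forName("KOI8-R")).escape(text));
        // prints &#1055;&#1088;&#1080;&#1074;&#1077;&#1090; &lt;b&gt;
        System.out.println(new EncodingAwareEscaper(Charset.forName("US-ASCII")).escape(text));
    }
}
```

Because the decision comes from the output encoder rather than a hard-coded threshold, the same method would serve KOI8-R, ISO-8859-1 or UTF-8 output without being edited: with KOI8-R the Cyrillic letters pass through as-is, and with US-ASCII they become references that a browser can still display.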

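On Dmitry's question of whether the encoding set in the XML document is an input or an output encoding: an XML parser uses the encoding declaration (e.g. `<?xml version="1.0" encoding="KOI8-R"?>`) to decode the document's bytes into Unicode `char`s, so it describes the input. Sending the result to the browser is a separate step in which those chars are encoded again with whatever encoding the output writer uses. The sketch below uses only the JDK (the class and method names are hypothetical) to show both steps, and how an output encoding without Cyrillic turns every letter into `?`, one plausible cause of the `????` symptom. It relies on `java.nio.charset` (JDK 1.4+), so it will not run on JDK 1.1.x as written; the `Charset.isSupported` call is a quick way to check the kind of built-in encoding support Mike mentions.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical demo of input decoding vs. output encoding; JDK only. */
public class InputOutputEncoding {
    /** Input side: decode the document's bytes using the declared encoding. */
    public static String decodeInput(byte[] raw, Charset declared) {
        return new String(raw, declared);
    }

    /** Output side: encode chars for the browser; unmappable chars become the charset's replacement, '?'. */
    public static byte[] encodeOutput(String text, Charset output) {
        return text.getBytes(output);
    }

    public static void main(String[] args) {
        // Does this JVM ship a KOI8-R converter at all?
        System.out.println("KOI8-R supported: " + Charset.isSupported("KOI8-R"));

        Charset koi8 = Charset.forName("KOI8-R");
        // Cyrillic "Privet" as stored in a KOI8-R encoded file.
        byte[] onDisk = {(byte) 0xF0, (byte) 0xD2, (byte) 0xC9, (byte) 0xD7, (byte) 0xC5, (byte) 0xD4};

        // After parsing, the text is Unicode: the first char is U+041F, not 0xF0.
        String text = decodeInput(onDisk, koi8);
        System.out.println((int) text.charAt(0)); // 1055

        // The same chars written out with two different output encodings.
        byte[] asLatin1 = encodeOutput(text, StandardCharsets.ISO_8859_1);
        byte[] asKoi8 = encodeOutput(text, koi8);
        System.out.println(new String(asLatin1, StandardCharsets.US_ASCII)); // ??????
        System.out.println(Arrays.equals(asKoi8, onDisk));                   // true
    }
}
```

In this model the encoding in the XML declaration only affects the first step; what the browser receives depends entirely on the encoding chosen for the second.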