DO NOT REPLY [Bug 10385] - SSI-Servlet produces invalid character encoding information

bugzilla Thu, 26 Jun 2003 10:41:01 -0700

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=10385>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.


http://nagoya.apache.org/bugzilla/show_bug.cgi?id=10385

SSI-Servlet produces invalid character encoding information

[EMAIL PROTECTED] changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |WORKSFORME



------- Additional Comments From [EMAIL PROTECTED]  2003-06-26 15:57 -------
I haven't found an existing solution to this problem, so I played a bit with 
the source and now have a working fix for it.

First of all, I am not very familiar with the procedure for applying patches 
to CVS (I don't know whether I should report the change before committing 
anything, ask for permission, or do something else), so I didn't put it into 
the repository. Instead I will hand out the source and/or binaries if somebody 
asks. I'd be happy if the patch made it into the repository anyway.

Okay, here's the trick: SSIServlet now handles two more init-parameters, i.e. 
defaultInputEncoding and defaultOutputEncoding. The first tells the SSIInclude 
command to treat all processed (and included) files as if they were written in 
this charset (by creating appropriate readers). The second sets the charset 
attribute of Content-Type to the given value and thus allows a proper writer 
to be created.
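
To illustrate the idea (this is not the patch itself), here is a minimal 
sketch in plain Java with no servlet dependency. The class EncodingSketch and 
its methods are hypothetical names made up for this example, and falling back 
to the platform default charset when a parameter is unset is my assumption, 
not something the patch is known to do.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Hypothetical helper for illustration; not part of Tomcat or of the patch.
final class EncodingSketch {
    private EncodingSketch() {}

    // Turn an init-parameter value (e.g. defaultInputEncoding) into a Charset.
    // Assumption: an unset or blank value falls back to the platform default.
    static Charset resolve(String paramValue) {
        if (paramValue == null || paramValue.trim().isEmpty()) {
            return Charset.defaultCharset();
        }
        return Charset.forName(paramValue.trim());
    }

    // Decode a whole stream through a reader built for the input charset.
    static String readAll(InputStream in, Charset inputCharset) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, inputCharset))) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Build the Content-Type header value carrying the output charset.
    static String contentType(String mimeType, Charset outputCharset) {
        return mimeType + "; charset=" + outputCharset.name();
    }
}
```

In a servlet, the value returned by contentType(...) would be what gets passed 
to response.setContentType(...) before the writer is obtained, so that the 
writer encodes the output in the declared charset.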

This forced me to add two methods to the SSIExternalResolver interface: 
getDefaultInputEncoding and getDefaultOutputEncoding. Both return objects of 
type java.nio.charset.Charset that hold the corresponding charsets.
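
A rough sketch of what those two methods could look like follows. The 
interface name EncodingResolver is a stand-in showing only the two new 
methods (the real SSIExternalResolver has others that are omitted here), and 
FixedEncodingResolver is a hypothetical implementation that simply wraps the 
two init-parameter values. The fallback to the platform default charset is an 
assumption.

```java
import java.nio.charset.Charset;

// Stand-in for the two methods added to SSIExternalResolver.
interface EncodingResolver {
    Charset getDefaultInputEncoding();
    Charset getDefaultOutputEncoding();
}

// Hypothetical implementation backed by the two init-parameter values.
final class FixedEncodingResolver implements EncodingResolver {
    private final Charset input;
    private final Charset output;

    FixedEncodingResolver(String inputName, String outputName) {
        this.input = toCharset(inputName);
        this.output = toCharset(outputName);
    }

    // Assumption: a missing parameter means "use the platform default".
    private static Charset toCharset(String name) {
        return (name == null || name.trim().isEmpty())
                ? Charset.defaultCharset()
                : Charset.forName(name.trim());
    }

    @Override
    public Charset getDefaultInputEncoding() {
        return input;
    }

    @Override
    public Charset getDefaultOutputEncoding() {
        return output;
    }
}
```

Returning Charset objects rather than plain names means an unknown charset 
name fails once, when the resolver is built, instead of on every include.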

If it happens that a certain included file is in a different charset than the 
rest, its charset can be given after the file name. I was thinking of using a 
separate parameter, but that would break the NCSA standard; besides, the 
<!--#include> command allows any number of file/virtual parameters, so it 
would have to be written like this: <!--#include file="foo.txt" 
charset="iso-8859-2" file="bar.txt" charset="iso-8859-1"--> and so on. Well, 
maybe that's not bad, but as I've written, it breaks the NCSA standard. So 
instead I've used the same syntax as in mail headers. So now we write: 
<!--#include file="foo.txt;charset=iso-8859-2" 
file="bar.txt; charset = iso-8859-1"--> and so on. I hope this does not break 
any rule, and I know it's questionable.
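
One way the "file;charset=X" value could be split is sketched below. This is 
an illustration only, not the code from the patch: IncludeTarget and parse 
are hypothetical names, and the handling of a missing or unrecognised 
parameter (fall back to the default charset) is an assumption.

```java
import java.nio.charset.Charset;

// Hypothetical value object: an include path plus the charset to read it in.
final class IncludeTarget {
    final String path;
    final Charset charset;

    IncludeTarget(String path, Charset charset) {
        this.path = path;
        this.charset = charset;
    }

    // Split "foo.txt;charset=iso-8859-2" into path and charset.
    // Whitespace around ';' and '=' is tolerated, as in mail headers.
    // Assumption: anything other than a charset parameter is ignored and
    // the default charset is used instead.
    static IncludeTarget parse(String value, Charset defaultCharset) {
        int semi = value.indexOf(';');
        if (semi < 0) {
            return new IncludeTarget(value.trim(), defaultCharset);
        }
        String path = value.substring(0, semi).trim();
        String param = value.substring(semi + 1).trim();
        int eq = param.indexOf('=');
        if (eq > 0 && param.substring(0, eq).trim().equalsIgnoreCase("charset")) {
            // Charset.forName throws an IllegalArgumentException subclass
            // for an illegal or unsupported charset name.
            return new IncludeTarget(path, Charset.forName(param.substring(eq + 1).trim()));
        }
        return new IncludeTarget(path, defaultCharset);
    }
}
```

Note that with this approach a file name that itself contains a semicolon 
would be split in the wrong place, which is one reason the syntax is 
questionable.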

This, however, solves my problems with incorrect output, and if all the files 
are in the same charset, we do not have to use the "...;charset=X" 
construction (to be honest, I haven't tested the per-file charset syntax just 
mentioned).

The default encodings, however, work flawlessly. If anyone is interested in 
this patch, please contact me. If the Tomcat developers find this patch useful, 
or at least not too dirty/nasty, I will gladly add my .02 to the contribution.

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
