Re: Recent charset breakage

2021-04-01 Thread Christopher Schultz

Konstantin,

On 4/1/21 05:06, Konstantin Kolinko wrote:

> On Thu, Apr 1, 2021 at 00:55, Christopher Schultz wrote:
>>
>> [...]
>>
>> I've written a tiny JSP to demonstrate the problem.
>>
>> charecho.jsp
>>  CUT 
>> <%
>>   response.setContentType("text/html");
>>   response.setCharacterEncoding("UTF-8");
>> %>
>>
>>
>
> The value above is misspelled. You are missing "charset=" before "UTF-8".
> Personally, I usually echo the actual contentType header value when
> writing a meta tag. I think that would be
>

Thanks for pointing that out. I have modified the charecho.jsp file, so 
it is now:


<%@page contentType="text/html; charset=UTF-8" %>








<%= (null != request.getParameter("text") ? 
request.getParameter("text") : "")%>






The behavior is the same.

If I instead insert the following after the @page directive (to act as a 
filter, to keep the example completely self-contained), then this works 
as desired:


<%
  if(null == request.getCharacterEncoding()) {
    application.log("Character encoding is unset; setting to UTF-8");
    request.setCharacterEncoding("UTF-8");
  }
%>


> [...]
>
>> So, somewhat "mystery solved" although I'd like to understand why
>> <request-character-encoding> didn't work.
>
> Does validating your web.xml file against an xsd schema complete successfully?
>
> request-character-encoding is defined in
> (javax|jakarta)/servlet/resources/web-app_4_0.xsd, which means Tomcat
> 9 or later. You wrote that you are running Tomcat 8.5.


Ooh, that would do it.

Confirmed: Using <request-character-encoding> with Tomcat *9* behaves as 
desired, even without the filter/hack to correct a missing charset.


Thanks,
-chris

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: Recent charset breakage

2021-04-01 Thread Konstantin Kolinko
On Thu, Apr 1, 2021 at 00:55, Christopher Schultz wrote:
>
> [...]
>
> I've written a tiny JSP to demonstrate the problem.
>
> charecho.jsp
>  CUT 
> <%
>   response.setContentType("text/html");
>   response.setCharacterEncoding("UTF-8");
> %>
> 
> 

The value above is misspelled. You are missing "charset=" before "UTF-8".
Personally, I usually echo the actual contentType header value when
writing a meta tag. I think that would be


[...]

>
> So, somewhat "mystery solved" although I'd like to understand why
> <request-character-encoding> didn't work.

Does validating your web.xml file against an xsd schema complete successfully?

request-character-encoding is defined in
(javax|jakarta)/servlet/resources/web-app_4_0.xsd, which means Tomcat
9 or later. You wrote that you are running Tomcat 8.5.

Best regards,
Konstantin Kolinko




Re: Recent charset breakage

2021-03-31 Thread Christopher Schultz

All,

On 3/31/21 17:54, Christopher Schultz wrote:

All,

I got a report from a user on our development system at $work saying that 
special characters were being mangled. We are using Tomcat 8.5 with a 
custom web application and MariaDB under the hood. We are expecting to 
use UTF-8 everywhere and I can confirm that our testing environment and 
production environments do *not* have this problem.


I've written a tiny JSP to demonstrate the problem.

charecho.jsp
 CUT 
<%
  response.setContentType("text/html");
  response.setCharacterEncoding("UTF-8");
%>






<%= (null != request.getParameter("text") ? 
request.getParameter("text") : "")%>





 CUT 

I tried this on my development and testing environments and it behaves 
properly in my testing environment running 8.5.53, but not on my 
development environment running 8.5.64.


So I got myself a fresh copy of both 8.5.53 and 8.5.64 and put this JSP 
into the ROOT web application and it didn't work as expected.


Just enter either or both of these multi-byte Unicode characters into 
the text area and submit the form. You'll get mangled characters showing 
up which, if you submit many times, will multiply over and over again.


†


Our custom application does have a "character encoding filter" in place 
that sets the request character encoding to "UTF-8" if it's null (which 
is very common). That is the only difference I can think of from an 
out-of-the-box configuration for Tomcat.


I'm in the process of checking *everything*. But I'm hoping someone can 
(a) explain why the above JSP doesn't behave as expected on an 
out-of-the-box Tomcat and (b) what I might be overlooking, especially 
since this has been working for us for many years without any problems 
until somewhat recently.


Thanks,
-chris


I knew this had to be a problem in my own environment, but here's the 
explanation. First, to answer (a) above:


In order to make charecho.jsp work as expected in a vanilla Tomcat 
environment, you have to use a CharacterEncodingFilter. I wasn't able to 
get it to work by simply adding 
<request-character-encoding>UTF-8</request-character-encoding> to 
webapps/ROOT/WEB-INF/web.xml.


Once that was done, it works as expected.

For my own environment, we recently violated item #6 from this set of 
instructions:


https://cwiki.apache.org/confluence/display/TOMCAT/Character+Encoding#CharacterEncoding-Q8

We (I, actually!) had installed a new  which reads a request 
parameter and it was firing *before* the CharacterEncodingFilter was 
setting the default character encoding.
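
A minimal Java sketch of this ordering problem. FakeRequest is a hypothetical stand-in for the real servlet request, so the example runs without a servlet container; it is not Tomcat's implementation. It assumes only the documented servlet rule that setCharacterEncoding() has no effect once parameters have been read, and that ISO-8859-1 is the fallback when no encoding was set.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingOrderDemo {

    /** Hypothetical stand-in for a servlet request; not Tomcat's implementation. */
    public static class FakeRequest {
        private final byte[] body;   // raw bytes as sent by the browser
        private Charset encoding;    // null until someone sets it
        private String parsed;       // cached on the first parameter read

        public FakeRequest(byte[] body) {
            this.body = body;
        }

        public Charset getCharacterEncoding() {
            return encoding;
        }

        public void setCharacterEncoding(Charset charset) {
            // Mirrors the servlet rule: no effect once parameters have been read.
            if (parsed == null) {
                encoding = charset;
            }
        }

        public String getParameter() {
            if (parsed == null) {
                // Fall back to ISO-8859-1 when no encoding was set in time.
                Charset cs = (encoding != null) ? encoding : StandardCharsets.ISO_8859_1;
                parsed = new String(body, cs);
            }
            return parsed;
        }
    }

    /** Same check as the filter in this thread: default to UTF-8 when unset. */
    public static void encodingFilter(FakeRequest request) {
        if (request.getCharacterEncoding() == null) {
            request.setCharacterEncoding(StandardCharsets.UTF_8);
        }
    }

    /** Correct order: the encoding filter runs before anything reads a parameter. */
    public static String filterFirst(byte[] body) {
        FakeRequest request = new FakeRequest(body);
        encodingFilter(request);
        return request.getParameter();
    }

    /** Broken order: an earlier component reads a parameter before the filter runs. */
    public static String readerFirst(byte[] body) {
        FakeRequest request = new FakeRequest(body);
        request.getParameter();      // the early read fixes the decoding
        encodingFilter(request);     // too late, silently ignored
        return request.getParameter();
    }

    public static void main(String[] args) {
        byte[] body = "\u2020".getBytes(StandardCharsets.UTF_8);
        System.out.println("filter first: " + filterFirst(body).length() + " char(s)");
        System.out.println("reader first: " + readerFirst(body).length() + " char(s)");
    }
}
```

With the filter first, the dagger comes back as a single character; with the early read first, the same three UTF-8 bytes come back as three separate characters.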


So, somewhat "mystery solved" although I'd like to understand why 
<request-character-encoding> didn't work.


-chris




Recent charset breakage

2021-03-31 Thread Christopher Schultz

All,

I got a report from a user on our development system at $work saying that 
special characters were being mangled. We are using Tomcat 8.5 with a 
custom web application and MariaDB under the hood. We are expecting to 
use UTF-8 everywhere and I can confirm that our testing environment and 
production environments do *not* have this problem.


I've written a tiny JSP to demonstrate the problem.

charecho.jsp
 CUT 
<%
  response.setContentType("text/html");
  response.setCharacterEncoding("UTF-8");
%>






<%= (null != request.getParameter("text") ? 
request.getParameter("text") : "")%>





 CUT 

I tried this on my development and testing environments and it behaves 
properly in my testing environment running 8.5.53, but not on my 
development environment running 8.5.64.


So I got myself a fresh copy of both 8.5.53 and 8.5.64 and put this JSP 
into the ROOT web application and it didn't work as expected.


Just enter either or both of these multi-byte Unicode characters into 
the text area and submit the form. You'll get mangled characters showing 
up which, if you submit many times, will multiply over and over again.


†
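
A minimal Java sketch of why the characters multiply, assuming the browser submits the form as UTF-8 and the server decodes the request body as ISO-8859-1 (the servlet default when no request encoding is set). MojibakeDemo and roundTrip are illustrative names, not part of Tomcat.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    /**
     * One form round trip: the browser sends the text as UTF-8 bytes and
     * the server decodes those bytes as ISO-8859-1, one char per byte.
     */
    public static String roundTrip(String text) {
        byte[] sent = text.getBytes(StandardCharsets.UTF_8);
        return new String(sent, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String text = "\u2020"; // the dagger character, 3 bytes in UTF-8
        for (int submit = 1; submit <= 3; submit++) {
            text = roundTrip(text);
            System.out.println("after submit " + submit + ": " + text.length() + " chars");
        }
        // prints 3, 6 and 12 chars for the three submits
    }
}
```

Each mangled character falls in the range U+0080 to U+00FF, which takes two bytes in UTF-8, so every further submit doubles the length.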


Our custom application does have a "character encoding filter" in place 
that sets the request character encoding to "UTF-8" if it's null (which 
is very common). That is the only difference I can think of from an 
out-of-the-box configuration for Tomcat.


I'm in the process of checking *everything*. But I'm hoping someone can 
(a) explain why the above JSP doesn't behave as expected on an 
out-of-the-box Tomcat and (b) what I might be overlooking, especially 
since this has been working for us for many years without any problems 
until somewhat recently.


Thanks,
-chris

