On 20.10.2016 15:55, Mark Juszczec wrote:
On Thu, Oct 20, 2016 at 4:21 AM, André Warnier (tomcat) <a...@ice-sa.com>
wrote:


Can you tell us (or remind us) exactly how the browser is sending this
request for the parameter "JOEL" (with dieresis on the E) to the server ?
Is it a part of the query-string of the URL, or is it in the body of a
POST request ?

The following on-line documentation describes precisely how this should
work :
http://tomcat.apache.org/tomcat-8.0-doc/config/ajp.html#Attributes
(See "URIEncoding", but also "useBodyEncodingForURI", and follow the link
provided to the same attributes in the HTTP Connector :
http://tomcat.apache.org/tomcat-8.0-doc/config/http.html#Common_Attributes
)

So check exactly what you are doing, and whether it matches these rules.

Personal rant :
Unfortunately, this is still a big mess in the HTTP protocol.
And the people in charge of the design of the protocol missed a golden
opportunity to clean this up in HTTP/2 and make Unicode/UTF-8 the
default, instead of clinging to iso-8859-1. Thus condemning all web
programmers worldwide to another 20 years of obscure bugs and clunky
work-arounds.

(s) Andr%C3%A9


The data is being returned by Shibboleth and passed to Tomcat in the body
of an HTTP GET request.

Nitpick : that is a contradiction in terms. Per the RFC, a GET request has no "body" (strictly speaking, a payload in a GET request has no defined semantics).
See : https://tools.ietf.org/html/rfc7231#section-4 (section 4.3.1, GET)

I don't know Shibboleth, and I do not know how it works exactly, but based on what you seem to imply here, I will assume that the "joel" in question is being passed as part of the GET request URL (like "..?givenName=joel&otherparam=xxx..").
(Technically, that part is the "query-string" part of the URI).

Based on what else you indicate below about Shibboleth, I would also assume that the "e with dieresis" (sorry, I can't type it on my German keyboard) is passed in that query-string as iso-8859-1, perhaps percent-encoded as %CB (uppercase) or %EB (lowercase).

Receiving this, recent Tomcat versions would decode it as iso-8859-1 (latin-1) if STRICT_SERVLET_COMPLIANCE is enforced (the org.apache.catalina.STRICT_SERVLET_COMPLIANCE system property), as UTF-8 by default, or according to whatever you set in "URIEncoding" and/or "useBodyEncodingForURI". If Tomcat tries UTF-8, the result may or may not be a valid Java Unicode character, but in any case it will not be the character that you expect. If you set it to decode URIs using iso-8859-1, it would decode this value correctly (and give your application the correct Java Unicode character), but it would then decode *all* request URIs using iso-8859-1, which would most probably have adverse effects on the rest of your application.
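To make these outcomes concrete, here is a minimal Java sketch that decodes the same percent-encoded value with both charsets. It uses java.net.URLDecoder only as a stand-in for the container's own URI decoding (it is not what Tomcat runs internally); the class name QueryDecodingDemo and the sample values are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class QueryDecodingDemo {
    // Decode one percent-encoded query value using the given charset name.
    public static String decode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            // ISO-8859-1 and UTF-8 are guaranteed to exist on every JVM
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String latin1Encoded = "Jo%EBl";   // "Joël" sent as ISO-8859-1 bytes
        String utf8Encoded = "Jo%C3%ABl";  // "Joël" sent as UTF-8 bytes

        // ISO-8859-1 bytes read as ISO-8859-1: correct, "Jo\u00EBl"
        String a = decode(latin1Encoded, "ISO-8859-1");
        // ISO-8859-1 bytes read as UTF-8: a lone 0xEB is malformed, "Jo\uFFFDl"
        String b = decode(latin1Encoded, "UTF-8");
        // UTF-8 bytes read as UTF-8: correct, "Jo\u00EBl"
        String c = decode(utf8Encoded, "UTF-8");
        // UTF-8 bytes read as ISO-8859-1: mojibake, "Jo\u00C3\u00ABl"
        String d = decode(utf8Encoded, "ISO-8859-1");

        System.out.println(a.equals("Jo\u00EBl") && c.equals("Jo\u00EBl"));
        System.out.println(b.equals("Jo\uFFFDl") && d.equals("Jo\u00C3\u00ABl"));
    }
}
```

Each charset only gives the right answer for bytes that were actually written in that charset, which is why a single connector-wide setting cannot satisfy both kinds of input.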

So it would seem that you are stuck somewhere in between.
But it is not a Tomcat issue, it is a Shibboleth issue.
(Or rather, a Shibboleth-and-HTTP-defaulting-to-iso-8859-1 issue.)


This is by design of the application and there's nothing I can do about it.


Neither can we.

As such, my only options for enforcing UTF-8 are "URIEncoding"
and/or "useBodyEncodingForURI", as described in the links.

I've done this and it has not had any impact on the problem.

Last night, I found these bits of information:

https://issues.shibboleth.net/jira/browse/SSPCPP-2

My interpretation (and PLEASE tell me if I'm wrong) is that, since at least
2007, headers have been locked into the ISO-8859-1 charset by the specs
that govern how the World Wide Web works.


Well yes, see my previous rant.
See : https://tools.ietf.org/html/rfc7230#section-3.2
(section 3.2.4, "Field Parsing", at the end)
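The effect can be reproduced in a few lines of Java. This sketch assumes, as the Shibboleth wiki page quoted in this thread states, that the value is written as UTF-8 while the container reads the header bytes as ISO-8859-1; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class HeaderMojibakeDemo {
    // Simulates a sender that writes UTF-8 and a container that reads
    // the same raw header bytes back as ISO-8859-1.
    public static String readHeaderAsLatin1(String originalValue) {
        byte[] wireBytes = originalValue.getBytes(StandardCharsets.UTF_8); // what goes on the wire
        return new String(wireBytes, StandardCharsets.ISO_8859_1);         // what the servlet sees
    }

    public static void main(String[] args) {
        String seen = readHeaderAsLatin1("Jo\u00EBl"); // "Joël"
        // 'ë' (U+00EB) is 0xC3 0xAB in UTF-8, which ISO-8859-1 reads as
        // 'Ã' (U+00C3) followed by '«' (U+00AB): 4 characters become 5.
        System.out.println(seen.equals("Jo\u00C3\u00ABl")); // true
        System.out.println(seen.length());                  // 5
    }
}
```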

This:

https://wiki.shibboleth.net/confluence/display/SHIB2/NativeSPAttributeAccess

I am sorry, but I do not really have the time right now (nor the setup) to investigate further what Shibboleth is doing, or what they are really explaining in that article. But while skimming it, I came to suspect that the following may help you, in the case of a mod_jk connector to Tomcat :

http://tomcat.apache.org/connectors-doc/reference/apache.html

JkEnvVar        

"Adds a name and an optional default value of environment variable that should be sent to servlet-engine as a request attribute. If the default value is not given explicitly, the variable will only be sent if it is set during runtime.
The default is empty, so no additional variables will be sent.
This directive can be used multiple times per virtual server. The settings will be merged between the global server and any virtual server. You can retrieve the variables on Tomcat as request attributes via request.getAttribute(attributeName). Note that the variables sent via JkEnvVar will not be listed in request.getAttributeNames(). Empty default values are supported since version 1.2.20. Not sending variables with empty defaults and empty runtime value has been introduced in version 1.2.21."

In other words : if Shibboleth can send this value in the form of an HTTP header, and you can configure the Apache httpd front-end to pick up the value of that header and set it into an "Apache environment variable" (perhaps with mod_rewrite and a RewriteRule), then you could ask mod_jk to forward this variable content to Tomcat, as a request attribute.
(You could then pick it up with request.getAttribute(), and perhaps in the correct encoding.)

A lot of speculation here...

(And maybe with the above, I am just duplicating what Shibboleth already does by itself.)



This page goes on to reiterate what the first link says and proposes a workaround (see
the Java link at the end of the page):

"Shibboleth attributes are by default UTF-8 encoded. However, depending on
the servlet container configuration they are interpreted as ISO-8859-1
values. This causes problems with non-ASCII characters. The solution is to
re-encode attributes, e.g. with:

String value= request.getHeader("givenName");
value= new String( value.getBytes("ISO-8859-1"), "UTF-8");"
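Why this works: ISO-8859-1 maps each of the 256 possible byte values to exactly one character (U+0000 to U+00FF), so getBytes("ISO-8859-1") gives back the original wire bytes unchanged, and those can then be decoded as the UTF-8 they really were. Below is a minimal sketch of the same two lines using the java.nio.charset.StandardCharsets constants (Java 7+), which avoids the checked UnsupportedEncodingException; the null check and the names Latin1RoundTrip/fixMisdecoded are hypothetical additions for illustration.

```java
import java.nio.charset.StandardCharsets;

public class Latin1RoundTrip {
    // Undo the mis-decoding: recover the original bytes via ISO-8859-1
    // (a lossless 1:1 byte-to-char mapping), then decode them as UTF-8.
    public static String fixMisdecoded(String value) {
        if (value == null) {
            return null; // header or attribute not present
        }
        byte[] originalBytes = value.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String misdecoded = "Jo\u00C3\u00ABl"; // "JoÃ«l", as seen by the servlet
        System.out.println(fixMisdecoded(misdecoded).equals("Jo\u00EBl")); // true
    }
}
```

Note that the trick is only lossless for strings that really were produced by that ISO-8859-1 misreading: any character above U+00FF would be turned into '?' by getBytes.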


Although MY data is delivered as attributes (so I have to use
request.getAttribute("FirstName")), this works.

ISO-8859-1 is the default charset used by ByteChunk, and I've verified that it is not
reset/changed to UTF-8 despite my having specified UTF-8 in server.xml per the Tomcat
documentation.

I found this:

https://issues.shibboleth.net/jira/browse/SSPCPP-2

which says this problem has been around since at least 2007.

Then I found this:

https://wiki.shibboleth.net/confluence/plugins/servlet/mobile#content/view/4358180

which suggests the following solution:

String value= request.getHeader("givenName");
value= new String( value.getBytes("ISO-8859-1"), "UTF-8");

I have to get my data via request.getAttribute("key").

Is the solution appropriate for data delivered as attributes?
I have read the information that says it's a dangerous hack, and that is the main
reason I have not implemented it.
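On the attributes question: the re-encoding operates on the String value itself, so it makes no difference whether the value came from request.getHeader() or request.getAttribute(), as long as it went through the same UTF-8-read-as-ISO-8859-1 step (which is consistent with the report above that it works). The risk usually attributed to this hack is applying it unconditionally, for example to a value that was already decoded correctly, which it would corrupt. A hedged sketch of a guarded version follows; fixIfMisdecoded is a hypothetical helper, and its test (only re-decode when every character is at most U+00FF and the recovered bytes form strictly valid UTF-8) is a heuristic, not something Tomcat or Shibboleth provide.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class AttributeRedecoder {
    // Hypothetical helper: re-decodes a request attribute only when it looks like
    // UTF-8 bytes that were mis-read as ISO-8859-1; otherwise returns it unchanged.
    public static String fixIfMisdecoded(Object attribute) {
        if (!(attribute instanceof String)) {
            return null; // attribute missing or not a String
        }
        String value = (String) attribute;
        boolean hasHighChar = false;
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c > 0xFF) {
                return value; // cannot have come from ISO-8859-1 decoding
            }
            if (c >= 0x80) {
                hasHighChar = true;
            }
        }
        if (!hasHighChar) {
            return value; // pure ASCII: identical in both charsets
        }
        byte[] bytes = value.getBytes(StandardCharsets.ISO_8859_1);
        // Strict decoder: throws instead of substituting U+FFFD on bad input
        CharsetDecoder strictUtf8 = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return strictUtf8.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            return value; // not valid UTF-8: probably already decoded correctly
        }
    }

    public static void main(String[] args) {
        // In a servlet the argument would be request.getAttribute("FirstName")
        System.out.println(fixIfMisdecoded("Jo\u00C3\u00ABl").equals("Jo\u00EBl")); // true: repaired
        System.out.println(fixIfMisdecoded("Jo\u00EBl").equals("Jo\u00EBl"));       // true: left alone
        System.out.println(fixIfMisdecoded(null));                                  // null
    }
}
```

The heuristic can still misfire on genuine Latin-1 text that happens to form valid UTF-8 (for example a literal "Ã«"), which is unlikely in personal names but not impossible.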

However, the Shibboleth forum posts and what I've discovered about
ByteChunk seem to cast this in a different light.

Any thoughts or comments would be greatly appreciated.



---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org
