On Wed, 26 Sep 2001, Keith Wannamaker wrote:

> 0x3b = ';'.  Ignacio is right, SessionID doesn't remove the id
> because it is not expecting ; to be encoded.  So now it shows
> up in the URI and has the side effect of breaking sessions
> that depend on url rewriting.  But, the spec does say the URL
> should be encoded, so I'd rather fix SessionID with this patch.
>
> However, are there other places where TC is manipulating the
> URL and assuming it is unencoded?

I'm not sure this is the right solution ( but it's a good patch:-).

The 'original' URI included a ';' - which is a valid
character used in the right way. It shouldn't be encoded by
mod_jk - the whole reason for encoding is to try to reproduce
the original URI or something equivalent.

The only use for %3B is to allow the user to specify
some path-info ( or other path components ) that include a
literal ';' character. In a URI, ';' is used to pass additional
information about the path ( and it seems it can be attached
to any path component ) - I have never seen any server use
this feature.
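To make the failure Keith describes concrete, here is a minimal
sketch ( not the actual SessionID code ). The class name, the
method name and the ";jsessionid=" marker are assumptions chosen
for illustration. It shows why a matcher that looks for a literal
';' leaves the id in place once the ';' arrives as %3B.

```java
final class SessionIdSketch {
    // Assumed marker for URL rewriting; not taken from the patch itself.
    static final String MARKER = ";jsessionid=";

    /** Removes the session-id path parameter, if a literal marker is found. */
    static String stripSessionId(String uri) {
        int start = uri.indexOf(MARKER);
        if (start < 0) {
            return uri; // also the outcome when ';' was sent as %3B
        }
        // The id runs until the next ';', '/', '?' or the end of the string.
        int end = start + MARKER.length();
        while (end < uri.length() && ";/?".indexOf(uri.charAt(end)) < 0) {
            end++;
        }
        return uri.substring(0, start) + uri.substring(end);
    }

    public static void main(String[] args) {
        // Literal ';': the id is removed.
        System.out.println(stripSessionId("/app/page;jsessionid=ABC123?x=1"));
        // Encoded ';': the marker is never found, so the id stays in the URI.
        System.out.println(stripSessionId("/app/page%3Bjsessionid=ABC123?x=1"));
    }
}
```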

Well, we have a big nightmare here - and probably the only
way out is to find some consistent ( and implementable )
interpretation of the involved specs and stick with that.

The encoded URI is used only to satisfy the servlet spec -
and re-encoding the URI is an imperfect workaround.

We have a few choices:

1. revert to the use of unencoded_uri.
Problems:
- what about IIS and NES, where this is not available ?
- what about integrating with apache, where the decoded
uri is used for everything ( that means any attempt to
authenticate using apache modules may create huge problems)

2. Use a different encoding function, that doesn't
encode ';'.
Problems:
- encoding/decoding will result in a different URI
( thus the servlet spec will not be happy )
- inconsistency between tomcat standalone and tomcat+server

3. Revert to the use of uri ( i.e. the decoded uri ), and
change getRequestURI ( the facade ) to generate a
'canonical' encoding.
Problems:
- again it'll not be the 'original' as required by the servlet
spec ( but at least this will be consistent across servers )


Using 'unencoded' URIs is a huge source of security problems -
the server and container _must_ use decoded URIs internally,
because otherwise security constraints would be useless.

On the other hand, servlets that are doing any matching
on 'unencoded' URIs are likely to be fooled easily by
encoding tricks. So if a user is doing any security check,
he'll likely repeat all our bugs.
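A small sketch of the kind of trick meant here, assuming a
hypothetical "/admin/" prefix check ( none of these names come
from Tomcat ). The same request path passes a check done on the
raw URI and is caught only when the check runs on the decoded
form.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class EncodedMatchSketch {
    /** Decodes %XX escapes; malformed escapes are kept literally. */
    static String percentDecode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '%' && i + 2 < s.length()) {
                int hi = Character.digit(s.charAt(i + 1), 16);
                int lo = Character.digit(s.charAt(i + 2), 16);
                if (hi >= 0 && lo >= 0) {
                    out.write((hi << 4) | lo);
                    i += 2;
                    continue;
                }
            }
            out.write(c); // assumes ASCII input, as in a well-formed URI
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Check on the URI as received: easy to bypass. */
    static boolean naiveIsProtected(String rawUri) {
        return rawUri.startsWith("/admin/");
    }

    /** Check on the decoded URI: sees through the escape. */
    static boolean decodedIsProtected(String rawUri) {
        return percentDecode(rawUri).startsWith("/admin/");
    }

    public static void main(String[] args) {
        String raw = "/%61dmin/users"; // %61 is 'a'
        System.out.println(naiveIsProtected(raw));
        System.out.println(decodedIsProtected(raw));
    }
}
```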

Using a 'canonical' encoding ( where only the chars that
are required to be encoded are encoded ) has the benefit
of giving consistent output.
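As a rough sketch of what such a canonical encoding could look
like: start from the decoded URI and escape only the bytes that
are not in a 'safe' set. The class name and the exact safe set
below are my assumptions ( loosely the path characters of
RFC 2396 ), not something the servlet spec defines. Note that ';'
is left alone, so a %3B in the original request comes back as ';'.

```java
import java.nio.charset.StandardCharsets;

final class CanonicalUriSketch {
    // Assumed safe set: unreserved chars plus the usual path delimiters.
    private static final String SAFE =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
        + "-_.!~*'()/:@&=+$,;";
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    /** Encodes a decoded URI path, escaping only bytes outside SAFE. */
    static String canonicalEncode(String decodedUri) {
        StringBuilder sb = new StringBuilder(decodedUri.length() + 16);
        for (byte b : decodedUri.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            if (v < 0x80 && SAFE.indexOf(v) >= 0) {
                sb.append((char) v);
            } else {
                sb.append('%').append(HEX[v >> 4]).append(HEX[v & 0x0F]);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(canonicalEncode("/docs/a b;v=1"));
    }
}
```

Whatever server sits in front, the same decoded URI always gives
the same getRequestURI result, which is the consistency argument
for (3).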

My preference would be to do (3).

Costin



