RE: [TC3.2.3][PATCH] mod_jk / mod_rewrite bug fix
What is the status of this one, about a week later?

I recall the discussion some months ago about replacing the previous uri with unparsed_uri. Do we have a way to determine that the URI came from mod_rewrite and not from the client (via the notes)? In that case, what about using r->uri instead of r->unparsed_uri?

-
Henri Gomez
EMAIL : [EMAIL PROTECTED]
PGP KEY : 697ECEDD
PGP Fingerprint : 9DF8 1EA8 ED53 2F39 DC9B 904A 364F 80E6

-----Original Message-----
From: Bill Barker [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, August 15, 2001 9:51 PM
To: [EMAIL PROTECTED]
Subject: Re: [TC3.2.3][PATCH] mod_jk / mod_rewrite bug fix

1.3.17 (with negotiation_module removed to prevent that problem).
Re: [TC3.2.3][PATCH] mod_jk / mod_rewrite bug fix
On Tue, Aug 14, 2001 at 11:49:43PM -0400, Keith Wannamaker wrote:
> Try ap_escape_uri

That does the trick.

Here's the patch which gets things working again, thanks for all the help. Hopefully this will get applied soon. Is there any 3.2.4 release planned to fix the small number of bugs/problems in 3.2.3 (I also recall bumping into some issues with error documents and getting into infinite loops which were fixed)?

Thanks,
Dave

--- mod_jk.c.orig	Tue Jun 19 15:44:57 2001
+++ mod_jk.c	Tue Aug 14 22:42:32 2001
@@ -358,13 +358,12 @@
     s->method = (char *)r->method;
     s->content_length = get_content_length(r);
     s->query_string = r->args;
-    s->req_uri = r->unparsed_uri;
-    if (s->req_uri != NULL) {
-        char *query_str = strchr(s->req_uri, '?');
-        if (query_str != NULL) {
-            *query_str = 0;
-        }
-    }
+    /*
+     * The 2.2 servlet spec errata says the uri from
+     * HttpServletRequest.getRequestURI() should remain encoded.
+     * [http://java.sun.com/products/servlet/errata_042700.html]
+     */
+    s->req_uri = ap_escape_uri(r->pool, r->uri);
     s->is_ssl = JK_FALSE;
     s->ssl_cert = NULL;
RE: [TC3.2.3][PATCH] mod_jk / mod_rewrite bug fix
I am concerned that the loss of original escaping will break somebody. For instance:

    r->unparsed_uri       = fe%3afi%40fo%3ffum
    r->uri                = fe:fi@fo?fum
    ap_escape_uri(r->uri) = fe:fi@fo%3ffum

Magically authentication information appears in my request to an oddly-named server.

Maybe the solution is to choose one of the three at runtime by a mod_jk config option?

Keith

| -----Original Message-----
| From: David Rees [mailto:[EMAIL PROTECTED]]
| Sent: Wednesday, August 15, 2001 1:45 AM
| To: [EMAIL PROTECTED]
| Subject: Re: [TC3.2.3][PATCH] mod_jk / mod_rewrite bug fix
|
| On Tue, Aug 14, 2001 at 11:49:43PM -0400, Keith Wannamaker wrote:
| > Try ap_escape_uri
|
| That does the trick.
|
| Here's the patch which gets things working again, thanks for all the help.
| Hopefully this will get applied soon. Is there any 3.2.4 release planned to
| fix the small number of bugs/problems in 3.2.3 (I also recall bumping into
| some issues with error documents and getting into infinite loops which were
| fixed)
|
| Thanks,
| Dave
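Keith's three-line example can be reproduced with a small standalone sketch. The code below is not httpd code: `percent_decode`, `percent_escape_path` and the "safe" character set are hypothetical stand-ins chosen for illustration, on the assumption that the real escaper leaves ':' and '@' alone but escapes '?'. It only shows that decoding followed by re-escaping need not give back the original string.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Value of one hex digit; the caller has already checked isxdigit(). */
int hex_value(unsigned char c)
{
    return isdigit(c) ? c - '0' : tolower(c) - 'a' + 10;
}

/* Decode every %XX sequence once. Returns a malloc'ed string, or NULL. */
char *percent_decode(const char *in)
{
    assert(in != NULL);
    char *out = malloc(strlen(in) + 1);
    char *o = out;
    if (out == NULL)
        return NULL;
    while (*in != '\0') {
        if (in[0] == '%' && isxdigit((unsigned char)in[1])
                         && isxdigit((unsigned char)in[2])) {
            *o++ = (char)(hex_value((unsigned char)in[1]) * 16
                        + hex_value((unsigned char)in[2]));
            in += 3;
        } else {
            *o++ = *in++;
        }
    }
    *o = '\0';
    return out;
}

/* Assumed "safe" set for this sketch only; not copied from httpd. */
int is_safe_in_path(unsigned char c)
{
    return isalnum(c)
        || (c != '\0' && strchr("$-_.+!*'(),:@&=/~", c) != NULL);
}

/* Escape everything outside the assumed safe set as lowercase %xx. */
char *percent_escape_path(const char *in)
{
    static const char hex[] = "0123456789abcdef";
    assert(in != NULL);
    char *out = malloc(3 * strlen(in) + 1);
    char *o = out;
    if (out == NULL)
        return NULL;
    for (; *in != '\0'; in++) {
        unsigned char c = (unsigned char)*in;
        if (is_safe_in_path(c)) {
            *o++ = (char)c;
        } else {
            *o++ = '%';
            *o++ = hex[c >> 4];
            *o++ = hex[c & 0x0f];
        }
    }
    *o = '\0';
    return out;
}
```

With the input from the example, decoding yields the string containing ':' and '@' in the clear, and re-escaping only restores the '?'. That difference is the asymmetry the thread is arguing about.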
Re: [TC3.2.3][PATCH] mod_jk / mod_rewrite bug fix
On Wed, Aug 15, 2001 at 08:56:45AM -0400, Keith Wannamaker wrote: I am concerned that the loss of original escaping will break somebody. For instance: As Costin pointed out, the escaping of a URI does not change its semantics - they should be treated as identical by anyone who follows the URI spec. Escaping where it wasn't escaped *shouldn't* break anyone. And, the whole question is what does Tomcat see the request as? I could make a case that it should never know about the unparsed_uri, but only the uri that httpd finally resolved to and that mod_jk picked up. -- justin
Re: [TC3.2.3][PATCH] mod_jk / mod_rewrite bug fix
On Wed, 15 Aug 2001, Justin Erenkrantz wrote:
> On Wed, Aug 15, 2001 at 08:56:45AM -0400, Keith Wannamaker wrote:
> > I am concerned that the loss of original escaping will break somebody. For instance:
>
> As Costin pointed out, the escaping of a URI does not change its semantics - they should be treated as identical by anyone who follows the URI spec. Escaping where it wasn't escaped *shouldn't* break anyone.
>
> And, the whole question is what does Tomcat see the request as? I could make a case that it should never know about the unparsed_uri, but only the uri that httpd finally resolved to and that mod_jk picked up.
> -- justin

I guess the only choice we can make is whether Apache is part of the servlet container (and must follow its rules) or not.

If it is, then mod_rewrite (and half of the modules) just can't be used - they alter the request in a way that's not allowed by the spec. Apache can only forward requests to Tomcat and, if it's lucky, serve static files (for apps not using filters or strange mappings). It can't authenticate (since the auth model doesn't follow the role-based rules) and can't filter (since Apache 2.0 filters are very different from 2.3 filters). But on the bright side, our life is much simpler and we don't have to worry.

If we treat Apache as a web server that cooperates with Tomcat but can do at least what a proxy is allowed to do by the HTTP spec (i.e. alter the request, etc.), then we are fine (except that life is interesting again, with a lot of work to do, including this fix).

Costin
Re: [TC3.2.3][PATCH] mod_jk / mod_rewrite bug fix
Personally, I agree with Justin and Costin that mod_jk should be able to use the uri field. Having said that, I'd like to point out that the mod_jk.c in j-t-c is flat-out broken. It doesn't handle the case where the '?' itself is encoded. Since this case is part of a currently popular attack on IIS, it will show up. - Original Message - From: Justin Erenkrantz [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Wednesday, August 15, 2001 8:27 AM Subject: Re: [TC3.2.3][PATCH] mod_jk / mod_rewrite bug fix On Wed, Aug 15, 2001 at 08:56:45AM -0400, Keith Wannamaker wrote: I am concerned that the loss of original escaping will break somebody. For instance: As Costin pointed out, the escaping of a URI does not change its semantics - they should be treated as identical by anyone who follows the URI spec. Escaping where it wasn't escaped *shouldn't* break anyone. And, the whole question is what does Tomcat see the request as? I could make a case that it should never know about the unparsed_uri, but only the uri that httpd finally resolved to and that mod_jk picked up. -- justin
Re: [TC3.2.3][PATCH] mod_jk / mod_rewrite bug fix
On Wed, 15 Aug 2001, Bill Barker wrote: Personally, I agree with Justin and Costin that mod_jk should be able to use the uri field. Having said that, I'd like to point out that the mod_jk.c in j-t-c is flat-out broken. It doesn't handle the case where the '?' itself is encoded. Since this case is part of a currently popular attack on IIS, it will show up. Interesting finding. However tomcat decoder should be able to do so - if it doesn't we must fix it. Can you check against 3.3beta1 ? As a note, IMHO it is perfectly legal to have an encoded '?' in the URI, and the behavior should be: the '?' will be decoded _after_ the URI is separated from query string, and it's used as part of the file name. AFAIK there is no reason a file ( or pathInfo ) can't have the '?' char inside, and the URI spec allow that. ( of course, paranoia may force us to remove this kind of behavior ). Costin
Fw: [TC3.2.3][PATCH] mod_jk / mod_rewrite bug fix
- Original Message - From: Bill Barker [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Wednesday, August 15, 2001 12:15 PM Subject: Re: [TC3.2.3][PATCH] mod_jk / mod_rewrite bug fix It is actually worse than that. TC3.3B1 (with the mod_jk that it ships with, I haven't tried j-t-c yet) gives a directory listing in response to: http://myserver/%3f%41%3d%42.jsp - Original Message - From: [EMAIL PROTECTED] To: [EMAIL PROTECTED]; Bill Barker [EMAIL PROTECTED] Sent: Wednesday, August 15, 2001 11:44 AM Subject: Re: [TC3.2.3][PATCH] mod_jk / mod_rewrite bug fix On Wed, 15 Aug 2001, Bill Barker wrote: Personally, I agree with Justin and Costin that mod_jk should be able to use the uri field. Having said that, I'd like to point out that the mod_jk.c in j-t-c is flat-out broken. It doesn't handle the case where the '?' itself is encoded. Since this case is part of a currently popular attack on IIS, it will show up. Interesting finding. However tomcat decoder should be able to do so - if it doesn't we must fix it. Can you check against 3.3beta1 ? As a note, IMHO it is perfectly legal to have an encoded '?' in the URI, and the behavior should be: the '?' will be decoded _after_ the URI is separated from query string, and it's used as part of the file name. AFAIK there is no reason a file ( or pathInfo ) can't have the '?' char inside, and the URI spec allow that. ( of course, paranoia may force us to remove this kind of behavior ). Costin
Re: [TC3.2.3][PATCH] mod_jk / mod_rewrite bug fix
On Wed, Aug 15, 2001 at 08:58:00AM -0700, [EMAIL PROTECTED] wrote: And, the whole question is what does Tomcat see the request as? I could make a case that it should never know about the unparsed_uri, but only the uri that httpd finally resolved to and that mod_jk picked up. -- justin If we treat apache as a web server, that cooperates with tomcat but can do at least what a proxy is allowed to do by the HTTP spec ( i.e. alter the request, etc ) - then we are fine ( except the life is interesting again, and a lot of work to do including this fix ). This is the way I expect it to behave, but as Keith pointed out, it may be useful to have this as a configuration option. -Dave
Re: [TC3.2.3][PATCH] mod_jk / mod_rewrite bug fix
On Wed, 15 Aug 2001, Bill Barker wrote:
> It is actually worse than that. TC3.3B1 (with the mod_jk that it ships with, I haven't tried j-t-c yet) gives a directory listing in response to: http://myserver/%3f%41%3d%42.jsp

If I translate this correctly, your request is http://myserver/?A=B.jsp

This is treated as a request for /, with parameters (that are ignored since it's a static page). Hmm, it should return a redirect, or index.html if it exists.

Tomcat standalone is OK: it responds 404 (and it does so because it correctly does a single decoding _after_ separating the URI into components, as required by the URI spec).

For mod_jk, it's a bit tricky. I assume you configured Apache to handle the static requests? Can you make sure you have an index.html page? If you see a dir listing, can you tell me who's generating it (Tomcat adds the version number at the bottom)?

Thanks,
Costin
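The "single decoding after separating the URI into components" rule can be illustrated with a short standalone sketch. This is not Tomcat or mod_jk code; `struct split_uri`, `decode_n` and `split_then_decode` are hypothetical names, and leaving the query string undecoded is an assumption taken from the errata discussion elsewhere in the thread. Only a literal '?' is treated as the separator, so an encoded `%3f` ends up inside the path.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for the two components; both are malloc'ed. */
struct split_uri {
    char *path;   /* decoded exactly once */
    char *query;  /* left undecoded; NULL when there is no literal '?' */
};

/* Value of one hex digit; the caller has already checked isxdigit(). */
int hex_digit(unsigned char c)
{
    return isdigit(c) ? c - '0' : tolower(c) - 'a' + 10;
}

/* Copy the first len bytes of src, decoding each %XX sequence once. */
char *decode_n(const char *src, size_t len)
{
    assert(src != NULL);
    char *out = malloc(len + 1);
    size_t o = 0;
    if (out == NULL)
        return NULL;
    for (size_t i = 0; i < len; i++) {
        if (src[i] == '%' && i + 2 < len
            && isxdigit((unsigned char)src[i + 1])
            && isxdigit((unsigned char)src[i + 2])) {
            out[o++] = (char)(hex_digit((unsigned char)src[i + 1]) * 16
                            + hex_digit((unsigned char)src[i + 2]));
            i += 2;
        } else {
            out[o++] = src[i];
        }
    }
    out[o] = '\0';
    return out;
}

/* Split on the first literal '?', then decode the path. 0 on success. */
int split_then_decode(const char *raw, struct split_uri *out)
{
    assert(raw != NULL && out != NULL);
    const char *q = strchr(raw, '?');
    size_t path_len = (q != NULL) ? (size_t)(q - raw) : strlen(raw);

    out->query = NULL;
    out->path = decode_n(raw, path_len);
    if (out->path == NULL)
        return -1;
    if (q != NULL) {
        size_t qlen = strlen(q + 1);
        out->query = malloc(qlen + 1);
        if (out->query == NULL) {
            free(out->path);
            out->path = NULL;
            return -1;
        }
        memcpy(out->query, q + 1, qlen + 1);  /* includes the NUL */
    }
    return 0;
}
```

For the request in this message there is no literal '?', so the whole string is the path and the decoded '?' becomes part of the file name, which is why a 404 is the expected answer.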
Re: [TC3.2.3][PATCH] mod_jk / mod_rewrite bug fix
Apache2.0 + mod_jk + JNI + tc3.3 gives me the correct answer, 404 ( with the correct URI - /?A=B.jsp ). Note that typing the unencoded version is returning the correct answer too, i.e. index.html. What version of apache are you using ? Costin On Wed, 15 Aug 2001, Bill Barker wrote: It is actually worse than that. TC3.3B1 (with the mod_jk that it ships with, I haven't tried j-t-c yet) gives a directory listing in response to: http://myserver/%3f%41%3d%42.jsp - Original Message - From: [EMAIL PROTECTED] To: [EMAIL PROTECTED]; Bill Barker [EMAIL PROTECTED] Sent: Wednesday, August 15, 2001 11:44 AM Subject: Re: [TC3.2.3][PATCH] mod_jk / mod_rewrite bug fix On Wed, 15 Aug 2001, Bill Barker wrote: Personally, I agree with Justin and Costin that mod_jk should be able to use the uri field. Having said that, I'd like to point out that the mod_jk.c in j-t-c is flat-out broken. It doesn't handle the case where the '?' itself is encoded. Since this case is part of a currently popular attack on IIS, it will show up. Interesting finding. However tomcat decoder should be able to do so - if it doesn't we must fix it. Can you check against 3.3beta1 ? As a note, IMHO it is perfectly legal to have an encoded '?' in the URI, and the behavior should be: the '?' will be decoded _after_ the URI is separated from query string, and it's used as part of the file name. AFAIK there is no reason a file ( or pathInfo ) can't have the '?' char inside, and the URI spec allow that. ( of course, paranoia may force us to remove this kind of behavior ). Costin
Re: [TC3.2.3][PATCH] mod_jk / mod_rewrite bug fix
Actually, I have an index.jsp file. According to the logs (I haven't turned up the logging level yet, so the information is minimal), I get:

Ctx() : Compiling: /?A=B.jsp to _0003fA_0003dB_0

The corresponding .java file just prints static HTML with a <base href=file://localhost/path/to/ROOT/><h1>/path/to/ROOT</h1> followed by lines like:

<img align=middle src=doc:/lib/images/ftp/file.gif width=32 height=32><a href=index.jsp>index.jsp</a><br>

----- Original Message -----
From: [EMAIL PROTECTED]
To: Bill Barker [EMAIL PROTECTED]
Sent: Wednesday, August 15, 2001 12:59 PM
Subject: Re: [TC3.2.3][PATCH] mod_jk / mod_rewrite bug fix

> On Wed, 15 Aug 2001, Bill Barker wrote:
> > It is actually worse than that. TC3.3B1 (with the mod_jk that it ships with, I haven't tried j-t-c yet) gives a directory listing in response to: http://myserver/%3f%41%3d%42.jsp
>
> If I translate this correctly, your request is http://myserver/?A=B.jsp
>
> This is treated as a request for /, with parameters (that are ignored since it's a static page). Hmm, it should return a redirect, or index.html if it exists.
>
> Tomcat standalone is OK: it responds 404 (and it does so because it correctly does a single decoding _after_ separating the URI into components, as required by the URI spec).
>
> For mod_jk, it's a bit tricky. I assume you configured Apache to handle the static requests? Can you make sure you have an index.html page? If you see a dir listing, can you tell me who's generating it (Tomcat adds the version number at the bottom)?
>
> Thanks,
> Costin
Re: [TC3.2.3][PATCH] mod_jk / mod_rewrite bug fix
1.3.17 (with negotiation_module removed to prevent that problem). - Original Message - From: [EMAIL PROTECTED] To: Bill Barker [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Sent: Wednesday, August 15, 2001 1:01 PM Subject: Re: [TC3.2.3][PATCH] mod_jk / mod_rewrite bug fix Apache2.0 + mod_jk + JNI + tc3.3 gives me the correct answer, 404 ( with the correct URI - /?A=B.jsp ). Note that typing the unencoded version is returning the correct answer too, i.e. index.html. What version of apache are you using ? Costin On Wed, 15 Aug 2001, Bill Barker wrote: It is actually worse than that. TC3.3B1 (with the mod_jk that it ships with, I haven't tried j-t-c yet) gives a directory listing in response to: http://myserver/%3f%41%3d%42.jsp - Original Message - From: [EMAIL PROTECTED] To: [EMAIL PROTECTED]; Bill Barker [EMAIL PROTECTED] Sent: Wednesday, August 15, 2001 11:44 AM Subject: Re: [TC3.2.3][PATCH] mod_jk / mod_rewrite bug fix On Wed, 15 Aug 2001, Bill Barker wrote: Personally, I agree with Justin and Costin that mod_jk should be able to use the uri field. Having said that, I'd like to point out that the mod_jk.c in j-t-c is flat-out broken. It doesn't handle the case where the '?' itself is encoded. Since this case is part of a currently popular attack on IIS, it will show up. Interesting finding. However tomcat decoder should be able to do so - if it doesn't we must fix it. Can you check against 3.3beta1 ? As a note, IMHO it is perfectly legal to have an encoded '?' in the URI, and the behavior should be: the '?' will be decoded _after_ the URI is separated from query string, and it's used as part of the file name. AFAIK there is no reason a file ( or pathInfo ) can't have the '?' char inside, and the URI spec allow that. ( of course, paranoia may force us to remove this kind of behavior ). Costin
[TC3.2.3][PATCH] mod_jk / mod_rewrite bug fix
Hi,

I came across the need to use mod_rewrite to rewrite some URLs I was sending to Tomcat.

After playing with it a bit (I had it working a while ago) and finding that Tomcat was not receiving the rewritten URLs no matter what I did, I took a look at the source to native/apache1.3/mod_jk.c. Not being much of an Apache hacker, the variables were descriptive enough to tell me to make this change to the file:

--- mod_jk.c.orig	Tue Aug 14 17:58:21 2001
+++ mod_jk.c	Tue Aug 14 18:04:58 2001
@@ -358,7 +358,7 @@
     s->method = (char *)r->method;
     s->content_length = get_content_length(r);
     s->query_string = r->args;
-    s->req_uri = r->unparsed_uri;
+    s->req_uri = r->uri;
     if (s->req_uri != NULL) {
         char *query_str = strchr(s->req_uri, '?');
         if (query_str != NULL) {

After this change my URLs were getting rewritten as expected again.

Can we apply this change to the tree if there's nothing wrong with it for the next release? This problem has affected a large number of users, just take a look at the tomcat-dev/user archives.

It seems that this change was made to satisfy the errata at http://java.sun.com/products/servlet/errata_042700.html, but is it the correct fix if we're intentionally munging the request?

Thanks,
Dave
RE: [TC3.2.3][PATCH] mod_jk / mod_rewrite bug fix
Hi David,

Unfortunately there are people who were breaking because we didn't follow the spec. The better way to fix it is to create an inverse function for ap_parse_uri(request_rec *r, const char *uri) [http_protocol.c] in mod_jk... one that would 'unparse' the munged r->uri rewrite and use it instead of r->unparsed_uri.

Keith

| -----Original Message-----
| From: David Rees [mailto:[EMAIL PROTECTED]]
| Sent: Tuesday, August 14, 2001 9:13 PM
| To: [EMAIL PROTECTED]
| Subject: [TC3.2.3][PATCH] mod_jk / mod_rewrite bug fix
|
| Hi,
|
| I came across the need to use mod_rewrite to rewrite some URLs I was sending
| to Tomcat.
|
| After playing with it a bit (I had it working a while ago) and finding that
| Tomcat was not receiving the rewritten URLs no matter what I did, I took a
| look at the source to native/apache1.3/mod_jk.c. Not being much of an
| Apache hacker, the variables were descriptive enough to tell me to make this
| change to the file:
|
| --- mod_jk.c.orig	Tue Aug 14 17:58:21 2001
| +++ mod_jk.c	Tue Aug 14 18:04:58 2001
| @@ -358,7 +358,7 @@
|      s->method = (char *)r->method;
|      s->content_length = get_content_length(r);
|      s->query_string = r->args;
| -    s->req_uri = r->unparsed_uri;
| +    s->req_uri = r->uri;
|      if (s->req_uri != NULL) {
|          char *query_str = strchr(s->req_uri, '?');
|          if (query_str != NULL) {
|
| After this change my URLs were getting rewritten as expected again.
|
| Can we apply this change to the tree if there's nothing wrong with it for
| the next release? This problem has affected a large number of users, just
| take a look at the tomcat-dev/user archives.
|
| It seems that this change was made to satisfy the errata at
| http://java.sun.com/products/servlet/errata_042700.html, but is it the
| correct fix if we're intentionally munging the request?
|
| Thanks,
| Dave
Re: [TC3.2.3][PATCH] mod_jk / mod_rewrite bug fix
On Tue, Aug 14, 2001 at 06:13:24PM -0700, David Rees wrote:
> --- mod_jk.c.orig	Tue Aug 14 17:58:21 2001
> +++ mod_jk.c	Tue Aug 14 18:04:58 2001
> @@ -358,7 +358,7 @@
>      s->method = (char *)r->method;
>      s->content_length = get_content_length(r);
>      s->query_string = r->args;
> -    s->req_uri = r->unparsed_uri;
> +    s->req_uri = r->uri;
>      if (s->req_uri != NULL) {
>          char *query_str = strchr(s->req_uri, '?');
>          if (query_str != NULL) {
>
> After this change my URLs were getting rewritten as expected again.
>
> Can we apply this change to the tree if there's nothing wrong with it for the next release? This problem has affected a large number of users, just take a look at the tomcat-dev/user archives.

This breaks query strings.

r->uri contains only the path portion of the URL. r->unparsed_uri contains the URL in its virgin format - as sent by the client.

You can see that mod_jk is looking for the query string (look at the strchr two lines down) - it won't be there in the r->uri. You now need to modify mod_jk to look at r->args.

But, if you need access to the encoded URI (which is what the comment above that line in the j-t-c version of mod_jk seems to indicate), the only way to get it in httpd is with unparsed_uri. All of the other parameters (i.e. r->uri) have been unescaped already.

I'm not sure what the solution is. But, this one kills off query strings to servlets. That's even worse than losing internal rewrite capabilities.

I wonder how Pier is addressing this in mod_webapp. I'll have to look.
-- justin
Re: [TC3.2.3][PATCH] mod_jk / mod_rewrite bug fix
On Tue, Aug 14, 2001 at 10:20:26PM -0400, Keith Wannamaker wrote:
> Unfortunately there are people who were breaking because we didn't follow the spec. The better way to fix it is to create an inverse function for ap_parse_uri(request_rec *r, const char *uri) [http_protocol.c] in mod_jk... one that would 'unparse' the munged r->uri rewrite and use it instead of r->unparsed_uri.

Hi,

OK, are you volunteering to write it? ;-) If not, I'll have to take a look when I get some time and see if I can figure it out.

As an aside, it appears that Tomcat 3.3 remains broken in this regard, as it uses r->uri instead of r->unparsed_uri.

-Dave
Re: [TC3.2.3][PATCH] mod_jk / mod_rewrite bug fix
On Tue, Aug 14, 2001 at 10:20:26PM -0400, Keith Wannamaker wrote:
> Hi David, Unfortunately there are people who were breaking because we didn't follow the spec. The better way to fix it is to create an inverse function for ap_parse_uri(request_rec *r, const char *uri) [http_protocol.c] in mod_jk... one that would 'unparse' the munged r->uri rewrite and use it instead of r->unparsed_uri.

You *could* just call ap_escape_uri and try to recreate the relevant pieces. Rough pseudocode:

    t1 = ap_escape_uri(r->uri)
    t2 = ap_escape_uri(r->args)
    mod_jk's->uri = strcat(t1, "?", t2, NULL)

The root problem is that r->unparsed_uri and r->uri may not be identical in their context. If you are using mod_rewrite, you could have:

    r->unparsed_uri = /foo.jsp?bar=baz
    r->uri          = /spaz.jsp
    r->args         = bar=baz

But, now you may have escaped something that wasn't originally escaped. That may be bad as well.
-- justin
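The recombination step in the pseudocode above can be sketched in plain C. This is only an illustration under assumptions: `join_path_and_args` is a hypothetical helper, it uses `malloc` rather than an httpd pool, and it expects its inputs to be escaped already (the escaping step is left out on purpose).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Build "path?args" in a fresh buffer, or just "path" when args is NULL.
 * Returns a malloc'ed string the caller frees, or NULL on allocation failure.
 */
char *join_path_and_args(const char *path, const char *args)
{
    assert(path != NULL);
    size_t plen = strlen(path);
    size_t alen = (args != NULL) ? strlen(args) : 0;
    size_t total = plen + ((args != NULL) ? 1 + alen : 0);
    char *out = malloc(total + 1);

    if (out == NULL)
        return NULL;
    memcpy(out, path, plen);
    if (args != NULL) {
        out[plen] = '?';
        memcpy(out + plen + 1, args, alen);
    }
    out[total] = '\0';
    return out;
}
```

Fed the rewritten values from the mod_rewrite example, this produces /spaz.jsp?bar=baz rather than the /foo.jsp?bar=baz the client sent, which is the behaviour the original poster wanted and the difference the rest of the thread debates.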
Re: [TC3.2.3][PATCH] mod_jk / mod_rewrite bug fix
On Tue, Aug 14, 2001 at 07:25:32PM -0700, David Rees wrote:
> On Tue, Aug 14, 2001 at 10:20:26PM -0400, Keith Wannamaker wrote:
> > Unfortunately there are people who were breaking because we didn't follow the spec. The better way to fix it is to create an inverse function for ap_parse_uri(request_rec *r, const char *uri) [http_protocol.c] in mod_jk... one that would 'unparse' the munged r->uri rewrite and use it instead of r->unparsed_uri.
>
> Hi, OK, are you volunteering to write it? ;-) If not, I'll have to take a look when I get some time and see if I can figure it out.
>
> As an aside, it appears that Tomcat 3.3 remains broken in this regard, as it uses r->uri instead of r->unparsed_uri.

My bad. It is actually easier than I just said - s->req_uri isn't the complete unparsed URI - just the path. I didn't look high enough in mod_jk.c. The version in j-t-c for apache-1.3 has:

    s->query_string = r->args;

    /*
     * The 2.2 servlet spec errata says the uri from
     * HttpServletRequest.getRequestURI() should remain encoded.
     * [http://java.sun.com/products/servlet/errata_042700.html]
     */
    s->req_uri = r->unparsed_uri;
    if (s->req_uri != NULL) {
        char *query_str = strchr(s->req_uri, '?');
        if (query_str != NULL) {
            *query_str = 0;
        }
    }

That strchr call is trying to remove the query string (dicking with the unparsed_uri like that is a BAD idea - imagine logs looking at the unparsed_uri). You could just have:

    s->query_string = r->args;

    /*
     * The 2.2 servlet spec errata says the uri from
     * HttpServletRequest.getRequestURI() should remain encoded.
     * [http://java.sun.com/products/servlet/errata_042700.html]
     */
    s->req_uri = ap_encode_uri(r->pool, r->uri);

That seems like it'd satisfy everyone.
-- justin
Re: [TC3.2.3][PATCH] mod_jk / mod_rewrite bug fix
Justin Erenkrantz at [EMAIL PROTECTED] wrote:
> I wonder how Pier is addressing this in mod_webapp. I'll have to look.
> -- justin

Easy as 1.2.3... WARP has a concept of URI and QUERY STRING... Very separate things... All I do is

    req->ruri=apr_pstrdup(req->pool,r->uri);
    req->args=apr_pstrdup(req->pool,r->args);

The URI goes into the URI, the query string goes into the query string... Apache does it for me, why should I bother? :)

Pier
Re: [TC3.2.3][PATCH] mod_jk / mod_rewrite bug fix
On Wed, Aug 15, 2001 at 03:41:30AM +0100, Pier P. Fumagalli wrote:
> Justin Erenkrantz at [EMAIL PROTECTED] wrote:
> > I wonder how Pier is addressing this in mod_webapp. I'll have to look.
> > -- justin
>
> Easy as 1.2.3... WARP has a concept of URI and QUERY STRING... Very separate things... All I do is
>
>     req->ruri=apr_pstrdup(req->pool,r->uri);
>     req->args=apr_pstrdup(req->pool,r->args);
>
> The URI goes into the URI, the query string goes into the query string... Apache does it for me, why should I bother? :)

Which, of course, is the right solution.

But, do you have to (re)escape the uri (or, is that done in Java land?)? Seems like the 2.2 spec says that the getRequestURI() function must return an escaped URI. r->uri is unescaped. Or, does 2.3 say something different?
-- justin
Re: [TC3.2.3][PATCH] mod_jk / mod_rewrite bug fix
Justin Erenkrantz at [EMAIL PROTECTED] wrote:
> On Wed, Aug 15, 2001 at 03:41:30AM +0100, Pier P. Fumagalli wrote:
> > Justin Erenkrantz at [EMAIL PROTECTED] wrote:
> > > I wonder how Pier is addressing this in mod_webapp. I'll have to look.
> > > -- justin
> >
> > Easy as 1.2.3... WARP has a concept of URI and QUERY STRING... Very separate things... All I do is
> >
> >     req->ruri=apr_pstrdup(req->pool,r->uri);
> >     req->args=apr_pstrdup(req->pool,r->args);
> >
> > The URI goes into the URI, the query string goes into the query string... Apache does it for me, why should I bother? :)
>
> Which, of course, is the right solution.

DOH! :) Am I lucky or what :) :) :)

> But, do you have to (re)escape the uri (or, is that done in Java land?)? Seems like the 2.2 spec says that the getRequestURI() function must return an escaped URI. r->uri is unescaped. Or, does 2.3 say something different?
> -- justin

It's done in Java land (well, in theory! :) I should really check, that might be one hit of performance improvement (like 1 millisecond per request). Ok, get over it Pier, performance is after the beta :)

Pier (love talking to himself -who, me?- at 4 AM :)
RE: [TC3.2.3][PATCH] mod_jk / mod_rewrite bug fix
| This breaks query strings.
|
| r->uri contains only the path portion of the URL. r->unparsed_uri
| contains the URL in its virgin format - as sent by the client.

No, I don't believe this is quite right. getRequestURI() in a servlet should return r->unparsed_uri minus a query string. Setting s->req_uri = r->uri doesn't break query strings, but it *does* break the encoding of the uri. So tc 3.3 is currently broken, as is mod_webapp (unless the string is encoded on the Java side in TC4).

However, Justin, I think your suggestion is the correct solution:

    s->req_uri = ap_encode_uri(r->pool, r->uri);

David, or anyone else interested too, would you try this with some corner test cases and see if it lives up to expectation?

Keith
Re: [TC3.2.3][PATCH] mod_jk / mod_rewrite bug fix
On Tue, 14 Aug 2001, Justin Erenkrantz wrote:
> Which, of course, is the right solution.

Is it? Re-escaping the URI will most likely generate something very different from the original; it's not symmetrical. Getting a re-escaped request is different from the original, unescaped uri. That's the reason we use the unescaped uri...

Costin
RE: [TC3.2.3][PATCH] mod_jk / mod_rewrite bug fix
Costin's right.. seems like the problem encountered was that there was no way to recreate the encoding (or lack thereof) on the original uri. So the kludge/solution was to use the unparsed uri and chop off the query string. Keith | -Original Message- | From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] | Sent: Tuesday, August 14, 2001 11:13 PM | To: [EMAIL PROTECTED] | Subject: Re: [TC3.2.3][PATCH] mod_jk / mod_rewrite bug fix | | | On Tue, 14 Aug 2001, Justin Erenkrantz wrote: | | Which, of course, is the right solution. | | Is it ? Re-escaping the URI will most likely generate something very | different from the original, it's not symetrical. Getting a re-escaped | request is different from the original, unescaped uri. That's the reason | we use the unescaped uri... | | Costin |
Re: [TC3.2.3][PATCH] mod_jk / mod_rewrite bug fix
> You could just have:
>
>     s->query_string = r->args;
>
>     /*
>      * The 2.2 servlet spec errata says the uri from
>      * HttpServletRequest.getRequestURI() should remain encoded.
>      * [http://java.sun.com/products/servlet/errata_042700.html]
>      */
>     s->req_uri = ap_encode_uri(r->pool, r->uri);

Sounds like a reasonable solution.

Costin
Re: [TC3.2.3][PATCH] mod_jk / mod_rewrite bug fix
On Tue, Aug 14, 2001 at 08:12:31PM -0700, [EMAIL PROTECTED] wrote:

> On Tue, 14 Aug 2001, Justin Erenkrantz wrote:
>
> > Which, of course, is the right solution.
>
> Is it? Re-escaping the URI will most likely generate something very
> different from the original; it's not symmetrical. Getting a re-escaped
> request is different from the original, unescaped uri. That's the reason
> we use the unescaped uri...

Potentially, you are correct. It may not be symmetrical. However, httpd may jump in and rewrite the uri for you. If that is a problem (which is what the original poster was complaining about), then you need to use r->uri instead and escape it. Unless you want to pass only the original string, NOT what the server is serving.

I'm not sure what the Servlet spec says - use the original string that the client passed in, or use the real URI. I'm out of my depth here. *shrug*

-- justin
Re: [TC3.2.3][PATCH] mod_jk / mod_rewrite bug fix
On Tue, Aug 14, 2001 at 11:13:34PM -0400, Keith Wannamaker wrote:

> Costin's right.. seems like the problem encountered was that there was
> no way to recreate the encoding (or lack thereof) on the original uri.
> So the kludge/solution was to use the unparsed uri and chop off the
> query string.

mod_jk chops off the r->unparsed_uri itself without copying. Negative points for style. =-)

-- justin
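For comparison, a minimal sketch of splitting the unparsed request target at the first literal '?' on a copy, leaving the input untouched. The names `demo_split` and `demo_split_uri` are hypothetical, and `malloc` is used only to keep the example standalone; a real module would presumably allocate from the request pool instead.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the two halves of the request target. */
struct demo_split {
    char *path;   /* everything before the first '?', still escaped */
    char *query;  /* everything after it, or NULL when there is no '?' */
};

/* Copy the first n bytes of s into a new NUL-terminated string. */
static char *demo_strndup(const char *s, size_t n)
{
    char *out = malloc(n + 1);
    if (out == NULL) return NULL;
    memcpy(out, s, n);
    out[n] = '\0';
    return out;
}

/* Split at the first literal '?'. The input is never modified.
 * Returns 0 on success, -1 on allocation failure. */
static int demo_split_uri(const char *unparsed, struct demo_split *out)
{
    const char *q;
    size_t path_len;

    assert(unparsed != NULL && out != NULL);
    q = strchr(unparsed, '?');
    path_len = q ? (size_t)(q - unparsed) : strlen(unparsed);

    out->path = demo_strndup(unparsed, path_len);
    out->query = q ? demo_strndup(q + 1, strlen(q + 1)) : NULL;

    if (out->path == NULL || (q != NULL && out->query == NULL)) {
        free(out->path);
        free(out->query);
        out->path = out->query = NULL;
        return -1;
    }
    return 0;
}
```

Because only a literal '?' is treated as the separator, a target such as `/%3fA%3dB.jsp` comes back with the whole string as the path and a NULL query; the encoded question mark stays part of the path.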
Re: [TC3.2.3][PATCH] mod_jk / mod_rewrite bug fix
On Tue, 14 Aug 2001, Justin Erenkrantz wrote:

> On Wed, Aug 15, 2001 at 03:41:30AM +0100, Pier P. Fumagalli wrote:
>
> > Justin Erenkrantz at [EMAIL PROTECTED] wrote:
> >
> > > I wonder how Pier is addressing this in mod_webapp. I'll have to look.
> > > -- justin
> >
> > Easy as 1.2.3... WARP has a concept of URI and QUERY STRING... Very
> > separate things... All I do is
> >
> >     req->ruri=apr_pstrdup(req->pool,r->uri);
> >     req->args=apr_pstrdup(req->pool,r->args);
> >
> > The URI goes into the URI, the query string goes into the query
> > string... Apache does it for me, why should I bother? :)
>
> Which, of course, is the right solution. But, do you have to (re)escape
> the uri (or, is that done in Java land?)? Seems like the 2.2 spec says
> that the getRequestURI() function must return an escaped URI. r->uri is
> unescaped. Or, does 2.3 say something different?
> -- justin

The getRequestURI() method is supposed to return the *undecoded* request URI. As Costin points out, re-escaping an escaped version is not the same thing.

This didn't change in 2.3 -- however, in 2.2 it wasn't formally documented until an errata was published:

http://java.sun.com/products/servlet/errata_042700.html

Same thing for getQueryString() -- must remain undecoded.

Craig
Re: [TC3.2.3][PATCH] mod_jk / mod_rewrite bug fix
On Tue, Aug 14, 2001 at 11:05:38PM -0400, Keith Wannamaker wrote:

> | This breaks query strings.
> |
> | r->uri contains only the path portion of the URL. r->unparsed_uri
> | contains the URL in its virgin format - as sent by the client.
>
> No, I don't believe this is quite right. getRequestURI() in a servlet
> should return r->unparsed_uri minus a query string. Setting
> s->uri = r->uri doesn't break query strings, but it *does* break the
> encoding of the uri. So tc 3.3 is currently broken, as is mod_webapp
> (unless the string is encoded on the Java side in TC4).
>
> However, Justin, I think your suggestion is the correct solution:
>
>     s->req_uri = ap_encode_uri(r->pool, r->uri);
>
> David, or anyone else interested too, would you try this with some
> corner test cases and see if it lives up to expectation?

I gave it a shot and it compiled fine, but got this error at runtime:

    Cannot load /usr/local/apache/libexec/mod_jk.so into server:
    /usr/local/apache/libexec/mod_jk.so: undefined symbol: ap_encode_uri

Any hints? I'm new at Apache module hacking.

-Dave
RE: [TC3.2.3][PATCH] mod_jk / mod_rewrite bug fix
Try ap_escape_uri

Keith

| |     s->req_uri = ap_encode_uri(r->pool, r->uri);
| |
| | David, or anyone else interested too, would you
| | try this with some corner test cases and see if
| | it lives up to expectation?
|
| I gave it a shot and it compiled fine, but got this error at runtime:
|
|     Cannot load /usr/local/apache/libexec/mod_jk.so into server:
|     /usr/local/apache/libexec/mod_jk.so: undefined symbol: ap_encode_uri
|
| Any hints? I'm new at Apache module hacking.
|
| -Dave
Re: [TC3.2.3][PATCH] mod_jk / mod_rewrite bug fix
On Tue, 14 Aug 2001, Justin Erenkrantz wrote:

> mod_jk chops off the r->unparsed_uri itself without copying. Negative
> points for style. =-)
> -- justin

That's true. However, I'm not sure what else we could do - copy it once again to another buffer where we chop it? There isn't much else going on with the unparsed uri.

If you strictly follow the spec, mod_rewrite is out of the question - and the same goes for most other apache modules that alter the request. Since all of them work on the URI, the result is just something that has no unmodified original.

However, if you read the URI spec, 2 URIs are equivalent if the octets are identical - it doesn't matter how you encode it. Re-escaping the URI has the extra benefit of getting a canonical escaping, which is also a bit safer (hey, we also get the first-class security checks apache is doing on the parsed uris).

Another note - my understanding of the HTTP specification is that proxies _are_ allowed to escape/unescape the URI - as long as the result is equivalent. So if a proxy is used, the original URI the user typed will be lost. Same for the browsers - what the user types is very different from what is sent (at least in Opera).

Of course, we can define the unparsed URI to be whatever the servlet container receives. This may be different from the original request (if it goes through proxies).

Now the question is - where does the container start :=). I think there are plenty of reasons to treat Apache as not being part of the container - after all, it follows completely different rules on mappings (extension maps can have path info), and in almost everything else.

In fact, I'm not sure all web servers even allow access to the original unescaped URI. Some IIS or NES expert should let us know.

So my take is that the container should indeed return the original URI - the one that the container received. What apache does (like rewriting, or canonicalising the URI) is separate. Otherwise the rewriting itself would violate the servlet spec, since it would alter the URI.

Again - I would bet that at least one of IIS and NES doesn't allow access to the original URI anyway.

Costin

P.S. Quite a long mail for something as simple as 1-2-3. I spent quite a lot of time on this issue - Larry may remember how long the bug was open with my name on it.