Maxim and Chris, LOL! Yes, I am definitely an ignorant westerner. Thanks for pointing out my mistake. I'll be updating my unit test comments accordingly.
-James On Fri, Jan 24, 2025, 9:57 PM Maxim Solodovnik <solomax...@gmail.com> wrote: > Chris, > > from mobile (sorry for typos ;) > > > On Sat, Jan 25, 2025, 05:35 Christopher Schultz < > ch...@christopherschultz.net> wrote: > > > Maxim, > > > > On 1/23/25 9:31 PM, Maxim Solodovnik wrote: > > > from mobile (sorry for typos ;) > > > > > > > > > On Thu, Jan 23, 2025, 23:00 James Matlik <james.mat...@gmail.com> > wrote: > > > > > >> It works! > > >> > > >> I've been able to test with a customer name of: ÀËÌÑàëíñøü / \ Ѐӿ 中さ > 😀 > > >> customer > > >> This covers > > >> - Latin-1 Supplement characters are 1 byte: ÀËÌÑàëíñøü > > >> - The / and \ slash characters are ASCII, but are encoded due to > > special > > >> meaning in a URL > > >> - Cyrillic characters are 2 bytes: Ѐӿ > > >> > > > > > > These two are not Cyrillic :)) > > > > Unicode says the lowercase version of Ѐ[1] is Cyrillic[2], and we are > > ignorant westerners :) > > > > Also kha seems definitely to be Cyrillic[3]. > > > > My bad :) > These two are not in russian, ukrainian or bulgarian > > > > > These are: Вау :)) > > > > LOL > > > > More extraordinary are: ЩЖЦЪ > I guess :) > > > > -chris > > > > [1] https://www.compart.com/en/unicode/U+00C8 > > [2] https://www.compart.com/en/unicode/U+0450 > > [3] https://www.compart.com/en/unicode/U+04FF > > > > > > > > - Chinese and Japanese characters are 3 bytes: 中さ > > >> - emoji characters are 4 bytes: 😀 > > >> > > >> A client can request a URL path with the following: > > >> > > >> > > > "/customers/customer/%C3%80%C3%8B%C3%8C%C3%91%C3%A0%C3%AB%C3%AD%C3%B1%C3%B8%C3%BC%20%2F%20%5C%20%D0%80%D3%BF%20%E4%B8%AD%E3%81%95%20%F0%9F%98%80%20customer" > > >> > > >> Then Tomcat processes the URL to the following and passes it into the > > >> servlet. > > >> "/customers/customer/ÀËÌÑàëíñøü %2F %5C Ѐӿ 中さ 😀 customer" > > >> > > >> So we are now able to save this test customer to the database and read > > it > > >> back out again. I tried this with a few other paths that are a little > > >> structurally different (with/without query string, escaped string in a > > >> middle path segment instead of just the end) and it all worked as > > desired. > > >> > > >> Thank you so much for working on this! > > >> -James > > >> > > >> On Thu, Jan 23, 2025 at 8:20 AM Mark Thomas <ma...@apache.org> wrote: > > >> > > >>> James, > > >>> > > >>> I've added attributes (encodedReverseSolidusHandling and > > >>> encodedSolidusHandling) to the Context to provide control of how the > > >>> RequestDispatcher paths are processed. > > >>> > > >>> Snapshots built after 12.00 UTC today should include the new > > attributes. > > >>> > > >>> As I type, Tomcat 12 is available, Tomcat 11 is building and Tomcat > 10 > > >>> and Tomcat 9 are in the queue. All should be complete in a couple of > > >> hours. > > >>> > > >>> Mark > > >>> > > >>> > > >>> On 22/01/2025 09:30, Mark Thomas wrote: > > >>>> This is going to be fun. > > >>>> > > >>>> The RequestDispatcher processing currently does not take account of > > >>>> encodedSolidusHandling or encodedReverseSolidusHandling. > > >>>> RequestDsipatcher URLs are processed using the defaults. > > >>>> > > >>>> I have a test case that demonstrates various forms of this problem. > I > > >> am > > >>>> currently working on a fix. > > >>>> > > >>>> Mark > > >>>> > > >>>> > > >>>> On 21/01/2025 23:10, James Matlik wrote: > > >>>>> It got further, but ran into another security check within the > > >>>>> org.apache.catalina.core.ApplicationContext. I'm using the 10.1.35- > > >>>>> SNAPSHOT > > >>>>> version as that is closest to my current version, and easiest to > test > > >>>>> as a > > >>>>> result. This is the block of code having trouble now (linking into > > >>>>> 10.1.34): > > >>>>> https://github.com/apache/tomcat/blob/10.1.34/java/org/apache/ > > >>>>> catalina/core/ApplicationContext.java#L386-L396 > > >>>>> > > >>>>> This code: > > >>>>> 1. takes the URL (e.g. /customers/customer/A%2FB%5CC) > > >>>>> 2. decodes it (e.g. /customers/customer/A/B\C) > > >>>>> 3. normalizes the decoded url with the replaceBackSlash parameter > > >>>>> hardcoded > > >>>>> to true, thus forcing the decoded '\' character to become a '/' > > >>> character > > >>>>> (e.g. /customers/customer/A/B/C) > > >>>>> 4. then it compares the decoded URL to the normalized URL, and if > > they > > >>>>> are > > >>>>> not equal, an IllegalArgumentException is created and a 500 > response > > >> is > > >>>>> returned. > > >>>>> > > >>>>> It seems that this area of the code would need to be aware of the > > >>>>> encodedReverseSolidusHandling configuration as well. > > >>>>> > > >>>>> Thank you, > > >>>>> James > > >>>>> > > >>>>> On Tue, Jan 21, 2025 at 1:20 PM Mark Thomas <ma...@apache.org> > > wrote: > > >>>>> > > >>>>>> On 21/01/2025 14:15, James Matlik wrote: > > >>>>>>> Hello Mark, > > >>>>>>> > > >>>>>>> Yes, I would be available to test a snapshot if this gets > > >> implemented. > > >>>>>> > > >>>>>> Latest (as I type this) Tomcat 12 build with the new feature: > > >>>>>> > > >>>>>> > https://repository.apache.org/content/groups/snapshots/org/apache/ > > >>>>>> tomcat/tomcat/12.0.0-M1-SNAPSHOT/tomcat-12.0.0- > > >>>>>> M1-20250121.171115-315.tar.gz > > >>>>>> > > >>>>>> The snapshots for the 11.0.x, 10.1.x and 9.0.x are still building. > > >>>>>> Generally, you'll find them under: > > >>>>>> > > >>>>>> > > >>>>>> > https://repository.apache.org/content/groups/snapshots/org/apache/ > > >>>>>> tomcat/tomcat/ > > >>>>>> > > >>>>>> Anything built after 1700 UTC today should have the fix. Make sure > > >> you > > >>>>>> look for the snapshot for the current dev version for each release > > >>>>>> branch. > > >>>>>> > > >>>>>> The Connector attribute is called encodedReverseSolidusHandling > > >>>>>> > > >>>>>> Let the list know how you get on. > > >>>>>> > > >>>>>> Mark > > >>>>>> > > >>>>>> > > >>>>>>> > > >>>>>>> -James > > >>>>>>> > > >>>>>>> On Tue, Jan 21, 2025 at 8:17 AM Mark Thomas <ma...@apache.org> > > >> wrote: > > >>>>>>> > > >>>>>>>> On 18/01/2025 16:18, James Matlik wrote: > > >>>>>>>>> I agree with everything you have said. As the config options > > stand > > >>>>>> today, > > >>>>>>>>> the allowBackslash seems to implement part of > > >> encodeSolidusHandling. > > >>>>>>>>> > > >>>>>>>>> While encodeSolidusHandling supports: > > >>>>>>>>> * REJECT - Return 400 on encoded / > > >>>>>>>>> * DECODE - Decodes the / > > >>>>>>>>> * PASS_THROUGH - Passes the encoded / as is > > >>>>>>>>> > > >>>>>>>>> The allowBackslash supports two mostly analogous settings: > > >>>>>>>>> * false - REJECT > > >>>>>>>>> * true - DECODE, but it Decodes as / instead of \ > > >>>>>>>> > > >>>>>>>> Not quite. Decoding is separate. > > >>>>>>>> > > >>>>>>>> Currently, %5C always gets decoded to \. allowBackslash then > > >>>>>>>> determines > > >>>>>>>> whether any \ (originally encoded or not) characters in the URL > > >>>>>>>> will be > > >>>>>>>> permitted and converted to / or the URL rejected. > > >>>>>>>> > > >>>>>>>> The decoding and conversion are partially independent. For > > example, > > >>>>>>>> while it may be required that an encoded sequence %5C passed > > >>>>>>>> through to > > >>>>>>>> the application, whether an unencoded \ is rejected or converted > > >>>>>>>> to / is > > >>>>>>>> an independent choice. > > >>>>>>>> > > >>>>>>>> From a security point of view, it is necessary to always > > convert > > >>>>>>>> \ to > > >>>>>> / > > >>>>>>>> - at least on Windows anyway - else all sorts of nasty security > > >>> things > > >>>>>>>> could happen such as bypassing security constraints. > > >>>>>>>> > > >>>>>>>> We should be able to fix this for the next round of releases > > (early > > >>>>>>>> Feb). I am currently looking at whether a new option to > > effectively > > >>>>>>>> replace allowBackslash is sufficient or whether an new option is > > >>>>>>>> needed > > >>>>>>>> in addition to allowBackslash. > > >>>>>>>> > > >>>>>>>> My current thinking is that we should add > > >>> encodeReverseSolidusHandling > > >>>>>>>> with the same options as encodeSolidusHandling and leave the > > >>>>>>>> convert/reject decision for unencoded \ to allowBackslash. > > >>>>>>>> > > >>>>>>>>> IMO, an ideal implementation would introduce > > >>>>>> encodeReverseSolidusHandling > > >>>>>>>>> to behave the same as encodeSolidusHandling, but maybe add a > > >>>>>>>>> DECODE_AS_SOLIDUS. Then the allowBackslash setter would set the > > >>>>>>>>> encodeReverseSolidusHandling variable to DECODE_AS_SOLIDUS or > > >> REJECT > > >>>>>>>> (true > > >>>>>>>>> vs false) for backward compatibility, but should probably be > > >>>>>> deprecated. > > >>>>>>>>> The code that normalizes the URL can remove any special > handling > > >> of > > >>>>>> the \ > > >>>>>>>>> character because all enforcement can be applied within > UDecoder > > >>>>>>>>> as is > > >>>>>>>> done > > >>>>>>>>> with the / character today. > > >>>>>>>> > > >>>>>>>> I don't think DECODE_AS_SOLIDUS is necessary although it could > > >>>>>>>> always be > > >>>>>>>> added as an additional option later if necessary. > > >>>>>>>> > > >>>>>>>>> This would make the handling of \ more consistent with /, and I > > >>>>>>>>> imagine > > >>>>>>>> the > > >>>>>>>>> security concerns between the two remain about equal. Though, > > like > > >>>>>>>>> you > > >>>>>>>>> said, others on the team have already thought deeply about > this, > > >> so > > >>> I > > >>>>>> may > > >>>>>>>>> be missing something important. > > >>>>>>>>> > > >>>>>>>>> If I were to pursue getting a change like this made to Tomcat, > > >> where > > >>>>>>>> should > > >>>>>>>>> I start? > > >>>>>>>> > > >>>>>>>> This is the way to start the process. A discussion on the users > > >> list > > >>>>>>>> that clearly explains the problem and what a solution might look > > >>> like. > > >>>>>>>> > > >>>>>>>>> I know this would be a slow process. > > >>>>>>>> > > >>>>>>>> There is a good reason for the feature, the implementation looks > > to > > >>> be > > >>>>>>>> backwards compatible and not too invasive, there isn't simple > > >>>>>>>> extension > > >>>>>>>> point you can use to implement and one or more committers are > > >>>>>>>> interested > > >>>>>>>> in solving the problem then there is a good chance it will be in > > >> the > > >>>>>>>> next round of releases (early Feb). > > >>>>>>>> > > >>>>>>>>> Or, if I'm better off working around the core functionality, > > would > > >>>>>>>>> you > > >>>>>>>> have > > >>>>>>>>> any suggestions on how? I see the UDecoder recently changed to > > >>>>>>>>> support > > >>>>>>>>> encoded % characters. I considered using a double encoded \ > hack > > >> to > > >>>>>>>>> effectively pass through, but that won't work now. Plus, I > wasn't > > >>>>>>>>> able > > >>>>>> to > > >>>>>>>>> figure out how to add custom logic before the URL decode and > > >>>>>>>>> normalize > > >>>>>> to > > >>>>>>>>> add my work around encoding on the server side, as doing so in > > the > > >>>>>> client > > >>>>>>>>> side isn't feasible. > > >>>>>>>>> > > >>>>>>>>> Ideally, I wouldn't need to maintain a custom build of Tomcat > > >>>>>>>> indefinitely. > > >>>>>>>> > > >>>>>>>> There isn't an easy (or any) extension point to implement this. > It > > >>>>>>>> would > > >>>>>>>> have to be a custom Tomcat build. > > >>>>>>>> > > >>>>>>>> Are you able to test some snapshot builds if this gets > > implemented? > > >>>>>>>> > > >>>>>>>> Mark > > >>>>>>>> > > >>>>>>>>> > > >>>>>>>>> Thanks, James > > >>>>>>>>> > > >>>>>>>>> On Fri, Jan 17, 2025, 10:00 AM Christopher Schultz < > > >>>>>>>>> ch...@christopherschultz.net> wrote: > > >>>>>>>>> > > >>>>>>>>>> James, > > >>>>>>>>>> > > >>>>>>>>>> On 1/17/25 8:04 AM, James Matlik wrote: > > >>>>>>>>>>> When I'm talking about path parameters, it is in the context > of > > >>> how > > >>>>>>>> Open > > >>>>>>>>>>> API/Swagger defined them: > > >>>>>>>>>>> > > >> https://swagger.io/docs/specification/v3_0/describing-parameters/ > > >>>>>>>>>> > > >>>>>>>>>> Okay, that helps clear things up. In the URL specification > > >>>>>>>>>> (inherited > > >>>>>> by > > >>>>>>>>>> HTTP) defines them differently: > > >>>>>>>>>> > > >>>>>>>>>> https://www.rfc-editor.org/rfc/rfc3986.html#section-3.3 > > >>>>>>>>>> > > >>>>>>>>>> The term "path parameter" you are using is actually a "path > > >>> segment" > > >>>>>> in > > >>>>>>>>>> URL/HTTP terms. > > >>>>>>>>>> > > >>>>>>>>>> The problem you are running into is that encoded slashes are > > >> often > > >>>>>> used > > >>>>>>>>>> to attack servers by performing path-traversals or avoiding > > >>> security > > >>>>>>>>>> constraints. Let's say you are on Windows and there is a > > security > > >>>>>>>>>> constraint that says "only admin users can access files in > > >>>>>>>>>> D:\docroot\(ON\QC)". You have mapped URL-/ to > > >> physical-D:\docroot. > > >>>>>>>>>> (Admittedly this is very contrived). > > >>>>>>>>>> > > >>>>>>>>>> The attacker says "well, %29 is a backslash, so lemmie just > > check > > >>> to > > >>>>>> see > > >>>>>>>>>> if I can slip one past the security system" and requests > > >>>>>>>>>> http://localhost:8080/group/%28ON%5CQC%29%20LOCAL. If the > > server > > >>> is > > >>>>>>>>>> fooled, the attacker gets to request that resource in that > > >>>>>>>>>> directory. > > >>>>>>>>>> > > >>>>>>>>>> My guess is that Tomcat's "URL cleaner" is seeing that \, > > >>> attempting > > >>>>>> to > > >>>>>>>>>> "sanitize" it, and converting it to a / because that's the > local > > >>>>>>>>>> filesystem path separator. This makes security constraints > much > > >>> more > > >>>>>>>>>> difficult to bypass. > > >>>>>>>>>> > > >>>>>>>>>> My naïve reading of your desire to have > > >>>>>>>>>> encodeSolidusHandling=PASS_THROUGH behave the same for / and \ > > >>> seems > > >>>>>>>>>> reasonable if a misnomer. I say "naïve" because others on this > > >> team > > >>>>>> have > > >>>>>>>>>> spent countless hours of their lives mulling these things over > > >> and > > >>>>>>>>>> getting the logic "just right". > > >>>>>>>>>> > > >>>>>>>>>> I say it's a misnomer because "solidus" means literally / and > > not > > >>> \. > > >>>>>>>>>> encodeReverseSolidusHandling? encodeAntiSolidusHandling? > > >>>>>>>>>> > > >>>>>>>>>> The encodeSolidusHandling configuration setting is there so we > > >> can > > >>>>>> have > > >>>>>>>>>> a very secure default-configuration that prevents attacks on > > >> naive > > >>>>>>>>>> applications. In your case, you have an application that > > >>>>>>>>>> (presumably) > > >>>>>>>>>> knows exactly what it is doing, and therefore you have the > > option > > >>> to > > >>>>>>>>>> pass-through those solidus...es. Solidii? Whatever. > > >>>>>>>>>> > > >>>>>>>>>> I think that actually having a separate configuration setting > > for > > >>>>>>>>>> antisolidii handling makes more sense than having that > existing > > >>>>>> setting > > >>>>>>>>>> perform double-duty, since each one of those you allow to > pass- > > >>>>>>>>>> through > > >>>>>>>>>> opens the door slightly wider to an attacker. > > >>>>>>>>>> > > >>>>>>>>>> -chris > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>>> > > >>> --------------------------------------------------------------------- > > >>>>>>>>>> To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org > > >>>>>>>>>> For additional commands, e-mail: users-h...@tomcat.apache.org > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>> > > >>>>>>>> > > >>>>>>>> > > >>>>>>>> > > >> --------------------------------------------------------------------- > > >>>>>>>> To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org > > >>>>>>>> For additional commands, e-mail: users-h...@tomcat.apache.org > > >>>>>>>> > > >>>>>>>> > > >>>>>>> > > >>>>>> > > >>>>>> > > >>>>>> > > --------------------------------------------------------------------- > > >>>>>> To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org > > >>>>>> For additional commands, e-mail: users-h...@tomcat.apache.org > > >>>>>> > > >>>>>> > > >>>>> > > >>>> > > >>>> > > >>>> > --------------------------------------------------------------------- > > >>>> To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org > > >>>> For additional commands, e-mail: users-h...@tomcat.apache.org > > >>>> > > >>> > > >>> > > >>> --------------------------------------------------------------------- > > >>> To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org > > >>> For additional commands, e-mail: users-h...@tomcat.apache.org > > >>> > > >>> > > >> > > > > > > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org > > For additional commands, e-mail: users-h...@tomcat.apache.org > > > > >