On 13/10/17 18:42, André Warnier (tomcat) wrote:
> On 13.10.2017 19:29, Mark Thomas wrote:
>> On 13/10/2017 18:15, André Warnier (tomcat) wrote:
>>> On 13.10.2017 18:17, Mark Thomas wrote:
>>>> On 13/10/2017 17:09, James H. H. Lampert wrote:
>>>>> Thanks to all of you who responded.
>>>>>
>>>>> I found a web page that explains it in ways that I can wrap my
>>>>> 55-year-old brain around, and has an easy-to-read reference chart.
>>>>>
>>>>> https://perishablepress.com/stop-using-unsafe-characters-in-urls/
>>>>>
>>>>> Question: the problem first showed up on a web service that takes a
>>>>> "bodyless" POST operation, and I assume it also applies to GET
>>>>> operations, and to the URL portion of a POST with a body.
>>>>>
>>>>> But what about the body of a POST?
>>>>
>>>> From an HTTP specification point of view, anything goes.
>>>
>>> With respect, I believe that "anything goes" is a bit imprecise here.
>>
>> Nope.
>>
>> You can POST anything. You are talking specifically about form data.
>
> Mmm. You are being a bit casuistic here. (Granted, not that I wasn't.)

Yeah, sorry about that. I tend to read "With respect..." as meaning
pretty much exactly the opposite.

> In the real world, I would expect that 99% of what is ever POSTed, /is/
> form data.
> Not you ?

For Tomcat I don't have a clue what the split is but my guess is that it
is a lot less than 99% these days.

>> In that case, as I said, the body has to conform to what the component
>> processing it expects.
>
> And that component would be .. ?

https://svn.apache.org/viewvc/tomcat/trunk/java/org/apache/tomcat/util/http/Parameters.java?view=annotate

> I don't really know, but I would guess that in most webservers, the
> component parsing the body of a POST with Content-type =
> application/x-www-form-urlencoded, may be the same as the one which is
> parsing the query-string of a URI, no ?
> Considering the similarity of these two things, it would seem that the
> temptation would be hard to resist.

Tomcat uses exactly the same code - with a little wrapping to get the
data into the same format before it starts.

Mark

>> And yes, unicode in form data is 'interesting'...
>>
>> Mark
>>
>>
>>> See e.g. https://www.w3.org/TR/html401/interact/forms.html#h-17.13.4
>>>
>>> There are 2 ways for a user agent to send the content of a HTTP POST :
>>> 1) with Content-type header = application/x-www-form-urlencoded
>>> or
>>> 2) with Content-type header = multipart/form-data
>>>
>>> and while it is true that in the case (2), any submitted key=value pair
>>> would be sent separately 'as is', this would not necessarily be so in
>>> case (1), because then all key=value pairs would be concatenated into
>>> one long string, in which the different key=value pairs would be
>>> separated by (unescaped) "&" signs.
>>> (Apart from other required encodings, see the page above)
>>> So if the client is not a browser, and "composes" itself the POST body
>>> before sending it, and sends it with a Content-type (1), it had better
>>> encode the individual parameter pairs as described, before concatenating
>>> them, because that is what the server would expect.
>>>
>>> As an additional note, if it so happened that the data in the client
>>> could contain Unicode text, do not forget that this is (still) not the
>>> standard in HTTP (and URI's, and thus query-string-like things), and
>>> make sure that you use the proper method to encode any printable
>>> characters which are not purely US-ASCII. Again, browsers generally do
>>> this correctly, but custom clients not necessarily. (And a "custom
>>> client" in this case, could even be a bit of javascript which is
>>> embedded in one of your own pages, but does its own calls to the server
>>> on the side).
>>>
>>> I just recently got bitten by this, even in a quite recent browser,
>>> where some javascript function was composing a POST to a server (using
>>> type (1) above), and was NOT doing it correctly, even though the page
>>> containing and calling this function was itself declared as
>>> Unicode/UTF-8.
>>> (that was with (and I am too sorely tempted to add "of course" to resist
>>> it) some revision of IE-11 - although other revisions of the same
>>> browser did not exhibit that same issue).
>>>
>>> [...]
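
For what it's worth, here is a minimal sketch of the advice quoted above
for a custom client that sends Content-type (1): percent-encode each name
and each value separately, and only then join the pairs with "&". The
names FormBodySketch and encodeForm are made up for illustration, UTF-8
is an assumption about what the receiving server expects, and the Charset
overload of URLEncoder.encode needs Java 10 or later. This is client-side
code only; it is not what Tomcat's Parameters class does.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of Tomcat or any library.
class FormBodySketch {

    // Encode every name and value on its own (assumed charset: UTF-8),
    // then concatenate the pairs with literal, unescaped '&' separators.
    static String encodeForm(Map<String, String> params) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            String name = URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8);
            String value = URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8);
            body.add(name + "=" + value);
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the output is stable.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("name", "Andr\u00e9 & Co"); // non-ASCII plus a literal '&'
        params.put("q", "a=b");                // a literal '=' inside a value
        System.out.println(encodeForm(params));
    }
}
```

With the sample values above, main prints
name=Andr%C3%A9+%26+Co&q=a%3Db, so the only unescaped "&" and "=" left in
the body are the separators themselves.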
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org