Re: distinction between resource charset and format octet decoding

2020-02-06 Thread Garret Wilson
On 2/6/2020 10:44 AM, Mark Thomas wrote: … As of Tomcat 10, conf/web.xml contains the following: <request-character-encoding>UTF-8</request-character-encoding> <response-character-encoding>UTF-8</response-character-encoding> That *should* have the effect you are looking for but I confess I haven't tested it in any great detail. Yes! Oh, that is so wonderful. Thank you! I brought this issue up on the

Re: distinction between resource charset and format octet decoding

2020-02-06 Thread Mark Thomas
On 06/02/2020 13:39, Garret Wilson wrote: > On 2/6/2020 10:36 AM, Mark Thomas wrote: >> … Whether Tomcat should ship with this setting present in conf/web.xml by default is something that should probably be discussed for Tomcat 10. Given the current state of the web, there is a

Re: distinction between resource charset and format octet decoding

2020-02-06 Thread Garret Wilson
On 2/6/2020 10:36 AM, Mark Thomas wrote: … Whether Tomcat should ship with this setting present in conf/web.xml by default is something that should probably be discussed for Tomcat 10. Given the current state of the web, there is a reasonable case for doing so. I'll add that to the TOMCAT-NEXT

Re: distinction between resource charset and format octet decoding

2020-02-06 Thread Mark Thomas
On 06/02/2020 13:30, Garret Wilson wrote: > On 1/8/2019 9:57 PM, Mark Thomas wrote: >> … >> >> Yes, this default is now very out-dated. That is a side-effect of: >> … >> As of Servlet 4.0 there is a specification compliant configuration >> option to change this default to any encoding of your

Re: distinction between resource charset and format octet decoding

2020-02-06 Thread Garret Wilson
On 1/8/2019 9:57 PM, Mark Thomas wrote: … Yes, this default is now very out-dated. That is a side-effect of: … As of Servlet 4.0 there is a specification compliant configuration option to change this default to any encoding of your choice. Obviously, UTF-8 is one of the options. You can do

Re: distinction between resource charset and format octet decoding

2019-05-21 Thread Garret Wilson
Sorry to bring up the non-UTF-8 escaped octets form POST problem again, but … On 1/8/2019 3:57 PM, Mark Thomas wrote: … As of Servlet 4.0 there is a specification compliant configuration option to change this default to any encoding of your choice. Obviously, UTF-8 is one of the options. You

Re: distinction between resource charset and format octet decoding

2019-02-01 Thread Mark Thomas
On 01/02/2019 17:58, Garret Wilson wrote: > OK, Mark, I've made my initial edits to the > https://wiki.apache.org/tomcat/FAQ/CharacterEncoding page. _Please check > them over!_ This is my first edit to the wiki. > > That page has a lot of legacy information, some of which had to do with >

Re: distinction between resource charset and format octet decoding

2019-02-01 Thread Garret Wilson
OK, Mark, I've made my initial edits to the https://wiki.apache.org/tomcat/FAQ/CharacterEncoding page. _Please check them over!_ This is my first edit to the wiki. That page has a lot of legacy information, some of which had to do with internal Tomcat stuff, and some of which had to do with

Re: distinction between resource charset and format octet decoding

2019-02-01 Thread Garret Wilson
On 2/1/2019 9:38 AM, Christopher Schultz wrote: Amazing. A close reading of RFC 3986 reveals that there is no clear mandate for UTF-8 in existing URI schemes, even though recommended for new schemes. Anyway, everyone seems to have settled on UTF-8 (Tomcat included), so I'll try to indicate that.

Re: distinction between resource charset and format octet decoding

2019-02-01 Thread Christopher Schultz
Garret, On 2/1/19 11:08, Garret Wilson wrote: > On 2/1/2019 7:23 AM, Garret Wilson wrote: >> … * "There /is no default encoding for URIs/ specified anywhere, >> which is why there is a lot of confusion when it comes to >> decoding these values."

Re: distinction between resource charset and format octet decoding

2019-02-01 Thread Garret Wilson
On 2/1/2019 7:23 AM, Garret Wilson wrote: … * "There /is no default encoding for URIs/ specified anywhere, which is why there is a lot of confusion when it comes to decoding these values." Sheesh, this is ancient. I'll correct it as per

Re: distinction between resource charset and format octet decoding

2019-02-01 Thread Garret Wilson
Good morning, I'm just getting to the editing. I'm going to list some thoughts I have as I go through this, so you can verify things: * The servlet spec links are way out of date. I'll update them. * "There /is no default encoding for URIs/ specified anywhere, which is why there is a lot

Re: distinction between resource charset and format octet decoding

2019-01-23 Thread Mark Thomas
On 23/01/2019 05:07, Garret Wilson wrote: > On 1/15/2019 3:20 AM, Mark Thomas wrote: >> … >> Anything in PascalCase becomes a link to a wiki page of that name. >> Usernames are created in this form so references to the user >> automatically become links to that user's page in the wiki. > > > Ah,

Re: distinction between resource charset and format octet decoding

2019-01-22 Thread Garret Wilson
On 1/15/2019 3:20 AM, Mark Thomas wrote: … Anything in PascalCase becomes a link to a wiki page of that name. Usernames are created in this form so references to the user automatically become links to that user's page in the wiki. Ah, OK, that explains it. Very good to know. Maybe a little

Re: distinction between resource charset and format octet decoding

2019-01-15 Thread Mark Thomas
On 15/01/2019 03:39, Garret Wilson wrote: > On 1/9/2019 2:30 AM, Mark Thomas wrote: >> … >> Create yourself an account at https://wiki.apache.org/tomcat (click >> login then create an account) and let the list know your ID. Then one of >> the admins can add you to the allowed editors. > > > I

Re: distinction between resource charset and format octet decoding

2019-01-14 Thread Garret Wilson
On 1/9/2019 2:30 AM, Mark Thomas wrote: … Create yourself an account at https://wiki.apache.org/tomcat (click login then create an account) and let the list know your ID. Then one of the admins can add you to the allowed editors. I was just ready to create an account, but I want to verify the

Re: distinction between resource charset and format octet decoding

2019-01-09 Thread Mark Thomas
On 09/01/2019 00:50, Garret Wilson wrote: > Hi, Mark, and thanks for some quick response. You provided some info I > wasn't aware of. Some responses below: > > On 1/8/2019 9:57 PM, Mark Thomas wrote: >> On 08/01/2019 21:31, Garret Wilson wrote: >> >> >> >>> But as discussed above, this is

Re: distinction between resource charset and format octet decoding

2019-01-08 Thread Garret Wilson
Hi, Mark, and thanks for some quick response. You provided some info I wasn't aware of. Some responses below: On 1/8/2019 9:57 PM, Mark Thomas wrote: On 08/01/2019 21:31, Garret Wilson wrote: But as discussed above, this is completely wrong: the resource character encoding of a request

Re: distinction between resource charset and format octet decoding

2019-01-08 Thread Mark Thomas
On 08/01/2019 21:31, Garret Wilson wrote: But as discussed above, this is completely wrong: the resource character encoding of a request sent in `application/x-www-form-urlencoded` should have absolutely no bearing on how the encoded octets within that resource are decoded. That is not
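For readers skimming this thread, the point about decoding the escaped octets in an `application/x-www-form-urlencoded` body can be illustrated with a minimal sketch. It uses only the JDK's `java.net.URLDecoder` (the `Charset` overload needs Java 10 or later) and is not Tomcat's own code path. The class name `FormOctetDecoding`, the `decode` helper, and the sample body `name=caf%C3%A9` are hypothetical, chosen only for illustration.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of Tomcat or the Servlet API.
public class FormOctetDecoding {

    // Decode percent-escaped octets, interpreting the resulting bytes in the given charset.
    static String decode(String encoded, Charset charset) {
        return URLDecoder.decode(encoded, charset);
    }

    public static void main(String[] args) {
        // Assumed sample form body: the value is U+00E9 escaped as its UTF-8 octets C3 A9.
        String body = "name=caf%C3%A9";
        String value = body.substring(body.indexOf('=') + 1);

        // The two octets are read as one UTF-8 character.
        System.out.println(decode(value, StandardCharsets.UTF_8));

        // Each octet is read as a separate ISO-8859-1 character, which garbles the value.
        System.out.println(decode(value, StandardCharsets.ISO_8859_1));
    }
}
```

The same escaped octets yield different strings depending only on the charset handed to the decoder, which is why the default used for that step matters.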