I am not an experienced web standards wonk, so please forgive me if I'm making 
a mistake here.

When uploading files that contain special characters in their names, it appears 
to me that how those file names should be escaped is unspecified. As a 
result, Webkit/Safari/Chrome appear to handle these filenames in one way, while 
Firefox handles them in another. I'm implementing the server side of this 
equation, and it is unclear to me what I should be doing. Am I missing 
something? Webkit even has a bug on this issue that states "I suggest working 
with WHATWG or HTML WG to get something specified in HTML5, and getting 
browsers converge on that." Is anyone working on this?


EXAMPLE

Create a file named: bàz'\"hi%22.txt  e.g. using the Unix command:

touch bàz\'\\\"hi%22.txt


Firefox (13.0 beta on Mac) sends the following header, backslash escaping the 
double quote but not escaping the backslash.

Content-Disposition: form-data; name="somefile"; filename="bàz'\\"hi%22.txt"


Webkit (latest nightly r115711 on Mac) sends the following header, %-escaping 
the double quote but doing nothing to the literal %.

Content-Disposition: form-data; name="somefile"; filename="bàz'\%22hi%22.txt"
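
To make the ambiguity concrete, here is a rough Python sketch of the decoders a server might try against the two headers above. The function names are my own (hypothetical), and the two browser-specific rules are only guesses inferred from the observed headers, not anything a spec defines.

```python
# The file name as created on disk, and as each browser put it on the wire.
# Raw strings keep the backslashes literal.
original = r'''bàz'\"hi%22.txt'''
firefox_wire = r'''bàz'\\"hi%22.txt'''
webkit_wire = r'''bàz'\%22hi%22.txt'''


def decode_firefox_guess(value: str) -> str:
    # Assumption: \" is the only escape; any other backslash is literal.
    return value.replace('\\"', '"')


def decode_webkit_guess(value: str) -> str:
    # Assumption: %22 always stands for a double quote.
    return value.replace('%22', '"')


def decode_quoted_pair(value: str) -> str:
    # Quoted-string rules as in RFC 822: a backslash escapes the next
    # character, and an unescaped double quote ends the value.
    out = []
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == '\\' and i + 1 < len(value):
            out.append(value[i + 1])
            i += 2
        elif ch == '"':
            break
        else:
            out.append(ch)
            i += 1
    return ''.join(out)


print(decode_firefox_guess(firefox_wire) == original)  # does this round-trip?
print(decode_webkit_guess(webkit_wire) == original)    # is the literal %22 kept?
print(decode_quoted_pair(firefox_wire))                # where does a strict parser stop?
```

Neither guess looks safe in general: a strict quoted-pair reading cuts the Firefox value short at the quote, and undoing %22 corrupts any name that really contains "%22". A real parser also has to find the end of the quoted value before it can decode anything, which is where the Firefox form hurts most.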


THE SPECS: HTML5 states:

http://www.whatwg.org/specs/web-apps/current-work/multipage/association-of-controls-and-forms.html#multipart-form-data

Encode the (now mutated) form data set using the rules described by RFC 2388. 
[…] File names […] must use the character encoding selected above, though the 
precise name may be approximated if necessary (e.g. […]). User agents must not 
use the RFC 2231 encoding suggested by RFC 2388.
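
For comparison, the RFC 2231 form that HTML5 rules out would look roughly like the sketch below. It uses Python's standard email.utils helpers as a convenient stand-in for a real encoder; the filename*= parameter in the comment follows the RFC 2231 syntax, and I am assuming UTF-8 as the charset.

```python
from email.utils import decode_rfc2231, encode_rfc2231
from urllib.parse import unquote

# The same awkward file name as in the example (raw string keeps the backslash).
original = r'''bàz'\"hi%22.txt'''

# Percent-encode every byte outside the unreserved set and prefix the charset.
# A header would carry the result as: filename*=utf-8''b%C3%A0z...
encoded = encode_rfc2231(original, charset='utf-8')
print(encoded)

# Decoding needs no guessing: split off charset and language, then unquote once.
charset, language, value = decode_rfc2231(encoded)
decoded = unquote(value, encoding=charset)
print(decoded == original)
```

The point is only that this form escapes both the double quote and the literal %, so the name survives a round trip, which neither of the browser behaviours above manages.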


… this seems contradictory: Encode using RFC 2388, but do not use the 
encoding suggested by the RFC. Worse, no browser actually follows the RFC (e.g. 
they all use UTF-8 encoded parameter values), so that doesn't seem like the 
right answer. Is there a way out of this mess?

Evan

--
http://evanjones.ca/
