https://bugzilla.wikimedia.org/show_bug.cgi?id=73661

            Bug ID: 73661
           Summary: Uploads don't allow non-ASCII characters in filename
           Product: Pywikibot
           Version: core-(2.0)
          Hardware: All
                OS: All
            Status: NEW
          Severity: normal
          Priority: Unprioritized
         Component: General
          Assignee: pywikipedia-b...@lists.wikimedia.org
          Reporter: commodorefabia...@gmx.de
       Web browser: ---
   Mobile Platform: ---

Depending on the version used, either the local file name or the target page
name on the wiki may not contain non-ASCII characters. This was changed in
Ib751ee3f4074a60f3b53b0afe3cc2dfc3e17b2f7 in pwb 2.0, so versions prior to that
change won't work with non-ASCII local filenames and versions with it won't
work with non-ASCII wiki page names.

The problem is simply the 'filename' value in the header of the file/chunk
entry (not to be confused with the 'filename' entry in the MIME request) when
it contains non-ASCII characters. For example:

  Content-Type: image/jpeg
  MIME-Version: 1.0
  Content-disposition: form-data; name="file"; filename*=utf-8''%C3%9C.jpg
  Content-Transfer-Encoding: binary

  [… binary data …]
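
A minimal sketch of how such a header comes out of the Python 3 standard
library, assuming the part is built with the email package (the report does
not say exactly which calls Pywikibot makes, so the construction below is only
an illustration):

```python
from email.mime.nonmultipart import MIMENonMultipart
from email.utils import encode_rfc2231

# "\u00dc" is the letter U with diaeresis, as in the example header.
filename = "\u00dc.jpg"

# RFC 2231 form of the bare value: charset'language'percent-encoded-bytes.
encoded = encode_rfc2231(filename, "utf-8")

# Illustrative file part; add_header() switches to the filename*= form
# on its own when the parameter value is not pure ASCII.
part = MIMENonMultipart("image", "jpeg")
part.add_header("Content-disposition", "form-data",
                name="file", filename=filename)
disposition = part["Content-disposition"]

print(encoded)
print(disposition)
```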

The filename*=utf-8''... form is the RFC 2231 compliant encoding of a non-ASCII
character, which Python 3 uses by default. Python 2 instead does a strange
encoding of the complete line (this may not represent exactly the same text as
above, but something similar):

  Content-disposition: =?utf-8?b?Zm9ybS1kYXRhOyBuYW1lPSJmaWxlIjsgZmlsZW5hbWU9?= 
   =?utf-8?b?IsOcMi5qcGci?=
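
To see what that line actually carries, the encoded words can be decoded again
with the standard library. This is only a sketch run under Python 3; the header
value is copied from the example above and joined onto one logical line:

```python
from email.header import decode_header, make_header

# The two RFC 2047 encoded words from the example, separated by one space.
raw = ("=?utf-8?b?Zm9ybS1kYXRhOyBuYW1lPSJmaWxlIjsgZmlsZW5hbWU9?= "
       "=?utf-8?b?IsOcMi5qcGci?=")

# decode_header() yields (bytes, charset) pairs; make_header() joins them
# back into a single unicode header value.
decoded = str(make_header(decode_header(raw)))
print(decoded)
```

The decoded value is an ordinary Content-disposition line whose filename
parameter holds the non-ASCII name (here a slightly different name than in the
first example, as noted above).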

Neither form is accepted by the MediaWiki server. The RFC 2231 form is
answered with:

  badupload_file: File upload param file is not a file upload; be sure to use
  multipart/form-data for your POST and include a filename in the
  Content-Disposition header.

Or, for the Python 2 form:

  missingparam: One of the parameters filekey, file, url, statuskey is required

It is possible to leave the filename UTF-8 encoded, although that is (afaics)
not compliant with the MIME-related RFCs, which say that the header may only
contain US-ASCII characters.
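
A rough sketch of what leaving it UTF-8 encoded could look like if the part is
assembled by hand as bytes instead of through the email package. The helper
build_file_part and the boundary string are hypothetical, not Pywikibot code,
and whether the server handles the result correctly is not verified here:

```python
def build_file_part(boundary, field, filename, data, content_type):
    """Return one multipart/form-data part as bytes (hypothetical helper).

    The filename is written as raw UTF-8 inside a quoted string, which is
    not US-ASCII and therefore not strictly RFC compliant.
    """
    # Escape backslashes and double quotes so the quoted string stays valid.
    escaped = filename.replace("\\", "\\\\").replace('"', '\\"')
    head = (
        "--{0}\r\n"
        'Content-Disposition: form-data; name="{1}"; filename="{2}"\r\n'
        "Content-Type: {3}\r\n"
        "\r\n"
    ).format(boundary, field, escaped, content_type)
    # Encode only the header text; the payload bytes are appended untouched.
    return head.encode("utf-8") + data + b"\r\n"


# Made-up boundary and a few JPEG magic bytes as stand-in file data.
part = build_file_part("XXBOUNDARYXX", "file", "\u00dc.jpg",
                       b"\xff\xd8\xff", "image/jpeg")
print(part)
```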

Unfortunately I'm not sure what MediaWiki does with a raw UTF-8 header, so I
don't know if there is a better way, especially as Python 3 doesn't support
'bytes' in the header and otherwise it's not possible to get the value in
there without it being re-encoded.
