Jan-Wijbrand Kolman wrote:

We recently realised that mimetype assignment in Zope (e.g. to Zope File objects) is inconsistent: it can vary when different clients (browsers) upload files with the same file extension.

Example: when a file called "foobar.rtf" is uploaded to a Zope File
object from Firefox on Linux, the mimetype assigned is (or can be)
'application/rtf'. However, the same file uploaded to the same Zope
File object in the same Zope instance, using IE on Windows 2000 (with MS
Office installed), will get 'application/msword' assigned.

The mimetype assignment for uploaded files is done in OFS.Image.py
(maybe there are more places or other Products that do this; I know
that at least ExtFile does this too). Line 463 of OFS.Image.py, Zope

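def _get_content_type(self, file, body, id, content_type=None):
    headers = getattr(file, 'headers', None)
    if headers and headers.has_key('content-type'):
        # A content-type header sent by the client wins outright.
        content_type = headers['content-type']
    else:
        # Otherwise guess from the file name and body.
        if type(body) is not type(''):
            body = body.data
        content_type, enc = guess_content_type(
            getattr(file, 'filename', id), body, content_type)
    return content_type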

Then I understood that the headers as sent by the client for this file
(may?) have a content-type entry that takes precedence over both the
mimetypes 'database' and the content_type passed in as an argument.

We could deal with the inconsistent assignment on the application
level (in this case Silva), but I'd rather consider changing this
behaviour on the Zope level. I could imagine changing the way a
mimetype is 'guessed' from an uploaded File to something like:

def _get_content_type(self, file, body, id, content_type=None):
    """Order of precedence:
    1) see if guess_content_type resolves to a mimetype for the
       file name or body
    2) if not, use the content_type sent in the headers, if any
    3) else use the argument passed in
    """
    headers = getattr(file, 'headers', {})
    content_type = headers.get('content-type', content_type)
    if type(body) is not type(''):
        body = body.data
    name = getattr(file, 'filename', id)
    content_type, enc = guess_content_type(name, body, content_type)
    return content_type

Does anyone have an opinion on this? Is the current behaviour
completely intentional, maybe even according to some specification
(and thus I should deal with it on the application level)? Should I
file a collector issue?
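
To make the proposed order concrete, here is a minimal standalone sketch in modern Python. It is not Zope code: `guess_first` is a hypothetical helper, and the standard library's `mimetypes` module stands in for Zope's `guess_content_type`. The '.rtf' mapping is registered explicitly so the result does not depend on the platform's type tables.

```python
import mimetypes

# Private type table so the example is deterministic (assumption: we want
# '.rtf' to map to 'application/rtf', as in the Firefox case above).
_types = mimetypes.MimeTypes()
_types.add_type("application/rtf", ".rtf")


def guess_first(filename, header_type=None, passed_type=None):
    """Hypothetical helper: guessed value, then header, then argument."""
    guessed, _encoding = _types.guess_type(filename)
    return guessed or header_type or passed_type


# Both clients now end up with the same type for the same file name.
from_firefox = guess_first("foobar.rtf", header_type="application/rtf")
from_ie = guess_first("foobar.rtf", header_type="application/msword")
```

With this order the client header only matters when the file name cannot be resolved to a type.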

-1 for using the "guessed" value over the one from the headers; +1 for using the argument over the guessed value (so that the application can "fix" the problem). I agree that having different clients supply different types is painful, but I don't think that "fixing" it at the low level is reasonable (mechanism vs. policy).

In summary, I would prefer the precedence to be:

  1. Passed value

  2. Request header

  3. Guessed value
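
A rough sketch of that precedence, for illustration only. It is not the Zope implementation: `resolve_content_type` is a hypothetical name, it is written in modern Python, and the standard library's `mimetypes` module stands in for `guess_content_type`. The '.rtf' mapping is registered explicitly so the example does not depend on the platform's type tables.

```python
import mimetypes

# Private type table so the example is deterministic (assumption).
_types = mimetypes.MimeTypes()
_types.add_type("application/rtf", ".rtf")


def resolve_content_type(filename, header_type=None, passed_type=None):
    """Hypothetical helper: passed value, then request header, then guess."""
    if passed_type:
        # 1. The application's explicit choice always wins (policy).
        return passed_type
    if header_type:
        # 2. Otherwise trust what the client sent.
        return header_type
    # 3. Only guess when nothing else is available.
    guessed, _encoding = _types.guess_type(filename)
    return guessed


# The application can override a client that reports 'application/msword'.
fixed = resolve_content_type(
    "foobar.rtf",
    header_type="application/msword",
    passed_type="application/rtf",
)
```

This keeps the low-level mechanism neutral: an application such as Silva can normalise the type by passing it in, while callers that pass nothing still get whatever the client reported.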

Tres Seaver                                [EMAIL PROTECTED]
Zope Corporation      "Zope Dealers"       http://www.zope.com
