2009/10/22 Marcin Hanclik:
> Hi Marcos, All,
>
>>> If any character in the extension is outside the U+0041-U+005A range
>>> and the U+0061-U+007A range, then go to step 7 in this algorithm.
>
> Unfortunately, I disagree with that.
> Motivation:
> a) only ASCII characters are listed
> b) no digits are listed. What about file extensions that include digits,
> e.g. .p12 (PKCS#12 certificate)?
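The quoted rule amounts to a per-character range check on the extension. A minimal Python sketch, assuming the extension is passed without its leading dot; `extension_is_ascii_letters` is a hypothetical helper name, not something defined by the spec. It shows why `.p12` fails the check and would go to step 7.

```python
def extension_is_ascii_letters(extension: str) -> bool:
    """Return True only if every character is in U+0041-U+005A (A-Z)
    or U+0061-U+007A (a-z).

    Hypothetical helper: assumes `extension` excludes the leading dot.
    Assumption: an empty extension is treated as failing the check.
    """
    if not extension:
        return False
    return all(
        "\u0041" <= ch <= "\u005A" or "\u0061" <= ch <= "\u007A"
        for ch in extension
    )


for ext in ["html", "PNG", "p12"]:
    # Under the quoted rule, False means "go to step 7 in this algorithm".
    print(ext, extension_is_ascii_letters(ext))
```

Here `html` and `PNG` pass, while `p12` fails because of its digits, as does any extension containing a non-ASCII letter.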
I don't see that file format in the "File Identification Table".

> c) at present internationalization is a key topic in many circles and I do
> not understand why we should restrict file extensions in the 21st century.

Because we are trying to find stuff in the "File Identification Table" (i.e., the algorithm is limited to just those file names). We are not writing a general algorithm for extension-to-MIME mapping! That's what SNIFF does.

> d) there exist proprietary widget specifications and it seems none of them
> restricts the file extensions.

I don't know what you mean here. We don't restrict anything. We have the most common types defined, and the ones we don't define are handled by SNIFF. I don't see the problem.

> Proposed actions:
> Drop ranges and limits.
> Eventually also contact the I18N group and ask their opinion.

I think you've misunderstood the intention of the specification with respect to this section.

>>> That is not possible because trying to do Unicode case comparisons is
>>> a nightmare (or so I'm told).
> I think we should distinguish between possibility and difficulty.

Isn't this totally irrelevant to this algorithm?

> The whole filenames are to be compared (as per P&C) in many cases, and
> suddenly file extensions cannot be compared.
> This is just for efficiency.
> E.g.
> "A default start file is a reserved start file at the root of the widget
> package or at the root of a locale folder whose file name case-sensitively
> and exactly matches a file name given in the file name column of the default
> start files table, and whose media type matches the media type given in the
> media type column of the table."
>
>>> That is correct. This behavior is *nix systems (including Mac OS X).
>>> This is not consistent with the behavior of the operating systems I
>>> have tested.
> I disagree.
> Could you please publish your tests?

I created the files in the Finder on Mac OS X (Snow Leopard). I prefer not to send a screenshot to the mailing list.
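To make the case-comparison point concrete: folding only A-Z is a one-to-one, length-preserving mapping, whereas full Unicode case mapping is not, and the default start file rule needs nothing more than exact code-point equality (`a == b`). A minimal Python sketch; `ascii_lower` and `ascii_equals_ignore_case` are hypothetical helpers, not P&C definitions.

```python
def ascii_lower(text: str) -> str:
    """Lowercase only U+0041-U+005A (A-Z); every other code point is kept as is."""
    return "".join(
        chr(ord(ch) + 0x20) if "A" <= ch <= "Z" else ch for ch in text
    )


def ascii_equals_ignore_case(a: str, b: str) -> bool:
    """ASCII case-insensitive comparison: simple and predictable."""
    return ascii_lower(a) == ascii_lower(b)


print(ascii_equals_ignore_case("INDEX.HTM", "index.htm"))  # True

# Full Unicode case mapping is not one-to-one:
# U+00DF (sharp s) uppercases to the two-letter string "SS",
# and U+212A KELVIN SIGN lowercases to the ASCII letter "k".
print("\u00df".upper())         # SS
print("\u212a".lower() == "k")  # True
```

The Kelvin-sign case shows why a naive `str.lower()` could let a non-ASCII name collide with an ASCII one, which ASCII-only folding avoids.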
> In general I think that there is no standard for the term "file extension".
> P&C actually standardizes it, it seems.
> In *nix and Linux systems it seems not to exist; it can only be handled,
> somewhat artificially, by some applications (shell etc., see below).
> Here is my test (executed on Ubuntu and Debian):
> host:~$ mkdir test
> host:~$ touch test/.jpg
> host:~$ touch test/img.jpg
> host:~$ touch test/.gif
> host:~$ touch test/img.gif
> host:~$ ls -laX test/
> total 8
> drwxr-xr-x 2 user user 4096 2009-10-22 15:33 .
> drwxr-xr-x 5 user user 4096 2009-10-22 15:33 ..
> -rw-r--r-- 1 user user 0 2009-10-22 15:33 .gif
> -rw-r--r-- 1 user user 0 2009-10-22 15:33 img.gif
> -rw-r--r-- 1 user user 0 2009-10-22 15:33 img.jpg
> -rw-r--r-- 1 user user 0 2009-10-22 15:33 .jpg
> //It seems that shell is confused, or?
> host:~$ cd test/
> host:~/test$ ls -laX
> total 8
> drwxr-xr-x 2 user user 4096 2009-10-22 15:33 .
> drwxr-xr-x 5 user user 4096 2009-10-22 15:33 ..
> -rw-r--r-- 1 user user 0 2009-10-22 15:33 .gif
> -rw-r--r-- 1 user user 0 2009-10-22 15:33 img.gif
> -rw-r--r-- 1 user user 0 2009-10-22 15:33 img.jpg
> -rw-r--r-- 1 user user 0 2009-10-22 15:33 .jpg
> //It seems that shell is confused, or?
> host:~/test$ basename .jpg
> .jpg
> host:~/test$ cd ..
> host:~$ basename test/.jpg
> .jpg
> host:~$ basename test/.jpg .jpg
> .jpg
> host:~$ basename test/img.jpg .jpg
> img
> host:~$ basename test/img.jpg
> img.jpg
> host:~$ basename test/img.jpg pg
> img.j
> //this test actually proves that the basename app is looking for the [SUFFIX]
> string in the file name. File extension is ARTIFICIAL!!

We know this already. Basename no longer exists in the spec; you made me take it out, remember? That's why we have the prose.

> host:~$
>
> Further comments:
> [1] gives the following guidelines for media type registration:
> "Various sorts of optional information SHOULD be included in the
> specification of a media type if it is available:
> ...
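The basename results in the transcript follow the POSIX definition of the utility: the SUFFIX operand is removed only when it is a suffix of the remaining name and not identical to it, which is why `basename test/.jpg .jpg` still prints `.jpg`. A simplified Python re-implementation for illustration; `posix_basename` is a hypothetical name, and the sketch skips the implementation-defined corner cases noted in its docstring.

```python
def posix_basename(string: str, suffix: str = "") -> str:
    """Simplified sketch of the POSIX basename algorithm.

    Not a full implementation: the empty string (unspecified by POSIX)
    is returned unchanged, and "//" is treated like any all-slash string.
    """
    if string == "":
        return string
    # A string made only of slashes becomes "/" and the suffix is not applied.
    if string.strip("/") == "":
        return "/"
    # Drop trailing slashes, then everything up to the last remaining slash.
    name = string.rstrip("/").rsplit("/", 1)[-1]
    # Remove the suffix only if it is a proper suffix (not the whole name).
    if suffix and suffix != name and name.endswith(suffix):
        name = name[: -len(suffix)]
    return name


# Replays the basename calls from the transcript.
for args in [(".jpg",), ("test/.jpg",), ("test/.jpg", ".jpg"),
             ("test/img.jpg", ".jpg"), ("test/img.jpg",), ("test/img.jpg", "pg")]:
    print(*args, "->", posix_basename(*args))
```

Note that nothing here knows about "extensions": `pg` is stripped just as readily as `.jpg`, which matches Marcin's observation.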
> o File name extension(s) commonly used on one or more platforms to
> indicate that some file contains a given media type.
>
> o Mac OS File Type code(s) (4 octets) used to label files containing
> a given media type."
> The term file (name) extension is not defined. The Mac OS File Type code
> seems not to be equivalent to a file extension (which stems more from the
> Windows world).

Is this even relevant now? Or is this some legacy thing from previous versions of Mac OS?

> Historically Windows worked with 3 characters and Mac with 4 characters.
>
> Therefore in P&C we should assume that a file extension is just any sequence
> of characters that occurs after the last dot (U+002E FULL STOP), including
> that dot.

I really don't understand what you intend to solve, or what you think the spec does here. To be clear: all we want to do is check whether the file extension of a file case-insensitively matches one of the extensions in the File Identification Table. If it doesn't match, then the MIME type gets resolved with SNIFF.

-- 
Marcos Caceres
http://datadriven.com.au
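The lookup described above can be sketched by combining Marcin's proposed definition of an extension (everything from the last U+002E FULL STOP onward) with an ASCII case-insensitive match against the table and a fallback to sniffing. A minimal Python sketch under stated assumptions: the table is an illustrative subset, not the spec's actual File Identification Table, and `placeholder_sniff` is a hypothetical stand-in for the SNIFF rules.

```python
from typing import Callable, Optional

# Illustrative subset only (assumed); the real mapping is the
# File Identification Table in the Widgets P&C specification.
FILE_IDENTIFICATION_TABLE = {
    ".html": "text/html",
    ".css": "text/css",
    ".js": "application/javascript",
    ".gif": "image/gif",
    ".png": "image/png",
}


def file_extension(file_name: str) -> Optional[str]:
    """Characters from the last U+002E FULL STOP onward, including the dot."""
    index = file_name.rfind(".")
    return file_name[index:] if index != -1 else None


def ascii_lower(text: str) -> str:
    """Lowercase A-Z only, so no Unicode case mapping is involved."""
    return "".join(chr(ord(c) + 0x20) if "A" <= c <= "Z" else c for c in text)


def resolve_media_type(file_name: str, sniff: Callable[[str], str]) -> str:
    extension = file_extension(file_name)
    if extension is not None:
        media_type = FILE_IDENTIFICATION_TABLE.get(ascii_lower(extension))
        if media_type is not None:
            return media_type
    # No case-insensitive match in the table: defer to sniffing.
    return sniff(file_name)


def placeholder_sniff(file_name: str) -> str:
    # Hypothetical stand-in; real sniffing would inspect the file's bytes.
    return "application/octet-stream"


print(resolve_media_type("index.HTML", placeholder_sniff))  # text/html
print(resolve_media_type("cert.p12", placeholder_sniff))    # application/octet-stream
```

With this design, extensions containing digits or non-ASCII characters are not rejected by the algorithm; they simply find no entry in the table and fall through to sniffing.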
