2009/10/22 Marcin Hanclik <[email protected]>:
> Hi Marcos, All,
>
>>>If any character in the extension is outside the U+0041-U+005A range
>>>and the U+0061-U+007A range, then go to step 7 in this algorithm.
> Unfortunately I disagree with that.
> Motivation:
> a) only ASCII characters are listed
> b) no digits are listed. What about file extensions that include digits, like 
> e.g. .p12 (PKCS#12 certificate)?

I don't see that file format in the "File Identification Table".

> c) at present internationalization is a key topic in many circles and I do 
> not understand why we shall restrict the file extensions in XXI century.
>

Because we are trying to find stuff in the "File Identification Table"
(i.e., the algorithm is limited just to those file names). We are not
writing a general algorithm for extension to MIME mapping! That's what
SNIFF does.
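
As a rough illustration of the range check quoted above (a sketch only, not
spec text; the function name is made up for this example), the test amounts
to asking whether every character of the extension is an ASCII letter:

```python
def is_ascii_letters_only(extension: str) -> bool:
    """Return True if every character lies in U+0041-U+005A or U+0061-U+007A.

    Hypothetical helper for illustration; `extension` is assumed to be the
    part after the dot. An empty string trivially passes this check.
    """
    return all(
        0x41 <= ord(ch) <= 0x5A or 0x61 <= ord(ch) <= 0x7A
        for ch in extension
    )


# An extension that fails the check is not looked up in the table at all;
# the algorithm skips ahead and the file is left to SNIFF.
for ext in ("jpg", "GIF", "p12"):
    print(ext, is_ascii_letters_only(ext))
```

An extension such as "p12" fails the check because of the digits, so a file
with that extension simply falls through to SNIFF rather than being rejected.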

> d) there exist proprietary widget specifications and it seems none of them 
> restricts the file extensions.

I don't know what you mean here. We don't restrict anything. We have
the most common types defined, and the ones we don't define are
handled by SNIFF. I don't see the problem.

> Proposed actions:
> Drop ranges and limits.
> Eventually also contact I18N group and ask their opinion.

I think you've misunderstood the intention of the specification wrt
this section.

>>>That is not possible because trying to do Unicode case comparisons is
>>>a nightmare (or so I'm told).
> I think we should distinguish between possibility and difficulty.

This is totally irrelevant to this algorithm.

> The whole filenames are to be compared (as per P&C) in many cases, and 
> suddenly file extensions cannot be compared.
>

This is just for efficiency.

> E.g.
> "A default start file is a reserved start file at the root of the widget 
> package or at the root of a locale folder whose file name case-sensitively 
> and exactly matches a file name given in the file name column of the default 
> start files table, and whose media type matches the media type given in the 
> media type column of the table."
>
>>>That is correct. This behavior is *nix systems (including Mac OS X).
>>>This is not consistent with the behavior of the operating systems I
>>>have tested.
> I disagree.
> Could you please publish your tests?

I created the files in the Finder on Mac OS X (Snow Leopard). I prefer
not to send a screenshot to the mailing list.

> In general I think that there is no standard for the term "file extension". 
> P&C actually standardizes it, it seems.
> In the *nix, *inux systems it seems not to exist, it can only be somehow 
> artificially handled by some application (shell etc., see below).
> Here is mine test (executed on Ubuntu and Debian):
> host:~$ mkdir test
> host:~$ touch test/.jpg
> host:~$ touch test/img.jpg
> host:~$ touch test/.gif
> host:~$ touch test/img.gif
> host:~$ ls -laX test/
> total 8
> drwxr-xr-x 2 user user 4096 2009-10-22 15:33 .
> drwxr-xr-x 5 user user 4096 2009-10-22 15:33 ..
> -rw-r--r-- 1 user user    0 2009-10-22 15:33 .gif
> -rw-r--r-- 1 user user    0 2009-10-22 15:33 img.gif
> -rw-r--r-- 1 user user    0 2009-10-22 15:33 img.jpg
> -rw-r--r-- 1 user user    0 2009-10-22 15:33 .jpg
> //It seems that shell is confused, or?
> host:~$ cd test/
> host:~/test$ ls -laX
> total 8
> drwxr-xr-x 2 user user 4096 2009-10-22 15:33 .
> drwxr-xr-x 5 user user 4096 2009-10-22 15:33 ..
> -rw-r--r-- 1 user user    0 2009-10-22 15:33 .gif
> -rw-r--r-- 1 user user    0 2009-10-22 15:33 img.gif
> -rw-r--r-- 1 user user    0 2009-10-22 15:33 img.jpg
> -rw-r--r-- 1 user user    0 2009-10-22 15:33 .jpg
> //It seems that shell is confused, or?
> host:~/test$ basename .jpg
> .jpg
> host:~/test$ cd ..
> host:~$ basename test/.jpg
> .jpg
> host:~$ basename test/.jpg .jpg
> .jpg
> host:~$ basename test/img.jpg .jpg
> img
> host:~$ basename test/img.jpg
> img.jpg
> host:~$ basename test/img.jpg pg
> img.j
> //this test actually proves that the basename app is looking for the [SUFFIX] 
> string in the file name. File extension is ARTIFICIAL!!
>

We know this already. Basename is no longer in the spec: you made me
take it out. That's why we have the prose.

> host:~$
>
> Further comments:
> [1] gives the following guidelines for media type registration:
> "Various sorts of optional information SHOULD be included in the
> specification of a media type if it is available:
> ...
>   o  File name extension(s) commonly used on one or more platforms to
>      indicate that some file contains a given media type.
>
>   o  Mac OS File Type code(s) (4 octets) used to label files containing
>      a given media type."
> The term file (name) extension is not defined. MacOS File Type code seems not 
> to be equivalent to file extension (that stems more from Windows world).
>

Is this even relevant now? Or is this some legacy thing from previous
versions of Mac OS?

> Historically Windows worked with 3 characters and Mac with 4 characters.
>
> Therefore in P&C we shall assume that file extension is just any sequence of 
> characters that occur after the last dot (U+002E FULL STOP) including that 
> dot.
>

I really don't understand what you are trying to solve, or what you
think the spec does here.

To be clear: All we want to do is check if the file extension of a
file case-insensitively matches one of the extensions in the File
Identification Table. If you can't match it, then the MIME type gets
resolved with SNIFF.




-- 
Marcos Caceres
http://datadriven.com.au
