Attached is an update to the shared mime specification that adds two things:
1) An update to the default mimetype resolve order, based on previous discussions on this list. The basic change is to prefer glob matches when all matching globs agree on a single mimetype. Otherwise we resolve the conflict in a slightly more advanced way, which includes using a newly added priority for globs.

2) A generic-icon and an icon element for mimetypes, as per some discussions between GNOME and KDE developers. The generic-icon is used for looking up the icon to use if the mimetype-specific icon does not exist (e.g. if "application-msword" doesn't exist, use "x-office-document"). The list of generic icon names is available in Table 10 of the icon naming spec. Without this addition there is currently no way to know which generic icon to use without hardcoding a list of types.

The "icon" element, if set, is used in preference to the mimetype-specific icon name. It is only supposed to be used in the per-user mimetype database, to override things for user customization. It will not be used in the freedesktop database.

Opinions?
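To illustrate the icon lookup in 2), here is a rough Python sketch of the icon names an implementation might try, most specific first. The function name and the two dictionaries are made up for this example; they stand in for the per-user icon overrides and the generic-icon mapping, and nothing here is an existing API.

```python
def icon_candidates(mimetype, icon_overrides=None, generic_icons=None):
    """Return icon names to try for `mimetype`, most specific first.

    icon_overrides and generic_icons are hypothetical dicts mapping a
    mimetype to the name given by its icon / generic-icon element.
    """
    icon_overrides = icon_overrides or {}
    generic_icons = generic_icons or {}
    candidates = []

    # A per-user "icon" override wins over everything else.
    if mimetype in icon_overrides:
        candidates.append(icon_overrides[mimetype])

    # The mimetype-specific name is built by mapping "/" to "-".
    candidates.append(mimetype.replace("/", "-"))

    # Use the declared generic icon, or fall back to the top-level
    # media type with "-x-generic" appended.
    media_type = mimetype.split("/", 1)[0]
    candidates.append(generic_icons.get(mimetype, media_type + "-x-generic"))
    return candidates
```

A caller would walk the returned list and use the first name that the icon theme actually provides.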
Index: shared-mime-info-spec.xml
===================================================================
RCS file: /cvs/mime/shared-mime-info/shared-mime-info-spec.xml,v
retrieving revision 1.56
diff -u -p -r1.56 shared-mime-info-spec.xml
--- shared-mime-info-spec.xml 1 Dec 2005 18:53:26 -0000 1.56
+++ shared-mime-info-spec.xml 28 Jan 2008 11:27:25 -0000
@@ -1,8 +1,8 @@
 <?xml version="1.0" standalone="no"?>
 <!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
-"/usr/share/sgml/docbook/dtd/xml/4.1.2/docbookx.dtd" [
- <!ENTITY updated "1 December 2005">
- <!ENTITY version "0.15">
+"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" [
+ <!ENTITY updated "25 January 2007">
+ <!ENTITY version "0.16">
 ]>
 
 <article id="index">
@@ -159,7 +159,10 @@ changes take effect.
 The files created by <command>update-mime-database</command> are:
 <itemizedlist>
  <listitem><para>
-<filename><MIME>/globs</filename> (contains a mapping from names to MIME types)
+<filename><MIME>/globs</filename> (contains a mapping from names to MIME types [deprecated for glob2])
+ </para></listitem>
+ <listitem><para>
+<filename><MIME>/globs2</filename> (contains a mapping from names to MIME types and glob priority)
  </para></listitem>
  <listitem><para>
 <filename><MIME>/magic</filename> (contains a mapping from file contents to MIME types)
@@ -171,6 +174,12 @@ The files created by <command>update-mim
 <filename><MIME>/aliases</filename> (contains a mapping from aliases to MIME types)
  </para></listitem>
  <listitem><para>
+<filename><MIME>/icons</filename> (contains a mapping from MIME types to icons)
+ </para></listitem>
+ <listitem><para>
+<filename><MIME>/generic-icons</filename> (contains a mapping from MIME types to generic icons)
+ </para></listitem>
+ <listitem><para>
 <filename><MIME>/XMLnamespaces</filename> (contains a mapping from XML
 (namespaceURI, localName) pairs to MIME types)
  </para></listitem>
@@ -180,8 +189,9 @@ type, giving details about the type)
  </para></listitem>
  <listitem><para>
 <filename><MIME>/mime.cache</filename> (contains the same information as the <filename>globs</filename>,
-<filename>magic</filename>, <filename>subclasses</filename>, <filename>aliases</filename> and
-<filename>XMLnamespaces</filename> files, in a binary, mmappable format)
+<filename>magic</filename>, <filename>subclasses</filename>, <filename>aliases</filename>,
+<filename>icons</filename>, <filename>generic-icons</filename> and <filename>XMLnamespaces</filename> files,
+in a binary, mmappable format)
  </para></listitem>
 </itemizedlist>
 The format of these generated files and the source files in <filename>packages</filename>
@@ -213,7 +223,9 @@ and in any order:
  <listitem><para>
 <userinput>glob</userinput> elements have a <userinput>pattern</userinput> attribute. Any file whose name
 matches this pattern will be given this MIME type (subject to conflicting rules in
-other files, of course).
+other files, of course). There is also an optional <userinput>priority</userinput> attribute which
+is used when resolving conflicts with other glob matches. The default priority value is 50, and
+the maximum is 100.
  </para>
  <para>
 KDE's glob system replaces GNOME's and ROX's ext/regex fields, since it
@@ -305,6 +317,24 @@ There may be many of these elements with
 to provide the text in multiple languages, although these should only be used if absolutely neccessary.
  </para></listitem>
  <listitem><para>
+<userinput>icon</userinput> elements specify the icon to be used for this particular mime-type, given
+by the <userinput>name</userinput> attribute. Generally the icon used for a mimetype is created
+based on the mime-type by mapping "/" characters to "-", but users can override this by using
+the <userinput>icon</userinput> element to customize the icon for a particular mimetype.
+This element is not used in the system database, but only used in the user overridden database.
+Only one <userinput>icon</userinput> element is allowed.
+ </para></listitem>
+ <listitem><para>
+<userinput>generic-icon</userinput> elements specify the icon to use as a generic icon for this
+particular mime-type, given by the <userinput>name</userinput> attribute. This is used if there
+is no specific icon (see <userinput>icon</userinput> for how these are found). These are
+used for categories of similar types (like spreadsheets or archives) that can use a common icon.
+The Icon Naming Specification lists a set of such icon names. If this element is not specified
+then the mimetype is used to generate the generic icon by using the top-level media type (e.g.
+"video" in "video/ogg") and appending "-x-generic" (i.e. "video-x-generic" in the previous example).
+Only one <userinput>generic-icon</userinput> element is allowed.
+ </para></listitem>
+ <listitem><para>
 <userinput>root-XML</userinput> elements have <userinput>namespaceURI</userinput> and
 <userinput>localName</userinput> attributes. If a file is identified as being an XML file,
 these rules allow a more specific MIME type to be chosen based on the namespace and localname
@@ -528,14 +558,16 @@ mmappable format:
 <programlisting>
 Header:
 2 CARD16 MAJOR_VERSION 1
-2 CARD16 MINOR_VERSION 0
+2 CARD16 MINOR_VERSION 1
 4 CARD32 ALIAS_LIST_OFFSET
 4 CARD32 PARENT_LIST_OFFSET
 4 CARD32 LITERAL_LIST_OFFSET
-4 CARD32 SUFFIX_LIST_OFFSET
+4 CARD32 REVERSE_SUFFIX_TREE_OFFSET
 4 CARD32 GLOB_LIST_OFFSET
 4 CARD32 MAGIC_LIST_OFFSET
 4 CARD32 NAMESPACE_LIST_OFFSET
+4 CARD32 ICONS_LIST_OFFSET
+4 CARD32 GENERIC_ICONS_LIST_OFFSET
 
 AliasList:
 4 CARD32 N_ALIASES
@@ -564,6 +596,7 @@ LiteralList:
 LiteralEntry:
 4 CARD32 LITERAL_OFFSET
 4 CARD32 MIME_TYPE_OFFSET
+4 CARD32 PRIORITY
 
 GlobList:
 4 CARD32 N_GLOBS
@@ -572,16 +605,18 @@ GlobList:
 GlobEntry:
 4 CARD32 GLOB_OFFSET
 4 CARD32 MIME_TYPE_OFFSET
+4 CARD32 PRIORITY
 
-SuffixTree:
+ReverseSuffixTree:
 4 CARD32 N_ROOTS
 4 CARD32 FIRST_ROOT_OFFSET
 
-SuffixTreeNode:
+ReverseSuffixTreeNode:
 4 CARD32 CHARACTER
 4 CARD32 MIME_TYPE_OFFSET
 4 CARD32 N_CHILDREN
 4 CARD32 FIRST_CHILD_OFFSET
+4 CARD32 PRIORITY
 
 MagicList:
 4 CARD32 N_MATCHES
@@ -612,12 +647,22 @@ NamespaceEntry:
 4 CARD32 NAMESPACE_URI_OFFSET
 4 CARD32 LOCAL_NAME_OFFSET
 4 CARD32 MIME_TYPE_OFFSET
+
+GenericIconsList:
+IconsList:
+4 CARD32 N_ICONS
+8*N_ICONS IconListEntry
+
+IconListEntry:
+4 CARD32 MIME_TYPE_OFFSET
+4 CARD32 ICON_NAME_OFFSET
 </programlisting>
  <para>
 Lists in the file are sorted, to enable binary searching. The list of
 aliases is sorted by alias, the list of literal globs is sorted by the
 literal. The SuffixTreeNode siblings are sorted by character.
-The list of namespaces is sorted by namespace uri.
+The list of namespaces is sorted by namespace uri. The list of icons
+is sorted by mimetype.
  </para>
  <para>
 Identical globs are stored in the suffix tree by appending suffix
@@ -698,31 +743,48 @@ If a MIME type is provided explicitly (e
 email attachment, an extended attribute or some other means) then that should
 be used instead of guessing.
  </para></listitem>
+
  <listitem><para>
-If no explicit type is present, magic rules with a priority of 80 or more
-should be tried next. These rules have a very low false-positive rate.
- </para></listitem>
- <listitem><para>
-If there is still no match, the glob rules should be applied to the name to
-get the type.
+Otherwise, start by doing a glob match of the filename. If one or more glob matches, and all the
+matching globs result in the same mimetype, use that mimetype as the result.
  </para></listitem>
+
  <listitem><para>
-If no glob rules match, the remaining magic rules should be tried next.
+If the glob matching fails or results in multiple conflicting mimetypes, read the
+contents of the file and do magic sniffing on it. If no magic rule matches the data (or if
+the content is not availible), use the default type of application/octet-stream for
+binary data, or text/plain for textual data. If there was no glob match, or the priority
+of the matched magic rule is 80 or more (these rules have very low false-positive rate),
+use the magic match as the result.
+ </para><para>
+Note: Checking the first 32 bytes of the file for ASCII control characters is
+a good way to guess whether a file is binary or text, but note that files with high-bit-set
+characters should still be treated as text since these can appear in UTF-8 text,
+unlike control characters.
+ </para></listitem>
+
+ <listitem><para>
+If any of the mimetypes resulting from a glob match is equal to or a subclass of
+the result from the magic sniffing, use this as the result. This allows us to for example
+distinguish text files called "foo.doc" from MS-Word files with the same name, as the
+magic match for the MS-Word file would be application/x-ole-storage which the MS-Word type
+inherits.
  </para></listitem>
+
  <listitem><para>
-If nothing matches, the default type of application/octet-stream should be used
-for binary data, or text/plain for textual data. Checking the first 32
-bytes of the file for ASCII control characters is a good way to guess
-whether a file is binary or text, but note that files with high-bit-set
-characters should still be treated as text since these can appear in UTF-8
-text, unlike control characters.
+Otherwise use the result of the glob match that has the highest priority.
  </para></listitem>
  </itemizedlist>
  </para>
  <para>
-There are several reasons for checking most of the glob patterns before the magic.
-Some applications don't check the magic at all, and this makes it more likely
-that both will get the same type. Users can easily understand why calling their
+There are several reasons for checking the glob patterns before the magic.
+First of all doing magic sniffing is very expensive as reading the contents of the files
+causes a lot of seeks, which is very expensive. Secondly, some applications don't check
+the magic at all (sometimes the content is not availible or to slow to read), and this
+makes it more likely that both will get the same type.
+ </para>
+ <para>
+Also, users can easily understand why calling their
 text file <filename>README.mp3</filename> makes the system think it's an MP3,
 whereas they have trouble understanding why their computer thinks
 <filename>README.txt</filename> is a PostScript file. If the system guesses wrongly,
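For anyone who wants to experiment with the proposed order, here is a minimal Python sketch of one reading of the rules in the patch. All names are hypothetical: glob_matches is a list of (mimetype, priority) pairs from matching the filename, magic_match is a (mimetype, priority) pair or None, and parents maps a mimetype to the set of types it directly inherits from. The sketch assumes that when no magic rule matches, the default type (text/plain or application/octet-stream) stands in for the magic result in the later steps; the patch text could be read differently on that point.

```python
def is_subclass(mimetype, ancestor, parents):
    """True if `mimetype` inherits from `ancestor`, directly or indirectly."""
    seen = set()
    stack = [mimetype]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent == ancestor:
                return True
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return False


def resolve_mimetype(glob_matches, magic_match, parents, looks_like_text=False):
    """Pick a mimetype from glob and magic results (sketch, see assumptions)."""
    # All matching globs agree on one type: use it without sniffing.
    if len({mimetype for mimetype, _ in glob_matches}) == 1:
        return glob_matches[0][0]

    # No glob match, or conflicting ones: fall back to the magic result,
    # or to the default type if nothing was sniffed (assumed priority 0).
    if magic_match is None:
        sniffed = "text/plain" if looks_like_text else "application/octet-stream"
        magic_priority = 0
    else:
        sniffed, magic_priority = magic_match

    if not glob_matches or magic_priority >= 80:
        return sniffed

    # Prefer a glob type that equals or refines the sniffed type.
    for mimetype, _ in glob_matches:
        if mimetype == sniffed or is_subclass(mimetype, sniffed, parents):
            return mimetype

    # Otherwise the glob with the highest priority wins.
    return max(glob_matches, key=lambda match: match[1])[0]
```

With the "foo.doc" case from the patch, and assuming the name matches globs for both application/msword and text/plain, a file sniffed as application/x-ole-storage resolves to application/msword, while a file with no magic match that looks like text resolves to text/plain.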
_______________________________________________
xdg mailing list
http://lists.freedesktop.org/mailman/listinfo/xdg
