Hi Johannes,

Many questions in a single email :-)

On Tuesday 09 August 2011 18:54:34 Johannes Pfau wrote:
> The LiteralList:
> Is it safe to assume that the list is consistent?
> I.e. could there be a case insensitive "Hello" entry and another,
> conflicting, case insensitive "hello" entry?
> Can the list contain two entries which are only different in casing, one
> case-sensitive, one not? I.e. can the list contain a case sensitive
> "Hello" entry and a case insensitive "hello" entry? If so should "Hello"
> match the case sensitive variant ore the case insensitive one?

That would be ambiguous, so I would say we should never get into such a case.
We can either solve it with a different weight for the case-sensitive match,
or using case sensitivify for both matches (like we do for *.C and *.c)

[skipping a few questions about mime.cache which I don't know yet]
> ReverseSuffixTreeNode.CHARACTER: What encoding is used? I guess UTF32?
UTF-8, rather? I don't think this code uses UTF-32 anywhere.
Not sure it was tested for non-ascii globs anyway.

> [...]
> Regarding the recommended checking order:
> "Otherwise, start by doing a glob match of the filename."
> In which order should LITERALs the RST and GLOBS be checked?
> Should all 3 of those always be checked?
> For example if a LITERAL match is found, should the RST/GLOBS still be
> checked? (guess not)

Interestingly enough, implementations that don't use mime.cache don't 
distinguish between these types of globs. So I think they should all be 
checked, and then if you get more than one match, you go into the "multiple 
globs matched" resolution (i.e. sniffing and sorting it out).

> "If any of the mimetypes resulting from a glob match is equal to or a
> subclass of the result from the magic sniffing, use this as the result."
> Should this check be done against _all_ matching GLOBS/RST entries, or
> against the list obtained in step 2 ("only biggest weight. If the
> patterns are different, keep only globs with the longest pattern")

If this is about that "glob conflict" resolution, then the goal is to choose 
which of the matched globs is best. So it should be only "against the list 
obtained by glob matching". Otherwise you get funny results.
The extension is still a pretty good hint, we should use it, not just rely on 
fragile magic-only.

> "Otherwise use the result of the glob match that has the highest weight."
> What if there are multiple, different matches with same length & weight?
> Return "application/octet-stream" or the first match?

One of the matches, e.g. the first match.

> The spec assumes there's at most one MAGIC match. What if there are
> multiple matches? Use the one with the highest  PRIORITY? What if there
> are multiple matches with the same PRIORITY?

Good question. In practice my code stops at the first match. I rely on the 
fact that the magic rules are sufficiently well written. But "the one with the 
highest priority" seems like a safer choice indeed. And then if there are two 
with the same priority, we have no choice but to pick one.

> I also had some issues with the test suite:
> 
> My "by-Name" implementation fails for the following files:
> test-template.dot, aportis.pdb, sqlite2.kexi, subtitle-microdvd.sub,
> simple-obj-c.m, linguist.ts, test.ogg

Yep, mine too.

> There's a common pattern with all those tests:  My implementation finds
> multiple equal matches in the tree and bails out according to the spec:
> "If a matching pattern is provided by two or more MIME types,
> applications SHOULD not rely on one of them. They are instead supposed
> to use magic data (see below)"
> 
> Example for test-template.dot:
> Tree: '[{Type: 'application/msword-template' Weight: '50' CaseSensitive:
> 'false' Flags: '[0,0,0]'},{Type: 'text/vnd.graphviz' Weight: '50'
> CaseSensitive: 'false' Flags: '[0,0,0]'}]' Expected:
> 'application/msword-template'
> The test suite always assumes the first of those results is returned?
> Which implementation is correct in this case? I see no reason why
> 'application/msword-template' should be used here, 'text/vnd.graphviz'
> has exactly the same weight, flags and matching pattern.

I completely agree. I think the test description should be rewritten with 
another syntax, in order to be able to express:
test-template.dot:
  match-by-name = application/msword-template or text/vnd.graphviz
  match-by-data = application/x-ole-storage
  match-by-file = application/msword-template

[See, a good example of why the glob-conflict resolution should choose one of 
the matched globs, instead of just returning x-ole-storage]

Such a syntax would also make the test output much clearer, rather than 110 
expected failures from the current "oxx" solution.

> Another question: why do the bug-30656-xchat.conf/menu.ini tests return
> "application/octet-stream"? Those are text files, so if text/binary
> guessing is used the result should be "text/plain"? Is the spec out of
> date and binary/text guessing is obsolete?

No, binary/text guessing was never implemented in xdgmime, and that's a bug 
indeed. It turns out that I fixed it last week, see attached diff.
I just committed it, but I'm attaching the patch so that Alexander or Bastien 
can review it, this is my first commit to xdgmime [which seems to still be in 
CVS? I thought everything had moved to git?]

My patch also implements "a file with size zero and no known extension has 
mimetype application/x-zerosize", as was the intent of the x-zerosize 
mimetype, but it was never explicitely written in the spec or in that code.

> [skipped questions about byte order, would need further research]

-- 
David Faure, [email protected], http://www.davidfaure.fr
Sponsored by Nokia to work on KDE, incl. Konqueror (http://www.konqueror.org).
Index: src/xdgmime.c
===================================================================
RCS file: /cvs/mime/xdgmime/src/xdgmime.c,v
retrieving revision 1.38
diff -u -p -r1.38 xdgmime.c
--- src/xdgmime.c	2 Oct 2009 10:52:47 -0000	1.38
+++ src/xdgmime.c	25 Sep 2011 06:39:06 -0000
@@ -64,6 +64,8 @@ XdgMimeCache **_caches = NULL;
 static int n_caches = 0;
 
 const char xdg_mime_type_unknown[] = "application/octet-stream";
+const char xdg_mime_type_empty[] = "application/x-zerosize";
+const char xdg_mime_type_textplain[] = "text/plain";
 
 
 enum
Index: src/xdgmime.h
===================================================================
RCS file: /cvs/mime/xdgmime/src/xdgmime.h,v
retrieving revision 1.19
diff -u -p -r1.19 xdgmime.h
--- src/xdgmime.h	3 Jun 2008 13:18:05 -0000	1.19
+++ src/xdgmime.h	25 Sep 2011 06:39:06 -0000
@@ -68,6 +68,8 @@ typedef void (*XdgMimeDestroy)  (void *u
 #define xdg_mime_register_reload_callback     XDG_ENTRY(register_reload_callback)
 #define xdg_mime_remove_callback              XDG_ENTRY(remove_callback)
 #define xdg_mime_type_unknown                 XDG_ENTRY(type_unknown)
+#define xdg_mime_type_empty                   XDG_ENTRY(type_empty)
+#define xdg_mime_type_textplain               XDG_ENTRY(type_textplain)
 #define xdg_mime_get_icon                     XDG_ENTRY(get_icon)
 #define xdg_mime_get_generic_icon             XDG_ENTRY(get_generic_icon)
 
@@ -77,7 +79,11 @@ typedef void (*XdgMimeDestroy)  (void *u
 #endif
 
 extern const char xdg_mime_type_unknown[];
+extern const char xdg_mime_type_empty[];
+extern const char xdg_mime_type_textplain[];
 #define XDG_MIME_TYPE_UNKNOWN xdg_mime_type_unknown
+#define XDG_MIME_TYPE_EMPTY xdg_mime_type_empty
+#define XDG_MIME_TYPE_TEXTPLAIN xdg_mime_type_textplain
 
 const char  *xdg_mime_get_mime_type_for_data       (const void *data,
 						    size_t      len,
Index: src/xdgmimecache.c
===================================================================
RCS file: /cvs/mime/xdgmime/src/xdgmimecache.c,v
retrieving revision 1.24
diff -u -p -r1.24 xdgmimecache.c
--- src/xdgmimecache.c	2 Oct 2009 10:52:47 -0000	1.24
+++ src/xdgmimecache.c	25 Sep 2011 06:39:06 -0000
@@ -280,6 +280,12 @@ cache_magic_lookup_data (XdgMimeCache *c
 
   int j, n;
 
+  if (len == 0)
+    {
+      *prio = 100;
+      return XDG_MIME_TYPE_EMPTY;
+    }
+
   *prio = 0;
 
   list_offset = GET_UINT32 (cache->buffer, 24);
@@ -667,7 +673,9 @@ cache_get_mime_type_for_data (const void
 {
   const char *mime_type;
   int i, n, priority;
+  unsigned char* chardata;
 
+  chardata = (unsigned char *) data;
   priority = 0;
   mime_type = NULL;
   for (i = 0; _caches[i]; i++)
@@ -699,7 +707,14 @@ cache_get_mime_type_for_data (const void
 	return mime_types[n];
     }
 
-  return XDG_MIME_TYPE_UNKNOWN;
+  /* Unknown: check if binary or text */
+  for (i = 0; i < 32 && i < len; ++i)
+    {
+       if (chardata[i] < 32 && chardata[i] != 9 && chardata[i] != 10 && chardata[i] != 13)
+         return XDG_MIME_TYPE_UNKNOWN; /* binary data */
+    }
+
+  return XDG_MIME_TYPE_TEXTPLAIN;
 }
 
 const char *
Index: src/xdgmimemagic.c
===================================================================
RCS file: /cvs/mime/xdgmime/src/xdgmimemagic.c,v
retrieving revision 1.26
diff -u -p -r1.26 xdgmimemagic.c
--- src/xdgmimemagic.c	2 Jun 2008 15:58:39 -0000	1.26
+++ src/xdgmimemagic.c	25 Sep 2011 06:39:06 -0000
@@ -665,6 +665,13 @@ _xdg_mime_magic_lookup_data (XdgMimeMagi
   int n;
   int prio;
 
+  if (len == 0)
+    {
+      if (result_prio)
+        *result_prio = 100;
+      return XDG_MIME_TYPE_EMPTY;
+    }
+
   prio = 0;
   mime_type = NULL;
   for (match = mime_magic->match_list; match; match = match->next)
_______________________________________________
xdg mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/xdg

Reply via email to