On Tuesday 18 September 2007, Alexander Larsson wrote:
> On Tue, 2007-09-18 at 11:18 +0200, Patryk Zawadzki wrote:
> > On 9/18/07, Alexander Larsson <[EMAIL PROTECTED]> wrote:
> > > On Tue, 2007-09-18 at 00:51 +0200, David Faure wrote:
> > > > On Tuesday 28 August 2007, Alexander Larsson wrote:
> > > > > If several globs matches, and sniffing fails, or doesn't help:
> > > > >   fall back to the first glob match
> > > > >   (maybe we should do something better here?)
> > > >
> > > > Hmm, I just found the case of "README.txt", which could either be
> > > > "text/plain" due to *.txt or "text/x-readme" due to README*. Which one
> > > > should we pick? The second pattern "looks" more specific to my eyes so
> > > > it should probably win, but how should we quantify that?
> > > > Should we take the longest pattern?
> > >
> > > Yeah, this is tricky. I think the longest pattern is the traditional way
> > > to solve things like that. It will probably work well enough for us.
> > 
> > Isn't it enough just to check whether one of them is a subclass of the
> > other? If so, pick the more specific one.
> 
> That only works in the case of subclasses though, which might not always
> be the case. Seems right to use that when it's possible though.

Agreed. Do we also agree that this handling of multiple glob matches can be
done right away inside glob-matching? No need to delay that to the
"If several globs matches" resolution (after sniffing), IMHO.

So the new algorithm would be the one described by Alexander, with something
like this prepended:
Glob-matching should prefer a derived mimetype over its base mimetype, and
longer matches over shorter ones. However, if two globs of the same length
match the file, and the two matches are not related in the inheritance tree,
then we have a "glob conflict", which will be resolved below.
"If several globs matches" in Alexander's algorithm really becomes "In case of
a glob conflict", i.e. two or more mimetypes with the same glob (like *.doc or
*.ogg).

[Well, technically you could invent patterns like foo.* and *.doc, so that
foo.doc matches both and you don't have a "longer match", but this is really a
border case (and would simply be handled as a "glob conflict" too).]

My problem is that I can't test the subclass case: README* is the only glob
that has a * but not as the first character, so it's the only one that can
give conflicts...
So after implementing "take longest match", I see no way of testing "take
subclass", since in the case of README.txt it is the longest match anyway...
I could use canned data, but it also means that we might not have a use case
for it at the moment :)
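
A rough sketch of the prepended rule, in Python, just to make the discussion
concrete. It is not the KDE or xdgmime code. Assumptions: the GLOBS and
SUBCLASS_OF tables are canned data (only *.txt -> text/plain and
README* -> text/x-readme come from this thread; the subclass link between
them and every "x-canned" type are made up), the function names are
hypothetical, and the subclass check is applied before the length check,
which the text above does not spell out.

```python
from fnmatch import fnmatchcase

# Canned glob table: (pattern, mimetype). Apart from the first two entries,
# all patterns and mimetypes are invented for illustration.
GLOBS = [
    ("*.txt", "text/plain"),
    ("README*", "text/x-readme"),
    ("*.ogg", "audio/x-canned-ogg"),
    ("*.ogg", "video/x-canned-ogg"),
    ("foo.*", "application/x-canned-foo"),
    ("*.doc", "application/x-canned-doc"),
    ("data.*", "application/x-canned-base"),
    ("*.sub", "application/x-canned-derived"),
]

# Canned inheritance data: child mimetype -> list of parent mimetypes.
# The text/x-readme entry is an assumption, not taken from the database.
SUBCLASS_OF = {
    "text/x-readme": ["text/plain"],
    "application/x-canned-derived": ["application/x-canned-base"],
}


def is_ancestor(base, derived):
    """Return True if `base` is a strict ancestor of `derived`."""
    seen = set()
    stack = list(SUBCLASS_OF.get(derived, ()))
    while stack:
        parent = stack.pop()
        if parent == base:
            return True
        if parent not in seen:
            seen.add(parent)
            stack.extend(SUBCLASS_OF.get(parent, ()))
    return False


def glob_match(filename):
    """Return the candidate mimetypes for `filename`.

    An empty list means no glob matched, one entry is a clear winner, and
    two or more entries are a "glob conflict" left to the later steps.
    """
    matches = [(p, m) for p, m in GLOBS if fnmatchcase(filename, p)]
    mimes = {m for _, m in matches}
    # Prefer derived over base: drop any match that is an ancestor of
    # another matched mimetype.
    matches = [(p, m) for p, m in matches
               if not any(is_ancestor(m, other) for other in mimes)]
    if not matches:
        return []
    # Prefer longer patterns over shorter ones.
    longest = max(len(p) for p, _ in matches)
    result = []
    for p, m in matches:
        if len(p) == longest and m not in result:
            result.append(m)
    return result
```

With this canned data, glob_match("README.txt") gives only text/x-readme,
glob_match("foo.doc") and glob_match("song.ogg") each give two candidates
(a glob conflict), and glob_match("data.sub") picks the derived type even
though its pattern is the shorter one, which is the "take subclass" case
that the real database does not let me exercise.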

==

OK. Does anyone know which other implementations of shared-mime-info are
around? xdgmime was mentioned; I don't know who knows its code well enough to
modify it once we all agree on the spec changes, but the first question is:
who else needs to approve those spec changes? Rox, I assume? Thomas Leonard
CC'ed (see thread for more info, we covered "preferring globs over contents"
before talking about glob conflicts).

Thanks for the feedback. Now the KDE glob-matching code takes the longest
match :)

-- 
David Faure, [EMAIL PROTECTED], sponsored by Trolltech to work on KDE,
Konqueror (http://www.konqueror.org), and KOffice (http://www.koffice.org).
_______________________________________________
xdg mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/xdg
