On Mon, 2007-09-17 at 22:16 +0200, David Faure wrote:
> On Tuesday 28 August 2007, Alexander Larsson wrote:
> > On Fri, 2007-08-24 at 16:44 +0200, Alexander Larsson wrote:
> > > This is my main problem with hi-priority sniffing. It either causes very
> > > bad performance behaviour in the file manager, or it adds user-visible
> > > confusion as to the type of some files.
> > >
> > > I personally prefer to drop the hi-prio sniffing, and use sniffing only
> > > on conflicts and on extension match failure. This way you get only one,
> > > well defined, usable everywhere, canonical type (well, there is also the
> > > first-scan "fast mimetype", but you never open a file based on that). It
> > > also means that any user problem with file types is solvable by the user
> > > (just rename the problematic file).
> Agreed.
>
> > Here is what I just implemented for gvfs:
> >
> > If only one glob matches, use that
> >
> > If no glob matches, sniff and use that
> >
> > If several globs matches, and sniffing gives a result we do:
> >   if sniffed prio >= 80, use sniffed type
> >   for glob_match in glob_matches:
> >     if glob_match is subclass or equal to sniffed_type, use glob_match
> >
> > If several globs matches, and sniffing fails, or doesn't help:
> >   fall back to the first glob match
> OK.
>
> > (maybe we should do something better here?)
> Can't think of any further heuristic, actually. Apart from using the
> "is text vs is binary" heuristic (already in the spec) to choose between
> a text-like and a binary-like format, but this is just one case.
> (and I thought we needed that until I realized that for the msword case
> we have x-ole-storage which is much better than just "is not text"; but
> maybe there's another case where we don't have a useful base mimetype).
>
> > This algorithm only sniffs when there is some uncertainty with the
> > extension matching (thus, its usable for a file manager).
> Yes.
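
For reference, the selection rules quoted above can be written out roughly like this. It is only a sketch of the pseudocode, not the gvfs code: pick_mime_type, sniff and is_subclass_or_equal are made-up names, sniff is assumed to return a (type, priority) pair or None, and returning None when nothing matches at all is my assumption, since the quoted rules don't cover that case.

```python
from typing import Callable, Optional, Sequence, Tuple

HIGH_PRIORITY = 80  # threshold taken from the quoted pseudocode


def pick_mime_type(
    glob_matches: Sequence[str],
    sniff: Callable[[], Optional[Tuple[str, int]]],
    is_subclass_or_equal: Callable[[str, str], bool],
) -> Optional[str]:
    """Choose one type from glob matches, sniffing only when needed.

    All three parameters are hypothetical stand-ins for whatever the
    real implementation uses.
    """
    # Exactly one glob match: trust the extension and never read the file.
    if len(glob_matches) == 1:
        return glob_matches[0]

    # Only reached when the extension is unknown or ambiguous.
    sniffed = sniff()

    # No glob match: use the sniffed type if there is one.
    # (Returning None on total failure is an assumption.)
    if not glob_matches:
        return sniffed[0] if sniffed else None

    # Several glob matches and sniffing gave a result.
    if sniffed is not None:
        sniffed_type, priority = sniffed
        if priority >= HIGH_PRIORITY:
            return sniffed_type
        for glob_match in glob_matches:
            if is_subclass_or_equal(glob_match, sniffed_type):
                return glob_match

    # Sniffing failed or did not help: fall back to the first glob match.
    return glob_matches[0]
```

Passing sniff as a callable keeps the "one glob match" path free of any file I/O, which is the property that makes this usable in a file manager.
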
>
> And actually I think this is much more correct, not just faster.
> With the current spec, if I create a "information.txt" file that says
> "The tag to use for SMIL data is <smil>", then it would be detected as
> application/smil because of this high-priority magic rule:
>     <magic priority="80">
>       <match type="string" value="<smil" offset="0:256"/>
>     </magic>
> We can't have that :) If it's information.txt then it's a text file for
> sure :)
> And if it has an unknown extension or none then we'll have a false
> positive here, but that's harder to fix. In any case this looks like a
> fragile rule for a 80-priority rule...
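
To make the false positive concrete, here is a rough sketch of how a string match with an offset range behaves. It assumes the range "0:256" means the value may start at any offset from 0 to 256 inclusive; string_magic_matches is a made-up helper, not a function from any real library.

```python
def string_magic_matches(data: bytes, value: bytes, start: int, end: int) -> bool:
    """Return True if `value` begins at any offset in [start, end] of `data`.

    Hypothetical helper; assumes the offset range is inclusive at both ends.
    """
    return any(data.startswith(value, offset) for offset in range(start, end + 1))


# The plain-text example from the quoted message.
content = b"The tag to use for SMIL data is <smil>"

# "<smil" starts at byte 32, well inside 0:256, so the rule fires
# even though the file is ordinary prose.
false_positive = string_magic_matches(content, b"<smil", 0, 256)
```

Any text file that merely mentions the tag in its first 256 bytes trips the rule, which is why it looks too loose for priority 80.
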

We should have better magic instead, I guess. People get miffed when their files are detected solely from the file extension, e.g. when loading a playlist from a crappy website that uses PHP, you end up with a /tmp/file.php which is actually a playlist and should be handled as such.