On Mon, May 4, 2009 at 7:37 PM, Guilhem Bonnefille
<guilhem.bonnefi...@gmail.com> wrote:
>
> Hi,
>
> Thanks for trying to solve this bug.

I'd like to tell all my friends I've solved a viking bug ;)
and, it almost bit me...

>
> I prefer to avoid, as far as possible, any detection based on the
> filename extension: this is not really UNIX-friendly.

I think that the magic number approach is too complicated for an xml
document, and it does not contribute that much to the usability, if at
all.

Even now, assuming we do not have the BOM problem, viking will not
accept gpx files that do not start with "<?xml". I'm not sure what the
gpx spec says, but I'm pretty sure that the xml standard does not
require a document to start with: <?xml version="1.0"?>
And of course, we need to consider the different character encodings.
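
To make the BOM part concrete, here is a rough sketch in Python (for
brevity; this is not viking code, and the name sniff_bom is made up for
illustration). It only shows how a leading byte order mark could be
recognized and skipped before any comparison is made:

```python
import codecs

# Order matters: the UTF-32LE BOM (FF FE 00 00) starts with the
# UTF-16LE BOM (FF FE), so the 4-byte marks are checked first.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return (encoding, data_without_bom); encoding is None if no BOM.

    Hypothetical helper. A file without a BOM may still be in any of
    these encodings, so None means "unknown", not "utf-8".
    """
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return encoding, data[len(bom):]
    return None, data
```

Even with the BOM skipped, a byte-wise comparison against "<?xml" only
works for utf-8; for utf-16 the remaining bytes would have to be decoded
first.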

>
>
> As far as I know, the matter concerns the BOM, i.e. some extra bytes
> added at the beginning of the XML file. So, a probably easy way to
> solve the matter is to add a "tolerance" in the search process. For
> example, try to locate the MAGIC at the beginning, or somewhere
> ignoring the first 2~3 characters.

Several problems with the magic number approach for gpx:
1) The current magic number for gpx, "<?xml", is too liberal. It will
catch any xml file which starts with it, which is not very good in the
first place.

2) The current magic number for gpx, "<?xml", is too restrictive. It
won't catch valid gpx files that do not start with "<?xml".

3) Character encoding: At least utf-8, utf-16le and utf-16be must
be supported. It would be nice to also support utf-32le and utf-32be.
This affects most of the strncmp() calls, since a byte-wise comparison
against an ASCII string only matches utf-8 input.

4) Assuming we change the magic number to "<gpx", the 'first' tag: you
can't really be sure of the offset in the file where that "<gpx"
appears. I'm pretty sure that an xml comment of any length may appear
before the "<gpx" in a valid gpx file.
Sure, most gpx files will have "<gpx" (as utf-8, utf-16le or utf-16be)
very near the start of the file, but it is very bad to reject a valid
gpx file because it had a longer comment at the start.
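
For comparison, here is a rough sketch of the alternative to a fixed
magic number: let an xml parser find the root element and look at its
name. It is in Python for brevity (not viking code; looks_like_gpx is a
made-up name), and it assumes the parser copes with the BOM and the
encoding itself, which the expat-based parser in the Python standard
library does for utf-8 and utf-16, but not for utf-32:

```python
import io
import xml.etree.ElementTree as ET


def looks_like_gpx(data: bytes) -> bool:
    """Hypothetical check: True if the root element is named 'gpx'.

    The xml declaration is optional, and comments of any length may
    precede the root element, so no fixed offset is assumed.
    """
    try:
        # The first "start" event is always the root element.
        for _event, elem in ET.iterparse(io.BytesIO(data), events=("start",)):
            # Drop a "{namespace}" prefix if the document uses one.
            return elem.tag.rsplit("}", 1)[-1] == "gpx"
    except ET.ParseError:
        return False
    return False
```

The price is that the input must be well-formed xml at least up to the
root tag, which seems a fair requirement for a gpx file.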


>
>
> Another, more evolved solution could be to use a mime-type framework.
> Glib/Gtk/Gnome certainly has one.

I'm no expert on the subject, but a quick check shows that important
gnome mime types are identified using the magic number OR the file
extension. For example: the totem open dialog suggests opening a real
flash file without the .flv extension (based on the magic value), AND
ALSO *flv regardless of the file content. Nautilus behaves the same:
double click on 123.flv, and it will be opened in totem even if it's
not a flash file.

It is also worth mentioning that on my system, utf-8 gpx files are
recognized as xml files (based on the "<?xml") in nautilus, but utf-16
gpx files are not.

It seems to me that the magic number system in gnome is more of an
educated guess to accompany the more traditional file extension
filtering than a bulletproof system on the basis of which you may
decline to open a certain input file.
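
In other words, something along these lines, sketched in Python for
brevity (not viking or gnome code; may_be_gpx is a made-up name, and
the content test is deliberately crude and utf-8 only). Either hint is
enough to try the gpx loader, and neither is used to refuse a file:

```python
import os


def may_be_gpx(filename: str, head: bytes) -> bool:
    """Hypothetical 'educated guess': extension OR content hint."""
    by_extension = os.path.splitext(filename)[1].lower() == ".gpx"
    # Crude hint: only matches utf-8/ASCII input, and only if "<gpx"
    # falls within the bytes that were read into head.
    by_content = b"<gpx" in head
    return by_extension or by_content
```

A file that passes neither test could still be offered to the user and
handed to the real parser, which has the final word.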

_______________________________________________
Viking-devel mailing list
Viking-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/viking-devel
Viking home page: http://viking.sf.net/
