On Mon, May 4, 2009 at 7:37 PM, Guilhem Bonnefille <guilhem.bonnefi...@gmail.com> wrote:
>
> Hi,
>
> Thanks for trying to solve this bug.

I'd like to tell all my friends I've solved a Viking bug ;) and, well, it almost bit me...

>
> I prefer to avoid, as far as possible, any detection based over
> filename extension: this is not really UNIX-friendly.

I think the magic-number approach is too complicated for an XML document, and it adds little, if anything, to usability. Even now, setting the BOM problem aside, Viking will not accept GPX files that do not start with "<?xml". I'm not sure what the GPX spec says, but I'm pretty sure the XML standard does not require a document to start with:

<?xml version="1.0"?>

And of course, we need to consider the different character encodings.

>
>
> As far as I know, the matter concern the BOM, ie some extra bytes
> added at the begining of the XML file. So, a probably easy way to
> solve the matter is to add a "tolerance" in the search process. For
> example, try to locate the MAGIC at the begining, or somewhere
> ignoring the 2~3 first characters.

Several problems with the magic-number approach for GPX:

1) The current magic number for GPX, "<?xml", is too liberal. It will match any XML file that starts with "<?xml", GPX or not, which is not a good criterion in the first place.

2) The current magic number for GPX, "<?xml", is too restrictive. It won't match valid GPX files that do not start with "<?xml".

3) Character encoding: at least UTF-8, UTF-16LE and UTF-16BE must be supported, and it would be nicer to also support UTF-32LE and UTF-32BE. This breaks most strncmp() checks: in UTF-16, every ASCII character of "<?xml" is paired with a zero byte, so a byte-wise comparison against the UTF-8 string fails.

4) Assuming we change the magic number to "<gpx", the first tag, you can't really be sure of the offset in the file at which "<gpx" appears. I'm pretty sure an XML comment of any length may appear before "<gpx" in a valid GPX file. Sure, most GPX files will have "<gpx" (as UTF-8, UTF-16LE or UTF-16BE) very near the start of the file, but it is very bad to reject a valid GPX file just because it has a longer comment at the start.
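To illustrate how much a "tolerant" magic check would really involve, here is a minimal sketch in C, assuming the first few kilobytes of the file are already in a memory buffer. The names sniff_encoding, root_element and looks_like_gpx are hypothetical, not Viking's actual code. BOM detection loosely follows the autodetection table in Appendix F of the XML 1.0 spec (point 3), and the root element is found by skipping the prolog rather than by looking at a fixed offset (point 4). Only the UTF-8 path is implemented; for UTF-16/32 input the sketch answers "can't tell" (-1) instead of rejecting the file.

```c
#include <stddef.h>
#include <string.h>

/* Encodings this sketch can tell apart (hypothetical names). */
typedef enum { ENC_UTF8, ENC_UTF16LE, ENC_UTF16BE, ENC_UTF32LE, ENC_UTF32BE } xml_enc;

/* Guess the encoding from a BOM, or from how the first '<' is laid out,
 * loosely following Appendix F of XML 1.0. *skip receives the BOM length.
 * Falls back to UTF-8 when nothing else matches. */
static xml_enc sniff_encoding(const unsigned char *b, size_t n, size_t *skip)
{
    static const struct { unsigned char sig[4]; size_t len, bom; xml_enc enc; } t[] = {
        {{0x00, 0x00, 0xFE, 0xFF}, 4, 4, ENC_UTF32BE},
        {{0xFF, 0xFE, 0x00, 0x00}, 4, 4, ENC_UTF32LE}, /* must precede UTF-16LE */
        {{0xEF, 0xBB, 0xBF},       3, 3, ENC_UTF8},
        {{0xFE, 0xFF},             2, 2, ENC_UTF16BE},
        {{0xFF, 0xFE},             2, 2, ENC_UTF16LE},
        {{0x00, 0x00, 0x00, 0x3C}, 4, 0, ENC_UTF32BE}, /* no BOM, starts with '<' */
        {{0x3C, 0x00, 0x00, 0x00}, 4, 0, ENC_UTF32LE},
        {{0x00, 0x3C},             2, 0, ENC_UTF16BE},
        {{0x3C, 0x00},             2, 0, ENC_UTF16LE},
    };
    for (size_t i = 0; i < sizeof t / sizeof t[0]; i++) {
        if (n >= t[i].len && memcmp(b, t[i].sig, t[i].len) == 0) {
            *skip = t[i].bom;
            return t[i].enc;
        }
    }
    *skip = 0;
    return ENC_UTF8;
}

/* Bounded search for needle in [p, end); returns a pointer just past
 * the match, or NULL if it is not found. */
static const char *skip_past(const char *p, const char *end, const char *needle)
{
    size_t k = strlen(needle);
    for (; end - p >= (ptrdiff_t)k; p++)
        if (memcmp(p, needle, k) == 0)
            return p + k;
    return NULL;
}

/* ASCII-compatible input only: skip whitespace, the XML declaration and
 * other PIs, comments of any length and a simple DOCTYPE (no internal
 * subset), then return a pointer to the root element's name, or NULL. */
static const char *root_element(const char *p, const char *end)
{
    while (p && p < end) {
        if (*p == ' ' || *p == '\t' || *p == '\r' || *p == '\n')
            p++;
        else if (end - p >= 2 && memcmp(p, "<?", 2) == 0)
            p = skip_past(p + 2, end, "?>");
        else if (end - p >= 4 && memcmp(p, "<!--", 4) == 0)
            p = skip_past(p + 4, end, "-->");
        else if (end - p >= 2 && memcmp(p, "<!", 2) == 0)
            p = skip_past(p + 2, end, ">");
        else if (*p == '<')
            return p + 1;
        else
            return NULL; /* character data before the root: not well-formed */
    }
    return NULL;
}

/* 1 = root element is <gpx>, 0 = it is something else or was not found,
 * -1 = can't tell (UTF-16/32 would need transcoding first). */
static int looks_like_gpx(const unsigned char *buf, size_t n)
{
    size_t skip;
    if (sniff_encoding(buf, n, &skip) != ENC_UTF8)
        return -1;
    const char *end = (const char *)buf + n;
    const char *name = root_element((const char *)buf + skip, end);
    return name && end - name >= 4 && memcmp(name, "gpx", 3) == 0
        && name[3] != '\0' && strchr(" \t\r\n/>", name[3]) != NULL;
}
```

Even this ignores prefixed root names, DOCTYPEs with an internal subset and actual UTF-16 decoding, which is rather the point: handing the file to the XML parser and letting it fail is simpler than keeping such a check correct.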
>
>
> An other more evolved solution can be to use a mime-type framework.
> Glib/Gtk/Gnome certainly has one.

I'm no expert on the subject, but a quick check shows that important GNOME MIME types are identified by magic number OR by file extension. For example, the Totem open dialog offers a real Flash video that lacks the .flv extension (based on the magic value), AND ALSO any *.flv file regardless of its content. Nautilus behaves the same way: double-click on 123.flv and it will be opened in Totem even if it's not a Flash file. Also worth mentioning: on my system, Nautilus recognizes UTF-8 GPX files as XML (based on the "<?xml" prefix), but not UTF-16 GPX files.

It seems to me that the magic-number system in GNOME is more of an educated guess to accompany the more traditional file-extension filtering than a bulletproof system you can rely on to refuse to open an input file.

_______________________________________________
Viking-devel mailing list
Viking-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/viking-devel
Viking home page: http://viking.sf.net/
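The "magic OR extension" behaviour described above could be mirrored in an open-dialog filter roughly like this. It is a sketch under assumptions: the names sniff_result, has_gpx_extension and offer_as_gpx are hypothetical, and the content sniff is assumed to be done elsewhere and reported as a tri-state. Either signal is enough to offer a file as GPX, and an inconclusive sniff never counts against it.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Outcome of some content sniff performed elsewhere (hypothetical). */
typedef enum { SNIFF_NO = 0, SNIFF_YES = 1, SNIFF_UNKNOWN = -1 } sniff_result;

/* True if filename ends in ".gpx", ignoring case. */
static bool has_gpx_extension(const char *filename)
{
    static const char ext[] = ".gpx";
    size_t n = strlen(filename), k = sizeof ext - 1;
    if (n < k)
        return false;
    for (size_t i = 0; i < k; i++)
        if (tolower((unsigned char)filename[n - k + i]) != ext[i])
            return false;
    return true;
}

/* Filter policy: offer the file as GPX if EITHER the extension or the
 * content sniff says so; neither signal alone can veto the other. */
static bool offer_as_gpx(const char *filename, sniff_result sniff)
{
    return has_gpx_extension(filename) || sniff == SNIFF_YES;
}
```

Under this policy the filter only decides what the dialog suggests; a file the user picks explicitly would still go straight to the parser, which is the only component that can actually tell whether it is valid GPX.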