For the record, I think it is the responsibility of the module developer to ensure that the input to osis2mod is valid. Since there have been several versions of the OSIS spec (currently at 2.1.1), it is reasonable to ask which minimum version we would accept. I'd go with 2.0 or later. As long as Chris is the "pumpkin holder" of module creation, it is not a big deal. But without validation being done by osis2mod, there is no way to ensure this.

Even with XML validation, it is very possible that an OSIS document is not valid OSIS. Part of this is due to the milestoneability of some elements, but no schema imparts semantics. So while schema validation is important, it is not sufficient. Osis2mod needs to ensure that the OSIS is sufficiently valid for the current front-ends.

Osis2mod modifies the input into a form that is acceptable to a Sword module. Thus the round trip from input to osis2mod and out again will not match the original. For example, a module is verse based, so material that falls between verses needs to be prepended or appended to a verse.

Currently osis2mod attempts to check two things:
1) that the document is well formed (this is far from a validity check). This is an error, causing the program to exit.
2) that each verse is well formed. This is a warning.

However, my suggestion that osis2mod use a real parser was not predicated on the need for validation, but rather on the need to support all well-formed inputs.

Perhaps I am biased by Java, but I think it can be done without impacting program size significantly. In Java, the XML parser is an implementation of an interface, and at runtime it is possible to specify an available implementation. I think that if we were to do something similar in C++, perhaps choosing a SAX interface, we could wrap XMLTag with it. Then one could link in either Xerces, Sword, or some other implementation, and the size/performance cost would be appropriate for the use.

As for validation, one could have an external validator called by fork/exec on the input file. This would not increase the program size significantly.

In Him,
DM

On Apr 25, 2007, at 6:47 PM, Chris Little wrote:

> DM and I have been chatting a bit off-list about the future/function of
> osis2mod and I thought maybe we should open up the discussion a bit.
>
> Right now osis2mod (the tool for converting OSIS Bibles to Sword Bible
> modules) does some mediocre validity checking as it builds its Sword
> database. We'll never really get it perfect this way since we aren't
> doing real schema validation.
>
> DM has suggested adding a real validating parser to osis2mod (by
> embedding something like xerces or libxml), so it could spit out an
> error message if you try to import invalid OSIS.
>
> I'm not totally convinced we should do that. When I prepare modules from
> OSIS docs, I always perform validation in an external validator.
> (Personally I use Oxygen, but there are also XML Spy, MSV, topologi,
> Xerces, etc.)
>
> Do people feel that incorporating a real validator would make osis2mod
> easier to use?
>
> It could potentially cause the filesize to jump dramatically, so would
> that be acceptable?
>
> If we incorporate osis2mod into either front-ends or installmgr so that
> users could import OSIS documents directly into Sword, would that
> support or detract from the case for embedding a full validator?
>
> --Chris
>
> _______________________________________________
> sword-devel mailing list: sword-devel@crosswire.org
> http://www.crosswire.org/mailman/listinfo/sword-devel
> Instructions to unsubscribe/change your settings at above page
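
For reference, the pluggable-parser idea from the reply above (a SAX-style interface in C++ with interchangeable backends) could look roughly like the sketch below. Everything in it is an assumption made for illustration: `SaxHandler`, `SaxParser`, `SimpleParser`, `EventLog` and `makeParser` are hypothetical names, not part of the Sword or Xerces APIs, and `SimpleParser` is only a toy stand-in for an XMLTag-based tokenizer. It handles plain tags, self-closing tags and text, and nothing else (no comments, CDATA, entities or attribute parsing).

```cpp
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical SAX-style callback interface (not a Sword or Xerces type).
struct SaxHandler {
    virtual ~SaxHandler() = default;
    virtual void startElement(const std::string& name) = 0;
    virtual void endElement(const std::string& name) = 0;
    virtual void characters(const std::string& text) = 0;
};

// Hypothetical parser interface. The importer would depend only on this,
// so the backend can be swapped without touching the importer itself.
struct SaxParser {
    virtual ~SaxParser() = default;
    virtual void parse(const std::string& xml, SaxHandler& handler) = 0;
};

// Toy lightweight backend, standing in for an XMLTag-based tokenizer.
class SimpleParser : public SaxParser {
public:
    void parse(const std::string& xml, SaxHandler& handler) override {
        std::size_t pos = 0;
        while (pos < xml.size()) {
            if (xml[pos] != '<') {
                // Text run: everything up to the next tag (or end of input).
                std::size_t lt = xml.find('<', pos);
                if (lt == std::string::npos) lt = xml.size();
                handler.characters(xml.substr(pos, lt - pos));
                pos = lt;
                continue;
            }
            std::size_t gt = xml.find('>', pos);
            if (gt == std::string::npos) break;  // unterminated tag: stop
            std::string tag = xml.substr(pos + 1, gt - pos - 1);
            pos = gt + 1;
            if (tag.empty()) continue;
            if (tag[0] == '/') {
                handler.endElement(tag.substr(1));
            } else {
                bool selfClosing = (tag.back() == '/');
                if (selfClosing) tag.pop_back();
                // Element name ends at the first whitespace; attributes are ignored.
                std::string name = tag.substr(0, tag.find_first_of(" \t\r\n"));
                handler.startElement(name);
                if (selfClosing) handler.endElement(name);
            }
        }
    }
};

// Hypothetical factory. A class wrapping Xerces or libxml behind SaxParser
// could be returned here for other backend names; this sketch has only one.
std::unique_ptr<SaxParser> makeParser(const std::string& backend) {
    if (backend == "simple") return std::unique_ptr<SaxParser>(new SimpleParser());
    return nullptr;  // unknown backend
}

// Example handler that records the events it receives, in order.
struct EventLog : SaxHandler {
    std::string log;
    void startElement(const std::string& n) override { log += "<" + n + ">"; }
    void endElement(const std::string& n) override { log += "</" + n + ">"; }
    void characters(const std::string& t) override { log += "[" + t + "]"; }
};
```

A caller would obtain a parser with `makeParser("simple")`, pass it the document text together with a handler such as `EventLog`, and never name the concrete backend. Under this arrangement the size and performance cost is decided by which implementation is linked in, which is the trade-off discussed in the reply.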