Re: [sword-devel] Entities in modules

DM Smith Thu, 12 Nov 2009 10:07:52 -0800

On 11/12/2009 12:08 PM, Sebastien Koechlin wrote:

On Wed, Nov 11, 2009 at 03:50:12PM -0500, DM Smith wrote:

We have a few modules that have entities in them. These are of the fashion
&nbsp; (a character entity), &#85; (a numeric decimal entity) and &#xC5;
(a numeric hex entity).


These cause various problems:

This is because osis2mod does not use an XML parser.

I'm not seeing the problem in OSIS modules, but in ThML modules. They are perfectly valid in ThML modules, but are problematic. I will be going over all the modules looking for these and will report problematic CrossWire modules in www.crosswire.org/bugs. And I'll pass along any problems I find in the Xiphos and Bible.org modules.

My understanding is that a true XML parser has strict requirements as to how it is to handle errors: put out an error message and die.

If we used a true XML parser for osis2mod, it would die on the first character entity that was not &amp;, &lt;, &gt; or &quot; unless it were defined in the schema. OSIS does not define additional character entities.
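To illustrate the "error message and die" behavior, here is a minimal sketch using Python's standard xml.etree.ElementTree (an expat-based parser, not anything osis2mod uses). The helper name try_parse and the sample strings are made up for this example.

```python
import xml.etree.ElementTree as ET

def try_parse(xml_text):
    """Return None if the text parses, else the parser's error message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)
    return None

# Predefined and numeric entities are accepted by a conforming parser.
ok = try_parse("<p>Tom &amp; Jerry &#85; &#xC5;</p>")

# A named entity with no declaration is a fatal error, not a warning.
bad = try_parse("<p>a&nbsp;b</p>")

print(ok)
print(bad)
```

The first call returns None; the second returns the parser's complaint about an undefined entity, and no partial document is produced.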

We make the assumption that input to osis2mod has been validated against the OSIS schema. If this is true then there are no character entities in the input.

A character entity is
just a useful way to write characters you cannot, or do not want to,
put in your XML file. Once parsed and resolved, they must not be
distinguishable from other characters. The same applies to CDATA sections.

I agree with the statement above as far as it goes. But what is the XML parser to do when it discovers a character entity that it cannot resolve?

osis2mod should not keep entities when reading an OSIS file. I think it's a
big mistake, and we should not rely on external programs that many people
will have trouble running.

I'd agree that numeric entities should be converted. And I think that osis2mod should complain if it finds entities that are not valid for an OSIS document and prompt the user to validate the input document.
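As a rough sketch of that policy (convert numeric entities, complain about anything else), something like the following Python would do. This is not osis2mod code; the function name convert_entities and the regular expression are assumptions for illustration. The set of allowed names also includes &apos;, which XML predefines alongside the four mentioned above.

```python
import re

# Matches decimal (&#85;), hex (&#xC5;) and named (&nbsp;) entity references.
ENTITY = re.compile(r"&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")

# Entity names that XML itself predefines; these stay escaped.
PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

def convert_entities(text):
    """Return (converted_text, list_of_unresolvable_entity_names)."""
    unknown = []

    def replace(match):
        body = match.group(1)
        if body.startswith("#"):
            try:
                if body[1] in "xX":
                    ch = chr(int(body[2:], 16))
                else:
                    ch = chr(int(body[1:]))
            except (ValueError, OverflowError):
                unknown.append(body)      # not a valid code point
                return match.group(0)
            # Markup-significant characters must remain escaped.
            if ch in "&<>\"'":
                return match.group(0)
            return ch
        if body not in PREDEFINED:
            unknown.append(body)          # report it, leave the text alone
        return match.group(0)

    return ENTITY.sub(replace, text), unknown

converted, problems = convert_entities("a&nbsp;b &#85; &#xC5; &amp;")
print(converted)
print(problems)
```

A caller could print a warning for each name in the returned list and ask the user to validate the input against the OSIS schema.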

Regarding module writers having trouble running tools, we've talked about having a web service at CrossWire.org that would provide the appropriate validation, conversion, creation, ... of an OSIS text. We've just not had a volunteer step up to the task.

We also had trouble with non-canonical Unicode sequences and I think
osis2mod was corrected.

Named entities such as nbsp come from HTML and should not be used in OSIS:
they are not declared in osisCore.2.1.1.xsd, so they result in an invalid
document. BUT, as we do not use an XML parser, we can use the HTML DTD[1] to
resolve them and be friendlier to OSIS writers.

The problem with using entities that are not allowed in OSIS is that one cannot validate against the OSIS schema. And because OSIS is not HTML, one cannot validate against it either.

For osis2mod to handle character entities other than the 4 mentioned above means that it cannot expect valid OSIS.


[1] See these URLs; from them a Perl program can produce a .cc or .h file.
        http://www.w3.org/TR/html4/HTMLlat1.ent
        http://www.w3.org/TR/html4/HTMLsymbol.ent
        http://www.w3.org/TR/html4/HTMLspecial.ent
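For a quick feel of what such a lookup table buys, Python's standard library already ships the HTML 4 entity names as html.entities.name2codepoint. The sketch below uses it in place of a generated .cc/.h table; the function name resolve_html_entities is made up for this example, and this is not the code discussed in this thread.

```python
import re
from html.entities import name2codepoint

NAMED = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")

# These are predefined by XML and must stay escaped in the output.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

def resolve_html_entities(text):
    """Replace HTML 4 named entities with the characters they stand for."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)   # leave untouched
        return chr(name2codepoint[name])
    return NAMED.sub(replace, text)

resolved = resolve_html_entities("a&nbsp;b &amp; &eacute; &bogus;")
print(repr(resolved))
```

Names that are not in the HTML 4 table (here the invented &bogus;) pass through unchanged, so a later validation step can still flag them.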


The code I provided handles many more than just these character entities.


(Sorry if my message looks rude; I'm not a native English speaker.)

I didn't take your response as rude. I appreciate your input. I think our goals are the same: to produce the highest quality modules while minimizing the effort to do so.

All for God's glory.

In Him,
    DM

_______________________________________________
sword-devel mailing list: [email protected]
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page
