I had mentioned earlier that I'd send something on this. These thoughts are 
from working on a few modules and on osis2mod.

There are several things that play into this:
1) Titles: These use the <title> element for their content. This has been the 
focus of much of the discussion. The Show/Hide Heading filter was designed with 
this in mind. Later, the ability to always show canonical titles (e.g. Psalm 
titles) was added.

2) Rich content in titles. Canonical titles are the premier example of this: 
they can carry Strong's numbers and morphology info, markup for the divine 
name, notes, and so on.

3) Sections. The OSIS spec suggests that a title should be within and at the 
top of a <div type="section"> element. Sections typically surround verses. That 
is, the <div> and </div> should fall between verses.

4) Paragraphing. The <p> element typically surrounds verses. Paragraphs are 
often inside sections. Likewise, the <p> and </p> should fall between verses. 
(Note: <p/>, an empty paragraph, is just plain bad form.)

5) Split verses. A verse may be split by titles, sections and paragraphs. I 
don't particularly like it, but I've seen it. I could very well be wrong, but I 
think it is an artifact of a translation using a KJV versification but 
disagreeing about where the verses really start and end.

6) Poetry. This uses three elements, <lg>, <l> and <lb> (from memory), to 
create a group of stanzas where each might be split over several lines. Poetry 
often starts in the middle of a verse and may end within a verse. But it is not 
uncommon for it to surround verses, so we can expect these elements between 
verses too.

7) Arbitrary interverse content. Introductory material can be pretty much 
anything. Typically we expect this at the beginning of Bible books and even 
chapters. It is not unreasonable for it to occur between verses within a 
chapter, as in a study Bible.

8) Block element handling. HTML agents have special handling of nested block 
elements. Simplistically, a block element start that follows one or more block 
starts is treated specially, often coalescing vertical whitespace. If the block 
element has particular visual styling (margins, padding, indentation, ...), it 
is applied. I mention this because there have been numerous comments about too 
much vertical whitespace. In handling vertical whitespace, I think a 
distinction needs to be made between content that can be hidden (titles, 
headings, introductions) and structural markup that needs to be retained even 
when that content is hidden.

9) osis2mod transforms from BSP (Book/Section/Paragraph) into BCV 
(Book/Chapter/Verse). This allows for a verse in isolation to be valid XML. 
As a consequence, <div> (and other block elements) no longer behave like HTML 
containers.

10) x-preverse markup. Currently osis2mod uses the following (where the %d in 
the sID and eID is a matched pair):
<div type="x-milestone" subType="x-preverse" sID="pv%d"/>...pre-verse 
content...<div type="x-milestone" subType="x-preverse" eID="pv%d"/>
Note: These are merely milestones and should never produce whitespace of any 
kind. The only purpose of the construct is to know what is before the verse. A 
problem is that the Show/Hide Headings filter treats everything inside it as 
something that can be toggled, yet it may contain much that needs to be 
retained. (See 8.)

11) Retention of all markup (except the <verse> element) in the order that it 
appears in the input. Module authors are going far beyond a simple markup of 
just the basic verse content. We've published best practices in the wiki for 
marking up various things. If they are followed, the markup should have a 
reasonable rendition in a module. (Please, let's not digress into the verse 
element discussion. It doesn't change the problem at hand.)
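
To make point 9 concrete, here is a rough Python sketch of the idea, not 
osis2mod's actual code. It assumes well-formed input, and the names 
(CONTAINERS, milestone, verse_chunks, the "gen%d" ids) are made up for 
illustration. Container start and end tags are rewritten as empty sID/eID 
milestones so that each verse, cut out on its own, still parses.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical list of block elements to milestone; not osis2mod's real list.
CONTAINERS = {"div", "p", "lg", "l"}
TAG = re.compile(r"<(/?)([A-Za-z]+)([^<>]*?)(/?)>")


def milestone(osis: str) -> str:
    """Rewrite container start/end tags as empty sID/eID milestone pairs."""
    stack = []   # ids of the containers that are currently open
    counter = 0

    def repl(m):
        nonlocal counter
        closing, name, attrs, empty = m.groups()
        if name not in CONTAINERS or empty:
            return m.group(0)  # verses, titles and existing milestones pass through
        if closing:
            return f'<{name} eID="{stack.pop()}"/>'
        counter += 1
        stack.append(f"gen{counter}")
        return f'<{name}{attrs} sID="gen{counter}"/>'

    return TAG.sub(repl, osis)


def verse_chunks(osis: str) -> list[str]:
    """Cut after each verse end milestone; trailing markup joins the last chunk."""
    parts = re.split(r'(<verse eID="[^"]*"/>)', osis)
    chunks = ["".join(parts[i:i + 2]) for i in range(0, len(parts) - 1, 2)]
    if chunks:
        chunks[-1] += parts[-1]
    return chunks


def is_well_formed(fragment: str) -> bool:
    """True if the fragment parses once wrapped in a dummy root element."""
    try:
        ET.fromstring(f"<root>{fragment}</root>")
        return True
    except ET.ParseError:
        return False


SAMPLE = (
    '<div type="section"><title>Creation</title>'
    '<p><verse sID="Gen.1.1"/>In the beginning...<verse eID="Gen.1.1"/>'
    '<verse sID="Gen.1.2"/>And the earth...<verse eID="Gen.1.2"/></p>'
    '</div>'
)

if __name__ == "__main__":
    for chunk in verse_chunks(milestone(SAMPLE)):
        print(is_well_formed(chunk), chunk)
```

Cut from the raw SAMPLE, the first verse carries an unclosed <div> and <p> and 
the second carries their stray end tags, so neither parses alone. After 
milestone(), both do, but the <div> no longer contains anything, which is why 
a renderer cannot treat it like an HTML container.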

Troy suggested bolstering the test case. I'm not at all sure how to go about 
doing that, especially the expected output.
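
As a starting point only, here is one shape such a test could take for points 
8 and 10, written in Python rather than against the SWORD filter API. 
Everything in it is an assumption for illustration: the hide_headings 
function, the sample ENTRY, the use of canonical="true" to mark canonical 
titles, and the choice to drop the x-preverse markers from the output. It also 
assumes titles are not nested.

```python
import re

# The x-preverse milestone pair; group 1 is the shared id, group 2 the content.
PREVERSE = re.compile(
    r'<div type="x-milestone" subType="x-preverse" sID="(pv\d+)"/>'
    r'(.*?)'
    r'<div type="x-milestone" subType="x-preverse" eID="\1"/>',
    re.DOTALL,
)
# Matches only titles that are NOT marked canonical="true".
TITLE = re.compile(
    r'<title\b(?![^>]*\bcanonical="true")[^>]*>.*?</title>', re.DOTALL
)


def hide_headings(entry: str) -> str:
    """Drop non-canonical titles in pre-verse content; keep all other markup.

    The x-preverse markers themselves are removed and leave no whitespace.
    """
    def strip_titles(m):
        return TITLE.sub("", m.group(2))

    return PREVERSE.sub(strip_titles, entry)


# Made-up verse entry: a heading, a canonical title and structural milestones.
ENTRY = (
    '<div type="x-milestone" subType="x-preverse" sID="pv1"/>'
    '<div type="section" sID="gen1"/><title>A Heading</title>'
    '<title type="psalm" canonical="true">A Psalm of David.</title>'
    '<lg sID="gen2"/>'
    '<div type="x-milestone" subType="x-preverse" eID="pv1"/>'
    '<l sID="gen3"/>The LORD is my shepherd;<l eID="gen3"/>'
)

# Expected output with headings hidden: only the plain heading is gone.
EXPECTED = (
    '<div type="section" sID="gen1"/>'
    '<title type="psalm" canonical="true">A Psalm of David.</title>'
    '<lg sID="gen2"/>'
    '<l sID="gen3"/>The LORD is my shepherd;<l eID="gen3"/>'
)

if __name__ == "__main__":
    assert hide_headings(ENTRY) == EXPECTED
    print("ok")
```

The point of writing the expected output this way is that it states the rule 
directly: hiding headings removes the heading and nothing else, so the section 
and line-group milestones and the canonical title all survive.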

Hope this helps.

In His Service,
        DM