On 09/15/2012 10:26 AM, DM Smith wrote:

On Sep 14, 2012, at 8:15 PM, Chris Little <chris...@crosswire.org>
wrote:



On 09/14/2012 01:02 PM, Greg Hellings wrote:
So I've been debugging a module display problem in BibleTime. I
mentioned it on IRC with Troy the other day but we weren't able
to connect at the same time to discuss further. The issue has to
do with paragraph tags - in osis2mod these tags are being
converted from <p> to <div sID="someid" type="paragraph" />.

This is extraordinarily bad. This is a change in semantics, because
<p> and <div type="paragraph"> are not semantically equivalent.



<p> marks the type of paragraph we all probably think of first:
generally, a chunk of text with newlines before and after.

<div type="paragraph"> marks a formal division within a text that
happens to be identified as a 'paragraph' and may consist of
multiple <p>-type paragraphs. Examples of these divisions are found
in many laws and the Catechism of the Catholic Church (which does
exist in OSIS form). Here's part 1, section 1, chapter 1, article
1, paragraph 1 of the CCC:
http://www.vatican.va/archive/ENG0015/__P16.HTM. As you can see, it
consists of many <p>-type paragraphs but is a single <div
type="paragraph">-type paragraph.

Nowhere in the OSIS manual is there any indication of a semantic
difference.

The manual is, of course, not exhaustive. It doesn't actually say anything about <div type="paragraph">, and notably doesn't suggest that there is any alternative to <p> within the section on paragraphs.

Correct me if I'm wrong, but I don't believe there is any case anywhere within the OSIS spec in which two distinct methods of marking a structure are semantically identical. So all of the following are semantically distinct:
<chapter> vs. <div type="chapter">
<p> vs. <div type="paragraph">
<l> vs. <lb/> vs. <milestone type="line">
<closer> vs. <div type="colophon">

It's possible there was some corner case that necessitated allowing two forms of markup for a single type of semantic structure, but I can't think of one and would hope there was a really good reason for allowing it.

The inclusion of <div type="paragraph"> in OSIS can quite possibly be attributed to me, since the Catechism of the Catholic Church was an early OSIS demo document I produced for ABS and presented at a conference at the University of San Francisco. It's still apparent to me that the value is necessary, in spite of the potentially confusing name.

Abhorrent though I consider milestoned <p/>, I think I would much
prefer to see us map <p>...</p> to <p sID=""/>...<p eID=""/> than
see us clobber the semantics of a defined <div> type.
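To make the proposed mapping concrete, here is a minimal Python sketch that rewrites container <p>...</p> into a <p sID="..."/>...<p eID="..."/> milestone pair instead of a <div type="paragraph"> milestone. It assumes well-formed input (every </p> has a matching <p>) and no "<p" inside comments or CDATA. The function name milestone_paragraphs and the sequential IDs p1, p2, ... are hypothetical illustrations, not what osis2mod actually does.

```python
import re

# Matches an opening <p ...> or a closing </p> tag. <pb/> and <p/> do not match
# because the character after "p" must be whitespace or ">".
P_TAG = re.compile(r"<(/?)p(\s[^>]*)?>")


def milestone_paragraphs(osis: str) -> str:
    """Rewrite container <p>...</p> as <p sID="..."/>...<p eID="..."/> milestones."""
    stack = []   # IDs of the paragraphs that are currently open
    counter = 0  # source of the hypothetical sequential IDs p1, p2, ...

    def repl(m: re.Match) -> str:
        nonlocal counter
        closing, attrs = m.group(1), m.group(2) or ""
        if attrs.rstrip().endswith("/"):
            return m.group(0)  # already self-closing (e.g. a milestone): leave as-is
        if closing:
            return f'<p eID="{stack.pop()}"/>'  # end of the innermost open <p>
        counter += 1
        pid = f"p{counter}"
        stack.append(pid)
        return f'<p{attrs} sID="{pid}"/>'  # keep original attributes, add sID

    return P_TAG.sub(repl, osis)


print(milestone_paragraphs('<p>In the beginning <hi type="bold">God</hi></p>'))
```

Because the paragraph keeps its own element name, a later filter can still tell it apart from a formal <div type="paragraph"> division.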

It may be abhorrent from a module-authoring perspective, but from a
software perspective it is needed. I think it is better than <div
type="x-p" ...>.

Agreed.

In OSIS the only container element that is not milestoneable is <p>.
The goal of osis2mod is to create BCV structure, where the verse is the container.

All SWORD/JSword software requires that a verse in isolation can be
meaningfully rendered. (for hit lists, verse lists, parallel view,
cross-reference popups, ...)

If we had a mode flag for SWORD and JSword that would indicate the
scope (chapter or verse), then the render filter could do BSP for
chapter and BCV for verse.
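A rough Python sketch of what such a scope flag could mean for paragraph milestones. The Scope enum and render_paragraphs function are hypothetical, not part of the SWORD or JSword APIs, and dropping the markers in verse scope is just one possible policy. In chapter scope every start milestone has its end, so real HTML containers can be emitted; a verse rendered alone may hold a start without its end, so the markers are removed rather than risk an unbalanced <p>.

```python
import re
from enum import Enum


class Scope(Enum):
    CHAPTER = "chapter"  # whole chapter rendered: paragraph structure is complete
    VERSE = "verse"      # lone verse (hit list, popup, parallel view, ...)


# Paragraph milestones such as <p sID="p1"/> and <p eID="p1"/>.
P_START = re.compile(r'<p\b[^>]*\bsID="[^"]*"[^>]*/>')
P_END = re.compile(r'<p\b[^>]*\beID="[^"]*"[^>]*/>')


def render_paragraphs(fragment: str, scope: Scope) -> str:
    """Render paragraph milestones according to the (hypothetical) scope flag."""
    if scope is Scope.CHAPTER:
        # Milestone pairs are balanced across the chapter: emit real containers.
        fragment = P_START.sub("<p>", fragment)
        return P_END.sub("</p>", fragment)
    # Verse scope: drop the markers so an isolated verse never opens or
    # closes a <p> it cannot balance.
    fragment = P_START.sub("", fragment)
    return P_END.sub("", fragment)


print(render_paragraphs('<p sID="p1"/>In the beginning<p eID="p1"/>', Scope.CHAPTER))
```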

I would rather see milestoned <p> too. However, it seems that the
spec is not being maintained/updated. We have a page in the wiki with
our recommendations for changes to the OSIS spec. How can we move
them forward?

I'd suggest that we maintain our own OSIS schema with the changes and
fixes mentioned there and use that in our module validation.

To be clear on my perspective, I don't think milestoned <p> should become valid OSIS. I don't mind us violating the schema internally, but our needs for processing data don't necessitate that milestoned <p> be allowed in any OSIS document anywhere. But then again, I still believe the milestonability of <div> is a travesty.

It may be time to nudge all the OSIS principals again, to see if we can get things rolling. Failing that, I would recommend that we pick up the standard and fork it. There are bugs in the schema. There are various bits of USFM that have become standardized & need to be mirrored in OSIS to complete the mapping. And obviously, there is a collection of reasonable improvements that have been suggested. If no one else will maintain the standard, we may as well.

I would agree that the filter output is buggy if we're generating
disallowed tag forms. OSIS <div> and <p> would need to be
translated to their correct, non-self-closing HTML forms. Beyond
those two, I can't think of any tags that have the same form &
general semantics in both OSIS & HTML.

Table cells and list items are similar between OSIS and HTML:
container elements that generally imply vertical whitespace.

<table> is another element that OSIS & HTML have in common, but I think that's the only common element pertaining to tables or lists. All the other elements for table & lists at least have different element names (e.g. OSIS <list> vs. HTML <ol>/<ul>).

The problem with exactly matching element names (<div>, <p>, & <table>) is that filter-writers are liable to be lazy and forget that they can't just ignore the difference in attributes. Rather, they're likely to pass such elements through the filter, leading to invalid attributes or invalid, self-closing elements.
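As an illustration of translating instead of passing through, here is a hedged Python sketch for a single OSIS <div> or <p> tag. HTML parsers ignore the "/" in a non-void tag like <div/> and treat it as an open tag, so milestones must become explicit start and end tags. The helper name osis_tag_to_html is hypothetical, it assumes non-namespaced tags as they appear in module text, and carrying the OSIS type over as an HTML class is an assumption; sID, eID, osisID and any other OSIS-only attributes are simply dropped.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical whitelist: the only OSIS elements this sketch translates.
CONTAINERS = {"div", "p"}


def osis_tag_to_html(tag: str) -> str:
    """Translate one OSIS <div>/<p> tag (milestone or empty element) into HTML."""
    el = ET.fromstring(tag)  # a lone self-closing tag is well-formed XML
    if el.tag not in CONTAINERS:
        raise ValueError(f"unsupported element: {el.tag}")
    osis_type = el.get("type")
    html_attrs = f' class="{escape(osis_type)}"' if osis_type else ""
    if "eID" in el.attrib:
        return f"</{el.tag}>"  # end milestone -> explicit end tag
    if "sID" in el.attrib:
        return f"<{el.tag}{html_attrs}>"  # start milestone -> start tag, never "<div/>"
    # Ordinary empty element: write both tags, since "<div/>" would stay open in HTML.
    return f"<{el.tag}{html_attrs}></{el.tag}>"


print(osis_tag_to_html('<div sID="d1" type="paragraph"/>'))
```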

--Chris

_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page
