DM Smith wrote:
Kahunapule Michael Johnson wrote:
  
So... it sounds like I could simply convert USFM to OSIS with the
obvious conversions (like \p ... -> <p>...</p>) plus
  
    

Remember to have the <p> surround the entire paragraph.
  
Of course.
In various sources I have seen the equivalent of \p be nothing more than 
a paragraph separator, with the ambiguity that the first verse of a 
chapter does not have a \p. There may be paragraphs that don't begin or 
end on a chapter boundary.
  
In USFM \p marks only the beginning of a paragraph. The end of a paragraph is implicitly marked by the beginning of any other paragraph-class marker (such as a title, subtitle, another paragraph, poetry lines, a blank line, etc.) or the end of a chapter. To solve the problem of a paragraph not actually ending at a chapter boundary, they invented the \pc marker to indicate that a paragraph continues here. So, the processing is a little more complex than just putting a <p> at the beginning, a </p> at the end, and replacing all \p markers with </p><p>.
osis2mod will convert the open and close tags to <lb 
type="x-begin-paragraph"/> and <lb type="x-end-paragraph"/>, 
respectively. These x- types are non-standard, but they allow a lossless 
reconstruction of the original.
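
As a rough illustration of the implicit paragraph ending described above, here is a minimal Python sketch. It assumes one USFM marker per line, and the PARAGRAPH_CLASS set is a hypothetical, incomplete subset chosen for the example, not the full USFM list. Paragraphs that continue across a chapter boundary are not handled; this sketch always closes the paragraph at \c.

```python
import re

# Hypothetical, incomplete subset of paragraph-class markers (assumption);
# a real converter would take the full list from the USFM documentation.
PARAGRAPH_CLASS = {"p", "q", "q1", "q2", "s", "s1", "mt", "b"}


def usfm_paragraphs_to_osis(usfm: str) -> str:
    """Turn \\p markers into <p>...</p> containers, closing each paragraph
    at the next paragraph-class marker, at a chapter marker, or at the end
    of the input. All other lines are passed through unchanged."""
    out = []
    in_paragraph = False
    for line in usfm.splitlines():
        m = re.match(r"\\(\w+)\s*(.*)", line)
        marker = m.group(1) if m else None
        # Implicit end: any paragraph-class marker or a new chapter.
        if in_paragraph and (marker in PARAGRAPH_CLASS or marker == "c"):
            out.append("</p>")
            in_paragraph = False
        if marker == "p":
            out.append("<p>")
            in_paragraph = True
            if m.group(2):
                out.append(m.group(2))  # text on the same line as \p
        else:
            out.append(line)
    if in_paragraph:
        out.append("</p>")  # end of input closes the last paragraph
    return "\n".join(out)
```

For example, a \s heading between two \p markers closes the first paragraph before the heading line is emitted, so the heading never ends up inside a <p>.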

  
\qt -> <seg type="otPassage" sID="someid"/>
\qt* -> <seg type="otPassage" eID="someid"/>
  
    

When you use sID/eID, OSIS "requires" that they be paired and each pair 
have unique values. Sword does not care at this point in time about this.

I found that for each distinct milestone usage (e.g. <q>, 
<seg>, <div>) it is constructive to have a stack and a counter. When an 
open element is found, its counter is pushed onto the stack and 
incremented. When a close element is found, it is popped off the stack. 
For quotes, I find the depth of the stack useful for populating the 
level attribute. If the stack is non-empty when the document is 
finished, then I have a bug somewhere.
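
The stack-and-counter bookkeeping described above might look roughly like the following Python sketch. The class and method names (MilestoneTracker, open, close, finish) are made up for illustration and are not taken from any Sword or JSword code.

```python
class MilestoneTracker:
    """One stack and one counter for a single milestone usage (e.g. <q>)."""

    def __init__(self, name):
        self.name = name
        self.counter = 0   # next value to hand out
        self.stack = []    # counter values of the currently open elements

    def open(self):
        """Push the current counter value, then increment the counter.
        Returns (value, level), where level is the stack depth after the
        push, which can be used for the level attribute of a quote."""
        value = self.counter
        self.stack.append(value)
        self.counter += 1
        return value, len(self.stack)

    def close(self):
        """Pop the most recently opened value; returns (value, level)."""
        if not self.stack:
            raise ValueError(f"unmatched close for <{self.name}>")
        level = len(self.stack)
        return self.stack.pop(), level

    def finish(self):
        """Call at end of document: a non-empty stack means a bug."""
        if self.stack:
            raise ValueError(f"unclosed <{self.name}>: {self.stack}")
```

One tracker would be created per usage (one for <q>, one for <seg>, one for <div>), so that each has its own counter and nesting depth.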
  
The contents of the sID and eID are entirely redundant in most cases, but for the sake of the specification, I currently generate them with the OSIS ID of the verse containing the beginning marker, concatenated with the next value from a global counter. There may be better or worse ways to generate the sID/eID pairs, but it seems to me to be largely irrelevant, since the only potential constructive use for them is for an OSIS reader to spit out an error message if an eID doesn't match its corresponding sID. I also use stacks for saving the eID.
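
A minimal Python sketch of that ID scheme follows, using \qt as the example marker. The class name SegMilestoner, the ".seg" separator and the exact ID format are assumptions made for the illustration; the scheme described above only specifies "verse OSIS ID plus next counter value".

```python
import itertools
import re


class SegMilestoner:
    """Replace \\qt ... \\qt* with paired <seg> milestones."""

    def __init__(self):
        self._counter = itertools.count(1)  # global counter for all pairs
        self._open_ids = []                 # stack of sIDs awaiting an eID

    def convert(self, text, osis_id):
        """Convert the markers in one verse; osis_id is that verse's OSIS ID.
        State is kept between calls, so a passage opened in one verse can
        be closed in a later one and still get the matching eID."""
        def repl(match):
            if match.group(0).endswith("*"):
                # Closing marker: reuse the ID saved when it was opened.
                return f'<seg type="otPassage" eID="{self._open_ids.pop()}"/>'
            # Opening marker: verse OSIS ID plus next counter value
            # (the ".seg" separator is an assumption of this sketch).
            sid = f"{osis_id}.seg{next(self._counter)}"
            self._open_ids.append(sid)
            return f'<seg type="otPassage" sID="{sid}"/>'

        return re.sub(r"\\qt\*|\\qt\s?", repl, text)
```

Because the eID is popped from the stack rather than regenerated, it always matches the sID of the verse where the passage began, even when the closing marker falls in a different verse.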
\wj -> <q who="Jesus" marker="" sID="someid2"/>
\wj* -> <q who="Jesus" marker="" eID="someid2"/>
  
    

With the WoC, I would ask, selfishly, that you use the container form of 
<q>, that is
<q who="Jesus">...</q>

so
\wj -> <q who="Jesus">
\wj* -> </q>

JSword cannot handle the milestoned form at this time.
  
OK. This isn't too burdensome since we aren't allowing the WoC markup to cross verse boundaries. I just won't let it cross paragraph boundaries, either. This additional restriction pretty much locks us into per-verse markup of WoC instead of marking, for example, the whole Sermon on the Mount with one pair of milestones.
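
Under that per-verse restriction, the container form could be produced with something like the Python sketch below. It assumes the input is the text of a single verse in which every \wj is closed by a \wj* within the same verse; the function name wj_to_container is made up for the example.

```python
import re


def wj_to_container(verse_text):
    """Convert \\wj ... \\wj* inside one verse to <q who="Jesus">...</q>.
    Raises ValueError if the markers are not properly paired within the
    verse, since the container form cannot cross the verse boundary here."""
    inside = False

    def repl(match):
        nonlocal inside
        closing = match.group(0).endswith("*")
        # A close needs a preceding open; an open must not be nested.
        if closing != inside:
            raise ValueError("unbalanced \\wj markup in verse")
        inside = not closing
        return "</q>" if closing else '<q who="Jesus">'

    result = re.sub(r"\\wj\*|\\wj\s?", repl, verse_text)
    if inside:
        raise ValueError("\\wj not closed before end of verse")
    return result
```

A long discourse would then come out as one <q who="Jesus"> container per verse rather than a single pair of milestones around the whole passage.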
...
For a general-purpose converter that deals with apostrophes having 
meanings that differ according to the language and the text, it 
is reasonable not to disambiguate them.
  
Agreed.
Since the q elements generated from WoC (\wj) markup will never
span verses in this implementation, but the actual quote often does, it
is probably better to not combine the two resulting q elements at the
beginning and end of the quotation into one q element, because then the
start/end points wouldn't line up properly for one or the other of the
meanings of the element (quotation start/end marking vs. text coloration).
  
    

Right. The two cannot be combined. Precisely because they are two 
different semantics, differing in markup and in meaning.

There are some instances where there is an "island" of text, a gloss, a 
parenthetical statement, by the book's author, in the WoC that does not 
force the quote to begin and end around it, but needs to be 
distinguished from it.
  
Good point.

