I appreciate the problems with trying to use an inherently
tree-structured notation like XML to mark overlapping regions. I'm not
in touch with what the OSIS thinkers are thinking: other than the
milestone approach, has anybody considered combining in-line markup
identifying critical base elements (probably words) with standoff
indexed markup? As a concrete (and over-simplified) example, the text
would be marked in-line like this (Matt 27:11):

  <t id="1">Jesus</t> <t id="2">said</t> <t id="3">,</t> <t id="4">"</t>
  <t id="5">You</t> <t id="6">have</t> <t id="7">said</t> <t id="8">so</t>
  <t id="9">.</t> <t id="10">"</t>

with red-letter spans indexed with start-end indices:

  <woc start="5" end="9"/>

The standoff index markup doesn't need to nest; it just marks spans.
Assuming the numbering space for words extends throughout a book, you
can have quotations, sentences, or paragraphs that span verses, etc.
And of course individual tokens can be marked as punctuation, starting
vs. ending quotes, etc.

I assume somebody has already considered this and there's a good
reason why it doesn't solve all the problems (or introduces new ones).

Sean
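
P.S. For concreteness, here is a rough Python sketch of how a reader
might resolve the standoff spans against the in-line tokens. The
wrapper elements (<text>, <spans>), the extra <s> sentence span, and
the function names are all made up for illustration; none of this is
anything OSIS defines.

```python
import xml.etree.ElementTree as ET

# In-line token markup from the example, wrapped in a hypothetical
# <text> root element so that it parses as a single document.
INLINE = (
    '<text>'
    '<t id="1">Jesus</t> <t id="2">said</t> <t id="3">,</t> <t id="4">"</t> '
    '<t id="5">You</t> <t id="6">have</t> <t id="7">said</t> <t id="8">so</t> '
    '<t id="9">.</t> <t id="10">"</t>'
    '</text>'
)

# Standoff spans. <woc> is from the example; <s> (a sentence span) and
# the <spans> wrapper are assumptions added for illustration.
STANDOFF = '<spans><woc start="5" end="9"/><s start="1" end="10"/></spans>'


def load_tokens(inline_xml):
    """Map each token id (as an int) to its text."""
    root = ET.fromstring(inline_xml)
    return {int(t.get("id")): t.text for t in root.iter("t")}


def resolve_spans(tokens, standoff_xml):
    """Return (span_name, [token texts]) for every standoff span."""
    result = []
    for span in ET.fromstring(standoff_xml):
        start, end = int(span.get("start")), int(span.get("end"))
        # Spans are inclusive ranges over the token numbering space.
        result.append((span.tag, [tokens[i] for i in range(start, end + 1)]))
    return result


tokens = load_tokens(INLINE)
for kind, words in resolve_spans(tokens, STANDOFF):
    print(kind, words)
```

Since each span is just a pair of indices into the token numbering,
spans of different kinds can overlap freely without the in-line markup
ever needing to know about them.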