DM Smith wrote:
OK. I have no need to generate the <speech> element, as there is no USFM equivalent, so I'll ignore it, too. :-)

Kahunapule Michael Johnson wrote:

How does the Sword project handle display of OSIS text quotations when:

1. the <q> or <speech> element is used without a marker attribute,

The speech element is not handled, except to process its content. It is as if the element were not in the text at all. I think the speech element is to indicate the speaker, not that what's said is a quote. I won't mention the element <speech> below.

In that case, would open-quote reminders be inserted at paragraph and stanza beginnings automatically, or would that require a cQuote milestone to make happen? (I'm just curious. Normally, I'm interested only in making sure this doesn't happen, since the quotation punctuation is already fully specified, and it may not conform to current English usage. However, in the hypothetical case where someone wanted this to happen, I'm curious how it would be done.)

Assuming that the module's conf does not have osisQToTick=false (i.e. it defaults to true when not present), the level attribute determines the quotation mark that will be used, alternating double quote and then single quote. If no level attribute is present, it uses a double quote. It will use the same mark when it gets to </q>.

So in the milestone elements, markers may vary. That is actually good, since sometimes quotes are introduced with an em dash and closed with a newline, or some other asymmetrical case.

The same holds true when milestoned versions of <q> are used, except that <q eID="xxx"/> elements will not cause the code to look at the opening <q sID="xxx"/> for a marker attribute. Instead, it will use its own marker attribute, or its lack, to determine what to output.

So osisQToTick=false is essentially equivalent to putting a marker="" attribute on all <q> elements?

However, if osisQToTick=false, no quotation mark is used.

Good. :-)

2. the <q> or <speech> element is used with a marker attribute,

When the marker attribute is present, it is used.

OK... what, exactly, does that mean? Does that make a difference for anything besides the option of rendering Words of Jesus in red (or some other alternate color) for display? Normally, the point of knowing whether something is in a quote is to display the quotation marks correctly, but if there are no quotation marks to display (or they are already in the text in whatever way is appropriate for that language), then Sword doesn't actually need to "know" whether something is a quote, does it? Or is there some search feature or function that I'm not aware of that would use such knowledge?

3. no <q> or <speech> elements appear, or

Then, as far as Sword is concerned, it is not in a quote.

This is good. Very good. :-)

4. quotation punctuation (“, ‘, ’, ”, «, », —, newline, etc.) appears outside of <q> or <speech> elements (i.e., not in a marker attribute)?

Any punctuation in the text is produced as is.

This is good to know. I regard this (or something like it) as an essential feature if all quotation marks are going to be put in markup.

Another feature of OSIS is <milestone type="cQuote" marker="xxxx"/>. This is used for a continuation quote (substitute xxxx with the appropriate quote mark).

This is an interesting concept, and one that is helpful to me. You see, the way I read the documentation, I thought that marking WoC per verse was bad OSIS, but it sure makes conversion from USFM (which actually demands that sort of markup) easier, because I don't have to discard adjacent end + start pairs with nothing but a verse marker in between. It also makes display on a verse-by-verse basis (as Sword does it) easier if you are working from raw OSIS.
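The rendering rules described above (an explicit marker wins; otherwise, with osisQToTick on, odd levels get a double quote and even levels a single quote; an eID milestone consults only its own attributes; cQuote milestones emit their marker; plain text punctuation passes through) can be sketched roughly as follows. This is not Sword's actual filter code: the event tuples, the function names, the use of straight ASCII quotes for the "tick" output, and the assumption that an explicit marker is still honored when osisQToTick=false are all illustrative guesses.

```python
from typing import Dict, List, Tuple

# Hypothetical event model: (kind, attributes). Text events carry {"value": ...}.
Event = Tuple[str, Dict[str, str]]


def quote_mark(attrs: Dict[str, str], osis_q_to_tick: bool = True) -> str:
    """Pick the mark for one <q> start or end, looking only at that element's attributes."""
    if "marker" in attrs:
        # Assumption: an explicit marker (even "") is always used as given.
        return attrs["marker"]
    if not osis_q_to_tick:
        # osisQToTick=false: no quotation mark is generated.
        return ""
    # No marker: alternate double/single by level; a missing level counts as 1.
    level = int(attrs.get("level", "1"))
    return '"' if level % 2 == 1 else "'"


def render(events: List[Event], osis_q_to_tick: bool = True) -> str:
    """Flatten a sequence of OSIS-like events into display text."""
    out: List[str] = []
    for kind, attrs in events:
        if kind == "text":
            out.append(attrs["value"])  # punctuation in the text is produced as is
        elif kind == "q":
            # Works the same for sID and eID milestones: an eID never looks
            # back at its matching sID for a marker.
            out.append(quote_mark(attrs, osis_q_to_tick))
        elif kind == "cQuote":
            out.append(attrs.get("marker", ""))  # continuation quote mark
    return "".join(out)
```

For example, an opening milestone with an em dash marker and a closing milestone with no marker renders as the dash, the text, and then a generated double quote, which shows why asymmetrical pairs need a marker on both ends.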
The same technique would be useful for translating the USFM \qt ...\qt* markup (which is marked verse by verse to indicate OT quotes in the NT) to <q marker="" who="OT" sID="somethingunique"/>...<q marker="" who="OT" eID="somethingunique"/>. If you regard this as acceptable, then I'll just embrace it quickly before anyone objects. :-)

Words of Christ (WoC) can be indicated by adding who="Jesus" to the <q> container element or to both of the milestone elements. In the KJV, ESV and upcoming NASB modules, the WoC are marked on a per-verse basis, using the container form of <q>, with marker="".

OSIS is very flexible, and there seem to be many reasonable ways to interpret how Scriptures should be encoded. At this point, there are so many ideas out there that I would like to start with just one goal: encoding OSIS texts from USFM in such a way that Sword displays them properly. If that works, then there is a good chance the resulting OSIS will be of use to others, as well.

Would it be too weird to separate q elements intended for replacing punctuation (with marker specified) from those used for what is essentially a character style (i.e. WoC)? Like

<q marker="“" sID="aoeu"/><q marker="" sID="qjkx" who="Jesus"/> (actual quotation) <q marker="" eID="qjkx" who="Jesus"/><q marker="”" eID="aoeu"/>

where the actual quotation may span several verses, and the inner pair of markers may be ended and restarted with each verse?

I already have some LGPL C# code that does a reasonably accurate job of recognizing quotation marks in English text; I use it for checking quotation-mark balancing. It doesn't work very well for other languages, because it uses some English-specific rules to disambiguate apostrophes and closing single quotes, and it doesn't even handle the case where the same character is used for a glottal stop. (The latter is bad practice in Unicode, but some people do it anyway.)
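A quotation-mark balance check of the kind mentioned above can be sketched with a simple stack. This is not the LGPL C# code referred to in the thread; it is a minimal illustration that assumes an input convention where “ ” are double quotes and ` opens a single quote while ' closes it. The rule that treats ' between two letters as an apostrophe is an English-style heuristic, exactly the kind of language-specific rule that breaks down for other languages or when ' also marks a glottal stop.

```python
from typing import List, Tuple

# Assumed input convention: “ … ” for double quotes, ` … ' for single quotes.
OPENERS = {"“": "”", "`": "'"}
CLOSERS = {close: open_ for open_, close in OPENERS.items()}


def check_balance(text: str) -> List[Tuple[int, str]]:
    """Return (offset, message) pairs for problems; an empty list means balanced."""
    stack: List[Tuple[int, str]] = []
    problems: List[Tuple[int, str]] = []
    for i, ch in enumerate(text):
        if ch in OPENERS:
            stack.append((i, ch))
        elif ch in CLOSERS:
            # English-only heuristic: ' between two letters is an apostrophe
            # ("don't"). A trailing possessive ("Moses' law") is misread as a
            # closing quote, which is the ambiguity discussed in the thread.
            if (ch == "'" and 0 < i < len(text) - 1
                    and text[i - 1].isalpha() and text[i + 1].isalpha()):
                continue
            if stack and stack[-1][1] == CLOSERS[ch]:
                stack.pop()
            else:
                problems.append((i, f"unexpected {ch}"))
    # Anything still open at the end was never closed.
    problems.extend((i, f"unclosed {ch}") for i, ch in stack)
    return problems
```

Changing the opener/closer tables adapts the check to other punctuation systems (« » for instance), but the apostrophe rule would have to be rethought per language.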
Does your quote recognizer work for non-English Bibles with different writing systems and different punctuation rules?

I want to (1) ensure that Bible texts are displayed correctly, and (2) minimize the amount of manual labor necessary to make #1 happen. It should not be necessary to do any manual editing of Bible source texts in well-formed Unicode USFM to create a valid Sword module. (USFM, or something close to it, is the format in which a very large number of minority-language Bibles exist.) In USFM, quotation punctuation, if any, is in the text of the document, with no special markup. In an informal extension to USFM, << is sometimes used for “, < for ‘, etc. (A space is required to disambiguate “‘ and ‘“.) Speaking of ambiguity, apostrophe, closing single quote, and (in some languages) glottal stop all use the same character. This ambiguity, coupled with language and style considerations, seems to be a serious problem in automatically converting from either GBF or USFM to OSIS in general.

I have recently written a quote recognizer in C++. I did find that an apostrophe is potentially ambiguous, but in the source I was working with, it was not an issue. Fortunately, my input used ` for a single-quote start and ' for an end quote. This made disambiguation significantly easier. If you wish, I can send you the routine.

Indeed, it looks like I have at least two ways to get the level of quotation support I want:

(1) always put quotation punctuation in marker attributes of q elements or cQuote milestone elements, and specify empty marker attributes when using q just for WoC, or
(2) [pause to don body armor and start running] always put quotation punctuation in the text, and use q elements with empty marker attributes just for translating USFM \wj ...\wj* and \qt ...\qt* markup on a per-verse basis.
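Option (2) amounts to a mechanical rewrite of each verse. The sketch below, offered under stated assumptions, turns \wj ...\wj* and \qt ...\qt* spans within one verse into milestoned q elements with marker="" and a unique sID/eID pair, leaving all quotation punctuation in the text. The who= mapping (Jesus for \wj, OT for \qt) follows the proposal above; the function name, the ID scheme, and the counter argument are hypothetical. It does not handle nested \+wj markers or XML-escape the verse text, both of which a real converter would need.

```python
import itertools
import re
from typing import Iterator

# who= values for each USFM character style, as proposed in the thread.
WHO = {"wj": "Jesus", "qt": "OT"}

# Matches \wj ...\wj* or \qt ...\qt* (lazy, so adjacent spans stay separate).
SPAN = re.compile(r"\\(wj|qt) (.*?)\\\1\*", re.DOTALL)


def convert_verse(usfm: str, ids: Iterator[int]) -> str:
    """Replace \\wj/\\qt spans in one verse with empty-marker <q> milestones."""
    def repl(m: "re.Match[str]") -> str:
        tag, body = m.group(1), m.group(2)
        uid = f"{tag}{next(ids)}"  # hypothetical ID scheme: style name + counter
        who = WHO[tag]
        return (f'<q marker="" who="{who}" sID="{uid}"/>'
                f"{body}"
                f'<q marker="" who="{who}" eID="{uid}"/>')

    return SPAN.sub(repl, usfm)
```

Sharing one counter across all verses (for example `ids = itertools.count(1)`) keeps every sID/eID pair unique in the book, and because each verse is closed and reopened independently, no adjacent end + start pairs have to be merged.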
Option #1 has the major disadvantage of requiring me to find all of the quotation punctuation in text I may not be able to read, let alone understand the grammar of, for conversion purposes. Option #2 has the disadvantage of potentially offending certain people who have, at least so far, held the deep religious conviction that all quotation punctuation should live in markup, not in the text of the Bible, but it has the major advantage of the simplest, fastest conversion possible from USFM to OSIS, with no manual labor required for each translation (other than making sure the source text is really in Unicode USFM). Although option #2 seems like it would work just fine, at least functionally if not idealistically, I'm concerned that someone might think such texts weren't pure enough OSIS, and not use them. If that is the case, then perhaps I really would be better off going back to GBF... or just punting on this whole converter and moving on to improving my converters to other formats for other Bible study software.

I'm wondering whether I should choose OSIS or GBF as the target format for a converter I'm writing; I'm also working on updating the dialect of OSIS that the World English Bible and HNV are distributed in. While I'm not in favor of dropping support for GBF yet, I'm not very thrilled about the idea of putting any new work into supporting it, either. However, if I can't make an OSIS module without a lot of manual labor, any reasonable alternative is worth considering.

Remembering your earlier posts about OSIS's lack of quotation support, I think I can now say that it provides the level of control that you wish. Having done three modules myself, I think that OSIS 2.1.1 is sufficient for Bible texts. So, I'd suggest OSIS.
In the case where the translators have made use of the <<, <, >, >> quotation markup option in their SFM, which is actually a fair number of them, I would like to convert those to the appropriate q elements with markup specifying the normal equivalent of those markings. I'm loath to mess with apostrophe/ending single quote disambiguation for non-English texts, though. I don't see any benefit to doing so, really, but maybe I'm missing something? What do you think?

Michael
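The <<, <, >, >> conversion described above can be sketched as a tokenizer plus a stack: each opener becomes a milestoned q with the corresponding curly quote in its marker attribute and a level equal to its nesting depth, and each closer ends the most recent open quote. The mapping (<< to “, < to ‘, >> to ”, > to ’), the function name, and the sID prefix are assumptions for illustration. The tokenizer prefers << over <, so it relies on the source using a space to separate adjacent marks as described earlier; that space is left in the output text. Ordinary apostrophes are left untouched, in line with not attempting apostrophe disambiguation, and remaining text is XML-escaped.

```python
import itertools
import re
from typing import List, Tuple
from xml.sax.saxutils import escape

# Assumed mapping for the informal USFM angle-bracket quote convention.
OPEN = {"<<": "“", "<": "‘"}
CLOSE = {">>": "”", ">": "’"}
MATCHING_OPEN = {">>": "<<", ">": "<"}
TOKEN = re.compile(r"<<|>>|<|>")  # two-character tokens are tried first


def angle_quotes_to_osis(text: str, prefix: str = "q") -> str:
    """Replace <<, <, >, >> with milestoned <q> elements carrying marker attributes."""
    ids = itertools.count(1)
    stack: List[Tuple[str, str]] = []  # (opening token, sID) for each open quote
    out: List[str] = []
    pos = 0
    for m in TOKEN.finditer(text):
        out.append(escape(text[pos:m.start()]))  # plain text between tokens
        tok = m.group()
        if tok in OPEN:
            uid = f"{prefix}{next(ids)}"
            stack.append((tok, uid))
            out.append(f'<q marker="{OPEN[tok]}" level="{len(stack)}" sID="{uid}"/>')
        else:
            if not stack or stack[-1][0] != MATCHING_OPEN[tok]:
                raise ValueError(f"unbalanced {tok!r} at offset {m.start()}")
            _, uid = stack.pop()
            out.append(f'<q marker="{CLOSE[tok]}" eID="{uid}"/>')
        pos = m.end()
    out.append(escape(text[pos:]))
    if stack:
        raise ValueError(f"unclosed {stack[-1][0]!r}")
    return "".join(out)
```

Raising on unbalanced input, rather than guessing, flags texts that need a human look; a production converter would also have to carry open quotes across verse and chapter boundaries instead of treating each string independently.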
_______________________________________________ sword-devel mailing list: sword-devel@crosswire.org http://www.crosswire.org/mailman/listinfo/sword-devel Instructions to unsubscribe/change your settings at above page