Re: [sword-devel] osis2mod warnings
On Oct 10, 2012, at 1:02 AM, Greg Hellings greg.helli...@gmail.com wrote:

> There are two osis2mod warnings I'm getting that seem strange to me. The first is when it encounters post-verse material. It will issue a warning such as follows:
>
> INFO(V11N): Jude is not in the KJV versification. Appending content to 3John.1.14
> INFO(WRITE): Appending entry: 3John.1.14: *snipped content of 3John.1.14*
>
> Now the material in question is simply a few closing colophons and/or closing tags from the material in 3 John. Jude is, most definitely, a part of the KJV versification. But when I got this message the first time I ended up spinning in circles a few times looking for the issue before realizing it was just an oddly worded debug statement. Perhaps this should be altered to reflect what is actually happening?

Osis2mod splits the input into chunks and tries to figure out which slot in the index each chunk belongs in. Without seeing your file, it is hard to tell what the problem actually is, so I'm going to guess. Maybe the colophon is not contained within the last chapter/book of 3 John? Anything between books may be taken to belong to the next book. There may be a problem in assigning that to Jude.0.0 (i.e. the book introduction).

That the osisID given is Jude indicates that it did not find Jude.1.1 when it tried to add a chunk, so it attaches the chunk to the last seen verse in the av11n, which in this case is 3John.1.14. OSIS requires a chapter of 1 for single-chapter books. I don't think this is the problem.

If you can supply the end of 3John and the start of Jude, I can probably pinpoint the problem and perhaps improve the diagnostics, if it is not a code change.

> Secondly is the warning pair:
>
> WARNING(NESTING): verse Gen.10.2 is not well formed:(3,5)
> WARNING(NESTING): verse Gen.10.4 is not well formed:(5,3)
>
> The offending text reads:
>
> <verse osisID="Gen.10.2" sID="Gen.10.2"/><list><item type="x-indent-1">Ahima a Zhafeti yaali: Goomeri, Magogi, Madayi, Yavani, Tubali, Mexeki, ni Tirasi.</item>
> <item type="x-indent-1"><verse eID="Gen.10.2"/>
> <verse osisID="Gen.10.3" sID="Gen.10.3"/>Ahima a Goomeri yaali Axekenazi, Rifati ni Togaarima.</item>
> <item type="x-indent-1"><verse eID="Gen.10.3"/>
> <verse osisID="Gen.10.4" sID="Gen.10.4"/>Ahima a Yavani yaali: Elixa, Tarixixi, Kitiimi ni Rodanimu.</item></list>
> <p><verse eID="Gen.10.4"/>
>
> My assumption is that the not well formed warning is coming from the <list> ... </list> that is spanning across multiple verses. Yet strangely these spanning <item> tags (misplaced by usfm2osis.py or in the wrong place in the USFM files) are not generating a warning. What is the malformed issue here? xmllint seems to think the file validates against the official OSIS schema, so it's not an XML validation issue.

The issue is not one of XML or syntax, but of semantics and use.

First, the warning is just that: a warning. It is not an error. It is saying that the verse as a fragment is not well formed. Osis2mod transforms container elements to milestone form to handle this. Perhaps this element is not transformed or is not available to be transformed. If transformed, then the verse is well-formed and the warning should not happen (IIRC).

Second, the problem arises when the verse is shown in isolation. In that case it may not render properly because it does not have the entire context. Also, JSword has a requirement that a verse be a well-formed fragment. Otherwise, it strips out all markup and leaves the content. (This gets confusing when notes are shown inline!) JSword needs to change.

I guess the message could be made clearer.

Also, this list construct is problematic. It has nothing to do with the warning.

First, why are <list> and <item> used? Is it really a list? I don't know how SWORD renders this, but JSword uses a bulleted list. If this is markup for poetry, it should be using OSIS poetry markup.

Second, having <verse><list> rather than <list><verse> will probably cause the verse number to be rendered on the line prior to the start of the list.

Third, having <verse><item> may have the same orphan verse number problem. The <item> element typically causes rendering to be done at the start tag.

A list should typically end after a verse. (Same with other container markup.) It certainly should not end immediately before a verse end.

The <p> is not placed well. It does not make sense that a paragraph starts within a verse, unless it splits the verse content. (Same with other container markup.)

This may be more of a problem with usfm2osis.

> Any help would be appreciated.

Don't know if that helps.

> --Greg

___
sword-devel mailing list: sword-devel@crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page
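The idea of a verse being "well formed as a fragment" can be illustrated with a short Python sketch. It is not osis2mod's code, and the meaning of the number pairs in the warning is not explained in this thread; the sketch simply assumes, for illustration, that a verse is a well-formed fragment when its start (sID) and end (eID) milestones sit at the same element depth. The sample is a simplified fragment modelled on the Gen.10 example, with a made-up wrapping div.

```python
import io
import xml.etree.ElementTree as ET

# Simplified stand-in for the Gen.10 markup discussed above (assumed structure).
SAMPLE = """<div>
<verse osisID="Gen.10.2" sID="Gen.10.2"/><list><item>first</item>
<item><verse eID="Gen.10.2"/>
<verse osisID="Gen.10.3" sID="Gen.10.3"/>second</item>
<item><verse eID="Gen.10.3"/>
<verse osisID="Gen.10.4" sID="Gen.10.4"/>third</item></list>
<verse eID="Gen.10.4"/></div>"""


def verse_depths(xml_text):
    """Map each verse ID to (depth at sID milestone, depth at eID milestone)."""
    depth = 0
    starts = {}
    depths = {}
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            depth += 1
            if elem.tag == "verse":
                if "sID" in elem.attrib:
                    starts[elem.get("sID")] = depth
                elif "eID" in elem.attrib:
                    verse_id = elem.get("eID")
                    depths[verse_id] = (starts.pop(verse_id), depth)
        else:
            depth -= 1
    return depths


def unbalanced(xml_text):
    """List verses whose start and end milestones are at different depths."""
    return [v for v, (s, e) in verse_depths(xml_text).items() if s != e]


if __name__ == "__main__":
    for verse_id in unbalanced(SAMPLE):
        print("verse", verse_id, "is not a well-formed fragment")
```

For the sample, Gen.10.2 and Gen.10.4 are reported and Gen.10.3 is not, which mirrors the pair of warnings in the thread: the list opens inside the first verse and closes inside the last one.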
[sword-devel] usfm2osis.py and tag \cp
The USFM \cp tag (used for chapter markers that differ from those of the versification in use) reliably crashes usfm2osis.py. The programme then needs a Ctrl-C interrupt to get out of its state. The following minimal USFM code produces the error message below.

\id EST
\h ESTER
\c 1
\cp A
\s En Mordekai eh Ouraman
\p
\v 1 Mordekai, -

peter@AOA110:~/Bibles/Pohnpeian/apo_usfm/Pohnapo$ usfm2osis.py Pohnester 17ESTpohn.SFM.bak
Sorting book files naturally.
Process Worker-2:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/local/bin/usfm2osis.py", line 1440, in run
    osis = convertToOsis(job)
  File "/usr/local/bin/usfm2osis.py", line 1321, in convertToOsis
    osis = cvtChaptersAndVerses(osis, relaxedConformance)
  File "/usr/local/bin/usfm2osis.py", line 640, in cvtChaptersAndVerses
    osis = re.sub(r'(chapter [^]+sID[^]+/.+?chapter eID[^]+/)', replaceChapterNumber, osis, flags=re.DOTALL)
  File "/usr/lib/python2.7/re.py", line 151, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "/usr/local/bin/usfm2osis.py", line 633, in replaceChapterNumber
    ctext = re.sub(r'\$BOOK\$\.([^\.]+)', '$BOOK$.'+ca+'', ctext)
UnboundLocalError: local variable 'ca' referenced before assignment
^CTraceback (most recent call last):
  File "/usr/local/bin/usfm2osis.py", line 1557, in <module>
    k,v=result_queue.get()
  File "/usr/lib/python2.7/multiprocessing/queues.py", line 117, in get
    res = self._recv()
KeyboardInterrupt
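The UnboundLocalError in the traceback is typical of a variable that is assigned only inside a conditional branch and then used unconditionally. The sketch below shows one way to guard against that. The function is hypothetical and heavily simplified; it is not the actual usfm2osis.py code, and its marker handling is an assumption made for illustration.

```python
import re


def replace_chapter_number(ctext):
    """Hypothetical, simplified stand-in for a \\cp handler.

    Substitutes the published chapter label given by \\cp into a
    "$BOOK$.N" placeholder, but only when a \\cp marker was matched.
    """
    cp = re.search(r'\\cp\s+(\S+)', ctext)
    if cp:
        label = cp.group(1)
        # Remove the marker itself, then rewrite the chapter placeholder.
        ctext = re.sub(r'\\cp\s+\S+\s*', '', ctext)
        ctext = re.sub(r'\$BOOK\$\.([^.\s]+)',
                       lambda m: '$BOOK$.' + label, ctext)
    # Without a \cp marker the text is returned unchanged, so no name is
    # ever read before it has been assigned.
    return ctext


if __name__ == "__main__":
    print(replace_chapter_number('$BOOK$.1 \\cp A text'))
    print(replace_chapter_number('$BOOK$.1 text'))
```

Keeping every use of the label inside the branch that assigns it is the design point here; initialising the variable to None before the branch and testing it later would work as well.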
Re: [sword-devel] usfm2osis.py and tag \cp
Bugs and tasks for usfm2osis.py may be reported as issues in JIRA under MODTOOLS. Chris has already begun to use JIRA for this purpose; see

http://www.crosswire.org/tracker/browse/MODTOOLS-32
http://www.crosswire.org/tracker/browse/MODTOOLS-33
http://www.crosswire.org/tracker/browse/MODTOOLS-34
http://www.crosswire.org/tracker/browse/MODTOOLS-36

David
Re: [sword-devel] osis2mod warnings
Greg,

I've seen these errors too. I found the 'problem' was associated with the first citation in the message, not the one being tacked on to. In your example:

Jude is not in the KJV versification. Appending content to 3John.1.14

Is there anything (such as extra verses) tacked on to the end of Jude that strictly speaking isn't in the versification you are using? If so, those 'extra' verses are being appended to 3John.1.14.

When I've encountered the problem, it was one of the two following scenarios: either the text provided extra verses not found in the versification, OR there was extra text in the chapter that appeared to osis2mod to be extra verses but which should have been marked up differently in the OSIS, such as textual variants that should be tagged as variants.

In the former case (a few verses not found in the versification), it would be nice to be able to influence versification more directly as a module developer. Until that happens, the solution is to find the versification that most closely matches the text you are working with (my example of this is the LXXE).

In the latter case the solution is simply to look at the text causing the message and to treat it differently by using the appropriate OSIS tags (treating variants as variants, for example).

~A
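The fallback described in this thread (content whose osisID has no slot in the versification is attached to the last verse that did have one) can be modelled with a toy Python sketch. This is not osis2mod's implementation, which is not shown in the thread; the function name, the data shapes, and the handling of content with no earlier verse are all assumptions made for illustration.

```python
def place_chunks(chunks, versification):
    """Toy model: file (osis_id, text) chunks into a verse index.

    chunks        -- list of (osis_id, text) pairs in document order
    versification -- set of osisIDs that have a slot in the index
    """
    index = {}
    log = []
    last_seen = None
    for osis_id, text in chunks:
        if osis_id in versification:
            index[osis_id] = index.get(osis_id, "") + text
            last_seen = osis_id
        elif last_seen is not None:
            # No slot for this ID: append to the last verse that had one.
            index[last_seen] += text
            log.append("INFO(V11N): %s is not in the versification. "
                       "Appending content to %s" % (osis_id, last_seen))
        else:
            # Assumption: nothing to attach to yet, so only record the fact.
            log.append("no earlier verse to hold content for %s" % osis_id)
    return index, log


if __name__ == "__main__":
    v11n = {"3John.1.14", "Jude.1.1"}
    chunks = [("3John.1.14", "verse text "),
              ("Jude", "stray content "),
              ("Jude.1.1", "first verse of Jude")]
    index, log = place_chunks(chunks, v11n)
    print(index)
    print(log)
```

In this model the message names the ID that could not be placed (here the bare book ID Jude), while the content ends up in the preceding verse, which is why the wording can look misleading at first sight.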
Re: [sword-devel] usfm2osis.py and tag \cp
I hope I've fixed this now. (I haven't tested that it functions correctly, but the error was fairly obvious from the traceback below.)

The application will almost always need Ctrl-C to break out because of the multithreading (and because I haven't bothered to add much exception handling).

--Chris
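The hang described here (a worker dies on an exception while the parent blocks on the result queue) can be avoided by having each worker always report back, whether it succeeds or fails. The Python sketch below shows that pattern under stated assumptions: run_job and collect are hypothetical names, not functions from usfm2osis.py, and an in-process queue.Queue stands in for the queue shared with worker processes.

```python
import queue
import traceback


def run_job(job, convert, result_queue):
    """Run one conversion and always put a result, even on failure."""
    try:
        result_queue.put((job, convert(job), None))
    except Exception:
        # Report the formatted traceback instead of dying silently, so a
        # blocking result_queue.get() in the parent always returns.
        result_queue.put((job, None, traceback.format_exc()))


def collect(result_queue, job_count):
    """Gather exactly one (job, output, error) triple per job."""
    results, errors = {}, {}
    for _ in range(job_count):
        job, output, error = result_queue.get()
        if error is None:
            results[job] = output
        else:
            errors[job] = error
    return results, errors


if __name__ == "__main__":
    def demo_convert(job):
        # Stand-in converter: fails for one input to show the error path.
        if job == "bad.SFM":
            raise ValueError("cannot convert " + job)
        return job.upper()

    q = queue.Queue()
    for name in ("good.SFM", "bad.SFM"):
        run_job(name, demo_convert, q)
    done, failed = collect(q, 2)
    print(done)
    print(sorted(failed))
```

With real worker processes, run_job would be the body of each worker and the queue would be a multiprocessing queue; the parent then sees one entry per job and can exit normally after printing any errors.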
[sword-devel] genbook lexicons - example problem and potential solutions
I am still working on the Abbott-Smith markup project (over 300 entries and counting). We have four contributors right now, so the pace is picking up.

Creating a module is another story. Chris made a lexicon module after the first release, but . . . I would like the module to look like this: http://www.textonline.org/files/abbott-smith/abbott-smith.current_release.html. To do that in SWORD, it needs to be a genbook in order to support:

- front- and backmatter
- page numbers
- a hierarchical structure (In the original TEI it has at least one <superEntry>, but it is also divided into <div>s by letter heading [Α, Β, Γ, Δ, Ε, Ζ, Η, Θ, etc.])

The good news is that an OSIS genbook supports the bare-bones essentials of entries. And thankfully BPBible and BibleTime both display entries together in the same view, thanks to BPBible's continuous scrolling and *perhaps* BibleTime not recognizing <div type="x-entry">.

Unfortunately various features of valid OSIS genbooks are inconsistently supported by front-ends. I created a module for testing. You can find it at https://github.com/translatable-exegetical-tools/Abbott-Smith/tree/master/releases/sword, including a valid OSIS file. Issues include:

- Some front-ends recognize <lb/>, others <p>, but the lexicon uses both (and both are valid OSIS) in various contexts.
- Tables are inconsistently supported (mostly not).
- Titles should be centered, but there is no way to do that in OSIS, as far as I can tell. I wonder if this is a great example use case of per-module CSS...
- Parts of speech should be green and page numbers red, but you can't do color in OSIS (another use case of per-module CSS?)

Some of these, like <p>, <lb/>, and tables, should just work, I think. Perhaps I will file bug reports. But the other display issues cannot be resolved by OSIS alone.

Should TEI be a supported genbook format? I would think the TEI filter (as it evolves) could be pressed into use for genbooks. If that were done, certain lexicon-specific features as well as real book features such as page numbers could be consistently supported and displayed.

On the other hand, I could see the value of having per-module CSS in the conf file so that the module developer could have some control over display.

Any thoughts?

Daniel
[sword-devel] multiple languages in modules
Working on Abbott-Smith, some things came together in my mind about modules that mix languages. I have identified two problems.

First, modules that mix languages do not look good when fonts are chosen per module rather than per language (regardless of the language of the module). I go back and forth on which font to use as the default font in SWORD frontends. If I am primarily using Greek lexicons, I use a Greek font. If Hebrew lexicons, then Hebrew. Cardo just does not satisfy, and the SBL Biblit font is not out yet.

But the <foreign> element *should* mean that the front-end picks the right font for the right text, right? I was (quietly) ecstatic to discover that BPBible also handles <foreign> elements properly, displaying the proper fonts for each language in a module (such as Abbott-Smith) that has five languages, using three different fonts. So cool. But most front-ends do not.

Is the <foreign> element passed through the engine? If so, do I need to file bugs with front-ends to encourage support of <foreign>?

Second, when RtoL text is mixed with LtoR text you can get some strange display problems. Punctuation and numbers can work for both types of languages. Take an example from the entry ἀγανάκτησις in Abbott-Smith. It uses the aleph character, with a number 1 following. Just typing this as plain text gives you: א1. This is incorrect even though the numeral was typed after the letter.

I found a solution in HTML using <bdo>. The page at http://www.textonline.org/files/abbott-smith/abbott-smith.current_release.html properly handles this issue. It actually transforms raw TEI using XSL with a CSS stylesheet to handle certain display issues. These stylesheets can be found at https://github.com/translatable-exegetical-tools/Abbott-Smith/tree/master/releases.

Could <bdo> be added to the filters to help with this problem?

Daniel
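As a rough illustration of the aleph-plus-numeral case, the Python sketch below shows two ways a filter could keep the numeral after the letter: wrapping the run in an HTML bdo element, as mentioned above, or inserting a Unicode LEFT-TO-RIGHT MARK between a right-to-left letter and a following digit. The second approach is an alternative not discussed in this thread, and both function names are hypothetical; nothing here is part of the SWORD filters.

```python
import html
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK


def bdo_ltr(text):
    """Wrap text in an HTML bdo element that forces left-to-right order."""
    return '<bdo dir="ltr">' + html.escape(text) + '</bdo>'


def insert_lrm(text):
    """Insert LRM between a right-to-left letter and a following digit."""
    out = []
    for i, ch in enumerate(text):
        out.append(ch)
        nxt = text[i + 1:i + 2]
        if (nxt
                and unicodedata.bidirectional(ch) in ("R", "AL")
                and unicodedata.bidirectional(nxt) == "EN"):
            out.append(LRM)
    return "".join(out)


if __name__ == "__main__":
    sample = "\u05d01"  # Hebrew aleph followed by the digit 1
    print(bdo_ltr(sample))
    print(repr(insert_lrm(sample)))
```

The bdo form only helps where the output is HTML; the LRM form travels with the text itself, at the cost of adding an invisible character to the module content.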
Re: [sword-devel] multiple languages in modules
I know nothing of <foreign>, but can only suppose that, if supported, it must pass through the engine with an appropriate (HTML) indication.

As a general rule, I suggest either Free Serif or Linux Libertine, with a slight preference for Free Serif. Both have good coverage across every Latin alphabet variant, and pretty display of both Hebrew and Greek. In modules of mine that have Latin, Greek, and Hebrew alphabets, they all show quite well. We include both of these fonts in Xiphos' Win32 installers. You might find the UDHR module useful, from Crosswire Experimental, as a font demonstration module. (Linux Libertine is not Linux-specific. It was just developed in an open source environment.)

> Is the <foreign> element passed through the engine? If so, do I need to file bugs with front-ends to encourage support of <foreign>?

Having just looked, the string foreign does not appear in Sword's source tree in src/modules/filters/*.cpp. So it's not supported right now after all. I don't know how BPBible supports it; I had understood that BPBible uses the regular filter sets. Does BPBible actually subclass the filters and extend them for <foreign>?

> Second, when RtoL text is mixed with LtoR text you can get some strange display problems. Punctuation and numbers can work for both types of languages.

This is often an artifact of how toolkits handle LtoR. Today, Xiphos uses GTK and WebKit, but I don't know how these reflect your example case. Our former use of gtkhtml3 -vs- gtkmozembed -vs- xulrunner -vs- today's WebKit always led to some strange realizations for how LtoR would show up in Xiphos. gtkhtml3 wants to right-justify any text containing (or perhaps it was text that leads off with) Hebrew. That peculiarity led to certain unexpected choices for how I created StrongsRealHebrew.
[sword-devel] Sword -r2741
In recent correspondence with Karl Kleinpaste of the Xiphos project about display issues with our project's module, he recommended that I try sword's latest -r2741 because it has recent changes regarding OSIS headings.

I do not have access to this version of sword. Would someone be willing to run our project's OSIS file through the latest version of sword (apparently -r2741), create a module from it and then send me the results?

- My OSIS was built using the sword script from USFM files.
- My OSIS validates.
- I have already run the fix for titles on my OSIS.

Please contact me if you are willing.

Thanks