Re: [sword-devel] osis2mod warnings

2012-10-11 Thread DM Smith

On Oct 10, 2012, at 1:02 AM, Greg Hellings greg.helli...@gmail.com wrote:

 There are two osis2mod warnings I'm getting that seem strange to me.
 The first is when it encounters post-verse material. It will issue a
 warning such as follows:
 INFO(V11N): Jude is not in the KJV versification. Appending content to
 3John.1.14
 INFO(WRITE): Appending entry: 3John.1.14: *snipped content of 3John.1.14*
 
 Now the material in question is simply a few closing colophons and/or
 closing tags from the material in 3 John. Jude is, most definitely, a
 part of the KJV versification. But when I got this message the first
 time I ended up spinning in circles a few times looking for the issue
 before realizing it was just an oddly worded debug statement. Perhaps
 this should be altered to reflect what is actually happening?

Osis2mod splits the input into chunks and tries to figure out which slot in the
index each chunk belongs in. Without seeing your file, it is hard to tell what
the problem actually is, so I'm going to guess.

Maybe the colophon is not contained within the last chapter/book of 3 John?
Anything between books may be taken to belong to the next book. There may be a
problem in assigning that to Jude.0.0 (i.e. the book introduction).

That the osisID given is Jude indicates that osis2mod did not find Jude.1.1
when it tried to add a chunk, so it attached the chunk to the last verse seen in
the av11n, which in this case is 3John.1.14.
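
The fallback described above can be sketched roughly as follows. This is only an illustration in Python, not osis2mod's actual code; the function name place_chunk and the tiny versification set are made up for the example.

```python
def place_chunk(osis_id, known_ids, last_seen):
    """Return the index slot a chunk would be written to.

    known_ids: set of osisIDs present in the versification (toy data here).
    last_seen: the last osisID that was successfully placed, or None.
    """
    if osis_id in known_ids:
        return osis_id
    # Not found: fall back to the last verse seen, which is what the
    # INFO(V11N) message reports ("Appending content to 3John.1.14").
    if last_seen is None:
        raise ValueError("no earlier verse to append to")
    return last_seen


# Toy versification that deliberately has no bare "Jude" entry.
known = {"3John.1.13", "3John.1.14", "Jude.1.1"}

print(place_chunk("Jude.1.1", known, "3John.1.14"))
print(place_chunk("Jude", known, "3John.1.14"))
```

Under these assumptions, a chunk labelled only "Jude" ends up in the 3John.1.14 slot even though Jude itself is part of the versification.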

OSIS requires a chapter of 1 for single chapter books. I don't think this is 
the problem.

If you can supply the end of 3John and the start of Jude, I can probably 
pinpoint the problem and perhaps improve the diagnostics, if it is not a code 
change.

 
 Secondly is the warning pair:
 WARNING(NESTING): verse Gen.10.2 is not well formed:(3,5)
 WARNING(NESTING): verse Gen.10.4 is not well formed:(5,3)
 
 The offending text reads:
 <verse osisID="Gen.10.2" sID="Gen.10.2"/><list><item
 type="x-indent-1">Ahima a Zhafeti yaali: Goomeri, Magogi, Madayi,
 Yavani, Tubali, Mexeki, ni Tirasi.</item>
 <item type="x-indent-1"><verse eID="Gen.10.2"/>
 <verse osisID="Gen.10.3" sID="Gen.10.3"/>Ahima a Goomeri yaali
 Axekenazi, Rifati ni Togaarima.</item>
 <item type="x-indent-1"><verse eID="Gen.10.3"/>
 <verse osisID="Gen.10.4" sID="Gen.10.4"/>Ahima a Yavani yaali: Elixa,
 Tarixixi, Kitiimi ni Rodanimu.</item></list>
 <p><verse eID="Gen.10.4"/>
 
 My assumption is that the "not well formed" warning is coming from the
 <list> ... </list> that is spanning across multiple verses. Yet
 strangely these spanning <item> tags (misplaced by usfm2osis.py or in
 the wrong place in the USFM files) are not generating a warning. What
 is the malformed issue here? xmllint seems to think the file validates
 against the official OSIS schema, so it's not an XML validation issue.

The issue is not an xml one, or one of syntax, but rather semantics and use.

First, the warning is just that: a warning. It is not an error. It is saying 
that the verse as a fragment is not well formed. Osis2mod transforms container 
elements to milestone form to handle this. Perhaps this is not transformed or 
is not available to be transformed. If transformed, then the verse is 
well-formed and the warning should not happen (IIRC).

Second, the problem shows up when the verse is displayed in isolation: it may
not render properly because it does not have the entire context. Also, JSword
requires that a verse be a well-formed fragment; otherwise, it strips out all
markup and leaves only the content. (This gets confusing when notes are shown
inline!) JSword needs to change.
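
To make "well-formed as a fragment" concrete, here is a minimal Python sketch that looks at the markup between each verse's sID and eID milestones and reports elements that are opened but not closed, or closed but not opened, inside that verse. It is not osis2mod's or JSword's algorithm, and it does not try to reproduce the (3,5) figures in the warning. The function name unbalanced_verses is hypothetical, and the sample assumes the quoted snippet originally had ordinary angle-bracket tags with quoted attributes.

```python
import re

# Matches start tags, end tags and empty (milestone) tags.
TAG = re.compile(r"<(/?)(\w+)([^>]*?)(/?)>")


def unbalanced_verses(osis):
    """Map osisID -> (unclosed, unopened) for verses that are not balanced.

    unclosed: elements opened inside the verse but not closed before its eID.
    unopened: end tags inside the verse whose start tag lies outside it.
    """
    problems = {}
    current = None  # osisID of the verse being scanned, if any
    stack, unopened = [], []
    for closing, name, attrs, empty in TAG.findall(osis):
        if name == "verse" and empty:
            sid = re.search(r'\bsID="([^"]+)"', attrs)
            eid = re.search(r'\beID="([^"]+)"', attrs)
            if sid:
                current, stack, unopened = sid.group(1), [], []
            elif eid and eid.group(1) == current:
                if stack or unopened:
                    problems[current] = (stack, unopened)
                current = None
        elif current is None or empty:
            continue  # outside any verse, or some other milestone
        elif closing:
            if stack and stack[-1] == name:
                stack.pop()
            else:
                unopened.append(name)
        else:
            stack.append(name)
    return problems


SAMPLE = (
    '<verse osisID="Gen.10.2" sID="Gen.10.2"/><list>'
    '<item type="x-indent-1">Ahima a Zhafeti yaali</item>'
    '<item type="x-indent-1"><verse eID="Gen.10.2"/>'
    '<verse osisID="Gen.10.3" sID="Gen.10.3"/>Ahima a Goomeri yaali</item>'
    '<item type="x-indent-1"><verse eID="Gen.10.3"/>'
    '<verse osisID="Gen.10.4" sID="Gen.10.4"/>Ahima a Yavani yaali</item></list>'
    '<p><verse eID="Gen.10.4"/>'
)

for osis_id, (unclosed, not_opened) in sorted(unbalanced_verses(SAMPLE).items()):
    print(osis_id, "unclosed:", unclosed, "unopened:", not_opened)
```

This check is stricter than whatever osis2mod does: it also flags Gen.10.3, which closes one item and opens another, whereas osis2mod only warned about Gen.10.2 and Gen.10.4.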

I guess the message could be made to be clearer.

Also, this list construct is problematic, although that has nothing to do with
the warning.
First, why are <list> and <item> used? Is it really a list? I don't know how
SWORD renders this, but JSword uses a bulleted list. If this is markup for
poetry, it should be using OSIS poetry markup.
Second, having <verse><list> rather than <list><verse> will probably cause the
verse number to be rendered on the line prior to the start of the list.
Third, having <verse><item> may have the same orphan verse number problem. The
<item> element typically causes rendering to be done at the start tag.

A list should typically end after a verse. (Same with other container markup.)
It certainly should not end immediately before a verse end.

The <p> is not placed well. It does not make sense that a paragraph starts
within a verse, unless it splits the verse content. (Same with other container
markup.)

This may be more of a problem with usfm2osis.

 
 Any help would be appreciated.

Don't know if that helps.

 
 --Greg
 
 ___
 sword-devel mailing list: sword-devel@crosswire.org
 http://www.crosswire.org/mailman/listinfo/sword-devel
 Instructions to unsubscribe/change your settings at above page



[sword-devel] usfm2osis.py and tag \cp

2012-10-11 Thread Peter von Kaehne
The USFM \cp tag (used for chapter markers that differ from those of the
versification in use) crashes usfm2osis.py reliably. The programme needs a
Ctrl-C interrupt to get out of its state.

The following minimal USFM code produces the error message attached below.

\id EST
\h ESTER
\c 1
\cp A
\s En Mordekai eh Ouraman
\p
\v 1 Mordekai,
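
For readers unfamiliar with the marker, here is a small Python sketch of how \c and \cp relate in the sample above: \c carries the chapter number and a following \cp carries the label to be shown instead. This is not taken from usfm2osis.py; the function name chapter_markers is made up for the illustration.

```python
import re

SAMPLE = r"""\id EST
\h ESTER
\c 1
\cp A
\s En Mordekai eh Ouraman
\p
\v 1 Mordekai,
"""


def chapter_markers(usfm):
    r"""Return (chapter_number, published_label) pairs from USFM text.

    published_label is the \cp value following a \c line, or None when the
    chapter has no \cp marker.
    """
    chapters = []
    for line in usfm.splitlines():
        m = re.match(r"\\c\s+(\d+)", line)
        if m:
            chapters.append([m.group(1), None])
            continue
        m = re.match(r"\\cp\s+(\S+)", line)
        if m and chapters:
            chapters[-1][1] = m.group(1)
    return [tuple(c) for c in chapters]


print(chapter_markers(SAMPLE))
```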

-


peter@AOA110:~/Bibles/Pohnpeian/apo_usfm/Pohnapo$ usfm2osis.py Pohnester 17ESTpohn.SFM.bak
Sorting book files naturally.
Process Worker-2:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/local/bin/usfm2osis.py", line 1440, in run
    osis = convertToOsis(job)
  File "/usr/local/bin/usfm2osis.py", line 1321, in convertToOsis
    osis = cvtChaptersAndVerses(osis, relaxedConformance)
  File "/usr/local/bin/usfm2osis.py", line 640, in cvtChaptersAndVerses
    osis = re.sub(r'(chapter [^]+sID[^]+/.+?chapter eID[^]+/)', replaceChapterNumber, osis, flags=re.DOTALL)
  File "/usr/lib/python2.7/re.py", line 151, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "/usr/local/bin/usfm2osis.py", line 633, in replaceChapterNumber
    ctext = re.sub(r'\$BOOK\$\.([^\.]+)', '$BOOK$.'+ca+'', ctext)
UnboundLocalError: local variable 'ca' referenced before assignment

^CTraceback (most recent call last):
  File "/usr/local/bin/usfm2osis.py", line 1557, in <module>
    k,v=result_queue.get()
  File "/usr/lib/python2.7/multiprocessing/queues.py", line 117, in get
    res = self._recv()
KeyboardInterrupt

 



Re: [sword-devel] usfm2osis.py and tag \cp

2012-10-11 Thread David Haslam
Bugs and tasks for usfm2osis.py may be reported as issues in JIRA under
MODTOOLS.

Chris has already begun to use JIRA for this purpose; see

http://www.crosswire.org/tracker/browse/MODTOOLS-32
http://www.crosswire.org/tracker/browse/MODTOOLS-33
http://www.crosswire.org/tracker/browse/MODTOOLS-34
http://www.crosswire.org/tracker/browse/MODTOOLS-36

David






Re: [sword-devel] osis2mod warnings

2012-10-11 Thread Andrew Thule
Greg, I've seen these errors too.  I found the 'problem' was associated
with the first citation in the error, not the one being tacked on to.
In your example:
Jude is not in the KJV versification. Appending content to 3John.1.14

Is there anything (such as extra verses) tacked on to the end of Jude
that strictly speaking isn't in the versification you are using?  If
so, those 'extra' verses are being tacked on to 3John.1.14.

When I've encountered the problem, it was one of two scenarios: either
the text provided extra verses not found in the versification, OR there
was extra text in the chapter that appeared to osis2mod to be extra
verses but that should have been marked up differently in the OSIS
(for example, textual variants marked up as textual variants).

In the former case (a few verses not found in the versification), it
would be nice to be able to influence versification more directly as a
module developer.  Until that happens, the solution is to find the
versification that most closely matches the text you are working with
(my example of this is the LXXE).  In the latter case the solution is
simply to look at the text causing the error and mark it up with the
appropriate OSIS tags (treating variants as variants, for example).
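
One way to look for the versification that most closely matches a text is to count how many of the text's osisIDs each candidate lacks. The Python sketch below does that with toy data; the names v11n_a and v11n_b and their contents are invented for the example and do not describe any real SWORD versification.

```python
def missing_ids(text_ids, versification):
    """osisIDs used by the text that the versification does not contain."""
    return [osis_id for osis_id in text_ids if osis_id not in versification]


def best_versification(text_ids, candidates):
    """Name of the candidate that leaves the fewest osisIDs unplaced."""
    return min(candidates, key=lambda name: len(missing_ids(text_ids, candidates[name])))


# Toy data only: two made-up versifications and the verses found in a text.
candidates = {
    "v11n_a": {"Esth.10.1", "Esth.10.2", "Esth.10.3"},
    "v11n_b": {"Esth.10.1", "Esth.10.2", "Esth.10.3", "Esth.10.4"},
}
text_ids = ["Esth.10.1", "Esth.10.2", "Esth.10.3", "Esth.10.4"]

print(best_versification(text_ids, candidates))
print(missing_ids(text_ids, candidates["v11n_a"]))
```

The list returned by missing_ids is also a quick way to see exactly which verses would be appended to a neighbouring verse.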

~A


Re: [sword-devel] usfm2osis.py and tag \cp

2012-10-11 Thread Chris Little
I hope I've fixed this now. (I haven't tested that it functions 
correctly, but the error was fairly obvious from the traceback below.)
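
For anyone puzzled by the UnboundLocalError itself, here is a minimal Python sketch of the general failure mode: a local variable assigned on only one branch and then used unconditionally. It is not the code from usfm2osis.py; label_broken and label_fixed are hypothetical names, and the fix shown is only one possible approach.

```python
def label_broken(marker):
    if marker == "ca":
        ca = "alternate"
    # When marker is anything else, 'ca' was never assigned in this call.
    return "chapter " + ca


def label_fixed(marker):
    ca = None  # give the variable a value on every path
    if marker == "ca":
        ca = "alternate"
    return "chapter " + ca if ca is not None else "chapter"


try:
    label_broken("cp")
except UnboundLocalError as exc:
    print("broken:", exc)

print("fixed:", label_fixed("cp"))
```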


The application will almost always need Ctrl-C to break out because of 
the multithreading (and because I haven't bothered to add much exception 
handling).
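
The hang follows from a common pattern: the parent blocks on a result queue while the worker that should have filled it has died. A minimal sketch of one way around it is below. It uses threading and queue for brevity, whereas the traceback shows the script using multiprocessing, and the names worker, run_all and failing_job are made up for the illustration.

```python
import queue
import threading


def worker(name, job, results):
    """Run one job and always put a result on the queue, even on failure."""
    try:
        results.put((name, "ok", job()))
    except Exception as exc:
        # Report the failure instead of letting the worker die silently.
        results.put((name, "error", repr(exc)))


def run_all(jobs, timeout=5.0):
    """jobs: dict of name -> zero-argument callable."""
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(name, job, results))
        for name, job in jobs.items()
    ]
    for t in threads:
        t.start()
    # The timeout makes get() raise queue.Empty rather than block forever.
    collected = [results.get(timeout=timeout) for _ in threads]
    for t in threads:
        t.join()
    return sorted(collected)


def failing_job():
    raise KeyError("ca")  # stand-in for the crash in the report


outcome = run_all({"good": lambda: 42, "bad": failing_job})
for name, status, detail in outcome:
    print(name, status, detail)
```

Because every worker reports something, the parent receives one result per job and can exit without needing Ctrl-C.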


--Chris

On 10/11/2012 07:40 AM, Peter von Kaehne wrote:

The USFM \cp tag (used for chapter markers different from those of the used 
versification) crashes usfm2osis.py reliably. The programme needs a Ctrl-C 
interrupt to get out of its state.

Following minimal USFM code creates below attached error message.

\id EST
\h ESTER
\c 1
\cp A
\s En Mordekai eh Ouraman
\p
\v 1 Mordekai,

-


peter@AOA110:~/Bibles/Pohnpeian/apo_usfm/Pohnapo$ usfm2osis.py Pohnester 17ESTpohn.SFM.bak
Sorting book files naturally.
Process Worker-2:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/local/bin/usfm2osis.py", line 1440, in run
    osis = convertToOsis(job)
  File "/usr/local/bin/usfm2osis.py", line 1321, in convertToOsis
    osis = cvtChaptersAndVerses(osis, relaxedConformance)
  File "/usr/local/bin/usfm2osis.py", line 640, in cvtChaptersAndVerses
    osis = re.sub(r'(chapter [^]+sID[^]+/.+?chapter eID[^]+/)', replaceChapterNumber, osis, flags=re.DOTALL)
  File "/usr/lib/python2.7/re.py", line 151, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "/usr/local/bin/usfm2osis.py", line 633, in replaceChapterNumber
    ctext = re.sub(r'\$BOOK\$\.([^\.]+)', '$BOOK$.'+ca+'', ctext)
UnboundLocalError: local variable 'ca' referenced before assignment

^CTraceback (most recent call last):
  File "/usr/local/bin/usfm2osis.py", line 1557, in <module>
    k,v=result_queue.get()
  File "/usr/lib/python2.7/multiprocessing/queues.py", line 117, in get
    res = self._recv()
KeyboardInterrupt





[sword-devel] genbook lexicons - example problem and potential solutions

2012-10-11 Thread Daniel Owens
I am still working on the Abbott-Smith markup project (over 300 entries 
and counting). We have four contributors right now, so the pace is 
picking up. Creating a module is another story. Chris made a lexicon 
module after the first release, but . . .


I would like the module to look like this: 
http://www.textonline.org/files/abbott-smith/abbott-smith.current_release.html. 
To do that in SWORD, it needs to be a genbook in order to support:

- front- and backmatter
- page numbers
- a hierarchical structure (In the original TEI it has at least one
superEntry, but it is also divided into <div> elements by letter heading [Α, Β,
Γ, Δ, Ε, Ζ, Η, Θ, etc.])


The good news is that an OSIS genbook supports the bare-bones essentials
of entries. And thankfully BPBible and BibleTime both display entries
together in the same view, thanks to BPBible's continuous scrolling and
*perhaps* BibleTime not recognizing <div type="x-entry">.


Unfortunately various features of valid OSIS genbooks are inconsistently 
supported by front-ends. I created a module for testing. You can find it 
at 
https://github.com/translatable-exegetical-tools/Abbott-Smith/tree/master/releases/sword, 
including a valid OSIS file. Issues include:
- Some front-ends recognize <lb/>, others <p>, but the lexicon uses both
(and both are valid OSIS) in various contexts.

- Tables are inconsistently supported (mostly not)
- Titles should be centered, but there is no way to do that in OSIS, as 
far as I can tell. I wonder if this is a great example use case of 
per-module CSS...
- Parts of speech should be green and page numbers red, but you can't do 
color in OSIS (another use case of per-module CSS?)


Some of these, like <p>, <lb/>, and tables, should just work, I think.
Perhaps I will file bug reports. But the other display issues cannot be
resolved by OSIS alone.


Should TEI be a supported genbook format? I would think the TEI filter 
(as it evolves) could be pressed into use for genbooks. If that were 
done, certain lexicon-specific features as well as real book features 
such as page numbers could be consistently supported and displayed. On 
the other hand, I could see the value of having per-module CSS in the 
conf file so that the module developer could have some control over 
display.


Any thoughts?

Daniel


[sword-devel] multiple languages in modules

2012-10-11 Thread Daniel Owens
Working on Abbott-Smith some things came together in my mind about 
modules that mix languages. I have identified two problems.


First, modules that mix languages do not look good when fonts are chosen 
per module rather than per language (regardless of the language of the 
module). I go back and forth on which font to use as the default font in 
SWORD frontends. If I am primarily using Greek lexicons, I use a Greek 
font. If Hebrew lexicons then Hebrew. Cardo just does not satisfy, and
the SBL BibLit font is not out yet. But the <foreign> element *should*
mean that the front-end picks the right font for the right text, right?
I was (quietly) ecstatic to discover that BPBible also handles <foreign>
elements properly, displaying the proper fonts for each language in a
module (such as Abbott-Smith) that has five languages, using three
different fonts. So cool. But most front-ends do not. Is the <foreign>
element passed through the engine? If so, do I need to file bugs with
front-ends to encourage support of <foreign>?


Second, when RtoL text is mixed with LtoR text you can get some strange 
display problems. Punctuation and numbers can work for both types of 
languages. Take an example from the entry ἀγανάκτησις in Abbott-Smith. 
It uses the aleph character, with a number 1 following. Just typing this 
as plain text gives you: א1. This is incorrect even though the numeral 
was typed after the letter. I found a solution in HTML using <bdo>. The
page at 
http://www.textonline.org/files/abbott-smith/abbott-smith.current_release.html 
properly handles this issue. It actually transforms raw TEI using XSL 
with a CSS stylesheet to handle certain display issues. These 
stylesheets can be found at 
https://github.com/translatable-exegetical-tools/Abbott-Smith/tree/master/releases.
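
As background to the aleph-plus-numeral example: Unicode assigns each character a bidirectional class, and digits are a "weak" type whose placement depends on their neighbours. The Python sketch below just prints those classes and wraps a string in an HTML <bdo> element. The helper names bidi_classes and bdo are made up for the illustration, and which dir value gives the intended display is left to the caller; it is not taken from the Abbott-Smith stylesheets.

```python
import html
import unicodedata


def bidi_classes(text):
    """Unicode bidirectional class of each character, e.g. 'R', 'L', 'EN'."""
    return [unicodedata.bidirectional(ch) for ch in text]


def bdo(text, direction="ltr"):
    """Wrap text in an HTML <bdo> element that overrides bidi reordering."""
    return '<bdo dir="%s">%s</bdo>' % (direction, html.escape(text))


sample = "\u05d01"  # Hebrew letter aleph followed by the digit 1

print(bidi_classes(sample))
print(bdo(sample))
```

The aleph is class R (strong right-to-left) and the digit is class EN (European number), which is why the pair is not simply laid out in typing order.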


Could <bdo> be added to the filters to help with this problem?

Daniel


Re: [sword-devel] multiple languages in modules

2012-10-11 Thread Karl Kleinpaste
I know nothing of <foreign>, but can only suppose that, if supported, it
must pass through the engine with an appropriate (HTML) indication.

As a general rule, I suggest either Free Serif or Linux Libertine, with
a slight preference for Free Serif.  Both have good coverage across
every Latin alphabet variant, and pretty display of both Hebrew and
Greek.  In modules of mine that have Latin, Greek, and Hebrew alphabets,
they all show quite well.  We include both of these fonts in Xiphos'
Win32 installers.

You might find the UDHR module useful, from Crosswire Experimental, as a
font demonstration module.

(Linux Libertine is not Linux-specific.  It was just developed in an
open source environment.)

 Is the <foreign> element passed through the engine? If so, do I need
 to file bugs with front-ends to encourage support of <foreign>?

Having just looked, the string "foreign" does not appear in Sword's
source tree in src/modules/filters/*.cpp.  So it's not supported right
now after all.  I don't know how BPBible supports it; I had understood
that BPBible uses the regular filter sets.  Does BPBible actually
subclass the filters and extend them for <foreign>?
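
The same check can be repeated with a few lines of Python. This is a plain substring search, roughly what grep would do; the function name files_containing is made up, and the directory passed in is whatever local checkout of the source tree you have.

```python
from pathlib import Path


def files_containing(root, needle, pattern="*.cpp"):
    """Names of files directly under root matching pattern that contain needle."""
    hits = []
    for path in sorted(Path(root).glob(pattern)):
        text = path.read_text(encoding="utf-8", errors="replace")
        if needle in text:
            hits.append(path.name)
    return hits


# Example call, assuming the current directory is a checkout of the tree:
# print(files_containing("src/modules/filters", "foreign"))
```

An empty list for "foreign" would match the observation above that none of the filters mention the element.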

 Second, when RtoL text is mixed with LtoR text you can get some
 strange display problems. Punctuation and numbers can work for both
 types of languages.

This is often an artifact of how toolkits handle LtoR.  Today, Xiphos
uses GTK and WebKit, but I don't know how these reflect your example
case.  Our former use of gtkhtml3 -vs- gtkmozembed -vs- xulrunner -vs-
today's WebKit always led to some strange realizations for how LtoR
would show up in Xiphos.  gtkhtml3 wants to right-justify any text
containing (or perhaps it was that leads off with) Hebrew.  That
peculiarity led to certain unexpected choices for how I created
StrongsRealHebrew.



[sword-devel] Sword -r2741

2012-10-11 Thread luke
I have recently been in correspondence with Karl Kleinpaste of the Xiphos
project about display issues with our project's module. He recommended that I
try sword's latest -r2741 because it has recent changes regarding osis
headings. I do not have access to this version of sword.

Would someone be willing to run our project's osis file through the latest
version of sword (apparently -r2741), create a module from it and then send me
the results?
- My OSIS was built using the sword script from USFM files.
- My OSIS validates
- I have already run the fix for titles on my osis.

Please contact me if you are willing,
Thanks