Mattias Põldaru wrote:
On Wed, 2009-06-03 at 19:25, DM Smith wrote:
On Jun 3, 2009, at 1:36 PM, Mattias Põldaru wrote:
Hi everybody.
It is nice to see that you (DM, I suppose) got osis2mod working in no
time at all. There is one more issue with the preverse handling. Some
whitespace gets counted as preverse in my file, and I think this is
wrong, although it would not be complicated to remove the whitespace
from my source document. I paste an example here.
Here is the input OSIS file. Please correct me if I have something
wrong here.
<!-- start of example clip -->
<div type="bookGroup">
<title>Vana Testament</title>
<div type="book" osisID="Gen" canonical="true">
<title type="main">1. Moosese</title>
<div type="section" scope="Gen.1.1-Gen.2.3" >
<title>Maailma ja inimese loomine</title>
<chapter sID="Gen.1" osisID="Gen.1" />
<title type="chapter">1. peatükk</title>
<p>
<verse sID="Gen.1.1" osisID="Gen.1.1" />
Alguses lõi Jumal taevad ja maa.
<verse eID="Gen.1.1" />
</p>
<p>
<verse sID="Gen.1.2" osisID="Gen.1.2" />
Ja maa oli tühi ja paljas ja pimedus oli sügavuse peal ja Jumala Vaim
hõljus vete kohal.
<verse eID="Gen.1.2" />
</p>
<!-- end of example clip -->
And here is the corresponding module output. Please notice the preverse
that contains only a single space.
<!-- start of example clip -->
<div sID="gen1" type="bookGroup"/> <title>Vana Testament</title> <div
canonical="true" osisID="Gen" sID="gen2" type="book"/> <title
type="main">1. Moosese</title> <div sID="gen3" scope="Gen.1.1-Gen.2.3"
type="section"/> <title>Maailma ja inimese loomine</title>
<chapter osisID="Gen.1" sID="Gen.1"/> <title type="chapter">1.
peatükk</title> <div sID="gen4" type="paragraph"/>
Alguses lõi Jumal taevad ja maa. <div eID="gen4" type="paragraph"/>
<div type="x-milestone" subType="x-preverse" sID="pv1"/><div sID="gen5"
type="paragraph"/> <div type="x-milestone" subType="x-preverse"
eID="pv1"/> Ja maa oli tühi ja paljas ja pimedus oli sügavuse peal ja
Jumala Vaim hõljus vete kohal. <div eID="gen5" type="paragraph"/>
<!-- end of example clip -->
The pre-verse contains "<p> " (the paragraph start and the space).
Handling of whitespace is a bit problematic. What osis2mod does is
replace sequences of whitespace (newlines, spaces and tabs) with a
single space. If a verse contains leading or trailing space, it is
trimmed. (I don't think it should do this trimming.)
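To make the behaviour concrete, here is a minimal sketch in Python. It is not the osis2mod source; the function name normalize_whitespace and its trim flag are hypothetical and only mirror the two steps described above (collapse, then trim).

```python
import re


def normalize_whitespace(text: str, trim: bool = True) -> str:
    """Collapse runs of spaces, tabs and newlines into a single space.

    Hypothetical helper: it imitates the behaviour described in this
    thread and is not taken from osis2mod itself.
    """
    collapsed = re.sub(r"[ \t\r\n]+", " ", text)
    # Trimming is the step questioned above, so it can be switched off.
    return collapsed.strip(" ") if trim else collapsed


sample = "\n  Alguses lõi Jumal\n\ttaevad ja maa.\n"
print(repr(normalize_whitespace(sample)))
print(repr(normalize_whitespace(sample, trim=False)))
```

With trim=False the leading and trailing whitespace survives as one space on each side, which is the variant that would preserve spacing around inline text.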
What osis2mod does not have is knowledge of the containment model of the
OSIS schema. If it did, it could remove whitespace between element tags
where text is not allowed.
In this case, the OSIS schema allows for whitespace after the opening
paragraph tag and before the verse tag. One could have:
<p>yada yada yada <verse>verse text</verse> yada yada yada</p>
In that example, it would be inappropriate to trim the whitespace off of
the text that precedes the verse.
If we can come up with a good heuristic I'd be glad to implement it.
For the case I have, it would be sufficient to check if the preverse has
any printing characters and not to add an empty preverse.
The preverse is not empty; it contains
<div type="paragraph" sID="gen5"/>
which is the transformation of <p> into a milestoned representation.
It also has a single space following that element.
Where should the paragraph be put? It is either appended to the prior
verse or it is pre-verse.
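The difference between "no printing characters" and "empty" can be sketched in Python. The helper describe_preverse is hypothetical, and the regular expression is a rough stand-in for real XML parsing; it is only meant to show why a printing-character test alone would drop the paragraph milestone.

```python
import re

# Rough tag matcher, good enough for this illustration only.
TAG = re.compile(r"<[^>]+>")


def describe_preverse(preverse: str) -> dict:
    """Report whether a preverse fragment holds markup and/or printing text."""
    text_only = TAG.sub("", preverse)
    return {
        "has_markup": TAG.search(preverse) is not None,
        "has_printing_text": text_only.strip() != "",
    }


# The preverse content from the module output above.
fragment = '<div sID="gen5" type="paragraph"/> '
print(describe_preverse(fragment))
```

For this fragment the check finds markup but no printing text, so discarding the preverse on that basis would also discard the paragraph start.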
The one solution I thought of is that any whitespace immediately
following a block element start (<div>, <lg>, <p>, ...) can be deleted.
Likewise for any whitespace immediately before the end element.
Would this work?
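A minimal Python sketch of that heuristic, to check it against the two cases discussed in this thread. It is an assumption-laden illustration, not osis2mod code: the function name is hypothetical, the block set is limited to the three elements named above, and it works on the source tags with regular expressions rather than a real XML parser.

```python
import re

# Only the block elements named in this thread; the full set is not assumed.
BLOCK = r"(?:div|lg|p)"

# Start tag of a block element (not self-closing) followed by whitespace.
AFTER_START = re.compile(rf"(<{BLOCK}(?:\s[^<>]*)?(?<!/)>)\s+")
# Whitespace followed by the end tag of a block element.
BEFORE_END = re.compile(rf"\s+(</{BLOCK}\s*>)")


def strip_block_edge_whitespace(osis: str) -> str:
    """Delete whitespace right after a block start and right before its end."""
    osis = AFTER_START.sub(r"\1", osis)
    return BEFORE_END.sub(r"\1", osis)


verse_in_paragraph = (
    '<p>\n<verse sID="Gen.1.2" osisID="Gen.1.2" />\n'
    "Ja maa oli tühi ja paljas.\n"
    '<verse eID="Gen.1.2" />\n</p>'
)
inline_text = "<p>yada yada yada <verse>verse text</verse> yada yada yada</p>"

print(strip_block_edge_whitespace(verse_in_paragraph))
print(strip_block_edge_whitespace(inline_text))
```

In this sketch the whitespace between <p> and the verse start disappears, while the inline example is returned unchanged because its whitespace is not adjacent to a block start or end tag.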
In Him,
DM
_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page