Re: [sword-devel] Catholic versification / inter-versification mappings

2024-02-20 Thread Kahunapule Michael Johnson
ycliffe use - even when allowing 
for verse bridges, reordering and more.

Which makes me think we should think more about a presentation overlay,
which we have at the moment only in a very rudimentary form. OSIS certainly
allows for more than we incorporate right now.

This could, e.g., take the form in the engine of a re-sorting filter where
text is reordered and marked up for user-facing presentation based on
additional presentation info in modules.

This would to my mind be a much better solution than an ever-increasing
number of av11n systems or indeed a complete rewrite.

Peter




Re: [sword-devel] Catholic versification / inter-versification mappings

2024-02-20 Thread Arnaud Vié
e eg in the engine the form of a resorting filter where text
> is reordered and marked up for user facing presentation based on additional
> presentation info in modules.
>
> This would to my mind be a much better solution than an ever-increasing
> number of av11n systems or indeed a complete rewrite.
>
> Peter

Re: [sword-devel] Catholic versification / inter-versification mappings

2024-02-20 Thread Peter von Kaehne

  
  
  

There are two aspects here:

Historical developments, which have been by and large well explored by
Chris and are mirrored well in our various av11n systems, with specific and
now largely known gaps. We have less of a clue about the Roman Catholic
development of versification systems, and probably even less about a variety
of autochthonous churches with their own historical development of
translations and versification. But these are well-defined tasks, and e.g.
the work of Dominique to bring in the French systems documents well how we
should handle that side.

The complications coming in due to modern translations are actually very
limited. FWIW, most modern translations which play about with versifications
still actually follow an underlying, well-known and documented plan. NRSV(A),
e.g., seems to be all that UBS and Wycliffe use, even when allowing for verse
bridges, reordering and more.

Which makes me think we should think more about a presentation overlay,
which we have at the moment only in a very rudimentary form. OSIS certainly
allows for more than we incorporate right now.

This could, e.g., take the form in the engine of a re-sorting filter where
text is reordered and marked up for user-facing presentation based on
additional presentation info in modules.

This would to my mind be a much better solution than an ever-increasing
number of av11n systems or indeed a complete rewrite.

Peter

Re: [sword-devel] Catholic versification / inter-versification mappings

2024-02-19 Thread Troy A. Griffitts
Michael and I obviously differ in opinion on this subject, and he knows 
that I love him and so appreciate the work he does at ebible.org.


I would like to propose that our differences are due to our perspective 
in light of our primary ministries:


Among many other things, Michael with ebible.org edits and publishes the 
World English Bible and its variations and also transforms every freely 
distributable Bible from the United Bible Societies' (UBS) Digital Bible 
Library (DBL) repository to many formats, including SWORD, and for that 
we are so grateful and blessed by his efforts.


For his work, he needs to try to fit all these Bibles into one of our 
versification schemes (actually maybe he has punted on that and may use 
our NRSVA scheme which is one of our largest supersets; I don't know 
these days). I can imagine the frustration this must be for him, trying 
to automate this conversion as much as possible. He has had much grace 
to convert DBL markup to OSIS (a format he doesn't love) and select a 
versification, which he sees many of his Bibles don't match exactly. I 
am so grateful for his friendship and his partnership in ministry over 
many decades.


From my perspective, SWORD is focused on Bible research. In my mind, I 
need to define a finite set of unambiguous segments of Scripture, and 
once defined, for each segment:


 * show that same segment of Scripture in all Bibles which contain that
   segment:
   https://crosswire.org/study/parallelstudy.jsp?del=all=KJV=NA28DBG=NASB=RST=Matt.16.28#cv
 * even try to align individual words, if we can, within that segment
   (try clicking on any word from the link above).
 * find the precise folio in all ancient manuscripts of that segment
   and show images of that segment:
   https://crosswire.org/study/fetchdata.jsp?mod=TC=Matt.16.28=NASB
 * find the parallel passages of that segment across the Four Gospels,
   using Eusebian Canon Tables, with 2 segments of context before and
   after:
   https://crosswire.org/study/eusebian.jsp?key=Matt.16.28=2#cv
 * find and show commentary about that segment:
   https://crosswire.org/study/passagestudy.jsp?key=Matt.16.28-28=JFB
 * compute and show variation in manuscripts about that segment:
   https://ntvmr.uni-muenster.de/community/vmr/api/collate/?baseText=ECM=Matt+16%3A28=0=-1=true=true=true=true=graph=undefined=intfadmin


For my daily work, I need tightly defined, unambiguous segments of the 
original Text.  Michael is daily trying to get a publisher's translation 
displaying the way the publisher desires-- with all the publisher's 
unique adjustments of the text, and understandably he gets frustrated 
that a simple task isn't so simple in SWORD because he needs to segment 
and fit each Bible into a versification system with IDs which we can use 
to do all our research.  I get it.  I understand.  Simple things should 
be simple, and right now they are not.


We have discussed alternatives for a more ePub/PDF like reader's view of 
a Bible, where we preserve the imported data file's ordering such that a 
reader oriented view could possibly ask for, say 'chapter' chunks and 
just display what it gets, and then slice things up into our indices 
behind the scenes as we do now.  We've had 'reader view' requests for 
our lexicon modules, as well, because we order entries to allow fast 
binary search lookups, and when a client of the SWORD library asks to 
step through a lexicon module, they get an ordered list of entries 
instead of the published order in the original lexicon.  We could do 
much better preserving original publisher display; even simply to the 
point that we support this use case as a first class citizen in the 
engine and preserve an ePub or PDF of the original with each module.


I want to move forward with solutions for each task we each individually 
feel called to work toward.  I am sorry that I prioritize the tasks I 
feel called to work toward over helping others find solutions 
for their ministry.  This might be normal for a software engineer, but 
not for a good leader.


Serving together,

Troy




Re: [sword-devel] Catholic versification / inter-versification mappings

2024-02-19 Thread Kahunapule Michael Johnson

Dear All,

If I understand Arnaud correctly, I really like his ideas. The BEST part is 
that the next time a Bible is submitted for processing with yet another unique 
versification (after the changes are implemented), it doesn't have to be either 
force-fit into a versification that doesn't fit or wait for decades for someone 
to update the hard-coded versifications in the Sword engine, and for those to 
be incorporated into all of the front ends.

I regard the current minimalist versification system to be seriously in need 
of an upgrade. It is based on false assumptions (listed by Troy, no offense 
intended) that seemed good at the time they were made. However, my experience 
with 1404 Bible translations (and counting) is that (1) 90% success is an 
over-estimate of how well it works, and (2) Sword versification is a complete 
failure for numerous projects because none of the existing versifications fit, 
the fall-back mechanisms fail and result in wrong outputs or crashes in 
osis2mod, and nobody is actively fixing the situation.


I have found the following to be true:

The number of versifications needed to represent all Bibles properly tomorrow 
is highly likely to be more than the number that works today. Hard-coding 
versifications into slowly-changing code that is only updated in fits and 
starts is doomed to fail (and already has, in my not-so-humble opinion).

Verse numbers in a chapter don't always proceed in numerical order. Several 
Bible translations move the statement about the motion of the shadow on 
Hezekiah's steps to a more logical place in terms of discourse, without 
changing the verse numbers. Indeed, they split verses into segments and 
straddle other verses with them.

Chapter and verse "numbers" aren't always pure numbers. Letters get involved in 
the Deuterocanon/Apocrypha. Some Bible translators like to use verse segments (like 6a 
and 6b) heavily.

Verse bridges (like verse 1-3 with everything from verses 1 through 3 but 
possibly rearranged and with no other verse markings within them) are very 
common.

Mapping any arbitrary versification to any other is NICE, but NOT NECESSARY. 
Displaying the text as the translators intended is NECESSARY. If you can do 
both, do it. If you cannot, at least display the versification of the Bible 
translation as the translator intended.
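
To make the points above about verse segments, verse bridges, and out-of-order verses concrete, here is a minimal sketch of a chapter stored in the translators' display order, with labels that may be segments ("8a") or bridges ("1-3"). Everything in it is an assumption made for illustration: the label grammar, the `covered_verses`, `display_order` and `segments_for` helpers, and the placeholder texts are hypothetical and are not taken from Sword, JSword, OSIS, or any real module.

```python
import re

# Hypothetical label grammar: "7", "8a", "1-3", "6a-7b".
# This is an illustration only, not an OSIS or Sword format.
LABEL = re.compile(r"^(\d+)([a-z]?)(?:-(\d+)([a-z]?))?$")


def covered_verses(label):
    """Return the verse numbers that a label such as '8a' or '1-3' covers."""
    match = LABEL.match(label)
    if match is None:
        raise ValueError(f"unrecognised verse label: {label!r}")
    start = int(match.group(1))
    end = int(match.group(3)) if match.group(3) else start
    if end < start:
        raise ValueError(f"descending verse bridge: {label!r}")
    return list(range(start, end + 1))


# A chapter kept in the order the translators chose (placeholder texts).
chapter = [
    ("1-3", "bridged text for verses 1 through 3"),
    ("8a", "first part of verse 8, moved earlier"),
    ("7", "verse 7"),
    ("8b", "rest of verse 8"),
]


def display_order(chapter):
    """Labels exactly as stored, i.e. the intended reading order."""
    return [label for label, _ in chapter]


def segments_for(chapter, verse):
    """All labels whose range includes the requested verse number."""
    return [label for label, _ in chapter if verse in covered_verses(label)]
```

With this shape, display follows the stored order, while a request for verse 8 can still find both of its segments. Mapping to other versifications is a separate problem that the sketch does not attempt.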

I am fully aware of what adapting to the realities I perceive implies in 
terms of changes to architecture and code. At this point, I'm not sure if 
modifying the Sword engine or rewriting it would be easier. Either way, it is 
a lot of work.

It is my understanding that JSword is a bit better than Sword in this regard, 
in that it doesn't assume fixed versifications.

As far as volunteering for pumpkin holder for versifications, I nominate 
Arnaud. (I already bit off more than I can chew by myself. Sorry.)


Re: [sword-devel] Catholic versification / inter-versification mappings

2024-02-19 Thread Troy A. Griffitts

Dear all,

These comments are a mix of background, history, and thoughts:

1) VERSIFICATION (v11n):

Variation between reference systems sucks.  Until you get into the weeds 
of the details, it is normal to assume the problems are not complex.  
SWORD tries to implement a simple 90% solution.


SWORD and JSword support defined abstract versification schemes with 3 
simple dimensions: [bookid : chapterMax][chapterNumber : 
verseMax][verseNumber : verseEntry]


Conceptually we also operate on these assumptions (I've skimmed the 
proposal by Arnaud which differs here, but I haven't given it the 
thought it deserves to comment yet): that book order is defined in the 
v11n system; that chapter and verse numbers are numeric and begin at 1 
and increase to verseMax.  We also allocate a special slot '0' for: 
Module Introduction; Testament Introduction; Book Introduction; and 
Chapter Introduction (e.g., Matt.0.0 can hold an introduction to Matthew).


Those who have been exposed to many Bibles will immediately think of 
places these assumptions fall short.  But for >90% of our Bibles, these 
assumptions hold true, and these assumptions make many aspects of our work 
much simpler (abstract parsing of verse lists and ranges, bookmark 
ordering, etc.).


Historically, SWORD previously supported dynamic, per module, 
versification, with a 3 phase lookup:


index file .bks[book number] = book offset in next index;
index file .cps[book offset + chapter number] = chapter offset in next 
index;
index file .vss[chapter offset + verse number] = verse offset and entry 
size in data file.
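
The three-phase lookup described above can be modelled roughly as follows. This is a toy sketch, not SWORD's actual file format or code: the in-memory lists `bks`, `cps` and `vss`, the sample `data` bytes, and the `lookup` helper are all assumptions made for the example, and numbering is taken to start at 1 with index 0 unused.

```python
# Toy stand-in for the data file: four "verses" stored back to back.
data = b"alphabetagammadelta"

# vss[chapter offset + verse number] = (offset, size) into the data file.
vss = [None, (0, 5), (5, 4), (9, 5), (14, 5)]

# cps[book offset + chapter number] = chapter offset into vss.
# Book 1 has two chapters (2 verses, then 1 verse); book 2 has one chapter.
cps = [None, 0, 2, 3]

# bks[book number] = book offset into cps.
bks = [None, 0, 2]


def lookup(book, chapter, verse):
    """Follow the three index hops and return the verse text."""
    book_offset = bks[book]
    chapter_offset = cps[book_offset + chapter]
    offset, size = vss[chapter_offset + verse]
    return data[offset:offset + size].decode("ascii")
```

In this model, each module carries all three tables, so the module alone decides how many books, chapters and verses exist. The sketch does no bounds checking: a chapter or verse number past the end of its book or chapter would silently resolve to a neighbouring entry.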


About 20 years ago, we made the decision to begin the hard work to 
understand versification systems within Bibles so we could begin to map 
them appropriately.  This let us remove the .bks and .cps index files 
and store that data in versification system definitions, leaving only 
the final .vss index file which gave the offsets and entry sizes into 
the data file.


Caring about versifications was a decision we made.  Our previous design 
let any Bible decide how many books, how many chapters, and how many 
verses each chapter contained.  This had its merits because any new 
versification could be defined in each module without anyone caring what 
it was.  But the drawback was the same: any Bible could decide how many 
books, how many chapters, and how many verses without anyone knowing why 
or what they were.


Some have pushed for dynamic definitions of v11n systems again, and I 
understand why.  I am in favor of moving forward with a hybrid approach: 
a set of defined versification systems, which a module will still need 
to choose from, to which it most closely adheres, + the ability for that 
module to specify its variation.
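
One possible shape for the hybrid approach, offered purely as a sketch: the message does not say how a module would declare its variation, so the `BASE_SYSTEMS` table, the `(book, chapter) -> verseMax` override format, and the `effective_v11n` helper below are hypothetical, and the numbers are illustrative only.

```python
# Hypothetical predefined system: verseMax per chapter, by book.
BASE_SYSTEMS = {
    "Example": {"Ps": [6, 12, 8]},
}


def effective_v11n(system_name, variation):
    """Combine a predefined system with a module's declared differences.

    `variation` maps (book, chapter) to the module's own verseMax.
    The predefined system itself is left untouched.
    """
    base = BASE_SYSTEMS[system_name]
    merged = {book: list(chapters) for book, chapters in base.items()}
    for (book, chapter), verse_max in variation.items():
        chapters = merged.setdefault(book, [])
        # Grow the chapter list if the module adds chapters past the base.
        while len(chapters) < chapter:
            chapters.append(0)
        chapters[chapter - 1] = verse_max
    return merged


# A module that follows "Example" except that Ps 3 has 9 verses.
module_variation = {("Ps", 3): 9}
merged = effective_v11n("Example", module_variation)
```

The module still names the system it most closely adheres to, so mapping between systems can start from the predefined definition, and only the declared differences need special handling.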


Toward 98%: We have tried to work around the cons of this simple design 
and approach 100% support by accounting for the most common types of 
problems, e.g.


 * The engine allows common verse suffixes (e.g. Matt.2.7b);
 * The engine skips verses in a Bible which are not present-- this
   allows us to create v11n schemes which are a superset of n number of
   closely related v11n schemes, knowing that the engine will skip over
   the verses that are not present in the module; we also have tools
   which print out missing verses, which has proven a good QA check for
   our modules team.
 * When we run across a Bible which adds an odd verse here or there or
   an out of order verse, our workaround has been to append these to
   the end of the verse just before where they should appear, so the text
   flows the same as the printed Bible, and we include for the reader
   an inline visual separator and marker showing the publisher's verse
   number.
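
The first two workarounds in the list above can be sketched as follows. This is an illustration under stated assumptions, not the engine's code: the `REF` pattern, `parse_ref`, `present_verses`, and the sample module contents are all hypothetical, and how the real engine treats a suffix after parsing is not described in this message.

```python
import re

# Hypothetical reference pattern: Book.Chapter.Verse with an optional
# single-letter suffix, e.g. "Matt.2.7b".
REF = re.compile(
    r"^(?P<book>[1-4]?[A-Za-z]+)\.(?P<chapter>\d+)\.(?P<verse>\d+)"
    r"(?P<suffix>[a-z]?)$"
)


def parse_ref(ref):
    """Split a reference into (book, chapter, verse, suffix)."""
    match = REF.match(ref)
    if match is None:
        raise ValueError(f"unrecognised reference: {ref!r}")
    return (match["book"], int(match["chapter"]),
            int(match["verse"]), match["suffix"])


# A superset scheme lists every verse any related Bible might contain.
SUPERSET = ["Matt.17.20", "Matt.17.21", "Matt.17.22"]

# Made-up module contents in which one verse of the superset is absent.
module_entries = {
    "Matt.17.20": "text of verse 20",
    "Matt.17.22": "text of verse 22",
}


def present_verses(superset, entries):
    """Yield (reference, text) only for verses the module contains."""
    for ref in superset:
        text = entries.get(ref, "")
        if text:
            yield ref, text


def missing_verses(superset, entries):
    """List superset verses with no text, as a QA report might."""
    return [ref for ref in superset if not entries.get(ref)]
```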


These workarounds get us pretty close to being able to support 98% of 
our Bibles exactly as the publisher wishes, and the remaining 2% is 
supported "well enough" for no complaints by publishers.  Could we build 
a system which allowed out of order verses, or which allowed any scheme 
a Bible wished to follow? Sure, but the added complexity for various 
tasks increases quite a bit for some of these allowances-- e.g., think 
index math for book chapter verse when we cannot assume numeric 
sequence; think abstract ordering of bookmarks not tied to any specific 
Bible, search results across Bibles, etc.
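
To illustrate why the numeric-sequence assumption keeps index math and bookmark ordering simple, here is a small sketch that maps (book, chapter, verse) to a single integer. It is not SWORD's actual index layout: the `BOOK_ORDER` and `VERSE_MAX` tables are made up, and the slot arrangement (one book-introduction slot per book, plus a slot 0 per chapter) is a simplifying assumption that leaves out module and testament introductions.

```python
# Hypothetical tiny versification: book order plus verseMax per chapter.
BOOK_ORDER = ["Gen", "Exod"]
VERSE_MAX = {"Gen": [3, 2], "Exod": [2]}


def book_size(book):
    """Slots used by a book: 1 book intro + (verseMax + 1) per chapter."""
    return 1 + sum(verse_max + 1 for verse_max in VERSE_MAX[book])


def flat_index(book, chapter, verse):
    """Map (book, chapter, verse) to one integer position."""
    chapters = VERSE_MAX[book]
    if not 0 <= chapter <= len(chapters):
        raise ValueError(f"no chapter {chapter} in {book}")
    # Skip every slot belonging to the books that come earlier.
    index = sum(book_size(b) for b in BOOK_ORDER[:BOOK_ORDER.index(book)])
    if chapter == 0:
        if verse != 0:
            raise ValueError("chapter 0 only holds the book introduction")
        return index
    if not 0 <= verse <= chapters[chapter - 1]:
        raise ValueError(f"no verse {verse} in {book} {chapter}")
    # Skip the book intro slot and all slots of the earlier chapters.
    index += 1 + sum(verse_max + 1 for verse_max in chapters[:chapter - 1])
    return index + verse


# Bookmarks from any Bible using this scheme sort with a plain key.
bookmarks = [("Exod", 1, 1), ("Gen", 2, 1), ("Gen", 1, 3)]
ordered = sorted(bookmarks, key=lambda ref: flat_index(*ref))
```

Once verse labels can be non-numeric or out of order, no single arithmetic formula like this is available, and ordering has to come from per-Bible data instead.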


Our vision with v11n definitions is that they will be as few as possible, 
allowing us to map between them most easily, and as many as necessary to 
allow us to represent well enough a published work.


Chris Little previously was our versification pumpkin holder and did 
some amazing work researching all this material.  As a demonstration of 
his thorough work and an example of the difficulties with v11n, see his 
work on just the LXX tradition:


https://www.crosswire.org/svn/sword-tools/trunk/versification/lxx_v11ns/

Chris has left our community after many years of volunteering massive 
time and effort.


We haven't had anyone step up who is willing to commit the time and