I'm forwarding this to the wider community, in order to obtain a response regarding my suggestion that we design a new SWORD filter to process abbreviations.
See my last reply to the modules team for details.

Best regards,

David

Sent with [Proton Mail](https://pr.tn/ref/SWXT9A5YZ67G) secure email.

------- Forwarded Message -------
From: David Haslam <dfh...@protonmail.com>
Date: On Sunday, May 11th, 2025 at 4:45 PM
Subject: Re: [modules] New Beta Module: Tyndale
To: dom...@crosswire.org <dom...@crosswire.org>, Fr Cyrille <fr.cyri...@tiberiade.be>
CC: modu...@crosswire.org <modu...@crosswire.org>

> Dear all,
>
> Today, I have begun to examine the use of Roman numerals to represent numbers
> in the Tyndale module text exported using diatheke.
>
> The following records match a simple PCRE that looks for words that
> consist entirely of the permitted lowercase letters found in numbers using
> Roman numerals.
>
> Here's my PCRE: [ijvxlcdm]+
>
> The search was performed on the word frequency analysis already done using
> BabelPad Tools.
>
>> 1 cxliiii
>> 38 did
>> 15 i
>> 16 ii
>> 1 iic
>> 22 iii
>> 31 iiii
>> 1 iiiii
>> 68 iiij
>> 81 iij
>> 137 ij
>> 16 ix
>> 25 l
>> 1 li
>> 1 liii
>> 2 liiij
>> 3 liij
>> 1 lij
>> 2 lix
>> 3 lvij
>> 10 lx
>> 2 lxi
>> 2 lxiiij
>> 4 lxij
>> 1 lxix
>> 4 lxv
>> 1 lxvj
>> 25 lxx
>> 2 lxxiiij
>> 2 lxxiij
>> 2 lxxij
>> 6 lxxv
>> 1 lxxvi
>> 1 lxxvij
>> 3 lxxx
>> 1 lxxxiij
>> 2 lxxxij
>> 2 lxxxvi
>> 1 lxxxvij
>> 1 lxxxx
>> 1 m
>> 7 mi
>> 1 mid
>> 86 v
>> 26 vi
>> 43 vii
>> 5 viii
>> 26 viij
>> 133 vij
>> 5 vj
>> 51 x
>> 9 xi
>> 45 xii
>> 1 xiiii
>> 20 xiiij
>> 4 xiij
>> 31 xij
>> 1 xix
>> 1 xj
>> 59 xl
>> 2 xli
>> 2 xlii
>> 3 xliiii
>> 1 xliij
>> 1 xlij
>> 1 xlix
>> 4 xlv
>> 3 xlvi
>> 1 xlviij
>> 1 xlvij
>> 18 xv
>> 6 xvi
>> 3 xviii
>> 1 xviij
>> 4 xvij
>> 53 xx
>> 1 xxiii
>> 7 xxiiii
>> 2 xxiiij
>> 1 xxiij
>> 3 xxij
>> 2 xxix
>> 1 xxj
>> 2 xxv
>> 2 xxviij
>> 2 xxvij
>> 51 xxx
>> 1 xxxiiij
>> 3 xxxiij
>> 6 xxxij
>> 3 xxxv
>> 2 xxxvi
>> 1 xxxviii
>> 1 xxxviij
>> 5 xxxvij
>
> Observations:
>
> - Most of the numbers in verse text that potentially match Roman numerals are
>   lowercase.
> - There are 103 unique strings that potentially match Roman numerals
>   irrespective of case.
> - There are 95 unique strings that potentially match lowercase Roman numerals.
> - A few of these can be discounted as being ordinary words: "did", "mi",
>   "mid", etc.
> - Arabic numeral 4 is often represented as either "iiii" or "iiij" instead of
>   "iv", reflecting the usage of that period.
> - The use of the alternative final letter "j" in place of "i" is likely to be
>   a printer's flourish of that period.
> - The vast majority of such strings found in verse text are marked with a
>   period (full stop) fore 'n' aft, e.g. ".xxx."
> - Some strings omit one or both of these period delimiters!
> - Some strings are wrongly preceded by ". " rather than " ." (misplaced
>   delimiter due to OCR error?)
> - The total number of matches to PCRE "\W[ijvxlcdm]+\W" (without the quotes)
>   is 1293.
> - Of those 1293, only 958 match the PCRE "\.[ijvxlcdm]+\." (i.e. with both the
>   proper period delimiters).
> - That leaves 335 instances in which there's a missing or misplaced period
>   delimiter (or which are ordinary words).
> - Searching for patterns that include uppercase Roman numerals is more
>   difficult because of the very common word "I" (first person pronoun).
> - The total number of matches to PCRE "\W[ijvxlcdmJVXLCDM]+\W" (without the
>   quotes) is 1314.
> - That means we thereby discovered 21 further potential candidates in which
>   at least one letter is uppercase, excluding "I".
>
> If the Tyndale Bible was printed consistently with every number properly
> delimited between two periods, and always lowercase, then it has become
> apparent that the digitised text did not faithfully transcribe many of these!
>
> We therefore require the upstream source to be thoroughly checked in this
> regard, and edited to fix all such OCR errors.
>
> Looking to the future, we might also make good use of the OSIS element abbr
> to encode all such numbers. E.g.
>
> <abbr type="x-Roman" expansion="30">.xxx.</abbr>
>
> Aside: It would be a cool enhancement to the SWORD API to provide support for
> a new filter:
>
> GlobalOptionFilter=OSISExpandAbbreviations
>
> Does the SWORD API already provide any support for the abbr element? If
> so, what is the functionality?
>
> Best regards,
>
> David
>
> On Sunday, May 11th, 2025 at 3:35 PM, David Haslam <dfh...@protonmail.com>
> wrote:
>
>> Dear Cyrille, dear Dom,
>>
>> In numerous places, the digital text of the Tyndale module omits the macron
>> over a vowel that's there in the original printed pages. e.g. Abraha -
>> should be Abrahā.
>>
>> This is just one example of the many kinds of deficiencies in the upstream
>> source.
>>
>> Fixing these in the upstream source would require a lot of intensive effort.
>>
>> Best regards,
>>
>> David
>>
>> On Wednesday, May 7th, 2025 at 7:29 PM, David Haslam <dfh...@protonmail.com>
>> wrote:
>>
>>> Hi Cyrille,
>>>
>>> Unless users know what the MALTESE CROSS & the CROSS PATTY WITH RIGHT
>>> CROSSBAR actually denote, how does including them help the Bible student?
>>>
>>> - Can we try to find out more?
>>> - Would ChatGPT help in any way?
>>>
>>> Best regards,
>>>
>>> David
>>>
>>> On Wednesday, May 7th, 2025 at 6:30 PM, Fr Cyrille
>>> <fr.cyri...@tiberiade.be> wrote:
>>>
>>>> On 07/05/2025 at 15:08, David Haslam wrote:
>>>>
>>>>> Hi Cyrille,
>>>>>
>>>>> Why was only one correction made?
>>>>>
>>>>> I listed two locations where the verse hadn't been properly referenced!
>>>>>
>>>>> - You have fixed Acts 9:38:
>>>>> - You have not fixed Revelation of John 1:9:
>>>>
>>>> I did, but I missed the osisID....
>>>>
>>>>> And those two types of peculiar symbol are all still there!
>>>>>
>>>>> - 3 of U+2720 ✠ MALTESE CROSS
>>>>> - 5 of U+2E50 ⹐ CROSS PATTY WITH RIGHT CROSSBAR
>>>>
>>>> OK, you want them to be removed?
>>>>
>>>>> Best regards,
>>>>>
>>>>> David
>>>>>
>>>>> On Wednesday, May 7th, 2025 at 1:50 PM, dom...@crosswire.org
>>>>> dom...@crosswire.org wrote:
>>>>>
>>>>>> This is to announce that we have just now uploaded Tyndale
>>>>>> in the CrossWire beta repository for testing purposes.
>>>>>>
>>>>>> If no concern has been raised and no quality alert has been sent to the
>>>>>> list, Tyndale will be published in a week.
>>>>>>
>>>>>> This is an update.
>>>>>> Language=English
>>>>>> Version=2.0
>>>>>> History_2.0=(2025-05-07) New source
>>>>>> TextSource=https://en.wikisource.org/wiki/Bible_(Tyndale)
>>>>>> Versification=KJV
>>>>>>
>>>>>> Many thanks to everyone who contributed to this release.
>>>>>>
>>>>>> yours
>>>>>>
>>>>>> P.S.: This email is sent automatically.
>>>>>>
>>>>>> _______________________________________________
>>>>>> modules mailing list
>>>>>> modu...@crosswire.org
>>>>>> http://www.crosswire.org/mailman/listinfo/modules
>>>>
>>>> --
>>>> Do you love the Bible? Are you a theology student? Use the free
>>>> application [Xiphos](https://xiphos.org/) or
>>>> [Andbible](https://andbible.github.io/) and access source texts,
>>>> commentaries, dictionaries and many other features... Contact me for
>>>> French translations.
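
To make the suggestion in the forwarded message more concrete, here is a minimal Python sketch of the two steps it describes: counting candidate strings with the two PCREs quoted above, and wrapping properly delimited numerals in the proposed abbr markup. It is only an illustration under stated assumptions. The function names (roman_to_int, count_candidates, tag_numerals) are hypothetical, nothing here is part of the SWORD API, and the sample sentence is made up rather than taken from the module.

```python
import re

# Letter values; "j" is treated as a variant of final "i" (an assumption
# based on the observation about the printer's flourish).
ROMAN_VALUES = {"i": 1, "j": 1, "v": 5, "x": 10, "l": 50,
                "c": 100, "d": 500, "m": 1000}

# The two patterns quoted in the message.
CANDIDATE = re.compile(r"\W[ijvxlcdm]+\W")    # any non-word delimiters
DELIMITED = re.compile(r"\.([ijvxlcdm]+)\.")  # a period fore and aft


def roman_to_int(numeral: str) -> int:
    """Convert a lowercase numeral, accepting additive forms such as "iiii"."""
    total = 0
    previous = 0
    # Read right to left: a smaller letter before a larger one is subtracted.
    for letter in reversed(numeral.lower()):
        value = ROMAN_VALUES[letter]
        if value < previous:
            total -= value
        else:
            total += value
        previous = value
    return total


def count_candidates(text: str) -> tuple[int, int]:
    """Return (all candidate matches, matches with both period delimiters)."""
    return len(CANDIDATE.findall(text)), len(DELIMITED.findall(text))


def tag_numerals(text: str) -> str:
    """Wrap each period-delimited numeral in an OSIS abbr element."""
    def replace(match: re.Match) -> str:
        value = roman_to_int(match.group(1))
        return f'<abbr type="x-Roman" expansion="{value}">{match.group(0)}</abbr>'
    return DELIMITED.sub(replace, text)


if __name__ == "__main__":
    sample = "he was .xxx. yere olde and did lyve xl. yeres"  # made-up text
    print(count_candidates(sample))
    print(tag_numerals(sample))
```

Only strings with both period delimiters are tagged, so ordinary words such as "did" and numerals with a missing delimiter such as "xl." are left alone for manual review. Multiplicative forms such as "iic" are not handled specially and would also need checking by hand.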
Analysis.7z
Description: application/compressed
_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page