I'm forwarding this to the wider community, in order to obtain a response 
regarding my suggestion that we design a new SWORD filter to process 
abbreviations.

See my last reply to the modules team for details.

Best regards,

David

Sent with [Proton Mail](https://pr.tn/ref/SWXT9A5YZ67G) secure email.

------- Forwarded Message -------
From: David Haslam <dfh...@protonmail.com>
Date: On Sunday, May 11th, 2025 at 4:45 PM
Subject: Re: [modules] New Beta Module: Tyndale
To: dom...@crosswire.org <dom...@crosswire.org>, Fr Cyrille 
<fr.cyri...@tiberiade.be>
CC: modu...@crosswire.org <modu...@crosswire.org>

> Dear all,
>
> Today, I have begun to examine the use of Roman numerals to translate numbers 
> in the Tyndale module text exported using diatheke.
>
> The following records match a simple PCRE that looks for words consisting 
> entirely of the lowercase letters used in Roman numerals.
>
> Here's my PCRE: [ijvxlcdm]+
>
> The search was performed on the word frequency analysis already done using 
> BabelPad Tools.
>
>> 1 cxliiii
>> 38 did
>> 15 i
>> 16 ii
>> 1 iic
>> 22 iii
>> 31 iiii
>> 1 iiiii
>> 68 iiij
>> 81 iij
>> 137 ij
>> 16 ix
>> 25 l
>> 1 li
>> 1 liii
>> 2 liiij
>> 3 liij
>> 1 lij
>> 2 lix
>> 3 lvij
>> 10 lx
>> 2 lxi
>> 2 lxiiij
>> 4 lxij
>> 1 lxix
>> 4 lxv
>> 1 lxvj
>> 25 lxx
>> 2 lxxiiij
>> 2 lxxiij
>> 2 lxxij
>> 6 lxxv
>> 1 lxxvi
>> 1 lxxvij
>> 3 lxxx
>> 1 lxxxiij
>> 2 lxxxij
>> 2 lxxxvi
>> 1 lxxxvij
>> 1 lxxxx
>> 1 m
>> 7 mi
>> 1 mid
>> 86 v
>> 26 vi
>> 43 vii
>> 5 viii
>> 26 viij
>> 133 vij
>> 5 vj
>> 51 x
>> 9 xi
>> 45 xii
>> 1 xiiii
>> 20 xiiij
>> 4 xiij
>> 31 xij
>> 1 xix
>> 1 xj
>> 59 xl
>> 2 xli
>> 2 xlii
>> 3 xliiii
>> 1 xliij
>> 1 xlij
>> 1 xlix
>> 4 xlv
>> 3 xlvi
>> 1 xlviij
>> 1 xlvij
>> 18 xv
>> 6 xvi
>> 3 xviii
>> 1 xviij
>> 4 xvij
>> 53 xx
>> 1 xxiii
>> 7 xxiiii
>> 2 xxiiij
>> 1 xxiij
>> 3 xxij
>> 2 xxix
>> 1 xxj
>> 2 xxv
>> 2 xxviij
>> 2 xxvij
>> 51 xxx
>> 1 xxxiiij
>> 3 xxxiij
>> 6 xxxij
>> 3 xxxv
>> 2 xxxvi
>> 1 xxxviii
>> 1 xxxviij
>> 5 xxxvij
>
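
For anyone who wants to reproduce the filtering step without BabelPad, here is a minimal Python sketch. It assumes the word frequency list is plain text with one "<count> <word>" pair per line, as in the listing above. The function name and the sample data are made up for illustration.

```python
import re

# Letters permitted in lowercase Roman numerals; "j" is the variant final "i".
ROMAN_LETTERS = re.compile(r"[ijvxlcdm]+")

def roman_candidates(freq_lines):
    """Return (count, word) pairs whose word uses only Roman-numeral letters."""
    hits = []
    for line in freq_lines:
        parts = line.split()
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip anything that is not "<count> <word>"
        count, word = int(parts[0]), parts[1]
        # fullmatch() makes the pattern apply to the whole word, not a substring.
        if ROMAN_LETTERS.fullmatch(word):
            hits.append((count, word))
    return hits

sample = ["38 did", "137 ij", "12 the", "51 xxx", "3 Iesus"]
print(roman_candidates(sample))
```

Note that ordinary words such as "did" pass this test too, so the output still needs a manual review.
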
> Observations:
>
> - Most of the numbers in verse text that potentially match Roman numerals are 
> lowercase.
> - There are 103 unique strings that potentially match Roman numerals 
> irrespective of case.
> - There are 95 unique strings that potentially match lowercase Roman numerals.
> - A few of these can be discounted as being ordinary words: "did", "mi", 
> "mid", etc.
> - Arabic numeral 4 is often represented as either "iiii" or "iiij" instead of 
> "iv" reflecting the usage of that period.
> - The use of the alternative final letter "j" in place of "i" is likely to be 
> a printer's flourish of that period.
> - The vast majority of such strings found in verse text are marked with a 
> period (full stop) before and after, e.g. ".xxx."
> - Some strings omit one or both of these period delimiters!
> - Some strings are wrongly preceded by ". " rather than " ." (misplaced 
> delimiter due to OCR error?)
> - The total number of matches to PCRE "\W[ijvxlcdm]+\W" (without the quotes) 
> is 1293
> - Of those 1293, only 958 match the PCRE "\.[ijvxlcdm]+\." (i.e. with both the 
> proper period delimiters).
> - That leaves 335 instances in which there's a missing or misplaced period 
> delimiter (or which are ordinary words).
> - Searching for patterns that include uppercase Roman numerals is more 
> difficult because of the very common word "I" (first person pronoun).
> - The total number of matches to PCRE "\W[ijvxlcdmJVXLCDM]+\W" (without the 
> quotes) is 1314.
> - That means we thereby discovered 21 further potential candidates in which 
> at least one letter is uppercase, excluding "I".
>
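
The comparison between the two delimiter patterns can be sketched in Python as follows. This is only an illustration: the sample sentence is invented, and Python's re module may count slightly differently from the tool used for the figures above. In particular, findall() returns non-overlapping matches, so two candidates separated by a single space share a delimiter and only the first is counted.

```python
import re

ANY_BOUNDARY = re.compile(r"\W[ijvxlcdm]+\W")   # any non-word character on each side
BOTH_PERIODS = re.compile(r"\.[ijvxlcdm]+\.")   # a period on each side

def delimiter_report(text):
    """Return (total candidates, properly delimited, remainder to review)."""
    total = len(ANY_BOUNDARY.findall(text))
    proper = len(BOTH_PERIODS.findall(text))
    return total, proper, total - proper

# Invented sample: one proper numeral, one missing its closing period,
# one ordinary word, and one with a misplaced opening period.
sample = "He was .xxx. yere olde and had .vij sonnes and did go. xl. dayes"
total, proper, remainder = delimiter_report(sample)
print(total, proper, remainder)
```

The remainder is the set that needs human attention: it mixes missing or misplaced delimiters with ordinary words such as "did".
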
> If the Tyndale Bible was printed consistently, with every number in lowercase 
> and delimited by a period on each side, then it is apparent that the digitised 
> text fails to transcribe many of these numbers faithfully!
>
> We therefore require the upstream source to be thoroughly checked in this 
> regard, and edited to fix all such OCR errors.
>
> Looking to the future, we might also make good use of the OSIS element abbr 
> to encode all such numbers. E.g.
>
> <abbr type="x-Roman" expansion="30">.xxx.</abbr>
>
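
As a rough illustration of how such markup might be generated, here is a Python sketch that converts period-style numerals (additive forms like "iiij", final "j" for "i") and wraps properly delimited ones in abbr. The function names are hypothetical, and the type value "x-Roman" is simply the one proposed above.

```python
import re

VALUES = {"i": 1, "j": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}

def roman_to_int(numeral):
    """Convert a Roman numeral, accepting additive forms (iiii) and final j."""
    values = [VALUES[ch] for ch in numeral.lower()]
    total = 0
    for pos, value in enumerate(values):
        # Subtractive notation: a smaller value before a larger one is subtracted.
        if pos + 1 < len(values) and value < values[pos + 1]:
            total -= value
        else:
            total += value
    return total

DELIMITED = re.compile(r"\.([ijvxlcdm]+)\.")

def tag_numerals(text):
    """Wrap each period-delimited numeral in an OSIS abbr element."""
    def wrap(match):
        expansion = roman_to_int(match.group(1))
        return f'<abbr type="x-Roman" expansion="{expansion}">{match.group(0)}</abbr>'
    return DELIMITED.sub(wrap, text)

print(roman_to_int("iiij"), roman_to_int("cxliiii"))
print(tag_numerals("was .xxx. yere olde"))
```

Two caveats: a form such as "iic" may be multiplicative (two hundred), which this sketch does not handle, and an ordinary word that happens to sit between two periods would be tagged as well, so the output should be reviewed by hand.
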
> Aside: It would be a cool enhancement to the SWORD API to provide support for 
> a new filter:
>
> GlobalOptionFilter=OSISExpandAbbreviations
>
> Related question: does the SWORD API already provide any support for the abbr 
> element? If so, what is the functionality?
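
To make the proposal concrete, here is a Python sketch of the behaviour such an option might have. It is not SWORD API code and says nothing about what SWORD currently does with abbr. The function name and the on/off semantics are assumptions made purely for discussion.

```python
import re

# Matches an abbr element carrying an expansion attribute (assumed markup).
ABBR = re.compile(r'<abbr\b[^>]*\bexpansion="([^"]*)"[^>]*>(.*?)</abbr>', re.DOTALL)

def expand_abbreviations(osis_text, option_on):
    """Hypothetical filter: show the expansion when on, the printed form when off."""
    if option_on:
        return ABBR.sub(lambda m: m.group(1), osis_text)
    return ABBR.sub(lambda m: m.group(2), osis_text)

verse = 'was <abbr type="x-Roman" expansion="30">.xxx.</abbr> yere olde'
print(expand_abbreviations(verse, option_on=True))
print(expand_abbreviations(verse, option_on=False))
```

With the option off, the reader sees the text as printed. With it on, the expansion replaces the abbreviated form.
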
>
> Best regards,
>
> David
>
> On Sunday, May 11th, 2025 at 3:35 PM, David Haslam <dfh...@protonmail.com> 
> wrote:
>
>> Dear Cyrille, dear Dom,
>>
>> In numerous places, the digital text of the Tyndale module omits the macron 
>> over a vowel that's there in the original printed pages. e.g. Abraha - 
>> should be Abrahā.
>>
>> This is just one example of the many kinds of deficiencies in the upstream 
>> source.
>>
>> Fixing these in the upstream source would require a lot of intensive effort.
>>
>> Best regards,
>>
>> David
>>
>> On Wednesday, May 7th, 2025 at 7:29 PM, David Haslam <dfh...@protonmail.com> 
>> wrote:
>>
>>> Hi Cyrille,
>>>
>>> Unless users know what the MALTESE CROSS & the CROSS PATTY WITH RIGHT 
>>> CROSSBAR actually denote, how does including them help the Bible student?
>>>
>>> - Can we try to find out more?
>>> - Would ChatGPT help in any way?
>>>
>>> Best regards,
>>>
>>> David
>>>
>>> On Wednesday, May 7th, 2025 at 6:30 PM, Fr Cyrille 
>>> <fr.cyri...@tiberiade.be> wrote:
>>>
>>>> On 07/05/2025 at 15:08, David Haslam wrote:
>>>>
>>>>> Hi Cyrille,
>>>>>
>>>>> Why was only one correction made?
>>>>>
>>>>> I listed two locations where the verse hadn't been properly referenced!
>>>>>
>>>>> - You have fixed Acts 9:38:
>>>>> - You have not fixed Revelation of John 1:9:
>>>>
>>>> I did, but I missed the osisID....
>>>>
>>>>> And those two types of peculiar symbol are all still there!
>>>>>
>>>>> - 3 of U+2720 ✠ MALTESE CROSS
>>>>> - 5 of U+2E50 ⹐ CROSS PATTY WITH RIGHT CROSSBAR
>>>>
>>>> Ok you want it to be removed?
>>>>
>>>>> Best regards,
>>>>>
>>>>> David
>>>>>
>>>>> On Wednesday, May 7th, 2025 at 1:50 PM, dom...@crosswire.org 
>>>>> dom...@crosswire.org wrote:
>>>>>
>>>>>> This is to announce that we have just now uploaded Tyndale
>>>>>> in the CrossWire beta repository for testing purposes.
>>>>>>
>>>>>> If no raised concern nor a quality alert has been sent on the list,
>>>>>> Tyndale will be published in a week.
>>>>>>
>>>>>> This is an update.
>>>>>> Language=English
>>>>>> Version=2.0
>>>>>> History_2.0=(2025-05-07) New source
>>>>>> TextSource=https://en.wikisource.org/wiki/Bible_(Tyndale)
>>>>>> Versification=KJV
>>>>>>
>>>>>> Many thanks to everyone who contributed to this release.
>>>>>>
>>>>>> yours
>>>>>>
>>>>>> P.S.: This email is sent automatically.
>>>>>>
>>>>>> _______________________________________________
>>>>>> modules mailing list
>>>>>> modu...@crosswire.org
>>>>>> http://www.crosswire.org/mailman/listinfo/modules
>>>>
>>>> --
>>>> Do you love the Bible? Are you a theology student? Use the free 
>>>> application [Xiphos](https://xiphos.org/) or 
>>>> [Andbible](https://andbible.github.io/) to access source texts, 
>>>> commentaries, dictionaries and many other features... Contact me for 
>>>> French translations.

Attachment: Analysis.7z
Description: application/compressed

_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page