On 19/12/23 00:06, Matěj Cepl wrote:
I have decided not to rely on the very kind help of David
with his Windows tools, and I have written a (hopefully)
completely platform-neutral, pure Python 3 script for checking
pairwise characters. So far it has been used only for fixing
https://gitlab.com/crosswire-bible-society/CzeCEP/-/issues/2 and
I am quite sure it is pretty buggy, but it could prove useful
for somebody.

Thank you for doing this work! This seems like it could be a useful tool for validating texts of all kinds.

I tried running it over my BSB module, and I hit problems fairly quickly, some of which are more easily solved than others.

1. No support for language “en”

This was easy enough to handle: there's a configuration variable near the top of the file that sets which quote characters are used for which languages.
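For anyone else adding a language, the change amounts to one more table entry. The sketch below is hypothetical: `QUOTE_PAIRS` and `pairs_for` are names I've made up for illustration, and the real variable in the script may be shaped differently.

```python
# Hypothetical per-language table of (opening, closing) quote pairs.
# The name and layout are illustrative, not the script's actual variable.
QUOTE_PAIRS = {
    "cs": [("„", "“"), ("‚", "‘")],  # Czech: „double“ and ‚single‘
    "en": [("“", "”"), ("‘", "’")],  # English: “double” and ‘single’
}


def pairs_for(lang):
    """Return the quote pairs for a language code, or fail loudly."""
    try:
        return QUOTE_PAIRS[lang]
    except KeyError:
        raise ValueError(f"no quote configuration for language {lang!r}") from None


print(pairs_for("en"))
```

Failing with a clear error for an unknown language code seems friendlier than silently checking nothing.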

2. Apostrophes

In English, the apostrophe used for possession (“the boy’s train”) and omission (“don’t let’s start”) is traditionally set with the same character as the closing single quote. Any non-trivial document will therefore almost certainly contain more “closing single quotes” than opening ones, so the imbalance isn't worth reporting on.

I got around this by just deleting single quotes from the configuration.
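To show why the apostrophes swamp the count, here is a toy balance check. It assumes a plain count of openers minus closers for each pair, which may not be exactly what the script does; `quote_balance` is a hypothetical name.

```python
def quote_balance(text, pairs):
    """Toy check: openers minus closers for each (opening, closing) pair."""
    return {op: text.count(op) - text.count(cl) for op, cl in pairs}


en_pairs = [("“", "”"), ("‘", "’")]
sample = "“Don’t touch the boy’s train,” she said."

# Both apostrophes are counted as closing single quotes.
print(quote_balance(sample, en_pairs))      # {'“': 0, '‘': -2}
# Dropping the single-quote pair removes the noise.
print(quote_balance(sample, en_pairs[:1]))  # {'“': 0}
```

The cost of this workaround is that genuinely unbalanced single quotes go unreported.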

3. Nested quotations

In Genesis 20:11-13, Abraham tells Abimelech that he told Sarah to tell other people that she was Abraham’s brother. In the BSB (and NIV, and ESV, and NASB) this results in a triple-nested quotation. Under English typesetting conventions the outermost quotation gets double-quotes, the second level gets single-quotes, and the third level gets double quotes again. This causes the script to report an error:

Balance for  character “ is over one in Gen.20.13

I couldn't immediately think of a way to get around this.
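One possibility, offered only as a sketch: the error message suggests the script treats any balance above one as a fault. A stack-based matcher instead accepts nesting to any depth, as long as each quotation closes in the reverse order it was opened. `check_nesting` is a hypothetical helper, and it assumes the opening and closing characters of each pair differ.

```python
def check_nesting(text, pairs):
    """Return a list of problems; an empty list means every quote is
    closed in the reverse order it was opened (any depth allowed)."""
    closing_for = {op: cl for op, cl in pairs}
    opening_for = {cl: op for op, cl in pairs}
    stack = []  # (opening char, position) of each still-open quotation
    problems = []
    for pos, ch in enumerate(text):
        if ch in closing_for:  # ch is an opening quote
            stack.append((ch, pos))
        elif ch in opening_for:  # ch is a closing quote
            if stack and stack[-1][0] == opening_for[ch]:
                stack.pop()
            else:
                problems.append(f"unexpected {ch} at {pos}")
    problems.extend(f"unclosed {op} at {pos}" for op, pos in stack)
    return problems


en_pairs = [("“", "”"), ("‘", "’")]
# Three levels of nesting, double / single / double, as in the Genesis example.
nested = "“outer ‘middle “inner” middle’ outer”"
print(check_nesting(nested, en_pairs))
print(check_nesting("boy’s", en_pairs))
```

The second call shows that apostrophes still trip it up, so the single-quote pair would have to stay out of the configuration for English.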

Another quirk that occurs to me is that in English typesetting, if one person speaks for multiple paragraphs (for example, the Sermon on the Mount), each paragraph gets an opening double-quote, but only the last one gets a closing double-quote. That's going to play havoc with this kind of quote-checking tool, too.
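In principle the checker could allow for this by treating an opening quote at the start of a paragraph as a continuation mark whenever a quotation is still open. The sketch below is hypothetical (`check_paragraphs` is an invented name). It handles a single pair of double quotes and assumes at most one continuation mark per paragraph.

```python
def check_paragraphs(paragraphs, open_q="“", close_q="”"):
    """Check one quote pair across paragraphs, allowing the English habit of
    leaving a quotation open at a paragraph break and reopening it with an
    opening quote at the start of the next paragraph."""
    problems = []
    depth = 0  # number of quotations currently open
    for n, para in enumerate(paragraphs):
        if depth > 0:
            if para.startswith(open_q):
                # Continuation mark: the quotation carries on, so it does not
                # count as a new opening.
                para = para[len(open_q):]
            else:
                problems.append(f"paragraph {n}: open quotation not continued")
                depth = 0
        for ch in para:
            if ch == open_q:
                depth += 1
            elif ch == close_q:
                if depth == 0:
                    problems.append(f"paragraph {n}: unmatched {close_q}")
                else:
                    depth -= 1
    if depth > 0:
        problems.append("quotation still open at end of text")
    return problems


speech = [
    "“First paragraph of a long speech.",
    "“Second paragraph, where the speech ends.”",
]
print(check_paragraphs(speech))
```

It would need verse or paragraph boundaries from the module to work, which a flat character count doesn't have.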

Perhaps this kind of tool just isn't suited to checking English text... but I'm sure there are other languages with more sensible conventions that it could help with. Good luck with it!

sword-devel mailing list: sword-devel@crosswire.org