On Jun 30, 2012, at 7:42 AM, David Haslam <dfh...@googlemail.com> wrote:

> 
> For that reason, I've developed a TextPipe filter to transliterate the
> Cherokee text to the Sequoyah Latin equivalents, using the information in
> the Wikipedia page about Cherokee.
> 

If you're using the icu-sword data bundle, which you are if you use the Sword 
utilities that I compiled for Win32, then you already have a reversible 
Cherokee-Latin transliterator. Getting transliterated text is then trivial: 
either tell diatheke to transliterate as it outputs, or run mod2imp followed 
by uconv, which can perform any transliteration transform known to its ICU data 
bundle.

> At least this provides the possibility whereby proper names in the English
> KJV could be mapped to the right words in the Cherokee translation
> (i.e. by fuzzy matching and manual editing, perhaps with some intelligent
> guesswork).
> 
> This could pave the way for back-conversion from the Latin script to the
> Cherokee symbols, while at the same time converting the capitalized words to
> a suitable XML markup.
> 

It should be fairly trivial to automate tagging of the names. All we really 
need is, for each verse, a list of the names that appear in it, preferably in 
English. Then we can compute the edit distance between each Cherokee word in a 
verse and each name on the list for that verse. Finally, assign the Cherokee 
word with the lowest edit distance to the English name and tag accordingly. A 
Soundex-style edit distance would probably work best, but Levenshtein might 
suffice.
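
A minimal sketch of the matching step described above, assuming the Cherokee
words of a verse have already been transliterated to Latin script. It uses
plain Levenshtein distance (not the Soundex-style variant), normalized by
word length. The function names and the sample words are hypothetical; the
sample words are invented placeholders, not real Cherokee transliterations.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def match_names(english_names, translit_words):
    """Map each English name to the verse word with the lowest
    length-normalized edit distance (case-insensitive)."""
    matches = {}
    for name in english_names:
        def score(word, name=name):
            longest = max(len(name), len(word), 1)
            return levenshtein(name.lower(), word.lower()) / longest
        matches[name] = min(translit_words, key=score)
    return matches


if __name__ == "__main__":
    # Placeholder data for one verse: names from an English name list,
    # and made-up Latin-script words standing in for transliterated text.
    names = ["Moses", "Aaron"]
    words = ["mosi", "ale", "eran", "nasgi"]
    for name, word in match_names(names, words).items():
        print(name, "->", word)
```

Each name is matched independently here, so two names could end up assigned
to the same word; a real tagger would probably want to resolve such
collisions and reject matches whose distance is too high.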

If you can locate a list of names in the Bible and all the verses in which they 
appear, I can implement the above algorithm to do the tagging.

--Chris
