Peter von Kaehne wrote:
As a side issue of the other debate - how can I achieve NFC for a text I
am working on via commandline utilities?

All I can find in ICU documentation is about programming methods
available, but I have seen no command line utilities.

DM's suggestion of using the Perl facility is fine, and I often use it myself when I'm scripting in Perl. But there's also an ICU utility that can perform normalization (and much more).

uconv (meant as a replacement for iconv, if you're familiar with that) does codepage/encoding conversion, transliteration, and normalization. It's part of the standard ICU distribution and we have Windows binaries on the FTP site:
http://crosswire.org/ftpmirror/pub/sword/utils/win32/uconv.zip
http://crosswire.org/ftpmirror/pub/sword/utils/win32/icudt40-big.zip

(I'd recommend the big, 7.6 MB version of the ICU data for this.)

Usage is fairly straightforward. To take a file "input" and write its NFC-normalized form to a file "output" (assuming both are UTF-8), you would use:

uconv -f utf-8 -t utf-8 -x NFC -o output input
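
If you want to double-check the result independently of ICU, here is a minimal sketch that assumes Python 3 is available. It uses the standard library's unicodedata module rather than uconv, and the function names (to_nfc, is_nfc, normalize_file) are made up for illustration. Python's Unicode data may be a different version from the one in your ICU build, so treat any disagreement on rare characters as something to investigate rather than as proof of an error.

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Return the NFC (canonical composition) form of text."""
    return unicodedata.normalize("NFC", text)


def is_nfc(text: str) -> bool:
    """Return True if text is already in NFC."""
    return to_nfc(text) == text


def normalize_file(src: str, dst: str) -> None:
    """Read src as UTF-8 and write its NFC form to dst as UTF-8."""
    # newline="" keeps the file's line endings exactly as they are.
    with open(src, "r", encoding="utf-8", newline="") as f:
        text = f.read()
    with open(dst, "w", encoding="utf-8", newline="") as f:
        f.write(to_nfc(text))


if __name__ == "__main__":
    # "e" followed by U+0301 COMBINING ACUTE ACCENT composes to U+00E9.
    decomposed = "e\u0301"
    print(is_nfc(decomposed))
    print(is_nfc(to_nfc(decomposed)))
```

Calling normalize_file("input", "output") mirrors the uconv command above, and is_nfc can be run on the contents of "output" to confirm that the file really is normalized.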

--Chris

_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page
