Hello Samuele,

Of all(?) the packages, I found that Unidecode (http://pypi.python.org/pypi/Unidecode) supports most of the languages that can be transliterated. For Greek, however, it does not follow the ISO 843 standard but a 'custom' scheme, which is not good practice. More details on the official transliteration standards for Greek are here: http://transliteration.eki.ee/pdf/Greek.pdf

The usage is very simple, and I ran an example with the following VERY complex Unicode string (with Hebrew, Hindi, Chinese and Greek):
---------------
The decomposition mapping is <츠, U+11B8>, and not <0x110E, ᅳ, 11B8>.
<p>The title says ‫פעילות הבינאום, W3C‬ in Hebrew</p>
abcáßçकखी國際𐎄𐎔𐎘
Ελληνικά
---------------

and is converted to:

---------------
The decomposition mapping is <ceu, b>, and not <c, eu, 11B8>.
<p>The title says p`ylvt hbynvm, W3C in Hebrew</p>
abcassckkhiiGuo Ji
Ellenika
---------------

You can verify this yourself with the following test code:
from unidecode import unidecode
# print() with parentheses runs on both Python 2 and Python 3; the three Ugaritic
# characters are written as \U escapes rather than as surrogate pairs.
print(unidecode(u"The decomposition mapping is <\uCE20, \u11B8>, and not <\u110E, \u1173, 11B8>.\n<p>The title says \u202B\u05E4\u05E2\u05D9\u05DC\u05D5\u05EA \u05D4\u05D1\u05D9\u05E0\u05D0\u05D5\u05DD, W3C\u202C in Hebrew</p>\nabc\u00E1\u00DF\u00E7\u0915\u0916\u0940\u570B\u969B\U00010384\U00010394\U00010398\n\u0395\u03BB\u03BB\u03B7\u03BD\u03B9\u03BA\u03AC"))

Again, the best transliteration for Greek is provided by the pleiades.transliteration package (http://pypi.python.org/packages/source/p/pleiades.transliteration/pleiades.transliteration-0.5.tar.gz), but unidecode seems to support many more languages, so for now I suppose it is the best all-around solution.

Of course, I would be glad to test the code you are planning to use against our records, which have authors in both Greek and transliterated Greek form.
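
As a rough idea of what such a check could look like, here is a minimal sketch. The function names (`normalise`, `find_mismatches`) and the record layout (pairs of Greek name and stored transliterated name) are my own assumptions, not anything that exists in Invenio. The transliteration function is passed in as an argument, so either unidecode or the pleiades package could be plugged in:

```python
def normalise(name):
    """Lower-case and collapse whitespace so trivial differences are ignored."""
    return " ".join(name.lower().split())


def find_mismatches(records, transliterate):
    """Return records whose transliterated Greek name differs from the stored form.

    records: iterable of (greek_name, stored_latin_name) pairs
             (a hypothetical layout, used only for illustration).
    transliterate: any callable that maps a Unicode string to a Latin string,
                   for example unidecode.unidecode.
    """
    mismatches = []
    for greek_name, stored_latin in records:
        produced = transliterate(greek_name)
        if normalise(produced) != normalise(stored_latin):
            # Keep all three values so the differences can be inspected by hand.
            mismatches.append((greek_name, stored_latin, produced))
    return mismatches
```

With unidecode installed, it could be called as find_mismatches(records, unidecode). Records whose stored form follows a different transliteration scheme than the one passed in would simply show up in the returned list.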

Best regards,
Theodoropoulos Theodoros


P.S. If this 'extra' step slows down the merging process considerably, it could be a good idea to add a flag to the invenio.conf file that would enable/disable this functionality.
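
A minimal sketch of such a switch follows. The option name `CFG_TRANSLITERATE_AUTHORS` and the helper `author_key` are my own inventions for illustration, not existing invenio.conf settings or Invenio functions. When unidecode is not installed, the sketch falls back to plain accent stripping with the standard `unicodedata` module, which only helps for Latin-script names:

```python
import unicodedata

try:
    from unidecode import unidecode
except ImportError:
    unidecode = None  # the fallback below is used instead

# Hypothetical flag; in practice it would be read from invenio.conf.
CFG_TRANSLITERATE_AUTHORS = True


def strip_accents(text):
    """Fallback: decompose characters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return u"".join(ch for ch in decomposed if not unicodedata.combining(ch))


def author_key(name, enabled=None):
    """Return the string used to compare author names during merging."""
    if enabled is None:
        enabled = CFG_TRANSLITERATE_AUTHORS
    if not enabled:
        return name  # extra step switched off: the name is used as-is
    if unidecode is not None:
        return unidecode(name)
    return strip_accents(name)
```

With the flag off, the merging code would compare names exactly as before, so the cost of the extra step can be avoided on installations that do not need it.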
