On 05/17/2012 10:38 AM, Theodoros Theodoropoulos wrote:
Hello everyone,

There could be a case where one would like to match author names to their Latin 
transliteration. This would be very useful for non-Latin 
(Chinese/Japanese/Greek/Russian/Hindi/Arabic/...) author names that have two 
representations and might appear in papers under either one!
Google has an excellent API for that, and I believe there are some 
transliteration packages for Python[1] as well, so if you believe it could be 
of interest to the project, it might not be so difficult to implement in the 
author matching/merging algorithm!

Thanks in advance,
Theodoros Theodoropoulos

[1] Unidecode, translitcodec, isounidecode, pleiades.transliteration, ...

Hello,

This is actually a really good thing to do!
We do have a pretty complicated function for the comparison of names, which 
already uses dictionaries to detect synonyms (richard/dick) and gender 
(mario/maria), for example. Plugging in an additional check for 
transliterations will not be terribly difficult; we just have to make sure not 
to introduce any unexpected behavior (which, in my experience, is painfully 
easy in this case).

Now, to make things as easy as possible, the best way is to create a boolean 
function like:

def check_transliteration(name1, name2):
    # One helper per transliteration package; each returns True or False.
    c1 = check_unidecode(name1, name2)
    c2 = check_translitcodec(...
    c3 = check_pleiades...
    ...
    return (c1 or c2 or ...)

where each argument is either a first name or a surname, but never a 
composition of the two (we'll have to check the various parts of a name for 
transliteration separately).
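As a rough illustration of the shape such a function could take, here is a 
minimal sketch that uses only the Python standard library. It does not call 
Unidecode, translitcodec or any of the other packages mentioned above; the 
two helper checks and the tiny Greek table are assumptions made up for this 
example, standing in for the per-package checks.

```python
import unicodedata

# Hypothetical, deliberately tiny Greek-to-Latin table (illustration only,
# not a standard romanization scheme).
_GREEK_TO_LATIN = {
    "α": "a", "δ": "d", "ε": "e", "θ": "th", "ι": "i", "λ": "l",
    "ο": "o", "π": "p", "ρ": "r", "σ": "s", "ς": "s", "υ": "y", "ω": "o",
}


def _fold(text):
    """Lowercase the text and drop combining marks (accents, diaeresis)."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def check_accent_fold(name1, name2):
    """True if the names are equal once case and accents are ignored."""
    return _fold(name1) == _fold(name2)


def check_greek_table(name1, name2):
    """True if the names are equal after mapping Greek letters to Latin."""
    def to_latin(text):
        return "".join(_GREEK_TO_LATIN.get(ch, ch) for ch in _fold(text))
    return to_latin(name1) == to_latin(name2)


def check_transliteration(name1, name2):
    """Boolean check on a single name part (first name or surname)."""
    checks = (check_accent_fold, check_greek_table)
    return any(check(name1, name2) for check in checks)
```

Each helper could later be swapped for one backed by a real transliteration 
package without changing the signature of check_transliteration.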

This can then be easily integrated into the main comparison function as an 
additional criterion that authoritatively decides that two names are equal, as 
in the synonym case.
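To make the integration idea concrete, here is a hedged sketch of how an 
extra boolean check could sit next to a synonym lookup. The real comparison 
function is not shown in this thread, so names_equal, is_synonym, the 
SYNONYMS table and the extra_checks parameter are all hypothetical names 
invented for this example.

```python
# Hypothetical synonym table; the project's actual dictionaries are not
# shown in this thread.
SYNONYMS = {frozenset(("richard", "dick"))}


def is_synonym(name1, name2):
    """True if the two name parts are listed as synonyms of each other."""
    return frozenset((name1.lower(), name2.lower())) in SYNONYMS


def names_equal(name1, name2, extra_checks=()):
    """Decide whether two name parts should be treated as equal.

    extra_checks is a sequence of boolean functions f(name1, name2);
    any of them returning True settles the comparison, exactly like
    a synonym hit does.
    """
    if name1.lower() == name2.lower():
        return True
    if is_synonym(name1, name2):
        return True
    return any(check(name1, name2) for check in extra_checks)
```

A transliteration check with the signature f(name1, name2) would then be 
passed in through extra_checks, which keeps it easy to switch off if it 
turns out to introduce unexpected matches.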

Do you think you can provide an example of such a function which I can then 
integrate? If I read between the lines correctly, and if you have a bit of 
time, your database will probably be a good place to test the behavior of this 
directly!

Thanks and have a nice day,
Samuele

--
|--
| Samuele Carli
|--
| Contacts:
|
|       Home page   : www.csspace.net
|       E-mail      : carlisamuele _at_ csspace.net
|       Icq         : 60401601
|       MSN         : [email protected] (no emails here!)
|       Skype       : wohthan
|       jabber/gtalk: [email protected] (no emails here!)
|--
