On 05/17/2012 10:38 AM, Theodoros Theodoropoulos wrote:
Hello everyone,
There could be a case where one would like to match author names to their Latin
transliteration. This would be very useful for non-Latin
(Chinese/Japanese/Greek/Russian/Hindi/Arabic/...) author names that have two
representations and might appear in papers under either one!
Google has an excellent API for that, and I believe there are some
transliteration packages for Python [1] as well, so if you believe it could be
of interest to the project, it might not be so difficult to implement in the
author matching/merging algorithm!
Thanks in advance,
Theodoros Theodoropoulos
[1] Unidecode, translitcodec, isounidecode, pleiades.transliteration, ...
Hello,
this is actually a really good thing to do!
We do have a pretty complicated function for the comparison of names, which
already uses dictionaries to detect synonyms (richard/dick) and gender
(mario/maria), for example. Plugging in an additional check for
transliterations will not be terribly difficult; we just have to make sure not
to introduce any unexpected behavior (which in this case seems to be painfully
easy, in my experience).
Now, to make things as easy as possible, the best way is to create a boolean
function like:
    def check_transliteration(name1, name2):
        c1 = check_unidecode(name1, name2)
        c2 = check_translitcodec(...
        c3 = check_pleiades...
        ...
        return (c1 or c2 or ...)
where each argument is either a first name or a surname, but never a
composition (we'll have to check the various parts of a name for
transliteration separately).
This can then be easily integrated in the main comparison function as an
additional criterion to authoritatively decide that two names are equal, as in
the synonym case.
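To make the shape of such a function concrete, here is a minimal sketch using
only the standard library. Everything in it beyond the name
check_transliteration is an assumption made for illustration: the helpers
strip_accents and greek_to_latin, the transliterators parameter, and the tiny
GREEK_TO_LATIN table (which is not a complete romanization scheme) are all
hypothetical and are not part of the existing comparison code.

```python
import unicodedata


def strip_accents(text):
    """Drop combining marks after NFKD decomposition (e.g. 'é' -> 'e')."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Illustrative table only: a rough, incomplete Greek-to-Latin mapping.
GREEK_TO_LATIN = {
    "α": "a", "β": "v", "γ": "g", "δ": "d", "ε": "e", "ζ": "z",
    "η": "i", "θ": "th", "ι": "i", "κ": "k", "λ": "l", "μ": "m",
    "ν": "n", "ξ": "x", "ο": "o", "π": "p", "ρ": "r", "σ": "s",
    "ς": "s", "τ": "t", "υ": "y", "φ": "f", "χ": "ch", "ψ": "ps",
    "ω": "o",
}


def greek_to_latin(text):
    """Romanize Greek letters with the table above; other characters pass through."""
    base = strip_accents(text).lower()
    return "".join(GREEK_TO_LATIN.get(ch, ch) for ch in base)


def check_transliteration(name1, name2,
                          transliterators=(strip_accents, greek_to_latin)):
    """Return True if any transliterator maps both names to the same string.

    name1 and name2 are single name parts (a first name or a surname),
    never a full composed name.
    """
    for transliterate in transliterators:
        if transliterate(name1).casefold() == transliterate(name2).casefold():
            return True
    return False
```

With these helpers, check_transliteration("Θεόδωρος", "Theodoros") and
check_transliteration("José", "Jose") both return True, while
check_transliteration("Mario", "Maria") returns False. If the Unidecode package
is installed, its unidecode() function could be passed in the transliterators
tuple as well, which would cover many more scripts than the toy Greek table.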
Do you think you can provide an example of such a function which I can then
integrate? If I read between the lines correctly and you have a bit of time,
your database would probably be a good place to directly test the behavior
of this!
Thanks and have a nice day,
Samuele
--
|--
| Samuele Carli
|--
| Contacts:
|
| Home page : www.csspace.net
| E-mail : carlisamuele _at_ csspace.net
| Icq : 60401601
| MSN : [email protected] (no emails here!)
| Skype : wohthan
| jabber/gtalk: [email protected] (no emails here!)
|--