I've been looking a bit closer at the YSO places catalog [1] in Mix'n'match and I'm wondering why only 20% of the places were automatically matched.

For example, Nepal (http://www.yso.fi/onto/yso/p107682) was automatically matched to Nepal (Q837).

However, Accra (http://www.yso.fi/onto/yso/p138653) was not matched to Accra (Q3761).

Aceh (http://www.yso.fi/onto/yso/p147889) was not matched to Aceh (Q1823).

Akkunusjoki (http://www.yso.fi/onto/yso/p109251) was not matched to Akkunusjoki (Q12253027).

There are many more cases like this. The precision of the automatic matching seems good (all but one of the matches I have checked so far were correct), but the recall is rather low: even when the labels are identical, a match is often not suggested. Is there anything that could be done about this?

Somewhat related to this, it seems that none of the places with parenthetical qualifiers in their names were matched. For example, "Ahjo (Kerava)" could have been matched to Q11849902 (which has an identical Finnish label) and "Ala-Malmi (Helsinki)" could have been matched to Q2829441 ("Ala-Malmi"). Almost 60% of the place names include a parenthetical qualifier, which keeps them unique when different places share a name, so a lot of potential matches are missing. Could something be done to improve the situation?
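
To illustrate what I mean, here is a minimal Python sketch of how a label could be split into a base name and a qualifier before matching. The function names (`split_qualifier`, `candidate_labels`) are my own, and I am not assuming anything about how Mix'n'match does its matching internally.

```python
import re

# A label such as "Ahjo (Kerava)": a base name followed by one
# parenthetical qualifier at the very end of the string.
_QUALIFIER = re.compile(r"^(?P<base>.+?)\s*\((?P<qualifier>[^()]+)\)\s*$")


def split_qualifier(label):
    """Return (base, qualifier); qualifier is None if there is none."""
    label = label.strip()
    m = _QUALIFIER.match(label)
    if m is None:
        return label, None
    return m.group("base"), m.group("qualifier")


def candidate_labels(label):
    """Labels worth trying against Wikidata: the full label first,
    then the base name without its qualifier (if it has one)."""
    full = label.strip()
    base, qualifier = split_qualifier(full)
    return [full] if qualifier is None else [full, base]
```

Trying the full label first would cover cases like "Ahjo (Kerava)", where the Wikidata label includes the qualifier, and falling back to the base name would cover cases like "Ala-Malmi (Helsinki)". The qualifier itself could then be used to pick between several items that share the same base name.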

If Mix'n'match is incapable of automatically matching cases like this, would it help if I did an automatic matching externally using some other tool, and then gave the potential matches as e.g. a CSV file that could then be imported into Mix'n'match so that they can be verified there?
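
Roughly, the external matching I have in mind would look like the Python sketch below. It assumes I already have the Wikidata labels in a dictionary mapping each label to a list of QIDs (how that gets built is left out), and the CSV column names are only a guess on my part, not a format that Mix'n'match is known to accept.

```python
import csv
import io


def exact_label_matches(yso_places, wikidata_labels):
    """Pair each YSO place with a QID when exactly one Wikidata item
    has the same label; ambiguous or missing labels are skipped."""
    rows = []
    for uri, label in yso_places:
        qids = wikidata_labels.get(label, [])
        if len(qids) == 1:
            rows.append((uri, label, qids[0]))
    return rows


def write_candidates_csv(rows, out):
    """Write candidate matches as CSV (hypothetical column names)."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["yso_uri", "yso_label", "wikidata_qid"])
    writer.writerows(rows)


# The three unmatched examples from above.
yso_places = [
    ("http://www.yso.fi/onto/yso/p138653", "Accra"),
    ("http://www.yso.fi/onto/yso/p147889", "Aceh"),
    ("http://www.yso.fi/onto/yso/p109251", "Akkunusjoki"),
]
wikidata_labels = {
    "Accra": ["Q3761"],
    "Aceh": ["Q1823"],
    "Akkunusjoki": ["Q12253027"],
}

buffer = io.StringIO()
write_candidates_csv(exact_label_matches(yso_places, wikidata_labels), buffer)
```

Only unambiguous matches are written, so everything in the file would still be a suggestion to be confirmed by a human in Mix'n'match.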


[1] https://tools.wmflabs.org/mix-n-match/#/catalog/473

Thanks a lot, that was fast! And the results look very good!

I confirmed a couple dozen automated mappings and fixed an incorrect one ("Amerikka" was matched to USA, but I changed it to "Americas"). Then I started hitting rate-limit errors. I guess it would be possible to avoid those with some extra permissions?

About 20% of the places were automatically matched. Most of the remaining ones (around 5000) probably do not exist in Wikidata, because they are e.g. towns and villages in Finland. Would it be fair game to create all of them in Wikidata?


