Hi Magnus!
It's even higher now: 45%. Thanks a lot! This helps a great deal with the
verification.
Also, matching of names with parenthetical qualifiers works better now. I
see that "Ala-Malmi (Helsinki)" was automatched to "Ala-Malmi". However,
"Ahjo (Kerava)" was not matched to "Ahjo (Kerava)" (Q11849902) but to
Q1368573 (which is "Ahjo" in Finnish but means a type of metalworking
workshop, not a specific place). Neither Wikidata item has an "instance of"
(type) statement, although the latter has a "subclass of" <workshop>
statement.
In any case, I think this is now good enough for serious work, so we
will start verifying the suggested matches. 2.5% (173) already done...
-Osma
Magnus Manske wrote on 19.06.2017 at 12:02:
I fiddled with it a bit; now 35% are automatched.
I will try some more, but there are some sanity constraints on the
matching. If it finds more than one match for a name, it does not set
any match, because random matches among items with the same name were
annoying in the past. There is also a type constraint, which might skip
some Wikidata items that lack an appropriate instance of/subclass of
statement.
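The two sanity constraints described above can be sketched in Python as follows. This is an illustrative sketch, not the actual Mix'n'match code: the `Candidate` class, the `automatch` function and the type labels are hypothetical, and applying the type filter before the uniqueness check is an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Candidate:
    """A Wikidata item found by label search (hypothetical structure)."""
    qid: str
    # Targets of the item's "instance of"/"subclass of" statements.
    types: set = field(default_factory=set)


def automatch(candidates: list, allowed_types: set) -> Optional[str]:
    """Return a Q-id only if exactly one candidate survives the type filter.

    Hypothetical sketch: the order of the two checks is an assumption.
    """
    # Type constraint: drop items without an acceptable instance/subclass.
    typed = [c for c in candidates if c.types & allowed_types]
    # Uniqueness constraint: an ambiguous name gets no automatic match at all.
    if len(typed) != 1:
        return None
    return typed[0].qid
```

With this design an ambiguous name is left for manual matching rather than guessed, which trades recall for precision in the way the message describes.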
On Mon, Jun 19, 2017 at 8:09 AM Osma Suominen <osma.suomi...@helsinki.fi>
wrote:
Hi Magnus, all,
I've been looking a bit more closely at the YSO places catalog [1] in
Mix'n'match, and I'm wondering why only 20% of the places were
automatically matched.
For example, Nepal (http://www.yso.fi/onto/yso/p107682) was
automatically matched to Nepal (Q837).
But:
Accra (http://www.yso.fi/onto/yso/p138653) was not matched to Accra
(Q3761).
Aceh (http://www.yso.fi/onto/yso/p147889) was not matched to Aceh
(Q1823).
Akkunusjoki (http://www.yso.fi/onto/yso/p109251) was not matched to
Akkunusjoki (Q12253027).
There are many more cases like this. So the precision of the automatic
matching seems good (all but one were correct so far), but the recall is
rather low: even in cases where the label is identical, a match has
not been suggested. Is there anything that could be done about this?
Somewhat related to this, it seems that none of the places with
parenthetical qualifiers in their names were matched. For example "Ahjo
(Kerava)" could have been matched to Q11849902 (which has a Finnish
label that is identical) and "Ala-Malmi (Helsinki)" could have been
matched to Q2829441 ("Ala-Malmi"). Since almost 60% of the place names
include parenthetical qualifiers (added to keep them unique when different
places have identical names), a lot of potential matches are missing.
Could something be done to improve the situation?
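One way to handle the parenthetical qualifiers is to look up both the full name and the name with the qualifier stripped. The Python sketch below illustrates that idea; the function name `label_variants` and the regular expression are hypothetical and not part of Mix'n'match or YSO.

```python
import re

# A trailing parenthetical qualifier such as " (Kerava)" (hypothetical pattern).
QUALIFIER = re.compile(r"^(?P<base>.*?)\s*\((?P<qualifier>[^()]*)\)\s*$")


def label_variants(name: str) -> list:
    """Return the labels to look up for a place name, most specific first.

    The full name is tried first, since e.g. "Ahjo (Kerava)" may exist as-is
    as a Finnish label; the base name is tried second, e.g.
    "Ala-Malmi (Helsinki)" -> "Ala-Malmi".
    """
    variants = [name]
    m = QUALIFIER.match(name)
    if m and m.group("base"):
        variants.append(m.group("base"))
    return variants
```

The captured `qualifier` group (usually a municipality) is unused here; checking it against a candidate item's location could help choose among same-named items, but that is left out of this sketch.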
If Mix'n'match is incapable of automatically matching cases like this,
would it help if I did an automatic matching externally using some other
tool, and then gave the potential matches as e.g. a CSV file that could
then be imported into Mix'n'match so that they can be verified there?
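Matches computed externally could be written out as CSV roughly as follows. This is a sketch under stated assumptions: the import format that Mix'n'match would accept is not specified in this thread, so the two-column layout, the header names `external_id` and `qid`, the use of YSO URIs as external IDs, and the function name `matches_to_csv` are all hypothetical.

```python
import csv
import io


def matches_to_csv(matches: dict) -> str:
    """Serialize {YSO URI: Q-id} pairs as CSV text (hypothetical layout)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["external_id", "qid"])  # assumed header names
    # Sort by URI so repeated exports produce stable, diffable files.
    for uri, qid in sorted(matches.items()):
        writer.writerow([uri, qid])
    return buf.getvalue()
```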
-Osma
[1] https://tools.wmflabs.org/mix-n-match/#/catalog/473
Osma Suominen wrote on 17.06.2017 at 13:13:
> Hi Magnus,
>
> Thanks a lot, that was fast! And the results look very good!
>
> I confirmed a couple dozen automated mappings and fixed an incorrect one
> ("Amerikka" was matched to USA, but I changed it to "Americas"). Then I
> started hitting rate limit errors. I guess it would be possible to avoid
> those with some extra permissions?
>
> About 20% of the places were automatically matched. Probably most of the
> remaining ones (around 5000) do not exist in Wikidata because they are,
> e.g., towns and villages in Finland. Would it be fair game to create all
> of them in Wikidata?
>
> -Osma
>
--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.suomi...@helsinki.fi
http://www.nationallibrary.fi
_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata