Hi Magnus!

It's even higher now: 45%. Thanks a lot! This helps a lot with the verification.


Matching of names with parenthetical qualifiers also works better now. I see that "Ala-Malmi (Helsinki)" was automatched to "Ala-Malmi". However, "Ahjo (Kerava)" was not matched to "Ahjo (Kerava)" (Q11849902) but to Q1368573 (whose Finnish label is "Ahjo", but which means a type of metalworking workshop, not a specific place). Neither Wikidata entity has a type statement, although the latter has a "subclass-of <workshop>" statement.
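
One possible way to avoid the "Ahjo (Kerava)" mismatch would be to try the full label first and only fall back to the name without its qualifier when no exact match exists. Below is a minimal Python sketch of that idea. It is not Mix'n'match's actual code: the function names and the small in-memory label table are hypothetical and only mirror the examples in this thread.

```python
import re

# Hypothetical label table: Finnish label -> list of Wikidata item IDs.
# The entries mirror the examples discussed in this thread.
LABELS = {
    "Ahjo (Kerava)": ["Q11849902"],
    "Ahjo": ["Q1368573"],
    "Ala-Malmi": ["Q2829441"],
}

# Matches one trailing parenthetical qualifier such as " (Kerava)".
QUALIFIER = re.compile(r"\s*\([^()]*\)\s*$")


def strip_qualifier(name):
    """Remove a trailing parenthetical qualifier, e.g. 'Ahjo (Kerava)' -> 'Ahjo'."""
    return QUALIFIER.sub("", name)


def candidates(name, labels=LABELS):
    """Prefer an exact label match; fall back to the unqualified name."""
    exact = labels.get(name)
    if exact:
        return exact
    return labels.get(strip_qualifier(name), [])
```

With this ordering, "Ahjo (Kerava)" resolves to the item carrying the identical label, while "Ala-Malmi (Helsinki)" still reaches "Ala-Malmi" through the fallback.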

In any case, I think this is now good enough for serious work, so we will start verifying the suggested matches. 2.5% (173) already done...

-Osma


Magnus Manske wrote on 19.06.2017 at 12:02:
I fiddled with it a bit, now 35% automatched.

Will try some more, but there are some sanity constraints on the matching. If it finds more than one match for the name, it does not set any match, because random matches on the same name were annoying in the past. There is also a type constraint, which might skip some Wikidata items without appropriate instance/subclass.
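
The two constraints described above can be illustrated with a short Python sketch. This is only a guess at the general shape, not the tool's real implementation: the function name, the candidate tuples, the type names, and the order in which the two checks are applied are all assumptions made for illustration.

```python
def automatch(candidates, allowed_types):
    """Return a single item ID, or None when no safe match exists.

    candidates: list of (item_id, types) tuples that share the entry's name,
        where types is the set of classes found via instance-of/subclass-of
        (hypothetical data model).
    allowed_types: set of classes accepted for this catalog (hypothetical).
    """
    # Type constraint: skip items without an appropriate instance/subclass.
    typed = [item for item, types in candidates if types & allowed_types]
    # Ambiguity constraint: more than one match for the name sets no match.
    if len(typed) != 1:
        return None
    return typed[0]


# An item with no type statement, or with an unrelated one, is never chosen.
print(automatch([("item-a", set()), ("item-b", {"workshop"})], {"place"}))
```

Under these assumptions, an item that lacks any type statement is silently skipped, which would be one way for an identical label to end up without a suggested match.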

On Mon, Jun 19, 2017 at 8:09 AM Osma Suominen <osma.suomi...@helsinki.fi> wrote:

    Hi Magnus, all,

    I've been looking a bit closer at the YSO places catalog [1] in
    Mix'n'match and I'm wondering why only 20% of the places were
    automatically matched.

    For example, Nepal (http://www.yso.fi/onto/yso/p107682) was
    automatically matched to Nepal (Q837).

    But:

    Accra (http://www.yso.fi/onto/yso/p138653) was not matched to Accra
    (Q3761).

    Aceh (http://www.yso.fi/onto/yso/p147889) was not matched to Aceh
    (Q1823).

    Akkunusjoki (http://www.yso.fi/onto/yso/p109251) was not matched to
    Akkunusjoki (Q12253027).

    There are many more cases like this. So the precision of the automatic
    matching seems good (all but one were correct so far), but the recall is
    rather low, and even in cases where the label is identical a match has
    not been suggested. Is there anything that could be done about this?


    Somewhat related to this, it seems that none of the places with
    parenthetical qualifiers in their names were matched. For example "Ahjo
    (Kerava)" could have been matched to Q11849902 (which has a Finnish
    label that is identical) and "Ala-Malmi (Helsinki)" could have been
    matched to Q2829441 ("Ala-Malmi"). Since almost 60% of the place names
    include parenthetical qualifiers - to make them unique despite different
    places having identical names - this means that a lot of potential
    matches are missing. Could something be done to improve the situation?


    If Mix'n'match is incapable of automatically matching cases like this,
    would it help if I did an automatic matching externally using some other
    tool, and then gave the potential matches as e.g. a CSV file that could
    then be imported into Mix'n'match so that they can be verified there?

    -Osma

    [1] https://tools.wmflabs.org/mix-n-match/#/catalog/473


    Osma Suominen wrote on 17.06.2017 at 13:13:
     > Hi Magnus,
     >
     > Thanks a lot, that was fast! And the results look very good!
     >
     > I confirmed a couple dozen automated mappings and fixed an incorrect one
     > ("Amerikka" was matched to USA, but I changed it to "Americas"). Then I
     > started hitting rate limit errors. I guess it would be possible to avoid
     > those with some extra permissions?
     >
     > About 20% of the places were automatically matched. Probably most of the
     > remaining ones - around 5000 - do not exist in Wikidata because they are
     > e.g. towns and villages in Finland. Would it be fair game to create all
     > of them in Wikidata?
     >
     > -Osma
     >

    --
    Osma Suominen
    D.Sc. (Tech), Information Systems Specialist
    National Library of Finland
    P.O. Box 26 (Kaikukatu 4)
    00014 HELSINGIN YLIOPISTO
    Tel. +358 50 3199529
    osma.suomi...@helsinki.fi
    http://www.nationallibrary.fi

    _______________________________________________
    Wikidata mailing list
    Wikidata@lists.wikimedia.org
    https://lists.wikimedia.org/mailman/listinfo/wikidata






