Re: [Wikidata] Mix'n'Match with existing (indirect) mappings

2017-06-19 Thread Thad Guidry
In Freebase, back in the day, we also created new entities for the same
reasons Magnus gives.
We found that creating an entity, even a potentially duplicate one,
caused fewer problems than not having any entity.
We later dealt with duplicate entities through simple human merge
requests.
Duplicate entities ended up being a very minor occurrence after we
improved our search algorithms to account for popularity as well as for
entities that had more than one filled-out property.

In the case of non-mission-critical datasets, more data, even duplicate
data, is better than no data at all.

-Thad
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Mix'n'Match with existing (indirect) mappings

2017-06-19 Thread Osma Suominen

Magnus Manske wrote on 19.06.2017 at 14:58:

> My official policy now is to create a new item if one does not exist;
> the fact that there is an entry in a (good) third-party catalog alone
> makes them notable on Wikidata, but villages and lakes etc. are also
> notable by default.


Thanks, I guess this is as official as it gets :) We will follow your 
advice and simply create new entities as necessary.


-Osma


--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.suomi...@helsinki.fi
http://www.nationallibrary.fi



Re: [Wikidata] Mix'n'Match with existing (indirect) mappings

2017-06-19 Thread Magnus Manske
On Mon, Jun 19, 2017 at 12:16 PM Osma Suominen 
wrote:

>
> I couldn't see the "not on Wikidata" button that was mentioned in the
> manual in any of the modes. Has it been removed? It would be useful to
> be able to mark that something is not (yet) in Wikidata, though I
> suppose it could be added by someone else at any time, so this type of
> information may become obsolete over time.
>

That was indeed removed, as it takes a long time (years) to finish large
catalogs, and by that time new items may have been created, so all the "not
in Wikidata" entries would have to be checked again.

My official policy now is to create a new item if one does not exist; the
fact that there is an entry in a (good) third-party catalog alone makes
them notable on Wikidata, but villages and lakes etc. are also notable by
default.


Re: [Wikidata] Mix'n'Match with existing (indirect) mappings

2017-06-19 Thread Osma Suominen

Magnus Manske wrote on 19.06.2017 at 13:54:

> For "casual matching", try the game mode:
> https://tools.wmflabs.org/mix-n-match/#/random/473


Thanks, I already tried all the modes. They are good for different 
purposes. The manual mode seems most efficient for verifying the 
automated matches, most of which can just be confirmed with a single 
click without reloading a whole page, but the game modes are better for 
handling the unmatched ones since they provide some fuzzier suggestions 
without having to do manual searches.


I couldn't see the "not on Wikidata" button that was mentioned in the 
manual in any of the modes. Has it been removed? It would be useful to 
be able to mark that something is not (yet) in Wikidata, though I 
suppose it could be added by someone else at any time, so this type of 
information may become obsolete over time.


In any case we need to decide whether to add all the places (e.g. 
villages and small lakes) that are not yet in Wikidata as new entities 
or not. Is there any guidance on this? I know the notability guidelines 
[1], but they are rather vague.


For most of the places we would like to add, there is at least one other 
public source - the Finnish place names registry, which contains 
information such as names, type, administrative hierarchy and 
coordinates - even though it is currently not linked to Wikidata in any 
way. And since this set of places is originally based on a library 
authority file that is maintained based on indexing needs, there should 
be at least one document about each place in libraries, archives and/or 
museum collections. So every place we have is at least slightly notable, 
but I'm not sure whether that's notable enough for Wikidata.


-Osma

[1] https://www.wikidata.org/wiki/Wikidata:Notability



Re: [Wikidata] Mix'n'Match with existing (indirect) mappings

2017-06-19 Thread Magnus Manske
For "casual matching", try the game mode:
https://tools.wmflabs.org/mix-n-match/#/random/473


Re: [Wikidata] Mix'n'Match with existing (indirect) mappings

2017-06-19 Thread Osma Suominen

Hi Magnus!

It's even higher now - 45%. Thanks a lot! This helps a lot with the 
verifying.


Also matching of names with parenthetical qualifiers works better now. I 
see that "Ala-Malmi (Helsinki)" was automatched to "Ala-Malmi". However, 
"Ahjo (Kerava)" was not matched to "Ahjo (Kerava)" (Q11849902) but to 
Q1368573 (which is "Ahjo" in Finnish but means a type of metalworking 
workshop, not a specific place). Neither Wikidata entity has a type
statement; the latter has a "subclass-of" statement.


In any case, I think this is now good enough for serious work, so we 
will start verifying the suggested matches. 2.5% (173) already done...


-Osma




Re: [Wikidata] Mix'n'Match with existing (indirect) mappings

2017-06-19 Thread Magnus Manske
I fiddled with it a bit, now 35% automatched.

Will try some more, but there are some sanity constraints on the matching.
If it finds more than one match for the name, it does not set any match,
because random matches on the same name were annoying in the past. There is
also a type constraint, which might skip some Wikidata items without
appropriate instance/subclass.
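
The two constraints described above can be illustrated with a minimal Python sketch. This is not Mix'n'match's actual code, which is not shown in this thread; the Item class, the automatch function and the "place" type label are hypothetical names chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """A hypothetical, heavily simplified stand-in for a Wikidata item."""
    qid: str
    label: str
    # Targets of the item's instance-of / subclass-of statements (toy labels).
    types: set = field(default_factory=set)


def automatch(name, items, allowed_types):
    """Return the QID of the single plausible match for `name`, else None."""
    candidates = [
        item for item in items
        if item.label.casefold() == name.casefold()  # exact label match
        and item.types & allowed_types               # type constraint
    ]
    # Sanity constraint: if the name is ambiguous (or nothing qualifies),
    # set no match at all rather than guessing.
    if len(candidates) != 1:
        return None
    return candidates[0].qid
```

Under these assumptions, an item with an identical label but no instance/subclass statement is skipped by the type constraint, which would be one possible explanation for identical labels going unmatched.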



Re: [Wikidata] Mix'n'Match with existing (indirect) mappings

2017-06-19 Thread Osma Suominen

Hi Magnus, all,

I've been looking a bit closer at the YSO places catalog [1] in 
Mix'n'match and I'm wondering why only 20% of the places were 
automatically matched.


For example, Nepal (http://www.yso.fi/onto/yso/p107682) was 
automatically matched to Nepal (Q837).


But:

Accra (http://www.yso.fi/onto/yso/p138653) was not matched to Accra (Q3761).

Aceh (http://www.yso.fi/onto/yso/p147889) was not matched to Aceh (Q1823).

Akkunusjoki (http://www.yso.fi/onto/yso/p109251) was not matched to 
Akkunusjoki (Q12253027).


There are many more cases like this. So the precision of the automatic 
matching seems good (all but one were correct so far), but the recall is 
rather low, and even in cases where the label is identical a match has 
not been suggested. Is there anything that could be done about this?



Somewhat related to this, it seems that none of the places with 
parenthetical qualifiers in their names were matched. For example "Ahjo 
(Kerava)" could have been matched to Q11849902 (which has a Finnish 
label that is identical) and "Ala-Malmi (Helsinki)" could have been 
matched to Q2829441 ("Ala-Malmi"). Since almost 60% of the place names 
include parenthetical qualifiers - to make them unique despite different 
places having identical names - this means that a lot of potential 
matches are missing. Could something be done to improve the situation?
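
One way to handle such names, sketched in Python under the assumption that a qualifier is always a single trailing parenthesis, is to try the full label first and the bare name second. The function names are hypothetical and this is not how Mix'n'match is known to do it.

```python
import re

# Matches "Base name (Qualifier)" with one trailing parenthetical.
_QUALIFIER = re.compile(r"^(?P<base>.+?)\s*\((?P<qualifier>[^()]+)\)$")


def split_qualifier(name):
    """Split 'Ala-Malmi (Helsinki)' into ('Ala-Malmi', 'Helsinki')."""
    cleaned = name.strip()
    match = _QUALIFIER.match(cleaned)
    if match is None:
        return cleaned, None
    return match.group("base"), match.group("qualifier")


def label_candidates(name):
    """Labels to look up, most specific first."""
    base, qualifier = split_qualifier(name)
    full = name.strip()
    # Try the qualified label before the bare one, so that an item labelled
    # exactly "Ahjo (Kerava)" is preferred over a generic "Ahjo".
    return [full] if qualifier is None else [full, base]
```

The qualifier itself is returned separately so that it could later be compared against, for example, the administrative hierarchy of a candidate item.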



If Mix'n'match is incapable of automatically matching cases like this, 
would it help if I did an automatic matching externally using some other 
tool, and then gave the potential matches as e.g. a CSV file that could 
then be imported into Mix'n'match so that they can be verified there?
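
As a rough sketch of what such an externally produced file could look like, the Python below writes candidate matches to CSV. The column names and layout are purely an assumption for illustration; the thread does not specify any import format for Mix'n'match.

```python
import csv
import io


def write_candidates(rows, out):
    """Write (YSO URI, label, candidate QID) triples as CSV to `out`."""
    writer = csv.writer(out)
    # Hypothetical header; the real import format would have to be agreed on.
    writer.writerow(["yso_uri", "label", "candidate_qid"])
    for uri, label, qid in rows:
        writer.writerow([uri, label, qid])


# Example using pairs mentioned in this message.
buffer = io.StringIO()
write_candidates(
    [
        ("http://www.yso.fi/onto/yso/p138653", "Accra", "Q3761"),
        ("http://www.yso.fi/onto/yso/p147889", "Aceh", "Q1823"),
    ],
    buffer,
)
print(buffer.getvalue())
```

The candidates would still be unverified suggestions, to be confirmed or rejected by a person in the tool.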


-Osma

[1] https://tools.wmflabs.org/mix-n-match/#/catalog/473




Re: [Wikidata] Mix'n'Match with existing (indirect) mappings

2017-06-17 Thread Osma Suominen

Hi Magnus,

Thanks a lot, that was fast! And the results look very good!

I confirmed a couple dozen automated mappings and fixed an incorrect one 
("Amerikka" was matched to USA, but I changed it to "Americas"). Then I 
started hitting rate limit errors. I guess it would be possible to avoid 
those with some extra permissions?


About 20% of the places were automatically matched. Probably most of the 
remaining ones - around 5000 - do not exist in Wikidata because they are 
e.g. towns and villages in Finland. Would it be fair game to create all 
of them in Wikidata?


-Osma


Re: [Wikidata] Mix'n'Match with existing (indirect) mappings

2017-06-16 Thread Magnus Manske
Now at https://tools.wmflabs.org/mix-n-match/#/catalog/473

Location data as well, example:
https://tools.wmflabs.org/mix-n-match/#/entry/22733305



Re: [Wikidata] Mix'n'Match with existing (indirect) mappings

2017-06-16 Thread Osma Suominen

Hi Magnus!

That's excellent news! Thanks a lot!

I'm currently preparing a CSV dump of YSO places. Most of the entries 
have coordinates. I will send it to you soon for inclusion as a catalog 
in Mix'n'match.


-Osma

Magnus Manske wrote on 16.06.2017 at 00:00:
Just to update everyone in this thread, I have added location support 
for Mix'n'match. This will show on entries with a location, e.g.:


https://tools.wmflabs.org/mix-n-match/#/entry/1655814

All Mix'n'match locations (just short of half a million at the moment) 
can be seen as a layer in WikiShootMe, e.g.:


https://goo.gl/kqfjoj

Cheers,
Magnus

On Tue, Jun 13, 2017 at 5:52 PM Neubert, Joachim wrote:


Hi Osma,

sorry for jumping in late. I was at ELAG last week, talking
about a very similar topic (Wikidata as authority linking hub,
https://hackmd.io/p/S1YmXWC0e). Our use case was porting an existing
mapping between RePEc author IDs and GND IDs into Wikidata (and
further extending it there). In the course of that, we had to match as
many persons as possible on the GND as well as on the RePEc side
(via Mix'n'match), before creating new items. The code used for
preparing the (quickstatements2) insert statements is linked from
the slides.

Additionally, I've added ~12,000 GND IDs to Wikidata via their
existing VIAF identifiers (derived from a federated query on a
custom VIAF endpoint and the public WD endpoint -
https://github.com/zbw/sparql-queries/blob/master/viaf/missing_gnd_id_for_viaf.rq).
This sounds very similar to your use case; another query can
derive future STW ID properties from the existing STW-GND mapping
(https://github.com/zbw/sparql-queries/blob/master/stw/wikidata_mapping_candidates_via_gnd.rq
- currently hits a timeout at the WD subquery, but worked before). I
would be happy if that could be helpful.

The plan to divide the m'n'm catalogs (places vs. subjects) makes
sense to me; we plan the same for STW. I'm not sure whether a
restriction to locations (Q17334923, or something more specific)
will also match all subclasses, but Magnus could perhaps take care
of that when you send him the files.

Cheers, Joachim

> -Original Message-
> From: Wikidata [mailto:wikidata-boun...@lists.wikimedia.org] On Behalf Of
> Osma Suominen
> Sent: Tuesday, 6 June 2017 12:19
> To: Discussion list for the Wikidata project.
> Subject: [Wikidata] Mix'n'Match with existing (indirect) mappings
>
> Hi Wikidatans,
>
> After several delays we are finally starting to think seriously about
> mapping the General Finnish Ontology YSO [1] to Wikidata. A "YSO ID"
> property (https://www.wikidata.org/wiki/Property:P2347) was added to
> Wikidata some time ago, but it has been used only a few times so far.
>
> Recently some 6000 places have been added to "YSO Places" [2], a new
> extension of YSO, which was generated from place names in YSA and Allärs,
> our earlier subject indexing vocabularies. It would probably make sense to
> map these places to Wikidata, in addition to the general concepts in YSO.
> We have already manually added a few links from YSA/YSO places to Wikidata
> for newly added places, but this approach does not scale if we want to link
> the thousands of existing places.
>
> We also have some indirect sources of YSO/Wikidata mappings:
>
> 1. YSO is mapped to LCSH, and Wikidata also to LCSH (using P244, LC/NACO
> Authority File ID). I dug a bit into both sets of mappings and found that
> approximately 1200 YSO-Wikidata links could be generated from the
> intersection of these mappings.
>
> 2. The Finnish broadcasting company Yle has also created some mappings
> between KOKO (which includes YSO) and Wikidata. Last time I looked at those,
> we could generate at least 5000 YSO-Wikidata links from them.
> Probably more nowadays.
>
>
> Of course, indirect mappings are a bit dangerous. It's possible that there
> are some differences in meaning, especially with LCSH which has a very
> different structure (and cultural context) than YSO. Nevertheless I think
> these could be a good starting point, especially if a tool such as
> Mix'n'Match could be used to verify them.
>
> Now my question is, given that we already have or could easily generate
> thousands of Wikidata-YSO mappings, but the rest would still have to be
> semi-automatically linked using Mix'n'Match, what would be a good way to
> approach this? Does Mix'n'Match look at existing statements (in this case
> YSO ID / P2347) in Wikidata when you load a new catalog, or ignore them?
 >
   

Re: [Wikidata] Mix'n'Match with existing (indirect) mappings

2017-06-15 Thread Magnus Manske
Just to update everyone in this thread, I have added location support for
Mix'n'match. This will show on entries with a location, e.g.:

https://tools.wmflabs.org/mix-n-match/#/entry/1655814

All Mix'n'match locations (just short of half a million at the moment) can
be seen as a layer in WikiShootMe, e.g.:

https://goo.gl/kqfjoj

Cheers,
Magnus

On Tue, Jun 13, 2017 at 5:52 PM Neubert, Joachim  wrote:

> Hi Osma,
>
> sorry for jumping in late. I was at ELAG last week, talking about a
> very similar topic (Wikidata as authority linking hub,
> https://hackmd.io/p/S1YmXWC0e). Our use case was porting an existing
> mapping between RePEc author IDs and GND IDs into Wikidata (and further on
> extending it there). In the course of that, we had to match as many persons
> as possible on the GND as well as on the RePEc side (via Mix'n'match) before
> creating new items. The code used for preparing the (quickstatements2)
> insert statements is linked from the slides.
>
> Additionally, I've added ~12,000 GND IDs to Wikidata via their existing
> VIAF identifiers (derived from a federated query on a custom VIAF endpoint
> and the public WD endpoint -
> https://github.com/zbw/sparql-queries/blob/master/viaf/missing_gnd_id_for_viaf.rq).
> This sounds very similar to your use case. Another query can
> derive future STW ID properties from the existing STW-GND mapping (
> https://github.com/zbw/sparql-queries/blob/master/stw/wikidata_mapping_candidates_via_gnd.rq
> - it currently hits a timeout at the WD subquery, but worked before). I would
> be happy if that could be helpful.
>
> The plan to divide the m'n'm catalogs (places vs. subjects) makes sense
> to me; we plan the same for STW. I'm not sure whether a restriction to
> locations (Q17334923, or something more specific) will also match all
> subclasses, but Magnus could perhaps take care of that when you send him
> the files.
>
> Cheers, Joachim
>

Re: [Wikidata] Mix'n'Match with existing (indirect) mappings

2017-06-13 Thread Neubert, Joachim
Hi Osma,

sorry for jumping in late. I was at ELAG last week, talking about a very
similar topic (Wikidata as authority linking hub,
https://hackmd.io/p/S1YmXWC0e). Our use case was porting an existing mapping
between RePEc author IDs and GND IDs into Wikidata (and further on extending it
there). In the course of that, we had to match as many persons as possible on
the GND as well as on the RePEc side (via Mix'n'match) before creating new
items. The code used for preparing the (quickstatements2) insert statements is
linked from the slides.
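
For readers who have not used QuickStatements, here is a minimal Python
sketch of how such insert statements could be generated. It is not the code
linked from the slides. It assumes the tab-separated QuickStatements V1
command syntax (item, property, string value in double quotes), uses the
YSO ID property P2347 from this thread as the example property, and the
item and identifier values are placeholders, not real data.

```python
def quickstatements_lines(mapping, prop):
    """Yield one QuickStatements V1 command per (item, external ID) pair."""
    for qid, ext_id in sorted(mapping.items()):
        # String values must be wrapped in double quotes.
        yield f'{qid}\t{prop}\t"{ext_id}"'


# Placeholder pairs of Wikidata item -> external identifier (not real data).
example = {"Q100000001": "00001", "Q100000002": "00002"}

if __name__ == "__main__":
    for line in quickstatements_lines(example, "P2347"):
        print(line)
```

The output can be pasted into the QuickStatements input box; checking a
handful of statements by hand before a bulk run is advisable.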

Additionally, I've added ~12,000 GND IDs to Wikidata via their existing VIAF
identifiers (derived from a federated query on a custom VIAF endpoint and the
public WD endpoint -
https://github.com/zbw/sparql-queries/blob/master/viaf/missing_gnd_id_for_viaf.rq).
This sounds very similar to your use case. Another query can derive
future STW ID properties from the existing STW-GND mapping
(https://github.com/zbw/sparql-queries/blob/master/stw/wikidata_mapping_candidates_via_gnd.rq
- it currently hits a timeout at the WD subquery, but worked before). I would be
happy if that could be helpful.
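
The idea behind both queries, deriving links through a shared intermediate
identifier, can be sketched offline in Python. This is only an illustration
of the join, not a translation of the SPARQL files above. The function name
and the sample values are hypothetical, and it assumes both mappings have
already been exported as plain dictionaries.

```python
def candidates_via_shared_id(source_to_hub, wikidata_to_hub):
    """Join two mappings on a shared intermediate identifier.

    source_to_hub:   {source ID: hub ID}, e.g. a local vocabulary mapped to GND
    wikidata_to_hub: {Wikidata item: hub ID}, e.g. items carrying a GND ID
    Returns (candidates, ambiguous): unique links, and source IDs whose hub ID
    is claimed by more than one item and therefore needs manual review.
    """
    items_by_hub = {}
    for qid, hub_id in wikidata_to_hub.items():
        items_by_hub.setdefault(hub_id, []).append(qid)

    candidates, ambiguous = {}, {}
    for source_id, hub_id in source_to_hub.items():
        qids = items_by_hub.get(hub_id, [])
        if len(qids) == 1:
            candidates[source_id] = qids[0]
        elif len(qids) > 1:
            ambiguous[source_id] = sorted(qids)
    return candidates, ambiguous


# Placeholder data, for illustration only.
links, review = candidates_via_shared_id(
    {"s1": "hubA", "s2": "hubB", "s3": "hubC"},
    {"Q1": "hubA", "Q2": "hubB", "Q3": "hubB"},
)
```

Keeping the ambiguous cases separate matters because an indirect mapping
inherits the errors of both mappings it is built from.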

The plan to divide the m'n'm catalogs (places vs. subjects) makes sense to me;
we plan the same for STW. I'm not sure whether a restriction to locations
(Q17334923, or something more specific) will also match all subclasses, but
Magnus could perhaps take care of that when you send him the files.
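
On the subclass question: in SPARQL against the Wikidata Query Service, a
restriction that also covers subclasses is usually written as a property
path over "instance of" (P31) and "subclass of" (P279). The sketch below
only builds such a query string in Python; the helper name is hypothetical,
and whether Mix'n'match applies a type restriction this way internally is
not something stated in this thread.

```python
def instance_of_query(class_qid, limit=100):
    """Build a query for items that are instances of a class or of any of its subclasses."""
    return (
        "SELECT ?item WHERE {\n"
        # P31 = instance of; P279* = zero or more "subclass of" steps.
        f"  ?item wdt:P31/wdt:P279* wd:{class_qid} .\n"
        "}\n"
        f"LIMIT {limit}"
    )


if __name__ == "__main__":
    print(instance_of_query("Q17334923"))
```

For a class as broad as Q17334923 such a path can be expensive to evaluate,
so a more specific class is likely to be the more practical choice.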

Cheers, Joachim

> -----Original Message-----
> From: Wikidata [mailto:wikidata-boun...@lists.wikimedia.org] On Behalf Of
> Osma Suominen
> Sent: Tuesday, 6 June 2017 12:19
> To: Discussion list for the Wikidata project.
> Subject: [Wikidata] Mix'n'Match with existing (indirect) mappings
> 
> Hi Wikidatans,
> 
> After several delays we are finally starting to think seriously about mapping 
> the
> General Finnish Ontology YSO [1] to Wikidata. A "YSO ID"
> property (https://www.wikidata.org/wiki/Property:P2347) was added to
> Wikidata some time ago, but it has been used only a few times so far.
> 
> Recently some 6000 places have been added to "YSO Places" [2], a new
> extension of YSO, which was generated from place names in YSA and Allärs,
> our earlier subject indexing vocabularies. It would probably make sense to map
> these places to Wikidata, in addition to the general concepts in YSO. We have
> already manually added a few links from YSA/YSO places to Wikidata for newly
> added places, but this approach does not scale if we want to link the 
> thousands
> of existing places.
> 
> We also have some indirect sources of YSO/Wikidata mappings:
> 
> 1. YSO is mapped to LCSH, and Wikidata also to LCSH (using P244, LC/NACO
> Authority File ID). I dug a bit into both sets of mappings and found that
> approximately 1200 YSO-Wikidata links could be generated from the
> intersection of these mappings.
> 
> 2. The Finnish broadcasting company Yle has also created some mappings
> between KOKO (which includes YSO) and Wikidata. Last time I looked at those,
> we could generate at least 5000 YSO-Wikidata links from them.
> Probably more nowadays.
> 
> 
> Of course, indirect mappings are a bit dangerous. It's possible that there are
> some differences in meaning, especially with LCSH which has a very different
> structure (and cultural context) than YSO. Nevertheless I think these could 
> be a
> good starting point, especially if a tool such as Mix'n'Match could be used to
> verify them.
> 
> Now my question is, given that we already have or could easily generate
> thousands of Wikidata-YSO mappings, but the rest would still have to be semi-
> automatically linked using Mix'n'Match, what would be a good way to
> approach this? Does Mix'n'Match look at existing statements (in this case YSO
> ID / P2347) in Wikidata when you load a new catalog, or ignore them?
> 
> I can think of at least these approaches:
> 
> 1. First import the indirect mappings we already have to Wikidata as
> P2347 statements, then create a Mix'n'Match catalog with the remaining YSO
> concepts. The indirect mappings would have to be verified separately.
> 
> 2. First import the indirect mappings we already have to Wikidata as
> P2347 statements, then create a Mix'n'Match catalog with ALL the YSO
> concepts, including the ones for which we already have imported a mapping.
> Use Mix'n'Match to verify the indirect mappings.
> 
> 3. Forget about the existing mappings and just create a Mix'n'Match catalog
> with all the YSO concepts.
> 
> Any advice?
> 
> Thanks,
> 
> -Osma
> 
> [1] http://finto.fi/yso/
> 
> [2] http://finto.fi/yso-paikat/
> 
> --
> Osma Suominen
> D.Sc. (Tech), Information Systems Specialist
> National Library of Finland
> P.O. Box 26 (Kaikukatu 4)
> 00014 HELSINGIN YLIOPISTO
> Tel. +358 50 3199529
> osma.suomi...@helsinki.fi
> http://www.nationallibrary.fi
> 

Re: [Wikidata] Mix'n'Match with existing (indirect) mappings

2017-06-07 Thread Osma Suominen

07.06.2017, 14:10, Susanna Ånäs wrote:

> We will also need a coordinate transformation since all official Finnish
> coordinates are in EPSG:3067. Before or in MixnMatch.


The (experimental) Linked Data service of NLS already provides WGS84 
coordinates in addition to the official ones, so this should be easy.


-Osma


--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.suomi...@helsinki.fi
http://www.nationallibrary.fi



Re: [Wikidata] Mix'n'Match with existing (indirect) mappings

2017-06-07 Thread Magnus Manske
I won't be getting into coordinate cleanup ;-)

Coordinates would have to be compatible with
https://www.wikidata.org/wiki/Property:P625
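
To make the transformation discussed below concrete, here is a
self-contained Python sketch that converts EPSG:3067 (ETRS-TM35FIN)
easting/northing to latitude/longitude in degrees. It assumes the standard
projection parameters (GRS80 ellipsoid, central meridian 27 E, scale factor
0.9996, false easting 500000 m) and uses the truncated Krueger series for
the inverse transverse Mercator projection. It also treats ETRS89 and WGS84
as interchangeable, which should be adequate for matching places but is an
assumption. In practice an established library such as pyproj would be the
usual choice; the function name here is hypothetical.

```python
import math

# GRS80 ellipsoid and ETRS-TM35FIN (EPSG:3067) projection parameters.
A_AXIS = 6378137.0
FLATTENING = 1 / 298.257222101
K0 = 0.9996
LON0 = math.radians(27.0)
FALSE_EASTING = 500000.0


def tm35fin_to_latlon(easting, northing):
    """Return (lat, lon) in degrees for ETRS-TM35FIN coordinates in metres."""
    n = FLATTENING / (2 - FLATTENING)
    # Rectifying radius of the ellipsoid.
    radius = A_AXIS / (1 + n) * (1 + n**2 / 4 + n**4 / 64)
    beta = (n / 2 - 2 * n**2 / 3 + 37 * n**3 / 96,
            n**2 / 48 + n**3 / 15,
            17 * n**3 / 480)
    delta = (2 * n - 2 * n**2 / 3 - 2 * n**3,
             7 * n**2 / 3 - 8 * n**3 / 5,
             56 * n**3 / 15)

    xi = northing / (K0 * radius)  # false northing is 0
    eta = (easting - FALSE_EASTING) / (K0 * radius)
    xi_p = xi - sum(b * math.sin(2 * j * xi) * math.cosh(2 * j * eta)
                    for j, b in enumerate(beta, start=1))
    eta_p = eta - sum(b * math.cos(2 * j * xi) * math.sinh(2 * j * eta)
                      for j, b in enumerate(beta, start=1))

    # Conformal latitude, then geodetic latitude via the delta series.
    chi = math.asin(math.sin(xi_p) / math.cosh(eta_p))
    lat = chi + sum(d * math.sin(2 * j * chi)
                    for j, d in enumerate(delta, start=1))
    lon = LON0 + math.atan2(math.sinh(eta_p), math.cos(xi_p))
    return math.degrees(lat), math.degrees(lon)
```

A point with easting 500000 lies on the central meridian, so its longitude
comes out as 27 degrees regardless of the northing, which makes for a quick
sanity check.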

On Wed, Jun 7, 2017 at 12:10 PM Susanna Ånäs  wrote:

> We will also need a coordinate transformation since all official Finnish
> coordinates are in EPSG:3067. Before or in MixnMatch.
>
> Susanna
>
> 2017-06-07 14:03 GMT+03:00 Osma Suominen :
>
>> 07.06.2017, 13:10, Magnus Manske wrote:
>>
>>> Does that imply coordinates in Mix'n'match? Because there is no support
>>> for that yet, though I could add it. Do you have an example catalog
>>> (existing or to-be-created)?
>>>
>>
>> For YSO places, it would be possible to create a Mix'n'Match catalog
>> where the majority of places have coordinates. YSO places doesn't itself
>> contain coordinates, but the Finnish places within it have been mapped to
>> the Place Name Registry (Paikannimirekisteri) maintained by National Land
>> Survey of Finland (Maanmittauslaitos), which includes point coordinates for
>> all places. So it would be possible to pick the coordinates from there for
>> the 4400 or so places that have been mapped, if that helps with the linking
>> in Mix'n'Match.
>>
>>
>> -Osma
>>
>> --
>> Osma Suominen
>> D.Sc. (Tech), Information Systems Specialist
>> National Library of Finland
>> P.O. Box 26 (Kaikukatu 4)
>> 00014 HELSINGIN YLIOPISTO
>> Tel. +358 50 3199529
>> osma.suomi...@helsinki.fi
>> http://www.nationallibrary.fi
>>


Re: [Wikidata] Mix'n'Match with existing (indirect) mappings

2017-06-07 Thread Susanna Ånäs
We will also need a coordinate transformation since all official Finnish
coordinates are in EPSG:3067. Before or in MixnMatch.

Susanna

2017-06-07 14:03 GMT+03:00 Osma Suominen :

> 07.06.2017, 13:10, Magnus Manske wrote:
>
>> Does that imply coordinates in Mix'n'match? Because there is no support
>> for that yet, though I could add it. Do you have an example catalog
>> (existing or to-be-created)?
>>
>
> For YSO places, it would be possible to create a Mix'n'Match catalog where
> the majority of places have coordinates. YSO places doesn't itself contain
> coordinates, but the Finnish places within it have been mapped to the Place
> Name Registry (Paikannimirekisteri) maintained by National Land Survey of
> Finland (Maanmittauslaitos), which includes point coordinates for all
> places. So it would be possible to pick the coordinates from there for the
> 4400 or so places that have been mapped, if that helps with the linking in
> Mix'n'Match.
>
>
> -Osma
>
> --
> Osma Suominen
> D.Sc. (Tech), Information Systems Specialist
> National Library of Finland
> P.O. Box 26 (Kaikukatu 4)
> 00014 HELSINGIN YLIOPISTO
> Tel. +358 50 3199529
> osma.suomi...@helsinki.fi
> http://www.nationallibrary.fi
>


Re: [Wikidata] Mix'n'Match with existing (indirect) mappings

2017-06-07 Thread Osma Suominen

07.06.2017, 13:10, Magnus Manske wrote:

> Does that imply coordinates in Mix'n'match? Because there is no support
> for that yet, though I could add it. Do you have an example catalog
> (existing or to-be-created)?


For YSO places, it would be possible to create a Mix'n'Match catalog 
where the majority of places have coordinates. YSO places doesn't itself 
contain coordinates, but the Finnish places within it have been mapped 
to the Place Name Registry (Paikannimirekisteri) maintained by National 
Land Survey of Finland (Maanmittauslaitos), which includes point 
coordinates for all places. So it would be possible to pick the 
coordinates from there for the 4400 or so places that have been mapped, 
if that helps with the linking in Mix'n'Match.


-Osma

--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.suomi...@helsinki.fi
http://www.nationallibrary.fi



Re: [Wikidata] Mix'n'Match with existing (indirect) mappings

2017-06-07 Thread Magnus Manske
Does that imply coordinates in Mix'n'match? Because there is no support for
that yet, though I could add it. Do you have an example catalog (existing
or to-be-created)?

On Tue, Jun 6, 2017 at 6:30 PM Susanna Ånäs  wrote:

> I thought of something like this:
> https://drive.google.com/file/d/0BxuJSZymOK8-R1Q0SXpmVGk3dkE/view
>
> Susanna
>
> 2017-06-06 19:21 GMT+03:00 Alex Stinson :
>
>> @Sandra: are you suggesting another layer on top of something like
>> https://tools.wmflabs.org/wikishootme/ ?
>>
>> Cheers,
>>
>> Alex
>>
>> On Tue, Jun 6, 2017 at 10:22 AM, Susanna Ånäs 
>> wrote:
>>
>>> Would anyone be interested in creating a map interface for matching
>>> places in Mix'n'Match?
>>>
>>> Just a thought...
>>>
>>> Susanna
>>>
>>> 2017-06-06 17:17 GMT+03:00 Osma Suominen :
>>>
>>>> Magnus Manske wrote on 06.06.2017 at 17:06:
>>>>
>>>>> By the way, we also have multilingual labels that could perhaps
>>>>> improve the automatic matching. YSO generally has fi/sv/en, YSO
>>>>> places has fi/sv. Can you make use of these too if I provided them
>>>>> in additional columns?
>>>>>
>>>>> Sorry, mix'n'match only does single language labels.
>>>>>
>>>>
>>>> Ok, then I have to think which language to pick for Mix'n'Match use.
>>>> For YSO, Finnish and Swedish labels are generally the best quality, but
>>>> probably wouldn't produce as many automated hits as the English ones. Also
>>>> it depends on who is going to do the manual matching.
>>>>
>>>> Any advice on this?
>>>>
>>>> It does redirect like this already. See e.g.
>>>>> http://www.yso.fi/onto/yso/p138653
>>>>>
>>>>> Great! So you could bunch the "old" ones and the new places into one
>>>>> list?
>>>>>
>>>>
>>>> In principle yes, but in practice, I think it would make sense to use
>>>> two lists, because the places are quite different from the general
>>>> concepts. Also the matching could be more focused for the places - don't
>>>> try to match with any Wikidata entity that is not a place.
>>>>
>>>>
>>>> -Osma
>>>>
>>>> --
>>>> Osma Suominen
>>>> D.Sc. (Tech), Information Systems Specialist
>>>> National Library of Finland
>>>> P.O. Box 26 (Kaikukatu 4)
>>>> 00014 HELSINGIN YLIOPISTO
>>>> Tel. +358 50 3199529
>>>> osma.suomi...@helsinki.fi
>>>> http://www.nationallibrary.fi

>>>
>>>
>>
>>
>> --
>> Alex Stinson
>> GLAM-Wiki Strategist
>> Wikimedia Foundation
>> Twitter:@glamwiki/@sadads
>>
>> Learn more about how the communities behind Wikipedia, Wikidata and other
>> Wikimedia projects partner with cultural heritage organizations:
>> http://glamwiki.org
>>


Re: [Wikidata] Mix'n'Match with existing (indirect) mappings

2017-06-06 Thread Susanna Ånäs
I thought of something like this:
https://drive.google.com/file/d/0BxuJSZymOK8-R1Q0SXpmVGk3dkE/view

Susanna

2017-06-06 19:21 GMT+03:00 Alex Stinson :

> @Sandra: are you suggesting another layer on top of something like
> https://tools.wmflabs.org/wikishootme/ ?
>
> Cheers,
>
> Alex
>
> On Tue, Jun 6, 2017 at 10:22 AM, Susanna Ånäs 
> wrote:
>
>> Would anyone be interested in creating a map interface for matching
>> places in Mix'n'Match?
>>
>> Just a thought...
>>
>> Susanna
>>
>> 2017-06-06 17:17 GMT+03:00 Osma Suominen :
>>
>>> Magnus Manske wrote on 06.06.2017 at 17:06:
>>>
>>>> By the way, we also have multilingual labels that could perhaps
>>>> improve the automatic matching. YSO generally has fi/sv/en, YSO places
>>>> has fi/sv. Can you make use of these too if I provided them in
>>>> additional columns?
>>>>
>>>> Sorry, mix'n'match only does single language labels.
>>>>
>>>
>>> Ok, then I have to think which language to pick for Mix'n'Match use. For
>>> YSO, Finnish and Swedish labels are generally the best quality, but
>>> probably wouldn't produce as many automated hits as the English ones. Also
>>> it depends on who is going to do the manual matching.
>>>
>>> Any advice on this?
>>>
>>> It does redirect like this already. See e.g.
>>>> http://www.yso.fi/onto/yso/p138653
>>>>
>>>> Great! So you could bunch the "old" ones and the new places into one
>>>> list?
>>>>
>>>
>>> In principle yes, but in practice, I think it would make sense to use
>>> two lists, because the places are quite different from the general
>>> concepts. Also the matching could be more focused for the places - don't
>>> try to match with any Wikidata entity that is not a place.
>>>
>>>
>>> -Osma
>>>
>>> --
>>> Osma Suominen
>>> D.Sc. (Tech), Information Systems Specialist
>>> National Library of Finland
>>> P.O. Box 26 (Kaikukatu 4)
>>> 00014 HELSINGIN YLIOPISTO
>>> Tel. +358 50 3199529
>>> osma.suomi...@helsinki.fi
>>> http://www.nationallibrary.fi
>>>
>>
>>
>
>
> --
> Alex Stinson
> GLAM-Wiki Strategist
> Wikimedia Foundation
> Twitter:@glamwiki/@sadads
>
> Learn more about how the communities behind Wikipedia, Wikidata and other
> Wikimedia projects partner with cultural heritage organizations:
> http://glamwiki.org
>


Re: [Wikidata] Mix'n'Match with existing (indirect) mappings

2017-06-06 Thread Alex Stinson
@Sandra: are you suggesting another layer on top of something like
https://tools.wmflabs.org/wikishootme/ ?

Cheers,

Alex

On Tue, Jun 6, 2017 at 10:22 AM, Susanna Ånäs 
wrote:

> Would anyone be interested in creating a map interface for matching places
> in Mix'n'Match?
>
> Just a thought...
>
> Susanna
>
> 2017-06-06 17:17 GMT+03:00 Osma Suominen :
>
>> Magnus Manske wrote on 06.06.2017 at 17:06:
>>
>>> By the way, we also have multilingual labels that could perhaps
>>> improve
>>> the automatic matching. YSO generally has fi/sv/en, YSO places has
>>> fi/sv. Can you make use of these too if I provided them in additional
>>> columns?
>>>
>>> Sorry, mix'n'match only does single language labels.
>>>
>>
>> Ok, then I have to think which language to pick for Mix'n'Match use. For
>> YSO, Finnish and Swedish labels are generally the best quality, but
>> probably wouldn't produce as many automated hits as the English ones. Also
>> it depends on who is going to do the manual matching.
>>
>> Any advice on this?
>>
>> It does redirect like this already. See e.g.
>>> http://www.yso.fi/onto/yso/p138653
>>>
>>> Great! So you could bunch the "old" ones and the new places into one
>>> list?
>>>
>>
>> In principle yes, but in practice, I think it would make sense to use two
>> lists, because the places are quite different from the general concepts.
>> Also the matching could be more focused for the places - don't try to match
>> with any Wikidata entity that is not a place.
>>
>>
>> -Osma
>>
>> --
>> Osma Suominen
>> D.Sc. (Tech), Information Systems Specialist
>> National Library of Finland
>> P.O. Box 26 (Kaikukatu 4)
>> 00014 HELSINGIN YLIOPISTO
>> Tel. +358 50 3199529
>> osma.suomi...@helsinki.fi
>> http://www.nationallibrary.fi
>>
>
>


-- 
Alex Stinson
GLAM-Wiki Strategist
Wikimedia Foundation
Twitter:@glamwiki/@sadads

Learn more about how the communities behind Wikipedia, Wikidata and other
Wikimedia projects partner with cultural heritage organizations:
http://glamwiki.org


Re: [Wikidata] Mix'n'Match with existing (indirect) mappings

2017-06-06 Thread Susanna Ånäs
Would anyone be interested in creating a map interface for matching places
in Mix'n'Match?

Just a thought...

Susanna

2017-06-06 17:17 GMT+03:00 Osma Suominen :

> Magnus Manske wrote on 06.06.2017 at 17:06:
>
>> By the way, we also have multilingual labels that could perhaps
>> improve
>> the automatic matching. YSO generally has fi/sv/en, YSO places has
>> fi/sv. Can you make use of these too if I provided them in additional
>> columns?
>>
>> Sorry, mix'n'match only does single language labels.
>>
>
> Ok, then I have to think which language to pick for Mix'n'Match use. For
> YSO, Finnish and Swedish labels are generally the best quality, but
> probably wouldn't produce as many automated hits as the English ones. Also
> it depends on who is going to do the manual matching.
>
> Any advice on this?
>
> It does redirect like this already. See e.g.
>> http://www.yso.fi/onto/yso/p138653
>>
>> Great! So you could bunch the "old" ones and the new places into one list?
>>
>
> In principle yes, but in practice, I think it would make sense to use two
> lists, because the places are quite different from the general concepts.
> Also the matching could be more focused for the places - don't try to match
> with any Wikidata entity that is not a place.
>
>
> -Osma
>
> --
> Osma Suominen
> D.Sc. (Tech), Information Systems Specialist
> National Library of Finland
> P.O. Box 26 (Kaikukatu 4)
> 00014 HELSINGIN YLIOPISTO
> Tel. +358 50 3199529
> osma.suomi...@helsinki.fi
> http://www.nationallibrary.fi
>


Re: [Wikidata] Mix'n'Match with existing (indirect) mappings

2017-06-06 Thread Osma Suominen

Magnus Manske wrote on 06.06.2017 at 17:06:

>> By the way, we also have multilingual labels that could perhaps improve
>> the automatic matching. YSO generally has fi/sv/en, YSO places has
>> fi/sv. Can you make use of these too if I provided them in additional
>> columns?
>
> Sorry, mix'n'match only does single language labels.


Ok, then I have to think which language to pick for Mix'n'Match use. For 
YSO, Finnish and Swedish labels are generally the best quality, but 
probably wouldn't produce as many automated hits as the English ones. 
Also it depends on who is going to do the manual matching.


Any advice on this?


>> It does redirect like this already. See e.g.
>> http://www.yso.fi/onto/yso/p138653
>
> Great! So you could bunch the "old" ones and the new places into one list?


In principle yes, but in practice, I think it would make sense to use 
two lists, because the places are quite different from the general 
concepts. Also the matching could be more focused for the places - don't 
try to match with any Wikidata entity that is not a place.


-Osma

--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.suomi...@helsinki.fi
http://www.nationallibrary.fi



Re: [Wikidata] Mix'n'Match with existing (indirect) mappings

2017-06-06 Thread Magnus Manske
On Tue, Jun 6, 2017 at 2:44 PM Osma Suominen 
wrote:

> Hi Magnus!
>
> Thanks for your quick response. Comments inline.
>
> Magnus Manske wrote on 06.06.2017 at 15:57:
> > * If you want to "seed" Mix'n'match with third-party/indirect IDs
> > already in Wikidata, best to not create the catalog yourself, but mail
> > me the data instead
>
> Okay, great! What's the best format? The same as for creating catalogs,
> but with an additional Wikidata ID column with values from the existing
> mappings?
>
That would work fine.
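
As an illustration of what such a file could look like, here is a small
Python sketch that writes tab-separated catalog rows with a trailing
Wikidata ID column. The column order (entry ID, label, description,
Wikidata item) is an assumption made for this example, since the thread only
says "the same as for creating catalogs, but with an additional Wikidata ID
column". The function name and the sample rows are hypothetical.

```python
import csv
import io


def catalog_tsv(entries, known_matches):
    """Serialise catalog entries as TSV, appending a Wikidata ID column.

    entries:       iterable of (entry ID, label, description) tuples
    known_matches: {entry ID: Wikidata item} taken from existing mappings
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for entry_id, label, description in entries:
        # Leave the last column empty when no mapping is known yet.
        writer.writerow(
            [entry_id, label, description, known_matches.get(entry_id, "")]
        )
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder rows, not real YSO data.
    rows = [("e1", "Example place", "town"), ("e2", "Another place", "")]
    print(catalog_tsv(rows, {"e1": "Q100000001"}), end="")
```

Entries with an empty last column would still go through the normal
matching, while the pre-filled ones could be confirmed or rejected.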

>
> By the way, we also have multilingual labels that could perhaps improve
> the automatic matching. YSO generally has fi/sv/en, YSO places has
> fi/sv. Can you make use of these too if I provided them in additional
> columns?
>
Sorry, mix'n'match only does single language labels.

>
> > * If you want "YSO places" in Wikidata, we will need a new property for
> > that, unless the P2347 formatter URL would redirect automatically to
> > "/yso-paikat/"
>
> It does redirect like this already. See e.g.
> http://www.yso.fi/onto/yso/p138653
>
Great! So you could bunch the "old" ones and the new places into one list?


> > * You can create a Mix'n'match catalog before there is a property, and
> > link them up later. The catalog will then synchronize
>
> I don't think we need an additional property, but good to know anyway.
>
> -Osma
>
> --
> Osma Suominen
> D.Sc. (Tech), Information Systems Specialist
> National Library of Finland
> P.O. Box 26 (Kaikukatu 4)
> 00014 HELSINGIN YLIOPISTO
> Tel. +358 50 3199529
> osma.suomi...@helsinki.fi
> http://www.nationallibrary.fi
>


Re: [Wikidata] Mix'n'Match with existing (indirect) mappings

2017-06-06 Thread Osma Suominen

Hi Magnus!

Thanks for your quick response. Comments inline.

Magnus Manske wrote on 06.06.2017 at 15:57:
> * If you want to "seed" Mix'n'match with third-party/indirect IDs
> already in Wikidata, best to not create the catalog yourself, but mail
> me the data instead


Okay, great! What's the best format? The same as for creating catalogs, 
but with an additional Wikidata ID column with values from the existing 
mappings?


By the way, we also have multilingual labels that could perhaps improve 
the automatic matching. YSO generally has fi/sv/en, YSO places has 
fi/sv. Can you make use of these too if I provided them in additional 
columns?


> * If you want "YSO places" in Wikidata, we will need a new property for
> that, unless the P2347 formatter URL would redirect automatically to
> "/yso-paikat/"


It does redirect like this already. See e.g. 
http://www.yso.fi/onto/yso/p138653


> * You can create a Mix'n'match catalog before there is a property, and
> link them up later. The catalog will then synchronize


I don't think we need an additional property, but good to know anyway.

-Osma

--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.suomi...@helsinki.fi
http://www.nationallibrary.fi



Re: [Wikidata] Mix'n'Match with existing (indirect) mappings

2017-06-06 Thread Magnus Manske
Hi Osma,

just a few remarks:

* If you want to "seed" Mix'n'match with third-party/indirect IDs already
in Wikidata, best to not create the catalog yourself, but mail me the data
instead

* If you want "YSO places" in Wikidata, we will need a new property for
that, unless the P2347 formatter URL would redirect automatically to
"/yso-paikat/"

* You can create a Mix'n'match catalog before there is a property, and link
them up later. The catalog will then synchronize
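
The indirect LCSH route described in the quoted message below (YSO concepts mapped to LCSH, Wikidata items carrying an LCSH ID in P244) amounts to a join on the shared LCSH identifier. A minimal Python sketch of that join, assuming both mappings have already been loaded into plain dictionaries; the function name and every sample identifier are hypothetical placeholders, not real YSO, LCSH or Wikidata IDs. LCSH IDs that appear on more than one Wikidata item are set aside rather than guessed at, since indirect mappings need verification in any case.

```python
from collections import defaultdict


def derive_links(yso_to_lcsh, wikidata_to_lcsh):
    """Join two mappings on the LCSH ID they share.

    yso_to_lcsh: dict of YSO ID -> LCSH ID.
    wikidata_to_lcsh: dict of Wikidata QID -> LCSH ID (P244 values).
    Returns (links, ambiguous):
      links: YSO ID -> the single QID sharing its LCSH ID.
      ambiguous: YSO ID -> sorted list of QIDs when several items share it.
    YSO IDs whose LCSH ID appears on no Wikidata item are left out of both.
    """
    # Invert the Wikidata side so each LCSH ID lists the items that use it.
    qids_by_lcsh = defaultdict(set)
    for qid, lcsh_id in wikidata_to_lcsh.items():
        qids_by_lcsh[lcsh_id].add(qid)

    links = {}
    ambiguous = {}
    for yso_id, lcsh_id in yso_to_lcsh.items():
        qids = qids_by_lcsh.get(lcsh_id, set())
        if len(qids) == 1:
            links[yso_id] = next(iter(qids))
        elif len(qids) > 1:
            ambiguous[yso_id] = sorted(qids)
    return links, ambiguous


# Placeholder data only.
sample_yso = {"p1": "sh1", "p2": "sh2", "p3": "sh3"}
sample_wikidata = {"Q10": "sh1", "Q20": "sh2", "Q21": "sh2"}
links, ambiguous = derive_links(sample_yso, sample_wikidata)
```

The unambiguous pairs are candidates for seeding a catalog; the ambiguous ones would need a human decision.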

Cheers,
Magnus

On Tue, Jun 6, 2017 at 11:19 AM Osma Suominen wrote:

> Hi Wikidatans,
>
> After several delays we are finally starting to think seriously about
> mapping the General Finnish Ontology YSO [1] to Wikidata. A "YSO ID"
> property (https://www.wikidata.org/wiki/Property:P2347) was added to
> Wikidata some time ago, but it has been used only a few times so far.
>
> Recently some 6000 places have been added to "YSO Places" [2], a new
> extension of YSO, which was generated from place names in YSA and
> Allärs, our earlier subject indexing vocabularies. It would probably
> make sense to map these places to Wikidata, in addition to the general
> concepts in YSO. We have already manually added a few links from YSA/YSO
> places to Wikidata for newly added places, but this approach does not
> scale if we want to link the thousands of existing places.
>
> We also have some indirect sources of YSO/Wikidata mappings:
>
> 1. YSO is mapped to LCSH, and Wikidata also to LCSH (using P244, LC/NACO
> Authority File ID). I dug a bit into both sets of mappings and found
> that approximately 1200 YSO-Wikidata links could be generated from the
> intersection of these mappings.
>
> 2. The Finnish broadcasting company Yle has also created some mappings
> between KOKO (which includes YSO) and Wikidata. Last time I looked at
> those, we could generate at least 5000 YSO-Wikidata links from them.
> Probably more nowadays.
>
>
> Of course, indirect mappings are a bit dangerous. It's possible that
> there are some differences in meaning, especially with LCSH which has a
> very different structure (and cultural context) than YSO. Nevertheless I
> think these could be a good starting point, especially if a tool such as
> Mix'n'Match could be used to verify them.
>
> Now my question is, given that we already have or could easily generate
> thousands of Wikidata-YSO mappings, but the rest would still have to be
> semi-automatically linked using Mix'n'Match, what would be a good way to
> approach this? Does Mix'n'Match look at existing statements (in this
> case YSO ID / P2347) in Wikidata when you load a new catalog, or ignore
> them?
>
> I can think of at least these approaches:
>
> 1. First import the indirect mappings we already have to Wikidata as
> P2347 statements, then create a Mix'n'Match catalog with the remaining
> YSO concepts. The indirect mappings would have to be verified separately.
>
> 2. First import the indirect mappings we already have to Wikidata as
> P2347 statements, then create a Mix'n'Match catalog with ALL the YSO
> concepts, including the ones for which we already have imported a
> mapping. Use Mix'n'Match to verify the indirect mappings.
>
> 3. Forget about the existing mappings and just create a Mix'n'Match
> catalog with all the YSO concepts.
>
> Any advice?
>
> Thanks,
>
> -Osma
>
> [1] http://finto.fi/yso/
>
> [2] http://finto.fi/yso-paikat/
>
> --
> Osma Suominen
> D.Sc. (Tech), Information Systems Specialist
> National Library of Finland
> P.O. Box 26 (Kaikukatu 4)
> 00014 HELSINGIN YLIOPISTO
> Tel. +358 50 3199529
> osma.suomi...@helsinki.fi
> http://www.nationallibrary.fi