Re: [Wikidata] Wikidata reconciliation service and OpenRefine

2017-01-27 Thread Antonin Delpeuch (lists)
I have just rounded up the bounty to $300. This is a dream feature, we
need it! :)

Antonin

On 27/01/2017 13:12, Sandra Fauconnier wrote:
> +1 from someone who would be so extremely happy (and much more
> productive) if such a service were implemented in OpenRefine.
> 
> I also added it as a task to Phabricator, feel free to comment, add
> suggestions… https://phabricator.wikimedia.org/T146740
> 
> Best, Sandra/User:Spinster

Re: [Wikidata] Wikidata reconciliation service and OpenRefine

2017-01-27 Thread Antonin Delpeuch (lists)
Hi Magnus,

The dataset is essentially this one: http://isni.ringgold.com/

I am currently augmenting it with Ringgold IDs (P3500) using ORCID (this
should be completed in a few days). This alignment only adds two columns
(Ringgold ID and organization type) which should not impact the task of
matching it with Wikidata (as there are virtually no Ringgold IDs in
Wikidata yet).

Cheers,
Antonin


Re: [Wikidata] Wikidata reconciliation service and OpenRefine

2017-01-27 Thread Sandra Fauconnier
+1 from someone who would be so extremely happy (and much more productive) if 
such a service were implemented in OpenRefine.

I also added it as a task to Phabricator, feel free to comment, add 
suggestions… https://phabricator.wikimedia.org/T146740 


Best, Sandra/User:Spinster


Re: [Wikidata] Wikidata reconciliation service and OpenRefine

2017-01-27 Thread Magnus Manske
Hi Antonin,

mix'n'match is designed to work with almost any dataset, and thus uses
the lowest common denominator, names, for matching.

There are mechanisms to match on other properties, but writing an interface
for public consumption for this would be a task that could easily keep an
entire team of programmers busy :-)

If you can give me the whole list to download, I will see what I can do in
terms of auxiliary data matching. Maybe a combination of that, manual
matches (or at least confirmations on name matches), and the OpenRefine
approach will give us maximum coverage.

It appears Kunstenpunt has no Wikidata property yet. Maybe Romaine could
start setting one up? That would help in terms of synchronisation, I believe.

Cheers,
Magnus




Re: [Wikidata] Wikidata reconciliation service and OpenRefine

2017-01-26 Thread David Lowe
Another alternative (and apologies if you consider it off-topic) is the
Wikipedia and Wikidata Tools add-on for Google Spreadsheets (made by Dr.
Thomas Steiner of this group). I'm working on about 250,000 names (in
multiple sheets), most with life dates. The add-on matches on the name
alone and returns the QID, which you can then use to pull back Wikidata's
birth (P569) and death (P570) dates. After a bit of cleaning of that data,
I can instantly tell whether it has matched the correct name. It's been
very useful for matching "low hanging fruit", and perhaps one of the other
options above is more appropriate for the remaining, more difficult matches.
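
The QID-to-dates lookup described above amounts to a call to the standard
Wikidata `wbgetentities` API. A minimal sketch of the request construction
(no network call is made; Q42 is just an arbitrary example item, and the
helper name is mine, not from the thread):

```python
from urllib.parse import urlencode

# Sketch of the QID -> dates lookup: given a QID returned by name
# matching, the birth (P569) and death (P570) claims can be fetched
# from the Wikidata wbgetentities API. Only the request URL is built
# here; a real workflow would GET it and parse the JSON response.
def wbgetentities_url(qid, props=("claims",)):
    params = {
        "action": "wbgetentities",
        "ids": qid,
        "props": "|".join(props),  # e.g. claims, labels, ...
        "format": "json",
    }
    return "https://www.wikidata.org/w/api.php?" + urlencode(params)

print(wbgetentities_url("Q42"))
# The JSON response carries the dates under
# entities[qid]["claims"]["P569"] / ["P570"].
```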

David


David Lowe | The New York Public Library
Specialist II, Photography Collection

Photographers' Identities Catalog


Re: [Wikidata] Wikidata reconciliation service and OpenRefine

2017-01-26 Thread Thad Guidry
Everyone,

Yes, our OpenRefine API can use Multiple Query Mode (reconciling an entity
by using multiple columns / WD properties):

https://github.com/OpenRefine/OpenRefine/wiki/Reconciliation-Service-API#multiple-query-mode

I do not think that Magnus has implemented our Multiple Query Mode yet,
however.
The bounty issue https://github.com/OpenRefine/OpenRefine/issues/805 that
I created and funded on BountySource.com is to fully implement the Multiple
Query Mode API and ensure that it works correctly in the latest OpenRefine
2.6 RC2.
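
For readers who haven't seen it, a multiple-query request per that wiki page
can be sketched as follows. This is a non-authoritative sketch: the
institution names, the type QID (Q43229, "organization") and the P856
("official website") values are illustrative, not taken from this thread.

```python
import json

# Two reconciliation queries sent in one batch ("multiple query mode").
# Each key (q0, q1, ...) is one record to reconcile; "properties" carries
# the extra column / Wikidata-property pairs used to refine the match.
queries = {
    "q0": {
        "query": "Ghent University",      # value from the name column
        "type": "Q43229",                 # organization (illustrative)
        "properties": [{"pid": "P856", "v": "https://www.ugent.be/"}],
        "limit": 3,
    },
    "q1": {
        "query": "University of Antwerp",
        "type": "Q43229",
        "properties": [{"pid": "P856", "v": "https://www.uantwerpen.be/"}],
        "limit": 3,
    },
}

# The service receives this as a single form parameter:
#   POST <reconciliation endpoint>   queries=<JSON-encoded dict>
payload = {"queries": json.dumps(queries)}
```

The service answers with one result list per key (q0, q1, ...), each
candidate carrying an id, name, score and match flag.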

Happy Hacking anyone :)
Let us know if we can answer any questions regarding OpenRefine or the
Reconcile API on our own mailing list:
http://groups.google.com/group/openrefine/

-Thad



Re: [Wikidata] Wikidata reconciliation service and OpenRefine

2017-01-26 Thread AMIT KUMAR JAISWAL
Hey Alina,

Thanks for letting us know about this.

I'll start testing it after configuring OpenRefine (as its API is
implemented in WMF).

Can you share the open task related to this?

Cheers,
Amit Kumar Jaiswal

On 1/26/17, Antonin Delpeuch (lists) wrote:
> Hi Magnus,
>
> Mix'n'match looks great and I do have a few questions about it. I'd like
> to use it to import a dataset, which looks like this (these are the 100
> first lines):
> http://pintoch.ulminfo.fr/34f8c4cf8a/aligned_institutions.txt
>
> I see how to import it in Mix'n'match, but given all the columns I have
> in this dataset, I think that it is a bit sad to resort to matching on
> the name only.
>
> Do you see any way to do some fuzzy-matching on, say, the URLs provided
> in the dataset against the "official website" property? I think that it
> would be possible with the (proposed) Wikidata interface for OpenRefine
> (if I understand the UI correctly).
>
> In this context, I think it might even be possible to confirm matches
> automatically (when the matches are excellent on multiple columns). As
> the dataset is rather large (400,000 lines) I would not really want to
> validate them one after the other with the web interface. So I would
> need a sort of batch edit. How would you do that?
>
> Finally, once matches are found, it would be great if statements
> corresponding to the various columns could be created in the items (if
> these statements don't already exist). With the appropriate reference to
> the dataset, ideally.
>
> I realise this is a lot to ask - maybe I should just write a bot.
>
> Alina, sorry to hijack your thread. I hope my questions were general
> enough to be interesting for other readers.
>
> Cheers,
> Antonin
>
>
> On 26/01/2017 16:01, Magnus Manske wrote:
>> If you want to match your list to Wikidata, to find which entries
>> already exist, have you considered Mix'n'match?
>> https://tools.wmflabs.org/mix-n-match/
>>
>> You can upload your names and identifiers at
>> https://tools.wmflabs.org/mix-n-match/import.php
>>
>> There are several mechanisms in place to help with the matching. Please
>> contact me if you need help!
>>
>> On Thu, Jan 26, 2017 at 3:58 PM Magnus Manske wrote:
>>
>> Alina, I just found your bug report, which you filed under the wrong
>> issue tracker. The git repo (source code, issue tracker etc.) are
>> here:
>> https://bitbucket.org/magnusmanske/reconcile
>>
>> The report says it "keeps hanging", which is so vague that it's
>> impossible to debug, especially since the example linked on
>> https://tools.wmflabs.org/wikidata-reconcile/
>> works perfectly fine for me.
>>
>> Does it not work at all for you? Does it work for a time, but then
>> stops? Does it "break" reproducibly on specific queries, or at
>> random? Maybe it breaks for specific "types" only? At what rate are
>> you hitting the tool? Do you have an example query, preferably one
>> that breaks?
>>
>> Please note that this is not an "official" WMF service, only parts
>> of the API are implemented, and there are currently other technical
>> limitations on it.
>>
>> Cheers,
>> Magnus
>>
>> On Thu, Jan 26, 2017 at 3:35 PM Antonin Delpeuch (lists) wrote:
>>
>> Hi,
>>
>> I'm also very interested in this. How did you configure your OpenRefine
>> to use Wikidata? (Even if it does not currently work, I am interested in
>> the setup.)
>>
>> There is currently an open issue (with a nice bounty) to improve the
>> integration of Wikidata in OpenRefine:
>> https://github.com/OpenRefine/OpenRefine/issues/805
>>
>> Best regards,
>> Antonin
>>
>> On 26/01/2017 12:22, Alina Saenko wrote:
>> > Hello everyone,
>> >
>> > I have a question for people who are using the Wikidata reconciliation
>> > service: https://tools.wmflabs.org/wikidata-reconcile/ It was working
>> > perfectly in my OpenRefine in November 2016, but since December it has
>> > stopped working. I already have contacted Magnus Manske, but he hasn’t
>> > responded yet. Does anyone else experience problems with the service and
>> > know how to fix it?
>> >
>> > I’m using this service to link big lists of Belgian artists
>> (37.000) and
>> > performance art organisations (1.000) to Wikidata as a
>> preparation to
>> > upload contextual data about these persons and organisations to
>> > Wikidata. This data wil come from Kunstenpunt database
>> > (http://data.kunsten.be/people). Wikimedia user Romaine
>> > (https://meta.wikimedia.org/wiki/User:Romaine) is helping us
>> with this
>> > project.
>> >
>> > Best regards,
>>

Re: [Wikidata] Wikidata reconciliation service and OpenRefine

2017-01-26 Thread Antonin Delpeuch (lists)
Hi Magnus,

Mix'n'match looks great and I do have a few questions about it. I'd like
to use it to import a dataset, which looks like this (these are the first
100 lines):
http://pintoch.ulminfo.fr/34f8c4cf8a/aligned_institutions.txt

I see how to import it into Mix'n'match, but given all the columns I have
in this dataset, it seems a shame to resort to matching on the name only.

Do you see any way to do some fuzzy-matching on, say, the URLs provided
in the dataset against the "official website" property? I think that it
would be possible with the (proposed) Wikidata interface for OpenRefine
(if I understand the UI correctly).
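
To make the fuzzy URL matching concrete, here is the kind of normalization
I have in mind before comparing a dataset URL with an "official website"
value (just a sketch in Python; this is not how OpenRefine or Mix'n'match
actually work internally):

```python
from urllib.parse import urlparse

def normalize_url(url: str) -> str:
    """Reduce a URL to a comparable key: drop the scheme, lowercase
    the host, strip a leading 'www.' and any trailing slash."""
    url = url.strip()
    if "//" not in url:              # bare domain, e.g. "ugent.be/en"
        url = "//" + url
    parsed = urlparse(url.lower())
    host = parsed.netloc.removeprefix("www.")
    return host + parsed.path.rstrip("/")

def urls_match(a: str, b: str) -> bool:
    """Treat two URLs as the same website once normalized."""
    return normalize_url(a) == normalize_url(b)
```

Even this much would catch the usual http/https and "www." variants; one
could go further and ignore query strings, or compare domains only.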

In this context, I think it might even be possible to confirm matches
automatically (when the matches are excellent on multiple columns). As
the dataset is rather large (400,000 lines) I would not really want to
validate them one after the other with the web interface. So I would
need a sort of batch edit. How would you do that?
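
What I imagine for the automatic confirmation is roughly this (again only
a sketch: the column names, the "label"/"website" keys on the candidate,
and the 0.9 threshold are all made up for illustration):

```python
from difflib import SequenceMatcher

def name_similarity(a: str, b: str) -> float:
    # Ratcliff/Obershelp ratio from the standard library, in [0.0, 1.0]
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def auto_confirm(row: dict, candidate: dict, threshold: float = 0.9) -> bool:
    """Accept a match without manual review only when two independent
    columns agree: a near-exact name AND an identical website."""
    same_site = (row.get("url", "").lower().rstrip("/")
                 == candidate.get("website", "").lower().rstrip("/"))
    good_name = name_similarity(row["name"], candidate["label"]) >= threshold
    return bool(row.get("url")) and same_site and good_name
```

Anything below the threshold, or agreeing on only one column, would still
go to the manual review queue.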

Finally, once matches are found, it would be great if statements
corresponding to the various columns could be created in the items (if
these statements don't already exist). With the appropriate reference to
the dataset, ideally.

I realise this is a lot to ask - maybe I should just write a bot.

Alina, sorry to hijack your thread. I hope my questions were general
enough to be interesting for other readers.

Cheers,
Antonin


On 26/01/2017 16:01, Magnus Manske wrote:

Re: [Wikidata] Wikidata reconciliation service and OpenRefine

2017-01-26 Thread Magnus Manske
If you want to match your list to Wikidata, to find which entries already
exist, have you considered Mix'n'match?
https://tools.wmflabs.org/mix-n-match/

You can upload your names and identifiers at
https://tools.wmflabs.org/mix-n-match/import.php

There are several mechanisms in place to help with the matching. Please
contact me if you need help!

On Thu, Jan 26, 2017 at 3:58 PM Magnus Manske 
wrote:

>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wikidata reconciliation service and OpenRefine

2017-01-26 Thread Magnus Manske
Alina, I just found your bug report, which you filed under the wrong issue
tracker. The git repo (source code, issue tracker, etc.) is here:
https://bitbucket.org/magnusmanske/reconcile

The report says it "keeps hanging", which is so vague that it's impossible
to debug, especially since the example linked on
https://tools.wmflabs.org/wikidata-reconcile/
works perfectly fine for me.

Does it not work at all for you? Does it work for a time, but then stops?
Does it "break" reproducibly on specific queries, or at random? Maybe it
breaks for specific "types" only? At what rate are you hitting the tool? Do
you have an example query, preferably one that breaks?

Please note that this is not an "official" WMF service, only parts of the
API are implemented, and there are currently other technical limitations on
it.

Cheers,
Magnus

On Thu, Jan 26, 2017 at 3:35 PM Antonin Delpeuch (lists) <
li...@antonin.delpeuch.eu> wrote:



Re: [Wikidata] Wikidata reconciliation service and OpenRefine

2017-01-26 Thread Antonin Delpeuch (lists)
Hi,

I'm also very interested in this. How did you configure your OpenRefine
to use Wikidata? (Even if it does not currently work, I am interested in
the setup.)

There is currently an open issue (with a nice bounty) to improve the
integration of Wikidata in OpenRefine:
https://github.com/OpenRefine/OpenRefine/issues/805

Best regards,
Antonin

On 26/01/2017 12:22, Alina Saenko wrote:


[Wikidata] Wikidata reconciliation service and OpenRefine

2017-01-26 Thread Alina Saenko
Hello everyone,

I have a question for people who are using the Wikidata reconciliation service:
https://tools.wmflabs.org/wikidata-reconcile/ It was working perfectly in my
OpenRefine in November 2016, but since December it has stopped working. I have
already contacted Magnus Manske, but he hasn’t responded yet. Does anyone else
experience problems with the service and know how to fix it?
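
For anyone who wants to test the service outside OpenRefine: assuming the
endpoint follows the standard OpenRefine Reconciliation Service API (I don't
know how much of it is implemented), a batch query URL can be built by hand,
for example:

```python
import json
from urllib.parse import urlencode

# The endpoint from this thread; it may implement only part of the API.
ENDPOINT = "https://tools.wmflabs.org/wikidata-reconcile/"

def build_reconcile_url(names, limit=3):
    """Build a batch query URL per the Reconciliation Service API:
    the 'queries' parameter carries a JSON object of keyed queries."""
    queries = {"q%d" % i: {"query": name, "limit": limit}
               for i, name in enumerate(names)}
    return ENDPOINT + "?" + urlencode({"queries": json.dumps(queries)})
```

Fetching such a URL should return one result list per key ("q0", "q1", ...);
if it hangs or errors on a specific query, that would make a reproducible
bug report.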

I’m using this service to link large lists of Belgian artists (37,000) and
performance art organisations (1,000) to Wikidata, in preparation for uploading
contextual data about these persons and organisations to Wikidata. This data
will come from the Kunstenpunt database (http://data.kunsten.be/people).
Wikimedia user Romaine (https://meta.wikimedia.org/wiki/User:Romaine) is
helping us with this project.

Best regards,
Alina


--
Available Mon, Tue, Wed, Thu

PACKED vzw - Expertisecentrum Digitaal Erfgoed
Rue Delaunoystraat 58 bus 23
B-1080 Brussel
Belgium

e al...@packed.be 
t: +32 (0)2 217 14 05
w www.packed.be 