Re: [Wikidata] Terms - search for corresponding WD-item and WP-article

2016-10-14 Thread Susanna Ånäs
Hi all

We are proposing GLAMpipe for a Wikimedia grant right now. GLAMpipe is a
data import, manipulation and export tool. It can read data from files,
APIs, tables etc., manipulate it (split, merge, format, make
lookups...) and export it as files or to web services like Wikimedia Commons
or Wikidata.

We are applying for funding to create a Wikidata manipulation and export
node:

"New nodes for transforming data for Wikidata will be created. The
transformation node needs to take into account the triplet data structure,
be able to map data to Wikidata and format appropriately. The Wikidata
export node takes transformed data hash as parameter, makes sanity checks
for it and saves it to Wikidata. The export node checks that there are no
duplicate values and merges statement to existing ones if possible. The
export node is made using a Widar-like OAuth interface and plans to share
its code."

The kind of matching described here is what we aim to cater for.

We'd be enormously happy if you decided to endorse the project!

Cheers
Susanna Ånäs

2016-10-14 13:17 GMT+03:00 Sandra Fauconnier:

> What you are encountering here is a major bottleneck and time sink for any
> data import into Wikidata. Matching external lists of concepts (names of
> people, places, buildings, whatever) from external datasets with the
> right Wikidata items is something that always takes me hours and hours
> and hours of work.
>
> To solve it, we need a working and user-friendly reconciliation
> tool that is integrated into a common data management platform (e.g.
> OpenRefine; it would also be fantastic to have it for Google Spreadsheets).
>
> Magnus has developed a basic API for it, but a working and
> user-friendly interface in one of the tools mentioned above is the
> missing link.
>
> I want to emphasize again that there is a bounty (money!) to be earned for
> those who develop this for OpenRefine.
>
> I have outlined the task in Phabricator too.
> https://phabricator.wikimedia.org/T146740
>
> Just putting this out here to give it attention again. It is such an
> important missing link in the workflow of anyone who wants to import data
> into Wikidata.
> I’m so desperate for it that I’m considering collecting funding and then
> hiring an external developer to make it, but of course it would be best if it
> were developed and maintained from within our community ;-)
>
> Greetings, Sandra
>
> On 13 Oct 2016, at 11:16, Markus Bärlocher wrote:
>
> Hi Tom,
>
> > This is a lighthouse case for my Google Sheets add-on
>
> Great tool - thanks!
> And more great tools included there :-)
>
> > just add new terms to the "Terms" column, everything else fills
> > automagically.
>
> I checked the first results by hand:
> 30% of the WP articles found are specifically helpful
> 70% of the URLs lead to content that does not match the term
>
> My idea:
> Maybe a "reliability index" could help?
>
> (1. Manually approved match of Term and WP article)
> 2. Term and Lemma identical
> 3. Term and section title identical
> 4. all words in Term found in Lemma
> 5. all words in Term found in section title
> 6. Term found as string in article text
>
> But I have no idea how to do this myself:
> https://github.com/tomayac/wikipedia-tools-for-google-spreadsheets/issues/11
>
> Best regards,
> Markus
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Terms - search for corresponding WD-item and WP-article

2016-10-14 Thread Sandra Fauconnier
What you are encountering here is a major bottleneck and time sink for any data
import into Wikidata. Matching external lists of concepts (names of people,
places, buildings, whatever) from external datasets with the right
Wikidata items is something that always takes me hours and hours and hours of
work.

To solve it, we need a working and user-friendly reconciliation tool
that is integrated into a common data management platform (e.g. OpenRefine; it
would also be fantastic to have it for Google Spreadsheets).

Magnus has developed a basic API for it, but a working and
user-friendly interface in one of the tools mentioned above is the missing
link.

I want to emphasize again that there is a bounty (money!) to be earned
for those who develop this for OpenRefine.

I have outlined the task in Phabricator too. 
https://phabricator.wikimedia.org/T146740 


Just putting this out here to give it attention again. It is such an important 
missing link in the workflow of anyone who wants to import data into Wikidata.
I’m so desperate for it that I’m considering collecting funding and then hiring
an external developer to make it, but of course it would be best if it were
developed and maintained from within our community ;-)

Greetings, Sandra

> On 13 Oct 2016, at 11:16, Markus Bärlocher wrote:
> 
> Hi Tom,
> 
>> This is a lighthouse case for my Google Sheets add-on 
> 
> Great tool - thanks!
> And more great tools included there :-)
> 
>> just add new terms to the "Terms" column, everything else fills 
>> automagically.
> 
> I checked the first results by hand:
> 30% of the WP articles found are specifically helpful
> 70% of the URLs lead to content that does not match the term
> 
> My idea:
> Maybe a "reliability index" could help?
> 
> (1. Manually approved match of Term and WP article)
> 2. Term and Lemma identical
> 3. Term and section title identical
> 4. all words in Term found in Lemma
> 5. all words in Term found in section title
> 6. Term found as string in article text
> 
> But I have no idea how to do this myself:
> https://github.com/tomayac/wikipedia-tools-for-google-spreadsheets/issues/11
> 
> Best regards,
> Markus
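
The reliability tiers Markus lists above could be sketched roughly as
follows. This is only an illustration, not how the add-on works: the
function name reliability_tier and its parameters are made up for this
example, matching is plain case-insensitive string comparison, and tier 1
(manual approval) is left out because it cannot be computed.

```python
import re


def _words(text):
    """Return the lower-cased word tokens of a string."""
    return re.findall(r"\w+", text.lower())


def reliability_tier(term, lemma, section_titles=(), article_text=""):
    """Return the first matching tier (2-6) from the list above, or None.

    Lower numbers mean a more reliable match. All names here are
    hypothetical; the caller is assumed to have fetched the article
    title (lemma), its section titles and its text beforehand.
    """
    t = term.strip().lower()
    t_words = set(_words(term))

    if t == lemma.strip().lower():
        return 2  # Term and lemma identical
    if t in [s.strip().lower() for s in section_titles]:
        return 3  # Term and a section title identical
    if t_words and t_words <= set(_words(lemma)):
        return 4  # all words in Term found in lemma
    if t_words and any(t_words <= set(_words(s)) for s in section_titles):
        return 5  # all words in Term found in one section title
    if t and t in article_text.lower():
        return 6  # Term found as a string in the article text
    return None  # no tier applies
```

A spreadsheet could then sort or filter candidate URLs by this number and
send only the weak matches (tier 5, tier 6 or none) to manual review.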


Re: [Wikidata] Terms - search for corresponding WD-item and WP-article

2016-10-10 Thread Biyanto Rebin
Great add-on! Thank you, Thomas

2016-10-10 15:28 GMT+07:00 Thomas Steiner:

> Hi Markus,
>
> This is a lighthouse case for my Google Sheets add-on Wikipedia Tools
> for Google Spreadsheets (bit.ly/wikipedia-tools-add-on). Here is an
> editable sheet
> (https://docs.google.com/spreadsheets/d/1zAZBS09XAYzzL0e6ltTEc943ATvddN6DUgi076xe8qs/edit?usp=sharing)
> that you can continue to use, just add new terms to the "Terms"
> column, everything else fills automagically. Enjoy!
>
> Cheers,
> Tom
>
> --
> Dr. Thomas Steiner, Employee (http://blog.tomayac.com,
> https://twitter.com/tomayac)
>
> Google Germany GmbH, ABC-Str. 19, 20354 Hamburg, Germany
> Managing Directors: Matthew Scott Sucherman, Paul Terence Manicle
> Registration office and registration number: Hamburg, HRB 86891
>
> -BEGIN PGP SIGNATURE-
> Version: GnuPG v2.0.29 (GNU/Linux)
>
> iFy0uwAntT0bE3xtRa5AfeCheCkthAtTh3reSabiGbl0ck0fjumBl3DCharaCTersAttH3b0ttom
> hTtPs://xKcd.cOm/1181/
> -END PGP SIGNATURE-



-- 

Biyanto Rebin | Chair 2016-2018
Wikimedia Indonesia
Mobile: +62 8989 037379
Email: biyanto.re...@wikimedia.or.id


Support our efforts to free knowledge:
http://wikimedia.or.id/wiki/Wikimedia_Indonesia:Donasi


Re: [Wikidata] Terms - search for corresponding WD-item and WP-article

2016-10-10 Thread Thomas Steiner
Hi Markus,

This is a lighthouse case for my Google Sheets add-on Wikipedia Tools
for Google Spreadsheets (bit.ly/wikipedia-tools-add-on). Here is an
editable sheet 
(https://docs.google.com/spreadsheets/d/1zAZBS09XAYzzL0e6ltTEc943ATvddN6DUgi076xe8qs/edit?usp=sharing)
that you can continue to use, just add new terms to the "Terms"
column, everything else fills automagically. Enjoy!

Cheers,
Tom

-- 
Dr. Thomas Steiner, Employee (http://blog.tomayac.com,
https://twitter.com/tomayac)

Google Germany GmbH, ABC-Str. 19, 20354 Hamburg, Germany
Managing Directors: Matthew Scott Sucherman, Paul Terence Manicle
Registration office and registration number: Hamburg, HRB 86891

-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.29 (GNU/Linux)

iFy0uwAntT0bE3xtRa5AfeCheCkthAtTh3reSabiGbl0ck0fjumBl3DCharaCTersAttH3b0ttom
hTtPs://xKcd.cOm/1181/
-END PGP SIGNATURE-



Re: [Wikidata] Terms - search for corresponding WD-item and WP-article

2016-10-10 Thread Magnus Manske
You could try this (example: "Cambridge"):
https://quarry.wmflabs.org/query/13025

Not sure if your terms will work though; "Aerial photograph" does not
exist, for example. You can replace
term_type='label'
with
term_type IN ('label','alias')
to get more hits.
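
For the second half of the question (building the language-specific WP URL
once the item is known), here is a minimal sketch that goes through the
public Wikidata API instead of the Quarry query. It assumes the
wbsearchentities module is used to find candidate items (it searches labels
and aliases) and that the entity JSON with its sitelinks has already been
fetched, e.g. via wbgetentities. The function names search_url and
wikipedia_url are made up for this example, no network request is made
here, and a few wikis have site IDs that do not follow the simple
"<language>wiki" pattern assumed below.

```python
from urllib.parse import quote, urlencode

API = "https://www.wikidata.org/w/api.php"


def search_url(term, language="en", limit=5):
    """Build a wbsearchentities request URL for one term."""
    params = {
        "action": "wbsearchentities",
        "search": term,
        "language": language,
        "type": "item",
        "limit": limit,
        "format": "json",
    }
    return API + "?" + urlencode(params)


def wikipedia_url(entity, lang):
    """Return the article URL for `lang` from an entity's sitelinks, or None.

    `entity` is assumed to be one entry of the "entities" object returned
    by wbgetentities, i.e. a dict with a "sitelinks" key.
    """
    # Assumption: site ID is the language code (hyphens as underscores) + "wiki".
    site_id = lang.replace("-", "_") + "wiki"
    sitelink = entity.get("sitelinks", {}).get(site_id)
    if sitelink is None:
        return None  # no article in that language
    title = sitelink["title"].replace(" ", "_")
    return "https://{}.wikipedia.org/wiki/{}".format(lang, quote(title, safe=":/()_,"))


# Hand-written stand-in for an API response, not real Wikidata content.
example_entity = {"sitelinks": {"dewiki": {"site": "dewiki", "title": "Beispiel Artikel"}}}
print(search_url("Abyssal hills"))
print(wikipedia_url(example_entity, "de"))
```

The search step returns several candidates per term, so for terms like the
ones in the list some manual checking of the chosen item is still needed.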

On Mon, Oct 10, 2016 at 7:14 AM Markus Bärlocher <
markus.baerloc...@lau-net.de> wrote:

> Dear Wikidata specialists,
>
> I have a list of 5000 English terms,
> which are translated into several languages (including the corresponding
> WP-language-shortcut).
> Now I am looking for the corresponding WP-article (as a URL).
>
> How can I search for the corresponding *WD-item*?
>
> As result I need a table with the columns:
> - Term in English
> - WD-item
>
> How can I build the *WP-URL* for the language-specific WP-article?
> Input:
> - Term in English
> - WD-item
> - WP-language-shortcut
>
> Best regards,
> Markus
>
> Examples:
> Aberration of light
> Abyssal hills
> Aerial photograph
> Age of diurnal inequality
> Aperture of antenna
>
> Languages can be all WP-languages.