Re: [Wikidata] Data model explanation and protection

2015-10-28 Thread Finn Årup Nielsen


The SPARQL query below finds 14 such items.

Among them is https://www.wikidata.org/wiki/Q238509, which is "5-HT1A 
receptor human gene" in English and "5-HT₁A-Rezeptor Protein" in German. 
The last editor is ProteinBoxBot. It is coded by itself. That item 
has a split personality, so it seems that we need to do some cleaning.



PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX v: <http://www.wikidata.org/prop/statement/>
PREFIX q: <http://www.wikidata.org/prop/qualifier/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?item WHERE {
  ?item wdt:P352 ?uniprot ;
        wdt:P353 ?genesymbol .
}
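
For reference, the count itself can be obtained directly with a COUNT 
aggregate - a minimal variant of the query above, reusing the same prefixes:

SELECT (COUNT(DISTINCT ?item) AS ?count) WHERE {
  ?item wdt:P352 ?uniprot ;
        wdt:P353 ?genesymbol .
}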


I now see that Teugnhausen has also merged items: 
https://www.wikidata.org/wiki/Special:Contributions/Teugnhausen


/Finn


On 10/28/2015 06:07 PM, Benjamin Good wrote:

The Gene Wiki team is experiencing a problem that may suggest some areas
for improvement in the general wikidata experience.

When our project was getting started, we had some fairly long public
debates about how we should structure the data we wanted to load [1].
These resulted in a data model that, we think, remains pretty much true
to the semantics of the data, at the cost of distributing information
about closely related things (genes, proteins, orthologs) across
multiple, interlinked items.  Now, as long as these semantic links
between the different item classes are maintained, this is working out
great.  However, we are consistently seeing people merging items that
our model needs to be distinct.  Most commonly, we see people merging
items about genes with items about the protein product of the gene (e.g.
[2]).  This happens nearly every day - especially on items related to
the more popular Wikipedia articles. (More examples [3])

Merges like this, as well as other semantics-breaking edits, make it
very challenging to build downstream apps (like the wikipedia infobox)
that depend on having certain structures in place.  My question to the
list is how to best protect the semantic models that span multiple
entity types in wikidata?  Related to this, is there an opportunity for
some consistent way of explaining these structures to the community when
they exist?

I guess the immediate solutions are to (1) write another bot that
watches for model-breaking edits and reverts them and (2) to create an
article on wikidata somewhere that succinctly explains the model and
links back to the discussions that went into its creation.

It seems that anyone that works beyond a single entity type is going to
face the same kind of problems, so I'm posting this here in hopes that
generalizable patterns (and perhaps even supporting code) can be
realized by this community.

[1]
https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_Molecular_biology#Distinguishing_between_genes_and_proteins
[2] https://www.wikidata.org/w/index.php?title=Q417782=262745370
[3]
https://s3.amazonaws.com/uploads.hipchat.com/25885/699742/rTrv5VgLm5yQg6z/mergelist.txt


___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata




--
Finn Årup Nielsen
http://people.compute.dtu.dk/faan/

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] getting RDF out of a specific WDQ query of Wikidata?

2015-10-26 Thread Finn Årup Nielsen

Hi Sandra,


For part of the artworks in Danish museums I used the standard SPARQL 
service (https://query.wikidata.org). In my case I only downloaded data 
for some hundred artworks, as far as I recall, so I do not know how 
stable the approach is for 30,000 artworks.


You will see some examples on my Wikidata user page:

https://www.wikidata.org/wiki/User:Fnielsen

The Python 'sparql' and 'pandas' packages allow for easy scripting. The 
example does not use RDF but gets the data into a Pandas DataFrame.


For an 'or' between all the museums you want, I believe you can use UNION:

SELECT ?work ?workLabel WHERE {
  { ?work wdt:P195 wd:Q1471477 . }
  UNION { ?work wdt:P195 wd:Q2365880 . }
  UNION { ?work wdt:P195 wd:Q1948674 . }
  UNION { ?work wdt:P195 wd:Q1928672 . }
  UNION { ?work wdt:P195 wd:Q1540707 . }
  UNION { ?work wdt:P195 wd:Q1573755 . }
  UNION { ?work wdt:P195 wd:Q2098074 . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "nl,en" . }
}
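
An alternative with the same effect, which avoids repeating the triple 
pattern, is to list the museum items with VALUES - a sketch using the same 
items as above:

SELECT ?work ?workLabel WHERE {
  VALUES ?museum { wd:Q1471477 wd:Q2365880 wd:Q1948674 wd:Q1928672
                   wd:Q1540707 wd:Q1573755 wd:Q2098074 }
  ?work wdt:P195 ?museum .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "nl,en" . }
}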

https://query.wikidata.org has an option to download the results as CSV or JSON.

Try this one: https://tinyurl.com/p8ghgnx and press Execute, then
download the CSV.


best
Finn


On 10/26/2015 05:11 PM, Sandra Fauconnier wrote:

Hi all,

For this Flemish museums on Wikidata project
<https://www.wikidata.org/wiki/Wikidata:Flemish_art_collections,_Wikidata_and_Linked_Open_Data>
(… we hope to import some 30,000 Flemish artworks in the upcoming months
:-) …) I and the rest of the project team are trying to find out if and
how we’ll be able to retrieve RDF from Wikidata - one RDF export/file
for all concerned items at once.

So this is not RDF for a single item (like this
<https://www.wikidata.org/wiki/Special:EntityData/Q21012032.rdf>) and
also not an RDF dump of all of Wikidata as mentioned here
<https://www.wikidata.org/wiki/Wikidata:Data_access#Access_to_dumps>. It
would be an RDF file corresponding to the results of this WDQ query
<http://tools.wmflabs.org/autolist/autolist1.html?q=CLAIM%5B195:1471477%5D%20OR%20CLAIM%5B195:2365880%5D%20OR%20CLAIM%5B195:1948674%5D%20OR%20CLAIM%5B195:1928672%5D%20OR%20CLAIM%5B195:1540707%5D%20OR%20CLAIM%5B195:1573755%5D%20OR%20CLAIM%5B195:2098074%5D>
 (which
should produce more than 30,000 items in a few months!).

Any tips on how to achieve this? Wikidata Toolkit? But how/what to do?
We are not programmers/developers but we do have some budget to hire
someone to build us something, so pointers to a (Belgian??) developer
who could help would also be very welcome.

The project raises quite a few questions, by the way, so I might come
back with more :-)

Many thanks in advance! Sandra (User:Spinster)


___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata




--
Finn Årup Nielsen
http://people.compute.dtu.dk/faan/

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] getting RDF out of a specific WDQ query of Wikidata?

2015-10-26 Thread Finn Årup Nielsen

Dear Sandra,


On 10/26/2015 05:11 PM, Sandra Fauconnier wrote:

Any tips on how to achieve this? Wikidata Toolkit? But how/what to do?
We are not programmers/developers but we do have some budget to hire
someone to build us something, so pointers to a (Belgian??) developer
who could help would also be very welcome.


I believe it is BotMultichill that has uploaded much of the National 
Gallery of Denmark to Wikidata. 
https://www.wikidata.org/wiki/User:Multichill "heeft het Nederlands als 
moedertaal" (has Dutch as his mother tongue). Maybe you want to contact him.



best
Finn



--
Finn Årup Nielsen
http://people.compute.dtu.dk/faan/

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] provenance tracking for high volume edit sources (was Data model explanation and protection)

2015-11-10 Thread Finn Årup Nielsen

If I understand correctly:

1) Magnus' game already tags the edits with 'Widar'.

2) Magnus' game cannot merge proteins and genes if they link to each 
other. With 'ortholog' and 'expressed by' in place, Magnus' merging game 
does not contribute to the problematic merges (Magnus' email from earlier 
today: "FWIW, checked again. Neither game can merge two items that link 
to each other. So, if the protein is "expressed by" the gene, that pair 
will not even be suggested.").


There is nothing more that Magnus can do - except make an unmerging 
game. :-)


/Finn


On 11/10/2015 05:54 PM, Benjamin Good wrote:

In another thread, we are discussing the preponderance of problematic
merges of gene/protein items.  One of the hypotheses raised to explain
the volume and nature of these merges (which are often by fairly
inexperienced editors and/or people that seem to only do merges) was
that they were coming from the wikidata game.  It seems to me that
anything like the wikidata game that has the potential to generate a
very large volume of edits - especially from new editors - ought to tag
its contributions so that they can easily be tracked by the system.  It
should be easy to answer the question of whether an edit came from that
game (or any of what I hope to be many of its descendants).  This will
make it possible to debug what could potentially be large swathes of
problems and to make it straightforward to 'reward' game/other
developers with information about the volume of the edits that they have
enabled directly from the system (as opposed to their own tracking data).

Please don't misunderstand me.  I am a big fan of the wikidata game and
actually am pushing for our group to make a bio-specific version of it
that will build on that code.  I see a great potential here - but
because of the potential scale of edits this could quickly generate, we
(the whole wikidata community) need ways to keep an eye on what is going
on.

-Ben


___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata




--
Finn Årup Nielsen
http://people.compute.dtu.dk/faan/

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] Twitter search is spammed

2015-11-08 Thread Finn Årup Nielsen



I sometimes search Twitter for the word "Wikidata" to get updates beyond 
this mailing list. The past couple of times I have experienced that a 
considerable part of the posts are from fake accounts issuing more or 
less the same post: "The file numbers are also being added to Wikipedia 
biographical articles and are incorporated into Wikidata." I have tried to 
report and block these posts but find that it really doesn't help.


Are there any others that have tried to block these accounts and posts? 
I imagine it might help. Or is Twitter going down?


https://twitter.com/search?f=tweets&vertical=default&q=wikidata&src=typd



/Finn Årup Nielsen

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Data model explanation and protection

2015-11-10 Thread Finn Årup Nielsen
Isn't Magnus Manske's game tagging the edit with "Widar"? I do not see 
that for, for instance, the user Hê de tekhnê makrê.


I must say, being a wannabe bioinformatician, that the gene/protein data 
in Wikidata can be confusing. Take 
https://www.wikidata.org/wiki/Q14907009 which had a merging problem 
(that I have tried to resolve).


Even before the merge 
https://www.wikidata.org/w/index.php?title=Q14907009=261061025 
this human gene had three gene products: "cyclin-dependent kinase 
inhibitor 2A", "P14ARF" (which to me looked like a gene symbol, so I 
changed it to p14ARF), and "Tumor suppressor ARF". One of them is a 
mouse protein. One of the others links to 
http://www.uniprot.org/uniprot/Q8N726. Here the recommended name is 
"Tumor suppressor ARF", while alternative names are "Cyclin-dependent 
kinase inhibitor 2A" and "p14ARF". To me it seems that one gene codes 
for two proteins that can be referred to by the same name.


I hope my edits haven't done more damage than good. Several P1889 
("different from") statements would be nice.


I think, as someone suggested, that adding P1889 and having the Wikibase 
merge function look at P1889 would be a solution.
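
As a sketch, candidate pairs that still lack the P1889 ("different from") 
statement could be listed like this - assuming the gene-protein link is 
P688 ("encodes"); adjust the linking property if the model uses another one:

SELECT ?gene ?protein WHERE {
  ?gene wdt:P688 ?protein .                          # gene encodes protein (assumed link)
  FILTER NOT EXISTS { ?gene wdt:P1889 ?protein . }   # no "different from" on the gene
  FILTER NOT EXISTS { ?protein wdt:P1889 ?gene . }   # nor on the protein
}
LIMIT 100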



/Finn


On 11/10/2015 12:34 AM, Benjamin Good wrote:

Magnus,

We are seeing more and more of these problematic merges.  See:
http://tinyurl.com/ovutz5x for the current list of (today 61) problems.
Are these coming from the wikidata game?

All of the editors performing the merges seem to be new and the edit
patterns seem to match the game.  I thought the edits were tagged with a
statement about them coming from the game, but I don't see that?  If
they are, could you just take genes and proteins out of the 'potential
merge' queue ?  I'm guessing that their frequently very similar names
are putting many of them into the list.

We are starting to work on a bot to combat this, but would like to stop
the main source of the damage if it's possible to detect it.  This is
making Wikipedia integration more challenging than it already is...

thanks
-Ben


On Wed, Oct 28, 2015 at 3:41 PM, Magnus Manske
<magnusman...@googlemail.com <mailto:magnusman...@googlemail.com>> wrote:

I fear my games may contribute to both problems (merging two items,
and adding a sitelink to the wrong item). Both are facilitated by
identical names/aliases, and sometimes it's hard to tell that a pair
is meant to be different, especially if you don't know about the
intricate structures of the respective knowledge domain.

An item-specific, but somewhat heavy-handed approach would be to
prevent merging of any two items where at least one has P1889, no
matter what it specifically points to. At least, give a warning that
an item is "merge-protected", and require an additional override for
the merge.

If that is acceptable, it would be easy for me to filter all items
with P1889, from the merge game at least.

On Wed, Oct 28, 2015 at 8:50 PM Peter F. Patel-Schneider
<pfpschnei...@gmail.com <mailto:pfpschnei...@gmail.com>> wrote:

On 10/28/2015 12:08 PM, Tom Morris wrote:
[...]
 > Going back to Ben's original problem, one tool that Freebase
used to help
 > manage the problem of incompatible type merges was a set of
curated sets of
 > incompatible types [5] which was used by the merge tools to
warn users that
 > the merge they were proposing probably wasn't a good idea.
People could
 > ignore the warning in the Freebase implementation, but
Wikidata could make it
 > a hard restriction or just a warning.
 >
 > Tom

I think that this idea is a good one.  The incompatibility
information  could
be added to classes in the form of "this class is disjoint from
that other
class".  Tools would then be able to look for this information
and produce
warnings or even have stronger reactions to proposed merging.

I'm not sure that using P1889 "different from" is going to be
adequate.  What
links would be needed?  Just between a gene and its protein?
That wouldn't
catch merging a gene and a related protein.  Between all genes
and all
proteins?  It seems to me that this is better handled at the
class level.

peter


___
Wikidata mailing list
Wikidata@lists.wikimedia.org <mailto:Wikidata@lists.wikimedia.org>
https://lists.wikimedia.org/mailman/listinfo/wikidata


___
Wikidata mailing list
Wikidata@lists.wikimedia.org <mailto:Wikidata@lists.wikimedia.org>
https://lists.wikimedia.org/mailman/listinfo/wikidata




______

Re: [Wikidata] property label/alias uniqueness

2015-07-08 Thread Finn Årup Nielsen

On 07/08/2015 03:42 AM, Lydia Pintscher wrote:

On Wed, Jul 8, 2015 at 3:07 AM, Gerard Meijssen
gerard.meijs...@gmail.com wrote:

Hoi,
It is not realistic from a language perspective to ask for labels to be
unique.


For items I totally agree. And that's not happening. But for
properties I'd hope that would be possible. Do you have cases where it
is not?


Danish:

far/fader (father). https://en.wiktionary.org/wiki/fader#Danish

fader (godparent)

Alternatively, one could use the pattern 'fader (far)'? The 'fader' form of 
'far' is seldom used, though.




/Finn

--
Finn Årup Nielsen
http://people.compute.dtu.dk/faan/

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] content for wikidata tutorial?

2015-12-04 Thread Finn Årup Nielsen
Back in April 2013 I gave a talk on wikis and Wikipedia, including 
Wikidata - not really a tutorial. The slides are here:


http://www2.imm.dtu.dk/pubdb/views/edoc_download.php/6564/pdf/imm6564.pdf

The slides are in Danish, but perhaps you can get inspired by them? 
The bit of information on Wikidata starts around page 35.


I found that Byrial's table 
https://da.wikipedia.org/wiki/Bruger:Byrial/sandkasse gives a good 
overview of what an item is.


/Finn


On 12/03/2015 10:31 PM, Benjamin Good wrote:

The gene wiki people are hosting a tutorial on wikidata in Cambridge, UK
next Monday [1].  In the interest of making the best tutorial in the
least amount of preparation time.. I was wondering if anyone on the list
had content (slides, handouts, cheatsheets) that they had already used
successfully and might want to share?  We are assembling the structure
of the 90 minute session in a google doc [2], feel free to chime in
there !  And of course everything we generate for that will be available
online as soon as it exists.

cheers
-Ben

[1] http://www.swat4ls.org/workshops/cambridge2015/programme/tutorials/
[2] https://docs.google.com/document/d/1dSgm90SbQBpHqEMa17t5zQL0PB2waIKD3LKTPPknmcY/edit#heading=h.m19y528ldds8


___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata




--
Finn Årup Nielsen
http://people.compute.dtu.dk/faan/

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Littar won second prize

2016-01-21 Thread Finn Årup Nielsen



On 01/21/2016 09:07 AM, Federico Leva (Nemo) wrote:

Finn Aarup Nielsen, 21/01/2016 01:50:

...and yes I have already been told I should link the elements on the
map to Wikidata.


And to Wikisource! I see there are links such as
https://bibliotek.dk/da/search/work?search_block_form=Koordinater and
https://bibliotek.dk/da/search/work?search_block_form=Fodreise%20fra%20Holmens%20Canal%20til%20%C3%98stpynten%20af%20Amager%20i%20Aarene%201828%20og%201829
, how are these determined?


These are just determined from the Wikidata label. The authority data 
from DBC.dk has not been added to Wikidata (and it is unclear if it is 
yet worthwhile since there is no URI/URL AFAIK), so deep links have not 
been made.
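
A hedged sketch of how such label-based search links can be constructed 
directly in SPARQL, assuming the bibliotek.dk URL pattern shown in the 
links above:

SELECT ?work ?workLabel ?searchUrl WHERE {
  ?work wdt:P840 ?location ;
        rdfs:label ?workLabel .
  FILTER(LANG(?workLabel) = "da")
  BIND(IRI(CONCAT("https://bibliotek.dk/da/search/work?search_block_form=",
                  ENCODE_FOR_URI(STR(?workLabel)))) AS ?searchUrl)
}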



/Finn



Nemo

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] Littar won second prize

2016-01-21 Thread Finn Årup Nielsen



My 'Littar' (literature radar) website/app won second prize in DBC's 
(Danish Library Center) app competition last week at the Data Science Day.


Littar displays narrative locations from literary works on a map - 
presently only Danish locations. Data comes from Wikidata P840 (narrative 
location), and the markers are presently colored according to P136 (genre) 
using Leaflet markers. The text is from the P1683 (quotation) qualifier 
under P840:


http://fnielsen.github.io/littar/

The data is obtained with one big SPARQL query and processed through Python.
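
A minimal sketch of what such a query could look like - assuming P840 
(narrative location) statements with P1683 (quotation) qualifiers, P625 
coordinates on the locations, and P136 (genre) for the coloring; not 
necessarily the exact query behind the site:

SELECT ?work ?workLabel ?location ?coord ?genre ?text WHERE {
  ?work p:P840 ?stmt .                  # narrative location statement
  ?stmt ps:P840 ?location .
  OPTIONAL { ?stmt pq:P1683 ?text . }   # quotation qualifier
  ?location wdt:P625 ?coord .           # coordinates for the map
  OPTIONAL { ?work wdt:P136 ?genre . }  # genre, used for the marker color
  SERVICE wikibase:label { bd:serviceParam wikibase:language "da,en" . }
}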

Most of the data was entered by me, with narrative locations specified in 
as much detail as possible, e.g., down to streets and cafés.


...and yes I have already been told I should link the elements on the 
map to Wikidata.



Some Danish information and pictures are here: 
http://www.dbc.dk/news/data-science-day-pa-dbc



Finn Årup Nielsen

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Accessing qualifier-specific data

2016-05-19 Thread Finn Årup Nielsen
Sorry, I edited the Wikidata item, which made the previous tinyurl not 
return anything. Here is a new one:


http://tinyurl.com/jbjxqxo

/Finn

On 05/19/2016 04:09 PM, Yetkin Sakal wrote:

Determination method seems to work.

http://tinyurl.com/jygrx22

Thanks for the advice.


On Thursday, May 19, 2016 4:01 PM, Markus Krötzsch
<mar...@semantic-mediawiki.org> wrote:


On 19.05.2016 14:51, Markus Krötzsch wrote:
 > Here is a simple SPARQL query to get population numbers from (any time
 > in) 2015 of (arbitrary types of) entities, limited to 100 results:
 >
 > SELECT ?entity ?entityLabel ?population ?time
 > WHERE
 > {
 >  ?entity p:P1082 ?statement .
 >  ?statement ps:P1082 ?population .
 >  ?statement pq:P585 ?time .
 >  FILTER (
 >?time > "2015-01-01T00:00:00Z"^^xsd:dateTime &&
 >?time < "2015-12-31T23:59:59Z"^^xsd:dateTime
 >  )
 >
 >  SERVICE wikibase:label {
 >  bd:serviceParam wikibase:language "en" .
 >  }
 > }
 > LIMIT 100
 >
 > See http://tinyurl.com/gwzubox
 >
 > You can replace ?entity by something like wd:Q64 to query the population
 > of a specific place: http://tinyurl.com/jnajczu
<http://tinyurl.com/jnajczu> (I changed to 2014 here
 > since there are no 2015 figures for Berlin).
 >
 > You could also add other qualifiers to narrow down statements further,
 > but of course only if Wikidata has such information in the first place.
 > I don't see many qualifiers other than P585 being used with population
 > statements, so this is probably of little use.

Well, there are some useful qualifiers in some cases, e.g.,
determination method. Here are estimated populations between 2000 and
2015: http://tinyurl.com/zp6ymwr

See https://tools.wmflabs.org/sqid/#/view?id=P1082
<https://tools.wmflabs.org/sqid/#/view?id=P1082> for more qualifiers.

Markus



 >
 > Cheers
 >
 > Markus
 >
 > On 19.05.2016 14:35, Yetkin Sakal wrote:
 >> The only way I could find to retrieve it is through the API.
 >>
 >>
https://www.wikidata.org/w/api.php?action=wbgetclaims&entity=Q2674064&property=P1082
 >>
 >>
 >> How to go about picking a population type (urban, rural, etc.) and
 >> accessing its value? I cannot see such a qualifier, so what is the right
 >> way to do it?
 >>
 >>
 >> On Thursday, May 19, 2016 9:50 AM, Gerard Meijssen
 >> <gerard.meijs...@gmail.com <mailto:gerard.meijs...@gmail.com>> wrote:
 >>
 >>
 >> Hoi,
 >> So 2015 is preferred, how do I then get the data for 1984?
 >> Thanks.
 >>  GerardM
 >>
 >> On 18 May 2016 at 21:00, Stas Malyshev <smalys...@wikimedia.org
<mailto:smalys...@wikimedia.org>
 >> <mailto:smalys...@wikimedia.org <mailto:smalys...@wikimedia.org>>>
wrote:
 >>
 >>Hi!
 >>
 >>  > Is there any chance we can access qualifier-specific data on
 >>Wikidata?
 >>  > For instance, we have two population properties on
 >>  > https://www.wikidata.org/wiki/Q2674064
<https://www.wikidata.org/wiki/Q2674064> and want to access the
 >>value of
 >>  > the first one (i.e, the population value for 2015).
 >>
 >>What should happen is that 2015 value should be marked as preferred.
 >>That is regardless of data access. Then probably something like this:
 >>
 >>
https://www.mediawiki.org/wiki/Extension:Wikibase_Client/Lua#mw.wikibase.entity:getBestStatements
 >>
 >>can be used (not sure if there's better way to achieve the same).
 >>Also not sure what #property would return...
 >>--
 >>Stas Malyshev
 >> smalys...@wikimedia.org <mailto:smalys...@wikimedia.org>
<mailto:smalys...@wikimedia.org <mailto:smalys...@wikimedia.org>>
 >>
 >>___
 >>Wikidata mailing list
 >> Wikidata@lists.wikimedia.org <mailto:Wikidata@lists.wikimedia.org>
<mailto:Wikidata@lists.wikimedia.org <mailto:Wikidata@lists.wikimedia.org>>
 >> https://lists.wikimedia.org/mailman/listinfo/wikidata
 >>
 >>
 >>
 >> ___
 >> Wikidata mailing list
 >> Wikidata@lists.wikimedia.org <mailto:Wikidata@lists.wikimedia.org>
<mailto:Wikidata@lists.wikimedia.org <mailto:Wikidata@lists.wikimedia.org>>
 >> https://lists.wikimedia.org/mailman/listinfo/wikidata
 >>
 >>
 >>
 >>
 >> ___
 >> Wikidata mailing list
 >> Wikidata@lists.wikimedia.org <mailto:Wikidata@lists.wikimedia.org>
 >> https://lists.wikimedia.org/mailman/listinfo/wikidata
 >>
 >


___
Wikidata mailing list
Wikidata@lists.wikimedia.org <mailto:Wikidata@lists.wikimedia.org>
https://lists.wikimedia.org/mailman/listinfo/wikidata




___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata




--
Finn Årup Nielsen
http://people.compute.dtu.dk/faan/

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] Wikidata Query Service problems

2016-05-13 Thread Finn Årup Nielsen

Hi,


I am having problems with the Wikidata Query Service. Sometimes I just 
get a black page; at other times I get an interface, but without a 
responsive edit field and with the message "Data last updated: [connecting]" 
in the green status field. The issue has been going on for some days now.


I know there have been recent instabilities, cf. 
https://lists.wikimedia.org/pipermail/wikidata/2016-May/008674.html , 
but https://phabricator.wikimedia.org/T134238 'Query service fails with 
"Too many open files"' should have been resolved as of 9 May. A newer one, 
https://phabricator.wikimedia.org/T133026 "WDQS GUI caching", is AFAIU 
unresolved. It seems my problem could be related to caching, but a full 
page reload (Ctrl+F5) does not always work.


Interestingly, one of my computers has the problem while another does not.

I looked at Grafana and see that one of the servers has a somewhat high 
load:

https://grafana.wikimedia.org/dashboard/db/wikidata-query-service

I see nothing unusual here: https://searchdata.wmflabs.org/wdqs/


Is there anything we mere mortals can do about such issues? Is it a 
Phabricator issue? Perhaps my IP has just been throttled? :)



best
Finn

--
Finn Årup Nielsen
http://people.compute.dtu.dk/faan/

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Geospatial search for Wikidata Query Service is up

2016-05-10 Thread Finn Årup Nielsen


Are the distances available somehow? I.e., the distances between the 
wikibase:center and the searched locations?


There have been some attempts (by James Heald?) at approximating the 
distances: http://tinyurl.com/otsp7cc
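
For what it is worth, here is a minimal sketch of how a distance could be 
computed directly, assuming the geof:distance function is available on the 
endpoint - e.g., airports within 100 km of Berlin, ordered by distance:

SELECT ?place ?placeLabel ?dist WHERE {
  wd:Q64 wdt:P625 ?berlinLoc .                       # Berlin's coordinates
  ?place wdt:P31 wd:Q1248784 ;                       # instance of airport
         wdt:P625 ?loc .
  BIND(geof:distance(?berlinLoc, ?loc) AS ?dist)     # distance in km
  FILTER(?dist < 100)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
ORDER BY ?dist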



best regards
Finn


On 05/09/2016 10:23 PM, Stas Malyshev wrote:

Hi!

After a number of difficulties and unexpected setbacks[1] I am happy to
announce that geospatial search for Wikidata Query Service is now
deployed and functional. You can now search for items within certain
radius of a point and within a box defined by two points - more detailed
instructions are in the User Manual[2]. See also query examples[3] such
as "airports within 100km of Berlin": http://tinyurl.com/zxy8o64

There are still a couple of things to complete, namely sorting by
distance (coming soon) and units support (maybe). Overall progress of
the task is tracked by T133566[4].

[1] https://lists.wikimedia.org/pipermail/wikidata/2016-May/008674.html
[2]
https://www.mediawiki.org/wiki/Wikidata_query_service/User_Manual#Geospatial_search
[3]
https://www.mediawiki.org/wiki/Wikibase/Indexing/SPARQL_Query_Examples#Airports_within_100km_of_Berlin
[4] https://phabricator.wikimedia.org/T123565




--
Finn Årup Nielsen
http://people.compute.dtu.dk/faan/

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Verifiability and living people

2016-07-07 Thread Finn Årup Nielsen

I sometimes feel that I need a "citation needed" button on claims. :)

/Finn


On 07/07/2016 04:12 PM, Magnus Manske wrote:

While the proposal of all statements requiring citation is obviously
overshooting, I believe we all agree that more/better citations improve
Wikidata.
One component here would be a social one, namely that it first becomes
good practice, then the default, to cite statements.
For that, improved technology and new approaches are required.
Suggestions include:
* Open a blank reference box when adding a statement in the default
editor, thus subtly prompting a reference
* Show a "smart field" for reference adding, e.g. just paste a URL, and
it registers it's an URL, suggests a title from the page at the URL,
adds access date, suggests other data that can be inferred from the URL
or the linked page, shows likely other fields (e.g. "author" or such)
for easy fill-in
* Automatically add references for statements via external IDs. I have a
bot that does that to some degree, but it could use productizing
* Tools to "migrate" Wikipedia references to the actual sources. (Again,
I have some, but...)
* "Reference mode", to quickly add references to statements. (I have a
drag'n'drop script, but that breaks on every Wikidata UI update)
* A list of items/statements that are in "priority need" for
referencing. For example, death dates of the recently deceased should be
simple, while they are still in the news.
* Dedicated drives to complete a "set" (e.g. all women chemists), that
is, have all statements references in those items
* Special watchlist for new statements without reference, especially on
otherwise "completely referenced" items

Magnus

On Thu, Jul 7, 2016 at 2:56 PM Brill Lyle <wp.brilll...@gmail.com
<mailto:wp.brilll...@gmail.com>> wrote:

*blanket, not blanked...*

___
Wikidata mailing list
Wikidata@lists.wikimedia.org <mailto:Wikidata@lists.wikimedia.org>
https://lists.wikimedia.org/mailman/listinfo/wikidata



___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata




--
Finn Årup Nielsen
http://people.compute.dtu.dk/faan/

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata-l] Who to talk to about integrating WolrCat Library Records in Wikidata

2012-05-25 Thread Finn Årup Nielsen


OCLC-Wikidata sounds very interesting.

I wonder whether the Wikidata people have thought about notability, and - 
if/when there are much lower notability criteria - about the amount of data 
that could potentially be stored in Wikidata.


In Wikipedia we do not have individual articles on scientific articles. 
(I complained about this 
http://en.wikipedia.org/wiki/File:Finn_%C3%85rup_Nielsen_-_Wikipedia_is_not_the_sum_of_all_human_knowledge_-_Wikimania_2010.pdf 
:-)


AFAIR my library has over 100 million records of journal articles. 
These could potentially go into Wikidata together with information 
extracted from the scientific papers (in my case that would be data 
related to brain activity).



/Finn, http://www.imm.dtu.dk/~fn/

On 25-05-2012 01:41, Klein,Max wrote:

Hello Wikidata Wizards,

Phoebe Ayers from the Board recommended I talk to you. My name is Max
Klein and I am the Wikipedian in Residence for OCLC. OCLC owns
WorldCat.org, the world’s largest holder of library data, with 264 million
bibliographic records about books, journals and other library items. We
would really like to partner with you, as Wikidata is being built, in
incorporating our data into your project.

*What we can offer:*

· WorldCat.org metadata http://www.worldcat.org/ .

  o Typically, for any work we have most of the following: title, authors,
    publisher, formats, summaries, editions, subjects, languages, intended
    audience, all associated ISBNs, length, and abstract.

· APIs to this data http://oclc.org/developer/

  o And some other cool APIs like xISBN, which returns all the ISBNs of all
    the editions of a book on the input of any single one.

· Library finding tools

  o When viewing a record on our site, we show you the closest library
    which has that work, and links to reserve it for pick-up.

· The Virtual International Authority File (VIAF) http://viaf.org/, which
  is an authoritative disambiguation file.

  o That means that we have certified data on the disambiguation of authors.

· WorldCat Identities, an analytics site http://www.worldcat.org/identities/

  o It gives you, for each author, metadata and analytics: alternative names,
    significant dates, publication timelines, genres, roles, related
    authors, and tag clouds of associated subjects.

*What’s in it for us:*

· We are a not-for-profit member cooperative. Our mission is “Connecting
  people to knowledge through library cooperation.”

· Since I work at the research group, for now this is just a research
  project.

  o If at some point this goes live - and you want to - we’d like to
    integrate the “find it at a library near me” feature; that means
    click-throughs for us.

*The ideas:*

There are a lot of possibilities, and I’d like to hear your input. These
are the first few that I’ve come up with.

· Making infoboxes for each book or author that contain all their metadata.

  o Ready to incorporate into all language projects.

· Using authority files to disambiguate or link works to their creators.

  o Solving DABs.

· Using our analytics (e.g. author timelines) as Wikidata data types to
  transclude.

  o Curating articles with easy-to-include dynamic analytics.

· Populating or creating works/author pages with their
  algorithmically-derived history and details.

  o Extremely experimental semantic work.

I’m roaring and ready to get this collaboration going. I know Wikidata
is at an early stage, and we are willing to accommodate you.

Send me any feedback or ideas,

Max Klein

Wikipedia in Residence

kle...@oclc.org

+17074787023




___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] A personal note, and a secret

2013-07-12 Thread Finn Årup Nielsen

Dear Denny,


I am sorry to hear you are leaving. You have done a great job with 
Wikidata. Congratulations on Q (a late congratulation) and on the new 
position.



best
Finn Årup Nielsen

On 07/11/2013 03:28 PM, Denny Vrandečić wrote:

I am truly and deeply amazed by the Wikidata community.

A bit more than a year ago, I moved to Berlin and assembled a fantastic
team of people to help realize a vision. Today, we have collected
millions of statements, geographical locations, points in time, persons
and their connections, creative works, and species - and every single
minute, hundreds of edits are improving and changing this knowledge base
that anyone can edit, that anyone can use for free.

So much more is left to do, and the further we go, the more
opportunities open. More datatypes - links are on the horizon,
quantities will be a major step. I can hardly wait to see Wikidata
answer queries. And there are so many questions unanswered - what does
the community need in order to maintain Wikidata best? Which tools,
reports, special pages are needed? What is the right balance between
automation and flexibility?

Besides Wikipedia, Wikidata can be used in many other places. We just
started the conversations about sister projects, but also external
projects are expected to become smarter thanks to Wikidata. I expect
tools and libraries and patterns for these types of uses will emerge in
the next few months, and applications will become more intelligent and
act more informed, powered by Wikidata.

A project like Wikidata needs in its early days a strong, sometimes
stubborn leader in order to accelerate its growth. But at some point a
project gathers sufficient momentum, and the community moves faster than
any single leader could lead, and suddenly they might become
bottlenecks, and instead of accelerating the project they might be
stalling it.

Wikidata has reached the point where it is time for me to step down. The
Wikidata development team in Berlin will, in the upcoming weeks and
months, set up processes that allow the community, that I learned to
trust even more during that year, to take over the reins. I will stay
with the team until the end of September, and then become again what I
have been for the last decade - a normal and proud member of the
Wikimedia communities.

I also would like to use this chance to reveal a secret. Wikidata items
are identified by a Q, followed by a number, Wikidata properties by a P,
followed by a number. Whereas it is obvious that the P stands for
property, some of you have asked - why Q? My answer was, that Q not only
looks cool, but also makes for great identifiers, and hopefully a
certain set of people will some day associate a number like Q9036 with
something they can look up in Wikidata. But the true reason is that Q is
the first letter of the name of the woman I love. We married last year,
among all that Wikidata craziness, and I am thankful to her for the
patience she had while I was discussing whether to show wiki identifiers
or language keys, what bugs to prioritize when, and which calendar
systems were used in Sweden.

I will continue to be a community member with Wikidata. My new day job,
though, will be at Google, and from there I hope to continue to
effectively further our goals towards a world where everyone has access
to the sum of all knowledge.

Sincerely,
Denny Vrandečić

--
Project director Wikidata
Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
Tel. +49-30-219 158 26-0 | http://wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
unter der Nummer 23855 B. Als gemeinnützig anerkannt durch das Finanzamt
für Körperschaften I Berlin, Steuernummer 27/681/51985.


___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l




___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] test.wikidata.org offers URL datatype now

2013-09-06 Thread Finn Årup Nielsen


It is apparently not possible to enter a URL with spaces and have it 
automatically escaped, e.g.,


http://en.wikipedia.org/wiki/Technical University of Denmark

should be entered as:

http://en.wikipedia.org/wiki/Technical%20University%20of%20Denmark

On the other hand http://en.wikipedia.org/wiki/København works ok for 
http://en.wikipedia.org/wiki/K%C3%B8benhavn


Also http://københavn.dk works.

see https://test.wikidata.org/wiki/Q132 and 
https://test.wikidata.org/wiki/Q133



cheers
Finn Årup Nielsen


On 09/06/2013 12:10 PM, Denny Vrandečić wrote:

Hello all,

in preparation of next week's deployment to Wikidata.org,
test.wikidata.org http://test.wikidata.org now has the new datatype
URL deployed.

If you have the time, we would appreciate if you tested it and let us
know about errors and problems.

The URL datatype should be a big step in allowing to introduce better
sourcing and reliability of the content of Wikidata.

Cheers,
Denny


--
Project director Wikidata
Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
Tel. +49-30-219 158 26-0 | http://wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
unter der Nummer 23855 B. Als gemeinnützig anerkannt durch das Finanzamt
für Körperschaften I Berlin, Steuernummer 27/681/51985.


___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l




___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] Super Lachaise, a mobile app based on Wikidata

2014-10-29 Thread Finn Årup Nielsen


Dear Pierre-Yves,

On 10/28/2014 05:41 PM, Pierre-Yves Beaudouin wrote:


I don't know because I'm not the developer of the app and my knowledge
is limited in this area. For many years now, I am collecting data
(information, photo, coordinates) about the cemetery. I've published
everything on Commons, Wikidata and OSM, so developers can do something
smart with that ;)


How do you get the geocoordinates for the individual graves? Looking at
http://www.superlachaise.fr/ I see Guillaume Apollinaire. His Wikidata 
item https://www.wikidata.org/wiki/Q133855 has no geodata. Neither the 
cemetery link nor Findagrave seems to have geodata.



- Finn Årup Nielsen

___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] Example of Wikipedia infobox generated from Wikidata?

2015-01-18 Thread Finn Årup Nielsen

Dear James,

On 15-01-2015 at 01:29, ja...@j1w.xyz wrote:
 I read that some Wikipedia infoboxes are generated from Wikidata.  Can
 someone please point me to an example of this?

It is not so widespread. On the Danish Wikipedia we have {{Infoboks
virksomhed}}, which can fill in part of the infobox fields; see, e.g.,

https://da.wikipedia.org/wiki/Tiger_%28butiksk%C3%A6de%29

https://da.wikipedia.org/wiki/Skabelon:Infoboks_virksomhed

Only five of the infobox fields can be set up this way.


best regards
Finn Årup Nielsen

___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] OpenStreetMap + Wikidata for light houses

2015-03-11 Thread Finn Årup Nielsen

On 03/11/2015 12:58 PM, Daniel Kinzler wrote:

Am 11.03.2015 um 12:39 schrieb Jo Walsh:

http://wiki.openstreetmap.org/wiki/Wikidata#Importing_data
*Copying data to Wikidata from OSM* (or even from other Wikimedia projects) *is
not allowed* because Wikidata uses the public-domain style Creative Commons CC0
license which does not contain any attribution or share-alike provisions.
Conversely, data may be copied from Wikidata without restriction.

(OSM is licensed under the Open Database License)


This applies to any copyrightable material. Facts (this is a lighthouse) are
not copyrightable.


Facts are not copyrightable, but in the EU a lot of facts are protected 
when they are assembled in a database, due to the sui generis database right.


I suppose that OSM falls under UK law and thereby the EU sui generis 
database right, so systematic extraction of OSM data into Wikidata will 
constitute a violation in the EU of the Open Database License terms.


https://en.wikipedia.org/wiki/Sui_generis_database_right



/Finn

--
Finn Årup Nielsen
http://people.compute.dtu.dk/faan/

___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l