Re: [Wikidata] Announcing the release of the Wikidata Query Service

2015-09-07 Thread Stas Malyshev
Hi!

> I am particularly looking forward to the tool that builds a query. The
> examples provided proved really important for me when starting to use these
> tools. I really hope for a similar facility for the new service.

That's where community input/contribution is very welcome :)

> The documentation is only about the test environment. It mentions
> Wikigrok. Does this release mean that it now runs on the live data?

https://www.mediawiki.org/wiki/Wikidata_query_service/User_Manual is
about the production environment.

It runs on live data, and the synchronization time is on
https://query.wikidata.org/ (look for "Last updated").

There may be technical issues that make it fall behind from time to time
(I know about them and am working to fix them), so synchronization may
not be up-to-the-second yet. That is what we're striving for, and it's
what happens most of the time, but not 100% of the time _yet_.

-- 
Stas Malyshev
smalys...@wikimedia.org

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Announcing the release of the Wikidata Query Service

2015-09-07 Thread Gerard Meijssen
Hoi,
Wonderful to learn that we have finally progressed towards a live query
system. Is it your intention that tools will use this service, and do you
hope/anticipate that the tools by Magnus will move towards this new
system?

I am particularly looking forward to the tool that builds a query. The
examples provided proved really important for me when starting to use these
tools. I really hope for a similar facility for the new service.

The documentation is only about the test environment. It mentions Wikigrok.
Does this release mean that it now runs on the live data?
Thanks,
 GerardM

On 8 September 2015 at 00:29, Dan Garry  wrote:

> The Discovery Department at the Wikimedia Foundation is pleased to
> announce the release of the Wikidata Query Service! You can find the
> interface for the service at https://query.wikidata.org.
>
> The Wikidata Query Service is designed to let users run queries on the
> data contained in Wikidata. The service uses SPARQL as the query
> language. You can see some example queries in the user manual.
>
> Right now, the service is still in beta. This means that our goal is
> to monitor the service usage and collect feedback about what people
> think should be next. To do that, we've created the Wikidata Query
> Service dashboard to track usage of the service, and we're in the
> process of setting up a feedback mechanism for users of the service.
> Once we've monitored the usage of the service for a while and gathered
> user feedback, we'll decide on what's next for development of the
> service.
>
> If you have any feedback, suggestions, or comments, please do send an
> email to the Discovery Department's public mailing list,
> wikimedia-sea...@lists.wikimedia.org.
>
> Thanks,
> Dan
>
> --
> Dan Garry
> Lead Product Manager, Discovery
> Wikimedia Foundation
>


Re: [Wikidata] (Almost) empty items

2015-09-07 Thread Stas Malyshev
Hi!

> My recent tests produced lists of empty or almost empty items, meaning
> that they have no sitelinks, no statements, no label (almost empty), and
> sometimes no descriptions or aliases either (empty). Many of the empty
> ones seem to be redirects now, but not all (e.g. Q18482644).
> 
> Maybe somebody would like to check what is going on in each case. It
> seems like in most cases, a merge has emptied the item and a redirect
> should be placed now.

Some of them are redirects. Some of them are very strange items: with a
description of "Wikimedia category" and no labels or links. I suspect this
is a leftover from some bug. Check out here: http://tinyurl.com/ot47amh

I'll try to see if there are other patterns there.
-- 
Stas Malyshev
smalys...@wikimedia.org



Re: [Wikidata] Item count

2015-09-07 Thread Daniel Kinzler
Thanks for investigating, Markus!

Am 07.09.2015 um 22:54 schrieb Markus Krötzsch:
> On 07.09.2015 22:10, Markus Krötzsch wrote:
>> On 07.09.2015 21:48, Markus Krötzsch wrote:
>> ...
>>>
>>> I'll count how many of each we have. Back in 30min.
>>
>> This does not seem to be the explanation after all. I could only find 33
>> items in total that have no data at all. If I also count items that have
>> nothing but descriptions or aliases, I get 589.
>>
>> Will check for duplicates next.
> 
> Update: there are no duplicate items in the dump.
> 
> Markus
> 
> 


-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.



[Wikidata] Announcing the release of the Wikidata Query Service

2015-09-07 Thread Dan Garry
The Discovery Department at the Wikimedia Foundation is pleased to announce
the release of the Wikidata Query Service! You can find the interface for
the service at https://query.wikidata.org.

The Wikidata Query Service is designed to let users run queries on the data
contained in Wikidata. The service uses SPARQL as the query language. You
can see some example queries in the user manual.

Right now, the service is still in beta. This means that our goal is to
monitor the service usage and collect feedback about what people think
should be next. To do that, we've created the Wikidata Query Service
dashboard to track usage of the service, and we're in the process of
setting up a feedback mechanism for users of the service. Once we've
monitored the usage of the service for a while and gathered user feedback,
we'll decide on what's next for development of the service.

If you have any feedback, suggestions, or comments, please do send an email
to the Discovery Department's public mailing list,
wikimedia-sea...@lists.wikimedia.org.

Thanks,
Dan

-- 
Dan Garry
Lead Product Manager, Discovery
Wikimedia Foundation


[Wikidata] (Almost) empty items

2015-09-07 Thread Markus Krötzsch

Hi all,

My recent tests produced lists of empty or almost empty items, meaning 
that they have no sitelinks, no statements, no label (almost empty), and 
sometimes no descriptions or aliases either (empty). Many of the empty 
ones seem to be redirects now, but not all (e.g. Q18482644).


Maybe somebody would like to check what is going on in each case. It 
seems like in most cases, a merge has emptied the item and a redirect 
should be placed now.
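The classification described above can be sketched as a small predicate over entries of the JSON dump. The field names (`labels`, `descriptions`, `aliases`, `claims`, `sitelinks`) follow the dump format, but the helper itself is illustrative, not part of any existing tool:

```python
def classify(item):
    """Classify a JSON dump item as 'empty', 'almost empty', or 'ok'.

    Empty: no sitelinks, no statements, no labels, and no
    descriptions/aliases either. Almost empty: only descriptions
    and/or aliases remain.
    """
    has_core = bool(item.get("sitelinks")) or bool(item.get("claims")) \
        or bool(item.get("labels"))
    if has_core:
        return "ok"
    has_rest = bool(item.get("descriptions")) or bool(item.get("aliases"))
    return "almost empty" if has_rest else "empty"
```

Streaming every dump entry through such a predicate is roughly how the two lists below could be reproduced.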


Markus


All data based on 31 Aug 2015 JSON dump.

Empty: [Q4009003, Q10326820, Q11075784, Q11076861, Q13117977, Q13118077, 
Q17504622, Q9370571, Q9710731, Q11083936, Q11087561, Q13118625, 
Q16749989, Q18630019, Q20515598, Q6909817, Q10780450, Q11074508, 
Q11079165, Q11089726, Q13115994, Q13116978, Q13327246, Q19819704, 
Q19967348, Q5936922, Q9314766, Q12243231, Q12454977, Q13118107, 
Q13118374, Q16943273, Q18482644]


Almost empty: [Q587248, Q1164798, Q3112633, Q4009003, Q4062117, 
Q4292083, Q4318132, Q5668795, Q6192003, Q6237634, Q6237652, Q6237655, 
Q6308010, Q6331429, Q6930680, Q6974238, Q7241365, Q7768807, Q7782293, 
Q7849393, Q8003207, Q8343296, Q8504111, Q8614159, Q8979826, Q8990731, 
Q9004011, Q9043398, Q9063876, Q9103025, Q9109206, Q9117920, Q9164306, 
Q9170348, Q9174405, Q9174424, Q9174555, Q9488312, Q9513173, Q9518437, 
Q9632533, Q9636117, Q9661338, Q9718033, Q9775961, Q9776559, Q9786262, 
Q9786601, Q9789144, Q9791363, Q9796753, Q9803673, Q9804276, Q9805622, 
Q9807208, Q9808694, Q9810197, Q9828378, Q9828528, Q9829439, Q9832003, 
Q9840979, Q9845828, Q9850795, Q9852621, Q9860579, Q9863446, Q9908743, 
Q9913015, Q9920520, Q9924412, Q9929068, Q9930757, Q9932368, Q9936017, 
Q9936060, Q9943410, Q9943530, Q9949358, Q9949788, Q9950572, Q9954180, 
Q9992311, Q10007569, Q10007635, Q10008745, Q10031183, Q1007, 
Q10084373, Q10096437, Q10104020, Q10218343, Q10219749, Q10236783, 
Q10326820, Q10729998, Q11075784, Q11076861, Q11128812, Q11340931, 
Q11737059, Q12101945, Q12205093, Q12404189, Q12443121, Q12621871, 
Q12631965, Q13072852, Q13117977, Q13118077, Q13245750, Q13263997, 
Q13321100, Q13324365, Q13360518, Q14553821, Q14799878, Q15004946, 
Q15730551, Q16159846, Q16282742, Q16602153, Q16702173, Q16869872, 
Q17277701, Q17478425, Q17504622, Q17593537, Q18175021, Q18237112, 
Q18274457, Q18282299, Q18531535, Q18561899, Q18628032, Q18730938, 
Q18821128, Q18998044, Q19331912, Q19481981, Q19482131, Q19550459, 
Q19590380, Q19616299, Q19641075, Q19642002, Q19734545, Q19767134, 
Q19822949, Q19988106, Q20035189, Q20079998, Q20639714, Q20679947, 
Q20760080, Q20892816, Q2210323, Q3113056, Q3533562, Q4101879, Q4210087, 
Q5684730, Q5809971, Q6187338, Q6214071, Q6221847, Q6232454, Q6237623, 
Q6238407, Q6267133, Q6304334, Q6304340, Q6305887, Q6948925, Q6976279, 
Q7002200, Q7715677, Q7768811, Q227, Q7811966, Q8204528, Q8327914, 
Q8420039, Q8545168, Q9170600, Q9170895, Q9174261, Q9174398, Q9174402, 
Q9370571, Q9431627, Q9477845, Q9520148, Q9528273, Q9528284, Q9533963, 
Q9549703, Q9568787, Q9588409, Q9632151, Q9647812, Q9710731, Q9729602, 
Q9732941, Q9773192, Q9781543, Q9783257, Q9788337, Q9789469, Q9791424, 
Q9804106, Q9805916, Q9810115, Q9819720, Q9820281, Q9820461, Q9820835, 
Q9823355, Q9823407, Q9837555, Q9838783, Q9840499, Q9849199, Q9850679, 
Q9851940, Q9855390, Q9917449, Q9922940, Q9924272, Q9926256, Q9928042, 
Q9928254, Q9931521, Q9968727, Q9989440, Q1259, Q10024392, Q10031160, 
Q10042639, Q10097916, Q10098140, Q10106067, Q10106068, Q10106347, 
Q10109679, Q10126604, Q10138517, Q10156971, Q10201244, Q10207296, 
Q10226384, Q10237094, Q11083936, Q11087561, Q11249136, Q11340216, 
Q11723301, Q12024692, Q12173627, Q12405002, Q12595491, Q13091247, 
Q13118625, Q13120053, Q13318236, Q13339703, Q13344714, Q13350354, 
Q15023800, Q15179680, Q15216251, Q15258494, Q15353671, Q15971392, 
Q16742302, Q16749989, Q17272669, Q17312641, Q17313103, Q17329406, 
Q17336067, Q17404038, Q17478412, Q18240140, Q18240752, Q18282081, 
Q18283071, Q18494403, Q18593433, Q18598473, Q18630019, Q18783455, 
Q19014615, Q19119068, Q19352325, Q19370628, Q19590712, Q19635746, 
Q19641080, Q19955305, Q20034343, Q20080001, Q20080814, Q20107489, 
Q20192449, Q20515598, Q20636623, Q20671182, Q20705601, Q20830805, 
Q20870799, Q1384584, Q1720927, Q2251351, Q3066778, Q4070991, Q4858090, 
Q6237616, Q6237619, Q6237676, Q6405134, Q6539112, Q6595216, Q6909817, 
Q6924240, Q7091930, Q7098258, Q8473098, Q8497107, Q8742956, Q8907476, 
Q8961567, Q9118475, Q9170334, Q9311504, Q9428083, Q9432965, Q9476291, 
Q9513989, Q9523286, Q9571617, Q9723093, Q9776024, Q9776350, Q9777657, 
Q939, Q9777974, Q9786515, Q9788293, Q9789416, Q9816934, Q9819231, 
Q9822928, Q9824643, Q9826885, Q9829206, Q9829711, Q9832139, Q9837547, 
Q9839179, Q9850745, Q9853057, Q9853575, Q9873126, Q9884441, Q9892430, 
Q9897034, Q9900422, Q9907796, Q9909494, Q9917083, Q9917432, Q9922959, 
Q9926577, Q9926648, Q9928091, Q9929582, Q9929656, Q9930201, Q9933652, 
Q9941606, Q9941679, Q9942423, Q9942460, Q9951122, Q9952159, Q9967447, 
Q9982655, Q9998938, Q1000790

Re: [Wikidata] Item count

2015-09-07 Thread Markus Krötzsch

On 07.09.2015 22:10, Markus Krötzsch wrote:

On 07.09.2015 21:48, Markus Krötzsch wrote:
...


I'll count how many of each we have. Back in 30min.


This does not seem to be the explanation after all. I could only find 33
items in total that have no data at all. If I also count items that have
nothing but descriptions or aliases, I get 589.

Will check for duplicates next.


Update: there are no duplicate items in the dump.

Markus




Re: [Wikidata] Source statistics

2015-09-07 Thread Stas Malyshev
Hi!

> A small fix though: I think it would be better to use count(?statement)
> rather than count(?ref), right?

Yes, of course, my mistake - I modified it from a different query and
forgot to change it.

> I have tried a similar query on the public test endpoint on labs
> earlier, but it timed out for me (I was using a very common reference
> though ;-). For rarer references, live queries are definitely the better
> approach.

Works for me for Q216047; didn't check others though. For popular
references, the labs one may be too slow, indeed. A faster one is coming
"real soon now" :)

-- 
Stas Malyshev
smalys...@wikimedia.org



Re: [Wikidata] Source statistics

2015-09-07 Thread Markus Krötzsch

On 07.09.2015 21:45, Stas Malyshev wrote:

Hi!


I'm wondering if there is a way (SQL, api, tool or otherwise) for
finding out how often a particular source is used on Wikidata.


Something like this probably would work:

http://tinyurl.com/plssk4j

This runs the following query:

prefix prov: <http://www.w3.org/ns/prov#>
prefix pr: <http://www.wikidata.org/prop/reference/>
PREFIX wd: <http://www.wikidata.org/entity/>
SELECT (count(?ref) as ?mentions) WHERE {
   ?statement prov:wasDerivedFrom ?ref .
   ?ref pr:P248 wd:Q216047 .
   ?ref pr:P577 ?date .
}

Q216047 is "Le Figaro". This counts how many statements
reference Le Figaro and also have dates (drop the last clause if
non-dated ones are fine too).


Yes, that's the best technique if you already know which reference you 
are looking for. And it also supports more general patterns, like the Le 
Figaro one, which is also very interesting.


A small fix though: I think it would be better to use count(?statement) 
rather than count(?ref), right?


I have tried a similar query on the public test endpoint on labs 
earlier, but it timed out for me (I was using a very common reference 
though ;-). For rarer references, live queries are definitely the better 
approach.


Markus




Re: [Wikidata] Item count

2015-09-07 Thread Andrew Gray
How many items have no sitelinks at all (regardless of labels,
properties, etc)? That might be a more substantial number...

Andrew.

On 7 September 2015 at 21:10, Markus Krötzsch
 wrote:
> On 07.09.2015 21:48, Markus Krötzsch wrote:
> ...
>>
>>
>> I'll count how many of each we have. Back in 30min.
>
>
> This does not seem to be the explanation after all. I could only find 33
> items in total that have no data at all. If I also count items that have
> nothing but descriptions or aliases, I get 589.
>
> Will check for duplicates next.
>
>
> Markus
>



-- 
- Andrew Gray
  andrew.g...@dunelm.org.uk



Re: [Wikidata] Item count

2015-09-07 Thread Markus Krötzsch

On 07.09.2015 21:48, Markus Krötzsch wrote:
...


I'll count how many of each we have. Back in 30min.


This does not seem to be the explanation after all. I could only find 33 
items in total that have no data at all. If I also count items that have 
nothing but descriptions or aliases, I get 589.


Will check for duplicates next.

Markus



Re: [Wikidata] Item count

2015-09-07 Thread Markus Krötzsch

On 07.09.2015 19:37, Daniel Kinzler wrote:

Am 07.09.2015 um 18:05 schrieb Emilio J. Rodríguez-Posada:

Wow, that is a big difference. Almost 4 million.

I think that MediaWiki doesn't count pages without any [[link]]. Is that the 
reason?


No, that only applies to Wikitext.

Here is the relevant code from ItemContent:

public function isCountable( $hasLinks = null ) {
	return !$this->isRedirect() && !$this->getItem()->isEmpty();
}

And the relevant code from Item:

public function isEmpty() {
	return $this->fingerprint->isEmpty()
		&& $this->statements->isEmpty()
		&& $this->siteLinks->isEmpty();
}

So all pages that are not empty (have labels or descriptions or aliases or
statements or sitelinks), and are not redirects, should be counted.

Is it possible that the difference of 3,694,285 is mainly redirects? Which dump
were you referring to, Markus? The XML dump contains redirects, and so does the
RDF dump. The JSON dump doesn't... so if you were referring to the JSON dump,
that would imply we have 3.7 million empty (useless) items.

Or, of course, the counting mechanism is just broken. Which is quite possible.


This could of course also be the case for my Java program, but I 
reconfirmed:


$ zgrep -c "{\"typ" 20150831.json.gz
18483096

I am using the JSON dump, so redirects are not possible. I would not 
detect duplicate items, if any occurred.
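Detecting duplicates while streaming the dump only needs a set of already-seen IDs. A minimal sketch (a hypothetical helper, not part of the Java program mentioned above):

```python
def find_duplicates(item_ids):
    """Yield each Q-ID that appears more than once in a stream of IDs,
    once per repeated occurrence."""
    seen = set()
    for qid in item_ids:
        if qid in seen:
            yield qid
        seen.add(qid)
```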


It seems that there are indeed a number of empty or almost empty items, 
apparently created by merges, e.g.:


https://www.wikidata.org/wiki/Q10031183

Some of them do have some minimal amount of remaining data though, e.g.,

https://www.wikidata.org/wiki/Q6237652

(would this count as "empty"?)

I'll count how many of each we have. Back in 30min.

Markus







Re: [Wikidata] Source statistics

2015-09-07 Thread Stas Malyshev
Hi!

> I'm wondering if there is a way (SQL, api, tool or otherwise) for
> finding out how often a particular source is used on Wikidata.

Something like this probably would work:

http://tinyurl.com/plssk4j

This runs the following query:

prefix prov: <http://www.w3.org/ns/prov#>
prefix pr: <http://www.wikidata.org/prop/reference/>
PREFIX wd: <http://www.wikidata.org/entity/>
SELECT (count(?ref) as ?mentions) WHERE {
  ?statement prov:wasDerivedFrom ?ref .
  ?ref pr:P248 wd:Q216047 .
  ?ref pr:P577 ?date .
}

Q216047 is "Le Figaro". This counts how many statements
reference Le Figaro and also have dates (drop the last clause if
non-dated ones are fine too).
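The same counting logic can be sketched in plain Python over an in-memory list of statements. The dict shapes here are invented for illustration; this is a toy analog of the SPARQL pattern, not the query service API:

```python
def count_mentions(statements, source_qid, require_date=True):
    """Count statements with at least one reference citing source_qid
    (stated-in, P248), optionally requiring a publication date (P577)
    on that reference. Each statement is counted at most once."""
    n = 0
    for stmt in statements:
        for ref in stmt.get("references", []):
            if source_qid in ref.get("P248", []) and \
                    (not require_date or "P577" in ref):
                n += 1
                break  # one hit per statement
    return n
```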

-- 
Stas Malyshev
smalys...@wikimedia.org



Re: [Wikidata] Item count

2015-09-07 Thread Stas Malyshev
Hi!

> We have 725691 redirects per the SPARQL engine. We do have some sizeable
> number of entities which have no statements (alas!), but I have a hard time
> believing we have ~3 mln of those not having even a single label, unless
> there's some bot gone wild here. The problem is that if an entity has no
> sitelinks, no labels and no statements, I don't think it would even be
> in the SPARQL engine, so I can't query for it.

OTOH, we have 14,590,233 entities having P31 or P279, which, given
historical statistics on non-classified entities, suggests 14,788,811 is
way too low, unless we've got spectacularly good at catching up with
classification (which may have happened, I don't know; did it?)
-- 
Stas Malyshev
smalys...@wikimedia.org



Re: [Wikidata] Item count

2015-09-07 Thread Stas Malyshev
Hi!

> Is it possible that the difference of 3,694,285 is mainly redirects? Which 
> dump

We have 725691 redirects per the SPARQL engine. We do have some sizeable
number of entities which have no statements (alas!), but I have a hard time
believing we have ~3 mln of those not having even a single label, unless
there's some bot gone wild here. The problem is that if an entity has no
sitelinks, no labels and no statements, I don't think it would even be
in the SPARQL engine, so I can't query for it.

-- 
Stas Malyshev
smalys...@wikimedia.org



Re: [Wikidata] Item count

2015-09-07 Thread Addshore
I know that over the past 9 months I have created 500,000 redirects.
Other than that, I would guess that maybe 100,000 other redirects have been
created, at most 500,000 more, meaning 1,000,000 in total.

Such a big difference does seem rather odd to me...

On 7 September 2015 at 19:37, Daniel Kinzler 
wrote:

> Am 07.09.2015 um 18:05 schrieb Emilio J. Rodríguez-Posada:
> > Wow, that is a big difference. Almost 4 million.
> >
> > I think that MediaWiki doesn't count pages without any [[link]]. Is that
> the reason?
>
> No, that only applies to Wikitext.
>
> Here is the relevant code from ItemContent:
>
> public function isCountable( $hasLinks = null ) {
> return !$this->isRedirect() &&
> !$this->getItem()->isEmpty();
> }
>
> And the relevant code from Item:
>
> public function isEmpty() {
> return $this->fingerprint->isEmpty()
> && $this->statements->isEmpty()
> && $this->siteLinks->isEmpty();
> }
>
> So all pages that are not empty (have labels or descriptions or aliases or
> statements or sitelinks), and are not redirects, should be counted.
>
> Is it possible that the difference of 3,694,285 is mainly redirects? Which
> dump were you referring to, Markus? The XML dump contains redirects, and so
> does the RDF dump. The JSON dump doesn't... so if you were referring to the
> JSON dump, that would imply we have 3.7 million empty (useless) items.
>
> Or, of course, the counting mechanism is just broken. Which is quite
> possible.
>
> --
> Daniel Kinzler
> Senior Software Developer
>
> Wikimedia Deutschland
> Gesellschaft zur Förderung Freien Wissens e.V.
>



-- 
Addshore


Re: [Wikidata] Item count

2015-09-07 Thread Daniel Kinzler
Am 07.09.2015 um 18:05 schrieb Emilio J. Rodríguez-Posada:
> Wow, that is a big difference. Almost 4 million.
> 
> I think that MediaWiki doesn't count pages without any [[link]]. Is that the 
> reason?

No, that only applies to Wikitext.

Here is the relevant code from ItemContent:

public function isCountable( $hasLinks = null ) {
	return !$this->isRedirect() && !$this->getItem()->isEmpty();
}

And the relevant code from Item:

public function isEmpty() {
	return $this->fingerprint->isEmpty()
		&& $this->statements->isEmpty()
		&& $this->siteLinks->isEmpty();
}

So all pages that are not empty (have labels or descriptions or aliases or
statements or sitelinks), and are not redirects, should be counted.
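That rule can be mirrored in a short Python sketch; the dict field names are illustrative stand-ins for the Wikibase data model, not MediaWiki code:

```python
def is_countable(item):
    """Mirror of ItemContent::isCountable(): a page counts unless it is
    a redirect or the item is completely empty (no labels, descriptions,
    aliases, statements, or sitelinks)."""
    if item.get("redirect"):
        return False
    fingerprint_empty = not (item.get("labels")
                             or item.get("descriptions")
                             or item.get("aliases"))
    item_empty = (fingerprint_empty
                  and not item.get("claims")
                  and not item.get("sitelinks"))
    return not item_empty
```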

Is it possible that the difference of 3,694,285 is mainly redirects? Which dump
were you referring to, Markus? The XML dump contains redirects, and so does the
RDF dump. The JSON dump doesn't... so if you were referring to the JSON dump,
that would imply we have 3.7 million empty (useless) items.

Or, of course, the counting mechanism is just broken. Which is quite possible.

-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.



[Wikidata] weekly summary #174

2015-09-07 Thread Lydia Pintscher
Hey folks :)

Here's what's been happening around Wikidata over the last week:

Events /Press/Blogs


   - Wikimedia Grafana graphs of Wikidata profiling information
   - Past: Wikimedia Science Conference (blog posts: Liberating Science
   Daily With, and To, Wikidata; and Wikidata, Wikipedia, and #wikisci)
   - Past: Wikidata workshop in Rennes


Other Noteworthy Stuff

   - New date for the next Arbitrary access rollout has been set:
   16th of September.
   - Support for units on Wikidata is coming on Wednesday (9th of
   September).
   - A new version of Kian has been released.
   - The Wikidata Game has got a new mode: Books without author.
   - A new IEG proposal needs your review and support. You can also
   submit your own until the 29th of September.
   - >8K Fellows of the Royal Society have been added to mix’n’match.
   - You can have a look at the spiffy new mobile view.
   - Want to work with the data in Wikidata? There is a new release of
   the Wikidata Toolkit for you.
   - Your help is needed with the most important constraint violations.
   - A new noticeboard has been created to help with classification
   issues: d:Wikidata:Classification noticeboard.

Did you know?

   - Newest properties: Artsy artist, National Gallery of Victoria
   artist identifier, CITES Species+ ID

Development

   - Worked on search on the mobile site
   - Started working on automatically linking identifiers without a gadget
   and the new datatype for identifiers
   - We have a new Special page to query badges (finally!) After the next
   update it will be at Special:PagesWithBadges on Wikipedia and others
   - Fixed some of the remaining known issues with unit support to make it
   ready for rollout on Wednesday
   - Continued with making it possible to show meaningful edit summaries in
   the watchlist and recent changes on Wikipedia and others
   - Made change dispatching faster (This is what makes Wikipedia and
   others aware of changes happening on Wikidata)

You can see all open tickets related to Wikidata here.

Monthly Tasks

   - Hack on one of these.
   - Help develop the next summary here!
   - Contribute to a Showcase item.
   - Help translate or proofread pages in your own language!
   - Add labels, in your own language(s), for the new properties listed
   above.

Anything to add? Please share! :)

Cheers
Lydia

-- 
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata

Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.

Registered in the register of associations at the Berlin-Charlottenburg
district court under number 23855 Nz. Recognized as charitable by the tax
office for corporations I Berlin, tax number 27/681/51985.


Re: [Wikidata] Item count

2015-09-07 Thread Emilio J . Rodríguez-Posada
Wow, that is a big difference. Almost 4 million.

I think that MediaWiki doesn't count pages without any [[link]]. Is that
the reason?

2015-09-07 17:39 GMT+02:00 Markus Krötzsch :

> Hi all,
>
> The main page of Wikidata shows an item count that is getting increasingly
> out of synch with reality. The 31 Aug dump contains 18,483,096 items, while
> the front page says that there are 14,788,811 now. I think this is caused
> by how MediaWiki counts "articles" (which is not what we are dealing with).
>
> Or maybe this is intended? But if we prominently publish a number that is
> 25% off the "raw" data, we should at least explain which criteria were used
> to produce it. What counts as a "proper" item on Wikidata?
>
> Cheers,
>
> Markus
>


[Wikidata] Item count

2015-09-07 Thread Markus Krötzsch

Hi all,

The main page of Wikidata shows an item count that is getting 
increasingly out of synch with reality. The 31 Aug dump contains 
18,483,096 items, while the front page says that there are 14,788,811 
now. I think this is caused by how MediaWiki counts "articles" (which is 
not what we are dealing with).


Or maybe this is intended? But if we prominently publish a number that 
is 25% off the "raw" data, we should at least explain which criteria were 
used to produce it. What counts as a "proper" item on Wikidata?


Cheers,

Markus



[Wikidata] Wikidata's 3rd birthday is coming up

2015-09-07 Thread Lydia Pintscher
Hey folks :)

Wikidata's birthday is coming up in less than 2 months (29th of
October). We're currently brainstorming some cool ideas for presents.
Last year we had a few very cool ones from different corners of our
community: https://www.wikidata.org/wiki/Wikidata:Second_Birthday  So
if you're interested in preparing a present of some sort for
Wikidata's 3rd birthday this is your advance warning so you have
enough time to prepare ;-)


Cheers
Lydia

-- 
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata

Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.

Registered in the register of associations at the Berlin-Charlottenburg
district court under number 23855 Nz. Recognized as charitable by the
tax office for corporations I Berlin, tax number 27/681/51985.



Re: [Wikidata] [ANNOUNCEMENT] first StrepHit dataset for the primary sources tool

2015-09-07 Thread Markus Krötzsch

Dear Marco,

Sounds interesting, but the project page still has a lot of gaps. Will 
you notify us again when you are done? It is a bit tricky to endorse a 
proposal that is not finished yet ;-)


Markus

On 04.09.2015 17:01, Marco Fossati wrote:

[Begging pardon if you have already read this in the Wikidata project chat]

Hi everyone,

As Wikidatans, we all know how much data quality matters.
We all know what high quality stands for: statements need to be
validated via references to external, non-wiki, sources.

That's why the primary sources tool is being developed:
https://www.wikidata.org/wiki/Wikidata:Primary_sources_tool
And that's why I am preparing the StrepHit IEG proposal:
https://meta.wikimedia.org/wiki/Grants:IEG/StrepHit:_Wikidata_Statements_Validation_via_References


StrepHit (pronounced "strep hit", means "Statement? repherence it!") is
a Natural Language Processing pipeline that understands human language,
extracts structured data from raw text and produces Wikidata statements
with reference URLs.

As a demonstration to support the IEG proposal, you can find the
**FBK-strephit-soccer** dataset uploaded to the primary sources tool
backend.
It's a small dataset serving the soccer domain use case.
Please follow the instructions on the project page to activate it and
start playing with the data.

What is the biggest difference that sets StrepHit datasets apart from
the currently uploaded ones?
At least one reference URL is always guaranteed for each statement.
This means that if StrepHit finds some new statement that was not there
in Wikidata before, it will always propose its external references.
We do not want to manually reject all the new statements with no
reference, right?

If you like the idea, please endorse the StrepHit IEG proposal!

Cheers,





Re: [Wikidata] Source statistics

2015-09-07 Thread Markus Krötzsch

On 07.09.2015 14:25, Edgard Marx wrote:

It's not an updated version, but:

dbtrends.aksw.org 


I am getting an error there. Is the server down maybe?

Markus



best,
Edgard

On Mon, Sep 7, 2015 at 1:25 PM, André Costa <andre.co...@wikimedia.se> wrote:

Hi all!

I'm wondering if there is a way (SQL, api, tool or otherwise) for
finding out how often a particular source is used on Wikidata.

The background is a collaboration with two GLAMs where we have used
their open (and CC0) datasets to add and/or source statements on
Wikidata for items on which they can be considered an authority. Now
I figured it would be nice to give them back a number for just how
big the impact was.

While I can find out how many items should be affected I couldn't
find an easy way, short of analysing each of these, for how many
statements were affected.

Any suggestions would be welcome.

Some details: Each reference is a P248 claim + P577 claim (where the
latter may change)

Cheers,
André / Lokal_Profil
André Costa | GLAM-tekniker, Wikimedia Sverige | andre.co...@wikimedia.se | +46 (0)733-964574

Support free knowledge, become a member of Wikimedia Sverige.
Read more at blimedlem.wikimedia.se












Re: [Wikidata] Source statistics

2015-09-07 Thread Markus Krötzsch
P.S. If you want to do this yourself to play with it, below is the 
relevant information on how I wrote this code (looks a bit clumsy in 
email, but I don't have time now to set up a tutorial page ;-).


Markus


(1) I modified the example program "EntityStatisticsProcessor" that is 
part of Wikidata Toolkit [1].

(2) I added a new field to count references:

final HashMap<Reference, Integer> refStatistics = new HashMap<>();

(3) The example program already downloads and processes all items and 
properties in the most recent dump. You just have to add the counting. 
Essentially, this is the code I run on every ItemDocument and 
PropertyDocument:


public void countReferences(StatementDocument statementDocument) {
  for (StatementGroup sg : statementDocument.getStatementGroups()) {
for (Statement s : sg.getStatements()) {
  for (Reference r : s.getReferences()) {
if (!refStatistics.containsKey(r)) {
  refStatistics.put(r, 1);
} else {
  refStatistics.put(r, refStatistics.get(r) + 1);
}
  }
}
  }
}

(the example already has a method "countStatements" that does these 
iterations, so you can also insert the code there).
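The containsKey/put pattern above can be tried on its own. Below is a small, self-contained sketch of the same counting idiom; strings stand in for the WDTK `Reference` objects so it runs without the toolkit, and `Map.merge` is a one-line equivalent of the if/else branch (the class name `CountingSketch` is just for illustration):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CountingSketch {

    // Same idiom as countReferences: tally how often each item occurs.
    static <K> Map<K, Integer> count(Iterable<K> items) {
        Map<K, Integer> counts = new HashMap<>();
        for (K item : items) {
            // One-line equivalent of the containsKey/put branch above
            counts.merge(item, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> c = count(List.of("P248", "P577", "P248", "P248"));
        System.out.println(c.get("P248") + " " + c.get("P577")); // prints "3 1"
    }
}
```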



(4) To print the output to a file, I sort the hash map by values first. 
Here's some standard code for how to do this:


try (PrintStream out = new PrintStream(
    ExampleHelpers.openExampleFileOuputStream("reference-counts.txt"))) {
  List<Entry<Reference, Integer>> list =
      new LinkedList<Entry<Reference, Integer>>(
          refStatistics.entrySet());

  Collections.sort(list, new Comparator<Entry<Reference, Integer>>() {
    @Override
    public int compare(Entry<Reference, Integer> o1,
        Entry<Reference, Integer> o2) {
      return o2.getValue().compareTo(o1.getValue());
    }
  });

  int singleRefs = 0;
  for (Entry<Reference, Integer> entry : list) {
    if (entry.getValue() > 1) {
      out.println(entry.getValue() + " x " + entry.getKey());
    } else {
      singleRefs++;
    }
  }
  out.println("... and another " + singleRefs
      + " references that occurred just once.");
} catch (IOException e) {
  e.printStackTrace();
}

This code I put into the existing method writeFinalResults() that is 
called at the end.
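The sort-by-value step can also be exercised standalone. Here is a runnable sketch of the same ordering, written against plain strings instead of WDTK references (class and method names are illustrative, not from the toolkit):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;

public class SortSketch {

    // Order map entries by their count, descending, as in the snippet above.
    static <K> List<Entry<K, Integer>> sortedByCountDesc(Map<K, Integer> counts) {
        List<Entry<K, Integer>> list = new ArrayList<>(counts.entrySet());
        list.sort((a, b) -> b.getValue().compareTo(a.getValue()));
        return list;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("rare ref", 1);
        counts.put("common ref", 593778);
        counts.put("mid ref", 50);
        // Most frequently used reference comes first
        System.out.println(sortedByCountDesc(counts).get(0).getKey()); // prints "common ref"
    }
}
```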


As I said, this runs in about 30 min on my laptop, but downloading the
dump file the first time takes a bit longer.



[1] 
https://github.com/Wikidata/Wikidata-Toolkit/blob/v0.5.0/wdtk-examples/src/main/java/org/wikidata/wdtk/examples/EntityStatisticsProcessor.java


On 07.09.2015 15:49, Markus Krötzsch wrote:

Hi André,

I just made a small counting program with Wikidata Toolkit to count
unique references. Running it on the most recent dump took about 30min.
I uploaded the results:

http://tools.wmflabs.org/wikidata-exports/statistics/20150831/reference-counts-50.txt


The file lists all references that are used at least 50 times, ordered
by number of uses. There were 593778 unique references for 35485364
referenced statements (out of 69942556 statements in total).

416480 of the references are used only once. If you want to see all
references used at least twice, this is a slightly longer file:

http://tools.wmflabs.org/wikidata-exports/statistics/20150831/reference-counts.txt.gz


Best regards,

Markus


On 07.09.2015 13:25, André Costa wrote:

Hi all!

I'm wondering if there is a way (SQL, api, tool or otherwise) for
finding out how often a particular source is used on Wikidata.

The background is a collaboration with two GLAMs where we have used their
open (and CC0) datasets to add and/or source statements on Wikidata for
items on which they can be considered an authority. Now I figured it
would be nice to give them back a number for just how big the impact was.

While I can find out how many items should be affected I couldn't find
an easy way, short of analysing each of these, for how many statements
were affected.

Any suggestions would be welcome.

Some details: Each reference is a P248 claim + P577 claim (where the
latter may change)

Cheers,
André / Lokal_Profil
André Costa | GLAM-tekniker, Wikimedia Sverige | andre.co...@wikimedia.se | +46 (0)733-964574

Support free knowledge, become a member of Wikimedia Sverige.
Read more at blimedlem.wikimedia.se











Re: [Wikidata] Source statistics

2015-09-07 Thread Markus Krötzsch

Hi André,

I just made a small counting program with Wikidata Toolkit to count 
unique references. Running it on the most recent dump took about 30min. 
I uploaded the results:


http://tools.wmflabs.org/wikidata-exports/statistics/20150831/reference-counts-50.txt

The file lists all references that are used at least 50 times, ordered 
by number of uses. There were 593778 unique references for 35485364
referenced statements (out of 69942556 statements in total).


416480 of the references are used only once. If you want to see all 
references used at least twice, this is a slightly longer file:


http://tools.wmflabs.org/wikidata-exports/statistics/20150831/reference-counts.txt.gz

Best regards,

Markus


On 07.09.2015 13:25, André Costa wrote:

Hi all!

I'm wondering if there is a way (SQL, api, tool or otherwise) for
finding out how often a particular source is used on Wikidata.

The background is a collaboration with two GLAMs where we have used their
open (and CC0) datasets to add and/or source statements on Wikidata for
items on which they can be considered an authority. Now I figured it
would be nice to give them back a number for just how big the impact was.

While I can find out how many items should be affected I couldn't find
an easy way, short of analysing each of these, for how many statements
were affected.

Any suggestions would be welcome.

Some details: Each reference is a P248 claim + P577 claim (where the
latter may change)

Cheers,
André / Lokal_Profil
André Costa | GLAM-tekniker, Wikimedia Sverige | andre.co...@wikimedia.se | +46 (0)733-964574

Support free knowledge, become a member of Wikimedia Sverige.
Read more at blimedlem.wikimedia.se









Re: [Wikidata] Source statistics

2015-09-07 Thread Edgard Marx
It's not an updated version, but

dbtrends.aksw.org

best,
Edgard

On Mon, Sep 7, 2015 at 1:25 PM, André Costa wrote:

> Hi all!
>
> I'm wondering if there is a way (SQL, api, tool or otherwise) for finding
> out how often a particular source is used on Wikidata.
>
> The background is a collaboration with two GLAMs where we have used their
> open (and CC0) datasets to add and/or source statements on Wikidata for
> items on which they can be considered an authority. Now I figured it would
> be nice to give them back a number for just how big the impact was.
>
> While I can find out how many items should be affected I couldn't find an
> easy way, short of analysing each of these, for how many statements were
> affected.
>
> Any suggestions would be welcome.
>
> Some details: Each reference is a P248 claim + P577 claim (where the
> latter may change)
>
> Cheers,
> André / Lokal_Profil
> André Costa | GLAM-tekniker, Wikimedia Sverige | andre.co...@wikimedia.se
> | +46 (0)733-964574
>
> Support free knowledge, become a member of Wikimedia Sverige.
> Read more at blimedlem.wikimedia.se
>
>
>
>


[Wikidata] Source statistics

2015-09-07 Thread André Costa
Hi all!

I'm wondering if there is a way (SQL, api, tool or otherwise) for finding
out how often a particular source is used on Wikidata.

The background is a collaboration with two GLAMs where we have used their
open (and CC0) datasets to add and/or source statements on Wikidata for
items on which they can be considered an authority. Now I figured it would
be nice to give them back a number for just how big the impact was.

While I can find out how many items should be affected I couldn't find an
easy way, short of analysing each of these, for how many statements were
affected.

Any suggestions would be welcome.

Some details: Each reference is a P248 claim + P577 claim (where the latter
may change)

Cheers,
André / Lokal_Profil
André Costa | GLAM-tekniker, Wikimedia Sverige | andre.co...@wikimedia.se |
+46 (0)733-964574

Support free knowledge, become a member of Wikimedia Sverige.
Read more at blimedlem.wikimedia.se