Re: [Wikidata-bugs] [Maniphest] [Updated] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure

2015-12-08 Thread Christopher Johnson
Since P143 is primarily a "reference type" property, it should be used where
the reference node is the subject (apparently with a few exceptions). The
query only evaluates the arity of reference nodes as objects, so the
results for P143 are expected.
On 8 Dec 2015 1:09 pm, "Addshore" wrote:

> Addshore added a comment.
>
> I am still confused. Running this for P143 gives the following:
>
>   nrefs count
>   0 920
>   1 8
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Updated] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure

2015-12-08 Thread Addshore
Addshore added a comment.

I am still confused. Running this for P143 gives the following:

  nrefs count
  0 920
  1 8


TASK DETAIL
  https://phabricator.wikimedia.org/T117234

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher, Addshore
Cc: Lydia_Pintscher, StudiesWorld, Addshore, Christopher, Aklapper, 
Wikidata-bugs, aude, Mbch331





[Wikidata-bugs] [Maniphest] [Updated] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure

2015-12-04 Thread Christopher
Christopher added a comment.

I think that you may have missed the point. I added the $property variable in
the above query to indicate that this has to be run for **every** property.
p:P227 is a random example.




[Wikidata-bugs] [Maniphest] [Updated] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure

2015-12-03 Thread Christopher
Christopher added a comment.

So basically a clever adaptation of what I suggested in
https://phabricator.wikimedia.org/T119775 to get statements referenced to the
Wikipedias. It works, but it seems a very hacky approach around the core problem
of not having a way to ask how many references a statement has.

So, just so I am clear on this, is a statement-to-reference triple always
unique in the dataset? I was under the assumption that a single reference
statement could potentially be duplicated with different hashes, which is why
DISTINCT would need to be enforced on the subject. In theory, there should
also be metadata on the reference that identifies it as "the latest" version,
and previous revisions should not simply be replaced. This is another issue, I
guess.

Imho, there are clear problems with the reference implementation that should be
addressed and not just worked around, which is why I created
https://phabricator.wikimedia.org/T120166 to start. Is the objective here
just to produce some numbers or to improve the quality of the data?




[Wikidata-bugs] [Maniphest] [Updated] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure

2015-12-03 Thread Addshore
Addshore added a comment.

So a rough version of my approach can be seen at
https://github.com/wikimedia/analytics-limn-wikidata-data/blob/master/graphite/sparql/references.php
First, get all properties that should be used as references:

  SELECT ?s WHERE {?s wdt:P31/wdt:P279* wd:Q18608359}
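
A minimal sketch of how the property-list query above could be sent to a
SPARQL endpoint, written in Python rather than the PHP used in references.php.
The endpoint is the public Wikidata Query Service; the helper name
build_sparql_url and the choice of a GET request with format=json are
assumptions for illustration, not taken from the script.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Public Wikidata Query Service endpoint.
ENDPOINT = "https://query.wikidata.org/sparql"

# The property-list query from the comment above, unchanged.
PROPERTY_QUERY = "SELECT ?s WHERE {?s wdt:P31/wdt:P279* wd:Q18608359}"


def build_sparql_url(query, endpoint=ENDPOINT):
    """Return a GET URL asking the endpoint to run `query` and reply in JSON.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    return endpoint + "?" + urlencode({"query": query, "format": "json"})


url = build_sparql_url(PROPERTY_QUERY)

# Round-trip check: decoding the URL gives back the original query text.
decoded = parse_qs(urlsplit(url).query)
print(decoded["query"][0] == PROPERTY_QUERY)
```

The sketch only builds the URL and does not contact the endpoint, so it can be
run without network access.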

Then query the counts for each:

$query .= "SELECT (count(?s) AS ?scount) WHERE {";
$query .= "?wdref 
 ?x .";
$query .= "?s prov:wasDerivedFrom ?wdref";
$query .= "}";

Of course, this runs into the issue that a single statement can be included in
multiple counts.
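
To illustrate the double counting, here is a small Python sketch with made-up
statement IDs. P143 and P248 are used only as example labels, and the data is
hypothetical. It assumes each per-property query yields the set of statements
with at least one reference using that property.

```python
# Hypothetical query results: for each reference property, the statements
# that have at least one reference using that property.
statements_by_property = {
    "P143": {"stmt-1", "stmt-2", "stmt-3"},
    "P248": {"stmt-3", "stmt-4"},
}

# Per-property counts, as separate per-property queries would report them.
per_property_counts = {
    prop: len(stmts) for prop, stmts in statements_by_property.items()
}

# Summing the per-property counts includes stmt-3 twice.
summed = sum(per_property_counts.values())

# The union of the sets counts every statement exactly once.
distinct = len(set().union(*statements_by_property.values()))

print(per_property_counts)  # {'P143': 3, 'P248': 2}
print(summed, distinct)     # 5 4
```

The gap between the two numbers is the number of extra appearances of
statements that use more than one reference property.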

So instead of this I will simply query for the statements that are referenced
by the property (a query which completes, but on the public interface times out
when sending back the result) and then do some post-processing to figure out
the number that we actually want.
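
The post-processing step could look roughly like the Python sketch below. It
assumes the query returns one row per (statement, reference) pair, possibly
with duplicates, and that the wanted number is how many statements have a
given number of distinct references (the shape of the nrefs/count table
elsewhere in this thread). The row format and function name are assumptions
for illustration; the actual script may differ.

```python
from collections import Counter


def references_per_statement(rows):
    """Map number of distinct references -> number of statements.

    Hypothetical helper: `rows` is an iterable of (statement, reference) pairs.
    """
    refs_by_statement = {}
    for statement, reference in rows:
        # A set drops duplicate (statement, reference) rows, which is the
        # work DISTINCT would otherwise have to do inside the query.
        refs_by_statement.setdefault(statement, set()).add(reference)
    return Counter(len(refs) for refs in refs_by_statement.values())


# Hypothetical rows; the identifiers are made up.
rows = [
    ("stmt-1", "ref-a"),
    ("stmt-1", "ref-a"),  # duplicate row
    ("stmt-1", "ref-b"),
    ("stmt-2", "ref-a"),
]

histogram = references_per_statement(rows)
print(sorted(histogram.items()))  # [(1, 1), (2, 1)]
```

Statements without any reference never appear in such rows, so a zero bucket
would have to come from a separate count of all statements.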

Me doing this is just blocked on https://phabricator.wikimedia.org/T120010 now.

Also, when digging into all of these queries, it turns out that adding DISTINCT is
what actually causes them to run over the execution time limit.
If you remove the DISTINCT from your query, it will actually complete rather
quickly.




[Wikidata-bugs] [Maniphest] [Updated] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure

2015-12-02 Thread Christopher
Christopher added a blocking task: T120166: Semantically define arity of 
statement -> reference relations.



[Wikidata-bugs] [Maniphest] [Updated] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure

2015-11-20 Thread Addshore
Addshore added a comment.

Also, for reference, I have just created
https://phabricator.wikimedia.org/T119182, which covers the notes for the
general topic of data completeness.




[Wikidata-bugs] [Maniphest] [Updated] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure

2015-11-20 Thread Addshore
Addshore added a blocked task: T119182: Track Wikidata data completeness.



[Wikidata-bugs] [Maniphest] [Updated] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure

2015-11-06 Thread Lydia_Pintscher
Lydia_Pintscher added a project: Wikidata.
Lydia_Pintscher added a subscriber: Lydia_Pintscher.
