[Wikidata-bugs] [Maniphest] [Commented On] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure

2015-12-16 Thread Addshore
Addshore added a comment.

As far as I can see we are now covering all of the parts of wikidata-todo/stats 
that we wanted!


TASK DETAIL
  https://phabricator.wikimedia.org/T117234

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher, Addshore
Cc: Wikidata-bugs, Lydia_Pintscher, StudiesWorld, Addshore, Christopher, 
Aklapper, aude, Mbch331



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure

2015-12-08 Thread Addshore
Addshore added a comment.

Okay, I'm struggling to see which part of the todo stats this is covering




Re: [Wikidata-bugs] [Maniphest] [Commented On] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure

2015-12-08 Thread Christopher Johnson
Obviously, a main aspect of the data presented in the todo stats is
"referenced statements" (even though the chart labels there are wrong).
Whether or not this query maps directly to todo is actually not the key
issue.  Clearly, measuring data quality requires that the arity of the
statement-to-reference relationship is quantified.  Right?

This assumption is based on Wikipedia's policy of maintaining an NPOV.  And,
unfortunately, all unreferenced statements contain a "bias" that makes the
data theoretically worthless, even though they may in fact be "correct".

On 8 Dec 2015 1:52 pm, "Addshore" wrote:

> Addshore added a comment.
>
> Okay, I'm struggling to see which part of the todo stats this is covering


[Wikidata-bugs] [Maniphest] [Commented On] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure

2015-12-04 Thread Christopher
Christopher added a comment.

@Addshore Some progress was made on this in 
https://phabricator.wikimedia.org/T120166.  The only "practical" way to get the 
statement and reference metrics is to facet the data by property.  It is just 
not possible to run counting queries against the whole database and get any 
reasonable response time.

This means that any large domain or range metric counts should iterate over all 
1800+ properties with separate SPARQL calls and then aggregate the numbers.  We 
can do this for the statement -> reference arity with:

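  PREFIX wikibase: <http://wikiba.se/ontology#>
  PREFIX wd: <http://www.wikidata.org/entity/>
  PREFIX wdt: <http://www.wikidata.org/prop/direct/>
  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
  PREFIX prov: <http://www.w3.org/ns/prov#>
  PREFIX p: <http://www.wikidata.org/prop/>
  
  SELECT ?nrefs (COUNT(?wds) AS ?count) WHERE {
    {
      SELECT ?wds (COUNT(DISTINCT(?ref)) AS ?nrefs)
      WHERE {
        ?item p:$property ?wds .
        OPTIONAL { ?wds prov:wasDerivedFrom ?ref } .
      } GROUP BY ?wds
    }
  } GROUP BY ?nrefs
  ORDER BY ?nrefs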
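
The per-property iteration described above could be sketched roughly as follows. This is only an illustration, not the R or PHP scripts mentioned in this thread: `build_query`, `aggregate_reference_histogram` and the `run_query` callable are hypothetical names, and how `run_query` talks to the SPARQL endpoint is left to the caller.

```python
from collections import Counter
from string import Template

# The query from the message above; $property is filled in per property.
QUERY_TEMPLATE = Template("""
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX p: <http://www.wikidata.org/prop/>

SELECT ?nrefs (COUNT(?wds) AS ?count) WHERE {
  {
    SELECT ?wds (COUNT(DISTINCT(?ref)) AS ?nrefs)
    WHERE {
      ?item p:$property ?wds .
      OPTIONAL { ?wds prov:wasDerivedFrom ?ref } .
    } GROUP BY ?wds
  }
} GROUP BY ?nrefs
ORDER BY ?nrefs
""")


def build_query(property_id):
    """Return the SPARQL text for one property, e.g. 'P227'."""
    return QUERY_TEMPLATE.substitute(property=property_id)


def aggregate_reference_histogram(property_ids, run_query):
    """Sum the per-property {nrefs: statement count} results.

    run_query is a hypothetical callable supplied by the caller: it takes
    SPARQL text and returns a dict mapping nrefs to a statement count.
    """
    total = Counter()
    for property_id in property_ids:
        total.update(run_query(build_query(property_id)))
    return dict(total)
```

Keeping the endpoint call behind `run_query` means the aggregation can be checked with canned results before any of the 1800+ real queries are sent.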

Would you do this in PHP?  If you want to handle this, just let me know, 
otherwise we could reuse the bulk sparql scripts that I have already done in R.

In addition to tracking aggregates, it would also be useful to show all 
property counts in a table like the one I did here: 
http://wdm.wmflabs.org/?t=wikidata_property_usage_count.




[Wikidata-bugs] [Maniphest] [Commented On] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure

2015-12-04 Thread Addshore
Addshore added a comment.

Your query above is slightly off somewhere; the one below is actually correct!

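  PREFIX wikibase: <http://wikiba.se/ontology#>
  PREFIX wd: <http://www.wikidata.org/entity/>
  PREFIX wdt: <http://www.wikidata.org/prop/direct/>
  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
  PREFIX prov: <http://www.w3.org/ns/prov#>
  PREFIX p: <http://www.wikidata.org/prop/>
  
  SELECT ?nrefs (COUNT(?wds) AS ?count) WHERE {
    {
      SELECT ?wds (COUNT(DISTINCT(?ref)) AS ?nrefs)
      WHERE {
        ?item p:P227 ?wds .
        OPTIONAL { ?wds prov:wasDerivedFrom ?ref } .
      } GROUP BY ?wds
    }
  } GROUP BY ?nrefs
  ORDER BY ?nrefs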

So this is listing the number of statements that have n references?
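
If the result is read that way, each row is a pair (n, number of statements with n references), and the usual totals fall out of it directly. A minimal sketch, assuming the rows have already been loaded into a dict; `summarize_reference_histogram` is a hypothetical helper and the numbers are made up:

```python
def summarize_reference_histogram(histogram):
    """Summarise a {nrefs: number of statements with nrefs references} dict."""
    statements = sum(histogram.values())
    unreferenced = histogram.get(0, 0)
    # Each statement with n references contributes n statement/reference pairs.
    reference_links = sum(nrefs * count for nrefs, count in histogram.items())
    return {
        "statements": statements,
        "referenced": statements - unreferenced,
        "unreferenced": unreferenced,
        "reference_links": reference_links,
    }


# Made-up numbers for illustration only, not real Wikidata counts.
example = summarize_reference_histogram({0: 10, 1: 6, 2: 3})
print(example["statements"], example["referenced"], example["unreferenced"])  # 19 9 10
```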

> In addition to tracking aggregates, it would also be useful to show all 
> property counts in a table like the one I did here: 
> http://wdm.wmflabs.org/?t=wikidata_property_usage_count.

Table support is in the next version of Grafana.

I can look at adding this query to our collection next week!




[Wikidata-bugs] [Maniphest] [Commented On] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure

2015-12-03 Thread Addshore
Addshore added a comment.

You do need DISTINCT if you want the correct number there!  I was simply 
pointing out that DISTINCT is what makes the query a long one, not actually the 
count.

I think the issue with potential duplication is being addressed and the 
datasets are being rebuilt this week: 
https://phabricator.wikimedia.org/T116622#1839670

My goal is to have these numbers by the end of the year, hence my working around 
any potential problems right now!




[Wikidata-bugs] [Maniphest] [Commented On] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure

2015-12-02 Thread Christopher
Christopher added a comment.

The only way to get a count of statements with references in the current 
model/format is like this:

  PREFIX wd: <http://www.wikidata.org/entity/>
  PREFIX wdt: <http://www.wikidata.org/prop/direct/>
  PREFIX prov: <http://www.w3.org/ns/prov#>
  
  SELECT (count(distinct(?s)) AS ?scount) WHERE {
    ?s prov:wasDerivedFrom ?wdref .
  }

This query is super slow!  In fact, it has crashed Blazegraph: with an 
unlimited query timeout, it uses all of the 8 GB of allocated heap space.

Since a single statement can have multiple references, just counting 
prov:wasDerivedFrom using estimated cardinality only returns a count of all 
references.
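
A toy illustration of that difference, using made-up triples rather than real Wikidata data: counting the predicate gives one hit per statement/reference pair, while the distinct count gives one hit per referenced statement.

```python
# Made-up (statement, predicate, reference) triples for illustration.
triples = [
    ("s1", "prov:wasDerivedFrom", "ref-a"),
    ("s1", "prov:wasDerivedFrom", "ref-b"),
    ("s2", "prov:wasDerivedFrom", "ref-a"),
]

# What a range count on the predicate sees: one hit per triple.
predicate_count = sum(1 for _, p, _ in triples if p == "prov:wasDerivedFrom")

# What count(distinct(?s)) asks for: one hit per referenced statement.
referenced_statements = len({s for s, p, _ in triples if p == "prov:wasDerivedFrom"})

print(predicate_count, referenced_statements)  # prints "3 2"
```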

I asked the experts on the mailing list how we can address this reference query 
problem, and no one has responded with anything useful yet.  This is an issue 
that could be handled in the Wikibase RDF serialization with any number of 
different solutions.  In addition to the idea of introducing a null reference 
object, another possibility would be to create a new attribute like 
wikibase:hasReference with a boolean datatype constraint.  I will create a new 
ticket for this issue, I guess.




[Wikidata-bugs] [Maniphest] [Commented On] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure

2015-12-01 Thread Addshore
Addshore added a comment.

So lots of this is now done using the query service.
We need to assess what has been missed / is missing and doesn't already have a 
ticket on the board




[Wikidata-bugs] [Maniphest] [Commented On] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure

2015-11-26 Thread Christopher
Christopher added a comment.

I am blocked on this by several problems with the data model/ontology.  The 
question of the relationship of the data model and the RDF node definitions is 
a bit complicated, perhaps more so than it should be.  A reference is a special 
type of statement defined by its relationship to other statements.  An 
"unreferenced statement" is undefined in the ontology and in the RDF format.  
All statements **should** in practice have a reference node.  But this is not 
an enforceable constraint in the data model apparently.

I think that when a statement is born, it should also create a reference 
"placeholder" or blank node in the RDF.  With this information in the RDF, 
counting these "bad" statements would be much easier.




[Wikidata-bugs] [Maniphest] [Commented On] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure

2015-11-21 Thread Addshore
Addshore added a comment.

In https://phabricator.wikimedia.org/T117234#1820362, @Christopher wrote:

> OK.  So the title "Referenced Statements by Statement Type" is just wrong 
> then.  Rather, it shows "**All Statements** by Type".
>
> | Date       | itemlink   | string     | globecoordinate | time      | quantity | somevalue | novalue | Total      |
> | 2015-10-19 | 46,177,560 | 20,631,391 | 2,363,191       | 3,588,295 | 470,476  | 9,630     | 4,436   | 73,244,979 |


Yes, wow, how has no one spotted that before?




[Wikidata-bugs] [Maniphest] [Commented On] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure

2015-11-21 Thread Christopher
Christopher added a comment.

Truthy statement counts per Item can be done like this:

  PREFIX wd: <http://www.wikidata.org/entity/>
  
  SELECT (count(distinct(?o)) AS ?ocount) WHERE {
    wd:Q7239 ?p ?o .
    FILTER (STRSTARTS(STR(?p), "http://www.wikidata.org/prop/direct"))
  }

Labels per Item like this:

  PREFIX wd: <http://www.wikidata.org/entity/>
  
  SELECT (count(distinct(?o)) AS ?ocount) WHERE {
    wd:Q7239 ?p ?o .
    FILTER (REGEX(STR(?p), "http://www.w3.org/2000/01/rdf-schema#label"))
  }

Descriptions per Item:

  PREFIX wd: <http://www.wikidata.org/entity/>
  
  SELECT (count(distinct(?o)) AS ?ocount) WHERE {
    wd:Q7239 ?p ?o .
    FILTER (REGEX(STR(?p), "http://schema.org/description"))
  }

Sitelinks per Item:

  PREFIX wd: <http://www.wikidata.org/entity/>
  
  SELECT (count(distinct(?s)) AS ?ocount) WHERE {
    ?s ?p wd:Q7239 .
    FILTER (REGEX(STR(?p), "http://schema.org/about"))
  }
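
To run the first three counts for items other than Q7239, the item ID can be substituted into a shared template. A minimal sketch: `per_item_query`, `PREDICATES` and the metric names are hypothetical, STRSTARTS is used for all three filters here in place of REGEX, and the sitelink count is left out because its triple pattern runs in the other direction.

```python
import re
from string import Template

# Only plain item IDs such as "Q7239" are accepted, so that nothing else
# can end up inside the query text.
ITEM_ID = re.compile(r"Q[1-9][0-9]*")

# Same shape as the queries above; only the item and predicate prefix vary.
OUTGOING = Template("""
PREFIX wd: <http://www.wikidata.org/entity/>

SELECT (count(distinct(?o)) AS ?ocount) WHERE {
  wd:$item ?p ?o .
  FILTER (STRSTARTS(STR(?p), "$predicate"))
}
""")

PREDICATES = {
    "truthy_statements": "http://www.wikidata.org/prop/direct",
    "labels": "http://www.w3.org/2000/01/rdf-schema#label",
    "descriptions": "http://schema.org/description",
}


def per_item_query(item_id, metric):
    """Return the SPARQL text counting one metric for one item."""
    if not ITEM_ID.fullmatch(item_id):
        raise ValueError("not an item ID: %r" % (item_id,))
    return OUTGOING.substitute(item=item_id, predicate=PREDICATES[metric])
```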




[Wikidata-bugs] [Maniphest] [Commented On] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure

2015-11-20 Thread Christopher
Christopher added a comment.

OK.  So the title "Referenced Statements by Statement Type" is just wrong then. 
Rather, it shows "**All Statements** by Type".

| Date       | itemlink   | string     | globecoordinate | time      | quantity | somevalue | novalue | Total      |
| 2015-10-19 | 46,177,560 | 20,631,391 | 2,363,191       | 3,588,295 | 470,476  | 9,630     | 4,436   | 73,244,979 |




[Wikidata-bugs] [Maniphest] [Commented On] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure

2015-11-20 Thread Addshore
Addshore added a comment.

> What is still murky to me, and I think possibly wrong with the todo/stats 
> data, is the "Referenced statements by statement type". Something does not 
> add up there because the total should not be greater than the sum of 
> "Statements referenced to Wikipedia by statement type" and "Statements 
> referenced to other sources by statement type" ?

Well, I noticed something odd here yesterday: "no value" and "some value" are 
counted as statement types, which they are not!

So for the wikidata-todo stats, the total count of statements by type, minus 
the no-value count, minus the some-value count, should give you the number of 
statements.
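
A worked version of that subtraction, using the 2015-10-19 row of the todo/stats table quoted elsewhere in this thread. It assumes the "Total" column is the plain sum of the seven per-type columns, which the arithmetic bears out:

```python
# Figures from the 2015-10-19 row of the todo/stats table quoted in this thread.
by_type = {
    "itemlink": 46_177_560,
    "string": 20_631_391,
    "globecoordinate": 2_363_191,
    "time": 3_588_295,
    "quantity": 470_476,
    "somevalue": 9_630,
    "novalue": 4_436,
}

total = sum(by_type.values())
with_value = total - by_type["novalue"] - by_type["somevalue"]

print(total)       # 73244979, matching the "Total" column
print(with_value)  # 73230913
```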




[Wikidata-bugs] [Maniphest] [Commented On] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure

2015-11-20 Thread Addshore
Addshore added a comment.

> to find out how many statements do not have references is currently not 
> possible.

We may not actually need this: for example, if we know the number of 
statements and the number of referenced statements, we also know the number of 
unreferenced statements.




[Wikidata-bugs] [Maniphest] [Commented On] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure

2015-11-20 Thread Christopher
Christopher added a comment.

True, a statement is either referenced or "unreferenced".  Getting the number 
of referenced statements (currently 41,735,203) is easy and fast with:

  curl -G https://query.wikidata.org/bigdata/namespace/wdq/sparql \
    --data-urlencode ESTCARD \
    --data-urlencode 'p='

So we use the total of wikibase:Statement objects to represent the total number 
of statements and subtract referenced statements to get "unreferenced 
statements".

What is still murky to me, and I think possibly wrong with the todo/stats data, 
is the "Referenced statements by statement type".  Something does not add up 
there because the total should not be greater than the sum of "Statements 
referenced to Wikipedia by statement type" and "Statements referenced to other 
sources by statement type" ?

For getting counts of objects per item, does this mean running 19M separate 
queries, or is there another way?  Creating a script to do this would be very 
similar to the property distribution method that I have already done, I guess.  
Basically ask "list all of the items" and then "lapply(items, count labels, 
statements, links, descriptions)".




[Wikidata-bugs] [Maniphest] [Commented On] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure

2015-11-20 Thread Christopher
Christopher added a comment.

OK.  I may have found an answer to the question of wildcard "Prefix Matching" 
that is necessary in order to query for number of statements in an item.

  PREFIX bds: <http://www.bigdata.com/rdf/search#>
  PREFIX wikibase: <http://wikiba.se/ontology#>
  PREFIX wd: <http://www.wikidata.org/entity/>
  
  SELECT (count(distinct(?s)) AS ?scount)
  WHERE {
    wd:Q20903715 ?p wikibase:Item .
    ?s bds:search "wd:statement*" .
  }

This requires FullTextSearch 
(https://wiki.blazegraph.com/wiki/index.php/FullTextSearch) to be enabled (it 
is not on query.wikidata.org).  I will test on labs.




[Wikidata-bugs] [Maniphest] [Commented On] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure

2015-11-19 Thread Christopher
Christopher added a comment.

Yes.  It seems I need to disable the 10 minute query timeout set here first: 
https://github.com/wikimedia/wikidata-query-rdf/blob/b3e646284f0b74131bce99a1b7d5fc6bfe675ec1/war/src/config/web.xml#L55

A fat query like this:

  PREFIX wikibase: <http://wikiba.se/ontology#>
  PREFIX prov: <http://www.w3.org/ns/prov#>
  
  SELECT (count(distinct(?wds)) AS ?scount) WHERE {
    ?wds ?p wikibase:Statement .
    OPTIONAL {
      ?wds1 prov:wasDerivedFrom ?o .
      FILTER (?wds1 = ?wds) .
    }
    FILTER (!bound(?wds1)) .
  }

to find out how many statements do not have references is currently not 
possible.

There may be a better way to ask for this, but the way that the data is coded 
does not really facilitate type joins.  An important point is that 
wikidata-todo/stats, and possibly the standing perception of the data, assumes 
an iterable hierarchy.  But RDF does not impose a hierarchy.  So an Item does 
not "contain" statements, and statements do not "contain" references.

The relationship between statements and references is difficult to query by 
type, because a binding triple looks like this:

  wd:statement/Q20913766-CD281698-E1D0-43A1-BEEA-E2A60E5A88F1
    prov:wasDerivedFrom
    wdref:39f3ce979f9d84a0ebf09abe1702bf22326695e9

Note that simply counting the frequency of 
http://www.w3.org/ns/prov#wasDerivedFrom and comparing that to the frequency of 
wikibase:Statement would provide a kind of global ratio that is a fast and easy 
alternative to counting individual statements without references.
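
As a rough illustration of that global ratio, here it is computed from the two estimated cardinalities reported for wdm-rdf in the 2015-11-07 update of this thread. Note that it measures reference links per statement, not the share of statements that have a reference, since a statement with several references is counted several times.

```python
# Estimated cardinalities reported for wdm-rdf in the 2015-11-07 update.
statements = 74_709_111        # o = wikibase:Statement
was_derived_from = 38_985_221  # p = prov:wasDerivedFrom

# Reference links per statement across the whole dataset.
ratio = was_derived_from / statements
print(round(ratio, 4))  # 0.5218
```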

I am rebuilding wdm-rdf now with the new Munger and no query timeout.

Also, I will load the dump from 17 November, so that the updater has some 
chance to sync.  It had fallen back to 14 days old, and I doubt that it would 
ever have caught up.




[Wikidata-bugs] [Maniphest] [Commented On] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure

2015-11-19 Thread Addshore
Addshore added a comment.

Any progress here?




[Wikidata-bugs] [Maniphest] [Commented On] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure

2015-11-09 Thread Christopher
Christopher added a comment.

No.  The code from the blocking task adds an option to not filter item, 
statement, value and reference rdf:types in the munger.  I decided not to wait 
for this, so that I could get started, but having it in master is very helpful 
going forward.

Having these types on live WDQS would require a complete rebuild of its data, 
which takes a long time.  The wdm-rdf instance is a clone that includes these 
types, and should eventually sync up to production (hopefully in another 5 or 6 
days ...  24 hours of edits takes approx. 12 hours to process).

It is possible, however, to do estimated cardinality queries on live WDQS for 
the property usage counts and anything else other than these primary types.




[Wikidata-bugs] [Maniphest] [Commented On] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure

2015-11-09 Thread Addshore
Addshore added a comment.

As the above blocking task has been resolved, is it possible to perform these 
on the live query service?




[Wikidata-bugs] [Maniphest] [Commented On] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure

2015-11-07 Thread Christopher
Christopher added a comment.

Update:  All data loaded into Blazegraph (it took over 24 hours).  Sync now 
running and up to 27 October.

Using Fast Range Counts returns counts of content objects instantly.

Examples:

  curl -G http://wdm-rdf.wmflabs.org/bigdata/namespace/wdq/sparql \
    --data-urlencode ESTCARD \
    --data-urlencode 'o=http://wikiba.se/ontology#Item'

Number of Items: 18,733,307

  curl -G http://wdm-rdf.wmflabs.org/bigdata/namespace/wdq/sparql \
    --data-urlencode ESTCARD \
    --data-urlencode 'o=http://wikiba.se/ontology#Statement'

Number of Statements: 74,709,111

  curl -G http://wdm-rdf.wmflabs.org/bigdata/namespace/wdq/sparql \
    --data-urlencode ESTCARD \
    --data-urlencode 'p=http://www.w3.org/ns/prov#wasDerivedFrom'

Number of Predicate wasDerivedFrom: 38,985,221
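
For scripting these lookups, the same GET URLs can be built without curl. A minimal sketch using only the Python standard library; `estcard_url` is a hypothetical helper, and it only reproduces the URL that the curl calls above would request, without sending it or parsing the response:

```python
from urllib.parse import urlencode


def estcard_url(endpoint, **term):
    """Build the GET URL that the curl -G calls above would request.

    term is one keyword such as o="http://wikiba.se/ontology#Item" or
    p="http://www.w3.org/ns/prov#wasDerivedFrom".
    """
    # ESTCARD is a bare flag; the term is URL-encoded like --data-urlencode.
    return endpoint + "?ESTCARD&" + urlencode(term)


url = estcard_url(
    "http://wdm-rdf.wmflabs.org/bigdata/namespace/wdq/sparql",
    o="http://wikiba.se/ontology#Item",
)
print(url)
```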

Trending these kinds of objects should show interesting usage frequency 
patterns.

