Hello,
I am trying to understand a discrepancy between

(i) querying statistics numbers returned by the API (e.g. http://www.grazwiki.at/api.php?action=query&meta=siteinfo&siprop=statistics|namespaces&format=xml) and

(ii) querying the page titles and metadata using the API (e.g. http://www.grazwiki.at/api.php?action=query&list=allpages&aplimit=500&apnamespace=0&format=xml)

(i) reports
12816 pages
6300 articles
46584 edits
6551 images
137 users

(ii) sums up to
17788 pages (sum over all namespaceIDs >=0)
46906 edits (parsed using http://www.grazwiki.at/api.php?action=query&prop=revisions&pageids=100&rvprop=timestamp|user|content|comment&rvlimit=50&format=xml)
72 users (http://www.grazwiki.at/api.php?action=query&list=allusers&aulimit=500&format=xml)
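
For reference, the step-by-step counting in (ii) can be sketched roughly as follows. This is only a minimal sketch, not my exact extraction code: the helper names `fetch_json` and `count_list_items` are made up for illustration, it requests `format=json` instead of `format=xml`, and it assumes the wiki signals further batches either through a `continue` element or through the older `query-continue` element, depending on the MediaWiki version.

```python
import json
import urllib.parse
import urllib.request

API_URL = "http://www.grazwiki.at/api.php"  # endpoint from the example URLs


def fetch_json(params):
    """Perform one API request and return the decoded JSON response."""
    query = urllib.parse.urlencode(dict(params, format="json"))
    with urllib.request.urlopen(API_URL + "?" + query) as response:
        return json.load(response)


def count_list_items(list_name, params, fetch=fetch_json):
    """Count all items of a list query, following continuation batches.

    `fetch` is injectable so the loop can be checked without network access.
    """
    params = dict(params, action="query", list=list_name)
    total = 0
    while True:
        data = fetch(params)
        total += len(data.get("query", {}).get(list_name, []))
        # Assumption: newer releases return "continue", older ones return
        # "query-continue" keyed by the list name. Both are handled here.
        cont = data.get("continue") or data.get("query-continue", {}).get(list_name)
        if not cont:
            return total
        # Feed the continuation parameters into the next request.
        params.update(cont)
```

Calling `count_list_items("allpages", {"apnamespace": 0, "aplimit": 500})` once per namespace ID and summing the results gives the page total, and `count_list_items("allusers", {"aulimit": 500})` gives the user count. Following the continuation at least rules out a truncated batch as the cause of a difference.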

! So I get many more pages when fetching them step by step than the statistics query reports (I only count namespace IDs >= 0, to make sure that I do not count duplicates)
! The API lists 72 users, but the statistics report 137
! The edit counts roughly match, but I have other wikis in the pipeline where the difference is much larger
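
To put numbers on these three points, here is a small sketch that computes the absolute and relative differences from the figures quoted above. The dictionary keys and the function names `deltas` and `relative_deltas` are only illustrative.

```python
# Figures from (i), the statistics query.
reported = {"pages": 12816, "edits": 46584, "users": 137}
# Figures from (ii), the step-by-step enumeration.
enumerated = {"pages": 17788, "edits": 46906, "users": 72}


def deltas(reported, enumerated):
    """Absolute difference per metric: enumerated minus reported."""
    return {key: enumerated[key] - reported[key] for key in reported}


def relative_deltas(reported, enumerated):
    """Difference per metric as a fraction of the reported value."""
    return {
        key: (enumerated[key] - reported[key]) / reported[key]
        for key in reported
    }


print(deltas(reported, enumerated))
# → {'pages': 4972, 'edits': 322, 'users': -65}
```

So the enumeration finds 4972 more pages and 322 more edits than the statistics report, but 65 fewer users.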

At first I thought that the official statistics leave out some namespaces, but I could not find a combination of namespaces that would explain the results. Could the statistics be cached? That would be quite a large difference for a cache, and it would not really explain the difference in the user counts.

I am extracting the wikis for research and want to make sure that I have an accurate extraction/representation of them.

=> I would appreciate any information or ideas that could explain the differences.

Regards,

Rüdiger

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
