On Fri, Jun 13, 2014 at 2:51 PM, Jeroen De Dauw <[email protected]> wrote:
> Hey,
>
>> Are the claims a large part of the network traffic for items you are
>> processing? Some client time might be saved by lazy loading the claim
>> objects from _content. The claims data is even smaller when using raw
>> revisions instead of the API JSON.
>
> Is the size of the serialization something that is causing problems?
Not serious problems IMO. e.g. Q60 is 54 KB via the API, but that is
<10 KB gzipped:
https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q60&languages=en&format=json

Fetching only the claims 'almost' halves the network traffic, but that
results in the pywikibot API cache not being as efficient if several
labels or sitelinks are also fetched:
https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q60&languages=en&format=json&props=claims

If a bot is only working with Wikidata and a single language wiki, this
is the 'optimal' query, which is 5.7 KB gzipped:
https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q60&languages=en&format=json&languages=en&sitefilter=enwiki&ungroupedlist

Prefetching many items also reduces the network activity, as it lets
gzip work harder.

--
John Vandenberg

_______________________________________________
Pywikipedia-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l
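
[Editor's sketch of the queries discussed above, using the plain
'requests' library rather than pywikibot; the fetch() helper and the
extra item ids (Q64, Q90) are illustrative assumptions, not anything
from the thread.]

    # Sketch: compare full, claims-only, filtered and batched
    # wbgetentities requests against the Wikidata API.
    import requests

    API = 'https://www.wikidata.org/w/api.php'

    def fetch(extra):
        # requests sends Accept-Encoding: gzip by default, so the JSON
        # is compressed on the wire and decompressed client-side.
        params = {'action': 'wbgetentities', 'format': 'json',
                  'languages': 'en'}
        params.update(extra)
        r = requests.get(API, params=params)
        r.raise_for_status()
        return r

    full = fetch({'ids': 'Q60'})                             # everything
    claims = fetch({'ids': 'Q60', 'props': 'claims'})        # claims only
    narrow = fetch({'ids': 'Q60', 'sitefilter': 'enwiki'})   # one language, one wiki

    # Batching several ids into one request lets gzip find more
    # redundancy than it would across many small responses.
    batch = fetch({'ids': 'Q60|Q64|Q90'})

    for name, r in (('full', full), ('claims', claims),
                    ('narrow', narrow), ('batch', batch)):
        # Content-Length is the compressed size on the wire; it may be
        # absent if the server uses chunked transfer encoding.
        print(name, len(r.content), 'bytes decoded,',
              r.headers.get('Content-Length'), 'bytes on the wire')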
