I think this blog post would help us a lot; it suggests using zlib instead of gzip for streaming decompression:
http://rationalpie.wordpress.com/2010/06/02/python-streaming-gzip-decompression/
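Below is a minimal sketch of streaming gzip decompression with the standard library's zlib module. It is our own illustration, not code from the linked post. It assumes the response body arrives as an iterable of byte chunks, and gunzip_stream is a hypothetical helper name. Passing wbits=16 + zlib.MAX_WBITS tells zlib to expect a gzip header and trailer, so each chunk can be decompressed as it arrives instead of after the whole body has been buffered.

```python
import gzip
import zlib
from typing import Iterable, Iterator


def gunzip_stream(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Incrementally decompress a gzip-encoded byte stream (hypothetical helper)."""
    # 16 + MAX_WBITS: expect a gzip wrapper rather than a raw zlib stream.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    for chunk in chunks:
        data = decompressor.decompress(chunk)
        if data:
            yield data
    # Emit anything zlib is still holding once the input is exhausted.
    tail = decompressor.flush()
    if tail:
        yield tail


if __name__ == "__main__":
    # Simulate a repetitive JSON body arriving over the network in 64-byte pieces.
    payload = b'{"labels": {"en": "Panthera leo"}}' * 200
    body = gzip.compress(payload)
    pieces = (body[i:i + 64] for i in range(0, len(body), 64))
    assert b"".join(gunzip_stream(pieces)) == payload
```

Because the decompressor keeps its own state between calls, chunk boundaries can fall anywhere in the gzip stream, including inside the header.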
What do you think?

On Sat, Jul 5, 2014 at 6:16 PM, John Mark Vandenberg <[email protected]> wrote:
> Likewise, thank you Frances for this evaluation. It is very helpful.
>
> Are we sure that gzip isn't occurring by default? I started to investigate
> this a few weeks ago and confirmed that httplib2 defaults to gzip, but I
> didn't verify that pywikibot core isn't meddling with that default.
>
> This is quite important for the performance of Wikidata, as it contains a
> lot of repetition in the JSON output, and that repetition increases as the
> item grows. E.g. new labels and sitelinks of articles about species are
> usually the same as the labels/sitelinks in other languages.
>
> http://lists.wikimedia.org/pipermail/pywikipedia-l/2014-June/008886.html
>
> On Sat, Jul 5, 2014 at 1:15 AM, Amir Ladsgroup <[email protected]> wrote:
> > Thank you, this is helpful. I want to work on some of them:
> >
> > * Use gzip compression by default
> > * Make it easy to add a user-agent header and give examples of a good one
> >   in the documentation for it (see
> >   https://meta.wikimedia.org/wiki/User-agent_policy)
> > * Add Python 3 compatibility (this is in progress for the core branch)
> > * Package pywikibot for installation from PyPI via pip install
> > * Make the initial installation process lighter-weight:
> >   * Design pwb.py with user experience in mind, particularly valuing
> >     feedback from new or one-time users during the redesign process
> >   * Make it possible to install into a virtualenv without putting a
> >     config file in the home directory
> >   * Make it possible to run import pywikibot without having to log in
> >
> > Iterating over a list and calling the API for each item is an inefficient
> > use of API calls. Efficiency in API usage is an important feature of a gold
> > standard library. If you are interested in gold standard status, consider
> > making this more efficient by combining API calls as much as possible (e.g.
> > using generators and combining results: titles=title1|title2|...). One
> > option may be a constructor method that collects Page requests and enables
> > larger, less frequent API calls. It may be possible to take advantage of
> > the database-like structure of the MediaWiki API and help users save
> > bandwidth.
> >
> > Process-related
> >
> > * Foster a hospitable attitude on pywikipedia-l, especially to new and/or
> >   inexperienced users. Consider agreeing on community standards for
> >   interaction; the Hacker School social rules may be a useful starting
> >   point.
> > * Create more centralized and updated documentation, including:
> >   * Easy-to-find, complete, and intuitive installation instructions,
> >     including installing via pip and into virtual environments
> >   * Code samples for common tasks, including queries and edits
> >   * Documentation for people who aren't running bots with existing
> >     scripts (particularly researchers and beginning/intermediate bot
> >     writers)
> >   * Links in method documentation to the corresponding API subpages
> > * Streamline or add more resources to the patch review process to reduce
> >   the backlog of unreviewed patches
> >
> > If someone is willing to help out, let's work!
> >
> > On Fri, Jul 4, 2014 at 2:36 AM, Frances Hocutt <[email protected]> wrote:
> >> Hello all,
> >>
> >> This summer I am working on a project to evaluate and improve the
> >> available MediaWiki web API client libraries. As pywikibot met the
> >> initial criteria of quality, features, and development status, I chose
> >> to evaluate it in more depth. There is now a "gold standard"[1] that
> >> will be used to find and enable the listing of particularly
> >> well-designed and easy-to-use MediaWiki web API client libraries. I've
> >> now evaluated several Python libraries against this standard and
> >> suggested additions and changes that would help them meet the
> >> standard.
> >>
> >> First, thank you all for contributing to pywikibot and its community of
> >> users!
> >>
> >> My evaluation for pywikibot is posted here.[2] Pywikibot is
> >> impressively full-featured (including Wikidata API coverage), and it
> >> makes it possible for bot runners and wiki maintainers to quickly get
> >> started automating wiki management tasks. Some areas that could be
> >> improved include expanded and centralized documentation, efficiency in
> >> use of API calls, and making the setup process lighter-weight and
> >> easier to use.
> >>
> >> I will follow up by posting specific suggestions to Bugzilla[3] later
> >> this week. If you have comments or questions, please feel free to post
> >> on the evaluation talk page, respond to the bugs filed, or make
> >> corrections on the evaluation page if I've missed something.
> >>
> >> -Frances Hocutt
> >> MediaWiki intern
> >>
> >> [1] https://www.mediawiki.org/wiki/API:Client_code/Gold_standard
> >> [2] https://www.mediawiki.org/wiki/API:Client_code/Evaluations/Pywikibot
> >> [3] https://bugzilla.wikimedia.org/buglist.cgi?query_format=specific&product=Pywikibot&list_id=235557
> >>
> >> _______________________________________________
> >> Pywikipedia-l mailing list
> >> [email protected]
> >> https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l
> >
> > --
> > Amir
>
> --
> John Vandenberg

--
Amir
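The quoted evaluation asks for three related things: gzip by default, an easy way to send a descriptive user-agent, and fewer, larger API calls (titles=title1|title2|...). Below is a minimal standard-library sketch of how these could fit together. It is not pywikibot's API. The helper names batch_titles and build_query_requests, the ExampleBot user-agent string, the prop=info query and the batch size of 50 are all illustrative assumptions.

```python
import urllib.parse
import urllib.request
from typing import Iterable, Iterator, List

# Assumption: the MediaWiki API accepts up to 50 titles per request
# (500 for accounts with the apihighlimits right, e.g. bots).
MAX_TITLES = 50

# Hypothetical user-agent in the "name/version (contact) library" shape
# that the User-agent policy asks for; substitute real details.
USER_AGENT = "ExampleBot/0.1 (https://example.org/wiki/User:ExampleBot) python-urllib"


def batch_titles(titles: Iterable[str], size: int = MAX_TITLES) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` titles, in input order."""
    batch: List[str] = []
    for title in titles:
        batch.append(title)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # leftover titles that did not fill a whole batch
        yield batch


def build_query_requests(
    api_url: str, titles: Iterable[str]
) -> Iterator[urllib.request.Request]:
    """Build one action=query request per batch instead of one per title."""
    for batch in batch_titles(titles):
        params = {
            "action": "query",
            "prop": "info",
            "titles": "|".join(batch),  # titles=Title1|Title2|...
            "format": "json",
        }
        url = api_url + "?" + urllib.parse.urlencode(params)
        yield urllib.request.Request(
            url,
            headers={
                "User-Agent": USER_AGENT,
                # Ask the server to gzip the response body.
                "Accept-Encoding": "gzip",
            },
        )


if __name__ == "__main__":
    titles = ["Page %d" % n for n in range(120)]
    reqs = list(build_query_requests("https://www.mediawiki.org/w/api.php", titles))
    print(len(reqs))  # 120 titles -> 3 requests (50 + 50 + 20)
```

Each request could then be sent with urllib.request.urlopen. Note that urllib does not decode Content-Encoding: gzip on its own, so a client that asks for gzip has to decompress the body (for example with zlib) before parsing the JSON.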
