On Wed, Jul 29, 2015 at 3:48 PM, Kent/wp mirror <wpmirror...@gmail.com> wrote:
> When I build a mirror, I would like to compress the <text ...>plaintext</text>
> to get:
>
> old_text: ciphertext
> old_flags: utf-8,gzip
>
> I would like this done for every text revision, so as to save both disk
> space...

Maybe https://www.mediawiki.org/wiki/Manual:Reduce_size_of_the_database will help. maintenance/storage/compressOld.php will compress older revisions, optionally using gzip, and you can set its parameters to compress every revision.

Did you set $wgCompressRevisions in your installation before importing? I'm not sure whether that has any effect when building a mirror. It feels as if it should, and/or importDump.php should have an option to compress all imported revisions; you could file a bug in Phabricator.

> and communication bandwidth between web server and browser.

If I understand you correctly, that's a separate issue. MediaWiki doesn't send compressed page data to the browser; it sends HTML. However, most browsers send the "Accept-Encoding: gzip, deflate" HTTP header, and in response most web servers will gzip the HTML of MediaWiki pages and other web content. To verify, load a page from your wiki in your browser and look in the developer tools' Network tab at the request and response headers; the latter will probably include "Content-Encoding: gzip". Or you could run something like `curl -H 'Accept-Encoding: gzip, deflate' --dump-header - http://localhost/wiki/Main_Page | less` and see what you get.

> 2) Problem
>
> There is little relevant documentation on <https://www.mediawiki.org>. So I
> have run a few experiments.
>
> exp1) I pipe the plaintext through gzip, escape for MySQL, and build the
> mirror.

I wouldn't try to do this yourself. If importing with $wgCompressRevisions = true doesn't do what you want, and you don't want to run a compressOld.php maintenance step afterwards, I would suggest modifying some PHP somewhere, solely during the import to your mirror, to make MediaWiki compress every revision.
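To make the old_text/old_flags scheme concrete, here is a sketch in Python (not MediaWiki's actual code; the function names are illustrative). PHP's gzdeflate()/gzinflate(), which MediaWiki uses for the "gzip" flag, operate on raw DEFLATE streams, and Python's zlib reproduces those with wbits=-15:

```python
import zlib

# Illustrative sketch of MediaWiki's revision-text compression scheme.
# A raw DEFLATE stream (zlib wbits=-15) matches PHP's gzdeflate()/gzinflate().
# The flag names "utf-8" and "gzip" match the old_flags values discussed above;
# note the flag is called "gzip" even though the stream is raw deflate.

def compress_revision_text(text: str) -> tuple[bytes, str]:
    """Compress revision text; return (old_text blob, old_flags string)."""
    flags = ["utf-8"]                    # text is stored as UTF-8
    raw = text.encode("utf-8")
    co = zlib.compressobj(9, zlib.DEFLATED, -15)   # raw deflate, like gzdeflate()
    blob = co.compress(raw) + co.flush()
    flags.append("gzip")
    return blob, ",".join(flags)

def decompress_revision_text(blob: bytes, old_flags: str) -> str:
    """Invert the above, honoring old_flags."""
    flags = old_flags.split(",")
    if "gzip" in flags:
        blob = zlib.decompress(blob, -15)          # raw inflate, like gzinflate()
    return blob.decode("utf-8")

blob, flags = compress_revision_text("Some wikitext " * 100)
print(flags)                                       # utf-8,gzip
print(len(blob))                                   # far smaller than the input
assert decompress_revision_text(blob, flags) == "Some wikitext " * 100
```

Escaping for MySQL is deliberately absent here: old_text is a mediumblob, and quoting/escaping happens only at the database driver boundary, not in the stored bytes.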
> Please provide documentation as to how mediawiki handles compressed
> old_text.
>
> a) How is plaintext compressed?

From looking at core/includes/Revision.php: if PHP's gzdeflate() exists, then MediaWiki uses it to compress the contents of old_text. http://php.net/manual/en/function.gzdeflate.php has some documentation on how the function works.

> b) Is the ciphertext escaped for MySQL after compression?

No idea; old_text is a mediumblob storing binary data. As I understand it, escaping applies only to transfer in and out of the DB.

> c) How does mediawiki handle old_flags=utf-8,gzip?
>
> d) How are the contents of old_text unescaped and decompressed for
> rendering?
>
> e) Where in the mediawiki code should I be looking to understand this
> better?

As above: PHP's gzdeflate/gzinflate in Revision::compressRevisionText() and Revision::decompressRevisionText() in core/includes/Revision.php.

Hope this helps. I didn't know anything about this 25 minutes ago :)

--
=S Page          WMF Tech writer

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l