On Wed, Jul 29, 2015 at 3:48 PM, Kent/wp mirror <wpmirror...@gmail.com>
wrote:

> When I build a mirror, I would like to compress the <text
> ...>plaintext</text> to get:
>
> old_text: ciphertext
> old_flags: utf-8,gzip
>
> I would like this done for every text revision, so as to save both disk
> space...
>

Maybe https://www.mediawiki.org/wiki/Manual:Reduce_size_of_the_database
will help. maintenance/storage/compressOld.php will compress older
revisions, optionally using gzip, and you can set the parameters to
compress every revision.
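
For example, something like `php maintenance/storage/compressOld.php
--type=gzip` should gzip each revision's text individually (option name
from the manual page above; run the script with --help to confirm the
exact parameters in your version).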

Did you set $wgCompressRevisions in your installation before importing? I'm
not sure whether that has an effect when building a mirror. It feels like it
should, and/or importDump.php should have an option to compress all
imported revisions; you could file a bug in Phabricator.
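
For reference, the setting itself is a single line in LocalSettings.php,
set before you run the import:

  $wgCompressRevisions = true;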

> and communication bandwidth between web server and browser.
>

If I understand you correctly, that's a separate issue. MediaWiki doesn't
send compressed page data to the browser; it sends HTML. However, most
browsers send the
  Accept-Encoding: gzip, deflate
HTTP header, and in response most web servers will gzip the HTML of
MediaWiki pages and other web content. To verify, load a page from your
wiki and look in your browser's developer tools' Network tab for the
request and response headers; the latter will probably include
  Content-Encoding: gzip
Or you could run something like `curl -H 'Accept-Encoding: gzip, deflate'
--dump-header - http://localhost/wiki/Main_Page | less` and see what you
get.

> 2) Problem
>
> There is little relevant documentation on <https://www.mediawiki.org>. So I
> have run a few experiments.
>
> exp1) I pipe the plaintext through gzip, escape for MySQL, and build the
> mirror.
>

I wouldn't try to do this yourself. If importing with $wgCompressRevisions =
true doesn't do what you want, and you don't want to run a compressOld.php
maintenance step afterwards, I would suggest temporarily modifying the
relevant PHP during the import to your mirror to force MediaWiki to
compress every revision; the sketch of compressRevisionText() below shows
where that decision is made.


> Please provide documentation as to how mediawiki handles compressed
> old_text.
> a) How is plaintext compressed?
>

From looking at core/includes/Revision.php: if PHP's gzdeflate() exists
(and, as far as I can tell, $wgCompressRevisions is set), MediaWiki uses it
to compress the contents of old_text.
http://php.net/manual/en/function.gzdeflate.php has some documentation on
how the function works.
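
To make that concrete, here's a rough paraphrase of
Revision::compressRevisionText() (a sketch, not a verbatim copy; check the
file itself for the exact code in your version):

  // Sketch, paraphrased from Revision::compressRevisionText().
  function compressRevisionText( &$text ) {
      global $wgCompressRevisions;
      $flags = array();
      $flags[] = 'utf-8'; // text is stored as UTF-8
      if ( $wgCompressRevisions && function_exists( 'gzdeflate' ) ) {
          $text = gzdeflate( $text ); // raw DEFLATE output, despite the flag name
          $flags[] = 'gzip';
      }
      return implode( ',', $flags ); // this string ends up in old_flags
  }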


> b) Is the ciphertext escaped for MySQL after compression?
>

No idea; old_text is a mediumblob storing binary data. As I understand it,
escaping applies only to the transfer in and out of the DB.
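
If you do end up writing rows yourself, as in your exp1, a parameterized
insert sidesteps the escaping question entirely. Here's a minimal
standalone sketch using PDO (connection details and $plaintext are
illustrative; MediaWiki's own Database::insert() does the equivalent
quoting for you):

  // Illustrative only: insert one pre-compressed revision text row.
  $plaintext = "== Example wikitext ==\nSome revision text.";
  $pdo = new PDO( 'mysql:host=localhost;dbname=wikidb', 'wikiuser', 'secret' );
  $stmt = $pdo->prepare( 'INSERT INTO text (old_text, old_flags) VALUES (?, ?)' );
  $stmt->bindValue( 1, gzdeflate( $plaintext ), PDO::PARAM_LOB ); // binary-safe
  $stmt->bindValue( 2, 'utf-8,gzip', PDO::PARAM_STR );
  $stmt->execute();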

> c) How does mediawiki handle old_flags=utf-8,gzip?
> d) How are the contents of old_text unescaped and decompressed for
> rendering?
> e) Where in the mediawiki code should I be looking to understand this
> better?
>

As above: PHP's gzdeflate()/gzinflate(), called from
Revision::compressRevisionText() and decompressRevisionText() in
core/includes/Revision.php.
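
The reverse path is the same shape (again a paraphrase, not the exact
code), which also answers (c) and (d): old_flags is split on commas, and
the 'gzip' flag means the text must go through gzinflate() before use:

  // Sketch, paraphrased from Revision::decompressRevisionText().
  function decompressRevisionText( $text, $flags ) {
      // $flags comes from exploding old_flags, e.g. array( 'utf-8', 'gzip' )
      if ( in_array( 'gzip', $flags ) ) {
          $text = gzinflate( $text ); // inverse of gzdeflate()
      }
      // (the real code also handles 'object' and legacy-encoding cases)
      return $text;
  }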

Hope this helps. I didn't know anything about this 25 minutes ago :)

-- 
=S Page  WMF Tech writer