Re: [Wikitech-l] Question about 2-phase dump

2012-11-25 Thread vitalif
Page history structure isn't quite immutable; revisions may be added or deleted, pages may be renamed, etc. Shelling out to an external process means that when that process dies due to a dead database connection, etc., we can restart it cleanly. Brion, thanks for clarifying it. Also, I want
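
The restart behaviour described in the quoted text can be sketched roughly as follows. This is only an illustration, not the actual MediaWiki dump code: `run_with_restart`, its parameters, and the retry limit are hypothetical names and values chosen for the example.

```python
import subprocess
import time


def run_with_restart(cmd, max_attempts=3, delay=0.0, run=subprocess.run):
    """Run cmd as a child process, restarting it if it exits with an error.

    Returns the number of attempts used; raises RuntimeError if every
    attempt fails. All names here are hypothetical.
    """
    for attempt in range(1, max_attempts + 1):
        result = run(cmd)
        if result.returncode == 0:
            return attempt
        # The child died (for example, its database connection dropped).
        # The parent is unaffected and can simply start a fresh child.
        time.sleep(delay)
    raise RuntimeError(f"{cmd!r} failed {max_attempts} times")
```

The point of the design is that the failure stays inside the child: the parent keeps its own state and only has to launch a new process.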

Re: [Wikitech-l] Question about 2-phase dump

2012-11-25 Thread Platonides
On 11/25/12 22:16, vita...@yourcmc.ru wrote: Also, I want to ask you and other developers about the idea of packing the export XML file along with all exported uploads into a ZIP archive (instead of embedding them in the XML as base64) - what do you think about it? We use it in our MediaWiki installations
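
As a rough sketch of the ZIP idea raised here (not vitalif's actual implementation), the export XML and the uploaded files could be written as separate archive members. The function name `pack_export` and the archive layout (`export.xml` plus an `uploads/` directory) are assumptions made for illustration.

```python
import io
import zipfile


def pack_export(xml_text, uploads):
    """Return ZIP bytes holding the export XML plus raw upload files.

    uploads maps a file name to its bytes. The member names used here
    ("export.xml", "uploads/...") are hypothetical.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("export.xml", xml_text)
        for name, data in uploads.items():
            # Files are stored as-is rather than base64-encoded in the XML.
            zf.writestr("uploads/" + name, data)
    return buf.getvalue()
```

Storing the binaries as their own members avoids the roughly one-third size overhead that base64 encoding adds before compression.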

[Wikitech-l] Question about 2-phase dump

2012-11-21 Thread vitalif
Hello! While working on my improvements to MediaWiki ImportExport, I've discovered a feature that is totally new to me: the 2-phase backup dump. I.e. the first-pass dumper creates an XML file without page texts, and the second-pass dumper adds the page texts. I have several questions about it - what
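
The two passes described in this message can be pictured with a small in-memory sketch. It is not the MediaWiki dumper itself: the record fields (`rev_id`, `title`, `text_id`) and the function names are hypothetical, and real dumps are XML rather than Python dicts.

```python
def first_pass(revisions):
    """Emit stub records: revision metadata with a text ID, but no text."""
    return [
        {"rev_id": r["rev_id"], "title": r["title"], "text_id": r["text_id"]}
        for r in revisions
    ]


def second_pass(stubs, fetch_text):
    """Fill in the page text for each stub using fetch_text(text_id)."""
    return [dict(stub, text=fetch_text(stub["text_id"])) for stub in stubs]
```

For example, `second_pass(first_pass(rows), store.__getitem__)` would attach texts from a hypothetical `store` dict keyed by text ID, while the first pass never touches the text storage at all.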

Re: [Wikitech-l] Question about 2-phase dump

2012-11-21 Thread Delirium
On 11/21/12 1:54 PM, vita...@yourcmc.ru wrote: While working on my improvements to MediaWiki ImportExport, I've discovered a feature that is totally new to me: the 2-phase backup dump. I.e. the first-pass dumper creates an XML file without page texts, and the second-pass dumper adds the page texts. I

Re: [Wikitech-l] Question about 2-phase dump

2012-11-21 Thread Brion Vibber
On Wed, Nov 21, 2012 at 4:54 AM, vita...@yourcmc.ru wrote: Hello! While working on my improvements to MediaWiki ImportExport, I've discovered a feature that is totally new to me: the 2-phase backup dump. I.e. the first-pass dumper creates an XML file without page texts, and the second-pass dumper

Re: [Wikitech-l] Question about 2-phase dump

2012-11-21 Thread vitalif
Brion Vibber wrote 2012-11-21 23:20: While generating a full dump, we're holding the database connection open for a long, long time. Hours, days, or weeks in the case of English Wikipedia. There are two issues with this: * the DB server needs to maintain a consistent snapshot of data since

Re: [Wikitech-l] Question about 2-phase dump

2012-11-21 Thread Brion Vibber
On Wed, Nov 21, 2012 at 12:31 PM, vita...@yourcmc.ru wrote: Oh, thanks, now I understand! But the revisions are also immutable - isn't it simpler just to select the maximum revision ID at the beginning of the dump and discard newer page and image revisions during dump generation? Page history

Re: [Wikitech-l] Question about 2-phase dump

2012-11-21 Thread Platonides
You may also be interested in the xmldatadumps mailing list.