--- On Wed, 25/2/09, Robert Ullmann <[email protected]> wrote:

> From: Robert Ullmann <[email protected]>
> Subject: Re: [Wikitech-l] Dump processes seem to be dead
> To: "Wikimedia developers" <[email protected]>
> Date: Wednesday, 25 February 2009, 2:09
> you yourself suggested page id.
> 
> I suggest the history be partitioned into
> "blocks" by *revision ID*

I've looked into some alternatives for slicing the huge dump files into chunks
of a more manageable size. I first thought about dividing the blocks by rev_id,
as you suggest. Then I realized that this can pose problems for parsers
recovering information, since revisions belonging to the same page may fall
into different dump files.

Moreover, once the parser has read past the page_id tag, that id is lost if the
process stops due to some error, unless you save breakpoint information so you
can recover it later on, when you restart the process.
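
For instance, a minimal sketch of that breakpoint idea in Python (the
checkpoint file name and the per-page handler are hypothetical, just to
illustrate):

import os

CHECKPOINT = "parser.checkpoint"

def load_checkpoint():
    # Last fully processed page_id, or 0 on a fresh start.
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return int(f.read().strip())
    return 0

def save_checkpoint(page_id):
    # Record page_id only after the whole page has been processed.
    with open(CHECKPOINT, "w") as f:
        f.write(str(page_id))

def process_dump(pages, handle):
    # pages yields (page_id, revisions) tuples from the dump stream;
    # handle is whatever per-page work the parser does.
    resume_after = load_checkpoint()
    for page_id, revisions in pages:
        if page_id <= resume_after:
            continue  # already done in an earlier run
        handle(page_id, revisions)
        save_checkpoint(page_id)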

Partitioning by page_id, on the other hand, keeps all revisions of the same
page in the same block, without disturbing algorithms that look for individual
revisions.
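
To make it concrete, here is a rough Python sketch that only cuts a chunk right
after a </page> line, so every revision of a page lands in the same file (the
target size and file naming are made up, and it ignores the <mediawiki> and
<siteinfo> wrappers, which a real splitter would have to replicate in each
chunk):

TARGET_BYTES = 256 * 1024 * 1024  # rough uncompressed chunk size

def split_dump(dump_path, prefix):
    chunk_no = 0
    written = 0
    out = open("%s-%04d.xml" % (prefix, chunk_no), "w")
    for line in open(dump_path):
        out.write(line)
        written += len(line)
        # Only cut right after </page>, so all revisions of a
        # page stay in the same chunk.
        if written >= TARGET_BYTES and line.strip() == "</page>":
            out.close()
            chunk_no += 1
            written = 0
            out = open("%s-%04d.xml" % (prefix, chunk_no), "w")
    out.close()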

Yes, the chunks would be slightly bigger, but with either 7zip or bzip2 the
difference is not that large, and you gain simplicity in the recovery tools.

Best,

F.

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
