Re: [Wikitech-l] Dump frequency

2016-08-03 Thread Ariel Glenn WMF
Hi Binaris,

We actually have better hardware than 4 years ago [0].  However, we have
more projects with more content than 4 years ago.  Wikidata did not exist
in 2011; today it has almost 1/2 the revisions of the English language
Wikipedia.  The English language Wikipedia itself has increased 51% in size
since early 2012. And the Hungarian language wiki has grown by over 50% as
well.

We should be running two dumps a month going forward [1], where the second
run each month does not contain full history revisions.  This isn't as good
as 2011, but it's not as bad as once a month either.
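
As a rough illustration of that schedule, here is a minimal sketch. The
start days used below are placeholders chosen for the example, not the
actual configuration; the only facts taken from this message are that
there are two runs a month and that the second omits full history.

```python
from datetime import date

# Hypothetical start days: the message only says there are two runs a month.
FULL_RUN_DAY = 1
PARTIAL_RUN_DAY = 20


def runs_for_month(year: int, month: int) -> list[tuple[date, bool]]:
    """Return (start_date, includes_full_history) for each run in a month."""
    return [
        (date(year, month, FULL_RUN_DAY), True),      # first run: full history
        (date(year, month, PARTIAL_RUN_DAY), False),  # second run: no full history
    ]


for start, full_history in runs_for_month(2016, 8):
    kind = "with full history" if full_history else "without full history"
    print(f"{start.isoformat()}: dump run {kind}")
```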

The main work to improve the dumps situation, however, will be a complete
rearchitecting of the dumps.  One big change will be to move to a format
and structure that is truly incremental.

Folks interested in these issues are welcome to subscribe to or watch the
Phabricator project for the current dumps [2] and/or the future dumps [3].
There is also a dedicated (low-traffic) list for users and contributors to
the XML dumps [4].

Lastly, your email reminds me that I should update the dumps information at
Meta; the documentation there has fallen a bit behind.  Thanks!
Ariel

[0] For current hardware, see
https://wikitech.wikimedia.org/wiki/Dumps/Snapshot_hosts
[1] https://phabricator.wikimedia.org/T126339
[2] https://phabricator.wikimedia.org/tag/dumps-generation/
[3] https://phabricator.wikimedia.org/tag/dumps-rewrite/
[4] https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l

On Wed, Aug 3, 2016 at 8:31 AM, Bináris wrote:

> Hi folks,
>
> we in Hungarian Wikipedia have been watching new huwiki dumps by bot since
> 2011, so this page history:
>
> https://hu.wikipedia.org/w/index.php?title=Sablon:A_dump_d%C3%A1tuma&offset=&limit=250&action=history
> clearly shows the frequency. Back in 2012 it took 8-10 days to create the
> new dump. Now it takes one month. Are we less developed or do we have less
> hardware than 4 years ago?
>
>
>
> --
> Bináris
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Dump frequency

2016-08-02 Thread Bináris
Hi folks,

we in Hungarian Wikipedia have been watching new huwiki dumps by bot since
2011, so this page history:
https://hu.wikipedia.org/w/index.php?title=Sablon:A_dump_d%C3%A1tuma&offset=&limit=250&action=history
clearly shows the frequency. Back in 2012 it took 8-10 days to create the
new dump. Now it takes one month. Are we less developed or do we have less
hardware than 4 years ago?



-- 
Bináris