Hi,

I cannot really answer this question (it is not an easy one), but I
sometimes get asked this kind of thing about our main relational
database storage (MariaDB). I am preparing slides for a presentation
and gathered some numbers that I wanted to share with you
(as of June 2019):

* There is approximately 550 TB of used data on the MariaDB-related
servers across the Wikimedia infrastructure (mostly compressed in some
way: InnoDB, gzip, etc.)
* Not counting redundant copies, about 60 TB of that data is unique (an
average of roughly 9x redundancy, which seems about right)
** Of that, 24 TB is insert-only, highly compressed content (External Storage)
** The rest is metadata, local content, misc services, disk cache,
analytics, cloud DBs, and backups.
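
For anyone who wants to check the arithmetic behind those figures, here is
a minimal sketch in Python. It only uses the three numbers quoted above;
the variable names are made up for illustration and do not come from any
Wikimedia tooling.

```python
# Figures quoted in this message (June 2019), in terabytes.
total_used_tb = 550        # used data on all MariaDB-related servers
unique_tb = 60             # unique data, not counting redundant copies
external_storage_tb = 24   # insert-only, highly compressed content

# Average number of copies of each unique byte.
redundancy = total_used_tb / unique_tb

# Unique data that is not External Storage (metadata, misc services, etc.).
other_unique_tb = unique_tb - external_storage_tb

# Fraction of the unique data that lives in External Storage.
external_share = external_storage_tb / unique_tb

print(f"average redundancy: {redundancy:.1f}x")
print(f"unique data outside External Storage: {other_unique_tb} TB")
print(f"External Storage share of unique data: {external_share:.0%}")
```

This gives about 9.2x redundancy, 36 TB of unique data outside External
Storage, and External Storage at 40% of the unique total.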

Please note this does not take into account storage in other media or
technologies (search, maps, analytics, REST, file storage, etc.). Also,
content compression can be very efficient, so the uncompressed data can
be much larger. We are in fact aiming to reduce the storage footprint
even further over the coming months.

If anyone is interested in seeing how the size evolves, you can get
up-to-date metrics on Grafana:
https://grafana.wikimedia.org/d/000000607/cluster-overview?orgId=1&var-datasource=eqiad%20prometheus%2Fops&var-cluster=mysql&var-instance=All

--
Jaime Crespo
<http://wikimedia.org>
