Hi, I cannot actually answer this question (it is not an easy one), but I sometimes get this kind of question about our main relational database storage (MariaDB). I am preparing slides for a presentation, gathered some numbers, and wanted to share them with you (as of June 2019):
* There is approximately 550 TB of used data on the MariaDB-related servers across the Wikimedia infrastructure (mostly compressed in some way: InnoDB, gzip, etc.)
* If we do not account for redundancy, 60 TB of that data is unique (an average of about 9x redundancy, i.e. 550 TB / 60 TB, which seems about right)
** Of that, 24 TB is insert-only, highly compressed content (External Storage)
** The rest is metadata, local content, misc services, disc cache, analytics, cloud dbs, and backups.

Please note this doesn't take into account storage in other media or technologies (search, maps, analytics, REST, file storage, etc.). Also, content compression can be very efficient, so the uncompressed data can be much larger. We are in fact aiming to reduce the storage footprint even further over the next months.

If someone is interested in seeing how the size evolves, you can get the latest, up-to-date metrics on Grafana:
https://grafana.wikimedia.org/d/000000607/cluster-overview?orgId=1&var-datasource=eqiad%20prometheus%2Fops&var-cluster=mysql&var-instance=All

--
Jaime Crespo
<http://wikimedia.org>
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l