Hi there,
Sorry for reviving this old thread, but I'd like to share our own
experience, FWIW.
We also agree that PostgreSQL is better suited to handling terabytes of
job data nowadays. But instead of writing a new dedicated accounting
storage plugin (a quick look at the MySQL plugin code is enough to be
convinced that it would be painful), we took another approach.
We consider the Slurm database to be a temporary, application-specific
storage backend used only for accounting purposes, and we just live
with it. We enable slurmdbd automatic purging (to keep the database from
growing forever). With MariaDB, it has worked pretty well so far.
But since we do care about job metadata over the lifetime of our
supercomputers, we have developed software that crawls the Slurm
database to incrementally fill a PostgreSQL database:
http://edf-hpc.github.io/hpcstats/ [*]
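To illustrate the incremental crawl idea, here is a minimal sketch (not HPCStats' actual code). Assumptions: sqlite3 stands in for both the MariaDB source and the PostgreSQL target so the sketch runs anywhere, and the names job_table, job_db_inx, mod_time, jobs and sync_state are hypothetical, only loosely modeled on slurmdbd's job table. Each run copies rows modified since the previous run and upserts them, so a job first seen while running gets its end time filled in later.

```python
import sqlite3

# Hypothetical schema: sqlite3 stands in for MariaDB (source) and
# PostgreSQL (target); table/column names are illustrative assumptions.


def init_target(dst):
    dst.execute(
        """CREATE TABLE IF NOT EXISTS jobs (
               job_db_inx INTEGER PRIMARY KEY,
               id_job     INTEGER,
               id_user    INTEGER,
               time_start INTEGER,
               time_end   INTEGER,
               mod_time   INTEGER)"""
    )
    # One-row table remembering how far the previous run got.
    dst.execute(
        """CREATE TABLE IF NOT EXISTS sync_state (
               id INTEGER PRIMARY KEY CHECK (id = 1),
               last_mod INTEGER)"""
    )
    dst.execute("INSERT OR IGNORE INTO sync_state VALUES (1, 0)")
    dst.commit()


def sync_jobs(src, dst):
    """Copy job rows modified since the last run; return how many were read."""
    (last_mod,) = dst.execute(
        "SELECT last_mod FROM sync_state WHERE id = 1"
    ).fetchone()
    # '>=' re-reads rows sharing the last timestamp: harmless thanks to the
    # upsert below, and it avoids missing rows modified within that second.
    rows = src.execute(
        """SELECT job_db_inx, id_job, id_user, time_start, time_end, mod_time
           FROM job_table WHERE mod_time >= ? ORDER BY mod_time""",
        (last_mod,),
    ).fetchall()
    # Upsert (same ON CONFLICT syntax in SQLite and PostgreSQL).
    dst.executemany(
        """INSERT INTO jobs VALUES (?, ?, ?, ?, ?, ?)
           ON CONFLICT(job_db_inx) DO UPDATE SET
             time_start = excluded.time_start,
             time_end   = excluded.time_end,
             mod_time   = excluded.mod_time""",
        rows,
    )
    if rows:
        dst.execute(
            "UPDATE sync_state SET last_mod = ? WHERE id = 1", (rows[-1][5],)
        )
    dst.commit()
    return len(rows)
```

Since rows already copied are never deleted from the target, the reporting database keeps the full history even after slurmdbd purges them, provided the crawler runs more often than the purge window.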
This software is also able to collect data from monitoring software, LDAP
directories, and so on. This way, we have all our precious data in
PostgreSQL for reporting and statistics purposes. This has the following
advantages:
- It's a separate DB, so running complex queries does not disturb
slurmdbd;
- It's a mashup of various data sources, so we can extract metrics with
advanced correlations;
- It's generic and not tied to any specific technology, so we have the
flexibility to change whenever we want.
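As an illustration of the kind of cross-source query a separate reporting database makes cheap, here is a hedged sketch computing core-hours per group by joining job records with group membership taken from LDAP. Assumptions: sqlite3 stands in for PostgreSQL, and the tables jobs and ldap_users and their columns are hypothetical, not the HPCStats schema.

```python
import sqlite3

# Hypothetical reporting schema: jobs copied from Slurm, users from LDAP.


def create_schema(db):
    db.execute(
        """CREATE TABLE jobs (
               id_job INTEGER, id_user INTEGER, ncpus INTEGER,
               time_start INTEGER, time_end INTEGER)"""
    )
    db.execute("CREATE TABLE ldap_users (uid INTEGER PRIMARY KEY, grp TEXT)")


def core_hours_per_group(db):
    """Return {group: core-hours} over finished jobs (times in epoch seconds)."""
    rows = db.execute(
        """SELECT COALESCE(u.grp, 'unknown'),
                  SUM(j.ncpus * (j.time_end - j.time_start)) / 3600.0
           FROM jobs AS j
           LEFT JOIN ldap_users AS u ON u.uid = j.id_user
           -- running jobs (time_end = 0) are skipped
           WHERE j.time_end > j.time_start
           GROUP BY COALESCE(u.grp, 'unknown')"""
    ).fetchall()
    # Users missing from LDAP are reported under 'unknown'.
    return dict(rows)
```

Running such aggregations against the copy rather than the slurmdbd database is what keeps them from competing with accounting writes.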
We are happy with this approach so far :)
[*] The software is open source, but it may be hard to make it work in
your information system without a significant integration effort. It is
designed as a generic framework with plugins, but the current plugins are
quite specific to our needs. Feel free to contact me if you feel brave
and would like any help, though :)
Best,
Rémi