The MySQL / MariaDB variants we use nowadays default to innodb_file_per_table=ON and we also set this flag to ON in installer tools like TripleO. The reason we like file per table is so that we don't grow an enormous ibdata file that can't be shrunk without rebuilding the database. Instead, we have lots of little .ibd datafiles for each table throughout each openstack database.
But now we have the issue that these files also can benefit from periodic optimization which can shrink them and also have a beneficial effect on performance. The OPTIMIZE TABLE statement achieves this, but as would be expected it itself can lock tables for potentially a long time. Googling around reveals a lot of controversy, as various users and publications suggest that OPTIMIZE is never needed and would have only a negligible effect on performance. However here we seek to use OPTIMIZE so that we can reclaim disk space on tables that have lots of DELETE activity, such as keystone "token" and ceilometer "sample". Questions for the group: 1. is OPTIMIZE table worthwhile to be run for tables where the datafile has grown much larger than the number of rows we have in the table? 2. from people's production experience how safe is it to run OPTIMIZE, e.g. how long is it locking tables, etc. 3. is there a heuristic we can use to measure when we might run this -.e.g my plan is we measure the size in bytes of each row in a table and then compare that in some ratio to the size of the corresponding .ibd file, if the .ibd file is N times larger than the logical data size we run OPTIMIZE ? 4. I'd like to propose this job of scanning table datafile sizes in ratio to logical data sizes, then running OPTIMIZE, be a utility script that is delivered via oslo.db, and would run for all innodb tables within a target MySQL/ MariaDB server generically. That is, I really *dont* want this to be a script that Keystone, Nova, Ceilometer etc. are all maintaining delivering themselves. this should be done as a generic pass on a whole database (noting, again, we are only running it for very specific InnoDB tables that we observe have a poor logical/physical size ratio). 5. for Galera this gets more tricky, as we might want to run OPTIMIZE on individual nodes directly. The script at [1] illustrates how to run this on individual nodes one at a time. More succinctly, the Q is: a. OPTIMIZE, yes or no? b. oslo.db script to run generically, yes or no? thanks for your thoughts! [1] https://github.com/deimosfr/galera_innoptimizer __________________________________________________________________________ OpenStack Development Mailing List (not for usage questions) Unsubscribe: [email protected]?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
