On 04/03/2018 11:07 AM, Michael Bayer wrote:
The MySQL / MariaDB variants we use nowadays default to
innodb_file_per_table=ON and we also set this flag to ON in installer
tools like TripleO. The reason we like file per table is so that
we don't grow an enormous ibdata file that can't be shrunk without
rebuilding the database. Instead, we have lots of little .ibd
datafiles for each table throughout each openstack database.
But now we have the issue that these files also can benefit from
periodic optimization which can shrink them and also have a beneficial
effect on performance. The OPTIMIZE TABLE statement achieves this,
but as would be expected it itself can lock tables for potentially a
long time. Googling around reveals a lot of controversy, as various
users and publications suggest that OPTIMIZE is never needed and would
have only a negligible effect on performance. However here we seek
to use OPTIMIZE so that we can reclaim disk space on tables that have
lots of DELETE activity, such as keystone "token" and ceilometer
"sample".
Questions for the group:
1. is OPTIMIZE table worthwhile to be run for tables where the
datafile has grown much larger than the number of rows we have in the
table?
Possibly, though it's questionable to use MySQL/InnoDB for storing
transient data that is deleted often like ceilometer samples and
keystone tokens. A much better solution is to use RDBMS partitioning so
you can simply ALTER TABLE .. DROP PARTITION those partitions that are
no longer relevant (and don't even bother DELETEing individual rows) or,
in the case of Ceilometer samples, don't use a traditional RDBMS for
timeseries data at all...
But since that is unfortunately already the case, yes it is probably a
good idea to OPTIMIZE TABLE on those tables.
2. from people's production experience how safe is it to run OPTIMIZE,
e.g. how long is it locking tables, etc.
Is it safe? Yes.
Does it lock the entire table for the duration of the operation? No. It
uses online DDL operations:
https://dev.mysql.com/doc/refman/5.7/en/innodb-file-defragmenting.html
Note that OPTIMIZE TABLE is mapped to ALTER TABLE tbl_name FORCE for
InnoDB tables.
3. is there a heuristic we can use to measure when we might run this
-.e.g my plan is we measure the size in bytes of each row in a table
and then compare that in some ratio to the size of the corresponding
.ibd file, if the .ibd file is N times larger than the logical data
size we run OPTIMIZE ?
I don't believe so, no. Most things I see recommended is to simply run
OPTIMIZE TABLE in a cron job on each table periodically.
4. I'd like to propose this job of scanning table datafile sizes in
ratio to logical data sizes, then running OPTIMIZE, be a utility
script that is delivered via oslo.db, and would run for all innodb
tables within a target MySQL/ MariaDB server generically. That is, I
really *dont* want this to be a script that Keystone, Nova, Ceilometer
etc. are all maintaining delivering themselves. this should be done
as a generic pass on a whole database (noting, again, we are only
running it for very specific InnoDB tables that we observe have a poor
logical/physical size ratio).
I don't believe this should be in oslo.db. This is strictly the purview
of deployment tools and should stay there, IMHO.
5. for Galera this gets more tricky, as we might want to run OPTIMIZE
on individual nodes directly. The script at [1] illustrates how to
run this on individual nodes one at a time.
More succinctly, the Q is:
a. OPTIMIZE, yes or no?
Yes.
b. oslo.db script to run generically, yes or no?
No. Just have Triple-O install galera_innoptimizer and run it in a cron job.
Best,
-jay
thanks for your thoughts!
[1] https://github.com/deimosfr/galera_innoptimizer
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: [email protected]?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: [email protected]?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev