On 06/04, Michael Bayer wrote: > On Wed, Apr 4, 2018 at 5:00 AM, Gorka Eguileor <[email protected]> wrote: > > On 03/04, Jay Pipes wrote: > >> On 04/03/2018 11:07 AM, Michael Bayer wrote: > >> > The MySQL / MariaDB variants we use nowadays default to > >> > innodb_file_per_table=ON and we also set this flag to ON in installer > >> > tools like TripleO. The reason we like file per table is so that > >> > we don't grow an enormous ibdata file that can't be shrunk without > >> > rebuilding the database. Instead, we have lots of little .ibd > >> > datafiles for each table throughout each openstack database. > >> > > >> > But now we have the issue that these files also can benefit from > >> > periodic optimization which can shrink them and also have a beneficial > >> > effect on performance. The OPTIMIZE TABLE statement achieves this, > >> > but as would be expected it itself can lock tables for potentially a > >> > long time. Googling around reveals a lot of controversy, as various > >> > users and publications suggest that OPTIMIZE is never needed and would > >> > have only a negligible effect on performance. However here we seek > >> > to use OPTIMIZE so that we can reclaim disk space on tables that have > >> > lots of DELETE activity, such as keystone "token" and ceilometer > >> > "sample". > >> > > >> > Questions for the group: > >> > > >> > 1. is OPTIMIZE table worthwhile to be run for tables where the > >> > datafile has grown much larger than the number of rows we have in the > >> > table? > >> > >> Possibly, though it's questionable to use MySQL/InnoDB for storing > >> transient > >> data that is deleted often like ceilometer samples and keystone tokens. A > >> much better solution is to use RDBMS partitioning so you can simply ALTER > >> TABLE .. DROP PARTITION those partitions that are no longer relevant (and > >> don't even bother DELETEing individual rows) or, in the case of Ceilometer > >> samples, don't use a traditional RDBMS for timeseries data at all... > >> > >> But since that is unfortunately already the case, yes it is probably a good > >> idea to OPTIMIZE TABLE on those tables. > >> > >> > 2. from people's production experience how safe is it to run OPTIMIZE, > >> > e.g. how long is it locking tables, etc. > >> > >> Is it safe? Yes. > >> > >> Does it lock the entire table for the duration of the operation? No. It > >> uses > >> online DDL operations: > >> > >> https://dev.mysql.com/doc/refman/5.7/en/innodb-file-defragmenting.html > >> > >> Note that OPTIMIZE TABLE is mapped to ALTER TABLE tbl_name FORCE for InnoDB > >> tables. > >> > >> > 3. is there a heuristic we can use to measure when we might run this > >> > -.e.g my plan is we measure the size in bytes of each row in a table > >> > and then compare that in some ratio to the size of the corresponding > >> > .ibd file, if the .ibd file is N times larger than the logical data > >> > size we run OPTIMIZE ? > >> > >> I don't believe so, no. Most things I see recommended is to simply run > >> OPTIMIZE TABLE in a cron job on each table periodically. > >> > >> > 4. I'd like to propose this job of scanning table datafile sizes in > >> > ratio to logical data sizes, then running OPTIMIZE, be a utility > >> > script that is delivered via oslo.db, and would run for all innodb > >> > tables within a target MySQL/ MariaDB server generically. That is, I > >> > really *dont* want this to be a script that Keystone, Nova, Ceilometer > >> > etc. are all maintaining delivering themselves. this should be done > >> > as a generic pass on a whole database (noting, again, we are only > >> > running it for very specific InnoDB tables that we observe have a poor > >> > logical/physical size ratio). > >> > >> I don't believe this should be in oslo.db. This is strictly the purview of > >> deployment tools and should stay there, IMHO. > >> > > > > Hi, > > > > As far as I know most projects do "soft deletes" where we just flag the > > rows as deleted and don't remove them from the DB, so it's only when we > > use a management tool and run the "purge" command that we actually > > remove these rows. > > > > Since running the optimize without purging would be meaningless, I'm > > wondering if we should trigger the OPTIMIZE also within the purging > > code. This way we could avoid innefective runs of the optimize command > > when no purge has happened and even when we do the optimization we could > > skip the ratio calculation altogether for tables where no rows have been > > deleted (the ratio hasn't changed). > > > > the issue is that this OPTIMIZE will block on Galera unless it is run > on a per-individual node basis along with the changing of the > wsrep_OSU_method parameter, this is way out of scope both to be > redundantly hardcoded in multiple openstack projects, as well as > there's no portable way for Keystone and others to get at the > individual Galera node addresses. Putting it in oslo.db would at > least be a place that most of this logic can live but even then it > needs to run for multiple Galera nodes and needs to have > deployment-specific configuration. *unless* we say, the OPTIMIZE > here will short for a purged table, let's just let it block. >
I see... What about a hybrid solution? Use the alter table as mentioned in the comment [1] to not block the table for systems that support it, and going with the RSU mode when it's not supported? [1] https://mariadb.com/kb/en/library/optimize-table/#comment_3191 > > > Ideally the ratio calculation and optimization code would be provided by > > oslo.db to reduce code duplication between projects. > > I was hoping to have this be part of oslo.db but there's disagreement on that > :) > > If this can't be in oslo.db then the biggest issue facing me on this > is building out a new application and getting it packaged since this > feature has no home, unless I can ship it as some kind of script > packaged in tripleo. > > I think the oslo.db home you proposed has the great benefit of making it available in all deployments regardless of the installer, if that's not possible I would go with the TripleO script before creating yet another project that needs to be packaged and maintained. Cheers, Gorka. > > > > > >> > 5. for Galera this gets more tricky, as we might want to run OPTIMIZE > >> > on individual nodes directly. The script at [1] illustrates how to > >> > run this on individual nodes one at a time. > >> > > >> > More succinctly, the Q is: > >> > > >> > a. OPTIMIZE, yes or no? > >> > >> Yes. > >> > >> > b. oslo.db script to run generically, yes or no? > >> > >> No. Just have Triple-O install galera_innoptimizer and run it in a cron > >> job. > >> > >> Best, > >> -jay > >> > >> > thanks for your thoughts! > >> > > >> > > >> > > >> > [1] https://github.com/deimosfr/galera_innoptimizer > >> > > >> > __________________________________________________________________________ > >> > OpenStack Development Mailing List (not for usage questions) > >> > Unsubscribe: > >> > [email protected]?subject:unsubscribe > >> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev > >> > > >> > >> __________________________________________________________________________ > >> OpenStack Development Mailing List (not for usage questions) > >> Unsubscribe: [email protected]?subject:unsubscribe > >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev > > > > __________________________________________________________________________ > > OpenStack Development Mailing List (not for usage questions) > > Unsubscribe: [email protected]?subject:unsubscribe > > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev > > __________________________________________________________________________ > OpenStack Development Mailing List (not for usage questions) > Unsubscribe: [email protected]?subject:unsubscribe > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev __________________________________________________________________________ OpenStack Development Mailing List (not for usage questions) Unsubscribe: [email protected]?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
