On 12/12/2014 7:54 PM, melanie witt wrote:
Hi everybody,
At some point, our db archiving functionality got broken because there was a
change to stop ever deleting instance system metadata [1]. For those
unfamiliar, 'nova-manage db archive_deleted_rows' is the command that moves
all soft-deleted (deleted=nonzero) rows to the shadow tables. This is a
periodic cleaning that operators can do to maintain performance (as things can
get sluggish when deleted=nonzero rows accumulate).
The change was made because instance_type data still needed to be read after
instances had been deleted, since we allow admins to view deleted instances. I saw a bug
[2] and two patches [3][4] that aimed to fix this by going back to soft-deleting
instance sysmeta when instances are deleted, and instead allowing
read_deleted="yes" for the things that need to read instance_type for deleted
instances present in the db.
My question is: is this approach okay? If so, I'd like to see these patches
revived so we can have our db archiving working again. :) I think there's likely
something I'm missing about the approach, so I'm hoping people who know more
about instance sysmeta than I do can chime in on how/if we can fix this for db
archiving. Thanks.
[1] https://bugs.launchpad.net/nova/+bug/1185190
[2] https://bugs.launchpad.net/nova/+bug/1226049
[3] https://review.openstack.org/#/c/110875/
[4] https://review.openstack.org/#/c/109201/
melanie (melwitt)
_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
I'd like to bring this back up since even though [3] and [4] are merged,
nova-manage db archive_deleted_rows still fails to delete rows from some
tables because of foreign key constraint issues, detailed here:
https://bugs.launchpad.net/nova/+bug/1183523/comments/12
I'm wondering why we don't reverse sort the tables using the sqlalchemy
metadata object before processing the tables for delete. That's the
same thing I did in the 267 migration, since we needed to process the
tree starting with the leaves and then eventually get back to the
instances table (since most roads lead to the instances table).
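
To illustrate the ordering I mean, here is a minimal sketch using only the
standard library. The foreign key map below is hypothetical and only for
illustration, not the real nova schema. With sqlalchemy, iterating over
reversed(metadata.sorted_tables) should give the same kind of children-first
ordering.

```python
from graphlib import TopologicalSorter

# Hypothetical map (not the real nova schema): each table maps to the
# set of tables it references through foreign keys.
FK_PARENTS = {
    "instances": set(),
    "block_device_mapping": {"instances"},
    "instance_system_metadata": {"instances"},
    "pci_devices": {"instances"},
}


def archive_order(fk_parents):
    """Return table names with referencing (leaf) tables first."""
    # static_order() yields referenced tables before the tables that
    # point at them, so reversing it puts the leaves first.
    parents_first = list(TopologicalSorter(fk_parents).static_order())
    return list(reversed(parents_first))


order = archive_order(FK_PARENTS)
```

Processing tables in this order means the rows pointing at instances are
gone before we touch the instances table itself, so the foreign key
constraints should not get in the way.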
Another thing that's really weird is how max_rows is used in this code.
There is cumulative tracking of the max_rows value so if the value you
pass in is too small, you might not actually be removing anything.
I figured max_rows meant up to max_rows from each table, not max_rows
*total* across all tables. By my count, there are 52 tables in the nova
db model. The way I read the code, if I pass in max_rows=10 and say it
processes table A and archives 7 rows, then when it processes table B it
will pass max_rows=(max_rows - rows_archived), which would be 3 for
table B. If we archive 3 rows from table B, rows_archived >= max_rows
and we quit. So to really make this work, you have to pass in something
big for max_rows, like 1000, which seems completely arbitrary.
Does this seem odd to anyone else? Given the relationships between
tables, I'd think you'd want to try and delete max_rows for all tables,
so archive 10 instances, 10 block_device_mapping, 10 pci_devices, etc.
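
To make the difference concrete, here is a small simulation of the two
interpretations. This is only a sketch of the behavior as I read it, not the
actual nova code; the function names, table names and row counts are made up
for illustration.

```python
def archive_cumulative(deleted_counts, max_rows):
    """Simulate one max_rows budget shared across all tables."""
    archived = {}
    rows_archived = 0
    for table, available in deleted_counts.items():
        # Each table only gets whatever is left of the shared budget.
        remaining = max_rows - rows_archived
        taken = min(available, remaining)
        archived[table] = taken
        rows_archived += taken
        if rows_archived >= max_rows:
            break  # later tables are never visited
    return archived


def archive_per_table(deleted_counts, max_rows):
    """Simulate archiving up to max_rows from every table."""
    return {table: min(available, max_rows)
            for table, available in deleted_counts.items()}


# Hypothetical numbers of soft-deleted rows per table.
deleted_counts = {"table_a": 7, "table_b": 20, "table_c": 15}

cumulative = archive_cumulative(deleted_counts, 10)
per_table = archive_per_table(deleted_counts, 10)
```

With max_rows=10, the cumulative version takes 7 rows from table_a, 3 from
table_b, and never reaches table_c. The per-table version takes 7, 10 and 10.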
I'm also bringing this up now because there is a thread in the operators
list which pointed me to a set of scripts that operators at GoDaddy are
using for archiving deleted rows, presumably because the command in nova
doesn't work:
http://lists.openstack.org/pipermail/openstack-operators/2015-October/008392.html
We should either make this thing work or just punt and delete it because
no one cares.
--
Thanks,
Matt Riedemann