I should add: good on you for backing up prior to running the command. And yeah, if you did put in the wrong date or something, you would probably want to insert from the archive table (that's why it's there). As for gotchas, the one that comes to mind is if, say, you ran the clean prior to an upgrade, did not realize you had put in the wrong date until after the upgrade, and the upgrade included migrations on a table that you cleaned. Then you might have to work a little harder to craft your insert statement, but it probably would not be too bad. The migrations can be found in the Airflow GitHub repo, and you can also print out the SQL for the migrations (between any two versions or revisions) with the `airflow db upgrade` command; see the `--show-sql-only` option.
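
If you do end up needing to script the restore, something along these lines could be a starting point. This is just an untested sketch: the connection URI is a placeholder, it assumes the archive tables are named `_airflow_deleted__<table>__<timestamp>` (double-check the prefix your version actually uses), and it assumes no schema changes happened in between:

from sqlalchemy import create_engine, inspect, text

# Placeholder URI -- point this at your metadata database.
engine = create_engine("postgresql://airflow:PASSWORD@localhost/airflow")
PREFIX = "_airflow_deleted__"  # assumed archive-table prefix; verify on your install

with engine.begin() as conn:
    for archive in inspect(conn).get_table_names():
        if not archive.startswith(PREFIX):
            continue
        # Strip the prefix and the trailing __<timestamp> to recover the target table.
        target = archive[len(PREFIX):].rsplit("__", 1)[0]
        conn.execute(text(f'INSERT INTO "{target}" SELECT * FROM "{archive}"'))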
On Thu, Aug 15, 2024 at 6:22 PM Daniel Standish <daniel.stand...@astronomer.io> wrote:

> Hi Ben
>
> I think the main risk you worry about with this kind of thing is deleting
> records you did not intend to. IIRC the command is reasonably well tested
> and has been used pretty widely and I am not aware of any "it ate my
> database" reports.
>
> I have however heard reports of it taking a super long time and perhaps
> failing to complete, when the db table is quite large (and perhaps pushing
> the capabilities of the db instance), likely due to statement timeout.
> When the process fails, probably it will be in the delete step, after the
> archive table has been created. In this case the negative consequence is
> you now have an archive table (of the records that were to be deleted) and
> you still have all the records in the main table. So you can drop the
> archive, increase statement timeout and try again. Better though to batch
> it by starting with an older "delete before timestamp" and incrementally
> make that more recent. This will result in smaller batches. There is an
> idea floating around to refactor the command to do batching always, but
> that's a project without a champion at this time -- if you're interested,
> have at it!
>
> On Thu, Aug 15, 2024 at 4:54 PM Ben Hancock <bhanc...@alm.com.invalid> wrote:
>
>> Hi fellow Airflow users,
>>
>> I recently did some routine maintenance of the metadata database for one
>> of our Airflow instances, using the `db clean` command to expunge old
>> records. We use Postgres as a back-end and I made sure to create a
>> backup beforehand using `pg_dump`. Afterward, I also exported the
>> archived tables to CSV using the `db export-archived` command.
>>
>> Everything went smoothly. My question is: What are the considerations or
>> gotchas in the event that a `db clean` operation *does not* go smoothly?
>>
>> The most straightforward path would seem to be to halt the scheduler and
>> webserver, drop the metadata database, then restore from the backup
>> using `pg_restore`. I imagine that would work fine. But I wonder whether
>> any other users have encountered sharp edges following this process?
>>
>> Short of doing a full restore, it also seems feasible (albeit more
>> laborious) to bring the expunged data back from the archived tables ---
>> or archived CSV files --- using something like:
>>
>> INSERT INTO {target} SELECT * FROM {archived_table};
>>
>> ... I imagine that could be scripted to more easily match the archive
>> table name to the correct target, if it came down to that.
>>
>> Hopefully this falls under the category of things that I never actually
>> _need_ to know, but any advice or wisdom would be appreciated.
>>
>> Thanks,
>>
>> Ben
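
To make the batching idea above a bit more concrete: something like the below would step the cutoff forward in chunks rather than deleting everything in one go. Again just a rough sketch -- the dates and the weekly step are made up, so tune them to your table sizes, and check `airflow db clean --help` on your version for the exact flags:

import subprocess
from datetime import datetime, timedelta, timezone

# Start with an old cutoff and move it forward so each run deletes a smaller batch.
# These dates are illustrative only.
cutoff = datetime(2023, 1, 1, tzinfo=timezone.utc)
final_cutoff = datetime(2024, 6, 1, tzinfo=timezone.utc)

while cutoff <= final_cutoff:
    subprocess.run(
        [
            "airflow", "db", "clean",
            "--clean-before-timestamp", cutoff.isoformat(),
            "--yes",  # skips the confirmation prompt; drop it to confirm each batch
        ],
        check=True,
    )
    cutoff += timedelta(weeks=1)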