Hi Ben,

I think the main risk to worry about with this kind of thing is deleting records you did not intend to. IIRC the command is reasonably well tested and has been used pretty widely, and I am not aware of any "it ate my database" reports.
I have, however, heard reports of it taking a super long time, and sometimes failing to complete, when a table is quite large (and perhaps pushing the capabilities of the db instance), most likely due to a statement timeout. When the process fails, it will probably be in the delete step, after the archive table has been created. In that case the negative consequence is that you now have an archive table (of the records that were to be deleted) and you still have all the records in the main table. So you can drop the archive, increase the statement timeout, and try again.

Better, though, to batch it: start with an older "delete before timestamp" and incrementally make it more recent, so each run works on a smaller batch. (A rough sketch of what I mean is below, after your quoted message.) There is an idea floating around to refactor the command to always do batching, but that's a project without a champion at this time -- if you're interested, have at it!

On Thu, Aug 15, 2024 at 4:54 PM Ben Hancock <bhanc...@alm.com.invalid> wrote:

> Hi fellow Airflow users,
>
> I recently did some routine maintenance of the metadata database for one
> of our Airflow instances, using the `db clean` command to expunge old
> records. We use Postgres as a back-end and I made sure to create a
> backup beforehand using `pg_dump`. Afterward, I also exported the
> archived tables to CSV using the `db export-archived` command.
>
> Everything went smoothly. My question is: What are the considerations or
> gotchas in the event that a `db clean` operation *does not* go smoothly?
>
> The most straightforward path would seem to be to halt the scheduler and
> webserver, drop the metadata database, then restore from the backup
> using `pg_restore`. I imagine that would work fine. But I wonder whether
> any other users have encountered sharp edges following this process?
>
> Short of doing a full restore, it also seems feasible (albeit more
> laborious) to bring the expunged data back from the archived tables ---
> or archived CSV files --- using something like:
>
> INSERT INTO {target} SELECT * FROM {archived_table};
>
> ... I imagine that could be scripted to more easily match the archive
> table name to the correct target, if it came down to that.
>
> Hopefully this falls under the category of things that I never actually
> _need_ to know, but any advice or wisdom would be appreciated.
>
> Thanks,
>
> Ben
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscr...@airflow.apache.org
> For additional commands, e-mail: users-h...@airflow.apache.org
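Since the batching approach I described above is a bit abstract, here is a rough sketch of the kind of loop I mean. Treat it as a starting point rather than a recipe: it assumes a reasonably recent Airflow (one where `db clean` accepts `--clean-before-timestamp` and `--yes`) and GNU date, and the dates and step size are invented, so adjust them to your own retention policy.

    #!/usr/bin/env bash
    # Rough sketch: run `airflow db clean` in monthly slices so each
    # DELETE stays small, instead of one huge delete that can hit the
    # statement timeout. Assumes GNU date; dates are examples only.
    set -euo pipefail

    START="2023-01-01"   # oldest cutoff to start from
    END="2024-06-01"     # the retention cutoff you actually want
    CURRENT="$START"

    while [[ "$CURRENT" < "$END" ]]; do
      echo "Cleaning records older than $CURRENT ..."
      airflow db clean \
        --clean-before-timestamp "$CURRENT" \
        --yes   # skip the interactive confirmation; use with care
      # advance the cutoff by one month; use a smaller step if runs are still slow
      CURRENT=$(date -d "$CURRENT + 1 month" +%F)
    done

    # final pass at the real retention cutoff
    airflow db clean --clean-before-timestamp "$END" --yes

One thing to keep in mind with this approach: each pass creates its own set of archive tables, so after a run like this you may have several of them to export or drop (`db export-archived`, and `db drop-archived` on versions that have it) before they clutter the schema.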