Kirill Reshke <reshkekir...@gmail.com> wrote:

> What is the size of the biggest relation successfully vacuumed
> via pg_squeeze?
> Looks like in case of big relartion or high insertion load,
> replication may lag and never catch up...

Users report problems rather than successes, so I don't know. 400 GB was
reported in [1], but it's possible that the table size for that test was
simply determined by the available disk space. I think that the amount of
data changes performed during the "squeezing" matters more than the table
size. In [2] one user reported "thousands of UPSERTs per second", but the
amount of data also depends on row size, which he didn't mention.

pg_squeeze gives up if it fails to catch up a few times. The first version
of my patch does not check this; I'll add the corresponding code in the next
version.
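Just to illustrate the kind of check I mean - this is only a sketch, none of
the names below exist in pg_squeeze or in the current patch:

#define CATCHUP_MAX_ATTEMPTS	4

static void
apply_concurrent_changes_or_give_up(LogicalDecodingContext *ctx, Relation rel)
{
	for (int attempt = 0; attempt < CATCHUP_MAX_ATTEMPTS; attempt++)
	{
		/*
		 * Hypothetical function: decode the data changes that accumulated
		 * since the last call and apply them to the new heap.  Returns true
		 * if the remaining backlog is small enough to be finished while
		 * holding AccessExclusiveLock on the table.
		 */
		if (process_concurrent_changes(ctx, rel))
			return;				/* caught up, ok to lock and swap */
	}

	ereport(ERROR,
			(errmsg("could not catch up with concurrent changes to table \"%s\"",
					RelationGetRelationName(rel))));
}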
> However, in general, the 3rd patch is really big, very hard to
> comprehend. Please consider splitting this into smaller (and
> reviewable) pieces.

I'll try to move some preparation steps into separate diffs, but I'm not sure
that will make the main diff much smaller. I prefer self-contained patches,
as also explained in [3].

> Also, we obviously need more tests on this. Both tap-test and
> regression tests I suppose.

Sure. The next version will use injection points to test whether the
"concurrent data changes" are processed correctly - a rough sketch of what I
have in mind is at the end of this mail.

> One more thing is about pg_squeeze background workers. They act in an
> autovacuum-like fashion, aren't they? Maybe we can support this kind
> of relation processing in core too?

Maybe later. Even just adding the CONCURRENTLY option to CLUSTER and VACUUM
FULL requires quite some effort.

[1] https://github.com/cybertec-postgresql/pg_squeeze/issues/51
[2] https://github.com/cybertec-postgresql/pg_squeeze/issues/21#issuecomment-514495369
[3] http://peter.eisentraut.org/blog/2024/05/14/when-to-split-patches-for-postgresql

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com
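PS: Regarding the injection points, here is roughly what I have in mind. The
point name and its exact location are invented for this sketch; only the
INJECTION_POINT() macro itself (utils/injection_point.h) and the functions of
the injection_points test module are existing facilities:

	/*
	 * Somewhere in the concurrent-rebuild code path, after the initial
	 * copy of the table into the new heap and before the decoded
	 * concurrent changes are applied.
	 */
	INJECTION_POINT("cluster-concurrently-before-catchup");

The TAP test would attach the 'wait' action to this point using
injection_points_attach(), start CLUSTER with the CONCURRENTLY option in a
background session, run some INSERT/UPDATE/DELETE on the table while that
session is waiting, release it with injection_points_wakeup(), and finally
verify that the rebuilt table and its indexes contain exactly what a plain
CLUSTER would have produced.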