On Tue, Mar 27, 2018 at 11:55 AM, Masahiko Sawada <sawada.m...@gmail.com> wrote:
> On Thu, Mar 22, 2018 at 9:24 PM, Alexander Korotkov
> <a.korot...@postgrespro.ru> wrote:
> > However, I see that you are comparing relative change of num_heap_tuples
> > before and after vacuum to vacuum_cleanup_index_scale_factor.
> > The problem is that if vacuum is very aggressive (and that is likely for
> > append only case if this patch is committed), then num_heap_tuples
> > changes very slightly every vacuum cycle. So, cleanup would never
> > happen and statistics could stall indefinitely.
>
> Good catch. I think we need to store the number of tuples at when
> scanning whole index is performed (bulkdelete or cleanup) as your
> patch does so. So it also would need the page-upgrading code. Since
> that value would be helpful for other type of indexes too it might be
> better to store it into system catalog.
>
> > Another issue is that append only workloads are frequently combined
> > with rare periodical bulk deletes of data. Assuming that in this patch
> > you don't consider deleted pages to trigger index cleanup, on such
> > workload large amounts of deleted pages may reside non-recycled until
> > next bulk delete cycle.
>
> Perhaps using that new value we can trigger the cleanup if the current
> number of tuples has been increased or decreased the
> vacuum_index_scale_factor% from n_tup_last_cleanup.

Yes, that might work. However, a decreased number of tuples could be an
inaccurate measure of the number of deleted pages. Depending on the
distribution of tuples per page, the same number of deleted tuples can lead
to very different numbers of deleted pages (in corner cases, anywhere from
zero to very large amounts). If we want to skip scanning of deleted pages,
then it would be better to store their exact count as known from the
previous bulkdelete or cleanup.

> > So, despite I generally like idea of storing epoch of deleted xid in the
> > page, I think my version of patch is closer to committable shape.
> Agreed, but as I mentioned before, I'm concerned that your version
> patch approach will become a restriction of future improvement. If
> community decides this feature works only for mostly append-only
> workloads I think your version of patch is the best approach for now.

First of all, I would like to note that all the valid optimizations presented
in this thread optimize the append-only case, since the patches are about
skipping or reducing the cleanup stage, which does serious work only in that
case. Thus, they do give a serious benefit on mostly append-only workloads.

It could be possible in the future to also optimize cases where only a small
fraction of the table's tuples were deleted. Those tuples could be deleted
not by a full index scan, but by individual index lookups. But I think such
an optimization is rather harder than everything mentioned in this thread,
and it should be considered separately.

The thing which could be improved in the future is making btree able to skip
cleanup when some deleted pages are pending to be recycled. But I'm not sure
that this problem deserves a lot of time investment right now, because page
deletion in our btree is unfortunately quite a rare situation. In real life,
our btree is rather going to be bloated with a bad fill factor of pages than
to get many pages deleted.

So, I would like to clarify: why would my patch block future improvements in
this area? For instance, if we decide to make btree able to skip cleanup when
some deleted pages are pending recycling, we can add that in the future.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company