Hi Amit, Shi Yu,
> > > > b. Executed SQL.
> >
> > I executed TRUNCATE and INSERT before each UPDATE. I am not sure if you did the
> > same, or just executed 50 consecutive UPDATEs. If the latter one, there would be
> > lots of old tuples and this might have a bigger impact on sequential scan. I
> > tried this case (which executes 50 consecutive UPDATEs) and also saw that the
> > overhead is smaller than before.
>
> Alright, I'll do similarly, execute truncate/insert before each update.
>
> In the above profile number of calls to index_fetch_heap(),
> heapam_index_fetch_tuple() explains the reason for the regression you
> are seeing with the index scan. Because the update will generate dead
> tuples in the same transaction and those dead tuples won't be removed,
> we get those from the index and then need to perform
> index_fetch_heap() to find out whether the tuple is dead or not. Now,
> for sequence scan also we need to scan those dead tuples but there we
> don't need to do back-and-forth between index and heap.

Thanks for the insights, I think what you describe makes a lot of sense.

> I think we can
> once check with more number of tuples (say with 20000, 50000, etc.)
> for case-1.

As we'd expect, this test made the performance regression more visible. I
quickly ran case-1 for 50 times with 50000 tuples, as Shi Yu did, and got
the following results. I'm measuring end-to-end times for running the
whole set of commands:

seq_scan:   0 hr 24 min 42 sec
index_scan: 1 hr 04 min 54 sec

But I'm still not sure whether we should focus on this regression too
much. In the end, what we are talking about is a case (e.g., all or many
rows are duplicated) where using an index is not a good idea anyway. So, I
doubt users would have such indexes.

> The quadratic apply performance
> the sequential scans cause, are a much bigger hazard for users than some apply
> performance regression.

Quoting Andres' note, I personally think that the regression for this case
is not a big concern.
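For readers who don't have the earlier messages at hand, the shape of the problematic case can be sketched roughly as below. This is only an illustrative reconstruction (the table and column names are hypothetical, not taken from the original test scripts): all rows share the same indexed value, so the index on the subscriber cannot narrow the lookup, and each replicated UPDATE must still visit every heap tuple via index_fetch_heap().

```sql
-- Hypothetical sketch of the case-1 shape (names are illustrative).
-- On the subscriber side, the table uses REPLICA IDENTITY FULL and has
-- an index on a column whose value is identical in every row:
CREATE TABLE test_tbl (a int, b int);
CREATE INDEX ON test_tbl (a);
ALTER TABLE test_tbl REPLICA IDENTITY FULL;

-- Reset the table before each measured UPDATE, as described above:
TRUNCATE test_tbl;
INSERT INTO test_tbl SELECT 1, i FROM generate_series(1, 50000) i;

-- On the publisher (repeated 50 times in the measurements above); every
-- replicated UPDATE makes the subscriber fetch each matching index entry
-- and then the heap tuple, including dead tuple versions created by
-- earlier updates in the same transaction:
UPDATE test_tbl SET b = b + 1;
```

Since every index entry points to a qualifying row (or a dead version of one), the index scan degenerates into a full scan plus the extra index-to-heap round trips, which is consistent with the seq_scan vs. index_scan numbers reported above.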
> I'd prefer not having an option, because we figure out the cause of the
> performance regression (reducing it to be small enough to not care). After
> that an option defaulting to using indexes. I don't think an option defaulting
> to false makes sense.

I think we figured out the cause of the performance regression, and it is
not small enough for some scenarios like the one above. But those
scenarios seem like synthetic test cases, with few user-impacting
implications. Still, I think you are better suited to comment on this. If
you consider this a significant issue, we could consider the second patch
as well, so that for this unlikely scenario users could disable index
scans.

Thanks,
Onder