Hi Amit, Shi Yu

> > b. Executed SQL.
> > I executed TRUNCATE and INSERT before each UPDATE. I am not sure if you
> > did the same, or just executed 50 consecutive UPDATEs. If the latter one,
> > there would be lots of old tuples and this might have a bigger impact on
> > sequential scan. I tried this case (which executes 50 consecutive
> > UPDATEs) and also saw that the overhead is smaller than before.

Alright, I'll do the same and execute TRUNCATE/INSERT before each UPDATE.
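Concretely, each iteration would run something like the following (table and column names here are placeholders to illustrate the shape of the test, not the actual case-1 schema):

```sql
-- placeholder schema, just to show the per-iteration sequence
TRUNCATE test_tab;
INSERT INTO test_tab (a, b) SELECT i % 10, i FROM generate_series(1, 50000) i;
UPDATE test_tab SET b = b + 1;  -- the statement whose apply time is measured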


> In the above profile number of calls to index_fetch_heap(),
> heapam_index_fetch_tuple() explains the reason for the regression you
> are seeing with the index scan. Because the update will generate dead
> tuples in the same transaction and those dead tuples won't be removed,
> we get those from the index and then need to perform
> index_fetch_heap() to find out whether the tuple is dead or not. Now,
> for sequence scan also we need to scan those dead tuples but there we
> don't need to do back-and-forth between index and heap.


Thanks for the insights, I think what you describe makes a lot of sense.
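To make that concrete, here is a toy cost model (my own simplification in Python, not PostgreSQL code; `index_apply_fetches` is a hypothetical name) counting the index-to-heap visibility probes an index-scan apply would do when every row shares the same key and the dead tuples from earlier UPDATEs in the transaction are still visible to the index:

```python
def index_apply_fetches(n_rows, n_updates):
    """Rough count of heap fetches done by index-scan apply when all
    n_rows rows share one key, over n_updates consecutive UPDATEs.

    Assumption: each UPDATE touches all rows, and every dead version it
    leaves behind is still returned by the index, so each later lookup
    must fetch it from the heap just to discover it is dead.
    """
    fetches = 0
    dead = 0  # dead tuple versions accumulated so far
    for _ in range(n_updates):
        # one index lookup per replicated row change
        for _ in range(n_rows):
            # the index returns every version of the key (live + dead),
            # and each one costs an index_fetch_heap()-style probe
            fetches += n_rows + dead
        dead += n_rows  # this UPDATE leaves n_rows more dead tuples
    return fetches
```

A sequential scan visits the same dead tuples, but in a single linear pass per lookup, without the per-entry jump back and forth between index and heap, which matches what the profile shows.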



> I think we can
> once check with more number of tuples (say with 20000, 50000, etc.)
> for case-1.
As we'd expect, this test made the performance regression more visible.

I quickly ran case-1 50 times with 50000 tuples, as Shi Yu did, and got
the following results. I'm measuring end-to-end times for running the
whole set of commands:

seq_scan:    00 hr 24 min 42 sec
index_scan:  01 hr 04 min 54 sec


But I'm still not sure whether we should focus on this regression too
much. In the end, what we are talking about is a case (e.g., all or many
rows are duplicated) where using an index is not a good idea anyway. So,
I doubt users would have such indexes.


> The quadratic apply performance the sequential scans cause, are a much
> bigger hazard for users than some apply performance regression.

Quoting Andres' note, I personally think that the regression for this case
is not a big concern.

> I'd prefer not having an option, because we figure out the cause of the
> performance regression (reducing it to be small enough to not care). After
> that an option defaulting to using indexes. I don't think an option
> defaulting to false makes sense.

I think we figured out the cause of the performance regression, and it is
not small enough for some scenarios like the above. But those scenarios
seem like synthetic test cases, with little user-impacting implication.
Still, I think you are better suited to comment on this.

If you consider this a significant issue, we could also consider the
second patch, so that users could disable index scans for this unlikely
scenario.

Thanks,
Onder
