Re: Feature request for adoptive indexes

Tomas Vondra Tue, 02 Nov 2021 07:04:00 -0700



On 11/2/21 13:04, Hayk Manukyan wrote:

Tomas Vondra
 > Are you suggesting those are not the actual best/worst cases and we
 > should use some other indexes? If yes, which ones?

I would say yes.
In my case I am not querying only sequence column.
I have the following cases which I want to optimize.
1. Select * from Some_table where job = <somthing> and nlp = <something>and year = <something> and *scan_id = <something> *2. Select * from Some_table where job = <somthing> and nlp = <something>and year = <something> and *Issue_flag = <something> *3. Select * from Some_table where job = <somthing> and nlp = <something>and year = <something> and *sequence = <something> *Those are queries that my app send to db that is why I said that from*read perspective* our *best case* is 3 separate indexes for*(job, nlp, year, sequence)* AND *(job, nlp, year, Scan_ID)* and *(job,nlp, year, issue_flag)* and any other solution like (job, nlp, year, sequence, Scan_ID, issue_flag) OR (job, nlp, year )INCLUDE(sequence, Scan_ID, issue_flag) OR just (job, nlp, year) can beconsidered as*worst case *

I already explained why using INCLUDE in this case is the wrong thing todo, it'll harm performance compared to just defining a regular index.

I will remind that in real world scenario I have ~50m rows and about*~5k rows for each (job, nlp, year )*

Well, maybe this is the problem. We have 50M rows, but the three columnshave too many distinct values - (job, nlp, year) defines ~50M groups, sothere's only a single row per group. That'd explain why the two indexesperform almost equally.

So I guess you need to modify the data generator so that the data set ismore like the case you're trying to improve.

From *write perspective* as far as we want to have only one indexour*best case* can be considered any of*(job, nlp, year, sequence, Scan_ID, issue_flag)* OR *(job, nlp, year )INCLUDE(sequence, Scan_ID, issue_flag) *and the*worst case* will be having 3 separate queries like in readperspective(job, nlp, year, sequence) AND (job, nlp, year, Scan_ID) and (job, nlp,year, issue_flag)

Maybe. It's true a write with three indexes will require modification tothree leaf pages (on average). With a single index we have to modifyjust one leaf page, depending on where the row gets routed.

But with the proposed "merged" index, the row will have to be insertedinto three smaller trees. If the trees are large enough, they won't fitinto a single leaf page (e.g. the 5000 index tuples is guaranteed toneed many pages, even if you use some smart encoding). So the write willlikely need to modify at least 3 leaf pages, getting much closer tothree separate indexes. At which point you could just use three indexes.

So I think the comparison that we did is not right because we arecomparing different/wrong things.
 > For right results we need to compare this two cases when we are doing
random queries with 1,2,3  and random writes.

I'm not going to spend any more time on tweaking the benchmark, but ifyou tweak it to demonstrate the difference / benefits I'll run it againon my machine etc.


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Feature request for adoptive indexes

Reply via email to