On Fri, 2023-09-22 at 10:35 +0200, Koen De Groote wrote:
> On Thu, Sep 21, 2023 at 9:30 PM Laurenz Albe <laurenz.a...@cybertec.at> wrote:
> > On Thu, 2023-09-21 at 17:05 +0200, Koen De Groote wrote:
> > > I'm doing the following query:
> > > select * from my_table where hasbeenchecked = true and hasbeenverified = 
> > > true and insert_timestamp <= '2023-09-01 00:00:00.000' limit 1000;
> > > 
> > > The date is an example, it is the format that is used in the query.
> > > 
> > > The table has 81M rows. Is 50GB in size. And the index is 34MB
> > > 
> > > The index is as follows:
> > > btree (insert_timestamp DESC) WHERE hasbeenchecked = true AND 
> > > hasbeenverified = true
> > > 
> > > I'm seeing a slow query first, then a fast one, and if I move the date, a 
> > > slow query again.
> > > 
> > > What I'm seeing is:
> > > Attempt 1:
> > > Hit: 5171(40MB)
> > > Read: 16571(130MB)
> > > Dirtied: 3940(31MB)
> > > 
> > > Attempt 2:
> > > Hit: 21745 (170MB)
> > > Read: Nothing
> > > Dirtied: Nothing.
> > > 
> > > It's slow once, then consistently fast, and then slow again if I move the 
> > > date around.
> > > And by slow I mean: around 60 seconds. And fast is below 1 second.
> > 
> > That's normal behavior: after the first execution, the data are cached, so 
> > the query
> > becomes much faster.
> > 
> > Dirtying pages happens because the first reader has to set hint bits, which 
> > is an extra
> > chore.  You can avoid that if you VACUUM the table before you query it.
>
> So, if I want to speed up the query, apart from trying to vacuum it 
> beforehand, I suspect
> I've hit the limit of what this query can do?
>
> Because, the table is just going to keep growing. And it's a usually a query 
> that runs one
> time per day, so it's a cold run each time.
>
> Is this just going to get slower and slower and there's nothing that can be 
> done about it?

Essentially yes.

If the table does not have too many columns, or you can be more selective than 
"SELECT *",
you could use an index-only scan with an index like

  CREATE INDEX ON my_table (insert_timestamp)
     INCLUDE (/* all the columns in the SELECT list */)
     WHERE hasbeenchecked AND hasbeenverified;

  VACUUM my_table;

You need to configure autovacuum so that it vacuums the table often enough if 
you want
an efficient index-only scan.

If that is not feasible, you can gain speed by clustering the table.  For that, 
you need
a different index:

  CREATE INDEX ckuster_idx ON my_table (hasbeenchecked, hasbeenverified, 
insert_timestamp);

  CLUSTER my_table USING cluster_idx;  -- attention: rewrites the table

That should speed up the query considerably, because it will have to read way 
fewer pages
from disk.  However, CLUSTER is not without problems.  Look at the 
documentation for the
caveats.

Yours,
Laurenz Albe


Reply via email to