Re: autovac issue with large number of tables

Masahiko Sawada Tue, 01 Dec 2020 19:54:43 -0800

On Tue, Dec 1, 2020 at 5:31 PM Masahiko Sawada <sawada.m...@gmail.com> wrote:
>
> On Tue, Dec 1, 2020 at 4:32 PM Fujii Masao <masao.fu...@oss.nttdata.com> 
> wrote:
> >
> >
> >
> > On 2020/12/01 16:23, Masahiko Sawada wrote:
> > > On Tue, Dec 1, 2020 at 1:48 PM Kasahara Tatsuhito
> > > <kasahara.tatsuh...@gmail.com> wrote:
> > >>
> > >> Hi,
> > >>
> > >> On Mon, Nov 30, 2020 at 8:59 PM Fujii Masao 
> > >> <masao.fu...@oss.nttdata.com> wrote:
> > >>>
> > >>>
> > >>>
> > >>> On 2020/11/30 10:43, Masahiko Sawada wrote:
> > >>>> On Sun, Nov 29, 2020 at 10:34 PM Kasahara Tatsuhito
> > >>>> <kasahara.tatsuh...@gmail.com> wrote:
> > >>>>>
> > >>>>> Hi, Thanks for you comments.
> > >>>>>
> > >>>>> On Fri, Nov 27, 2020 at 9:51 PM Fujii Masao 
> > >>>>> <masao.fu...@oss.nttdata.com> wrote:
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> On 2020/11/27 18:38, Kasahara Tatsuhito wrote:
> > >>>>>>> Hi,
> > >>>>>>>
> > >>>>>>> On Fri, Nov 27, 2020 at 1:43 AM Fujii Masao 
> > >>>>>>> <masao.fu...@oss.nttdata.com> wrote:
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> On 2020/11/26 10:41, Kasahara Tatsuhito wrote:
> > >>>>>>>>> On Wed, Nov 25, 2020 at 8:46 PM Masahiko Sawada 
> > >>>>>>>>> <sawada.m...@gmail.com> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>> On Wed, Nov 25, 2020 at 4:18 PM Kasahara Tatsuhito
> > >>>>>>>>>> <kasahara.tatsuh...@gmail.com> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>> Hi,
> > >>>>>>>>>>>
> > >>>>>>>>>>> On Wed, Nov 25, 2020 at 2:17 PM Masahiko Sawada 
> > >>>>>>>>>>> <sawada.m...@gmail.com> wrote:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> On Fri, Sep 4, 2020 at 7:50 PM Kasahara Tatsuhito
> > >>>>>>>>>>>> <kasahara.tatsuh...@gmail.com> wrote:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Hi,
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> On Wed, Sep 2, 2020 at 2:10 AM Kasahara Tatsuhito
> > >>>>>>>>>>>>> <kasahara.tatsuh...@gmail.com> wrote:
> > >>>>>>>>>>>>>>> I wonder if we could have table_recheck_autovac do two 
> > >>>>>>>>>>>>>>> probes of the stats
> > >>>>>>>>>>>>>>> data.  First probe the existing stats data, and if it shows 
> > >>>>>>>>>>>>>>> the table to
> > >>>>>>>>>>>>>>> be already vacuumed, return immediately.  If not, *then* 
> > >>>>>>>>>>>>>>> force a stats
> > >>>>>>>>>>>>>>> re-read, and check a second time.
> > >>>>>>>>>>>>>> Does the above mean that the second and subsequent 
> > >>>>>>>>>>>>>> table_recheck_autovac()
> > >>>>>>>>>>>>>> will be improved to first check using the previous refreshed 
> > >>>>>>>>>>>>>> statistics?
> > >>>>>>>>>>>>>> I think that certainly works.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> If that's correct, I'll try to create a patch for the PoC
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> I still don't know how to reproduce Jim's troubles, but I was 
> > >>>>>>>>>>>>> able to reproduce
> > >>>>>>>>>>>>> what was probably a very similar problem.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> This problem seems to be more likely to occur in cases where 
> > >>>>>>>>>>>>> you have
> > >>>>>>>>>>>>> a large number of tables,
> > >>>>>>>>>>>>> i.e., a large amount of stats, and many small tables need 
> > >>>>>>>>>>>>> VACUUM at
> > >>>>>>>>>>>>> the same time.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> So I followed Tom's advice and created a patch for the PoC.
> > >>>>>>>>>>>>> This patch will enable a flag in the table_recheck_autovac 
> > >>>>>>>>>>>>> function to use
> > >>>>>>>>>>>>> the existing stats next time if VACUUM (or ANALYZE) has 
> > >>>>>>>>>>>>> already been done
> > >>>>>>>>>>>>> by another worker on the check after the stats have been 
> > >>>>>>>>>>>>> updated.
> > >>>>>>>>>>>>> If the tables continue to require VACUUM after the refresh, 
> > >>>>>>>>>>>>> then a refresh
> > >>>>>>>>>>>>> will be required instead of using the existing statistics.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> I did simple test with HEAD and HEAD + this PoC patch.
> > >>>>>>>>>>>>> The tests were conducted in two cases.
> > >>>>>>>>>>>>> (I changed few configurations. see attached scripts)
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> 1. Normal VACUUM case
> > >>>>>>>>>>>>>       - SET autovacuum = off
> > >>>>>>>>>>>>>       - CREATE tables with 100 rows
> > >>>>>>>>>>>>>       - DELETE 90 rows for each tables
> > >>>>>>>>>>>>>       - SET autovacuum = on and restart PostgreSQL
> > >>>>>>>>>>>>>       - Measure the time it takes for all tables to be 
> > >>>>>>>>>>>>> VACUUMed
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> 2. Anti wrap round VACUUM case
> > >>>>>>>>>>>>>       - CREATE brank tables
> > >>>>>>>>>>>>>       - SELECT all of these tables (for generate stats)
> > >>>>>>>>>>>>>       - SET autovacuum_freeze_max_age to low values and 
> > >>>>>>>>>>>>> restart PostgreSQL
> > >>>>>>>>>>>>>       - Consumes a lot of XIDs by using txid_curent()
> > >>>>>>>>>>>>>       - Measure the time it takes for all tables to be 
> > >>>>>>>>>>>>> VACUUMed
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> For each test case, the following results were obtained by 
> > >>>>>>>>>>>>> changing
> > >>>>>>>>>>>>> autovacuum_max_workers parameters to 1, 2, 3(def) 5 and 10.
> > >>>>>>>>>>>>> Also changing num of tables to 1000, 5000, 10000 and 20000.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Due to the poor VM environment (2 VCPU/4 GB), the results are 
> > >>>>>>>>>>>>> a little unstable,
> > >>>>>>>>>>>>> but I think it's enough to ask for a trend.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> ===========================================================================
> > >>>>>>>>>>>>> [1.Normal VACUUM case]
> > >>>>>>>>>>>>>      tables:1000
> > >>>>>>>>>>>>>       autovacuum_max_workers 1:   (HEAD) 20 sec VS (with 
> > >>>>>>>>>>>>> patch)  20 sec
> > >>>>>>>>>>>>>       autovacuum_max_workers 2:   (HEAD) 18 sec VS (with 
> > >>>>>>>>>>>>> patch)  16 sec
> > >>>>>>>>>>>>>       autovacuum_max_workers 3:   (HEAD) 18 sec VS (with 
> > >>>>>>>>>>>>> patch)  16 sec
> > >>>>>>>>>>>>>       autovacuum_max_workers 5:   (HEAD) 19 sec VS (with 
> > >>>>>>>>>>>>> patch)  17 sec
> > >>>>>>>>>>>>>       autovacuum_max_workers 10:  (HEAD) 19 sec VS (with 
> > >>>>>>>>>>>>> patch)  17 sec
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>      tables:5000
> > >>>>>>>>>>>>>       autovacuum_max_workers 1:   (HEAD) 77 sec VS (with 
> > >>>>>>>>>>>>> patch)  78 sec
> > >>>>>>>>>>>>>       autovacuum_max_workers 2:   (HEAD) 61 sec VS (with 
> > >>>>>>>>>>>>> patch)  43 sec
> > >>>>>>>>>>>>>       autovacuum_max_workers 3:   (HEAD) 38 sec VS (with 
> > >>>>>>>>>>>>> patch)  38 sec
> > >>>>>>>>>>>>>       autovacuum_max_workers 5:   (HEAD) 45 sec VS (with 
> > >>>>>>>>>>>>> patch)  37 sec
> > >>>>>>>>>>>>>       autovacuum_max_workers 10:  (HEAD) 43 sec VS (with 
> > >>>>>>>>>>>>> patch)  35 sec
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>      tables:10000
> > >>>>>>>>>>>>>       autovacuum_max_workers 1:   (HEAD) 152 sec VS (with 
> > >>>>>>>>>>>>> patch)  153 sec
> > >>>>>>>>>>>>>       autovacuum_max_workers 2:   (HEAD) 119 sec VS (with 
> > >>>>>>>>>>>>> patch)   98 sec
> > >>>>>>>>>>>>>       autovacuum_max_workers 3:   (HEAD)  87 sec VS (with 
> > >>>>>>>>>>>>> patch)   78 sec
> > >>>>>>>>>>>>>       autovacuum_max_workers 5:   (HEAD) 100 sec VS (with 
> > >>>>>>>>>>>>> patch)   66 sec
> > >>>>>>>>>>>>>       autovacuum_max_workers 10:  (HEAD)  97 sec VS (with 
> > >>>>>>>>>>>>> patch)   56 sec
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>      tables:20000
> > >>>>>>>>>>>>>       autovacuum_max_workers 1:   (HEAD) 338 sec VS (with 
> > >>>>>>>>>>>>> patch)  339 sec
> > >>>>>>>>>>>>>       autovacuum_max_workers 2:   (HEAD) 231 sec VS (with 
> > >>>>>>>>>>>>> patch)  229 sec
> > >>>>>>>>>>>>>       autovacuum_max_workers 3:   (HEAD) 220 sec VS (with 
> > >>>>>>>>>>>>> patch)  191 sec
> > >>>>>>>>>>>>>       autovacuum_max_workers 5:   (HEAD) 234 sec VS (with 
> > >>>>>>>>>>>>> patch)  147 sec
> > >>>>>>>>>>>>>       autovacuum_max_workers 10:  (HEAD) 320 sec VS (with 
> > >>>>>>>>>>>>> patch)  113 sec
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> [2.Anti wrap round VACUUM case]
> > >>>>>>>>>>>>>      tables:1000
> > >>>>>>>>>>>>>       autovacuum_max_workers 1:   (HEAD) 19 sec VS (with 
> > >>>>>>>>>>>>> patch) 18 sec
> > >>>>>>>>>>>>>       autovacuum_max_workers 2:   (HEAD) 14 sec VS (with 
> > >>>>>>>>>>>>> patch) 15 sec
> > >>>>>>>>>>>>>       autovacuum_max_workers 3:   (HEAD) 14 sec VS (with 
> > >>>>>>>>>>>>> patch) 14 sec
> > >>>>>>>>>>>>>       autovacuum_max_workers 5:   (HEAD) 14 sec VS (with 
> > >>>>>>>>>>>>> patch) 16 sec
> > >>>>>>>>>>>>>       autovacuum_max_workers 10:  (HEAD) 16 sec VS (with 
> > >>>>>>>>>>>>> patch) 14 sec
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>      tables:5000
> > >>>>>>>>>>>>>       autovacuum_max_workers 1:   (HEAD) 69 sec VS (with 
> > >>>>>>>>>>>>> patch) 69 sec
> > >>>>>>>>>>>>>       autovacuum_max_workers 2:   (HEAD) 66 sec VS (with 
> > >>>>>>>>>>>>> patch) 47 sec
> > >>>>>>>>>>>>>       autovacuum_max_workers 3:   (HEAD) 59 sec VS (with 
> > >>>>>>>>>>>>> patch) 37 sec
> > >>>>>>>>>>>>>       autovacuum_max_workers 5:   (HEAD) 39 sec VS (with 
> > >>>>>>>>>>>>> patch) 28 sec
> > >>>>>>>>>>>>>       autovacuum_max_workers 10:  (HEAD) 39 sec VS (with 
> > >>>>>>>>>>>>> patch) 29 sec
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>      tables:10000
> > >>>>>>>>>>>>>       autovacuum_max_workers 1:   (HEAD) 139 sec VS (with 
> > >>>>>>>>>>>>> patch) 138 sec
> > >>>>>>>>>>>>>       autovacuum_max_workers 2:   (HEAD) 130 sec VS (with 
> > >>>>>>>>>>>>> patch)  86 sec
> > >>>>>>>>>>>>>       autovacuum_max_workers 3:   (HEAD) 120 sec VS (with 
> > >>>>>>>>>>>>> patch)  68 sec
> > >>>>>>>>>>>>>       autovacuum_max_workers 5:   (HEAD)  96 sec VS (with 
> > >>>>>>>>>>>>> patch)  41 sec
> > >>>>>>>>>>>>>       autovacuum_max_workers 10:  (HEAD)  90 sec VS (with 
> > >>>>>>>>>>>>> patch)  39 sec
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>      tables:20000
> > >>>>>>>>>>>>>       autovacuum_max_workers 1:   (HEAD) 313 sec VS (with 
> > >>>>>>>>>>>>> patch) 331 sec
> > >>>>>>>>>>>>>       autovacuum_max_workers 2:   (HEAD) 209 sec VS (with 
> > >>>>>>>>>>>>> patch) 201 sec
> > >>>>>>>>>>>>>       autovacuum_max_workers 3:   (HEAD) 227 sec VS (with 
> > >>>>>>>>>>>>> patch) 141 sec
> > >>>>>>>>>>>>>       autovacuum_max_workers 5:   (HEAD) 236 sec VS (with 
> > >>>>>>>>>>>>> patch)  88 sec
> > >>>>>>>>>>>>>       autovacuum_max_workers 10:  (HEAD) 309 sec VS (with 
> > >>>>>>>>>>>>> patch)  74 sec
> > >>>>>>>>>>>>> ===========================================================================
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> The cases without patch, the scalability of the worker has 
> > >>>>>>>>>>>>> decreased
> > >>>>>>>>>>>>> as the number of tables has increased.
> > >>>>>>>>>>>>> In fact, the more workers there are, the longer it takes to 
> > >>>>>>>>>>>>> complete
> > >>>>>>>>>>>>> VACUUM to all tables.
> > >>>>>>>>>>>>> The cases with patch, it shows good scalability with respect 
> > >>>>>>>>>>>>> to the
> > >>>>>>>>>>>>> number of workers.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> It seems a good performance improvement even without the patch 
> > >>>>>>>>>>>> of
> > >>>>>>>>>>>> shared memory based stats collector.
> > >>>>>>>>
> > >>>>>>>> Sounds great!
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Note that perf top results showed that 
> > >>>>>>>>>>>>> hash_search_with_hash_value,
> > >>>>>>>>>>>>> hash_seq_search and
> > >>>>>>>>>>>>> pgstat_read_statsfiles are dominant during VACUUM in all 
> > >>>>>>>>>>>>> patterns,
> > >>>>>>>>>>>>> with or without the patch.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Therefore, there is still a need to find ways to optimize the 
> > >>>>>>>>>>>>> reading
> > >>>>>>>>>>>>> of large amounts of stats.
> > >>>>>>>>>>>>> However, this patch is effective in its own right, and since 
> > >>>>>>>>>>>>> there are
> > >>>>>>>>>>>>> only a few parts to modify,
> > >>>>>>>>>>>>> I think it should be able to be applied to current (preferably
> > >>>>>>>>>>>>> pre-v13) PostgreSQL.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> +1
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> +
> > >>>>>>>>>>>> +       /* We might be better to refresh stats */
> > >>>>>>>>>>>> +       use_existing_stats = false;
> > >>>>>>>>>>>>         }
> > >>>>>>>>>>>> +   else
> > >>>>>>>>>>>> +   {
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> -   heap_freetuple(classTup);
> > >>>>>>>>>>>> +       heap_freetuple(classTup);
> > >>>>>>>>>>>> +       /* The relid has already vacuumed, so we might be 
> > >>>>>>>>>>>> better to
> > >>>>>>>>>>>> use exiting stats */
> > >>>>>>>>>>>> +       use_existing_stats = true;
> > >>>>>>>>>>>> +   }
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> With that patch, the autovacuum process refreshes the stats in 
> > >>>>>>>>>>>> the
> > >>>>>>>>>>>> next check if it finds out that the table still needs to be 
> > >>>>>>>>>>>> vacuumed.
> > >>>>>>>>>>>> But I guess it's not necessarily true because the next table 
> > >>>>>>>>>>>> might be
> > >>>>>>>>>>>> vacuumed already. So I think we might want to always use the 
> > >>>>>>>>>>>> existing
> > >>>>>>>>>>>> for the first check. What do you think?
> > >>>>>>>>>>> Thanks for your comment.
> > >>>>>>>>>>>
> > >>>>>>>>>>> If we assume the case where some workers vacuum on large tables
> > >>>>>>>>>>> and a single worker vacuum on small tables, the processing
> > >>>>>>>>>>> performance of the single worker will be slightly lower if the
> > >>>>>>>>>>> existing statistics are checked every time.
> > >>>>>>>>>>>
> > >>>>>>>>>>> In fact, at first I tried to check the existing stats every 
> > >>>>>>>>>>> time,
> > >>>>>>>>>>> but the performance was slightly worse in cases with a small 
> > >>>>>>>>>>> number of workers.
> > >>>>>>>>
> > >>>>>>>> Do you have this benchmark result?
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>>>> (Checking the existing stats is lightweight , but at high 
> > >>>>>>>>>>> frequency,
> > >>>>>>>>>>>      it affects processing performance.)
> > >>>>>>>>>>> Therefore, at after refresh statistics, determine whether 
> > >>>>>>>>>>> autovac
> > >>>>>>>>>>> should use the existing statistics.
> > >>>>>>>>>>
> > >>>>>>>>>> Yeah, since the test you used uses a lot of small tables, if 
> > >>>>>>>>>> there are
> > >>>>>>>>>> a few workers, checking the existing stats is unlikely to return 
> > >>>>>>>>>> true
> > >>>>>>>>>> (no need to vacuum). So the cost of existing stats check ends up 
> > >>>>>>>>>> being
> > >>>>>>>>>> overhead. Not sure how slow always checking the existing stats 
> > >>>>>>>>>> was,
> > >>>>>>>>>> but given that the shared memory based stats collector patch 
> > >>>>>>>>>> could
> > >>>>>>>>>> improve the performance of refreshing stats, it might be better 
> > >>>>>>>>>> not to
> > >>>>>>>>>> check the existing stats frequently like the patch does. Anyway, 
> > >>>>>>>>>> I
> > >>>>>>>>>> think it’s better to evaluate the performance improvement with 
> > >>>>>>>>>> other
> > >>>>>>>>>> cases too.
> > >>>>>>>>> Yeah, I would like to see how much the performance changes in 
> > >>>>>>>>> other cases.
> > >>>>>>>>> In addition, if the shared-based-stats patch is applied, we won't 
> > >>>>>>>>> need to reload
> > >>>>>>>>> a huge stats file, so we will just have to check the stats on
> > >>>>>>>>> shared-mem every time.
> > >>>>>>>>> Perhaps the logic of table_recheck_autovac could be simpler.
> > >>>>>>>>>
> > >>>>>>>>>>> BTW, I found some typos in comments, so attache a  fixed 
> > >>>>>>>>>>> version.
> > >>>>>>>>
> > >>>>>>>> The patch adds some duplicated codes into table_recheck_autovac().
> > >>>>>>>> It's better to make the common function performing them and make
> > >>>>>>>> table_recheck_autovac() call that common function, to simplify the 
> > >>>>>>>> code.
> > >>>>>>> Thanks for your comment.
> > >>>>>>> Hmm.. I've cut out the duplicate part.
> > >>>>>>> Attach the patch.
> > >>>>>>> Could you confirm that it fits your expecting?
> > >>>>>>
> > >>>>>> Yes, thanks for updataing the patch! Here are another review 
> > >>>>>> comments.
> > >>>>>>
> > >>>>>> +       shared = pgstat_fetch_stat_dbentry(InvalidOid);
> > >>>>>> +       dbentry = pgstat_fetch_stat_dbentry(MyDatabaseId);
> > >>>>>>
> > >>>>>> When using the existing stats, ISTM that these are not necessary and
> > >>>>>> we can reuse "shared" and "dbentry" obtained before. Right?
> > >>>>> Yeah, but unless autovac_refresh_stats() is called, these functions
> > >>>>> read the information from the
> > >>>>> local hash table without re-read the stats file, so the process is 
> > >>>>> very light.
> > >>>>> Therefore, I think, it is better to keep the current logic to keep the
> > >>>>> code simple.
> > >>>>>
> > >>>>>>
> > >>>>>> +               /* We might be better to refresh stats */
> > >>>>>> +               use_existing_stats = false;
> > >>>>>>
> > >>>>>> I think that we should add more comments about why it's better to
> > >>>>>> refresh the stats in this case.
> > >>>>>>
> > >>>>>> +               /* The relid has already vacuumed, so we might be 
> > >>>>>> better to use existing stats */
> > >>>>>> +               use_existing_stats = true;
> > >>>>>>
> > >>>>>> I think that we should add more comments about why it's better to
> > >>>>>> reuse the stats in this case.
> > >>>>> I added  comments.
> > >>>>>
> > >>>>> Attache the patch.
> > >>>>>
> > >>>>
> > >>>> Thank you for updating the patch. Here are some small comments on the
> > >>>> latest (v4) patch.
> > >>>>
> > >>>> +    * So if the last time we checked a table that was already 
> > >>>> vacuumed after
> > >>>> +    * refres stats, check the current statistics before refreshing it.
> > >>>> +    */
> > >>>>
> > >>>> s/refres/refresh/
> > >> Thanks! fixed.
> > >> Attached the patch.
> > >>
> > >>>>
> > >>>> -----
> > >>>> +/* Counter to determine if statistics should be refreshed */
> > >>>> +static bool use_existing_stats = false;
> > >>>> +
> > >>>>
> > >>>> I think 'use_existing_stats' can be declared within 
> > >>>> table_recheck_autovac().
> > >>>>
> > >>>> -----
> > >>>> While testing the performance, I realized that the statistics are
> > >>>> reset every time vacuumed one table, leading to re-reading the stats
> > >>>> file even if 'use_existing_stats' is true. Please refer that vacuum()
> > >>>> eventually calls AtEOXact_PgStat() which calls to
> > >>>> pgstat_clear_snapshot().
> > >>>
> > >>> Good catch!
> > >>>
> > >>>
> > >>>> I believe that's why the performance of the
> > >>>> method of always checking the existing stats wasn’t good as expected.
> > >>>> So if we save the statistics somewhere and use it for rechecking, the
> > >>>> results of the performance benchmark will differ between these two
> > >>>> methods.
> > >> Thanks for you checks.
> > >> But, if a worker did vacuum(), that means this worker had determined
> > >> need vacuum in the
> > >> table_recheck_autovac(). So, use_existing_stats set to false, and next
> > >> time, refresh stats.
> > >> Therefore I think the current patch is fine, as we want to avoid
> > >> unnecessary refreshing of
> > >> statistics before the actual vacuum(), right?
> > >
> > > Yes, you're right.
> > >
> > > When I benchmarked the performance of the method of always checking
> > > existing stats I edited your patch so that it sets use_existing_stats
> > > = true even if the second check is false (i.g., vacuum is needed).
> > > And the result I got was worse than expected especially in the case of
> > > a few autovacuum workers. But it doesn't evaluate the performance of
> > > that method rightly as the stats snapshot is cleared every time
> > > vacuum. Given you had similar results, I guess you used a similar way
> > > when evaluating it, is it right? If so, it’s better to fix this issue
> > > and see how the performance benchmark results will differ.
> > >
> > > For example, the results of the test case with 10000 tables and 1
> > > autovacuum worker I reported before was:
> > >
> > > 10000 tables:
> > >     autovac_workers 1  : 158s,157s, 290s
> > >
> > > But after fixing that issue in the third method (always checking the
> > > existing stats), the results are:
> >
> > Could you tell me how you fixed that issue? You copied the stats to
> > somewhere as you suggested or skipped pgstat_clear_snapshot() as
> > I suggested?
>
> I used the way you suggested in this quick test; skipped
> pgstat_clear_snapshot().
>
> >
> > Kasahara-san seems not to like the latter idea because it might
> > cause bad side effect. So we should use the former idea?
>
> Not sure. I'm also concerned about the side effect but I've not checked yet.
>
> Since probably there is no big difference between the two ways in
> terms of performance I'm going to see how the performance benchmark
> result will change first.


I've tested performance improvement again. From the left the execution
time of the current HEAD, Kasahara-san's patch, the method of always
checking the existing stats (using approach suggested by Fujii-san),
in seconds.

1000 tables:
   autovac_workers 1  : 13s, 13s, 13s
   autovac_workers 2  : 6s, 4s, 4s
   autovac_workers 3  : 3s, 4s, 3s
   autovac_workers 5  : 3s, 3s, 2s
   autovac_workers 10: 2s, 3s, 2s

5000 tables:
   autovac_workers 1  : 71s, 71s, 72s
   autovac_workers 2  : 37s, 32s, 32s
   autovac_workers 3  : 29s, 26s, 26s
   autovac_workers 5  : 20s, 19s, 18s
   autovac_workers 10: 13s, 8s, 8s

10000 tables:
   autovac_workers 1  : 158s,157s, 159s
   autovac_workers 2  : 80s, 53s, 78s
   autovac_workers 3  : 75s, 67s, 67s
   autovac_workers 5  : 61s, 42s, 42s
   autovac_workers 10: 69s, 26s, 25s

20000 tables:
   autovac_workers 1  : 379s, 380s, 389s
   autovac_workers 2  : 236s, 232s, 233s
   autovac_workers 3  : 222s, 181s, 182s
   autovac_workers 5  : 212s, 132s, 139s
   autovac_workers 10: 317s, 91s, 89s

I don't see a big difference between Kasahara-san's patch and the
method of always checking the existing stats.

Regards,

-- 
Masahiko Sawada
EnterpriseDB:  https://www.enterprisedb.com/

Re: autovac issue with large number of tables

Reply via email to