Re: [HACKERS] GUC for cleanup indexes threshold.

2018-06-26 Thread Alexander Korotkov
On Tue, Jun 26, 2018 at 4:11 PM Darafei "Komяpa" Praliaskouski wrote:
> On Tue, Jun 26, 2018 at 3:42 PM, Alexander Korotkov wrote:
>>
>> On Tue, Jun 26, 2018 at 1:46 PM Masahiko Sawada  
>> wrote:
>> > On Fri, Jun 22, 2018 at 6:55 PM, Alexander Korotkov
>> >  wrote:
>> > > So, I propose to just
>> > > increase the maximum value for both the GUC and the reloption.  See the attached
>> > > patch.  It also changes the calculations in _bt_vacuum_needs_cleanup() for
>> > > better handling of large values (just some kind of overflow paranoia).
>> >
>> > The patch looks good to me.
>>
>> Pushed, thanks!
>
>
> Thank you for the enhancement. Now Index Only Scans over append-only tables
> in Postgres can be implemented, even if it requires manually kicking off VACUUM
> on a large table, and that's a great enhancement for moving-object databases.
> :)
>
> My eye catches another thing: the error message in the tests is:
>
> DETAIL:  Valid values are between "0.00" and 
> "179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.00".
>
> a) do we really need to print the digits of DBL_MAX? "Valid values are double
> precision, non-negative"?
> b) double precision binary-to-decimal noise starts at the 16th digit. Why does it
> stop at the decimal point, with an exact ".00"? Does it bite the conversion
> somewhere else too?

Thank you for pointing this out.  I'm proposing to change the output format from
"%f" to "%g" [1][2].  It looks better and matches what we do for
GUCs.

1. 
https://www.postgresql.org/message-id/CAPpHfdvewmr4PcpRjrkstoNn1n2_6dL-iHRB21CCfZ0efZdBTg%40mail.gmail.com
2. 
https://www.postgresql.org/message-id/CAPpHfdsFBbJAd24F0b_o9TfTtu%2B%2BjH0bR5XS_e9xbSwk8SJwyQ%40mail.gmail.com
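
For a quick standalone illustration of the difference, and of question (b)
above (DBL_MAX is an exact binary integer, so its full decimal expansion is
exact and the trailing ".00" is not conversion noise):

#include <stdio.h>
#include <float.h>

int main(void)
{
    /* "%.2f" spells out all 309 integral digits of DBL_MAX; the digits
     * are exact, since DBL_MAX is an integer in binary. */
    printf("%.2f\n", DBL_MAX);

    /* "%g" switches to scientific notation: 1.79769e+308 */
    printf("%g\n", DBL_MAX);
    return 0;
}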

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] GUC for cleanup indexes threshold.

2018-06-26 Thread Komяpa
On Tue, Jun 26, 2018 at 3:42 PM, Alexander Korotkov wrote:

> On Tue, Jun 26, 2018 at 1:46 PM Masahiko Sawada 
> wrote:
> > On Fri, Jun 22, 2018 at 6:55 PM, Alexander Korotkov
> >  wrote:
> > > So, I propose to just
> > > increase the maximum value for both the GUC and the reloption.  See the attached
> > > patch.  It also changes the calculations in _bt_vacuum_needs_cleanup() for
> > > better handling of large values (just some kind of overflow paranoia).
> >
> > The patch looks good to me.
>
> Pushed, thanks!
>

Thank you for the enhancement. Now Index Only Scans over append-only tables
in Postgres can be implemented, even if it requires manually kicking off
VACUUM on a large table, and that's a great enhancement for moving-object
databases. :)

My eye catches another thing: the error message in the tests is:

DETAIL:  Valid values are between "0.00" and
"179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.00".

a) do we really need to print the digits of DBL_MAX? "Valid values are double
precision, non-negative"?
b) double precision binary-to-decimal noise starts at the 16th digit. Why does
it stop at the decimal point, with an exact ".00"? Does it bite the
conversion somewhere else too?


Re: [HACKERS] GUC for cleanup indexes threshold.

2018-06-26 Thread Alexander Korotkov
On Tue, Jun 26, 2018 at 1:46 PM Masahiko Sawada  wrote:
> On Fri, Jun 22, 2018 at 6:55 PM, Alexander Korotkov
>  wrote:
> > So, I propose to just
> > increase the maximum value for both the GUC and the reloption.  See the attached
> > patch.  It also changes the calculations in _bt_vacuum_needs_cleanup() for
> > better handling of large values (just some kind of overflow paranoia).
>
> The patch looks good to me.

Pushed, thanks!

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] GUC for cleanup indexes threshold.

2018-06-26 Thread Masahiko Sawada
On Fri, Jun 22, 2018 at 6:55 PM, Alexander Korotkov
 wrote:
> On Wed, Jun 20, 2018 at 12:00 PM Alexander Korotkov
>  wrote:
>> On Wed, Jun 20, 2018 at 8:32 AM Masahiko Sawada  
>> wrote:
>> > On Wed, Jun 20, 2018 at 1:00 AM, Alexander Korotkov
>> > > Ok.  I've rephrased the comment a bit.  Also, you created an "index vacuum"
>> > > subsection in the "resource usage" section.  I think it's not
>> > > appropriate for this option to be in "resource usage".  Ideally it
>> > > should be grouped with other vacuum options, but we don't have a single
>> > > section for that.  The "autovacuum" section is also not appropriate,
>> > > because this GUC works not only for autovacuum.  So, the most semantically
>> > > close options, which affect vacuum in general, are
>> > > vacuum_freeze_min_age, vacuum_freeze_table_age,
>> > > vacuum_multixact_freeze_min_age and vacuum_multixact_freeze_table_age,
>> > > which are located in the "client connection defaults" section.  So, I
>> > > decided to move this GUC into that section.  I also changed the section
>> > > in the GUC definition (guc.c) from AUTOVACUUM to CLIENT_CONN_STATEMENT.
>> >
>> > Agreed. So should we move it to 19.11 Client Connection Defaults in the
>> > docs as well? And I think it's better if this option is placed next to the
>> > other vacuum options to make it easier to find. The attached patch changes
>> > things accordingly. Please review it.
>>
>> Right, thank you.  Looks good to me.
>> I'm going to commit this if there are no objections.
>
> Pushed.

Thank you!

>
> Regarding the maximum value for vacuum_cleanup_index_scale_factor: I
> think introducing a special value with "never cleanup" meaning would be
> overkill for a post-feature-freeze enhancement.

After more thought, adding a threshold for invoking index cleanup could
perhaps work in order to support "never cleanup"; it would be a
PostgreSQL 12 item though. If an index has fewer tuples than the
threshold (say, vacuum_cleanup_index_threshold), index cleanup would never
be executed.
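
A rough sketch of that check, assuming a hypothetical GUC named
vacuum_cleanup_index_threshold (proposed above, not committed):

/* Hypothetical sketch only: vacuum_cleanup_index_threshold does not
 * exist; it is the idea proposed above.  Cleanup is simply never
 * triggered while the index stays below the threshold. */
if (num_index_tuples < vacuum_cleanup_index_threshold)
    return false;       /* small index: skip the cleanup scan entirely */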

> So, I propose to just
> increase the maximum value for both the GUC and the reloption.  See the attached
> patch.  It also changes the calculations in _bt_vacuum_needs_cleanup() for
> better handling of large values (just some kind of overflow paranoia).
>

The patch looks good to me.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] GUC for cleanup indexes threshold.

2018-06-22 Thread Alexander Korotkov
On Wed, Jun 20, 2018 at 12:00 PM Alexander Korotkov
 wrote:
> On Wed, Jun 20, 2018 at 8:32 AM Masahiko Sawada  wrote:
> > On Wed, Jun 20, 2018 at 1:00 AM, Alexander Korotkov
> > > Ok.  I've rephrased the comment a bit.  Also, you created an "index vacuum"
> > > subsection in the "resource usage" section.  I think it's not
> > > appropriate for this option to be in "resource usage".  Ideally it
> > > should be grouped with other vacuum options, but we don't have a single
> > > section for that.  The "autovacuum" section is also not appropriate,
> > > because this GUC works not only for autovacuum.  So, the most semantically
> > > close options, which affect vacuum in general, are
> > > vacuum_freeze_min_age, vacuum_freeze_table_age,
> > > vacuum_multixact_freeze_min_age and vacuum_multixact_freeze_table_age,
> > > which are located in the "client connection defaults" section.  So, I
> > > decided to move this GUC into that section.  I also changed the section
> > > in the GUC definition (guc.c) from AUTOVACUUM to CLIENT_CONN_STATEMENT.
> >
> > Agreed. So should we move it to 19.11 Client Connection Defaults in the
> > docs as well? And I think it's better if this option is placed next to the
> > other vacuum options to make it easier to find. The attached patch changes
> > things accordingly. Please review it.
>
> Right, thank you.  Looks good to me.
> I'm going to commit this if there are no objections.

Pushed.

Regarding the maximum value for vacuum_cleanup_index_scale_factor: I
think introducing a special value with "never cleanup" meaning would be
overkill for a post-feature-freeze enhancement.  So, I propose to just
increase the maximum value for both the GUC and the reloption.  See the attached
patch.  It also changes the calculations in _bt_vacuum_needs_cleanup() for
better handling of large values (just some kind of overflow paranoia).
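
The overflow paranoia can be sketched as follows (the committed code may
differ in detail): instead of multiplying the previous tuple count by
(1 + scale factor), which overflows to infinity for factors near DBL_MAX,
compare the relative growth.

/* Sketch: prev_num_heap_tuples is the count recorded at the previous
 * cleanup.  Assuming it is positive, test relative growth rather than
 * computing prev * (1 + factor), which overflows for huge factors. */
if (prev_num_heap_tuples > 0 &&
    (num_heap_tuples - prev_num_heap_tuples) / prev_num_heap_tuples >=
    cleanup_scale_factor)
    needs_cleanup = true;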

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


vacuum_cleanup_index_scale_factor-max-2.patch
Description: Binary data


Re: [HACKERS] GUC for cleanup indexes threshold.

2018-06-20 Thread Alexander Korotkov
On Wed, Jun 20, 2018 at 8:32 AM Masahiko Sawada  wrote:
> On Wed, Jun 20, 2018 at 1:00 AM, Alexander Korotkov
> > Ok.  I've rephrased the comment a bit.  Also, you created an "index vacuum"
> > subsection in the "resource usage" section.  I think it's not
> > appropriate for this option to be in "resource usage".  Ideally it
> > should be grouped with other vacuum options, but we don't have a single
> > section for that.  The "autovacuum" section is also not appropriate,
> > because this GUC works not only for autovacuum.  So, the most semantically
> > close options, which affect vacuum in general, are
> > vacuum_freeze_min_age, vacuum_freeze_table_age,
> > vacuum_multixact_freeze_min_age and vacuum_multixact_freeze_table_age,
> > which are located in the "client connection defaults" section.  So, I
> > decided to move this GUC into that section.  I also changed the section
> > in the GUC definition (guc.c) from AUTOVACUUM to CLIENT_CONN_STATEMENT.
>
> Agreed. So should we move it to 19.11 Client Connection Defaults in the
> docs as well? And I think it's better if this option is placed next to the
> other vacuum options to make it easier to find. The attached patch changes
> things accordingly. Please review it.

Right, thank you.  Looks good to me.
I'm going to commit this if there are no objections.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] GUC for cleanup indexes threshold.

2018-06-19 Thread Masahiko Sawada
On Wed, Jun 20, 2018 at 1:00 AM, Alexander Korotkov
 wrote:
> On Tue, Jun 19, 2018 at 12:25 PM Masahiko Sawada  
> wrote:
>> On Tue, Jun 19, 2018 at 5:43 PM, Alexander Korotkov
>>  wrote:
>> > On Tue, Jun 19, 2018 at 11:34 AM Masahiko Sawada  
>> > wrote:
>> >> On Mon, Jun 18, 2018 at 1:56 PM, Alexander Korotkov
>> >> > So, I'm proposing to raise the maximum value of
>> >> > vacuum_cleanup_index_scale_factor to DBL_MAX.  Any objections?
>> >> >
>> >>
>> >> I agree with expanding the maximum value. But if users don't want index
>> >> cleanup, it would be helpful to have an option (e.g. setting it to -1)
>> >> to disable index cleanup, while documenting the risk of disabling index
>> >> cleanup. It seems to me that setting very high values serves the same
>> >> purpose.
>> >
>> > Yes, providing an option to completely disable b-tree index cleanup
>> > would be good.  But the problem is that we already use the value -1 for
>> > "use the default" in the reloption.  So, even if we make the -1 GUC value
>> > mean "never cleanup", we still wouldn't be able to make the reloption
>> > work this way.  Probably, we should use another "magical
>> > value" in the reloption for the "use the default" semantics.
>>
>> Right. We could add a new reloption specifying whether we use the default
>> value of vacuum_cleanup_index_scale_factor or not, but it might be
>> overkill.
>
> That would be overkill, and also that would be different from how
> other reloptions behave.

Agreed.

>
>> >> Also, your patch lacks a documentation update.
>> >
>> > Good catch, thank you.
>> >
>> >> BTW, I realized that postgresql.conf.sample doesn't have
>> >> vacuum_cleanup_index_scale_factor option. Attached patch fixes it.
>> >
>> > It seems that you posted the wrong attachment, because the patch you sent
>> > is exactly the same as mine.
>> >
>>
>> Sorry, attached the correct one.
>
> Ok.  I've rephrased the comment a bit.  Also, you created an "index vacuum"
> subsection in the "resource usage" section.  I think it's not
> appropriate for this option to be in "resource usage".  Ideally it
> should be grouped with other vacuum options, but we don't have a single
> section for that.  The "autovacuum" section is also not appropriate,
> because this GUC works not only for autovacuum.  So, the most semantically
> close options, which affect vacuum in general, are
> vacuum_freeze_min_age, vacuum_freeze_table_age,
> vacuum_multixact_freeze_min_age and vacuum_multixact_freeze_table_age,
> which are located in the "client connection defaults" section.  So, I
> decided to move this GUC into that section.  I also changed the section
> in the GUC definition (guc.c) from AUTOVACUUM to CLIENT_CONN_STATEMENT.

Agreed. So should we move it to 19.11 Client Connection Defaults in the
docs as well? And I think it's better if this option is placed next to the
other vacuum options to make it easier to find. The attached patch changes
things accordingly. Please review it.

> I think we ideally should have a "vacuum" section, which would have
> two subsections: "client defaults" and "autovacuum".  But that should
> be a separate patch, which needs to be discussed separately.

+1

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


skip_cleanup_index_3.patch
Description: Binary data


Re: [HACKERS] GUC for cleanup indexes threshold.

2018-06-19 Thread Alexander Korotkov
On Tue, Jun 19, 2018 at 12:25 PM Masahiko Sawada  wrote:
> On Tue, Jun 19, 2018 at 5:43 PM, Alexander Korotkov
>  wrote:
> > On Tue, Jun 19, 2018 at 11:34 AM Masahiko Sawada  
> > wrote:
> >> On Mon, Jun 18, 2018 at 1:56 PM, Alexander Korotkov
> >> > So, I'm proposing to raise the maximum value of
> >> > vacuum_cleanup_index_scale_factor to DBL_MAX.  Any objections?
> >> >
> >>
> >> I agree with expanding the maximum value. But if users don't want index
> >> cleanup, it would be helpful to have an option (e.g. setting it to -1)
> >> to disable index cleanup, while documenting the risk of disabling index
> >> cleanup. It seems to me that setting very high values serves the same
> >> purpose.
> >
> > Yes, providing an option to completely disable b-tree index cleanup
> > would be good.  But the problem is that we already use the value -1 for
> > "use the default" in the reloption.  So, even if we make the -1 GUC value
> > mean "never cleanup", we still wouldn't be able to make the reloption
> > work this way.  Probably, we should use another "magical
> > value" in the reloption for the "use the default" semantics.
>
> Right. We could add a new reloption specifying whether we use the default
> value of vacuum_cleanup_index_scale_factor or not, but it might be
> overkill.

That would be overkill, and also that would be different from how
other reloptions behave.

> >> Also, your patch lacks a documentation update.
> >
> > Good catch, thank you.
> >
> >> BTW, I realized that postgresql.conf.sample doesn't have
> >> vacuum_cleanup_index_scale_factor option. Attached patch fixes it.
> >
> > It seems that you posted the wrong attachment, because the patch you sent
> > is exactly the same as mine.
> >
>
> Sorry, attached the correct one.

Ok.  I've rephrased the comment a bit.  Also, you created an "index vacuum"
subsection in the "resource usage" section.  I think it's not
appropriate for this option to be in "resource usage".  Ideally it
should be grouped with other vacuum options, but we don't have a single
section for that.  The "autovacuum" section is also not appropriate,
because this GUC works not only for autovacuum.  So, the most semantically
close options, which affect vacuum in general, are
vacuum_freeze_min_age, vacuum_freeze_table_age,
vacuum_multixact_freeze_min_age and vacuum_multixact_freeze_table_age,
which are located in the "client connection defaults" section.  So, I
decided to move this GUC into that section.  I also changed the section
in the GUC definition (guc.c) from AUTOVACUUM to CLIENT_CONN_STATEMENT.

I think we ideally should have a "vacuum" section, which would have
two subsections: "client defaults" and "autovacuum".  But that should
be a separate patch, which needs to be discussed separately.
--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index fa3c8a7905..859ef931e7 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -3248,7 +3248,7 @@ static struct config_real ConfigureNamesReal[] =
 	},
 
 	{
-		{"vacuum_cleanup_index_scale_factor", PGC_USERSET, AUTOVACUUM,
+		{"vacuum_cleanup_index_scale_factor", PGC_USERSET, CLIENT_CONN_STATEMENT,
 			gettext_noop("Number of tuple inserts prior to index cleanup as a fraction of reltuples."),
 			NULL
 		},
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index f43086f6d0..5269297d74 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -580,6 +580,8 @@
 #xmloption = 'content'
 #gin_fuzzy_search_limit = 0
 #gin_pending_list_limit = 4MB
+#vacuum_cleanup_index_scale_factor = 0.1	# fraction of total number of tuples
+	# before index cleanup, 0 always performs index cleanup
 
 # - Locale and Formatting -
 


Re: [HACKERS] GUC for cleanup indexes threshold.

2018-06-19 Thread Alexander Korotkov
On Tue, Jun 19, 2018 at 11:34 AM Masahiko Sawada  wrote:
> On Mon, Jun 18, 2018 at 1:56 PM, Alexander Korotkov
> > So, I'm proposing to raise the maximum value of
> > vacuum_cleanup_index_scale_factor to DBL_MAX.  Any objections?
> >
>
> I agree with expanding the maximum value. But if users don't want index
> cleanup, it would be helpful to have an option (e.g. setting it to -1)
> to disable index cleanup, while documenting the risk of disabling index
> cleanup. It seems to me that setting very high values serves the same
> purpose.

Yes, providing an option to completely disable b-tree index cleanup
would be good.  But the problem is that we already use the value -1 for
"use the default" in the reloption.  So, even if we make the -1 GUC value
mean "never cleanup", we still wouldn't be able to make the reloption
work this way.  Probably, we should use another "magical
value" in the reloption for the "use the default" semantics.

> Also, your patch lacks a documentation update.

Good catch, thank you.

> BTW, I realized that postgresql.conf.sample doesn't have
> vacuum_cleanup_index_scale_factor option. Attached patch fixes it.

It seems that you posted the wrong attachment, because the patch you sent
is exactly the same as mine.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] GUC for cleanup indexes threshold.

2018-06-17 Thread Alexander Korotkov
Hi!

On Sat, Jun 16, 2018 at 11:23 PM Darafei "Komяpa" Praliaskouski
 wrote:
> It is cool to see this in Postgres 11. However:
>
>>
>> 4) vacuum_cleanup_index_scale_factor can be set either by GUC or by reloption.
>> The default value is 0.1.  So, by default a cleanup scan is triggered after
>> the table size increases by 10%.
>
>
> vacuum_cleanup_index_scale_factor can be set to a maximum of 100.
> I imagine that on a large append-only table with a storage system IOPS budget
> it may happen that I would want to never perform a full scan on the index.
> Roughly, with the parameter set to 100, if we vacuum the table the first time
> with 1 tuple and 130-byte-wide rows, we'll have a full scan at 130 bytes, 12
> kbytes, 1.2MB, 123MB, 12GB, 1.2TB.
>
> If we happen to perform the first vacuum when there are 4 tuples in the
> table, it becomes 52kB, 5MB, 495MB, 48GB - and both 12GB and 48GB will
> exhaust any storage spike IOPS budget, slowing everything down rather
> suddenly.
>
> Can the upper limit for this GUC be lifted, or can it have a value for "never"?

I have done some further exploration of how the statistics obtained by
B-tree index vacuum cleanup are used.

1) The collected page and tuple numbers are not used directly, but are used
for an estimation of tuple density per page, while the current number of
pages is estimated using smgr (see btcostestimate(), and the sketch after
this list).  So, unless the tuple density changes significantly, having
stale index statistics doesn't affect query plans.
2) Our optimization for skipping B-tree index vacuum cleanup matters only
when the user manually vacuums a table in order to update the visibility
map.  Autovacuum is not triggered for append-only tables.  So, if the user
doesn't take special care of append-only tables, they're not vacuumed until
"autovacuum to prevent wraparound".  Thus, index statistics could be very
stale.  And I don't think we have seen many issues with stale index
statistics.
3) We have a very safe default of vacuum_cleanup_index_scale_factor equal
to 0.1.  But as Darafei claimed, the maximum value of 100 is probably too
low for advanced users, who really need the benefits of this optimization.
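
A hedged sketch of point 1 (variable names are illustrative; the exact
logic lives in btcostestimate() and the planner):

/* Stale stats still yield a usable estimate: they provide a density,
 * while the live page count comes from smgr. */
double      density = stats_tuples / stats_pages;          /* last cleanup */
BlockNumber cur_pages = RelationGetNumberOfBlocks(index);  /* via smgr */
double      est_tuples = density * (double) cur_pages;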

So, I'm proposing to raise the maximum value of
vacuum_cleanup_index_scale_factor to DBL_MAX.  Any objections?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


vacuum_cleanup_index_scale_factor-max.patch
Description: Binary data


Re: [HACKERS] GUC for cleanup indexes threshold.

2018-06-16 Thread Komяpa
Hi!

It is cool to see this in Postgres 11. However:


> 4) vacuum_cleanup_index_scale_factor can be set either by GUC or by
> reloption.
> The default value is 0.1.  So, by default a cleanup scan is triggered
> after the table size increases by 10%.
>

vacuum_cleanup_index_scale_factor can be set to a maximum of 100.
I imagine that on a large append-only table with a storage system IOPS budget
it may happen that I would want to never perform a full scan on the index.
Roughly, with the parameter set to 100, if we vacuum the table the first time
with 1 tuple and 130-byte-wide rows, we'll have a full scan at 130 bytes, 12
kbytes, 1.2MB, 123MB, 12GB, 1.2TB.

If we happen to perform the first vacuum when there are 4 tuples in the
table, it becomes 52kB, 5MB, 495MB, 48GB - and both 12GB and 48GB will
exhaust any storage spike IOPS budget, slowing everything down rather
suddenly.
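
The sequence is just repeated multiplication by (1 + scale factor); a tiny
standalone C check that roughly reproduces the single-tuple numbers above:

#include <stdio.h>

int main(void)
{
    /* With the factor at its maximum of 100, each cleanup re-arms at
     * (1 + 100) = 101x the current size; start from one 130-byte row. */
    double size = 130.0;
    for (int i = 0; i < 6; i++)
    {
        printf("%.3g bytes\n", size);
        size *= 101.0;
    }
    return 0;
}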

Can the upper limit for this GUC be lifted, or can it have a value for "never"?


Re: [HACKERS] GUC for cleanup indexes threshold.

2018-04-06 Thread Kyotaro HORIGUCHI
At Fri, 6 Apr 2018 10:52:58 +0900, Masahiko Sawada  
wrote in 
> On Thu, Apr 5, 2018 at 7:23 PM, Teodor Sigaev  wrote:
> > Thanks to everyone, fixes are pushed except nodeMerge.c; I don't wish to
> > increase entropy around the MERGE patch :)

No problem. Thanks!

> Thank you!

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] GUC for cleanup indexes threshold.

2018-04-05 Thread Masahiko Sawada
On Thu, Apr 5, 2018 at 7:23 PM, Teodor Sigaev  wrote:
> Thanks to everyone, fixes are pushed except nodeMerge.c; I don't wish to
> increase entropy around the MERGE patch :)
>

Thank you!

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] GUC for cleanup indexes threshold.

2018-04-05 Thread Kyotaro HORIGUCHI
Hello.

The commit leaves three warnings with -Wunused-but-set-variable.
Two of them are not assertion-only variables but are really not
used at all.

I also found that nodeMerge.c has one such variable.

regards.

At Thu, 5 Apr 2018 15:43:55 +0900, Masahiko Sawada  
wrote in 

Re: [HACKERS] GUC for cleanup indexes threshold.

2018-04-05 Thread Masahiko Sawada
On Thu, Apr 5, 2018 at 2:40 PM, Masahiko Sawada  wrote:
> On Thu, Apr 5, 2018 at 1:30 AM, Teodor Sigaev  wrote:
>> Thanks to everyone, pushed with minor editing
>>
>
> Thank you for committing!
> I found a typo in nbtpage.c; attached is a patch that fixes it.
>

I also found incorrect documentation in create_index.sgml, as follows.

 vacuum_cleanup_index_scale_factor
  Per-table value for vacuum_cleanup_index_scale_factor.

I think it should be "Per-index".  Attached is a patch fixing it.
Sorry for missing this during review.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


fix_document.patch
Description: Binary data


Re: [HACKERS] GUC for cleanup indexes threshold.

2018-04-04 Thread Masahiko Sawada
On Thu, Apr 5, 2018 at 1:30 AM, Teodor Sigaev  wrote:
> Thanks to everyone, pushed with minor editing
>

Thank you for committing!
I found a typo in nbtpage.c; attached is a patch that fixes it.

s/overritten/overwritten/

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


fix_typo.patch
Description: Binary data


Re: [HACKERS] GUC for cleanup indexes threshold.

2018-04-04 Thread Teodor Sigaev

Thanks to everyone, pushed with minor editing


Alexander Korotkov wrote:
On Tue, Apr 3, 2018 at 6:42 PM, Masahiko Sawada wrote:


On Tue, Mar 27, 2018 at 8:01 PM, Alexander Korotkov
> So, I would like to clarify why my patch could block future improvements
> in this area.  For instance, if we later decide to make btree able to skip
> cleanup when some deleted pages are pending recycling, we can add
> that in the future.
>

Anyway, regarding approaches to this feature, I agree with your version of the
patch, and it looks good to me as the first step of this feature.


Agreed.  I think we have consensus that this patch is a good first step,
which doesn't block further enhancements in the future.

So, I'm attaching a rebased version of the patch and marking this RFC.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/



Re: [HACKERS] GUC for cleanup indexes threshold.

2018-04-04 Thread Alexander Korotkov
On Tue, Apr 3, 2018 at 6:42 PM, Masahiko Sawada 
wrote:

> On Tue, Mar 27, 2018 at 8:01 PM, Alexander Korotkov
> > So, I would like to clarify why my patch could block future improvements
> > in this area.  For instance, if we later decide to make btree able to skip
> > cleanup when some deleted pages are pending recycling, we can add
> > that in the future.
> >
>
> Anyway, regarding approaches to this feature, I agree with your version of the
> patch, and it looks good to me as the first step of this feature.


Agreed.  I think we have consensus that this patch is a good first step,
which doesn't block further enhancements in the future.

So, I'm attaching a rebased version of the patch and marking this RFC.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


0001-lazy-btree-cleanup-7.patch
Description: Binary data


Re: [HACKERS] GUC for cleanup indexes threshold.

2018-04-03 Thread Masahiko Sawada
Sorry for the late response.

On Tue, Mar 27, 2018 at 8:01 PM, Alexander Korotkov
 wrote:
> On Tue, Mar 27, 2018 at 11:55 AM, Masahiko Sawada 
> wrote:
>>
>> On Thu, Mar 22, 2018 at 9:24 PM, Alexander Korotkov
>>  wrote:
>> > However, I see that you are comparing the relative change of num_heap_tuples
>> > before and after vacuum to vacuum_cleanup_index_scale_factor.
>> > The problem is that if vacuum is very aggressive (and that is likely for
>> > the append-only case if this patch is committed), then num_heap_tuples
>> > changes very slightly every vacuum cycle.  So, cleanup would never
>> > happen and statistics could stall indefinitely.
>>
>> Good catch. I think we need to store the number of tuples at the time the
>> whole-index scan is performed (bulkdelete or cleanup), as your
>> patch does. So it would also need the page-upgrading code. Since
>> that value would be helpful for other types of indexes too, it might be
>> better to store it in a system catalog.
>>
>> >
>> > Another issue is that append-only workloads are frequently combined
>> > with rare periodic bulk deletes of data.  Assuming that in this patch
>> > you don't consider deleted pages as triggering index cleanup, on such a
>> > workload large amounts of deleted pages may remain non-recycled until the
>> > next bulk delete cycle.
>>
>> Perhaps using that new value we can trigger the cleanup if the current
>> number of tuples has increased or decreased by
>> vacuum_cleanup_index_scale_factor from n_tup_last_cleanup.
>
>
> Yes, that might work.  However, a decreased number of tuples could be an
> inaccurate measure of the number of deleted pages.  Depending on the
> distribution of tuples per page, the same amount of deleted tuples can lead
> to very different numbers of deleted pages (in corner cases, anywhere from
> zero to very large amounts).  If we want to skip scanning of deleted pages
> then it would be better to store their exact count as known by the previous
> bulk delete or cleanup.
>
>> >
>> > So, although I generally like the idea of storing the epoch of the deleted
>> > xid in the page, I think my version of the patch is closer to committable shape.
>> >
>>
>> Agreed, but as I mentioned before, I'm concerned that your version's
>> approach will become a restriction on future improvements. If the
>> community decides this feature works only for mostly append-only
>> workloads, I think your version of the patch is the best approach for now.
>
>
> At first I would like to note that all the valid optimizations presented in
> this thread optimize the append case.  Thus they do give serious benefit
> on mostly append-only workloads, since the patches were about
> skipping/reducing the cleanup stage, which does serious work only in the
> append-only case.
>
> It could be possible in the future to also optimize cases when only a small
> fraction of table tuples were deleted.  Those tuples could be deleted
> not by a full index scan, but by individual index lookups.  But I think
> such an optimization is rather harder than everything mentioned in
> this thread, and it should be considered separately.

Agreed.

>
> The thing which could be improved in the future is making btree able
> to skip cleanup when some deleted pages are pending to be recycled.
> But I'm not sure that this problem deserves a lot of time investment
> right now, because I think that page deletion in our btree is unfortunately
> quite a rare situation.  In real life our btree is more likely to become
> bloated with a bad fillfactor of pages than to get many pages deleted.

Agreed. In your version of the patch we clean up recyclable pages if there
might be any, whereas my version could leave recyclable pages behind.
The thing I'm concerned about is that the patch would unnecessarily narrow
its effect. That is, we might be able to update the btree meta page
even during bulkdelete.

In your version of the patch we have to scan the whole index at least twice
(bulkdelete and cleanup) even if only one tuple is deleted. But if the
deletion of index tuples didn't lead to index page deletions (which is a
common case), the extra scan is really unnecessary. So I guess that we
can update the meta page only when we deleted pages in the index
vacuum scan. If no page deletion happened, we don't need to
update the oldest btpo.xact, so we don't update the meta page and of course
skip the WAL-logging. This could be a separate patch though. At least the
current approach is safer.
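
A sketch of that suggestion, assuming the patch's metapage-update helper
(names and exact placement are approximate):

/* After the btbulkdelete() scan, WAL-log a metapage update only when
 * this cycle actually deleted index pages, so the common case of
 * "tuples deleted, no pages deleted" pays nothing extra. */
if (stats->pages_deleted > 0)
    _bt_update_meta_cleanup_info(info->index, oldestBtpoXact,
                                 info->num_heap_tuples);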

>
> So, I would like to clarify why my patch could block future improvements
> in this area.  For instance, if we later decide to make btree able to skip
> cleanup when some deleted pages are pending recycling, we can add
> that in the future.
>

Anyway, regarding approaches to this feature, I agree with your version of the
patch, and it looks good to me as the first step of this feature.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] GUC for cleanup indexes threshold.

2018-03-27 Thread Alexander Korotkov
On Tue, Mar 27, 2018 at 11:55 AM, Masahiko Sawada 
wrote:

> On Thu, Mar 22, 2018 at 9:24 PM, Alexander Korotkov
>  wrote:
> > However, I see that you are comparing the relative change of num_heap_tuples
> > before and after vacuum to vacuum_cleanup_index_scale_factor.
> > The problem is that if vacuum is very aggressive (and that is likely for
> > the append-only case if this patch is committed), then num_heap_tuples
> > changes very slightly every vacuum cycle.  So, cleanup would never
> > happen and statistics could stall indefinitely.
>
> Good catch. I think we need to store the number of tuples at the time the
> whole-index scan is performed (bulkdelete or cleanup), as your
> patch does. So it would also need the page-upgrading code. Since
> that value would be helpful for other types of indexes too, it might be
> better to store it in a system catalog.
>
> >
> > Another issue is that append-only workloads are frequently combined
> > with rare periodic bulk deletes of data.  Assuming that in this patch
> > you don't consider deleted pages as triggering index cleanup, on such a
> > workload large amounts of deleted pages may remain non-recycled until the
> > next bulk delete cycle.
>
> Perhaps using that new value we can trigger the cleanup if the current
> number of tuples has increased or decreased by
> vacuum_cleanup_index_scale_factor from n_tup_last_cleanup.
>

Yes, that might work.  However, a decreased number of tuples could be an
inaccurate measure of the number of deleted pages.  Depending on the
distribution of tuples per page, the same amount of deleted tuples can lead
to very different numbers of deleted pages (in corner cases, anywhere from
zero to very large amounts).  If we want to skip scanning of deleted pages
then it would be better to store their exact count as known by the previous
bulk delete or cleanup.

>
> > So, although I generally like the idea of storing the epoch of the deleted
> > xid in the page, I think my version of the patch is closer to committable shape.
> >
>
> Agreed, but as I mentioned before, I'm concerned that your version's
> approach will become a restriction on future improvements. If the
> community decides this feature works only for mostly append-only
> workloads, I think your version of the patch is the best approach for now.


At first I would like to note that all the valid optimizations presented in
this thread optimize the append case.  Thus they do give serious benefit
on mostly append-only workloads, since the patches were about
skipping/reducing the cleanup stage, which does serious work only in the
append-only case.

It could be possible in the future to also optimize cases when only a small
fraction of table tuples were deleted.  Those tuples could be deleted
not by a full index scan, but by individual index lookups.  But I think
such an optimization is rather harder than everything mentioned in
this thread, and it should be considered separately.

The thing which could be improved in the future is making btree able
to skip cleanup when some deleted pages are pending to be recycled.
But I'm not sure that this problem deserves a lot of time investment
right now, because I think that page deletion in our btree is unfortunately
quite a rare situation.  In real life our btree is more likely to become
bloated with a bad fillfactor of pages than to get many pages deleted.

So, I would like to clarify why my patch could block future improvements
in this area.  For instance, if we later decide to make btree able to skip
cleanup when some deleted pages are pending recycling, we can add
that in the future.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] GUC for cleanup indexes threshold.

2018-03-27 Thread Masahiko Sawada
On Thu, Mar 22, 2018 at 9:24 PM, Alexander Korotkov
 wrote:
> On Mon, Mar 19, 2018 at 5:12 AM, Masahiko Sawada 
> wrote:
>>
>> On Wed, Mar 14, 2018 at 9:25 PM, Alexander Korotkov
>>  wrote:
>> > On Wed, Mar 14, 2018 at 7:40 AM, Masahiko Sawada 
>> > wrote:
>> >>
>> >> On Sat, Mar 10, 2018 at 3:40 AM, Alexander Korotkov
>> >>  wrote:
>> >> > On Fri, Mar 9, 2018 at 3:12 PM, Masahiko Sawada
>> >> > 
>> >> > wrote:
>> >> >>
>> >> >> On Fri, Mar 9, 2018 at 8:43 AM, Alexander Korotkov
>> >> >>  wrote:
>> >> >> > 2) These parameters are reset during btbulkdelete() and set during
>> >> >> > btvacuumcleanup().
>> >> >>
>> >> >> Can't we set these parameters even during btbulkdelete()? By keeping
>> >> >> them up to date, we will be able to avoid unnecessary cleanup
>> >> >> vacuums even after an index bulk-delete.
>> >> >
>> >> >
>> > We certainly can update cleanup-related parameters during
>> > btbulkdelete().
>> > However, in this case we would update the B-tree meta-page during each
>> > VACUUM cycle.  That may cause some overhead for non append-only
>> > workloads.  I don't think this overhead would be significant, because in
>> > non append-only scenarios VACUUM typically writes much more
>> > information.
>> > But I would like this patch, oriented to append-only workloads, to be
>> > as harmless as possible for other workloads.
>> >>
>> What overhead are you referring to here? I guess the overhead is only
>> calculating the oldest btpo.xact, and I think it would be harmless.
>> >
>> >
> I meant the overhead of setting last_cleanup_num_heap_tuples after every
> btbulkdelete, with WAL-logging of the meta-page.  I bet it also would be
> harmless, but I think that needs some testing.
>>
>> Agreed.
>>
>> After more thought, it might be too late, but we can consider the
>> possibility of another idea proposed by Peter. The attached patch
>> addresses the original issue of index cleanups by storing the epoch
>> number of the page deletion XID into PageHeader->pd_prune_xid, which is a
>> 4-byte field. Compared to the currently proposed patch, this patch
>> needs neither the page upgrade code nor extra WAL-logging. If
>> we also want to address cases other than the append-only case, we will
>> require the bulk-delete method of scanning the whole index and of logging
>> WAL, but that leads to some extra overhead. With this patch we no longer
>> need to depend on the full scan of the b-tree index. This might be useful
>> for a future where we make the bulk-delete of the b-tree index not scan
>> the whole index.
>
>
> Storing the epoch in deleted pages is good by itself, because it makes us
> free to select the moment of time to recycle those pages without risk
> of wraparound.
>
> However, I see that you are comparing the relative change of num_heap_tuples
> before and after vacuum to vacuum_cleanup_index_scale_factor.
> The problem is that if vacuum is very aggressive (and that is likely for
> the append-only case if this patch is committed), then num_heap_tuples
> changes very slightly every vacuum cycle.  So, cleanup would never
> happen and statistics could stall indefinitely.

Good catch. I think we need to store the number of tuples at the time the
whole-index scan is performed (bulkdelete or cleanup), as your
patch does. So it would also need the page-upgrading code. Since
that value would be helpful for other types of indexes too, it might be
better to store it in a system catalog.

>
> Another issue is that append-only workloads are frequently combined
> with rare periodic bulk deletes of data.  Assuming that in this patch
> you don't consider deleted pages as triggering index cleanup, on such a
> workload large amounts of deleted pages may remain non-recycled until the
> next bulk delete cycle.

Perhaps using that new value we can trigger the cleanup if the current
number of tuples has increased or decreased by
vacuum_cleanup_index_scale_factor from n_tup_last_cleanup.

>
> So, although I generally like the idea of storing the epoch of the deleted
> xid in the page, I think my version of the patch is closer to committable shape.
>

Agreed, but as I mentioned before, I'm concerned that your version's
approach will become a restriction on future improvements. If the
community decides this feature works only for mostly append-only
workloads, I think your version of the patch is the best approach for now.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] GUC for cleanup indexes threshold.

2018-03-22 Thread Alexander Korotkov
On Mon, Mar 19, 2018 at 8:45 AM, Kyotaro HORIGUCHI <
horiguchi.kyot...@lab.ntt.co.jp> wrote:

> At Mon, 19 Mar 2018 11:12:58 +0900, Masahiko Sawada 
> wrote in 

Re: [HACKERS] GUC for cleanup indexes threshold.

2018-03-22 Thread Alexander Korotkov
On Mon, Mar 19, 2018 at 5:12 AM, Masahiko Sawada 
wrote:

> On Wed, Mar 14, 2018 at 9:25 PM, Alexander Korotkov
>  wrote:
> > On Wed, Mar 14, 2018 at 7:40 AM, Masahiko Sawada 
> > wrote:
> >>
> >> On Sat, Mar 10, 2018 at 3:40 AM, Alexander Korotkov
> >>  wrote:
> >> > On Fri, Mar 9, 2018 at 3:12 PM, Masahiko Sawada <
> sawada.m...@gmail.com>
> >> > wrote:
> >> >>
> >> >> On Fri, Mar 9, 2018 at 8:43 AM, Alexander Korotkov
> >> >>  wrote:
> >> >> > 2) These parameters are reset during btbulkdelete() and set during
> >> >> > btvacuumcleanup().
> >> >>
> >> >> Can't we set these parameters even during btbulkdelete()? By keeping
> >> >> them up to date, we will be able to avoid unnecessary cleanup vacuums
> >> >> even after an index bulk-delete.
> >> >
> >> >
> >> > We certainly can update cleanup-related parameters during
> >> > btbulkdelete().
> >> > However, in this case we would update the B-tree meta-page during each
> >> > VACUUM cycle.  That may cause some overhead for non append-only
> >> > workloads.  I don't think this overhead would be significant, because in
> >> > non append-only scenarios VACUUM typically writes much more
> >> > information.
> >> > But I would like this patch, oriented to append-only workloads, to be
> >> > as harmless as possible for other workloads.
> >>
> >> What overhead are you referring to here? I guess the overhead is only
> >> calculating the oldest btpo.xact, and I think it would be harmless.
> >
> >
> > I meant the overhead of setting last_cleanup_num_heap_tuples after every
> > btbulkdelete, with WAL-logging of the meta-page.  I bet it also would be
> > harmless, but I think that needs some testing.
>
> Agreed.
>
> After more thought, it might be too late, but we can consider the
> possibility of another idea proposed by Peter. The attached patch
> addresses the original issue of index cleanups by storing the epoch
> number of the page deletion XID into PageHeader->pd_prune_xid, which is a
> 4-byte field. Compared to the currently proposed patch, this patch
> needs neither the page upgrade code nor extra WAL-logging. If
> we also want to address cases other than the append-only case, we will
> require the bulk-delete method of scanning the whole index and of logging
> WAL, but that leads to some extra overhead. With this patch we no longer
> need to depend on the full scan of the b-tree index. This might be useful
> for a future where we make the bulk-delete of the b-tree index not scan
> the whole index.


Storing the epoch in deleted pages is good by itself, because it makes us
free to select the moment of time to recycle those pages without risk
of wraparound.

However, I see that you are comparing the relative change of num_heap_tuples
before and after vacuum to vacuum_cleanup_index_scale_factor.
The problem is that if vacuum is very aggressive (and that is likely for
the append-only case if this patch is committed), then num_heap_tuples
changes very slightly every vacuum cycle.  So, cleanup would never
happen and statistics could stall indefinitely.

Another issue is that append-only workloads are frequently combined
with rare periodic bulk deletes of data.  Assuming that in this patch
you don't consider deleted pages as triggering index cleanup, on such a
workload large amounts of deleted pages may remain non-recycled until the
next bulk delete cycle.

So, although I generally like the idea of storing the epoch of the deleted
xid in the page, I think my version of the patch is closer to committable shape.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] GUC for cleanup indexes threshold.

2018-03-19 Thread Kyotaro HORIGUCHI
At Tue, 20 Mar 2018 13:57:19 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20180320.135719.90053076.horiguchi.kyot...@lab.ntt.co.jp>
> At Mon, 19 Mar 2018 20:50:48 +0900, Masahiko Sawada  
> wrote in 

Re: [HACKERS] GUC for cleanup indexes threshold.

2018-03-19 Thread Kyotaro HORIGUCHI
At Mon, 19 Mar 2018 20:50:48 +0900, Masahiko Sawada  
wrote in 

Re: [HACKERS] GUC for cleanup indexes threshold.

2018-03-19 Thread Claudio Freire
On Mon, Mar 19, 2018 at 8:50 AM, Masahiko Sawada  wrote:
>>> require the bulk-delete method of scanning whole index and of logging
>>> WAL. But it leads some extra overhead. With this patch we no longer
>>> need to depend on the full scan on b-tree index. This might be useful
>>> for a future when we make the bulk-delete of b-tree index not scan
>>> whole index.
>>
>> Perhaps I'm misunderstanding something, but is it just the
>> result of skipping 'maybe needed' scans without considering the
>> actual necessity?
>
> I meant to scan only index pages that are relevant to garbage TIDs
> in a table. The current b-tree index bulk-deletion is very slow and
> heavy because we always scan the whole index even if there is only one
> dead tuple in a table. To address this problem I'm thinking of a way to
> make bulk-delete not scan the whole index if there are only a few dead
> tuples in a table. That is, we do index scans to collect the stack of btree
> pages and reclaim garbage. Maybe we would do a full index scan if there are
> a lot of dead tuples, which would be the same as what we do when
> planning access paths.

In theory it's not very difficult to do. I was pondering doing some
PoC after the other vacuum patches get through.

The TL;DR version is: as long as there's enough MWM (maintenance_work_mem) to
fit the keys, they can be stashed before vacuuming the heap, and used to
perform index scans for index cleanup instead. If MWM runs out, it just goes
back to bulk-delete scans (that would imply there's a lot to clean up anyway,
so index scans wouldn't be worth it). A finer decision about which technique
is likely faster can be made with random_page_cost as well.

So yes, let's not paint ourselves into a corner where full index scans
are required for correct operation; that would make the above
impossible.



Re: [HACKERS] GUC for cleanup indexes threshold.

2018-03-19 Thread Masahiko Sawada
On Mon, Mar 19, 2018 at 2:45 PM, Kyotaro HORIGUCHI
 wrote:
> At Mon, 19 Mar 2018 11:12:58 +0900, Masahiko Sawada  
> wrote in 

Re: [HACKERS] GUC for cleanup indexes threshold.

2018-03-18 Thread Kyotaro HORIGUCHI
Sorry, I'd like to make a trivial but critical fix.

At Mon, 19 Mar 2018 14:45:05 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20180319.144505.166111203.horiguchi.kyot...@lab.ntt.co.jp>
> At Mon, 19 Mar 2018 11:12:58 +0900, Masahiko Sawada  
> wrote in 

Re: [HACKERS] GUC for cleanup indexes threshold.

2018-03-18 Thread Kyotaro HORIGUCHI
At Mon, 19 Mar 2018 11:12:58 +0900, Masahiko Sawada  
wrote in 

Re: [HACKERS] GUC for cleanup indexes threshold.

2018-03-18 Thread Masahiko Sawada
On Wed, Mar 14, 2018 at 9:25 PM, Alexander Korotkov
 wrote:
> On Wed, Mar 14, 2018 at 7:40 AM, Masahiko Sawada 
> wrote:
>>
>> On Sat, Mar 10, 2018 at 3:40 AM, Alexander Korotkov
>>  wrote:
>> > On Fri, Mar 9, 2018 at 3:12 PM, Masahiko Sawada 
>> > wrote:
>> >>
>> >> On Fri, Mar 9, 2018 at 8:43 AM, Alexander Korotkov
>> >>  wrote:
>> >> > 2) These parameters are reset during btbulkdelete() and set during
>> >> > btvacuumcleanup().
>> >>
>> >> Can't we set these parameters even during btbulkdelete()? By keeping
>> >> them up to date, we will be able to avoid unnecessary cleanup vacuums
>> >> even after an index bulk-delete.
>> >
>> >
>> > We certainly can update cleanup-related parameters during
>> > btbulkdelete().
>> > However, in this case we would update the B-tree meta-page during each
>> > VACUUM cycle.  That may cause some overhead for non append-only
>> > workloads.  I don't think this overhead would be significant, because in
>> > non append-only scenarios VACUUM typically writes much more
>> > information.
>> > But I would like this patch, oriented to append-only workloads, to be
>> > as harmless as possible for other workloads.
>>
>> What overhead are you referring to here? I guess the overhead is only
>> calculating the oldest btpo.xact, and I think it would be harmless.
>
>
> I meant the overhead of setting last_cleanup_num_heap_tuples after every
> btbulkdelete, with WAL-logging of the meta-page.  I bet it also would be
> harmless, but I think that needs some testing.

Agreed.

After more thought, it might be too late, but we can consider the
possibility of another idea proposed by Peter. The attached patch
addresses the original issue of index cleanups by storing the epoch
number of the page deletion XID into PageHeader->pd_prune_xid, which is a
4-byte field. Compared to the currently proposed patch, this patch
needs neither the page upgrade code nor extra WAL-logging. If
we also want to address cases other than the append-only case, we will
require the bulk-delete method of scanning the whole index and of logging
WAL, but that leads to some extra overhead. With this patch we no longer
need to depend on the full scan of the b-tree index. This might be useful
for a future where we make the bulk-delete of the b-tree index not scan
the whole index.
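
A loose sketch of the wraparound-safety idea (the helper is hypothetical,
it glosses over epoch-boundary corner cases, and GetNextXidAndEpoch() is
the PG11-era API):

/* Hypothetical sketch: widen the stored 32-bit values to 64 bits so
 * that "is this deleted page's xact old enough?" becomes a plain
 * comparison with no wraparound ambiguity. */
static bool
deleted_page_recyclable(uint32 page_epoch, TransactionId btpo_xact)
{
    TransactionId next_xid;
    uint32        next_epoch;
    uint64        page_xid64, horizon64;

    GetNextXidAndEpoch(&next_xid, &next_epoch);
    page_xid64 = ((uint64) page_epoch << 32) | btpo_xact;
    horizon64 = ((uint64) next_epoch << 32) | RecentGlobalXmin;
    return page_xid64 < horizon64;
}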

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


skip-cleanup-index-using-epoch.patch
Description: Binary data


Re: [HACKERS] GUC for cleanup indexes threshold.

2018-03-14 Thread Alexander Korotkov
On Wed, Mar 14, 2018 at 7:40 AM, Masahiko Sawada 
wrote:

> On Sat, Mar 10, 2018 at 3:40 AM, Alexander Korotkov
>  wrote:
> > On Fri, Mar 9, 2018 at 3:12 PM, Masahiko Sawada 
> > wrote:
> >>
> >> On Fri, Mar 9, 2018 at 8:43 AM, Alexander Korotkov
> >>  wrote:
> >> > 2) These parameters are reset during btbulkdelete() and set during
> >> > btvacuumcleanup().
> >>
> >> Can't we set these parameters even during btbulkdelete()? By keeping
> >> them up to date, we will be able to avoid unnecessary cleanup vacuums
> >> even after an index bulk-delete.
> >
> >
> > We certainly can update cleanup-related parameters during btbulkdelete().
> > However, in this case we would update the B-tree meta-page during each
> > VACUUM cycle.  That may cause some overhead for non append-only
> > workloads.  I don't think this overhead would be significant, because in
> > non append-only scenarios VACUUM typically writes much more
> > information.
> > But I would like this patch, oriented to append-only workloads, to be
> > as harmless as possible for other workloads.
>
> What overhead are you referring to here? I guess the overhead is only
> calculating the oldest btpo.xact, and I think it would be harmless.
>

I meant the overhead of setting last_cleanup_num_heap_tuples after every
btbulkdelete, with WAL-logging of the meta-page.  I bet it also would be
harmless, but I think that needs some testing.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] GUC for cleanup indexes threshold.

2018-03-13 Thread Masahiko Sawada
On Sat, Mar 10, 2018 at 3:40 AM, Alexander Korotkov
 wrote:
> On Fri, Mar 9, 2018 at 3:12 PM, Masahiko Sawada 
> wrote:
>>
>> On Fri, Mar 9, 2018 at 8:43 AM, Alexander Korotkov
>>  wrote:
>> > 2) These parameters are reset during btbulkdelete() and set during
>> > btvacuumcleanup().
>>
>> Can't we set these parameters even during btbulkdelete()? By keeping
>> them up to date, we will be able to avoid unnecessary cleanup vacuums
>> even after an index bulk-delete.
>
>
> We certainly can update cleanup-related parameters during btbulkdelete().
> However, in this case we would update the B-tree meta-page during each
> VACUUM cycle.  That may cause some overhead for non append-only
> workloads.  I don't think this overhead would be significant, because in
> non append-only scenarios VACUUM typically writes much more information.
> But I would like this patch, oriented to append-only workloads, to be
> as harmless as possible for other workloads.

What overhead are you referring to here? I guess the overhead is only
calculating the oldest btpo.xact, and I think it would be harmless.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] GUC for cleanup indexes threshold.

2018-03-11 Thread Alexander Korotkov
On Sat, Mar 10, 2018 at 5:49 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Fri, Mar 9, 2018 at 9:40 PM, Alexander Korotkov <
> a.korot...@postgrespro.ru> wrote:
>
>> On Fri, Mar 9, 2018 at 3:12 PM, Masahiko Sawada 
>> wrote:
>>
>>> Attached an updated patch that fixes these issues. Will review the
>>> patch again.
>>
>>
>> Thank you!
>>
>
> I've fixed a bug: _bt_vacuum_needs_cleanup() didn't release a lock when
> the metapage is in the old format.
> The bug was found by Darafei Praliaskouski and Grigory Smolkin, who tested
> this patch.
> A revised patch is attached.
>

I also found that the online upgrade of the meta-page didn't work: the
version field wasn't set.
Fixed in the attached version of the patch.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


0001-lazy-btree-cleanup-6.patch
Description: Binary data


Re: [HACKERS] GUC for cleanup indexes threshold.

2018-03-10 Thread Alexander Korotkov
On Fri, Mar 9, 2018 at 9:40 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Fri, Mar 9, 2018 at 3:12 PM, Masahiko Sawada 
> wrote:
>
>> Attached an updated patch that fixes these issues. Will review the
>> patch again.
>
>
> Thank you!
>

I've fixed a bug: _bt_vacuum_needs_cleanup() didn't release a lock when
the metapage is in the old format.
The bug was found by Darafei Praliaskouski and Grigory Smolkin, who tested
this patch.
A revised patch is attached.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


0001-lazy-btree-cleanup-5.patch
Description: Binary data


Re: [HACKERS] GUC for cleanup indexes threshold.

2018-03-09 Thread Alexander Korotkov
On Fri, Mar 9, 2018 at 3:12 PM, Masahiko Sawada 
wrote:

> On Fri, Mar 9, 2018 at 8:43 AM, Alexander Korotkov
>  wrote:
> > 2) These parameters are reset during btbulkdelete() and set during
> > btvacuumcleanup().
>
> Can't we set these parameters even during btbulkdelete()? By keeping
> them up to date, we will be able to avoid unnecessary cleanup vacuums
> even after an index bulk-delete.
>

We certainly can update cleanup-related parameters during btbulkdelete().
However, in this case we would update the B-tree meta-page during each
VACUUM cycle.  That may cause some overhead for non append-only
workloads.  I don't think this overhead would be significant, because in
non append-only scenarios VACUUM typically writes much more information.
But I would like this patch, oriented to append-only workloads, to be
as harmless as possible for other workloads.

> I've not reviewed the code deeply yet, but this patch seems to wrongly
> pass the regression tests, and the bt_metap() function of pageinspect
> needs to be updated.


Right.  Thank you for fixing this issue.


> Attached an updated patch that fixes these issues. Will review the patch
> again.


Thank you!

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] GUC for cleanup indexes threshold.

2018-03-09 Thread Masahiko Sawada
On Fri, Mar 9, 2018 at 8:43 AM, Alexander Korotkov
 wrote:
> Hi!
>

Sorry for my late reply.

> I'd like to propose a revised patch based on various ideas upthread.

Thank you for proposing the patch!

>
> This patch works as following.
>
> 1) The B-tree meta page is extended with 2 additional parameters:
>  * btm_oldest_btpo_xact – the oldest btpo_xact among the deleted pages,
>  * btm_last_cleanup_num_heap_tuples – the number of heap tuples during the
> last cleanup scan.
>
> 2) These parameters are reset during btbulkdelete() and set during
> btvacuumcleanup().

Can't we set these parameters even during btbulkdelete()? By keeping
them up to date, we will be able to avoid unnecessary cleanup vacuums
even after an index bulk-delete.

>
> 3) Index scans during the second and subsequent btvacuumcleanup() happen only if
>    btm_oldest_btpo_xact is older than RecentGlobalXmin
>    OR num_heap_tuples >= btm_last_cleanup_num_heap_tuples * (1 +
> vacuum_cleanup_index_scale_factor).
>
> In other words, btvacuumcleanup() scans the index only if there are
> recyclable pages,
> or the index statistics are stale (more than
> vacuum_cleanup_index_scale_factor of tuples inserted
> since the last index statistics collection).
>
> 4) vacuum_cleanup_index_scale_factor can be set either by GUC or by reloption.
> The default value is 0.1.  So, by default a cleanup scan is triggered after
> the table size increases
> by 10%.
>
> 5) Since new fields are added to the metapage, BTREE_VERSION is bumped.
> In order to support pg_upgrade, reading the previous metapage version is
> supported.
> On metapage rewrite, it's upgraded to the new version.
>
> So, since we don't skip the scan of recyclable pages, there is no risk of xid
> wraparound.
> The risk of stale statistics is also small, because the default value of
> vacuum_cleanup_index_scale_factor is quite low.  Users can increase
> vacuum_cleanup_index_scale_factor at their own risk, trading a bigger gap in
> index statistics for a lighter B-tree cleanup scan load.

Agreed.

I've not reviewed the code deeply yet, but the patch seems to wrongly
pass the regression tests, and the bt_metap() function of pageinspect
needs to be updated. Attached is an updated patch that fixes these
issues. I will review the patch again.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


0001-lazy-btree-cleanup-4.patch
Description: Binary data


Re: [HACKERS] GUC for cleanup indexes threshold.

2018-03-08 Thread Alexander Korotkov
Hi!

I'd like to propose a revised patch based on various ideas upthread.

This patch works as follows.

1) B-tree meta page is extended with 2 additional parameters:
 * btm_oldest_btpo_xact – the oldest btpo_xact among deleted pages,
 * btm_last_cleanup_num_heap_tuples – the number of heap tuples at the
last cleanup scan.

2) These parameters are reset during btbulkdelete() and set during
btvacuumcleanup().

3) Index scans during second and subsequent btvacuumcleanup() happen only if
   btm_oldest_btpo_xact is older than RecentGlobalXmin
   OR num_heap_tuples >= btm_last_cleanup_num_heap_tuples * (1 +
   vacuum_cleanup_index_scale_factor).

In other words, btvacuumcleanup() scans the index only if there are
recyclable pages, or if the index statistics are stale (more than a
vacuum_cleanup_index_scale_factor fraction of tuples was inserted since
the last index statistics collection).

4) vacuum_cleanup_index_scale_factor can be set either by GUC or reloption.
The default value is 0.1, so by default a cleanup scan is triggered after
the table size increases by 10%.

5) Since new fields are added to the metapage, BTREE_VERSION is bumped.
In order to support pg_upgrade, reading the previous metapage version is
supported, and on metapage rewrite it's upgraded to the new version.

So, since we don't skip the scan of recyclable pages, there is no risk of
xid wraparound.  The risk of stale statistics is also small, because the
default value of vacuum_cleanup_index_scale_factor is quite low.  Users can
increase vacuum_cleanup_index_scale_factor at their own risk, trading a
lighter B-tree cleanup scan load for a larger gap in index statistics.
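
To make the triggering condition concrete, here is a simplified sketch
of the decision in C (field names follow the description above; locking,
metapage caching, and old-version handling are omitted, so details may
differ from the actual patch):

static bool
_bt_vacuum_needs_cleanup(IndexVacuumInfo *info, BTMetaPageData *metad)
{
    /* Recyclable pages exist: a cleanup scan will actually free them. */
    if (TransactionIdIsValid(metad->btm_oldest_btpo_xact) &&
        TransactionIdPrecedes(metad->btm_oldest_btpo_xact,
                              RecentGlobalXmin))
        return true;

    /* Statistics are stale: the table has grown past the threshold. */
    if (info->num_heap_tuples >=
        metad->btm_last_cleanup_num_heap_tuples *
        (1.0 + vacuum_cleanup_index_scale_factor))
        return true;

    return false;
}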

A simple benchmark shows the effect.

Before patch.

# insert into t select i from generate_series(1,1) i;
# create index t_i_idx on t(i);
# vacuum t;
VACUUM
Time: 15639,822 ms (00:15,640)
# insert into t select i from generate_series(1,1000) i;
INSERT 0 1000
Time: 6,195 ms
# vacuum t;
VACUUM
Time: 1012,794 ms (00:01,013)
# insert into t select i from generate_series(1,1000) i;
INSERT 0 1000
Time: 5,276 ms
# vacuum t;
VACUUM
Time: 1013,254 ms (00:01,013)

After patch.

# insert into t select i from generate_series(1,1) i;
# create index t_i_idx on t(i);
# vacuum t;
VACUUM
Time: 15689,450 ms (00:15,689)
# insert into t select i from generate_series(1,1000) i;
INSERT 0 1000
Time: 5,585 ms
# vacuum t;
VACUUM
Time: 50,777 ms
# insert into t select i from generate_series(1,1000) i;
INSERT 0 1000
Time: 5,641 ms
# vacuum t;
VACUUM
Time: 46,997 ms

Thus, vacuum time for an append-only table drops from about 1000 ms to
50 ms (roughly 20x).

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


0001-lazy-btree-cleanup-3.patch
Description: Binary data


Re: [HACKERS] GUC for cleanup indexes threshold.

2018-03-05 Thread Alexander Korotkov
On Mon, Mar 5, 2018 at 5:56 AM, Masahiko Sawada 
wrote:

> On Sun, Mar 4, 2018 at 8:59 AM, Alexander Korotkov
>  wrote:
> > On Fri, Mar 2, 2018 at 10:53 AM, Masahiko Sawada 
> > wrote:
> >>
> >> > 2) In the append-only case, index statistics can lag indefinitely.
> >>
> >> The original proposal added a new GUC that specifies the fraction of
> >> modified pages that triggers an index cleanup.
> >
> >
> > Regarding the original proposal, I didn't get what exactly it's intended
> > to be.  You're checking if vacuumed_pages >= nblocks *
> > vacuum_cleanup_index_scale.  But vacuumed_pages is a variable which can
> > be incremented only when no indexes exist on the table.  When indexes
> > are present, this variable is always zero.  I can assume that it's
> > intended to compare the number of pages where at least one tuple is
> > deleted to nblocks * vacuum_cleanup_index_scale.  But that is also not
> > an option for us, because we're going to optimize the case when exactly
> > zero tuples are deleted by vacuum.
>
> In the latest v4 patch, I compare scanned_pages and the threshold,
> which means that if the number of pages modified since the last
> vacuum is larger than the threshold, we force an index cleanup.
>

Right, sorry, I've overlooked that.  However, even if we use the number
of pages, I would still prefer a cumulative measure, so that multiple
vacuums are taken into account even if each of them touched only a small
number of pages.


> > The thing I'm going to propose is to add the estimated number of tuples
> > in the table to IndexVacuumInfo.  Then B-tree can memorize in the
> > meta-page that number of tuples as of the last index scan.  If the
> > passed value differs too much from the value in the meta-page, then
> > cleanup is forced.
> >
> > Any better ideas?
>
> I think that would work. But I'm concerned about metapage format
> compatibility.


That's not a show-stopper.  The B-tree meta page has a version number,
so it's no problem to provide an online upgrade.


> And since I've not fully investigated cleanup of other index types,
> I'm not sure that interface makes sense. It might not be better, but an
> alternative idea is to add a condition
> (Irel[i]->rd_rel->relam == BTREE_AM_OID) in lazy_scan_heap.


I meant putting this logic *inside* btvacuumcleanup() while passing the
required measure via IndexVacuumInfo, which is accessible from
btvacuumcleanup().
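
For illustration, a rough sketch of that shape (assuming a helper
_bt_vacuum_needs_cleanup() that compares the measure passed via
IndexVacuumInfo with the value memorized in the meta-page; names and
details are illustrative):

IndexBulkDeleteResult *
btvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats)
{
    /*
     * stats == NULL means btbulkdelete() was not called, i.e. this
     * VACUUM found no dead tuples for the index.
     */
    if (stats == NULL)
    {
        /* Skip the full scan unless the meta-page says it's needed. */
        if (!_bt_vacuum_needs_cleanup(info))
            return NULL;

        stats = (IndexBulkDeleteResult *)
            palloc0(sizeof(IndexBulkDeleteResult));
    }

    btvacuumscan(info, stats, NULL, NULL, 0);
    return stats;
}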

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] GUC for cleanup indexes threshold.

2018-03-04 Thread Masahiko Sawada
On Sun, Mar 4, 2018 at 8:59 AM, Alexander Korotkov
 wrote:
> On Fri, Mar 2, 2018 at 10:53 AM, Masahiko Sawada 
> wrote:
>>
>> > 2) In the append-only case, index statistics can lag indefinitely.
>>
>> The original proposal added a new GUC that specifies the fraction of
>> modified pages that triggers an index cleanup.
>
>
> Regarding the original proposal, I didn't get what exactly it's intended
> to be.  You're checking if vacuumed_pages >= nblocks *
> vacuum_cleanup_index_scale.  But vacuumed_pages is a variable which can
> be incremented only when no indexes exist on the table.  When indexes
> are present, this variable is always zero.  I can assume that it's
> intended to compare the number of pages where at least one tuple is
> deleted to nblocks * vacuum_cleanup_index_scale.  But that is also not
> an option for us, because we're going to optimize the case when exactly
> zero tuples are deleted by vacuum.

In the latest v4 patch, I compare scanned_pages and the threshold,
which means that if the number of pages modified since the last
vacuum is larger than the threshold, we force an index cleanup.

> The thing I'm going to propose is to add the estimated number of tuples
> in the table to IndexVacuumInfo.  Then B-tree can memorize in the
> meta-page that number of tuples as of the last index scan.  If the
> passed value differs too much from the value in the meta-page, then
> cleanup is forced.
>
> Any better ideas?

I think that would work. But I'm concerned about metapage format
compatibility. And since I've not fully investigated cleanup of other
index types, I'm not sure that interface makes sense. It might not be
better, but an alternative idea is to add a condition
(Irel[i]->rd_rel->relam == BTREE_AM_OID) in lazy_scan_heap.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] GUC for cleanup indexes threshold.

2018-03-03 Thread Alexander Korotkov
On Fri, Mar 2, 2018 at 10:53 AM, Masahiko Sawada 
wrote:

> > 2) In the append-only case, index statistics can lag indefinitely.
>
> The original proposal added a new GUC that specifies the fraction of
> modified pages that triggers an index cleanup.


Regarding the original proposal, I didn't get what exactly it's intended
to be.  You're checking if vacuumed_pages >= nblocks *
vacuum_cleanup_index_scale.  But vacuumed_pages is a variable which can
be incremented only when no indexes exist on the table.  When indexes
are present, this variable is always zero.  I can assume that it's
intended to compare the number of pages where at least one tuple is
deleted to nblocks * vacuum_cleanup_index_scale.  But that is also not
an option for us, because we're going to optimize the case when exactly
zero tuples are deleted by vacuum.

The thing I'm going to propose is to add the estimated number of tuples
in the table to IndexVacuumInfo.  Then B-tree can memorize in the
meta-page that number of tuples as of the last index scan.  If the
passed value differs too much from the value in the meta-page, then
cleanup is forced.

Any better ideas?


--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] GUC for cleanup indexes threshold.

2018-03-01 Thread Masahiko Sawada
On Wed, Feb 28, 2018 at 1:45 AM, Alexander Korotkov
 wrote:
> On Wed, Nov 29, 2017 at 6:06 PM, Masahiko Sawada 
> wrote:
>>
>> On Wed, Nov 29, 2017 at 11:05 PM, Simon Riggs 
>> wrote:
>> > On 25 September 2017 at 22:34, Kyotaro HORIGUCHI
>> >  wrote:
>> >
>> >>> > Here is a small patch that skips scanning btree index if no pending
>> >>> > deleted pages exist.
>> >>> > It detects this situation by comparing pages_deleted with
>> >>> > pages_free.
>> >>
>> >> It seems to work to prevent needless cleanup scans.
>> >
>> > So this leaves us in the situation that
>> >
>> > 1. Masahiko's patch has unresolved problems
>> > 2. Yura's patch works and is useful
>> >
>> > Unless there is disagreement on the above, it seems we should apply
>> > Yura's patch (an edited version, perhaps).
>> >
>>
>> IIRC the patches that make the cleanup scan skip have a problem
>> pointed out by Peter[1], that is, we stash an XID when a btree page is
>> deleted, which is used to determine when it's finally safe to recycle
>> the page. Yura's patch doesn't have that problem?
>>
>> [1]
>> https://www.postgresql.org/message-id/CAH2-Wz%3D1%3Dt5fcGGfarQGcAWBqaCh%2BdLMjpYCYHpEyzK8Qg6OrQ%40mail.gmail.com
>
>
> Yes, I think Yura's patch doesn't have that problem, because it skips
> cleanup only when there are no recyclable pages.  And that means there
> is no btpo.xact stored, so there are no XIDs that could wrap around.

I've looked at the patch again. And you're right, Yura's patch doesn't
have that problem.

>
> BTW, I've rebased Yura's patch.  I think this patch has the following
> issues for now:
>
> 1) Adding BTP_NEED_NO_CLEANUP as a per-page flag doesn't look nice.

Yeah, the alternative ideas are to store the flag in pd_prune_xid of the
meta page, or to use 1 bit of btm_version, so that we don't break
on-disk format compatibility.

> 2) In the append-only case, index statistics can lag indefinitely.

The original proposal added a new GUC that specifies the fraction of
modified pages that triggers an index cleanup. Combining that with
Yura's patch, and having the lazy vacuum code tell indexes whether
cleanup is required, we can deal with it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] GUC for cleanup indexes threshold.

2018-02-27 Thread Alexander Korotkov
On Fri, Jan 12, 2018 at 7:05 AM, Masahiko Sawada 
wrote:

> On Wed, Jan 10, 2018 at 11:28 AM, Masahiko Sawada 
> wrote:
> > On Sun, Jan 7, 2018 at 8:40 AM, Peter Geoghegan  wrote:
> >> On Sat, Jan 6, 2018 at 2:20 PM, Stephen Frost 
> wrote:
>  > IIRC the patches that make the cleanup scan skip have a problem
>  > pointed out by Peter[1], that is, we stash an XID when a btree page is
>  > deleted, which is used to determine when it's finally safe to recycle
>  > the page. Yura's patch doesn't have that problem?
>  >
>  > [1] https://www.postgresql.org/message-id/CAH2-Wz%3D1%3Dt5fcGGfarQGcAWBqaCh%2BdLMjpYCYHpEyzK8Qg6OrQ%40mail.gmail.com
> >>
> >>> Masahiko Sawada, if this patch isn't viable or requires serious rework
> >>> to be acceptable, then perhaps we should change its status to 'returned
> >>> with feedback' and you can post a new patch for the next commitfest..?
> >>
> >> I believe that the problem that I pointed out with freezing/wraparound
> >> is a solvable problem. If we think about it carefully, we will come up
> >> with a good solution. I have tried to get the ball rolling with my
> >> pd_prune_xid suggestion. I think it's important to not lose sight of
> >> the fact that the page deletion/recycling XID thing is just one detail
> >> that we need to think about some more.
> >>
> >> I cannot fault Sawada-san for waiting to hear other people's views
> >> before proceeding. It really needs to be properly discussed.
> >>
> >
> > Thank you for commenting.
> >
> > IIUC we have two approaches: one idea is based on Peter's suggestion.
> > We can use pd_prune_xid to store the epoch of the XID with which the
> > page was deleted. That way, we can correctly mark deleted pages as
> > recyclable without breaking the on-disk format.
> >
> > Another idea was suggested by Sokolov Yura. His original patch makes
> > btree have a flag in btpo_flags that indicates whether the btree has
> > deleted but not yet recyclable pages. I'd rather store it as a bool in
> > BTMetaPageData. Currently btm_version is defined as uint32, but I think
> > we won't use all of its bits. If we store the flag in part of
> > btm_version, we don't break the on-disk format. However, we're now
> > assuming that vacuum on a btree index always scans the whole btree
> > rather than a part, and this approach would entrench that assumption
> > further. It might become a restriction in the future.
> >
>
> On third thought, it's similar to what we are doing for heaps, but we
> can store the oldest btpo.xact of a btree index somewhere (the metapage
> or pg_class.relfrozenxid?) after vacuuming. We can skip cleanup
> vacuuming as long as that xid is not more than a certain threshold old
> (say vacuum_index_cleanup_age). Combined with the new parameter I
> proposed before, the condition for doing a cleanup index vacuum becomes
> as follows.
>
> if (there is garbage on heap)
>     Vacuum index, and update oldest btpo.xact
> else /* there is no garbage on heap */
> {
>     if (# of scanned pages > (nblocks * vacuum_cleanup_index_scale_factor)
>         OR oldest btpo.xact is more than vacuum_index_cleanup_age old?)
>         Cleanup vacuum index, and update oldest btpo.xact
> }
>
> In current index vacuuming, we scan all index pages if there is even one
> garbage tuple on the heap (I'd like to improve that someday), which also
> leads to updating the oldest btpo.xact at that time. If
> vacuum_cleanup_index_scale_factor is high enough, we can skip scanning
> the whole index as long as either the table isn't modified for 2 billion
> transactions or the oldest btpo.xact isn't more than
> vacuum_index_cleanup_age transactions old. But if there is a transaction
> open for a very long time, we might not be able to mark deleted pages as
> recyclable. Some tricks might be required. Feedback is welcome.


Storing the oldest btpo.xact seems like a good idea to me.  So, by
comparing the oldest btpo.xact to RecentGlobalXmin, we can determine
whether there are recyclable pages which would really be recycled during
a possible cleanup?  Assuming that index page deletions are rare, that
seems like a sufficient criterion to run the cleanup.  Also, this
criterion eliminates the risk of wraparound, because we'll recycle index
pages no later than we do now.

Another case when running cleanup is necessary (or at least desirable)
is stale index statistics.  Assuming that index statistics can become
stale only when there are no deletions from the index (otherwise the
index would be scanned during vacuum), we can use the number of index
insertions as a measure of statistics staleness.

So, what about the following logic?

if (there is garbage on heap)
    Vacuum index, and update oldest btpo.xact
else /* there is no garbage on heap */
{
    if (number of inserted index tuples since last btvacuumscan() /
        total number of index tuples >= some threshold
        OR oldest btpo.xact is older than RecentGlobalXmin)
        Cleanup vacuum index, and update oldest btpo.xact
}

Re: [HACKERS] GUC for cleanup indexes threshold.

2018-02-27 Thread Alexander Korotkov
On Wed, Nov 29, 2017 at 6:06 PM, Masahiko Sawada 
wrote:

> On Wed, Nov 29, 2017 at 11:05 PM, Simon Riggs 
> wrote:
> > On 25 September 2017 at 22:34, Kyotaro HORIGUCHI
> >  wrote:
> >
> >>> > Here is a small patch that skips scanning btree index if no pending
> >>> > deleted pages exist.
> >>> > It detects this situation by comparing pages_deleted with pages_free.
> >>
> >> It seems to work to prevent needless cleanup scans.
> >
> > So this leaves us in the situation that
> >
> > 1. Masahiko's patch has unresolved problems
> > 2. Yura's patch works and is useful
> >
> > Unless there is disagreement on the above, it seems we should apply
> > Yura's patch (an edited version, perhaps).
> >
>
> IIRC the patches that make the cleanup scan skip have a problem
> pointed out by Peter[1], that is, we stash an XID when a btree page is
> deleted, which is used to determine when it's finally safe to recycle
> the page. Yura's patch doesn't have that problem?
>
> [1] https://www.postgresql.org/message-id/CAH2-Wz%3D1%3Dt5fcGGfarQGcAWBqaCh%2BdLMjpYCYHpEyzK8Qg6OrQ%40mail.gmail.com


Yes, I think Yura's patch doesn't have that problem, because it skips
cleanup only when there are no recyclable pages.  And that means there
is no btpo.xact stored, so there are no XIDs that could wrap around.
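
Sketched with the existing IndexBulkDeleteResult counters (where exactly
the patch stores its BTP_NEED_NO_CLEANUP flag is assumed here):

/*
 * After a cleanup scan: if every deleted page has already been recycled
 * into the free space map, nothing is pending, so the next cleanup scan
 * can be skipped.
 */
if (stats->pages_deleted == stats->pages_free)
    set_cleanup_not_needed_flag();  /* hypothetical; e.g. BTP_NEED_NO_CLEANUP */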

BTW, I've rebased Yura's patch.  I think this patch has the following
issues for now:

1) Adding BTP_NEED_NO_CLEANUP as a per-page flag doesn't look nice.
2) In the append-only case, index statistics can lag indefinitely.
3) The patch definitely needs more comments.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


0001-lazy-btree-cleanup-2.patch
Description: Binary data


Re: [HACKERS] GUC for cleanup indexes threshold.

2018-01-30 Thread Masahiko Sawada
On Fri, Jan 12, 2018 at 1:05 PM, Masahiko Sawada  wrote:
> On Wed, Jan 10, 2018 at 11:28 AM, Masahiko Sawada  
> wrote:
>> On Sun, Jan 7, 2018 at 8:40 AM, Peter Geoghegan  wrote:
>>> On Sat, Jan 6, 2018 at 2:20 PM, Stephen Frost  wrote:
> > IIRC the patches that make the cleanup scan skip have a problem
> > pointed out by Peter[1], that is, we stash an XID when a btree page is
> > deleted, which is used to determine when it's finally safe to recycle
> > the page. Yura's patch doesn't have that problem?
> >
> > [1] 
> > https://www.postgresql.org/message-id/CAH2-Wz%3D1%3Dt5fcGGfarQGcAWBqaCh%2BdLMjpYCYHpEyzK8Qg6OrQ%40mail.gmail.com
>>>
>>>> Masahiko Sawada, if this patch isn't viable or requires serious rework
>>>> to be acceptable, then perhaps we should change its status to 'returned
>>>> with feedback' and you can post a new patch for the next commitfest..?
>>>
>>> I believe that the problem that I pointed out with freezing/wraparound
>>> is a solvable problem. If we think about it carefully, we will come up
>>> with a good solution. I have tried to get the ball rolling with my
>>> pd_prune_xid suggestion. I think it's important to not lose sight of
>>> the fact that the page deletion/recycling XID thing is just one detail
>>> that we need to think about some more.
>>>
>>> I cannot fault Sawada-san for waiting to hear other people's views
>>> before proceeding. It really needs to be properly discussed.
>>>
>>
>> Thank you for commenting.
>>
>> IIUC we have two approaches: one idea is based on Peter's suggestion.
>> We can use pd_prune_xid to store the epoch of the XID with which the
>> page was deleted. That way, we can correctly mark deleted pages as
>> recyclable without breaking the on-disk format.
>>
>> Another idea was suggested by Sokolov Yura. His original patch makes
>> btree have a flag in btpo_flags that indicates whether the btree has
>> deleted but not yet recyclable pages. I'd rather store it as a bool in
>> BTMetaPageData. Currently btm_version is defined as uint32, but I think
>> we won't use all of its bits. If we store the flag in part of
>> btm_version, we don't break the on-disk format. However, we're now
>> assuming that vacuum on a btree index always scans the whole btree
>> rather than a part, and this approach would entrench that assumption
>> further. It might become a restriction in the future.
>>
>
> On third thought, it's similar to what we are doing for heaps, but we
> can store the oldest btpo.xact of a btree index somewhere (the metapage
> or pg_class.relfrozenxid?) after vacuuming. We can skip cleanup
> vacuuming as long as that xid is not more than a certain threshold old
> (say vacuum_index_cleanup_age). Combined with the new parameter I
> proposed before, the condition for doing a cleanup index vacuum becomes
> as follows.
>
> if (there is garbage on heap)
>     Vacuum index, and update oldest btpo.xact
> else /* there is no garbage on heap */
> {
>     if (# of scanned pages > (nblocks * vacuum_cleanup_index_scale_factor)
>         OR oldest btpo.xact is more than vacuum_index_cleanup_age old?)
>         Cleanup vacuum index, and update oldest btpo.xact
> }
>
> In current index vacuuming, we scan all index pages if there is even one
> garbage tuple on the heap (I'd like to improve that someday), which also
> leads to updating the oldest btpo.xact at that time. If
> vacuum_cleanup_index_scale_factor is high enough, we can skip scanning
> the whole index as long as either the table isn't modified for 2 billion
> transactions or the oldest btpo.xact isn't more than
> vacuum_index_cleanup_age transactions old. But if there is a transaction
> open for a very long time, we might not be able to mark deleted pages as
> recyclable. Some tricks might be required. Feedback is welcome.
>

Since this patch still needs to be discussed before proceeding, I've
marked it as "Needs Review". Feedback is very welcome.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] GUC for cleanup indexes threshold.

2018-01-11 Thread Masahiko Sawada
On Wed, Jan 10, 2018 at 11:28 AM, Masahiko Sawada  wrote:
> On Sun, Jan 7, 2018 at 8:40 AM, Peter Geoghegan  wrote:
>> On Sat, Jan 6, 2018 at 2:20 PM, Stephen Frost  wrote:
 > IIRC the patches that make the cleanup scan skip have a problem
 > pointed out by Peter[1], that is, we stash an XID when a btree page is
 > deleted, which is used to determine when it's finally safe to recycle
 > the page. Yura's patch doesn't have that problem?
 >
 > [1] 
 > https://www.postgresql.org/message-id/CAH2-Wz%3D1%3Dt5fcGGfarQGcAWBqaCh%2BdLMjpYCYHpEyzK8Qg6OrQ%40mail.gmail.com
>>
>>> Masahiko Sawada, if this patch isn't viable or requires serious rework
>>> to be acceptable, then perhaps we should change its status to 'returned
>>> with feedback' and you can post a new patch for the next commitfest..?
>>
>> I believe that the problem that I pointed out with freezing/wraparound
>> is a solvable problem. If we think about it carefully, we will come up
>> with a good solution. I have tried to get the ball rolling with my
>> pd_prune_xid suggestion. I think it's important to not lose sight of
>> the fact that the page deletion/recycling XID thing is just one detail
>> that we need to think about some more.
>>
>> I cannot fault Sawada-san for waiting to hear other people's views
>> before proceeding. It really needs to be properly discussed.
>>
>
> Thank you for commenting.
>
> IIUC we have two approaches: one idea is based on Peter's suggestion.
> We can use pd_prune_xid to store the epoch of the XID with which the
> page was deleted. That way, we can correctly mark deleted pages as
> recyclable without breaking the on-disk format.
>
> Another idea was suggested by Sokolov Yura. His original patch makes
> btree have a flag in btpo_flags that indicates whether the btree has
> deleted but not yet recyclable pages. I'd rather store it as a bool in
> BTMetaPageData. Currently btm_version is defined as uint32, but I think
> we won't use all of its bits. If we store the flag in part of
> btm_version, we don't break the on-disk format. However, we're now
> assuming that vacuum on a btree index always scans the whole btree
> rather than a part, and this approach would entrench that assumption
> further. It might become a restriction in the future.
>

On third thought, it's similar to what we are doing for heaps, but we
can store the oldest btpo.xact of a btree index somewhere (the metapage
or pg_class.relfrozenxid?) after vacuuming. We can skip cleanup
vacuuming as long as that xid is not more than a certain threshold old
(say vacuum_index_cleanup_age). Combined with the new parameter I
proposed before, the condition for doing a cleanup index vacuum becomes
as follows.

if (there is garbage on heap)
    Vacuum index, and update oldest btpo.xact
else /* there is no garbage on heap */
{
    if (# of scanned pages > (nblocks * vacuum_cleanup_index_scale_factor)
        OR oldest btpo.xact is more than vacuum_index_cleanup_age old?)
        Cleanup vacuum index, and update oldest btpo.xact
}

In current index vacuuming, we scan all index pages if there is even one
garbage tuple on the heap (I'd like to improve that someday), which also
leads to updating the oldest btpo.xact at that time. If
vacuum_cleanup_index_scale_factor is high enough, we can skip scanning
the whole index as long as either the table isn't modified for 2 billion
transactions or the oldest btpo.xact isn't more than
vacuum_index_cleanup_age transactions old. But if there is a transaction
open for a very long time, we might not be able to mark deleted pages as
recyclable. Some tricks might be required. Feedback is welcome.

Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] GUC for cleanup indexes threshold.

2018-01-09 Thread Masahiko Sawada
On Sun, Jan 7, 2018 at 8:40 AM, Peter Geoghegan  wrote:
> On Sat, Jan 6, 2018 at 2:20 PM, Stephen Frost  wrote:
>>> > IIRC the patches that make the cleanup scan skip have a problem
>>> > pointed out by Peter[1], that is, we stash an XID when a btree page is
>>> > deleted, which is used to determine when it's finally safe to recycle
>>> > the page. Yura's patch doesn't have that problem?
>>> >
>>> > [1] 
>>> > https://www.postgresql.org/message-id/CAH2-Wz%3D1%3Dt5fcGGfarQGcAWBqaCh%2BdLMjpYCYHpEyzK8Qg6OrQ%40mail.gmail.com
>
>> Masahiko Sawada, if this patch isn't viable or requires serious rework
>> to be acceptable, then perhaps we should change its status to 'returned
>> with feedback' and you can post a new patch for the next commitfest..?
>
> I believe that the problem that I pointed out with freezing/wraparound
> is a solvable problem. If we think about it carefully, we will come up
> with a good solution. I have tried to get the ball rolling with my
> pd_prune_xid suggestion. I think it's important to not lose sight of
> the fact that the page deletion/recycling XID thing is just one detail
> that we need to think about some more.
>
> I cannot fault Sawada-san for waiting to hear other people's views
> before proceeding. It really needs to be properly discussed.
>

Thank you for commenting.

IIUC we have two approaches: one idea is based on Peter's suggestion.
We can use pd_prune_xid to store the epoch of the XID with which the
page was deleted. That way, we can correctly mark deleted pages as
recyclable without breaking the on-disk format.

Another idea was suggested by Sokolov Yura. His original patch makes
btree have a flag in btpo_flags that indicates whether the btree has
deleted but not yet recyclable pages. I'd rather store it as a bool in
BTMetaPageData. Currently btm_version is defined as uint32, but I think
we won't use all of its bits. If we store the flag in part of
btm_version, we don't break the on-disk format. However, we're now
assuming that vacuum on a btree index always scans the whole btree
rather than a part, and this approach would entrench that assumption
further. It might become a restriction in the future.
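
For illustration, the btm_version idea could be encoded like this (a
hypothetical layout, not taken from any posted patch):

/*
 * Hypothetical: steal the top bit of btm_version for the flag, keeping
 * the lower bits as the actual version number.
 */
#define BTM_FLAG_HAS_UNRECYCLED_PAGES  0x80000000
#define BTM_VERSION_MASK               0x7FFFFFFF

#define BTMetaGetVersion(metad) \
    ((metad)->btm_version & BTM_VERSION_MASK)
#define BTMetaHasUnrecycledPages(metad) \
    (((metad)->btm_version & BTM_FLAG_HAS_UNRECYCLED_PAGES) != 0)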

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] GUC for cleanup indexes threshold.

2018-01-06 Thread Peter Geoghegan
Hi Stephen,

On Sat, Jan 6, 2018 at 4:02 PM, Stephen Frost  wrote:
> Perhaps it should really be in Needs review state then..?

Probably.

As I pointed out already some time ago, this RecentGlobalXmin
interlock stuff is our particular implementation of what Lanin &
Shasha call "The Drain Technique" within "2.5 Freeing Empty Nodes".
It's not easy to understand why this is necessary in general (you have
to understand the whole paper for that), but our implementation of the
drain technique is simple. I'm afraid that I made the problem sound
scarier than it actually is.

As Lanin & Shasha put it: "freeing empty nodes by the drain technique
is transparent to the rest of the algorithm and can be added by a
separate module". nbtree doesn't actually care very much about XIDs --
testing an XID against RecentGlobalXmin was just the most convenient
way of using L&S's technique to defer recycling until it is definitely
safe. We only need to make sure that _bt_page_recyclable() cannot
become confused by XID wraparound to fix this problem -- that's it.
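
For reference, the check in question is tiny -- this is approximately
what _bt_page_recyclable() in nbtpage.c looks like (paraphrased, with
comments added):

bool
_bt_page_recyclable(Page page)
{
    BTPageOpaque opaque;

    /* An all-zeroes page (e.g. left by a crashed relation extension)
     * can be reclaimed immediately. */
    if (PageIsNew(page))
        return true;

    /*
     * Recycle only if the page was deleted and no running transaction
     * can still be interested in it.  This modulo-2^32 comparison is
     * the spot that can be fooled once btpo.xact is about two billion
     * XIDs in the past -- exactly the wraparound hazard discussed here.
     */
    opaque = (BTPageOpaque) PageGetSpecialPointer(page);
    if (P_ISDELETED(opaque) &&
        TransactionIdPrecedes(opaque->btpo.xact, RecentGlobalXmin))
        return true;

    return false;
}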

-- 
Peter Geoghegan



Re: [HACKERS] GUC for cleanup indexes threshold.

2018-01-06 Thread Stephen Frost
Greetings Peter,

* Peter Geoghegan (p...@bowt.ie) wrote:
> On Sat, Jan 6, 2018 at 2:20 PM, Stephen Frost  wrote:
> >> > IIRC the patches that make the cleanup scan skip have a problem
> >> > pointed out by Peter[1], that is, we stash an XID when a btree page is
> >> > deleted, which is used to determine when it's finally safe to recycle
> >> > the page. Yura's patch doesn't have that problem?
> >> >
> >> > [1] 
> >> > https://www.postgresql.org/message-id/CAH2-Wz%3D1%3Dt5fcGGfarQGcAWBqaCh%2BdLMjpYCYHpEyzK8Qg6OrQ%40mail.gmail.com
> 
> > Masahiko Sawada, if this patch isn't viable or requires serious rework
> > to be acceptable, then perhaps we should change its status to 'returned
> > with feedback' and you can post a new patch for the next commitfest..?
> 
> I believe that the problem that I pointed out with freezing/wraparound
> is a solvable problem. If we think about it carefully, we will come up
> with a good solution. I have tried to get the ball rolling with my
> pd_prune_xid suggestion. I think it's important to not lose sight of
> the fact that the page deletion/recycling XID thing is just one detail
> that we need to think about some more.
> 
> I cannot fault Sawada-san for waiting to hear other people's views
> before proceeding. It really needs to be properly discussed.

Thanks for commenting!

Perhaps it should really be in Needs review state then..?

Either way, thanks again for the clarification and hopefully this will
revive the discussion.

Stephen


signature.asc
Description: PGP signature


Re: [HACKERS] GUC for cleanup indexes threshold.

2018-01-06 Thread Peter Geoghegan
On Sat, Jan 6, 2018 at 2:20 PM, Stephen Frost  wrote:
>> > IIRC the patches that make the cleanup scan skip have a problem
>> > pointed out by Peter[1], that is, we stash an XID when a btree page is
>> > deleted, which is used to determine when it's finally safe to recycle
>> > the page. Yura's patch doesn't have that problem?
>> >
>> > [1] 
>> > https://www.postgresql.org/message-id/CAH2-Wz%3D1%3Dt5fcGGfarQGcAWBqaCh%2BdLMjpYCYHpEyzK8Qg6OrQ%40mail.gmail.com

> Masahiko Sawada, if this patch isn't viable or requires serious rework
> to be acceptable, then perhaps we should change its status to 'returned
> with feedback' and you can post a new patch for the next commitfest..?

I believe that the problem that I pointed out with freezing/wraparound
is a solvable problem. If we think about it carefully, we will come up
with a good solution. I have tried to get the ball rolling with my
pd_prune_xid suggestion. I think it's important to not lose sight of
the fact that the page deletion/recycling XID thing is just one detail
that we need to think about some more.

I cannot fault Sawada-san for waiting to hear other people's views
before proceeding. It really needs to be properly discussed.

-- 
Peter Geoghegan



Re: [HACKERS] GUC for cleanup indexes threshold.

2018-01-06 Thread Stephen Frost
Greetings,

(pruned the CC list)

* Michael Paquier (michael.paqu...@gmail.com) wrote:
> On Thu, Nov 30, 2017 at 12:06 AM, Masahiko Sawada  
> wrote:
> > On Wed, Nov 29, 2017 at 11:05 PM, Simon Riggs  wrote:
> >> Unless there is disagreement on the above, it seems we should apply
> >> Yura's patch (an edited version, perhaps).
> >>
> >
> > IIRC the patches that make the cleanup scan skip have a problem
> > pointed out by Peter[1], that is, we stash an XID when a btree page is
> > deleted, which is used to determine when it's finally safe to recycle
> > the page. Yura's patch doesn't have that problem?
> >
> > [1] 
> > https://www.postgresql.org/message-id/CAH2-Wz%3D1%3Dt5fcGGfarQGcAWBqaCh%2BdLMjpYCYHpEyzK8Qg6OrQ%40mail.gmail.com
> 
> The latest patch present on this thread does not apply
> (https://www.postgresql.org/message-id/c4a47caef28024ab7528946bed264...@postgrespro.ru).
> Please send a rebase. For now I am moving the patch to the next CF, with
> "waiting on author".

We are now about a week into the CF, this patch hasn't been updated and
is still in 'waiting on author' state, and it sounds like it might have
a pretty serious issue based on the above comment.

Masahiko Sawada, if this patch isn't viable or requires serious rework
to be acceptable, then perhaps we should change its status to 'returned
with feedback' and you can post a new patch for the next commitfest..?

Otherwise, I'd encourage you to try and find time to post an updated
patch soon, so that it can be reviewed.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: [HACKERS] GUC for cleanup indexes threshold.

2017-11-30 Thread Michael Paquier
On Thu, Nov 30, 2017 at 12:06 AM, Masahiko Sawada  wrote:
> On Wed, Nov 29, 2017 at 11:05 PM, Simon Riggs  wrote:
>> Unless there is disagreement on the above, it seems we should apply
>> Yura's patch (an edited version, perhaps).
>>
>
> IIRC the patches that make the cleanup scan skip have a problem
> pointed out by Peter[1], that is, we stash an XID when a btree page is
> deleted, which is used to determine when it's finally safe to recycle
> the page. Yura's patch doesn't have that problem?
>
> [1] 
> https://www.postgresql.org/message-id/CAH2-Wz%3D1%3Dt5fcGGfarQGcAWBqaCh%2BdLMjpYCYHpEyzK8Qg6OrQ%40mail.gmail.com

The latest patch present on this thread does not apply
(https://www.postgresql.org/message-id/c4a47caef28024ab7528946bed264...@postgrespro.ru).
Please send a rebase. For now I am moving the patch to the next CF, with
"waiting on author".
-- 
Michael