Re: [HACKERS] brin autosummarization -- autovacuum "work items"

2017-04-01 Thread Jeff Janes
On Sat, Apr 1, 2017 at 10:09 AM, Alvaro Herrera 
wrote:

> Alvaro Herrera wrote:
>
> > I also removed the behavior that on index creation the final partial
> > block range is always summarized.  It's pointless.
>
> I just pushed this, without this change, because it breaks
> src/test/modules/brin.  I still think it's pointless, but it'd require
> more than one line to change.
>


This is failing for me (and the entire build farm, it looks like).

Cheers,

Jeff


regression.diffs
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] brin autosummarization -- autovacuum "work items"

2017-04-01 Thread Alvaro Herrera
Alvaro Herrera wrote:

> I also removed the behavior that on index creation the final partial
> block range is always summarized.  It's pointless.

I just pushed this, without this change, because it breaks
src/test/modules/brin.  I still think it's pointless, but it'd require
more than one line to change.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] brin autosummarization -- autovacuum "work items"

2017-04-01 Thread Alvaro Herrera
Robert Haas wrote:
> On Tue, Mar 21, 2017 at 4:10 PM, Alvaro Herrera
>  wrote:
> > Well, the number of work items is currently fixed; but if you have many
> > BRIN indexes, then you'd overflow (lose requests).  By using DSA I am
> > making it easy to patch this afterwards so that an arbitrary number of
> > requests can be recorded.
> 
> But that could also use an arbitrarily large amount of memory, and any
> leaks will be cluster-lifespan.

Good point -- probably not such a great idea as presented.  The patch
only uses a fixed amount of memory currently, so it should be fine on
that front.  I think this is a good enough start.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] brin autosummarization -- autovacuum "work items"

2017-03-31 Thread Robert Haas
On Tue, Mar 21, 2017 at 4:10 PM, Alvaro Herrera
 wrote:
> Well, the number of work items is currently fixed; but if you have many
> BRIN indexes, then you'd overflow (lose requests).  By using DSA I am
> making it easy to patch this afterwards so that an arbitrary number of
> requests can be recorded.

But that could also use an arbitrarily large amount of memory, and any
leaks will be cluster-lifespan.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] brin autosummarization -- autovacuum "work items"

2017-03-31 Thread Alvaro Herrera
Here's a version of this patch which I consider final.

Thanks for your tips on using DSA.  No crashes now.

I am confused about not needing dsa_attach the second time around.  If I
do that, the dsa handle has been 0x7f'd, which I don't understand since
it is supposed to be allocated in TopMemoryContext.  I didn't dig too
deep to try and find what is causing that behavior.  Once we do, it's
easy to remove the dsa_detach/dsa_attach calls.

I added a new SQL-callable function to invoke summarization of an
individual page range.  That is what I wanted to do in vacuum (rather
than a scan of the complete index), and it seems independently useful.

I also removed the behavior that on index creation the final partial
block range is always summarized.  It's pointless.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/doc/src/sgml/brin.sgml b/doc/src/sgml/brin.sgml
index 5bf11dc..5140a38 100644
--- a/doc/src/sgml/brin.sgml
+++ b/doc/src/sgml/brin.sgml
@@ -74,9 +74,14 @@
tuple; those tuples remain unsummarized until a summarization run is
invoked later, creating initial summaries.
This process can be invoked manually using the
-   brin_summarize_new_values(regclass) function,
-   or automatically when VACUUM processes the table.
+   brin_summarize_range(regclass, bigint) or
+   brin_summarize_new_values(regclass) functions;
+   automatically when VACUUM processes the table;
+   or by automatic summarization executed by autovacuum, as insertions
+   occur.  (This last trigger is disabled by default and can be enabled
+   with the autosummarize parameter.)
   
+
  
 
 
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 6887eab..25c18d1 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -19685,6 +19685,13 @@ postgres=# SELECT * FROM 
pg_walfile_name_offset(pg_stop_backup());
   
   

+brin_summarize_range(index 
regclass, blockNumber 
bigint)
+   
+   integer
+   summarize the page range covering the given block, if not 
already summarized
+  
+  
+   
 gin_clean_pending_list(index 
regclass)

bigint
@@ -19700,7 +19707,8 @@ postgres=# SELECT * FROM 
pg_walfile_name_offset(pg_stop_backup());
 that are not currently summarized by the index; for any such range
 it creates a new summary index tuple by scanning the table pages.
 It returns the number of new page range summaries that were inserted
-into the index.
+into the index.  brin_summarize_range does the same, except
+it only summarizes the range that covers the given block number.

 

diff --git a/doc/src/sgml/ref/create_index.sgml 
b/doc/src/sgml/ref/create_index.sgml
index 7163b03..83ee7d3 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -382,7 +382,7 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS 
] 
 

-BRIN indexes accept a different parameter:
+BRIN indexes accept different parameters:

 

@@ -396,6 +396,16 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS 
] 
 

+
+   
+autosummarize
+
+
+ Defines whether a summarization run is invoked for the previous page
+ range whenever an insertion is detected on the next one.
+
+
+   

   
 
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index b22563b..707d04e 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -26,6 +26,7 @@
 #include "catalog/pg_am.h"
 #include "miscadmin.h"
 #include "pgstat.h"
+#include "postmaster/autovacuum.h"
 #include "storage/bufmgr.h"
 #include "storage/freespace.h"
 #include "utils/builtins.h"
@@ -60,10 +61,12 @@ typedef struct BrinOpaque
BrinDesc   *bo_bdesc;
 } BrinOpaque;
 
+#define BRIN_ALL_BLOCKRANGES   InvalidBlockNumber
+
 static BrinBuildState *initialize_brin_buildstate(Relation idxRel,
   BrinRevmap *revmap, 
BlockNumber pagesPerRange);
 static void terminate_brin_buildstate(BrinBuildState *state);
-static void brinsummarize(Relation index, Relation heapRel,
+static void brinsummarize(Relation index, Relation heapRel, BlockNumber 
pageRange,
  double *numSummarized, double *numExisting);
 static void form_and_insert_tuple(BrinBuildState *state);
 static void union_tuples(BrinDesc *bdesc, BrinMemTuple *a,
@@ -126,8 +129,11 @@ brinhandler(PG_FUNCTION_ARGS)
  * with those of the new tuple.  If the tuple values are not consistent with
  * the summary tuple, we need to update the index tuple.
  *
+ * If autosummarization is enabled, check if we need to summarize the previous
+ * page range.
+ *
  * If the range is not currently summarized (i.e. the revmap returns NULL for
- * it), there's nothing to do.
+ * it), there's nothing to do for this tuple.
  */
 bool
 

Re: [HACKERS] brin autosummarization -- autovacuum "work items"

2017-03-21 Thread Alvaro Herrera
Thomas Munro wrote:

> What is your motivation for using DSA?  It seems you are creating an
> area and then using it to make exactly one allocation of a constant
> size known up front to hold your fixed size workitems array.  You
> don't do any dynamic allocation at runtime, apart from the detail that
> it happens to allocated on demand.  Perhaps it would make sense if you
> had a separate space per database or something like that, so that the
> shared memory need would be dynamic?

Well, the number of work items is currently fixed; but if you have many
BRIN indexes, then you'd overflow (lose requests).  By using DSA I am
making it easy to patch this afterwards so that an arbitrary number of
requests can be recorded.  

> It looks like outstanding autosummarisation work will be forgotten if
> you restart before it is processed.

That's true.  However, it would be easy to make index scans also request
work items if they find a full page range that should have been
summarized, so if they are lost, it's not a big deal.

> Over in another thread[1] we
> exchanged emails on another way to recognise that summarisation work
> needs to be done, if we are only interested in unsummarised ranges at
> the end of the heap.  I haven't studied BRIN enough to know if that is
> insufficient: can you finish up with unsummarised ranges not in a
> contiguous range at the end of the heap?

If we include my other patch to remove the index tuple for a certain
range, then yes, it can happen.  (That proposed patch requires manual
action, but range invalidation could also be invoked automatically when,
say, a certain number of tuples are removed from a page range.)

> If so, perhaps the BRIN
> index itself should also have a way to record that certain non-final
> ranges are unsummarised but should be summarised asynchronously?

I think this is unnecessary, and would lead to higher operating
overhead.  With this patch, it's very cheap to file a work item.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] brin autosummarization -- autovacuum "work items"

2017-03-21 Thread Alvaro Herrera
Thomas Munro wrote:

> Another thought about this design:  Why autovacuum?

One reason is that autovacuum is already there, so it's convenient to
give it the responsibility for this kind of task.  Another reason is
that autovacuum is already doing this, via vacuum.  I don't see the
need to have a completely different process set for tasks that belong to
the system's cleanup process.

With this infrastructure, we could have other types of individual tasks
that could be run by autovacuum.  GIN pending list cleanup, for
instance, or VM bit setting.  Both of those are currently being doing
whenever VACUUM fires, but only because at the time they were written
there was no other convenient place to hook them onto.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] brin autosummarization -- autovacuum "work items"

2017-03-02 Thread Thomas Munro
On Wed, Mar 1, 2017 at 6:06 PM, Thomas Munro
 wrote:
> On Wed, Mar 1, 2017 at 5:58 PM, Alvaro Herrera  
> wrote:
>> I think one of the most serious issues with BRIN indexes is how they
>> don't get updated automatically as the table is filled.  This patch
>> attempts to improve on that.  During brininsert() time, we check whether
>> we're inserting the first item on the first page in a range.  If we are,
>> request autovacuum to do a summarization run on that table.  This is
>> dependent on a new reloption for BRIN called "autosummarize", default
>> off.
>
> Nice.

Another thought about this design:  Why autovacuum?

Obviously we don't want to get to the point where you start up
PostgreSQL and see 25 llines like BRIN SummarizationLauncher started,
Foo Launcher started, Bar Launcher started, ... but perhaps there is a
middle ground between piling all the background work into the
autovacuum framework, and making new dedicated launchers and workers
for each thing.

Is there some way we could turn this kind of maintenance work into a
'task' (insert better word) that can be scheduled to run
asynchronously by magic workers, so that you don't have to supply a
whole worker and main loop and possibly launcher OR jam new
non-vacuum-related work into the vacuum machinery, for each thing like
this that someone decides to invent?

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] brin autosummarization -- autovacuum "work items"

2017-03-02 Thread Thomas Munro
On Wed, Mar 1, 2017 at 6:06 PM, Thomas Munro
 wrote:
> On Wed, Mar 1, 2017 at 5:58 PM, Alvaro Herrera  
> wrote:
>> I think one of the most serious issues with BRIN indexes is how they
>> don't get updated automatically as the table is filled.  This patch
>> attempts to improve on that.  During brininsert() time, we check whether
>> we're inserting the first item on the first page in a range.  If we are,
>> request autovacuum to do a summarization run on that table.  This is
>> dependent on a new reloption for BRIN called "autosummarize", default
>> off.
>
> Nice.
>
>> The way the request works is that autovacuum maintains a DSA which can
>> be filled by backends with "work items".  Currently, work items can
>> specify a BRIN summarization of some specific index; in the future we
>> could use this framework to request other kinds of things that do not
>> fit in the "dead tuples / recently inserted tuples" logic that autovac
>> currently uses to decide to vacuum/analyze tables.
>>
>> However, it seems I have not quite gotten the hang of DSA just yet,
>> because after a couple of iterations, crashes occur.  I think the reason
>> has to do with either a resource owner clearing the DSA at an unwelcome
>> time, or perhaps there's a mistake in my handling of DSA "relative
>> pointers" stuff.
>
> Ok, I'll take a look.  It's set up for ease of use in short lifespan
> situations like parallel query, and there are a few extra hoops to
> jump through for longer lived DSA areas.

I haven't tested this, but here is some initial feedback after reading
it through once:

 /*
  * Attach to an area given a handle generated (possibly in another process) by
- * dsa_get_area_handle.  The area must have been created with dsa_create (not
+ * dsa_get_handle.  The area must have been created with dsa_create (not
  * dsa_create_in_place).
  */

This is an independent slam-dunk typo fix.

+/*
+ * Set up our DSA so that backends can install work-item requests.  It may
+ * already exist as created by a previous launcher.
+ */
+if (!AutoVacuumShmem->av_dsa_handle)
+{
+LWLockAcquire(AutovacuumLock, LW_EXCLUSIVE);
+AutoVacuumDSA = dsa_create(LWTRANCHE_AUTOVACUUM);
+AutoVacuumShmem->av_dsa_handle = dsa_get_handle(AutoVacuumDSA);
+/* delay array allocation until first request */
+AutoVacuumShmem->av_workitems = InvalidDsaPointer;
+LWLockRelease(AutovacuumLock);
+}
+else
+AutoVacuumDSA = dsa_attach(AutoVacuumShmem->av_dsa_handle);

I haven't looked into the autovacuum launcher lifecycle, but if it can
be restarted as implied by the above then I see no reason to believe
that the DSA area still exists at the point where you call
dsa_attach() here.  DSA areas are reference counted, so if there ever
a scenario where no backend is currently attached, then it will be
destroyed and this call will fail.  If you want to create a DSA area
that lasts until cluster shutdown even while all backends are
detached, you need to call dsa_pin() after creating it.

In AutoVacuumRequestWork:

+AutoVacuumDSA = dsa_attach(AutoVacuumShmem->av_dsa_handle);
+
+if (!AutoVacuumDSA)
+{
+/* cannot attach?  disregard request */
+LWLockRelease(AutovacuumLock);
+return;
+}

dsa_attach either succeeds or throws, so that conditional code is unreachable.

+/* XXX surely there is a simpler way to do this */
+wi_ptr = AutoVacuumShmem->av_workitems + sizeof(AutovacWorkItems) +
+sizeof(AutoVacuumWorkItem) * i;
+workitem = (AutoVacuumWorkItem *)
dsa_get_address(AutoVacuumDSA, wi_ptr);

It'd probably be simpler to keep hold of the backend-local address of
the the base of the workitems array and then use regular C language
facilities like array notation to work with it: workitems =
[i], and then:

+/* ... and put it on autovacuum's to-do list */
+workitems->avs_usedItems = wi_ptr;

Considering that i is really an index into the contiguous workitems
array, maybe you should really just be storing the index from i here,
instead of dealing in dsa_pointers.  The idea with dsa_pointers is
that they're useful for complex data structures that might point into
different DSM segments, like a hash table or binary tree that has
internal pointers that could pointer arbitrary other objects in a data
structure because it's allocated in incremental pieces.  Here, you are
dealing with objects in a contiguous memory space of fixed size.  This
leads to a bigger question about this design, which I'll ask at the
end.

Then at the bottom of AutoVacuumRequestWork:

+LWLockRelease(AutovacuumLock);
+dsa_detach(AutoVacuumDSA);
+AutoVacuumDSA = NULL;
+}

I'm guessing that you intended to remain attached, rather than
detaching at the end like this?  Otherwise every backend that is
inserting lots of new data attaches and detaches 

Re: [HACKERS] brin autosummarization -- autovacuum "work items"

2017-02-28 Thread Thomas Munro
On Wed, Mar 1, 2017 at 5:58 PM, Alvaro Herrera  wrote:
> I think one of the most serious issues with BRIN indexes is how they
> don't get updated automatically as the table is filled.  This patch
> attempts to improve on that.  During brininsert() time, we check whether
> we're inserting the first item on the first page in a range.  If we are,
> request autovacuum to do a summarization run on that table.  This is
> dependent on a new reloption for BRIN called "autosummarize", default
> off.

Nice.

> The way the request works is that autovacuum maintains a DSA which can
> be filled by backends with "work items".  Currently, work items can
> specify a BRIN summarization of some specific index; in the future we
> could use this framework to request other kinds of things that do not
> fit in the "dead tuples / recently inserted tuples" logic that autovac
> currently uses to decide to vacuum/analyze tables.
>
> However, it seems I have not quite gotten the hang of DSA just yet,
> because after a couple of iterations, crashes occur.  I think the reason
> has to do with either a resource owner clearing the DSA at an unwelcome
> time, or perhaps there's a mistake in my handling of DSA "relative
> pointers" stuff.

Ok, I'll take a look.  It's set up for ease of use in short lifespan
situations like parallel query, and there are a few extra hoops to
jump through for longer lived DSA areas.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] brin autosummarization -- autovacuum "work items"

2017-02-28 Thread Alvaro Herrera
I think one of the most serious issues with BRIN indexes is how they
don't get updated automatically as the table is filled.  This patch
attempts to improve on that.  During brininsert() time, we check whether
we're inserting the first item on the first page in a range.  If we are,
request autovacuum to do a summarization run on that table.  This is
dependent on a new reloption for BRIN called "autosummarize", default
off.

The way the request works is that autovacuum maintains a DSA which can
be filled by backends with "work items".  Currently, work items can
specify a BRIN summarization of some specific index; in the future we
could use this framework to request other kinds of things that do not
fit in the "dead tuples / recently inserted tuples" logic that autovac
currently uses to decide to vacuum/analyze tables.

However, it seems I have not quite gotten the hang of DSA just yet,
because after a couple of iterations, crashes occur.  I think the reason
has to do with either a resource owner clearing the DSA at an unwelcome
time, or perhaps there's a mistake in my handling of DSA "relative
pointers" stuff.

This patch was initially written by Simon Riggs, who envisioned that
brininsert itself would invoke the summarization.  However, this doesn't
work because summarization requires having ShareUpdateExclusive lock,
which brininsert doesn't have.  So I modified things to instead use the
DSA stuff.  (He also set things up so that brininsert would only
summarize the just-filled range, but I didn't preserve that idea in the
autovacuum-based implementation; some changed lines there can probably
be removed.)

-- 
Álvaro HerreraPostgreSQL Expert, https://www.2ndQuadrant.com/
diff --git a/doc/src/sgml/brin.sgml b/doc/src/sgml/brin.sgml
index 6448b18..480895b 100644
--- a/doc/src/sgml/brin.sgml
+++ b/doc/src/sgml/brin.sgml
@@ -74,9 +74,13 @@
tuple; those tuples remain unsummarized until a summarization run is
invoked later, creating initial summaries.
This process can be invoked manually using the
-   brin_summarize_new_values(regclass) function,
-   or automatically when VACUUM processes the table.
+   brin_summarize_new_values(regclass) function;
+   automatically when VACUUM processes the table;
+   or by automatic summarization executed by autovacuum, as insertions
+   occur.  (This last trigger is disabled by default and is enabled with
+   the parameter autosummarize.)
   
+
  
 
 
diff --git a/doc/src/sgml/ref/create_index.sgml 
b/doc/src/sgml/ref/create_index.sgml
index fcb7a60..80d9c39 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -382,7 +382,7 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS 
] 
 

-BRIN indexes accept a different parameter:
+BRIN indexes accept different parameters:

 

@@ -396,6 +396,16 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS 
] 
 

+
+   
+autosummarize
+
+
+ Defines whether a summarization run is invoked for the previous page
+ range whenever an insertion is detected on the next one.
+
+
+   

   
 
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index b22563b..01586ff 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -26,6 +26,7 @@
 #include "catalog/pg_am.h"
 #include "miscadmin.h"
 #include "pgstat.h"
+#include "postmaster/autovacuum.h"
 #include "storage/bufmgr.h"
 #include "storage/freespace.h"
 #include "utils/builtins.h"
@@ -60,10 +61,12 @@ typedef struct BrinOpaque
BrinDesc   *bo_bdesc;
 } BrinOpaque;
 
+#define BRIN_ALL_BLOCKRANGES   InvalidBlockNumber
+
 static BrinBuildState *initialize_brin_buildstate(Relation idxRel,
   BrinRevmap *revmap, 
BlockNumber pagesPerRange);
 static void terminate_brin_buildstate(BrinBuildState *state);
-static void brinsummarize(Relation index, Relation heapRel,
+static void brinsummarize(Relation index, Relation heapRel, BlockNumber 
pageRange,
  double *numSummarized, double *numExisting);
 static void form_and_insert_tuple(BrinBuildState *state);
 static void union_tuples(BrinDesc *bdesc, BrinMemTuple *a,
@@ -126,8 +129,11 @@ brinhandler(PG_FUNCTION_ARGS)
  * with those of the new tuple.  If the tuple values are not consistent with
  * the summary tuple, we need to update the index tuple.
  *
+ * If autosummarization is enabled, check if we need to summarize the previous
+ * page range.
+ *
  * If the range is not currently summarized (i.e. the revmap returns NULL for
- * it), there's nothing to do.
+ * it), there's nothing to do for this tuple.
  */
 bool
 brininsert(Relation idxRel, Datum *values, bool *nulls,
@@ -141,6 +147,7 @@ brininsert(Relation idxRel, Datum *values, bool *nulls,
Buffer  buf = InvalidBuffer;
MemoryContext tupcxt = NULL;
MemoryContext oldcxt = CurrentMemoryContext;
+   bool