Re: [HACKERS] Setting pd_lower in GIN metapage

2017-09-23 Thread Amit Kapila
On Wed, Sep 20, 2017 at 9:19 AM, Amit Kapila  wrote:
> On Wed, Sep 20, 2017 at 4:25 AM, Michael Paquier
>  wrote:
>> On Wed, Sep 20, 2017 at 12:47 AM, Tom Lane  wrote:
>>> Amit Kapila  writes:
 On Tue, Sep 19, 2017 at 9:27 AM, Michael Paquier
  wrote:
> I am not saying that no index AMs take advantage FPW compressibility
> for their meta pages. There are cases like this one, as well as one
> code path in BRIN where this is useful, and it is useful as well when
> logging FPWs of the init forks for unlogged relations.
>>>
 Hmm, why is it useful for logging FPWs of the init forks for unlogged
 relations?  We don't use REGBUF_STANDARD in those cases.
>>>
>>> But if we started to do so, that would be a concrete benefit of this
>>> patch ...
>>
>> In the proposed set of patches, all the empty() routines part of index
>> AMs which use log_newpage_buffer() (brin, gin, spgist) are doing the
>> right thing by updating log_newpage_buffer(). btree also should have
>> its call to log_newpage updated in btbuildempty(), and your patch is
>> missing that.
>>
>
> We can add that for btree patch.
>

Added and updated the comments for both btree and hash index patches.
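
For context, the btree change presumably amounts to telling log_newpage()
that the metapage is now a standard page, roughly like this (a sketch only;
the attached patch is authoritative):

	/* in btbuildempty(), once _bt_initmetapage() sets pd_lower correctly,
	 * the init-fork page can be WAL-logged as a standard page so the hole
	 * between pd_lower and pd_upper is compressible in the FPW */
	if (XLogIsNeeded())
		log_newpage(&index->rd_smgr->smgr_rnode.node, INIT_FORKNUM,
					BTREE_METAPAGE, metapage, true);	/* page_std: was false */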

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


change_metapage_usage_btree-v2.patch
Description: Binary data


change_metapage_usage_hash-v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SCRAM in the PG 10 release notes

2017-09-23 Thread Peter Eisentraut
On 9/21/17 14:15, Jeff Janes wrote:
> Here is a patch that expands the SCRAM documentation a bit, adds more
> explanation how the different options are related, and sets some better
> links.  I think now you can get from the release notes to the relevant
> documentation and have enough information on how to put the new features
> into use.
> 
> 
> 
> This looks good to me.  I might suggest adding verifying the clients as a
> specific step:

committed

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Built-in plugin for logical decoding output

2017-09-23 Thread Euler Taveira
2017-09-23 14:01 GMT-03:00 Alvaro Hernandez :
> However, AFAIK, AWS's DMS uses it for production purposes (see
> http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_PostgreSQL.html).
>
It seems a bad idea. AFAICS test_decoding was not designed to be a
ready-for-production plugin. It is just a proof of concept for logical
decoding.

> I would be happy to see another logical decoding plugin in core
> starting with 11. However, this also poses a bit of a challenge for middleware
> implementors: you need to support one for 9.4-9.5 (test_decoding), another
> for 10 (pgoutput) and maybe another for 11 onwards. The idea of asking users
> to install a binary plugin is very unsexy, so these are the options
> available.
>
wal2json works for 9.4+ (besides the WAL messages I committed a month
ago). Since this boat was already shipped we can arrange some packages
for 9.4-10 (an external project) and ask vendors to support the
backward-compatible plugin. The middleware implementor will have to
support this new plugin format. Since JSON is a widespread format, it is
easy to refactor the code to parse JSON.

> However, having said that, and while json is a great output format for
> interoperability, if there's a discussion on which plugin to include next,
> I'd also favor one that has some more compact representation format (or that
> supports several formats, not only json).
>
We could certainly extend pgoutput to support more than one format
(as pglogical did, AFAIR); however, we wouldn't reuse code (different
formats) and we would end up with a fat plugin. (I don't foresee a plugin
using different formats in the same connection; it is difficult to
coordinate a change like that with only one-way communication.)
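
To make the format question concrete: an output plugin declares its output
mode once, in its startup callback, so a multi-format plugin would have to
branch on a per-slot option there and in every write path. A minimal,
hypothetical skeleton (not pgoutput or wal2json; all names are placeholders)
could look like this:

#include "postgres.h"

#include "fmgr.h"
#include "lib/stringinfo.h"
#include "replication/logical.h"
#include "replication/output_plugin.h"
#include "utils/rel.h"

PG_MODULE_MAGIC;

extern void _PG_output_plugin_init(OutputPluginCallbacks *cb);

static void
my_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt, bool is_init)
{
	/* a plugin commits to one output mode here; per-slot options supplied
	 * at slot creation would be the place to select among several formats */
	opt->output_type = OUTPUT_PLUGIN_TEXTUAL_OUTPUT;
}

static void
my_begin(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
{
}

static void
my_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
		  Relation relation, ReorderBufferChange *change)
{
	/* a real plugin serializes the changed tuple here, in its chosen format */
	OutputPluginPrepareWrite(ctx, true);
	appendStringInfoString(ctx->out, "change");
	OutputPluginWrite(ctx, true);
}

static void
my_commit(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
		  XLogRecPtr commit_lsn)
{
}

void
_PG_output_plugin_init(OutputPluginCallbacks *cb)
{
	cb->startup_cb = my_startup;
	cb->begin_cb = my_begin;
	cb->change_cb = my_change;
	cb->commit_cb = my_commit;
}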


-- 
   Euler Taveira   Timbira -
http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] BUG #14825: enum type: unsafe use?

2017-09-23 Thread Tom Lane
Andrew Dunstan  writes:
> OK, I think I'm convinced. Here is the WIP code I put together for the
> blacklist. I was looking for a place to put the init call, but since
> it's possibly not going anywhere I stopped :-) . My initial thought
> about subtransactions was that we should ignore them for this purpose
> (That's why I used TopTransactionContext for the table).

For the blacklist, I agree we could just ignore subtransactions: all
subtransaction levels are equally uncommitted for this purpose, and
leaving entries from failed subtransactions in place seems like a
non-issue, since they'd never be referenced again.  (Well, barring OID
wraparound and an enum-value-OID collision while the transaction runs,
but I think we can ignore that as having probability epsilon.)

But you need to actually put the table in TopTransactionContext, not
CurTransactionContext ;-).  Also, I don't think you need an init call
so much as an end-of-transaction cleanup call.  Maybe call it
AtEOXactEnum(), for consistency with other functions called in the
same area.
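
A minimal sketch of that cleanup call (assuming, per the above, that the
hash itself lives in TopTransactionContext and so vanishes with it):

	/* pg_enum.c -- to be called from end-of-transaction cleanup */
	void
	AtEOXactEnum(bool isCommit)
	{
		/*
		 * The hash table is allocated in TopTransactionContext and is freed
		 * with it; we only need to forget the stale pointer so the next
		 * transaction starts with an empty blacklist.
		 */
		enum_blacklist = NULL;
	}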

> w.r.t. table size - how large? I confess I haven't seen any systems with
> more than a few hundred enum types. But even a million or two shouldn't
> consume a huge amount of memory, should it?

Dynahash tables are self-expanding, so I don't see a need to stress about
that too much.  Anything in 10-100 seems reasonable for initial size.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] BUG #14825: enum type: unsafe use?

2017-09-23 Thread Andrew Dunstan


On 09/23/2017 03:52 PM, Tom Lane wrote:
> Andrew Dunstan  writes:
>> On 09/23/2017 02:00 PM, Tom Lane wrote:
>>> So I'm back to not being sure about the path forward.  Maybe it would be
>>> all right to say "the value added by ADD VALUE can't be used in the same
>>> transaction, period".  That's still a step forward compared to the pre-v10
>>> prohibition on doing it at all.  I don't remember if there were use-cases
>>> where we really needed the exception for new-in-transaction types.
>> Well, my idea was to have the test run like this:
>>   * is the value an old one? Test txnid of tuple. If yes it's ok
>>   * is the value one created by ALTER TYPE ADD VALUE? Test
>> blacklist. If no, it's ok.
>>   * is the enum a new one? Test whitelist. If yes, it's ok.
>>   * anything else is not ok.
> My point is that if you do 1 and 3, you don't need 2.  Or if you do
> 2 and 3, you don't need 1.  But in most cases, testing the tuple
> hint bits is cheap, so you don't really want that option.
>
> In any case, what I'm worried about is the amount of bookkeeping
> overhead added by keeping a whitelist of enum-types-created-in-
> current-transaction.  That's less than trivial, especially since
> you have to account correctly for subtransactions.  And there are
> common use-cases where that table will become large.
>
>> If we just did the blacklist and stuck with our current heuristic test
>> for enum being created in the current transaction, we'd still probably
>> avoid 99% of the problems, including specifically the one that gave rise
>> to the bug report.
> True.  But I'm not sure whether the heuristic test is adding anything
> meaningful if we use a blacklist first.  The case where it could help
> is
>
>   begin;
>   create type t as enum();
>   alter type t add value 'v';
>   -- do something with 'v'
>   commit;
>
> That perhaps is worth something, but if somebody is trying to build a new
> enum type in pieces like that, doesn't it seem fairly likely that they
> might throw in an ALTER OWNER or GRANT as well?  My feeling is that the
> lesson we need to learn is that the heuristic test isn't good enough.
>
>   


OK, I think I'm convinced. Here is the WIP code I put together for the
blacklist. I was looking for a place to put the init call, but since
it's possibly not going anywhere I stopped :-) . My initial thought
about subtransactions was that we should ignore them for this purpose
(That's why I used TopTransactionContext for the table).

I agree the heuristic test isn't good enough, and if we can get a 100%
accurate test for the newness of the enum type then the blacklist would
be redundant.

w.r.t. table size - how large? I confess I haven't seen any systems with
more than a few hundred enum types. But even a million or two shouldn't
consume a huge amount of memory, should it?

cheers

andrew

-- 

Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

diff --git a/src/backend/catalog/pg_enum.c b/src/backend/catalog/pg_enum.c
index fe61d4d..52c1271 100644
--- a/src/backend/catalog/pg_enum.c
+++ b/src/backend/catalog/pg_enum.c
@@ -28,6 +28,8 @@
 #include "utils/builtins.h"
 #include "utils/catcache.h"
 #include "utils/fmgroids.h"
+#include "utils/hsearch.h"
+#include "utils/memutils.h"
 #include "utils/syscache.h"
 #include "utils/tqual.h"
 
@@ -38,6 +40,9 @@ Oid			binary_upgrade_next_pg_enum_oid = InvalidOid;
 static void RenumberEnumType(Relation pg_enum, HeapTuple *existing, int nelems);
 static int	sort_order_cmp(const void *p1, const void *p2);
 
+/* hash table of values added in current transaction by AddEnumLabel */
+
+static HTAB *enum_blacklist = NULL;
 
 /*
  * EnumValuesCreate
@@ -460,8 +465,44 @@ restart:
 	heap_freetuple(enum_tup);
 
 	heap_close(pg_enum, RowExclusiveLock);
+
+	/* set up blacklist hash if required */
+	if (enum_blacklist == NULL)
+	{
+		HASHCTL hash_ctl;
+		memset(&hash_ctl, 0, sizeof(hash_ctl));
+		hash_ctl.keysize = sizeof(Oid);
+		hash_ctl.entrysize = sizeof(Oid);
+		hash_ctl.hcxt = CurTransactionContext;
+		enum_blacklist = hash_create("Enum blacklist for current transaction",
+		   32,
+		   &hash_ctl,
+		   HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+
+	}
+
+	/* and add the new value to the blacklist */
+
+	(void) hash_search(enum_blacklist, &newOid, HASH_ENTER, NULL);
 }
 
+bool
+EnumBlacklisted(Oid enum_id)
+{
+	bool found;
+
+	if (enum_blacklist == NULL)
+		return false;
+
+	(void) hash_search(enum_blacklist, &enum_id, HASH_FIND, &found);
+	return found;
+}
+
+void
+InitEnumBlacklist(void)
+{
+	enum_blacklist = NULL;
+}
 
 /*
  * RenameEnumLabel
diff --git a/src/backend/utils/adt/enum.c b/src/backend/utils/adt/enum.c
index 973397c..a7ba3d0 100644
--- a/src/backend/utils/adt/enum.c
+++ b/src/backend/utils/adt/enum.c
@@ -76,6 +76,10 @@ check_safe_enum_use(HeapTuple enumval_tup)
 		TransactionIdDidCommit(xmin))
 		

[HACKERS] Proposed fix for bug #14826 (the invalid empty array business)

2017-09-23 Thread Tom Lane
I poked around among the callers of construct_[md_]array() and found at
least two more besides ts_lexize() that could build zero-element arrays;
one of very ancient standing is in the pg_prepared_statements view.

So I think we've got a clear hazard here that justifies changing the
behavior of the low-level array functions, rather than expecting every
call site to be on its guard about empty arrays.  Accordingly, I propose
the attached patch which makes construct_md_array responsible for getting
this right.

In principle we could now remove code from the places that do take care to
call construct_empty_array instead.  But I found only one place where it
really seemed worth removing anything, in ExecEvalArrayExpr.
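
As an illustration (a hypothetical caller, not part of the patch), a site
that builds its element list at runtime no longer needs a zero-length
special case of its own:

	int			nitems;		/* computed at runtime; may legitimately be 0 */
	Datum	   *elems;
	ArrayType  *result;

	elems = (Datum *) palloc(nitems * sizeof(Datum));
	/* ... fill elems[0 .. nitems-1] with text datums ... */

	/* with the patch, nitems == 0 yields a valid empty (0-D) array instead
	 * of a malformed 1-D array with zero elements */
	result = construct_array(elems, nitems, TEXTOID, -1, false, 'i');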

I'm a little bit scared about back-patching this, as it seems at least
possible that some external code could be broken by the change in what
construct_[md_]array could hand back.  Given that some of these bugs
have been in place for ~ten years and nobody noticed till now, seems
like it might be good enough to fix them in HEAD only.

Comments?

regards, tom lane

diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index bd8a15d..09abd46 100644
*** a/src/backend/executor/execExprInterp.c
--- b/src/backend/executor/execExprInterp.c
*** ExecEvalArrayExpr(ExprState *state, Expr
*** 2131,2144 
  		Datum	   *dvalues = op->d.arrayexpr.elemvalues;
  		bool	   *dnulls = op->d.arrayexpr.elemnulls;
  
- 		/* Shouldn't happen here, but if length is 0, return empty array */
- 		if (nelems == 0)
- 		{
- 			*op->resvalue =
- PointerGetDatum(construct_empty_array(element_type));
- 			return;
- 		}
- 
  		/* setup for 1-D array of the given length */
  		ndims = 1;
  		dims[0] = nelems;
--- 2131,2136 
diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c
index e4101c9..d1f2fe7 100644
*** a/src/backend/utils/adt/arrayfuncs.c
--- b/src/backend/utils/adt/arrayfuncs.c
*** array_map(FunctionCallInfo fcinfo, Oid r
*** 3297,3302 
--- 3297,3303 
   *
   * A palloc'd 1-D array object is constructed and returned.  Note that
   * elem values will be copied into the object even if pass-by-ref type.
+  * Also note the result will be 0-D not 1-D if nelems = 0.
   *
   * NOTE: it would be cleaner to look up the elmlen/elmbval/elmalign info
   * from the system catalogs, given the elmtype.  However, the caller is
*** construct_array(Datum *elems, int nelems
*** 3331,3336 
--- 3332,3338 
   *
   * A palloc'd ndims-D array object is constructed and returned.  Note that
   * elem values will be copied into the object even if pass-by-ref type.
+  * Also note the result will be 0-D not ndims-D if any dims[i] = 0.
   *
   * NOTE: it would be cleaner to look up the elmlen/elmbval/elmalign info
   * from the system catalogs, given the elmtype.  However, the caller is
*** construct_md_array(Datum *elems,
*** 3362,3373 
   errmsg("number of array dimensions (%d) exceeds the maximum allowed (%d)",
  		ndims, MAXDIM)));
  
- 	/* fast track for empty array */
- 	if (ndims == 0)
- 		return construct_empty_array(elmtype);
- 
  	nelems = ArrayGetNItems(ndims, dims);
  
  	/* compute required space */
  	nbytes = 0;
  	hasnulls = false;
--- 3364,3375 
   errmsg("number of array dimensions (%d) exceeds the maximum allowed (%d)",
  		ndims, MAXDIM)));
  
  	nelems = ArrayGetNItems(ndims, dims);
  
+ 	/* if ndims <= 0 or any dims[i] == 0, return empty array */
+ 	if (nelems <= 0)
+ 		return construct_empty_array(elmtype);
+ 
  	/* compute required space */
  	nbytes = 0;
  	hasnulls = false;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] BUG #14825: enum type: unsafe use?

2017-09-23 Thread Tom Lane
Andrew Dunstan  writes:
> On 09/23/2017 02:00 PM, Tom Lane wrote:
>> So I'm back to not being sure about the path forward.  Maybe it would be
>> all right to say "the value added by ADD VALUE can't be used in the same
>> transaction, period".  That's still a step forward compared to the pre-v10
>> prohibition on doing it at all.  I don't remember if there were use-cases
>> where we really needed the exception for new-in-transaction types.

> Well, my idea was to have the test run like this:

>   * is the value an old one? Test txnid of tuple. If yes it's ok
>   * is the value one created by ALTER TYPE ADD VALUE? Test
> blacklist. If no, it's ok.
>   * is the enum a new one? Test whitelist. If yes, it's ok.
>   * anything else is not ok.

My point is that if you do 1 and 3, you don't need 2.  Or if you do
2 and 3, you don't need 1.  But in most cases, testing the tuple
hint bits is cheap, so you don't really want that option.

In any case, what I'm worried about is the amount of bookkeeping
overhead added by keeping a whitelist of enum-types-created-in-
current-transaction.  That's less than trivial, especially since
you have to account correctly for subtransactions.  And there are
common use-cases where that table will become large.

> If we just did the blacklist and stuck with our current heuristic test
> for enum being created in the current transaction, we'd still probably
> avoid 99% of the problems, including specifically the one that gave rise
> to the bug report.

True.  But I'm not sure whether the heuristic test is adding anything
meaningful if we use a blacklist first.  The case where it could help
is

begin;
create type t as enum();
alter type t add value 'v';
-- do something with 'v'
commit;

That perhaps is worth something, but if somebody is trying to build a new
enum type in pieces like that, doesn't it seem fairly likely that they
might throw in an ALTER OWNER or GRANT as well?  My feeling is that the
lesson we need to learn is that the heuristic test isn't good enough.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Variable substitution in psql backtick expansion

2017-09-23 Thread Fabien COELHO


Hello Gerdan,

This is an internal address that should not be exposed:

postgresql@coelho.net

I'm unsure why it gets substituted by the "commitfest application"...


When I try to apply this patch, it fails with the following message:


It just worked for me on head with

   git checkout -b test
   git apply ~/psql-server-version-1.patch

My guess is that your mailer or browser mangles the file contents 
because its mime type is "text/x-diff" and that it considers that it is 
okay to change NL to CR or CRNL... which is a BAD IDEA (tm).


I re-attached the file compressed so that it uses another mime-type and 
should not change its contents.


--
Fabien.

psql-server-version-1.patch.gz
Description: application/gzip

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] BUG #14825: enum type: unsafe use?

2017-09-23 Thread Andrew Dunstan


On 09/23/2017 02:00 PM, Tom Lane wrote:
> I wrote:
>> Andrew Dunstan  writes:
>>> I see what you're saying, but my idea was slightly different. We would
>>> only add to the hashtable I had in mind at the bottom of AddEnumLabel().
>>> Any other value, whether added in the current transaction or not, should
>>> be safe, AIUI.
>> Oh, I see: a list of enum values we need to blacklist, not whitelist.
>> Yes, that's a significantly better idea than mine, because in common
>> use-cases that would be empty or have a very small number of entries.
> Oh, wait a minute ... that's not where the problem is, really.  We can
> already tell reliably whether an enum value was created in the current
> transaction: the is-it-committed check in check_safe_enum_use is
> perfectly safe for that AFAICS.  The hard part of this is the part about
> "was the enum type created in the current transaction?".  We could make
> that reliable with the table I proposed of enum types created in the
> current transaction, but the possible performance drag is a concern.
>
> What I understand you to be proposing is to blacklist individual
> enum values added by ALTER ADD VALUE, but *not* values created
> aboriginally by CREATE TYPE AS ENUM.  (The latter are surely safe,
> because the type must disappear if they do.)  However, that would
> require dropping the second part of the current documentation promise:
>
>If ALTER TYPE ... ADD VALUE (the form that adds a new value to
>an enum type) is executed inside a transaction block, the new value cannot
>be used until after the transaction has been committed, except in the case
>that the enum type itself was created earlier in the same transaction.
>
> We'd have to just say "it can't be used till the transaction has been
> committed", full stop.  Otherwise we're right back into the problem of
> needing to detect whether the enum type is new in transaction.
>
>>> Maybe we should also keep a cache of whitelisted enums
>>> created in the current transaction.
>> What for?
> I now realize that what you probably meant here was to track enum *types*,
> not values, created in the current transaction.  But if we're doing that
> then we don't really need the per-enum-value-blacklist part of it.
>
> So I'm back to not being sure about the path forward.  Maybe it would be
> all right to say "the value added by ADD VALUE can't be used in the same
> transaction, period".  That's still a step forward compared to the pre-v10
> prohibition on doing it at all.  I don't remember if there were use-cases
> where we really needed the exception for new-in-transaction types.
>
>   



Well, my idea was to have the test run like this:

  * is the value an old one? Test txnid of tuple. If yes it's ok
  * is the value one created by ALTER TYPE ADD VALUE? Test
blacklist. If no, it's ok.
  * is the enum a new one? Test whitelist. If yes, it's ok.
  * anything else is not ok.

I think that would let us keep our promise.
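
In code terms the test could look roughly like this (a sketch only:
EnumBlacklisted() is from the WIP patch elsewhere in this thread, while
EnumTypeCreatedInCurrentXact() is a hypothetical name for the whitelist
check):

	/* inside check_safe_enum_use(), sketched */
	TransactionId xmin = HeapTupleHeaderGetXmin(enumval_tup->t_data);
	Form_pg_enum en = (Form_pg_enum) GETSTRUCT(enumval_tup);

	/* 1. old, committed value: always safe */
	if (!TransactionIdIsInProgress(xmin) && TransactionIdDidCommit(xmin))
		return;

	/* 2. not added by ALTER TYPE ... ADD VALUE in this transaction: safe */
	if (!EnumBlacklisted(HeapTupleGetOid(enumval_tup)))
		return;

	/* 3. enum type itself created in this transaction: safe */
	if (EnumTypeCreatedInCurrentXact(en->enumtypid))	/* hypothetical */
		return;

	/* 4. anything else is not ok */
	ereport(ERROR,
			(errcode(ERRCODE_UNSAFE_NEW_ENUM_VALUE_USAGE),
			 errmsg("unsafe use of new value \"%s\" of enum type %s",
					NameStr(en->enumlabel),
					format_type_be(en->enumtypid))));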

If we just did the blacklist and stuck with our current heuristic test
for enum being created in the current transaction, we'd still probably
avoid 99% of the problems, including specifically the one that gave rise
to the bug report.

Am I missing something?


cheers


andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] BUG #14825: enum type: unsafe use?

2017-09-23 Thread Tom Lane
I wrote:
> Andrew Dunstan  writes:
>> I see what you're saying, but my idea was slightly different. We would
>> only add to the hashtable I had in mind at the bottom of AddEnumLabel().
>> Any other value, whether added in the current transaction or not, should
>> be safe, AIUI.

> Oh, I see: a list of enum values we need to blacklist, not whitelist.
> Yes, that's a significantly better idea than mine, because in common
> use-cases that would be empty or have a very small number of entries.

Oh, wait a minute ... that's not where the problem is, really.  We can
already tell reliably whether an enum value was created in the current
transaction: the is-it-committed check in check_safe_enum_use is
perfectly safe for that AFAICS.  The hard part of this is the part about
"was the enum type created in the current transaction?".  We could make
that reliable with the table I proposed of enum types created in the
current transaction, but the possible performance drag is a concern.

What I understand you to be proposing is to blacklist individual
enum values added by ALTER ADD VALUE, but *not* values created
aboriginally by CREATE TYPE AS ENUM.  (The latter are surely safe,
because the type must disappear if they do.)  However, that would
require dropping the second part of the current documentation promise:

   If ALTER TYPE ... ADD VALUE (the form that adds a new value to
   an enum type) is executed inside a transaction block, the new value cannot
   be used until after the transaction has been committed, except in the case
   that the enum type itself was created earlier in the same transaction.

We'd have to just say "it can't be used till the transaction has been
committed", full stop.  Otherwise we're right back into the problem of
needing to detect whether the enum type is new in transaction.

>> Maybe we should also keep a cache of whitelisted enums
>> created in the current transaction.

> What for?

I now realize that what you probably meant here was to track enum *types*,
not values, created in the current transaction.  But if we're doing that
then we don't really need the per-enum-value-blacklist part of it.

So I'm back to not being sure about the path forward.  Maybe it would be
all right to say "the value added by ADD VALUE can't be used in the same
transaction, period".  That's still a step forward compared to the pre-v10
prohibition on doing it at all.  I don't remember if there were use-cases
where we really needed the exception for new-in-transaction types.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] BUG #14825: enum type: unsafe use?

2017-09-23 Thread Andrew Dunstan


On 09/23/2017 11:16 AM, Tom Lane wrote:
> Andrew Dunstan  writes:
>
>>> The immediate question is do we care to design/implement such a thing
>>> post-RC1.  I'd have to vote "no".  I think the most prudent thing to
>>> do is revert 15bc038f9 and then have another go at it during the v11
>>> cycle.
>> Sadly I agree. We've made decisions like this in the past, and I have
>> generally been supportive of them. I think this is the first time I have
>> been on the receiving end of one so late in the process :-(
> Unless you want to try writing a patch for this in the next day or two,
> I think we have to do that.  But now that I see the plan clearly, maybe
> we could get away with a post-RC1 fix.


OK, I'll give it a shot.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rethinking autovacuum.c memory handling

2017-09-23 Thread Bossart, Nathan
On 9/23/17, 12:36 PM, "Tom Lane"  wrote:
>"Bossart, Nathan"  writes:
>> This looks reasonable to me as well.  I haven't noticed any issues after
>> a couple hours of pgbench with aggressive autovacuum settings, either.
>
> Thanks for looking.  As I'm sure you realize, what motivated that was
> not liking the switch into AutovacMemCxt that you'd added in
> autovacuum_do_vac_analyze ... with this patch, we can drop that.

Yup.  I’ll go ahead and update that patch now.

Nathan



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rethinking autovacuum.c memory handling

2017-09-23 Thread Tom Lane
"Bossart, Nathan"  writes:
> This looks reasonable to me as well.  I haven't noticed any issues after
> a couple hours of pgbench with aggressive autovacuum settings, either.

Thanks for looking.  As I'm sure you realize, what motivated that was
not liking the switch into AutovacMemCxt that you'd added in
autovacuum_do_vac_analyze ... with this patch, we can drop that.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rethinking autovacuum.c memory handling

2017-09-23 Thread Tom Lane
Michael Paquier  writes:
> I have spent some time looking at your patch and testing it. This
> looks sane. A small comment that I have would be to add an assertion
> at the top of perform_work_item to be sure that it is called in the
> memory context of AutovacMemCxt.

Done like that, thanks for reviewing!

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rethinking autovacuum.c memory handling

2017-09-23 Thread Bossart, Nathan
On 9/23/17, 5:27 AM, "Michael Paquier"  wrote:
>On Sat, Sep 23, 2017 at 6:09 AM, Tom Lane  wrote:
>> I notice that autovacuum.c calls autovacuum_do_vac_analyze, and
>> thereby vacuum(), in TopTransactionContext.  This doesn't seem
>> like a terribly great idea, because it doesn't correspond to what
>> happens during a manually-invoked vacuum.
>
> Indeed, the inconsistency is not good here.
>
>> What I think we should do instead is invoke autovacuum_do_vac_analyze
>> in the PortalContext that do_autovacuum has created, which we already
>> have a mechanism to reset once per table processed in do_autovacuum.
>>
>> The attached patch does that, and also modifies perform_work_item()
>> to use the same approach.  Right now perform_work_item() has a
>> copied-and-pasted MemoryContextResetAndDeleteChildren(PortalContext)
>> call in its error recovery path, but that seems a bit out of place
>> given that perform_work_item() isn't using PortalContext otherwise.
>
> I have spent some time looking at your patch and testing it. This
> looks sane. A small comment that I have would be to add an assertion
> at the top of perform_work_item to be sure that it is called in the
> memory context of AutovacMemCxt.

This looks reasonable to me as well.  I haven't noticed any issues after
a couple hours of pgbench with aggressive autovacuum settings, either.

Nathan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench regression test failure

2017-09-23 Thread Tom Lane
Fabien COELHO  writes:
>> [...] After another week of buildfarm runs, we have a few more cases of 
>> 3 rows of output, and none of more than 3 or less than 1.  So I went 
>> ahead and pushed your patch.  I'm still suspicious of these results, but 
>> we might as well try to make the buildfarm green pending investigation 
>> of how this is happening.

> Yep. I keep the issue of pgbench tap test determinism in my todo list, 
> among other things.

> I think that it should be at least clearer under which condition (load ? 
> luck ? bug ?) the result may be 1 or 3 when 2 are expected, which needs 
> some thinking.

skink blew up real good just now:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink=2017-09-23%2010%3A50%3A01

the critical bit being

#   Failed test 'pgbench progress stderr /(?^:progress: 1\b)/'
#   at 
/home/andres/build/buildfarm/HEAD/pgsql.build/../pgsql/src/test/perl/TestLib.pm 
line 369.
#   'starting vacuum...end.
# progress: 2.6 s, 6.9 tps, lat 0.000 ms stddev 0.000, lag 0.000 ms, 18 skipped
# progress: 3.0 s, 0.0 tps, lat -nan ms stddev -nan, lag -nan ms, 0 skipped
# progress: 4.0 s, 1.0 tps, lat 2682.730 ms stddev 0.000, lag 985.509 ms, 0 
skipped
# '
# doesn't match '(?^:progress: 1\b)'

#   Failed test 'transaction count for 001_pgbench_log_1.15981 (5)'
#   at t/001_pgbench_with_server.pl line 438.

#   Failed test 'transaction count for 001_pgbench_log_1.15981.1 (4)'
#   at t/001_pgbench_with_server.pl line 438.
# Looks like you failed 3 tests of 233.

That's exceeded my patience with this test case, so I've removed it
for the moment.  We can put it back as soon as we figure some way
to make it more robust.  (BTW, the "-nan" bits suggest an actual
pgbench bug, independently of anything else.)

Possibly you can duplicate skink's issue by running things under
valgrind.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Built-in plugin for logical decoding output

2017-09-23 Thread Alvaro Hernandez



On 23/09/17 18:42, Euler Taveira wrote:

2017-09-22 19:28 GMT-03:00 Gregory Brail :

We have been working on a project that makes extensive use of logical
replication for use inside Apigee (which is a very small part of Google):

https://github.com/apigee-labs/transicator

In order to do this, we had to write our own logical replication plugin
because the supplied "test_decoding" plugin from the "contrib" directory was
hard for us to work with. Primarily:


test_decoding is a proof of concept to illustrate the logical decoding
potential. It is not intended for production.


    However, AFAIK, AWS's DMS uses it for production purposes (see 
http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_PostgreSQL.html).



I developed wal2json [1]
for general use. It outputs changes in JSON. It was one of the first
logical decoding plugins.


1) It doesn't include all the fields that we need for Transicator (most
importantly we need the LSN and the 64-bit transaction ID).


wal2json includes both.


2) It outputs a text format that is hard to parse.


There are a lot of JSON parsers.


I imagine that other users of logical decoding are facing similar problems.

Would the community support the development of another plugin that is
distributed as part of "contrib" that addresses these issues? I'd be happy
to submit a patch, or GitHub repo, or whatever works best as an example.
(Also, although Transicator uses protobuf, I'm happy to have it output a
simple binary format as well.)


There was a prior discussion and it was suggested that we have a
ready-for-production plugin in core (besides pgoutput). It was
suggested [2] that I submit wal2json for 11. I'm in the process of
cleaning up the code and hope to submit it to CF2.


    I would be happy to see another logical decoding plugin in core 
starting with 11. However, this also poses a bit of a challenge for 
middleware implementors: you need to support one for 9.4-9.5 
(test_decoding), another for 10 (pgoutput) and maybe another for 11 
onwards. The idea of asking users to install a binary plugin is very 
unsexy, so these are the options available.


    However, having said that, and while json is a great output format 
for interoperability, if there's a discussion on which plugin to include 
next, I'd also favor one that has some more compact representation 
format (or that supports several formats, not only json).



    Regards,

    Álvaro

--

Alvaro Hernandez


---
OnGres



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Built-in plugin for logical decoding output

2017-09-23 Thread Euler Taveira
2017-09-22 19:28 GMT-03:00 Gregory Brail :
> We have been working on a project that makes extensive use of logical
> replication for use inside Apigee (which is a very small part of Google):
>
> https://github.com/apigee-labs/transicator
>
> In order to do this, we had to write our own logical replication plugin
> because the supplied "test_decoding" plugin from the "contrib" directory was
> hard for us to work with. Primarily:
>
test_decoding is a proof of concept to illustrate the logical decoding
potential. It is not intended for production. I developed wal2json [1]
for general use. It outputs changes in JSON. It was one of the first
logical decoding plugins.

> 1) It doesn't include all the fields that we need for Transicator (most
> importantly we need the LSN and the 64-bit transaction ID).
>
wal2json includes both.

> 2) It outputs a text format that is hard to parse.
>
There are a lot of JSON parsers.

> I imagine that other users of logical decoding are facing similar problems.
>
> Would the community support the development of another plugin that is
> distributed as part of "contrib" that addresses these issues? I'd be happy
> to submit a patch, or GitHub repo, or whatever works best as an example.
> (Also, although Transicator uses protobuf, I'm happy to have it output a
> simple binary format as well.)
>
There was a prior discussion and it was suggested that we have a
ready-for-production plugin in core (besides pgoutput). It was
suggested [2] that I submit wal2json for 11. I'm in the process of
cleaning up the code and hope to submit it to CF2.


[1] https://github.com/eulerto/wal2json
[2] 
https://www.postgresql.org/message-id/CAHE3wggh6ucSCB%2BhnSK04xEQx75f3DTz0wPSjRMJFjum6pRrPQ%40mail.gmail.com


-- 
   Euler Taveira   Timbira -
http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] user-defined numeric data types triggering ERROR: unsupported type

2017-09-23 Thread Tom Lane
Tomas Vondra  writes:
>>> [ scalarineqsel may fall over when used by extension operators ]

> What about using two-pronged approach:

> 1) fall back to mid bucket in back branches (9.3 - 10)
> 2) do something smarter (along the lines you outlined) in PG11

Sure.  We need to test the fallback case anyway.

>> [ sketch of a more extensible design ]

> Sounds reasonable to me, I guess - I can't really think about anything
> simpler giving us the same flexibility.

Actually, on further thought, that's still too simple.  If you look
at convert_string_to_scalar() you'll see it's examining all three
values concurrently (the probe value, of one datatype, and two bin
boundary values of possibly a different type).  The reason is that
it's stripping off whatever common prefix those strings have before
trying to form a numeric equivalent.  While certainly
convert_string_to_scalar is pretty stupid in the face of non-ASCII
sort orders, the prefix-stripping is something I really don't want
to give up.  So the design I sketched of considering each value
totally independently isn't good enough.

We could, perhaps, provide an API that lets an operator estimation
function replace convert_to_scalar in toto, but that seems like
you'd still end up duplicating code in many cases.  Not sure about
how to find a happy medium.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] BUG #14825: enum type: unsafe use?

2017-09-23 Thread Tom Lane
Andrew Dunstan  writes:
> On 09/22/2017 11:19 PM, Tom Lane wrote:
>> Yeah, I was considering the same thing over dinner, though I'd phrase
>> it oppositely: keep a list of enum type OIDs created in the current
>> transaction, so that we could whitelist them.  This could maybe become
>> a problem if someone created a zillion enums in one xact, though.

> I see what you're saying, but my idea was slightly different. We would
> only add to the hashtable I had in mind at the bottom of AddEnumLabel().
> Any other value, whether added in the current transaction or not, should
> be safe, AIUI.

Oh, I see: a list of enum values we need to blacklist, not whitelist.
Yes, that's a significantly better idea than mine, because in common
use-cases that would be empty or have a very small number of entries.
In particular that fixes the "pg_restore -1" scenario, because no
matter how many enums you're restoring, pg_dump doesn't use ALTER
TYPE ADD VALUE.  (Well, it does in --binary-upgrade mode, but those
scripts are run in transaction-per-statement mode so it's fine.)

> Maybe we should also keep a cache of whitelisted enums
> created in the current transaction.

What for?  Wouldn't be any faster to search, in fact likely slower
because it could get large in common use-cases.

>> The immediate question is do we care to design/implement such a thing
>> post-RC1.  I'd have to vote "no".  I think the most prudent thing to
>> do is revert 15bc038f9 and then have another go at it during the v11
>> cycle.

> Sadly I agree. We've made decisions like this in the past, and I have
> generally been supportive of them. I think this is the first time I have
> been on the receiving end of one so late in the process :-(

Unless you want to try writing a patch for this in the next day or two,
I think we have to do that.  But now that I see the plan clearly, maybe
we could get away with a post-RC1 fix.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] user-defined numeric data types triggering ERROR: unsupported type

2017-09-23 Thread Tomas Vondra
On 09/21/2017 04:24 PM, Tom Lane wrote:
> Tomas Vondra  writes:
>> [ scalarineqsel may fall over when used by extension operators ]
> 
> I concur with your thought that we could have
> ineq_histogram_selectivity fall back to a "mid bucket" default if
> it's working with a datatype it is unable to convert_to_scalar. But I
> think if we're going to touch this at all, we ought to have higher
> ambition than that, and try to provide a mechanism whereby an
> extension that's willing to work a bit harder could get that
> additional increment of estimation accuracy. I don't care for this
> way to do that:
> 

The question is - do we need a solution that is back-patchable? This
means we can't really use FIXEDDECIMAL without effectively copying a
lot of the selfuncs.c stuff, only to make some minor fixes.

What about using two-pronged approach:

1) fall back to mid bucket in back branches (9.3 - 10)
2) do something smarter (along the lines you outlined) in PG11

I'm willing to spend some time on both, but (2) alone is not
particularly attractive for us, as it only helps PG11 and we'll have to
do the copy-paste evil anyway to get the data type working on back branches.
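
For (1), the sketch would essentially be to make the converters report
failure instead of erroring out, e.g. (hypothetical extra out-parameter
shown; only the default branch changes):

	static double
	convert_numeric_to_scalar(Datum value, Oid typid, bool *failure)
	{
		switch (typid)
		{
			case INT4OID:
				return (double) DatumGetInt32(value);
			case FLOAT8OID:
				return DatumGetFloat8(value);
			/* ... remaining built-in numeric-category cases unchanged ... */
			default:
				/* previously: elog(ERROR, "unsupported type: %u", typid); */
				*failure = true;
				return 0;
		}
	}

ineq_histogram_selectivity would then treat a failed conversion like any
other non-convertible case and fall back to the mid-bucket default
(binfrac = 0.5).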

>> * Make convert_numeric_to_scalar smarter, so that it checks if
>> there is an implicit cast to numeric, and fail only if it does not
>> find one.
> 
> because it's expensive, and it only works for numeric-category cases,
> and it will fail outright for numbers outside the range of "double".
> (Notice that convert_numeric_to_scalar is *not* calling the regular
> cast function for numeric-to-double.) Moreover, any operator ought to
> know what types it can accept; we shouldn't have to do more catalog
> lookups to figure out what to do.
> 

Ah, I see. I haven't realized it's not using the regular cast functions,
but now that you point that out it's clear relying on casts would fail.

> Now that scalarltsel and friends are just trivial wrappers for a
> common function, we could imagine exposing scalarineqsel_wrapper as a
> non-static function, with more arguments (and perhaps a better-chosen
> name ;-)). The idea would be for extensions that want to go this
> extra mile to provide their own selectivity estimation functions,
> which again would just be trivial wrappers for the core function, but
> would provide additional knowledge through additional arguments.
> 
> The additional arguments I'm envisioning are a couple of C function 
> pointers, one function that knows how to convert the operator's 
> left-hand input type to scalar, and one function that knows how to
> convert the right-hand type to scalar. (Identical APIs of course.) 
> Passing a NULL would imply that the core code must fall back on its 
> own devices for that input.
> 
> Now the thing about convert_to_scalar is that there are several
> different conversion conventions already (numeric, string, timestamp,
> ...), and there probably could be more once extension types are
> coming to the party. So I'm imagining that the API for these
> conversion functions could be something like
> 
>   bool convert(Datum value, Oid valuetypeid,
>int *conversion_convention, double *converted_value);
> 
> The conversion_convention output would make use of some agreed-on 
> constants, say 1 for numeric, 2 for string, etc etc. In the core 
> code, if either converter fails (returns false) or if they don't 
> return the same conversion_convention code, we give up and use the 
> mid-bucket default. If they both produce results with the same 
> conversion_convention, then we can treat the converted_values as 
> commensurable.
> 

OK, so instead of re-implementing the whole function, we'd essentially
do just something like this:

typedef bool (*convert_callback) (Datum value, Oid valuetypeid,
                                  int *conversion_convention,
                                  double *converted_value);

double
scalarineqsel(PlannerInfo *root, Oid operator,
 bool isgt, bool iseq, VariableStatData *vardata,
 Datum constval, Oid consttype,
 convert_calback convert);

and then, from the extension

double
scalarineqsel_wrapper(PlannerInfo *root, Oid operator,
 bool isgt, bool iseq, VariableStatData *vardata,
 Datum constval, Oid consttype)
{
return scalarineqsel(root, operator, isgt, iseq, vardata,
 constval, consttype, my_convert_func);
}

Sounds reasonable to me, I guess - I can't really think about anything
simpler giving us the same flexibility.
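
As an illustration of the extension side under that sketched API
(FIXEDPOINTOID and the scale factor are hypothetical placeholders for
whatever the extension's type actually stores):

	/* hypothetical: type stored as int64, scaled by 10000 */
	static bool
	fixedpoint_convert(Datum value, Oid valuetypeid,
					   int *conversion_convention, double *converted_value)
	{
		if (valuetypeid != FIXEDPOINTOID)
			return false;			/* let the core code use its fallback */

		*conversion_convention = 1;	/* say, the "numeric" convention */
		*converted_value = (double) DatumGetInt64(value) / 10000.0;
		return true;
	}

The extension's wrapper would then just pass fixedpoint_convert as the
convert argument in the call shown above.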

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] OpenFile() Permissions Refactor

2017-09-23 Thread Peter Eisentraut
On 9/13/17 10:26, David Steele wrote:
> Here's a new patch based on your review.  Where I had a question I made
> a choice as described below:

I have committed the changes to the file APIs and a fix for the umask
save/restore issue.

The mkdir changes didn't really inspire me.  Replacing mkdir() with
MakeDirectoryPerm() without any change in functionality doesn't really
improve clarity.  Maybe we'll revisit that when your next patches arrive.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] BUG #14825: enum type: unsafe use?

2017-09-23 Thread Andrew Dunstan


On 09/22/2017 11:19 PM, Tom Lane wrote:
> Andrew Dunstan  writes:
>> On 09/22/2017 05:46 PM, Tom Lane wrote:
>>> I'm not sure if that qualifies as a stop-ship problem, but it ain't
>>> good, for sure.  We need to look at whether we should revert 15bc038f9
>>> or somehow revise its rules.
>> I wonder if we wouldn't be better
>> doing this more directly, keeping a per-transaction hash of unsafe enum
>> values (which will almost always be empty). It might even speed up the
>> check.
> Yeah, I was considering the same thing over dinner, though I'd phrase
> it oppositely: keep a list of enum type OIDs created in the current
> transaction, so that we could whitelist them.  This could maybe become
> a problem if someone created a zillion enums in one xact, though.


I see what you're saying, but my idea was slightly different. We would
only add to the hashtable I had in mind at the bottom of AddEnumLabel().
Any other value, whether added in the current transaction or not, should
be safe, AIUI. Maybe we should also keep a cache of whitelisted enums
created in the current transaction.

I'm not to worried about people creating a zillion enums (or enum labels
being added for the solution I had in mind). Even a hash of a million
Oids will only consume a few megabytes, won't it?

>
> The immediate question is do we care to design/implement such a thing
> post-RC1.  I'd have to vote "no".  I think the most prudent thing to
> do is revert 15bc038f9 and then have another go at it during the v11
> cycle.
>
>   


Sadly I agree. We've made decisions like this in the past, and I have
generally been supportive of them. I think this is the first time I have
been on the receiving end of one so late in the process :-(

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] brin_summarize_new_values error checking

2017-09-23 Thread Alvaro Herrera
Fujii Masao wrote:
> On Wed, Jan 27, 2016 at 11:56 AM, Fujii Masao  wrote:
> > On Mon, Jan 25, 2016 at 4:03 PM, Jeff Janes  wrote:
> >> In reviewing one of my patches[1], Fujii-san has pointed out that I
> >> didn't include checks for being in recovery, or for working on another
> >> backend's temporary index.
> >>
> >> I think that brin_summarize_new_values in 9.5.0 commits those same
> >> sins. In its case, I don't think those are critical, as they just
> >> result in getting less specific error messages that one might hope
> >> for, rather than something worse like a panic.
> >>
> >> But still, we might want to address them.
> >
> > Agreed to add those checks.
> 
> Attached patch does this.

Uh, I just realized you never applied this patch ... Will you do so now?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rethinking autovacuum.c memory handling

2017-09-23 Thread Alvaro Herrera
Alvaro Herrera wrote:

> One idea I just had is to somehow add it to src/test/modules/brin, and
> set up the postmaster in that test with autovacuum_naptime=1s.  I'll go
> check how feasible that is.  (By placing it there we could also verify
> that the index does indeed contain the index entries we expect, since it
> has pageinspect available.)

Yeah, this works, as per the attached patch.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 196bec86e957cc6413d1e4823cb891a0edc192de Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Sat, 23 Sep 2017 12:56:56 +0200
Subject: [PATCH] Add autovacuum work-items test for BRIN

---
 src/test/modules/brin/Makefile  |  7 --
 src/test/modules/brin/t/01_workitems.pl | 39 +
 2 files changed, 44 insertions(+), 2 deletions(-)
 create mode 100644 src/test/modules/brin/t/01_workitems.pl

diff --git a/src/test/modules/brin/Makefile b/src/test/modules/brin/Makefile
index dda84c23c7..18c5cafd5e 100644
--- a/src/test/modules/brin/Makefile
+++ b/src/test/modules/brin/Makefile
@@ -16,7 +16,7 @@ include $(top_builddir)/src/Makefile.global
 include $(top_srcdir)/contrib/contrib-global.mk
 endif
 
-check: isolation-check
+check: isolation-check prove-check
 
 isolation-check: | submake-isolation
$(MKDIR_P) isolation_output
@@ -24,7 +24,10 @@ isolation-check: | submake-isolation
--outputdir=./isolation_output \
$(ISOLATIONCHECKS)
 
-.PHONY: check isolation-check
+prove-check:
+   $(prove_check)
+
+.PHONY: check isolation-check prove-check
 
 submake-isolation:
$(MAKE) -C $(top_builddir)/src/test/isolation all
diff --git a/src/test/modules/brin/t/01_workitems.pl b/src/test/modules/brin/t/01_workitems.pl
new file mode 100644
index 00..11c9981d40
--- /dev/null
+++ b/src/test/modules/brin/t/01_workitems.pl
@@ -0,0 +1,39 @@
+# Verify that work items work correctly
+
+use strict;
+use warnings;
+
+use TestLib;
+use Test::More tests => 2;
+use PostgresNode;
+
+my $node = get_new_node('tango');
+$node->init;
+$node->append_conf('postgresql.conf', 'autovacuum_naptime=1s');
+$node->start;
+
+$node->safe_psql('postgres', 'create extension pageinspect');
+
+# Create a table with an autosummarizing BRIN index
+$node->safe_psql('postgres',
+   'create table brin_wi (a int) with (fillfactor = 10);
+create index brin_wi_idx on brin_wi using brin (a) with (pages_per_range=1, autosummarize=on);
+'
+);
+my $count = $node->safe_psql('postgres',
+   "select count(*) from brin_page_items(get_raw_page('brin_wi_idx', 2), 'brin_wi_idx'::regclass)"
+);
+is($count, '1', "initial index state is correct");
+
+$node->safe_psql('postgres',
+   'insert into brin_wi select * from generate_series(1, 100)');
+
+$node->poll_query_until('postgres',
+   "select count(*) > 1 from brin_page_items(get_raw_page('brin_wi_idx', 2), 'brin_wi_idx'::regclass)",
+   't');
+
+$count = $node->safe_psql('postgres',
+   "select count(*) > 1 from brin_page_items(get_raw_page('brin_wi_idx', 2), 'brin_wi_idx'::regclass)"
+);
+is($count, 't', "index got summarized");
+$node->stop;
-- 
2.11.0


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rethinking autovacuum.c memory handling

2017-09-23 Thread Alvaro Herrera
Tom Lane wrote:
> I notice that autovacuum.c calls autovacuum_do_vac_analyze, and
> thereby vacuum(), in TopTransactionContext.  This doesn't seem
> like a terribly great idea, because it doesn't correspond to what
> happens during a manually-invoked vacuum.  TopTransactionContext
> will go away when vacuum() commits the outer transaction, whereas
> in non-autovac usage, we call vacuum() in a PortalHeapMemory
> context that is not a child of TopTransactionContext and is not
> at risk of being reset multiple times during the vacuum().  This'd
> be a hazard if autovacuum_do_vac_analyze or vacuum did any palloc's
> before getting to the main loop.  More generally, I'm not aware of
> other cases where we invoke a function in a context that we know
> that function will destroy as it executes.

Oh, good catch.  This must be a very old oversight -- I bet it goes all
the way back to the introduction of autovacuum.  I think if there were
any actual bugs, we would have noticed by now.

> I don't see any live bug associated with this in HEAD, but this behavior
> requires a rather ugly (and memory-leaking) workaround in the proposed
> patch to allow multiple vacuum target rels.

Well, it's definitely broken, so I'd vote for fixing it even if we
weren't considering that patch.

> What I think we should do instead is invoke autovacuum_do_vac_analyze
> in the PortalContext that do_autovacuum has created, which we already
> have a mechanism to reset once per table processed in do_autovacuum.

Sounds sensible.

> The attached patch does that, and also modifies perform_work_item()
> to use the same approach.  Right now perform_work_item() has a
> copied-and-pasted MemoryContextResetAndDeleteChildren(PortalContext)
> call in its error recovery path, but that seems a bit out of place
> given that perform_work_item() isn't using PortalContext otherwise.

Oops :-(

> PS: I was disappointed to find out that perform_work_item() isn't
> exercised at all in the standard regression tests.

Yeah ... It's fairly simple to create a test that will invoke it a few
times, but one problem is that it depends on autovacuum's timing.  If I
add this in brin.sql

-- Test BRIN work items
CREATE TABLE brin_work_items (LIKE brintest) WITH (fillfactor = 10);
CREATE INDEX brin_work_items_idx ON brin_work_items USING brin (textcol)
WITH (autosummarize = on, pages_per_range=1);
INSERT INTO brin_work_items SELECT * FROM brintest;

the work item is only performed about 15 seconds after the complete
regression tests are finished, which I fear would mean "never" in
practice.

One idea I just had is to somehow add it to src/test/modules/brin, and
set up the postmaster in that test with autovacuum_naptime=1s.  I'll go
check how feasible that is.  (By placing it there we could also verify
that the index does indeed contain the index entries we expect, since it
has pageinspect available.)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [Proposal] Make the optimiser aware of partitions ordering

2017-09-23 Thread Julien Rouhaud
On Fri, Sep 22, 2017 at 11:09 PM, Robert Haas  wrote:
> During planning, *every* node has a list of pathkeys associated with
> it which represent the presumed output ordering.  You can support
> ordered append paths without changing any data structures at all; it's
> just a matter of putting pathkeys into an AppendPath.
>

I totally agree.

> The reason why MergeAppend has extra stuff in the node (numCols,
> sortColIdx, etc.) is so that it can actually perform the sort at
> runtime.  But this feature requires no runtime support -- that's kinda
> the point -- so there's no need for it to carry that information, or
> to add any new nodes.
>
> At least, IIUC.

That's true, but numCols, sortColIdx etc. are also used to display the
sort key in EXPLAIN.  If an append can return sorted data, it
should also display the sort information, so I think these fields are
still required in an Append node.

If it's ok to only add these fields in an Append node for the explain
part, I'll revert the Append/MergeAppend refactoring and do some other
cleanup as you advised earlier.
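
For what it's worth, the planner side of Robert's point would presumably
reduce to something like this when building the append path (a sketch;
the existing create_append_path() would need an extra pathkeys argument):

	ListCell   *lc;
	bool		ordered = true;

	/* only advertise an ordering that every child path already produces */
	foreach(lc, subpaths)
	{
		Path	   *subpath = (Path *) lfirst(lc);

		if (!pathkeys_contained_in(pathkeys, subpath->pathkeys))
		{
			ordered = false;
			break;
		}
	}
	pathnode->path.pathkeys = ordered ? pathkeys : NIL;	/* today: always NIL */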


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rethinking autovacuum.c memory handling

2017-09-23 Thread Michael Paquier
On Sat, Sep 23, 2017 at 6:09 AM, Tom Lane  wrote:
> I notice that autovacuum.c calls autovacuum_do_vac_analyze, and
> thereby vacuum(), in TopTransactionContext.  This doesn't seem
> like a terribly great idea, because it doesn't correspond to what
> happens during a manually-invoked vacuum.

Indeed, the inconsistency is not good here.

> What I think we should do instead is invoke autovacuum_do_vac_analyze
> in the PortalContext that do_autovacuum has created, which we already
> have a mechanism to reset once per table processed in do_autovacuum.
>
> The attached patch does that, and also modifies perform_work_item()
> to use the same approach.  Right now perform_work_item() has a
> copied-and-pasted MemoryContextResetAndDeleteChildren(PortalContext)
> call in its error recovery path, but that seems a bit out of place
> given that perform_work_item() isn't using PortalContext otherwise.

I have spent some time looking at your patch and testing it. This
looks sane. A small comment that I have would be to add an assertion
at the top of perform_work_item to be sure that it is called in the
memory context of AutovacMemCxt.
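Presumably something as simple as this at the top of the function (sketch):

	/* perform_work_item() is expected to run in AutovacMemCxt */
	Assert(CurrentMemoryContext == AutovacMemCxt);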
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Small improvement to compactify_tuples

2017-09-23 Thread Sokolov Yura

Hello, Claudio.

Thank you for review and confirm of improvement.

On 2017-09-23 01:12, Claudio Freire wrote:

On Tue, Sep 12, 2017 at 12:49 PM, Sokolov Yura
 wrote:

On 2017-07-21 13:49, Sokolov Yura wrote:


On 2017-05-17 17:46, Sokolov Yura wrote:


Alvaro Herrera wrote on 2017-05-15 20:13:


As I understand, these patches are logically separate, so putting them
together in a single file isn't such a great idea.  If you don't edit
the patches further, then you're all set because we already have the
previously archived patches.  Next commitfest starts in a few months
yet, and if you feel the need to submit corrected versions in the
meantime, please do submit in separate files.  (Some would even argue
that each should be its own thread, but I don't think that's
necessary.)
necessary.)



Thank you for explanation.

I'm adding new version of first patch with minor improvement:
- I added detection of a case when all buckets are trivial
  (ie 0 or 1 element). In this case no need to sort buckets at all.



I'm putting rebased version of second patch.



Again rebased version of both patches.
Now second patch applies cleanly independent of first patch.


Patch 1 applies cleanly, builds, and make check runs fine.

The code looks similar in style to surrounding code too, so I'm not
going to complain about the abundance of underscores in the macros :-p

I can reproduce the results in the OP's benchmark, with slightly
different numbers, but an overall improvement of ~6%, which matches
the OP's relative improvement.

Algorithmically, everything looks sound.


A few minor comments about patch 1:

+if (max == 1)
+goto end;

That goto is unnecessary, you could just as simply say

if (max > 1)
{
   ...
}


Done.
(I don't like indentation, though :-( )




+#define pg_shell_sort_pass(elem_t, cmp, off) \
+do { \
+int _i, _j; \
+elem_t _temp; \
+for (_i = off; _i < _n; _i += off) \
+{ \

_n right there isn't declared in the macro, and it isn't an argument
either. It should be an argument, having stuff inherited from the
enclosing context like that is confusing.

Same with _arr, btw.


pg_shell_sort_pass is not intended to be used outside pg_shell_sort
and pg_insertion_sort, so I think stealing from their context is OK.
Nonetheless, done.




Patch 2 LGTM.


--
Sokolov Yura aka funny_falcon
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company
From e486014c0a7545f7bb2264d0d7ac96f0544eacdb Mon Sep 17 00:00:00 2001
From: Sokolov Yura 
Date: Mon, 15 May 2017 14:23:39 +0300
Subject: [PATCH] Improve compactify_tuples

Items passed to compactify_tuples are almost sorted, so calling the
general qsort incurs unnecessary overhead from function calls (swapfunc,
med, itemoffcompare).

This patch implements bucket sort:
- one pass of stable prefix sort on high 8 bits of offset
- and insertion sort for buckets larger than 1 element

Also for smaller arrays shell sort is used.

Insertion and Shell sort are implemented using macros.

This approach saves about 3% of CPU in a degenerate case
(highly intensive HOT random updates on an unlogged table with
synchronous_commit=off), and speeds up WAL replay (as found by
Heikki Linnakangas).

A similar approach was implemented by Heikki Linnakangas some time ago,
with several differences.
---
 src/backend/storage/page/bufpage.c | 87 --
 src/include/c.h| 63 +++
 2 files changed, 147 insertions(+), 3 deletions(-)

diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index 41642eb59c..0f9f7d5dfc 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -434,6 +434,86 @@ itemoffcompare(const void *itemidp1, const void *itemidp2)
 }
 
 /*
+ * Sort an array of itemIdSort's on itemoff, descending.
+ *
+ * It uses Shell sort. Given array is small and itemoffcompare is inlined,
+ * it is much faster than call to qsort.
+ */
+static void
+sort_itemIds_small(itemIdSort itemidbase, int nitems)
+{
+	pg_shell_sort(itemIdSortData, itemidbase, nitems, itemoffcompare);
+}
+
+/*
+ * Sort an array of itemIdSort's on itemoff, descending.
+ *
+ * It uses bucket sort:
+ * - single pass of stable prefix sort on high 8 bits
+ * - and insertion sort on buckets larger than 1 element
+ */
+static void
+sort_itemIds(itemIdSort itemidbase, int nitems)
+{
+	itemIdSortData copy[Max(MaxIndexTuplesPerPage, MaxHeapTuplesPerPage)];
+#define NSPLIT 256
+#define PREFDIV (BLCKSZ / NSPLIT)
+	/* two extra elements to emulate offset on previous step */
+	uint16		count[NSPLIT + 2] = {0};
+	int			i,
+max,
+total,
+pos,
+highbits;
+
+	Assert(nitems <= MaxIndexTuplesPerPage);
+	for (i = 0; i < nitems; i++)
+	{
+		highbits = itemidbase[i].itemoff / PREFDIV;
+		count[highbits]++;
+	}
+	/* sort in decreasing order */
+	max = total = count[NSPLIT - 1];
+	for (i =