Re: [HACKERS] Expanding the use of FLEXIBLE_ARRAY_MEMBER for declarations like foo[1]
On Thu, Feb 19, 2015 at 3:57 PM, Michael Paquier wrote:
> On Thu, Feb 19, 2015 at 2:58 PM, Tom Lane wrote:
>> Michael Paquier writes:
>>> 1-2) sizeof(ParamListInfoData) is present in a couple of places,
>>> assuming that sizeof(ParamListInfoData) has the equivalent of 1
>>> parameter, like prepare.c, functions.c, spi.c and postgres.c:
>>> -   /* sizeof(ParamListInfoData) includes the first array element */
>>>     paramLI = (ParamListInfo)
>>>         palloc(sizeof(ParamListInfoData) +
>>> -              (num_params - 1) * sizeof(ParamExternData));
>>> +              num_params * sizeof(ParamExternData));
>>> 1-3) FuncCandidateList in namespace.c (thanks Andres!):
>>>     newResult = (FuncCandidateList)
>>> -       palloc(sizeof(struct _FuncCandidateList) - sizeof(Oid)
>>> -              + effective_nargs * sizeof(Oid));
>>> +       palloc(sizeof(struct _FuncCandidateList) +
>>> +              effective_nargs * sizeof(Oid));
>>> I imagine that we do not want those palloc calls to use #ifdef
>>> FLEXIBLE_ARRAY_MEMBER to save some memory, for code readability, even
>>> if the compiler does not support flexible array members, right?
>>
>> These are just wrong. As a general rule, we do not want to *ever* take
>> sizeof() a struct that contains a flexible array: the results will not
>> be consistent across platforms. The right thing is to use offsetof()
>> instead. See the helpful comment autoconf provides:
>>
>> [...]
>
> And I had this one in front of my eyes a couple of hours ago... Thanks.
>
>> This point is actually the main reason we've not done this change long
>> since. People did not feel like running around to make sure there were
>> no overlooked uses of sizeof().
>
> Thanks for the clarifications and the review. Attached is a new set.

Grr. Completely forgot to use offsetof in dumputils.c as well. Patch that
can be applied on top of 0001 is attached.
--
Michael

diff --git a/src/bin/pg_dump/dumputils.c b/src/bin/pg_dump/dumputils.c
index 892d3fa..3459adc 100644
--- a/src/bin/pg_dump/dumputils.c
+++ b/src/bin/pg_dump/dumputils.c
@@ -1217,7 +1217,7 @@ simple_string_list_append(SimpleStringList *list, const char *val)
 	SimpleStringListCell *cell;
 
 	/* this calculation correctly accounts for the null trailing byte */
-	cell = (SimpleStringListCell *) pg_malloc(sizeof(SimpleStringListCell));
+	cell = (SimpleStringListCell *) pg_malloc(offsetof(SimpleStringListCell, val) + strlen(val) + 1);
 
 	cell->next = NULL;
 	strcpy(cell->val, val);

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Join push-down support for foreign tables
2015-02-17 10:39 GMT+09:00 Kouhei Kaigai:
> Let me put some comments in addition to where you're checking now.
>
> [design issues]
> * Cost estimation
> Estimation and evaluation of the cost of a remote join query is not an
> obvious issue. In principle, the local side cannot determine the cost
> to run a remote join without a remote EXPLAIN, because the local side
> has no information about the JOIN logic applied on the remote side.
> Probably, we have to put an assumption in place for the remote join
> algorithm, because the local planner has no idea about the remote
> planner's choice unless the foreign join takes "use_remote_estimate".
> I think it is a reasonable assumption (even if it is incorrect) to
> calculate the remote join cost based on the local hash-join algorithm.
> If the user wants a more correct estimation, a remote EXPLAIN will make
> the cost estimation more reliable.

Hm, I guess that you chose hash join as the "least-costed join". In the
pgbench model, most combinations of two tables generate a hash join as
the cheapest path. A remote EXPLAIN is very expensive in the context of
planning, so it would easily make the plan optimization meaningless. But
giving the option to users is good, I agree.

>
> It also needs a consensus whether the cost of remote CPU execution is
> equivalent to local CPU. If we think local CPU is a rarer resource than
> the remote one, a discount rate will make the planner prefer a remote
> join over a local one.

Something like cpu_cost_ratio as a new server-level FDW option?

>
> Once we assume a join algorithm for the remote join and a unit cost for
> remote CPU, we can calculate a cost for the foreign join based on the
> local join logic plus the cost of network transfer (maybe
> fdw_tuple_cost?).

Yes, the sum of these costs is the total cost of a remote join.
 o fdw_startup_cost
 o hash-join cost, estimated as a local join
 o fdw_tuple_cost * rows * width

> * FDW options
> Unlike table scan, which FDW options we should refer to is unclear.
> Table-level FDW options are associated with a foreign table as literal.
I think we have two options here:

> 1. Foreign-join refers to FDW options for the foreign server, but the
>    ones for foreign tables are ignored.
> 2. Foreign-join is prohibited when the relations don't have identical
>    FDW options.
> My preference is 2. Even for an N-way foreign join, it ensures that all
> the tables involved in the (N-1)-way foreign join have identical FDW
> options, thus we can make an N-way foreign join with all-identical FDW
> options.
> One exception is the "updatable" flag of postgres_fdw. It does not make
> sense for a remote join, so I think a mixture of updatable and
> non-updatable foreign tables should be admitted; however, that is a
> decision for the FDW driver.
>
> Probably, the above points need time to reach consensus.
> I'd like to see your opinion prior to editing your patch.

postgres_fdw can't push down a join which contains foreign tables on
multiple servers, so use_remote_estimate and fdw_startup_cost are the
only FDW options to consider. So we have choices for each option.

1-a. If all foreign tables in the join have an identical
     use_remote_estimate, allow pushing down.
1-b. If any foreign table in the join has use_remote_estimate = true,
     use the remote estimate.

2-a. If all foreign tables in the join have an identical
     fdw_startup_cost, allow pushing down.
2-b. Always use the max value in the join. (cost would be more expensive)
2-c. Always use the min value in the join. (cost would be cheaper)

I prefer 1-a and 2-b, so more joins avoid a remote EXPLAIN but still have
a reasonable startup cost. I agree about the "updatable" option.

>
> [implementation issues]
> The interface does not intend to add a new Path/Plan type for each scan
> that replaces foreign joins. What postgres_fdw should do is add a
> ForeignPath for a particular joinrel, then populate a ForeignScan with
> the remote join query once it is chosen by the planner.

That idea is interesting, and makes many things simpler. Please let me
consider it.
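To make the discussion concrete, here is a minimal standalone sketch of
the 1-a/2-b policy plus the cost arithmetic from earlier in the thread.
This is not postgres_fdw code; the struct and function names are
hypothetical, and the real implementation would work on planner data
structures instead.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-table FDW options; names are illustrative only. */
typedef struct JoinFdwOptions
{
	bool		use_remote_estimate;
	double		fdw_startup_cost;
	double		fdw_tuple_cost;
} JoinFdwOptions;

/* Policy 1-a: push down only when every table agrees on use_remote_estimate. */
static bool
can_push_down_join(const JoinFdwOptions *tabs, int ntabs)
{
	for (int i = 1; i < ntabs; i++)
		if (tabs[i].use_remote_estimate != tabs[0].use_remote_estimate)
			return false;
	return true;
}

/*
 * Policy 2-b: charge the max startup cost among the joined tables, then
 * total cost = fdw_startup_cost + local hash-join estimate
 *            + fdw_tuple_cost * rows.
 */
static double
foreign_join_cost(const JoinFdwOptions *tabs, int ntabs,
				  double local_hashjoin_cost, double rows)
{
	double		startup = 0.0;
	double		tuple_cost = 0.0;

	for (int i = 0; i < ntabs; i++)
	{
		if (tabs[i].fdw_startup_cost > startup)
			startup = tabs[i].fdw_startup_cost;	/* 2-b: take the max */
		if (tabs[i].fdw_tuple_cost > tuple_cost)
			tuple_cost = tabs[i].fdw_tuple_cost;
	}
	return startup + local_hashjoin_cost + tuple_cost * rows;
}
```

With 1-a, a mismatch in use_remote_estimate simply disables pushdown, so
the planner falls back to a local join path.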
>
> A few functions added in src/backend/foreign/foreign.c are not called
> from anywhere at this moment.
>
> create_plan_recurse() is reverted to static. It is needed for the
> custom-join enhancement, if no other infrastructure can support it.

I made it back static because I thought that create_plan_recurse would be
called by the core before giving control to FDWs. But I'm not sure it can
be applied to custom scans. I'll recheck that part.

--
Shigeru HANADA
Re: [HACKERS] Expanding the use of FLEXIBLE_ARRAY_MEMBER for declarations like foo[1]
On Thu, Feb 19, 2015 at 2:58 PM, Tom Lane wrote:
> Michael Paquier writes:
>> 1-2) sizeof(ParamListInfoData) is present in a couple of places,
>> assuming that sizeof(ParamListInfoData) has the equivalent of 1
>> parameter, like prepare.c, functions.c, spi.c and postgres.c:
>> -   /* sizeof(ParamListInfoData) includes the first array element */
>>     paramLI = (ParamListInfo)
>>         palloc(sizeof(ParamListInfoData) +
>> -              (num_params - 1) * sizeof(ParamExternData));
>> +              num_params * sizeof(ParamExternData));
>> 1-3) FuncCandidateList in namespace.c (thanks Andres!):
>>     newResult = (FuncCandidateList)
>> -       palloc(sizeof(struct _FuncCandidateList) - sizeof(Oid)
>> -              + effective_nargs * sizeof(Oid));
>> +       palloc(sizeof(struct _FuncCandidateList) +
>> +              effective_nargs * sizeof(Oid));
>> I imagine that we do not want those palloc calls to use #ifdef
>> FLEXIBLE_ARRAY_MEMBER to save some memory, for code readability, even
>> if the compiler does not support flexible array members, right?
>
> These are just wrong. As a general rule, we do not want to *ever* take
> sizeof() a struct that contains a flexible array: the results will not
> be consistent across platforms. The right thing is to use offsetof()
> instead. See the helpful comment autoconf provides:
>
> [...]

And I had this one in front of my eyes a couple of hours ago... Thanks.

> This point is actually the main reason we've not done this change long
> since. People did not feel like running around to make sure there were
> no overlooked uses of sizeof().

Thanks for the clarifications and the review. Attached is a new set.

--
Michael

From ca92019afc1db4da147ebb5c436164a787f7fa62 Mon Sep 17 00:00:00 2001
From: Michael Paquier
Date: Mon, 16 Feb 2015 02:44:10 +0900
Subject: [PATCH 1/4] First cut with FLEXIBLE_ARRAY_MEMBER

This is targeted at preventing the false positives that static code
analyzers may raise by assuming that some structures have an unvarying
size.
---
 contrib/cube/cubedata.h                     |  7 +++
 contrib/intarray/_int.h                     |  4 ++--
 contrib/ltree/ltree.h                       | 14 +++---
 contrib/pg_trgm/trgm.h                      |  2 +-
 src/backend/catalog/namespace.c             |  4 ++--
 src/backend/commands/prepare.c              |  6 +++---
 src/backend/executor/functions.c            |  6 +++---
 src/backend/executor/spi.c                  |  5 ++---
 src/backend/nodes/params.c                  |  5 ++---
 src/backend/tcop/postgres.c                 |  5 ++---
 src/bin/pg_dump/dumputils.c                 |  3 +--
 src/bin/pg_dump/dumputils.h                 |  2 +-
 src/include/access/gin_private.h            |  2 +-
 src/include/access/gist_private.h           |  5 +++--
 src/include/access/heapam_xlog.h            |  2 +-
 src/include/access/spgist_private.h         | 10 +-
 src/include/access/xact.h                   |  8
 src/include/c.h                             |  4 ++--
 src/include/catalog/namespace.h             |  4 ++--
 src/include/commands/dbcommands.h           |  4 ++--
 src/include/commands/tablespace.h           |  2 +-
 src/include/executor/hashjoin.h             |  2 +-
 src/include/nodes/bitmapset.h               |  4 ++--
 src/include/nodes/params.h                  |  2 +-
 src/include/nodes/tidbitmap.h               |  2 +-
 src/include/postgres.h                      |  9 +
 src/include/postmaster/syslogger.h          |  2 +-
 src/include/replication/walsender_private.h |  2 +-
 src/include/storage/bufpage.h               |  3 ++-
 src/include/storage/fsm_internals.h         |  2 +-
 src/include/storage/standby.h               |  4 ++--
 src/include/tsearch/dicts/regis.h           |  2 +-
 src/include/tsearch/dicts/spell.h           |  6 +++---
 src/include/tsearch/ts_type.h               |  6 +++---
 src/include/utils/catcache.h                |  4 ++--
 src/include/utils/datetime.h                |  5 +++--
 src/include/utils/geo_decls.h               |  5 +++--
 src/include/utils/jsonb.h                   |  2 +-
 src/include/utils/relmapper.h               |  2 +-
 src/include/utils/varbit.h                  |  3 ++-
 40 files changed, 86 insertions(+), 85 deletions(-)

diff --git a/contrib/cube/cubedata.h b/contrib/cube/cubedata.h
index 5d44e11..3535847 100644
--- a/contrib/cube/cubedata.h
+++ b/contrib/cube/cubedata.h
@@ -23,11 +23,10 @@ typedef struct NDBOX
 	unsigned int header;
 
 	/*
-	 * Variable length array. The lower left coordinates for each dimension
-	 * come first, followed by upper right coordinates unless the point flag
-	 * is set.
+	 * The lower left coordinates for each dimension come first, followed
+	 * by upper right coordinates unless the point flag is set.
 	 */
-	double		x[1];
+	double		x[FLEXIBLE_ARRAY_MEMBER]
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
On Tue, Feb 10, 2015 at 12:09 PM, Peter Geoghegan wrote:
>> Then the problem suddenly becomes that previous choices of
>> indexes/statements aren't possible anymore. It seems much better to
>> introduce the syntax now and not have too much of a usecase for it.
>
> The only way the lack of a way of specifying which opclass to use could
> bite us is if there were two *actually* defined unique indexes on the
> same column, each with different "equality" operators. That seems like
> kind of a funny scenario, even if that were quite possible (even if
> non-default opclasses existed that had non-identical "equality"
> operators, which is basically not the case today).
>
> I grant that it is a bit odd that we're talking about unique index
> definitions affecting semantics, but that is to a certain extent the
> nature of the beast. As a compromise, I suggest having the inference
> specification optionally accept a named opclass per attribute, in the
> style of CREATE INDEX (I'm already reusing a bit of the raw parser
> support for CREATE INDEX, you see) - that'll make inference insist on
> that opclass, rather than make it a strict matter of costing available
> alternatives (not that any alternative is expected with idiomatic
> usage). That implies no additional parser overhead, really. If that's
> considered ugly, then at least it's an ugly thing that literally no one
> will ever use in the foreseeable future...and an ugly thing that is no
> more necessary in CREATE INDEX than here (and yet CREATE INDEX lives
> with the ugliness).

Any thoughts on this, anyone? AFAICT, only this and the behavior of
logical decoding are open items at this point. I'd like to close out both
of those sooner rather than later.

Thanks
--
Peter Geoghegan
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Mon, Feb 16, 2015 at 8:55 PM, Andres Freund wrote:
> On 2015-02-16 11:30:20 +0000, Syed, Rahila wrote:
>> - * As a trivial form of data compression, the XLOG code is aware that
>> - * PG data pages usually contain an unused "hole" in the middle, which
>> - * contains only zero bytes. If hole_length > 0 then we have removed
>> - * such a "hole" from the stored data (and it's not counted in the
>> - * XLOG record's CRC, either). Hence, the amount of block data actually
>> - * present is BLCKSZ - hole_length bytes.
>> + * Block images are able to do several types of compression:
>> + * - When wal_compression is off, as a trivial form of compression, the
>> + * XLOG code is aware that PG data pages usually contain an unused "hole"
>> + * in the middle, which contains only zero bytes. If length < BLCKSZ
>> + * then we have removed such a "hole" from the stored data (and it is
>> + * not counted in the XLOG record's CRC, either). Hence, the amount
>> + * of block data actually present is "length" bytes. The hole "offset"
>> + * on page is defined using "hole_offset".
>> + * - When wal_compression is on, block images are compressed using a
>> + * compression algorithm without their hole to improve compression
>> + * process of the page. "length" corresponds in this case to the length
>> + * of the compressed block. "hole_offset" is the hole offset of the page,
>> + * and the length of the uncompressed block is defined by "raw_length",
>> + * whose data is included in the record only when compression is enabled
>> + * and "with_hole" is set to true, see below.
>> + *
>> + * "is_compressed" is used to identify if a given block image is compressed
>> + * or not. Maximum page size allowed on the system being 32k, the hole
>> + * offset cannot be more than 15-bit long so the last free bit is used to
>> + * store the compression state of block image.
>> If the maximum page size
>> + * allowed is increased to a value higher than that, we should consider
>> + * increasing this structure size as well, but this would increase the
>> + * length of block header in WAL records with alignment.
>> + *
>> + * "with_hole" is used to identify the presence of a hole in a block image.
>> + * As the length of a block cannot be more than 15-bit long, the extra bit in
>> + * the length field is used for this identification purpose. If the block image
>> + * has no hole, it is ensured that the raw size of a compressed block image is
>> + * equal to BLCKSZ, hence the contents of XLogRecordBlockImageCompressionInfo
>> + * are not necessary.
>> */
>> typedef struct XLogRecordBlockImageHeader
>> {
>> -	uint16		hole_offset;	/* number of bytes before "hole" */
>> -	uint16		hole_length;	/* number of bytes in "hole" */
>> +	uint16		length:15,		/* length of block data in record */
>> +				with_hole:1;	/* status of hole in the block */
>> +
>> +	uint16		hole_offset:15,	/* number of bytes before "hole" */
>> +				is_compressed:1;	/* compression status of image */
>> +
>> +	/* Followed by the data related to compression if block is compressed */
>> } XLogRecordBlockImageHeader;
>
> Yikes, this is ugly.
>
> I think we should change the xlog format so that the block_id (which
> currently is XLR_BLOCK_ID_DATA_SHORT/LONG or an actual block id) isn't
> the block id but something like XLR_CHUNK_ID. Which is used as is for
> XLR_CHUNK_ID_DATA_SHORT/LONG, but for backup blocks can be set to
> XLR_CHUNK_BKP_WITH_HOLE, XLR_CHUNK_BKP_COMPRESSED,
> XLR_CHUNK_BKP_REFERENCE... The BKP blocks will then follow, storing the
> block id following the chunk id.
> Yes, that'll increase the amount of data for a backup block by 1 byte,
> but I think that's worth it. I'm pretty sure we will be happy about the
> added extensibility pretty soon.

Yeah, that would help readability and does not cost much compared to
BLCKSZ.
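For anyone following along, the 15-bit-plus-flag packing being debated
can be demonstrated with a standalone sketch (not the patch's actual
struct; the name is hypothetical). BLCKSZ is at most 32768, so a length
or hole offset always fits in 15 bits, leaving one spare bit per uint16
for a flag:

```c
#include <assert.h>
#include <stdint.h>

/* Standalone sketch of the bit-packed header discussed above. */
typedef struct BlockImageHeaderSketch
{
	uint16_t	length:15,			/* length of (possibly compressed) data */
				with_hole:1;		/* does the image have a hole? */
	uint16_t	hole_offset:15,		/* number of bytes before the hole */
				is_compressed:1;	/* is the image compressed? */
} BlockImageHeaderSketch;
```

On common compilers the two uint16 allocation units keep the header at 4
bytes, which is the whole point of stealing the spare bits rather than
adding new fields.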
Still, could you explain what kind of extensibility you have in mind,
other than code readability? It is hard to make a nice picture with only
paper and pencils, and the current patch approach has been taken to
minimize the record length, particularly for users who do not care about
WAL compression.

--
Michael
Re: [HACKERS] Table-level log_autovacuum_min_duration
On Thu, Feb 19, 2015 at 2:13 PM, Naoya Anzai wrote:
> As a result, I think you should not delete VACOPT_VERBOSE.

In v8 it is not deleted. It is still declared, and its use is isolated in
gram.y, similarly to VACOPT_FREEZE.

> According to the last mail I posted, there is clearly a difference
> between the manual-vacuum log and the auto-vacuum log.

Did you test the latest patch v8? I have added checks in it to see if a
process is an autovacuum worker to control elevel, and the extra logs of
v7 do not show up.

> So, at least you should not touch the mechanism of VACOPT_VERBOSE until
> both vacuum log formats are unified into the same format.

If you mean that we should have the same kind of log output for
autovacuum and manual vacuum, I think that this is not going to happen.
Autovacuum entries are kept less verbose on purpose, a contract that v7
clearly broke.

> If you agree with my thinking, please undo the removal of
> VACOPT_VERBOSE.

Well, I don't agree :) And I am guessing that you did not look at v8
either. Centralizing the control of logs using log_min_duration is more
extensible than simply having VACOPT_VERBOSE.

--
Michael
Re: [HACKERS] assessing parallel-safety
On Wed, Feb 18, 2015 at 12:23:05PM -0500, Robert Haas wrote:
> On Sun, Feb 15, 2015 at 6:24 PM, Noah Misch wrote:
>> It's important for parallelism to work under the extended query
>> protocol, and that's nontrivial. exec_parse_message() sets
>> cursorOptions. exec_bind_message() runs the planner. We don't know if
>> a parallel plan is okay before seeing max_rows in
>> exec_execute_message().
>
> Yes, that's a problem. One could require users of the extended query
> protocol to indicate their willingness to accept a parallel query plan
> when sending the bind message by setting the appropriate cursor option;
> and one could, when that option is specified, refuse execute messages
> with max_rows != 0. However, that has the disadvantage of forcing all
> clients to be updated for the new world order.

Right; that would imply a protocol version bump.

> Or, one could assume a parallel plan is OK and re-plan if we choose one
> and then a non-zero max_rows shows up. In the worst case, though, that
> could require every query to be planned twice.

That sounds reasonable as a first cut. It is perfect for libpq clients,
which always use zero max_rows. The policy deserves wider scrutiny before
release, but it's the sort of thing we can tweak without wide-ranging
effects on the rest of the parallelism work.

One might use the Bind message portal name as an additional heuristic.
libpq usually (always?) uses the empty portal name. A caller specifying a
non-empty portal name is far more likely to have a non-zero max_rows on
its way.

More radical still: upon receiving the Bind message, peek ahead to see if
an Execute message is waiting. If so, let the max_rows of the Execute
inform Bind-associated planning.

>> The CREATE OR REPLACE FUNCTION calls in
>> src/backend/catalog/system_views.sql reset proparallel=u for some
>> built-in functions, such as make_interval.
>
> I can fix that if I add a PARALLEL { SAFE | RESTRICTED | UNSAFE }
> clause to CREATE FUNCTION.
Eventually that, but adding "UPDATE pg_proc" after the CORF is a fine
stopgap.

> Or are you saying that CORF shouldn't change the existing value of the
> flag? I'm not sure exactly what our policy is here, but I would expect
> that the CORF is doing the right thing and we need to add the necessary
> syntax to CREATE FUNCTION and then add that clause to those calls.

I agree that CREATE OR REPLACE FUNCTION is doing the right thing.

>> - Several functions, such as array_in and anytextcat, can call
>>   arbitrary I/O functions. This implies a policy that I/O functions
>>   must be PROPARALLEL_SAFE. I agree with that decision, but it should
>>   be documented.
>
> Where?

On further thought, it can wait for the CREATE FUNCTION syntax additions.
At that time, xtypes.sgml is the place. Perhaps just adding PARALLEL SAFE
to the example CREATE FUNCTION calls will suffice.

>> - All selectivity estimator functions are PROPARALLEL_SAFE. That
>>   implies, for example, that any operator using eqjoinsel ought to be
>>   PROPARALLEL_SAFE or

(I meant to write "eqsel", not "eqjoinsel". eqjoinsel is not a
restriction selectivity estimator function, but the rest of the argument
applies to both functions.)

That argument does not apply to all possible selectivity estimator
functions. It is common for an estimator to call the opercode on MCV
and/or histogram values, which makes the estimator function no more
parallel safe than the weakest associated opercode. eqsel, eqjoinsel and
scalarltsel all do that.

>> define its own restriction selectivity estimator function. On the
>> other hand, the planner need not check the parallel safety of
>> selectivity estimator functions. If we're already in parallel mode at
>> the moment we enter the planner, we must be planning a query called
>> within a PROPARALLEL_SAFE function. If the query uses an unsafe
>> operator, the containing function was mismarked.
>
> This seems backwards to me.
> If some hypothetical selectivity estimator were PROPARALLEL_UNSAFE,
> then any operator that uses that function would also need to be
> PROPARALLEL_UNSAFE.

It's fair to have a PROPARALLEL_UNSAFE estimator for a PROPARALLEL_SAFE
function, because the planning of a parallel query is often not itself
done in parallel mode. In that case, "SELECT * FROM tablename WHERE
colname OP 0" might use a parallel seqscan but fail completely if called
from inside a function running in parallel mode. That is to say, an
affected query can itself use parallelism, but placing the query in a
function makes the function PROPARALLEL_UNSAFE. Surprising, but not
wrong.

Rereading my previous message, I failed to make the bottom line clear: I
recommend marking eqsel etc. PROPARALLEL_UNSAFE but *not* checking an
estimator's proparallel before calling it in the planner.

> The reverse is not true. The operator can be as unsafe as it likes and
> still use a safe estimator.

Agreed. "contsel" is PROPARALLEL_SAFE no matter
Re: [HACKERS] Table-level log_autovacuum_min_duration
Hi, Michael

I thought about VACOPT_VERBOSE again. As a result, I think you should not
delete VACOPT_VERBOSE.

According to the last mail I posted, there is clearly a difference
between the manual-vacuum log and the auto-vacuum log. So, at least you
should not touch the mechanism of VACOPT_VERBOSE until both vacuum log
formats are unified into the same format.

If you agree with my thinking, please undo the removal of VACOPT_VERBOSE.

Regards,
---
Naoya
Re: [HACKERS] Expanding the use of FLEXIBLE_ARRAY_MEMBER for declarations like foo[1]
Michael Paquier writes:
> 1-2) sizeof(ParamListInfoData) is present in a couple of places,
> assuming that sizeof(ParamListInfoData) has the equivalent of 1
> parameter, like prepare.c, functions.c, spi.c and postgres.c:
> -   /* sizeof(ParamListInfoData) includes the first array element */
>     paramLI = (ParamListInfo)
>         palloc(sizeof(ParamListInfoData) +
> -              (num_params - 1) * sizeof(ParamExternData));
> +              num_params * sizeof(ParamExternData));
> 1-3) FuncCandidateList in namespace.c (thanks Andres!):
>     newResult = (FuncCandidateList)
> -       palloc(sizeof(struct _FuncCandidateList) - sizeof(Oid)
> -              + effective_nargs * sizeof(Oid));
> +       palloc(sizeof(struct _FuncCandidateList) +
> +              effective_nargs * sizeof(Oid));
> I imagine that we do not want for those palloc calls to use ifdef
> FLEXIBLE_ARRAY_MEMBER to save some memory for code readability even if
> compiler does not support flexible-array length, right?

These are just wrong. As a general rule, we do not want to *ever* take
sizeof() a struct that contains a flexible array: the results will not be
consistent across platforms. The right thing is to use offsetof()
instead. See the helpful comment autoconf provides:

/* Define to nothing if C supports flexible array members, and to 1 if it
   does not. That way, with a declaration like `struct s { int n; double
   d[FLEXIBLE_ARRAY_MEMBER]; };', the struct hack can be used with
   pre-C99 compilers. When computing the size of such an object, don't
   use 'sizeof (struct s)' as it overestimates the size. Use
   'offsetof (struct s, d)' instead. Don't use 'offsetof (struct s, d[0])',
   as this doesn't work with MSVC and with C++ compilers. */
#define FLEXIBLE_ARRAY_MEMBER /**/

This point is actually the main reason we've not done this change long
since. People did not feel like running around to make sure there were no
overlooked uses of sizeof().
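To spell out the offsetof() pattern the autoconf comment describes, here
is a minimal standalone sketch (not PostgreSQL code; `struct s` follows
the comment's example, and `alloc_s` is a hypothetical helper):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* A minimal stand-in for a varying-length structure. */
struct s
{
	int			n;
	double		d[];		/* FLEXIBLE_ARRAY_MEMBER under C99 */
};

/*
 * Allocate room for 'count' array elements.  offsetof() gives the exact
 * size of the fixed part; sizeof(struct s) may include trailing padding
 * and, with the pre-C99 struct hack, one phantom array element, so its
 * value is not consistent across platforms.
 */
static struct s *
alloc_s(int count)
{
	struct s   *p = malloc(offsetof(struct s, d) + count * sizeof(double));

	if (p != NULL)
		p->n = count;
	return p;
}
```

The same reasoning is why the ParamListInfoData and FuncCandidateList
palloc calls above need offsetof() rather than sizeof() arithmetic.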
			regards, tom lane
Re: [HACKERS] Expanding the use of FLEXIBLE_ARRAY_MEMBER for declarations like foo[1]
On Thu, Feb 19, 2015 at 11:00 AM, Tom Lane wrote:
> Michael Paquier writes:
>> On Thu, Feb 19, 2015 at 7:29 AM, Tom Lane wrote:
>>> Moreover, if we have any code that is assuming such cases are okay, it
>>> probably needs a second look. Isn't this situation effectively
>>> assuming that a variable-length array is fixed-length?
>
>> AFAIK, switching a bunch of things to use FLEXIBLE_ARRAY_MEMBER has
>> brought to light a couple of things that could be revisited:
>> 1) tuptoaster.c, with this declaration of varlena:
>> struct
>> ...
>> } chunk_data;
>
> I'm pretty sure that thing ought to be a union, not a struct.
>
>> 2) inv_api.c with this thing:
>> ...
>
> And probably this too.

Sounds good to me, but with an additional VARHDRSZ to give enough room
IMO (or toast_save_datum explodes).

>> 3) heapam.c in three places with HeapTupleHeaderData:
>> struct
>> {
>>     HeapTupleHeaderData hdr;
>>     char        data[MaxHeapTupleSize];
>> } tbuf;
>
> And this, though I'm not sure if we'd have to change the size of the
> padding data[] member.

Here I think that we should add sizeof(HeapTupleHeaderData) to ensure
that there is enough room.

>> 5) reorderbuffer.h with its use of HeapTupleHeaderData:
>
> Hmm. Andres will have to answer for that one ;-)

Surely. This impacts decode.c and reorder.c at quick glance.

So, attached is a new set of patches:
1) 0001 is more or less the same as upthread, changing trivial places
with foo[1].
I have checked as well the calls to sizeof for the structures impacted:

1-1) In dumputils.c, I guess that the call of sizeof with
SimpleStringListCell should be changed as follows:
    cell = (SimpleStringListCell *)
-       pg_malloc(sizeof(SimpleStringListCell) + strlen(val));
+       pg_malloc(sizeof(SimpleStringListCell));

1-2) sizeof(ParamListInfoData) is present in a couple of places,
assuming that sizeof(ParamListInfoData) has the equivalent of 1
parameter, like prepare.c, functions.c, spi.c and postgres.c:
-   /* sizeof(ParamListInfoData) includes the first array element */
    paramLI = (ParamListInfo)
        palloc(sizeof(ParamListInfoData) +
-              (num_params - 1) * sizeof(ParamExternData));
+              num_params * sizeof(ParamExternData));

1-3) FuncCandidateList in namespace.c (thanks Andres!):
    newResult = (FuncCandidateList)
-       palloc(sizeof(struct _FuncCandidateList) - sizeof(Oid)
-              + effective_nargs * sizeof(Oid));
+       palloc(sizeof(struct _FuncCandidateList) +
+              effective_nargs * sizeof(Oid));

I imagine that we do not want those palloc calls to use #ifdef
FLEXIBLE_ARRAY_MEMBER to save some memory, for code readability, even if
the compiler does not support flexible array members, right?

2) 0002 fixes pg_authid to use CATALOG_VARLEN; this is needed for 0003.
3) 0003 switches varlena in c.h to use FLEXIBLE_ARRAY_MEMBER, with the
necessary tweaks added for tuptoaster.c and inv_api.c.
4) 0004 is some preparatory work before switching HeapTupleHeaderData
and MinimalTupleData, changing the struct declarations to unions in
heapam.c with enough room ensured for processing.

Regards,
--
Michael

From 509e9e28e523624881b2584ce05c78fc422cff44 Mon Sep 17 00:00:00 2001
From: Michael Paquier
Date: Mon, 16 Feb 2015 02:44:10 +0900
Subject: [PATCH 1/4] First cut with FLEXIBLE_ARRAY_MEMBER

This is targeted at preventing the false positives that static code
analyzers may raise by assuming that some structures have an unvarying
size.
---
 contrib/cube/cubedata.h             |  7 +++
 contrib/intarray/_int.h             |  4 ++--
 contrib/ltree/ltree.h               | 14 +++---
 contrib/pg_trgm/trgm.h              |  2 +-
 src/backend/catalog/namespace.c     |  4 ++--
 src/backend/commands/prepare.c      |  4 ++--
 src/backend/executor/functions.c    |  3 +--
 src/backend/executor/spi.c          |  3 +--
 src/backend/nodes/params.c          |  3 +--
 src/backend/tcop/postgres.c         |  3 +--
 src/bin/pg_dump/dumputils.c         |  3 +--
 src/bin/pg_dump/dumputils.h         |  2 +-
 src/include/access/gin_private.h    |  2 +-
 src/include/access/gist_private.h   |  5 +++--
 src/include/access/heapam_xlog.h    |  2 +-
 src/include/access/spgist_private.h | 10 +-
 src/include/access/xact.h           |  8
 src/include/c.h                     |  4 ++--
 src/include/catalog/namespace.h     |  4 ++--
 src/include/commands/dbcommands.h   |  4 ++--
 src/include/commands/tablespace.h   |  2 +-
 src/include/executor/hashjoin.h     |  2 +-
 src/include/nodes/bitmapset.h       |  4 ++--
 src/include/node
[HACKERS] Dead code in gin_private.h related to page split in WAL
Hi all,

I noticed that the following structures are still defined in
gin_private.h but they are used nowhere since 2c03216d, which reworked
the WAL format:
- ginxlogSplitEntry
- ginxlogSplitDataLeaf
- ginxlogSplitDataInternal
Attached is a trivial patch to remove them.

Regards,
--
Michael

diff --git a/src/include/access/gin_private.h b/src/include/access/gin_private.h
index bda7c28..9bb0ba4 100644
--- a/src/include/access/gin_private.h
+++ b/src/include/access/gin_private.h
@@ -511,34 +511,6 @@ typedef struct ginxlogSplit
 #define GIN_INSERT_ISLEAF	0x02	/* .. */
 #define GIN_SPLIT_ROOT		0x04	/* only for split records */
 
-typedef struct
-{
-	OffsetNumber separator;
-	OffsetNumber nitem;
-
-	/* FOLLOWS: IndexTuples */
-} ginxlogSplitEntry;
-
-typedef struct
-{
-	uint16		lsize;
-	uint16		rsize;
-	ItemPointerData lrightbound;	/* new right bound of left page */
-	ItemPointerData rrightbound;	/* new right bound of right page */
-
-	/* FOLLOWS: new compressed posting lists of left and right page */
-	char		newdata[1];
-} ginxlogSplitDataLeaf;
-
-typedef struct
-{
-	OffsetNumber separator;
-	OffsetNumber nitem;
-	ItemPointerData rightbound;
-
-	/* FOLLOWS: array of PostingItems */
-} ginxlogSplitDataInternal;
-
 /*
  * Vacuum simply WAL-logs the whole page, when anything is modified. This
  * functionally identical heap_newpage records, but is kept separate for
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index cfd580c..ca7454f 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2192,9 +2192,6 @@ ginxlogInsertEntry
 ginxlogInsertListPage
 ginxlogRecompressDataLeaf
 ginxlogSplit
-ginxlogSplitDataInternal
-ginxlogSplitDataLeaf
-ginxlogSplitEntry
 ginxlogUpdateMeta
 ginxlogVacuumDataLeafPage
 ginxlogVacuumPage
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On Wed, Feb 18, 2015 at 6:50 PM, Andrew Dunstan wrote: > > On 02/18/2015 08:34 PM, David Fetter wrote: > >> On Tue, Feb 17, 2015 at 08:21:32PM -0500, Peter Eisentraut wrote: >> >>> On 1/20/15 6:32 PM, David G Johnston wrote: >>> In fact, as far as the database knows, the values provided to this function do represent an entire population and such a correction would be unnecessary. I guess it boils down to whether "future" queries are considered part of the population or whether the population changes upon each query being run and thus we are calculating the ever-changing population variance. >>> > I think we should be calculating the population variance. >>> >> > Why population variance and not sample variance? In distributions >> where the second moment about the mean exists, it's an unbiased >> estimator of the variance. In this, it's different from the >> population variance. >> > > Because we're actually measuring the whole population, and not a sample? > This. The key incorrect word in David Fetter's statement is "estimator". We are not estimating anything but rather providing descriptive statistics for a defined population. Users extrapolate that the next member added to the population would be expected to conform to this statistical description without bias (though see below). We can also then define the new population by including this new member and generating new descriptive statistics (which allows for evolution to be captured in the statistics). Currently (I think) we allow the end user to kill off the entire population and build up from scratch, so that while the ability to predict the attributes of future members is limited in the short term, once the population has reached a statistically significant size new predictions will no longer be skewed by population members whose attributes were defined in an older and possibly significantly different environment.
In theory it would be nice to be able to give the user the ability to specify - by time or percentage - a subset of the population to leave alive. Actual time-weighted sampling would be an alternative but likely one significantly more difficult to accomplish. I really haven't dug too deeply into the mechanics of the current code but I don't see any harm in sharing the thought. David J.
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
On Thu, Feb 19, 2015 at 12:25 AM, David Steele wrote: > Hi Fujii, > > Thanks for taking a look at the patch. Comments below: > > On 2/18/15 6:11 AM, Fujii Masao wrote: >> On Wed, Feb 18, 2015 at 1:26 AM, David Steele wrote: >>> On 2/17/15 10:23 AM, Simon Riggs wrote: I vote to include pgaudit in 9.5, albeit with any changes. In particular, David may have some changes to recommend, but I haven't seen a spec or a patch, just a new version of code (which isn't how we do things...). >>> >>> I submitted the new patch in my name under a separate thread "Auditing >>> extension for PostgreSQL (Take 2)" (54e005cc.1060...@pgmasters.net) >> >> I played the patch version of pg_audit a bit and have basic comments about >> its spec. >> >> The pg_audit doesn't log BIND parameter values when a prepared statement is >> used. >> Seems this is an oversight of the patch. Or is this intentional? > > It's actually intentional - following the model I talked about in my > earlier emails, the idea is to log statements only. Is this acceptable for audit purposes in many cases? Without the values, I'm afraid that it's hard to analyze what table records are affected by the statements from the audit logs. I was thinking that identifying the data affected is one of the important things for auditing. If I'm a malicious DBA, I will always use the extended protocol to prevent the values from being audited when I execute the statement. Regards, -- Fujii Masao
Re: [HACKERS] PATCH: adaptive ndistinct estimator v3 (WAS: Re: [PERFORM] Yet another abort-early plan disaster on 9.3)
On 23.12.2014 11:28, Heikki Linnakangas wrote: > On 12/07/2014 03:54 AM, Tomas Vondra wrote: >> The one interesting case is the 'step skew' with statistics_target=10, >> i.e. estimates based on mere 3000 rows. In that case, the adaptive >> estimator significantly overestimates: >> >> values current adaptive >> -- >> 106 99 107 >> 1068 6449190 >> 1006 38 6449190 >> 10006327 42441 >> >> I don't know why I didn't get these errors in the previous runs, because >> when I repeat the tests with the old patches I get similar results with >> a 'good' result from time to time. Apparently I had a lucky day back >> then :-/ >> >> I've been messing with the code for a few hours, and I haven't >> found any significant error in the implementation, so it seems that >> the estimator does not perform terribly well for very small samples >> (in this case it's 3000 rows out of 10.000.000, i.e. ~0.03%). > > The paper [1] gives an equation for an upper bound of the error of this > GEE estimator. How do the above numbers compare with that bound? Well, that's a bit more complicated because the "Theorem 1" you mention does not directly specify an upper boundary for a single estimate. It's formulated like this: Assume a table with "N" rows, D distinct values and a sample of "r" rows (all those values are fixed). Then there exists a dataset with those features, so that the "ratio error" error(D, D') = max(D'/D, D/D') is greater than f(N, r, P) with probability at least "P". I.e. if you randomly choose a sample of 'r' rows, you'll get an error exceeding the ratio with probability P. So it's not a hard limit, but speaks about the probability of estimation error with some (unknown) dataset. So it describes what you can achieve at best - if you think your estimator is better, there'll always be a dataset hiding in the shadows hissing "Theorem 1". Let's say we're looking for a boundary that's crossed only in 1% (or 5%) of measurements. Applying this to the sample data I posted before, i.e.
10M rows with three sample sizes 'r' (3000, 30.000 and 300.000 rows), the ratio error boundary per the paper is

        r = 3.000   30.000   300.000
   1%          88       28         9
   5%          70       22         7

At least that's what I get if I compute it using this python function:

    def err(N, r, p):
        return sqrt((N - r) / (2.0 * r) * log(1.0 / p))

So the estimates I posted before are not terribly good, I guess, especially the ones returning 6449190. I wonder whether there really is some stupid bug in the implementation. regards -- Tomas Vondra  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Expanding the use of FLEXIBLE_ARRAY_MEMBER for declarations like foo[1]
On 19/02/15 15:00, Tom Lane wrote: Michael Paquier writes: On Thu, Feb 19, 2015 at 7:29 AM, Tom Lane wrote: Moreover, if we have any code that is assuming such cases are okay, it probably needs a second look. Isn't this situation effectively assuming that a variable-length array is fixed-length? AFAIK, switching a bunch of things to use FLEXIBLE_ARRAY_MEMBER has brought a couple of things to light that could be revisited: 1) tuptoaster.c, with this declaration of varlena: struct { struct varlena hdr; char data[TOAST_MAX_CHUNK_SIZE]; /* make struct big enough */ int32 align_it; /* ensure struct is aligned well enough */ } chunk_data; I'm pretty sure that thing ought to be a union, not a struct. 2) inv_api.c with this thing: struct { bytea hdr; char data[LOBLKSIZE]; /* make struct big enough */ int32 align_it; /* ensure struct is aligned well enough */ } workbuf; And probably this too. 3) heapam.c in three places with HeapTupleHeaderData: struct { HeapTupleHeaderData hdr; char data[MaxHeapTupleSize]; } tbuf; And this, though I'm not sure if we'd have to change the size of the padding data[] member. 5) reorderbuffer.h with its use of HeapTupleHeaderData: Hmm. Andres will have to answer for that one ;-) regards, tom lane Curious, has this problem been raised with the gcc maintainers? Is this still a problem with gcc 5.0 (which is due to be released soon)? Cheers, Gavin
Re: [HACKERS] Allow "snapshot too old" error, to prevent bloat
On Wed, Feb 18, 2015 at 10:57 PM, Kevin Grittner wrote: > Magnus Hagander wrote: > > On Feb 17, 2015 12:26 AM, "Andres Freund" > wrote: > >> On 2015-02-16 16:35:46 -0500, Bruce Momjian wrote: > >> But max_standby_streaming_delay, max_standby_archive_delay and > >> hot_standby_feedback are among the most frequent triggers for > >> questions and complaints that I/we see. > >> > > Agreed. > > And a really bad one used to be vacuum_defer_cleanup_age, because > > of confusing units amongst other things. Which in turn seems > > fairly close to Kevin's suggestions, unfortunately. > > Particularly my initial suggestion, which was to base snapshot > "age" on the number of transaction IDs assigned. Does this look > any better to you if it is something that can be set to '20min' or > '1h'? Just to restate, that would not automatically cancel the > snapshots past that age; it would allow vacuum of any tuples which > became "dead" that long ago, and would cause a "snapshot too old" > message for any read of a page modified more than that long ago > using a snapshot which was older than that. > Yes, it would definitely look much better. My reference per above was exactly that - having a setting in the unit "number of xids" confused a lot of users and made it really hard to tune. Having something in time units is a lot easier to understand and tune for most people. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Odd behavior of updatable security barrier views on foreign tables
On 2015/02/18 21:44, Stephen Frost wrote: * Etsuro Fujita (fujita.ets...@lab.ntt.co.jp) wrote: On 2015/02/18 7:44, Stephen Frost wrote: Attached is a patch which should address this. Would love your (or anyone else's) feedback on it. It appears to address the issue which you raised and the regression test changes are all in-line with inserting a LockRows into the subquery, as anticipated. I've looked into the patch. * The patch applies to the latest head, 'make' passes successfully, but 'make check' fails in the rowsecurity test. Apologies for not being clear - the patch was against 9.4, where it passes all the regression tests (at least for me - if you see differently, please let me know!). Sorry, I assumed that the patch was against HEAD. I confirmed that the patch applied to 9.4 passes all the regression tests! Best regards, Etsuro Fujita
Re: [HACKERS] anyarray
On 19.2.2015 03:14, Tomas Vondra wrote: > > I've noticed two unrelated files Meh, should be "I noticed the patch removes two unrelated files" ... > > ../src/test/modules/dummy_seclabel/expected/dummy_seclabel.out > ../src/test/modules/dummy_seclabel/sql/dummy_seclabel.sql > > I suppose that's not intentional, right? > -- Tomas Vondra  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] anyarray
On 13.2.2015 16:20, Teodor Sigaev wrote: > Some users of the intarray contrib module wish to use its features > with other kinds of arrays, not only for the int4 type. The suggested > module generalizes intarray over other (not all) types of pgsql. > > Anyarray also provides a calculation of similarity between two > one-dimensional arrays, similar to the smlar module. The anyarray module doesn't > provide something similar to the query_int feature of intarray, > because this feature is very hard to implement (it requires a new > pseudo-type anyquery), it is close to impossible to have operation > extensibility, and its complicated queries are hidden from pgsql's > optimizer (like jsquery). As far as I know, query_int isn't very > popular for now. Hi Teodor, I started reviewing this patch today - first of all, kudos for having 944kB of regression tests in a 1MB patch ;-) I've noticed two unrelated files ../src/test/modules/dummy_seclabel/expected/dummy_seclabel.out ../src/test/modules/dummy_seclabel/sql/dummy_seclabel.sql I suppose that's not intentional, right? -- Tomas Vondra  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Expanding the use of FLEXIBLE_ARRAY_MEMBER for declarations like foo[1]
Michael Paquier writes: > On Thu, Feb 19, 2015 at 7:29 AM, Tom Lane wrote: >> Moreover, if we have any code that is assuming such cases are okay, it >> probably needs a second look. Isn't this situation effectively assuming >> that a variable-length array is fixed-length? > AFAIK, switching a bunch of things to use FLEXIBLE_ARRAY_MEMBER has > brought a couple of things to light that could be revisited: > 1) tuptoaster.c, with this declaration of varlena: > struct > { > struct varlena hdr; > char data[TOAST_MAX_CHUNK_SIZE]; /* make > struct big enough */ > int32 align_it; /* ensure struct is > aligned well enough */ > } chunk_data; I'm pretty sure that thing ought to be a union, not a struct. > 2) inv_api.c with this thing: > struct > { > bytea hdr; > char data[LOBLKSIZE]; /* make struct > big enough */ > int32 align_it; /* ensure struct is > aligned well enough */ > } workbuf; And probably this too. > 3) heapam.c in three places with HeapTupleHeaderData: > struct > { > HeapTupleHeaderData hdr; > char data[MaxHeapTupleSize]; > } tbuf; And this, though I'm not sure if we'd have to change the size of the padding data[] member. > 5) reorderbuffer.h with its use of HeapTupleHeaderData: Hmm. Andres will have to answer for that one ;-) regards, tom lane
Re: [HACKERS] Exposing the stats snapshot timestamp to SQL
Hi there, On 30.1.2015 06:27, Matt Kelly wrote: > Actually, I just did one more code review of myself, and somehow missed > that I submitted the version with the wrong oid. The oid used in the > first version is wrong (1) and was from before I read the docs on > properly picking one. > > Attached is the fixed version. (hopefully with the right mime-type and > wrong extension. Alas, gmail doesn't let you set mime-types; time to > find a new email client...) I do have a question regarding the patch, although I see the patch is already marked as 'ready for committer' (sorry, somehow managed to miss the patch until now). I see the patch only works with the top-level snapshot timestamp, stored in globalStats, but since 9.3 (when the stats were split into per-db files) we track per-database timestamps too. Shouldn't we make those timestamps accessible too? It's not necessary for detecting an unresponsive statistics collector (if it's stuck it's not writing anything, so the global timestamp will be old too), but it seems more appropriate for querying database-level stats to query the database-level timestamp too. But maybe that's not necessary, because to query database stats you have to be connected to that particular database and that should write fresh stats, so the timestamps should not be very different. regards -- Tomas Vondra  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 02/18/2015 08:34 PM, David Fetter wrote: On Tue, Feb 17, 2015 at 08:21:32PM -0500, Peter Eisentraut wrote: On 1/20/15 6:32 PM, David G Johnston wrote: In fact, as far as the database knows, the values provided to this function do represent an entire population and such a correction would be unnecessary. I guess it boils down to whether "future" queries are considered part of the population or whether the population changes upon each query being run and thus we are calculating the ever-changing population variance. I think we should be calculating the population variance. Why population variance and not sample variance? In distributions where the second moment about the mean exists, it's an unbiased estimator of the variance. In this, it's different from the population variance. Because we're actually measuring the whole population, and not a sample? cheers andrew
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On Tue, Feb 17, 2015 at 08:21:32PM -0500, Peter Eisentraut wrote: > On 1/20/15 6:32 PM, David G Johnston wrote: > > In fact, as far as the database knows, the values provided to this > > function do represent an entire population and such a correction > > would be unnecessary. I guess it boils down to whether "future" > > queries are considered part of the population or whether the > > population changes upon each query being run and thus we are > > calculating the ever-changing population variance. > > I think we should be calculating the population variance. Why population variance and not sample variance? In distributions where the second moment about the mean exists, it's an unbiased estimator of the variance. In this, it's different from the population variance. Cheers, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] Expanding the use of FLEXIBLE_ARRAY_MEMBER for declarations like foo[1]
On Thu, Feb 19, 2015 at 7:29 AM, Tom Lane wrote: > Michael Paquier writes: >> On Wed, Feb 18, 2015 at 10:09 PM, Andres Freund >> wrote: >>> The compiler will complain if you use a FLEXIBLE_ARRAY_MEMBER in the >>> middle of a struct but not when you embed a struct that uses it >>> into the middle of another struct. At least gcc doesn't and I think it'd be >>> utterly broken if another compiler did that. If there's a compiler that >>> does so, we need to make it define FLEXIBLE_ARRAY_MEMBER to 1. > >> clang does complain on my OSX laptop regarding that ;) > > I'm a bit astonished that gcc doesn't consider this an error. Sure seems > like it should. (Has anyone tried it on recent gcc?) Just tried with gcc 4.9.2 on an ArchLinux box and it does not complain after switching the declaration of varlena from [1] to FLEXIBLE_ARRAY_MEMBER in c.h on HEAD. But it does with clang 3.5.1 on the same box. > I am entirely opposed to Andres' claim that we ought to consider compilers that do warn to be broken; if anything it's the other way around. I'm on board with that. > Moreover, if we have any code that is assuming such cases are okay, it > probably needs a second look. Isn't this situation effectively assuming > that a variable-length array is fixed-length? AFAIK, switching a bunch of things to use FLEXIBLE_ARRAY_MEMBER has brought a couple of things to light that could be revisited: 1) tuptoaster.c, with this declaration of varlena: struct { struct varlena hdr; char data[TOAST_MAX_CHUNK_SIZE]; /* make struct big enough */ int32 align_it; /* ensure struct is aligned well enough */ } chunk_data; 2) inv_api.c with this thing: struct { bytea hdr; char data[LOBLKSIZE]; /* make struct big enough */ int32 align_it; /* ensure struct is aligned well enough */ } workbuf; 3) heapam.c in three places with HeapTupleHeaderData: struct { HeapTupleHeaderData hdr; char data[MaxHeapTupleSize]; } tbuf; 4) pg_authid.h with its use of relpasswd.
5) reorderbuffer.h with its use of HeapTupleHeaderData: typedef struct ReorderBufferTupleBuf { /* position in preallocated list */ slist_node node; /* tuple, stored sequentially */ HeapTupleData tuple; HeapTupleHeaderData header; char data[MaxHeapTupleSize]; } ReorderBufferTupleBuf; Those issues can be grouped depending on where foo[1] is switched to FLEXIBLE_ARRAY_MEMBER, so I will try to get a set of patches depending on that. Regards, -- Michael
Re: [HACKERS] Allow "snapshot too old" error, to prevent bloat
* Kevin Grittner (kgri...@ymail.com) wrote: > Stephen Frost wrote: > > I also agree with the general idea that it makes sense to provide a way > > to control bloat, but I think you've missed what Andres was getting at > > with his suggestion (if I understand correctly, apologies if I don't). > > > > The problem is that we're only looking at the overall xmin / xmax > > horizon when it comes to deciding if a given tuple is dead. That's > > not quite right- the tuple might actually be dead to all *current* > > transactions by being newer than the oldest transaction but dead for all > > later transactions. Basically, there exist gaps between our cluster > > wide xmin / xmax where we might find actually dead rows. Marking those > > rows dead and reusable *would* stop the bloat, not just slow it down. > > > > In the end, with a single long-running transaction, the worst bloat you > > would have is double the size of the system at the time the long-running > > transaction started. > > I agree that limiting bloat to one dead tuple for every live one > for each old snapshot is a limit that has value, and it was unfair > of me to characterize that as not being a limit. Sorry for that. > > This possible solution was discussed with the user whose feedback > caused me to write this patch, but there were several reasons they > dismissed that as a viable total solution for them, two of which I > can share: > > (1) They have a pool of connections each of which can have several > long-running cursors, so the limit from that isn't just doubling > the size of their database, it is limiting it to some two-or-three > digit multiple of the necessary size. This strikes me as a bit off-the-cuff; was an analysis done which determined that would be the result?
If there is overlap between the long-running cursors then there would be less bloat, and most systems which I'm familiar with don't turn the entire database over in 20 minutes, 20 hours, or even 20 days except in pretty specific cases. Perhaps this is one of those, and if so then I'm all wet, but the feeling I get is that this is a way to dismiss this solution because it's not what's wanted, which is "what Oracle did." > (2) They are already prepared to deal with "snapshot too old" > errors on queries that run more than about 20 minutes and which > access tables which are modified. They would rather do that than > suffer the bloat beyond that point. That, really, is the crux here- they've already got support for dealing with it the way Oracle did and they'd like PG to do that too. Unfortunately, that, by itself, isn't a good reason for a particular capability (we certainly aren't going to be trying to duplicate PL/SQL in PG any time soon). That said, there are capabilities in other RDBMS's which are valuable and which we *do* want, so the fact that Oracle does this also isn't a reason to not include it. > IMO all of these changes people are working are very valuable, and > complement each other. This particular approach is likely to be > especially appealing to those moving from Oracle because it is a > familiar mechanism, and one which some of them have written their > software to be aware of and deal with gracefully. For my 2c, I'd much rather provide them with a system where they don't have to deal with broken snapshots than give them a way to have them the way Oracle provided them. :) That said, even the approach Andres outlined will cause bloat and it may be beyond what's acceptable in some environments, and it's certainly more complicated and unlikely to get done in the short term. > > I'm not against having a knob like this, which is defaulted to off, > > Thanks! I'm not sure that amounts to a +1, but at least it doesn't > sound like a -1. 
:-) So, at the time I wrote that, I wasn't sure if it was a +1 or not myself. I've been thinking about it since then, however, and I'm leaning more towards having the capability than not, so perhaps that's a +1, but it doesn't excuse the need to come up with an implementation that everyone can be happy with and what you've come up with so far doesn't have a lot of appeal, based on the feedback (I've only glanced through it myself, but I agree with Andres and Tom that it's a larger footprint than we'd want for this and the number of places having to be touched is concerning as it could lead to future bugs). A lot of that would go away if there was a way to avoid having to mess with the index AMs, I'd think, but I wonder if we'd actually need more done there- it's not immediately obvious to me how an index-only scan is safe with this. Whenever an actual page is visited, we can check the LSN, and the relation can't be truncated by vacuum since the transaction will still have a lock on the table which prevents it, but does the visibility-map update check make sure to never mark pages all-visible when one of these old transactions is running around? On what basis? > > but I do think we'd be a lot better off
Re: [HACKERS] Allow "snapshot too old" error, to prevent bloat
On Sun, Feb 15, 2015 at 8:27 PM, Tom Lane wrote: > There might be something in that, but again it's not much like this patch. > The key point I think we're both making is that nondeterministic failures > are bad, especially when you're talking about long-running, expensive-to- > retry transactions. > But I think Kevin would agree with you there. That's why he's interested in having the errors not occur if you don't read from the volatile tables. Ie, your reporting query reading from read-only tables would be guaranteed to succeed even if some other table had had some rows vacuumed away. I'm not sure that's really going to work because of things like hint bit updates or hot pruning. That is, say your table is read-only now but last week some of the records were updated. You might reasonably expect those rows to be safe and indeed the rows themselves will still be in your report. But the earlier versions of the rows could still be sitting on some pages invisible to every query but not vacuumable quite yet and then suddenly they get vacuumed away and the LSN on the page gets bumped. Fwiw I would strongly suggest that instead of having a number of transactions to have a time difference. I haven't been following the "commit timestamp" thread but I'm hoping that patch has the infrastructure needed to look up an lsn and find out the timestamp and vice versa. So when you take a snapshot you could look up the effective cut-off LSN based on the time difference specified in the GUC. As a DBA I would find it infinitely more comforting specifying "old_snapshot_threshold=4h" than specifying some arbitrary large number of transactions and hoping I calculated the transaction rate reasonably accurately. -- greg
[HACKERS] INSERT ... ON CONFLICT UPDATE and logical decoding
Andres pointed out that the INSERT ... ON CONFLICT UPDATE patch doesn't work well with logical decoding. Basically, as things stand, there is no means of determining that an INSERT had an ON CONFLICT UPDATE clause from logical decoding. It isn't obvious that this matters - after all, the relevant WAL records are generally consistent with there being a vanilla INSERT (or vanilla UPDATE). If there was no possibility of concurrent conflicts, that might actually be the end of it, but of course there is. I guess that the best way of fixing this is exposing to output plugins that a "super deletion" is a REORDER_BUFFER_CHANGE_INTERNAL_SUPERDELETE. This is kind of an internal ReorderBufferChangeType constant, because it means that the output plugin should probably just omit the tuple just inserted and now deleted. I tend to think that it would be best if the fact that a speculative insertion was a speculative insertion is not exposed. We just formalize that a REORDER_BUFFER_CHANGE_INTERNAL_SUPERDELETE is a variant of REORDER_BUFFER_CHANGE_DELETE...it's implicit that things like triggers are not fired here (which may or may not matter, depending on the use case, I suppose). Would that be sufficiently flexible for the various logical replication use cases? I guess we are somewhat forced to push that kind of thing into output plugins, because doing so lets them decide how to handle this. It's a little unfortunate that this implementation detail is exposed to output plugins, though, which is why I'd be willing to believe that it'd be better to have transaction reassembly normalize the records such that a super deleted tuple was never exposed to output plugins in the first place...they'd only see a REORDER_BUFFER_CHANGE_INSERT when that was the definitive outcome of an UPSERT, or alternatively a REORDER_BUFFER_CHANGE_UPDATE when that was the definitive outcome. No need for output plugins to consider REORDER_BUFFER_CHANGE_INTERNAL_SUPERDELETE at all. Thoughts? 
Can't say that I've given conflict resolution for multi-master systems a great deal of thought before now, so I might be quite off the mark here. -- Peter Geoghegan
[HACKERS] array_push is just stupid ...
While hacking around with expanded arrays, I realized that array_push() is just a horribly badly designed function. There is no reason at all for the same C function to underlie both array_append and array_prepend: the function spends significant time and effort reverse-engineering which way it was called, and there's very little logic-in-common that would justify merging the two code paths. If we split it apart into separate array_append and array_prepend functions, I think we'd actually end up with less code. And it would certainly be noticeably faster because we'd not need to be doing get_fn_expr_argtype and get_element_type (the latter of which is a syscache lookup, hence not so cheap) on *both* arguments every time through. (Admittedly, some of that could be bought back with a bit more effort on locally caching the result, but it would be better to know which input is the array a priori.) Barring objections, I'm going to go fix this independently of the expanded-array changes. regards, tom lane
Re: [HACKERS] Allow "snapshot too old" error, to prevent bloat
Stephen Frost wrote: > I also agree with the general idea that it makes sense to provide a way > to control bloat, but I think you've missed what Andres was getting at > with his suggestion (if I understand correctly, apologies if I don't). > > The problem is that we're only looking at the overall xmin / xmax > horizon when it comes to deciding if a given tuple is dead. That's > not quite right- the tuple might actually be dead to all *current* > transactions by being newer than the oldest transaction but dead for all > later transactions. Basically, there exist gaps between our cluster > wide xmin / xmax where we might find actually dead rows. Marking those > rows dead and reusable *would* stop the bloat, not just slow it down. > > In the end, with a single long-running transaction, the worst bloat you > would have is double the size of the system at the time the long-running > transaction started. I agree that limiting bloat to one dead tuple for every live one for each old snapshot is a limit that has value, and it was unfair of me to characterize that as not being a limit. Sorry for that. This possible solution was discussed with the user whose feedback caused me to write this patch, but there were several reasons they dismissed that as a viable total solution for them, two of which I can share: (1) They have a pool of connections each of which can have several long-running cursors, so the limit from that isn't just doubling the size of their database, it is limiting it to some two-or-three digit multiple of the necessary size. (2) They are already prepared to deal with "snapshot too old" errors on queries that run more than about 20 minutes and which access tables which are modified. They would rather do that than suffer the bloat beyond that point. IMO all of these changes people are working on are very valuable, and complement each other.
This particular approach is likely to be especially appealing to those moving from Oracle because it is a familiar mechanism, and one which some of them have written their software to be aware of and deal with gracefully. > I'm not against having a knob like this, which is defaulted to off, Thanks! I'm not sure that amounts to a +1, but at least it doesn't sound like a -1. :-) > but I do think we'd be a lot better off with a system that could > realize when rows are not visible to any currently running transaction > and clean them up. +1 But they are not mutually exclusive; I see them as complementary. > If this knob's default is off then I don't think > we'd be likely to get the complaints which are being discussed (or, if > we did, we could point the individual at the admin who set the knob...). That's how I see it, too. I would not suggest making the default anything other than "off", but there are situations where it would be a nice tool to have in the toolbox. Thanks for the feedback! -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Replication identifiers, take 4
On 16/02/15 10:46, Andres Freund wrote: On 2015-02-16 11:34:10 +0200, Heikki Linnakangas wrote: At a quick glance, this basic design seems workable. I would suggest expanding the replication IDs to regular 4 byte oids. Two extra bytes is a small price to pay, to make it work more like everything else in the system. I don't know. Growing from 3 to 5 byte overhead per relevant record (or even 0 to 5 in case the padding is reused) is rather noticeable. If we later find it to be a limit (I seriously doubt that), we can still increase it in a major release without anybody really noticing. I agree that limiting the overhead is important. But I have a related thought about this - I wonder if it's worth trying to map this more directly to the CommitTsNodeId. I mean it is currently using that for the actual storage, but I am thinking of having the Replication Identifiers be the public API and the CommitTsNodeId stuff be just a hidden implementation detail. This thought is based on the fact that CommitTsNodeId provides a somewhat meaningless number while Replication Identifiers give us a nice name for that number. And if we do that then the type should perhaps be the same for both? -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Allow "snapshot too old" error, to prevent bloat
Kevin, * Kevin Grittner (kgri...@ymail.com) wrote: > Magnus Hagander wrote: > > On Feb 17, 2015 12:26 AM, "Andres Freund" wrote: > >> On 2015-02-16 16:35:46 -0500, Bruce Momjian wrote: > > >>> It seems we already have a mechanism in place that allows > >>> tuning of query cancel on standbys vs. preventing standby > >>> queries from seeing old data, specifically > >>> max_standby_streaming_delay/max_standby_archive_delay. We > >>> obsessed about how users were going to react to these odd > >>> variables, but there has been little negative feedback. > >> > >> FWIW, I think that's a somewhat skewed perception. I think it > >> was right to introduce those, because we didn't really have any > >> alternatives. > > As far as I can see, the "alternatives" suggested so far are all > about causing heap bloat to progress more slowly, but still without > limit. I suggest, based on a lot of user feedback (from the > customer I've talked about at some length on this thread, as well > as numerous others), that unlimited bloat based on the activity of > one connection is a serious deficiency in the product; and that > there is no real alternative to something like a "snapshot too old" > error if we want to fix that deficiency. Enhancements to associate > a snapshot with a database and using a separate vacuum xmin per > database, not limiting vacuum of a particular object by snapshots > that cannot see that snapshot, etc., are all Very Good Things and I > hope those changes are made, but none of that fixes a very > fundamental flaw. I also agree with the general idea that it makes sense to provide a way to control bloat, but I think you've missed what Andres was getting at with his suggestion (if I understand correctly, apologies if I don't). The problem is that we're only looking at the overall xmin / xmax horizon when it comes to deciding if a given tuple is dead. 
That's not quite right- the tuple might actually be dead to all *current* transactions by being newer than the oldest transaction but dead for all later transactions. Basically, there exist gaps between our cluster wide xmin / xmax where we might find actually dead rows. Marking those rows dead and reusable *would* stop the bloat, not just slow it down. In the end, with a single long-running transaction, the worst bloat you would have is double the size of the system at the time the long-running transaction started. Another way of thinking about it is with this timeline:

    -----x---------------x---------------x---------------x----->
        /\              /\              /\              /\
    Long running    row created     row deleted      vacuum
    transaction
    starts

Now, if all transactions currently running started either before the row was created or after the row was deleted, then the row is dead and vacuum can reclaim it. Finding those gaps in the transaction space for all of the currently running processes might not be cheap, but for this kind of a use-case, it might be worth the effort. On a high-churn system where the actual set of rows being copied over constantly isn't very high then you could end up with some amount of duplication of rows- those to satisfy the ancient transaction and the current working set, but there wouldn't be any further bloat and it'd almost certainly be a much better situation than what is being seen now. > Particularly my initial suggestion, which was to base snapshot > "age" on the number of transaction IDs assigned. Does this look > any better to you if it is something that can be set to '20min' or > '1h'? Just to restate, that would not automatically cancel the > snapshots past that age; it would allow vacuum of any tuples which > became "dead" that long ago, and would cause a "snapshot too old" > message for any read of a page modified more than that long ago > using a snapshot which was older than that. > > Unless this idea gets some +1 votes Real Soon Now, I will mark the > patch as Rejected and move on. 
I'm not against having a knob like this, which is defaulted to off, but I do think we'd be a lot better off with a system that could realize when rows are not visible to any currently running transaction and clean them up. If this knob's default is off then I don't think we'd be likely to get the complaints which are being discussed (or, if we did, we could point the individual at the admin who set the knob...). Thanks! Stephen
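The "gaps" idea Stephen describes amounts to a visibility test over the set of running snapshots rather than a single global horizon. The sketch below is a simplified standalone model — plain monotonically increasing integers as transaction IDs, no wraparound handling — and is not PostgreSQL's actual visibility code.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Xid;   /* simplified: monotonically increasing, no wraparound */

/* A deleted row version (created at xmin, deleted at xmax) is dead to a
 * snapshot taken either before the row was created or after it was
 * deleted.  It is reclaimable only if that holds for *every* running
 * snapshot -- i.e. no snapshot falls inside the [xmin, xmax) window. */
static bool
row_is_reclaimable(Xid xmin, Xid xmax, const Xid *snapshots, int nsnaps)
{
    for (int i = 0; i < nsnaps; i++)
    {
        if (snapshots[i] >= xmin && snapshots[i] < xmax)
            return false;       /* some snapshot can still see this version */
    }
    return true;                /* dead in a gap: vacuum could reclaim it */
}
```

With only the global horizon, a row version whose xmax is newer than the oldest running snapshot is kept even when every running snapshot sits in a gap around it — which is exactly the unbounded bloat being discussed.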
Re: [HACKERS] Expanding the use of FLEXIBLE_ARRAY_MEMBER for declarations like foo[1]
Michael Paquier writes: > On Wed, Feb 18, 2015 at 10:09 PM, Andres Freund > wrote: >> The compiler will complain if you use a FLEXIBLE_ARRAY_MEMBER in the >> middle of a struct but not when you embed a struct that uses it >> into the middle of another struct. At least gcc doesn't and I think it'd be >> utterly broken if another compiler did that. If there's a compiler that >> does so, we need to make it define FLEXIBLE_ARRAY_MEMBER to 1. > clang does complain on my OSX laptop regarding that ;) I'm a bit astonished that gcc doesn't consider this an error. Sure seems like it should. (Has anyone tried it on recent gcc?) I am entirely opposed to Andres' claim that we ought to consider compilers that do warn to be broken; if anything it's the other way around. Moreover, if we have any code that is assuming such cases are okay, it probably needs a second look. Isn't this situation effectively assuming that a variable-length array is fixed-length? regards, tom lane
Re: [HACKERS] Expanding the use of FLEXIBLE_ARRAY_MEMBER for declarations like foo[1]
On Wed, Feb 18, 2015 at 10:09 PM, Andres Freund wrote: > On 2015-02-18 17:15:18 +0900, Michael Paquier wrote: >> >> - I don't think that the t_bits fields in htup_details.h should be >> >> updated either. >> > >> > Why not? Any not broken code should already use MinHeapTupleSize and >> > similar macros. >> >> Changing t_bits impacts HeapTupleHeaderData, ReorderBufferTupleBuf and >> similarly a couple of redo routines in heapam.c using >> HeapTupleHeaderData in a couple of structures not placing it at the >> end (compiler complains). > > The compiler will complain if you use a FLEXIBLE_ARRAY_MEMBER in the > middle of a struct but not when you embed a struct that uses it > into the middle of another struct. At least gcc doesn't and I think it'd be > utterly broken if another compiler did that. If there's a compiler that > does so, we need to make it define FLEXIBLE_ARRAY_MEMBER to 1. clang does complain on my OSX laptop regarding that ;) -- Michael
Re: [HACKERS] Expanding the use of FLEXIBLE_ARRAY_MEMBER for declarations like foo[1]
On 2015-02-19 07:10:19 +0900, Michael Paquier wrote: > On Wed, Feb 18, 2015 at 10:09 PM, Andres Freund > wrote: > > On 2015-02-18 17:15:18 +0900, Michael Paquier wrote: > >> >> - I don't think that the t_bits fields in htup_details.h should be > >> >> updated either. > >> > > >> > Why not? Any not broken code should already use MinHeapTupleSize and > >> > similar macros. > >> > >> Changing t_bits impacts HeapTupleHeaderData, ReorderBufferTupleBuf and > >> similarly a couple of redo routines in heapam.c using > >> HeapTupleHeaderData in a couple of structures not placing it at the > >> end (compiler complains). > > > > The compiler will complain if you use a FLEXIBLE_ARRAY_MEMBER in the > > middle of a struct but not when you embed a struct that uses it > > into the middle of another struct. At least gcc doesn't and I think it'd be > > utterly broken if another compiler did that. If there's a compiler that > > does so, we need to make it define FLEXIBLE_ARRAY_MEMBER to 1. > > clang does complain on my OSX laptop regarding that ;) I think that then puts the idea of doing it for those structs pretty much to bed. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
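Tom's rule from earlier in this thread — never take sizeof() of a struct containing a flexible array member, use offsetof() instead — and the embedding case debated above can be shown with a minimal standalone struct. This is illustrative C, not PostgreSQL code.

```c
#include <stddef.h>
#include <stdlib.h>

/* A struct ending in a C99 flexible array member (what PostgreSQL
 * spells as items[FLEXIBLE_ARRAY_MEMBER]). */
typedef struct IntList
{
    int nitems;
    int items[];            /* flexible array member */
} IntList;

/* Allocate room for n elements.  Use offsetof(), not sizeof():
 * sizeof(IntList) may include trailing padding, so its value is not
 * consistent across platforms -- exactly Tom's point upthread. */
static IntList *
make_list(int n)
{
    IntList *l = malloc(offsetof(IntList, items) + n * sizeof(int));
    l->nitems = n;
    return l;
}

/* Embedding IntList anywhere but at the very end of another struct is
 * the case debated above -- clang rejects it, gcc surprisingly may not:
 *
 *     struct Broken { IntList inner; int tail; };    // ill-formed C11
 */
```

The autoconf comment quoted in the earlier message makes the same recommendation: size computations over such structs must be based on offsetof the flexible member.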
Re: [HACKERS] Allow "snapshot too old" error, to prevent bloat
Magnus Hagander wrote: > On Feb 17, 2015 12:26 AM, "Andres Freund" wrote: >> On 2015-02-16 16:35:46 -0500, Bruce Momjian wrote: >>> It seems we already have a mechanism in place that allows >>> tuning of query cancel on standbys vs. preventing standby >>> queries from seeing old data, specifically >>> max_standby_streaming_delay/max_standby_archive_delay. We >>> obsessed about how users were going to react to these odd >>> variables, but there has been little negative feedback. >> >> FWIW, I think that's a somewhat skewed perception. I think it >> was right to introduce those, because we didn't really have any >> alternatives. As far as I can see, the "alternatives" suggested so far are all about causing heap bloat to progress more slowly, but still without limit. I suggest, based on a lot of user feedback (from the customer I've talked about at some length on this thread, as well as numerous others), that unlimited bloat based on the activity of one connection is a serious deficiency in the product; and that there is no real alternative to something like a "snapshot too old" error if we want to fix that deficiency. Enhancements to associate a snapshot with a database and using a separate vacuum xmin per database, not limiting vacuum of a particular object by snapshots that cannot see that snapshot, etc., are all Very Good Things and I hope those changes are made, but none of that fixes a very fundamental flaw. >> But max_standby_streaming_delay, max_standby_archive_delay and >> hot_standby_feedback are among the most frequent triggers for >> questions and complaints that I/we see. >> > Agreed. > And a really bad one used to be vacuum_defer_cleanup_age, because > of confusing units amongst other things. Which in turn seems > fairly close to Kevin's suggestions, unfortunately. Particularly my initial suggestion, which was to base snapshot "age" on the number of transaction IDs assigned. 
Does this look any better to you if it is something that can be set to '20min' or '1h'? Just to restate, that would not automatically cancel the snapshots past that age; it would allow vacuum of any tuples which became "dead" that long ago, and would cause a "snapshot too old" message for any read of a page modified more than that long ago using a snapshot which was older than that. Unless this idea gets some +1 votes Real Soon Now, I will mark the patch as Rejected and move on. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
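Kevin's restated proposal combines two coupled checks: early vacuum of tuples dead longer than the threshold, and an error only when an old snapshot reads a page modified after it was taken. The sketch below models that with wall-clock times and invented names — the real feature would be keyed off transaction IDs and page modification tracking, so treat every identifier here as hypothetical.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical model of the proposal.  With a threshold of, say,
 * '20min' (1200 s), vacuum may reclaim tuples that became dead before
 * the threshold... */
static bool
vacuum_may_reclaim(time_t tuple_dead_at, time_t now, long threshold_secs)
{
    return tuple_dead_at < now - threshold_secs;
}

/* ...and a read raises "snapshot too old" only for the combination:
 * the snapshot is older than the threshold AND the page was modified
 * after the snapshot was taken.  Old snapshots are never cancelled
 * outright, matching "would not automatically cancel the snapshots
 * past that age". */
static bool
snapshot_too_old(time_t snapshot_taken, time_t page_modified_at,
                 time_t now, long threshold_secs)
{
    return snapshot_taken < now - threshold_secs
        && page_modified_at > snapshot_taken;
}
```

A long-running report over static tables would therefore never error, while one that revisits heavily updated pages fails fast instead of forcing unbounded bloat.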
Re: [HACKERS] deparsing utility commands
* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote: > > > Now, we probably don't want to hack *all* the utility commands to return > > > ObjectAddress instead of OID, because in many cases that's just not > > > going to be convenient (not to speak of the code churn); so I think for > > > most objtypes the ProcessUtilitySlow stanza would look like this: > > > That'd be fine with me, though for my 2c, I wouldn't object to changing > > them all to return ObjectAddress either. I agree that it'd cause a fair > > bit of code churn to do so, but there's a fair bit of code churn > > happening here anyway (looking at what 0008 does to ProcessUtilitySlow > > anyway). > > Well, that would make my life easier I think (even if it's a bit more > work), so unless there are objections I will do things this way. It's a > bit of a pity that Robert and Dimitri went to huge lengths to have these > functions return OID and we're now changing it all to ObjAddress > instead, but oh well. Not sure that I see it as that huge a deal.. They're still returning an Oid, it's just embedded in the ObjAddress to provide a complete statement of what the object is. btw, the hunk in 0026 which adds a 'break;' into standard_ProcessUtility caught me by surprise. Looks like that 'break;' was missing from 0003 (for T_GrantStmt). Thanks, Stephen
Re: [HACKERS] deparsing utility commands
> > Now, we probably don't want to hack *all* the utility commands to return > > ObjectAddress instead of OID, because in many cases that's just not > > going to be convenient (not to speak of the code churn); so I think for > > most objtypes the ProcessUtilitySlow stanza would look like this: > That'd be fine with me, though for my 2c, I wouldn't object to changing > them all to return ObjectAddress either. I agree that it'd cause a fair > bit of code churn to do so, but there's a fair bit of code churn > happening here anyway (looking at what 0008 does to ProcessUtilitySlow > anyway). Well, that would make my life easier I think (even if it's a bit more work), so unless there are objections I will do things this way. It's a bit of a pity that Robert and Dimitri went to huge lengths to have these functions return OID and we're now changing it all to ObjAddress instead, but oh well. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] deparsing utility commands
Alvaro, * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote: > Stephen Frost wrote: > > * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote: > > > Patch 0002 I think is good to go as well, AFAICT (have the various > > > RENAME commands return the OID and attnum of affected objects). > > > > It's not a huge complaint, but it feels a bit awkward to me that > > ExecRenameStmt is now returning one item and using an out variable for > > the other when the two really go together (Oid and Object Sub ID, that > > is). Further, the comment above ExecRenameStmt should make it clear > > that it's safe to pass NULL into objsubid if you don't care about it. > > > > The same probably goes for the COMMENT bits. > > Hmm, while I agree that it's a relatively minor point, it seems a fair > one. I think we could handle this by returning ObjectAddress rather > than Oid in ExecRenameStmt() and CommentObject(); then you have all the > bits you need in a single place. Furthermore, the function in another > patch EventTriggerStashCommand() instead of getting separately (ObjType, > objectId, objectSubId) could take a single argument of type > ObjectAddress. Agreed, that thought occured to me as well while I was looking through the other deparse patches and I agree that it makes sense. 
> Now, we probably don't want to hack *all* the utility commands to return > ObjectAddress instead of OID, because in many cases that's just not > going to be convenient (not to speak of the code churn); so I think for > most objtypes the ProcessUtilitySlow stanza would look like this: >
>     case T_AlterTSConfigurationStmt:
>         objectId = AlterTSConfiguration((AlterTSConfigurationStmt *) parsetree);
>         objectType = OBJECT_TSCONFIGURATION;
>         break;
>
> For ExecRenameStmt and CommentObject (and probably other cases such as > security labels) the stanza in ProcessUtilitySlow would be simpler: >
>     case T_CommentStmt:
>         address = CommentObject((CommentStmt *) parsetree);
>         break;
>
> and at the bottom of the loop we would transform the objid/type into > address for the cases that need it: >
>     if (!commandStashed)
>     {
>         if (objectId != InvalidOid)
>         {
>             address.classId = get_objtype_catalog_oid(objectType);
>             address.objectId = objectId;
>             address.objectSubId = 0;
>         }
>         EventTriggerStashCommand(address, secondaryOid, parsetree);
>     }
That'd be fine with me, though for my 2c, I wouldn't object to changing them all to return ObjectAddress either. I agree that it'd cause a fair bit of code churn to do so, but there's a fair bit of code churn happening here anyway (looking at what 0008 does to ProcessUtilitySlow anyway). > > > Yes, I will push these unless somebody objects soon, as they seem > > > perfectly reasonable to me. The only troubling thing is that there is > > > no regression test for this kind of thing in event triggers (i.e. verify > > > which command tags get support and which don't), which seems odd to me. > > > Not these patches' fault, though, so I'm not considering adding any ATM. > > Ugh. I dislike that when we say an event trigger will fire before > 'GRANT' what we really mean is "GRANT when it's operating against a > local object". At the minimum we absolutely need to be very clear in > the documentation about that limitation. 
Perhaps there is something > > already which reflects that, but I don't see anything in the patch > > being added about that. > > Hmm, good point, will give this some thought. I'm thinking perhaps we > can add a table of which object types are supported for generic commands > such as GRANT, COMMENT and SECURITY LABEL. That sounds like a good idea. Thanks! Stephen
Re: [HACKERS] deparsing utility commands
Stephen, Stephen Frost wrote: > * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote: > > Patch 0002 I think is good to go as well, AFAICT (have the various > > RENAME commands return the OID and attnum of affected objects). > > It's not a huge complaint, but it feels a bit awkward to me that > ExecRenameStmt is now returning one item and using an out variable for > the other when the two really go together (Oid and Object Sub ID, that > is). Further, the comment above ExecRenameStmt should make it clear > that it's safe to pass NULL into objsubid if you don't care about it. > > The same probably goes for the COMMENT bits. Hmm, while I agree that it's a relatively minor point, it seems a fair one. I think we could handle this by returning ObjectAddress rather than Oid in ExecRenameStmt() and CommentObject(); then you have all the bits you need in a single place. Furthermore, the function in another patch EventTriggerStashCommand() instead of getting separately (ObjType, objectId, objectSubId) could take a single argument of type ObjectAddress. 
Now, we probably don't want to hack *all* the utility commands to return ObjectAddress instead of OID, because in many cases that's just not going to be convenient (not to speak of the code churn); so I think for most objtypes the ProcessUtilitySlow stanza would look like this:

    case T_AlterTSConfigurationStmt:
        objectId = AlterTSConfiguration((AlterTSConfigurationStmt *) parsetree);
        objectType = OBJECT_TSCONFIGURATION;
        break;

For ExecRenameStmt and CommentObject (and probably other cases such as security labels) the stanza in ProcessUtilitySlow would be simpler:

    case T_CommentStmt:
        address = CommentObject((CommentStmt *) parsetree);
        break;

and at the bottom of the loop we would transform the objid/type into address for the cases that need it:

    if (!commandStashed)
    {
        if (objectId != InvalidOid)
        {
            address.classId = get_objtype_catalog_oid(objectType);
            address.objectId = objectId;
            address.objectSubId = 0;
        }
        EventTriggerStashCommand(address, secondaryOid, parsetree);
    }

> > On 0006 (which is about having ALTER TABLE return the OID/attnum of the > > affected object on each subcommand), I have a problem about the ALTER > > TABLE ALTER COLUMN SET DATA TYPE USING subcommand. The problem with > > that is that in order to fully deparse that subcommand we need to > > deparse the expression of the USING clause. But in the parse node we > > only have info about the untransformed expression, so it is not possible > > to pass it through ruleutils directly; it needs to be run by > > transformExpr() first. > > I agree- I'm pretty sure we definitely don't want to run through > transformExpr again in the deparse code. I'm not a huge fan of adding a > Node* output parameter, but I haven't got any other great ideas about how > to address that. Yeah, my thoughts exactly :-( > > > I think the GRANT/REVOKE, COMMENT ON and SECURITY LABEL changes are good > > > independently as well, but there previously have been raised some > > > concerns about shared objects. 
I think the answer in the patches which > > > is to raise events when affecting database local objects makes sense, > > > but others might disagree. > > > > Yes, I will push these unless somebody objects soon, as they seem > > perfectly reasonable to me. The only troubling thing is that there is > > no regression test for this kind of thing in event triggers (i.e. verify > > which command tags get support and which don't), which seems odd to me. > > Not these patches' fault, though, so I'm not considering adding any ATM. > > Ugh. I dislike that when we say an event trigger will fire before > 'GRANT' what we really mean is "GRANT when it's operating against a > local object". At the minimum we absolutely need to be very clear in > the documentation about that limitation. Perhaps there is something > already which reflects that, but I don't see anything in the patch > being added about that. Hmm, good point, will give this some thought. I'm thinking perhaps we can add a table of which object types are supported for generic commands such as GRANT, COMMENT and SECURITY LABEL. > Still looking at the rest of the patches. Great, thanks -- and thanks for the review so far. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
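The ObjectAddress shape under discussion is PostgreSQL's generic object identity (classId/objectId/objectSubId). The standalone sketch below mirrors the bottom-of-loop transform from Alvaro's pseudocode; get_objtype_catalog_oid is stubbed with an illustrative value, since that helper is hypothetical at this point in the thread.

```c
#include <stdint.h>

typedef uint32_t Oid;
#define InvalidOid ((Oid) 0)

/* PostgreSQL's generic object identity: which catalog, which row, and
 * (for columns etc.) which sub-object. */
typedef struct ObjectAddress
{
    Oid     classId;        /* OID of the catalog the object lives in */
    Oid     objectId;       /* OID of the object itself */
    int32_t objectSubId;    /* column number, or 0 */
} ObjectAddress;

typedef enum { OBJECT_TSCONFIGURATION /* , ... */ } ObjectType;

/* Stub for the hypothetical lookup in Alvaro's sketch; real code would
 * map each ObjectType to its catalog's OID.  3602 here is only an
 * illustrative placeholder. */
static Oid
get_objtype_catalog_oid(ObjectType objtype)
{
    return (objtype == OBJECT_TSCONFIGURATION) ? (Oid) 3602 : InvalidOid;
}

/* Bottom-of-loop transform from (objectType, objectId) to an address,
 * mirroring the quoted ProcessUtilitySlow stanza. */
static ObjectAddress
make_address(ObjectType objtype, Oid objectId)
{
    ObjectAddress addr = { InvalidOid, InvalidOid, 0 };
    if (objectId != InvalidOid)
    {
        addr.classId = get_objtype_catalog_oid(objtype);
        addr.objectId = objectId;
    }
    return addr;
}
```

This also shows Stephen's earlier point: the Oid is still there, just embedded alongside the catalog and sub-object that make the identity complete.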
Re: [HACKERS] deparsing utility commands
Alvaro, * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote: > Andres Freund wrote: > > On 2015-02-15 01:48:15 -0300, Alvaro Herrera wrote: > > I think you should just go ahead and commit 0001, 0002, 0006. Those have > > previously been discussed and seem like a good idea and uncontroversial > > even if the rest of the series doesn't go in. > > I have pushed 0001 (changed pg_identify_object for opclasses and > opfamilies to use USING instead of FOR) to master only, and backpatched > a fix for pg_conversion object identities, which were missing > schema-qualification. That looked fine to me. > Patch 0002 I think is good to go as well, AFAICT (have the various > RENAME commands return the OID and attnum of affected objects). It's not a huge complaint, but it feels a bit awkward to me that ExecRenameStmt is now returning one item and using an out variable for the other when the two really go together (Oid and Object Sub ID, that is). Further, the comment above ExecRenameStmt should make it clear that it's safe to pass NULL into objsubid if you don't care about it. The same probably goes for the COMMENT bits. > On 0006 (which is about having ALTER TABLE return the OID/attnum of the > affected object on each subcommand), I have a problem about the ALTER > TABLE ALTER COLUMN SET DATA TYPE USING subcommand. The problem with > that is that in order to fully deparse that subcommand we need to > deparse the expression of the USING clause. But in the parse node we > only have info about the untransformed expression, so it is not possible > to pass it through ruleutils directly; it needs to be run by > transformExpr() first. Doing that in the deparse code seems the wrong > solution, so I am toying with the idea of adding a "Node *" output > parameter for some ATExec* routines; the Prep step would insert the > transformed expression there, so that it is available at the deparse > stage. 
As far as I know, only the SET DATA TYPE USING form has this > issue: for other subcommands, the current code is simply grabbing the > expression from catalogs. Maybe it would be good to use that Node in > those cases as well and avoid catalog lookups, not sure. I agree- I'm pretty sure we definitely don't want to run through transformExpr again in the deparse code. I'm not a huge fan of adding a Node* output parameter, but I haven't got any other great ideas about how to address that. > > I think the GRANT/REVOKE, COMMENT ON and SECURITY LABEL changes are good > > independently as well, but there previously have been raised some > > concerns about shared objects. I think the answer in the patches which > > is to raise events when affecting database local objects makes sense, > > but others might disagree. > > Yes, I will push these unless somebody objects soon, as they seem > perfectly reasonable to me. The only troubling thing is that there is > no regression test for this kind of thing in event triggers (i.e. verify > which command tags get support and which don't), which seems odd to me. > Not these patches' fault, though, so I'm not considering adding any ATM. Ugh. I dislike that when we say an event trigger will fire before 'GRANT' what we really mean is "GRANT when it's operating against a local object". At the minimum we absolutely need to be very clear in the documentation about that limitation. Perhaps there is something already which reflects that, but I don't see anything in the patch being added about that. Still looking at the rest of the patches. Thanks! Stephen
Re: [HACKERS] deparsing utility commands
Andres Freund wrote: > Hi, > > On 2015-02-15 01:48:15 -0300, Alvaro Herrera wrote: > > This is a repost of the patch to add CREATE command deparsing support to > > event triggers. It now supports not only CREATE but also ALTER and > > other commands such as GRANT/REVOKE, COMMENT ON and SECURITY LABEL. > > This patch series is broken up in a rather large number of patches, but > > my intention would be to commit separately only the first 6 patches; the > > remaining parts are split up only so that it's easy to see how deparsing > > each patch is implemented, and would be committed as a single changeset > > (but before that I need a bunch of extra fixes and additions to other > > parts.) > > I think you should just go ahead and commit 0001, 0002, 0006. Those have > previously been discussed and seem like a good idea and uncontroversial > even if the rest of the series doesn't go in. I have pushed 0001 (changed pg_identify_object for opclasses and opfamilies to use USING instead of FOR) to master only, and backpatched a fix for pg_conversion object identities, which were missing schema-qualification. Patch 0002 I think is good to go as well, AFAICT (have the various RENAME commands return the OID and attnum of affected objects). On 0006 (which is about having ALTER TABLE return the OID/attnum of the affected object on each subcommand), I have a problem about the ALTER TABLE ALTER COLUMN SET DATA TYPE USING subcommand. The problem with that is that in order to fully deparse that subcommand we need to deparse the expression of the USING clause. But in the parse node we only have info about the untransformed expression, so it is not possible to pass it through ruleutils directly; it needs to be run by transformExpr() first. 
Doing that in the deparse code seems the wrong solution, so I am toying with the idea of adding a "Node *" output parameter for some ATExec* routines; the Prep step would insert the transformed expression there, so that it is available at the deparse stage. As far as I know, only the SET DATA TYPE USING form has this issue: for other subcommands, the current code is simply grabbing the expression from catalogs. Maybe it would be good to use that Node in those cases as well and avoid catalog lookups, not sure. > I think the GRANT/REVOKE, COMMENT ON and SECURITY LABEL changes are good > independently as well, but there previously have been raised some > concerns about shared objects. I think the answer in the patches which > is to raise events when affecting database local objects makes sense, > but others might disagree. Yes, I will push these unless somebody objects soon, as they seem perfectly reasonable to me. The only troubling thing is that there is no regression test for this kind of thing in event triggers (i.e. verify which command tags get support and which don't), which seems odd to me. Not these patches' fault, though, so I'm not considering adding any ATM. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] deparsing utility commands
Alvaro, * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote: > Stephen Frost wrote: > > I've started taking a look at this as the pgaudit bits depend on it and > > noticed that the patch set implies there's 42 patches, but there were > > only 37 attached..? > > Ah, I didn't realize when I posted that the subject was counting those > extra patches. They are later patches that add the testing framework, > but since the tests don't pass currently, they are not usable yet. > Mostly they are about the running deparse_init.sql file that I posted > separately. I will post a real patch for that stuff later today to make > it clear what it is that we're talking about. Oh, ok, no problem, just wanted to make sure things weren't missing. > FWIW, one of Robert's main objections is that future hackers will forget > to add deparse support for new commands as they are added. In an > attempt to get this sorted out, I have modified the stuff in > ProcessUtilitySlow() so that instead of having one > EventTriggerStashCommand call for each node type, there is only one call > at the end of the function. That way, new cases in the big switch there > will automatically get something added to the stash, which should make > the test fail appropriately. (Things like adding a new clause to > existing commands will be tested by running pg_dump on the databases and > comparing.) Right, I've been following the discussion and fully agree with Robert that we really need a way to make sure that future work doesn't forget to address deparsing (or the various other bits, object classes, etc, really). The approach you've outlined sounds interesting but I haven't yet gotten to that bit of the code. Thanks! Stephen
Re: [HACKERS] deparsing utility commands
Stephen Frost wrote: > I've started taking a look at this as the pgaudit bits depend on it and > noticed that the patch set implies there's 42 patches, but there were > only 37 attached..? Ah, I didn't realize when I posted that the subject was counting those extra patches. They are later patches that add the testing framework, but since the tests don't pass currently, they are not usable yet. Mostly they are about the running deparse_init.sql file that I posted separately. I will post a real patch for that stuff later today to make it clear what it is that we're talking about. FWIW, one of Robert's main objections is that future hackers will forget to add deparse support for new commands as they are added. In an attempt to get this sorted out, I have modified the stuff in ProcessUtilitySlow() so that instead of having one EventTriggerStashCommand call for each node type, there is only one call at the end of the function. That way, new cases in the big switch there will automatically get something added to the stash, which should make the test fail appropriately. (Things like adding a new clause to existing commands will be tested by running pg_dump on the databases and comparing.) -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] deparsing utility commands
Alvaro, * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote: > This is a repost of the patch to add CREATE command deparsing support to > event triggers. It now supports not only CREATE but also ALTER and > other commands such as GRANT/REVOKE, COMMENT ON and SECURITY LABEL. > This patch series is broken up in a rather large number of patches, but > my intention would be to commit separately only the first 6 patches; the > remaining parts are split up only so that it's easy to see how deparsing > each patch is implemented, and would be committed as a single changeset > (but before that I need a bunch of extra fixes and additions to other > parts.) I've started taking a look at this as the pgaudit bits depend on it and noticed that the patch set implies there's 42 patches, but there were only 37 attached..? Thanks! Stephen
Re: [HACKERS] Odd behavior of updatable security barrier views on foreign tables
Etsuro, * Etsuro Fujita (fujita.ets...@lab.ntt.co.jp) wrote: > On 2015/02/18 7:44, Stephen Frost wrote: > * The patch applies to the latest head, 'make' passes successfully, > but 'make check' fails in the rowsecurity test. Here's the patch against master. I'm still fiddling with the comment wording and the commit message a bit, but barring objections these patches are what I'm planning to move forward with. Thanks! Stephen From ea3713a8b648459d3024d331ef0374f6c9622247 Mon Sep 17 00:00:00 2001 From: Stephen Frost Date: Tue, 17 Feb 2015 15:43:33 -0500 Subject: [PATCH] Add locking clause for SB views for update/delete In expand_security_qual(), we were handling locking correctly when a PlanRowMark existed, but not when we were working with the target relation (which doesn't have any PlanRowMarks, but the subquery created for the security barrier quals still needs to lock the rows under it). Noted by Etsuro Fujita when working with the Postgres FDW, which wasn't properly issuing a SELECT ... FOR UPDATE to the remote side under a DELETE. Back-patch to 9.4 where updatable security barrier views were introduced. 
--- src/backend/optimizer/prep/prepsecurity.c | 24 ++- src/test/regress/expected/rowsecurity.out | 64 --- src/test/regress/expected/updatable_views.out | 264 ++ 3 files changed, 199 insertions(+), 153 deletions(-) diff --git a/src/backend/optimizer/prep/prepsecurity.c b/src/backend/optimizer/prep/prepsecurity.c index af3ee61..ce7b203 100644 --- a/src/backend/optimizer/prep/prepsecurity.c +++ b/src/backend/optimizer/prep/prepsecurity.c @@ -37,7 +37,7 @@ typedef struct } security_barrier_replace_vars_context; static void expand_security_qual(PlannerInfo *root, List *tlist, int rt_index, - RangeTblEntry *rte, Node *qual); + RangeTblEntry *rte, Node *qual, bool targetRelation); static void security_barrier_replace_vars(Node *node, security_barrier_replace_vars_context *context); @@ -63,6 +63,7 @@ expand_security_quals(PlannerInfo *root, List *tlist) Query *parse = root->parse; int rt_index; ListCell *cell; + bool targetRelation = false; /* * Process each RTE in the rtable list. @@ -98,6 +99,12 @@ expand_security_quals(PlannerInfo *root, List *tlist) { RangeTblEntry *newrte = copyObject(rte); + /* + * We need to let expand_security_qual know if this is the target + * relation, as it has additional work to do in that case. 
+ */ + targetRelation = true; + parse->rtable = lappend(parse->rtable, newrte); parse->resultRelation = list_length(parse->rtable); @@ -147,7 +154,8 @@ expand_security_quals(PlannerInfo *root, List *tlist) rte->securityQuals = list_delete_first(rte->securityQuals); ChangeVarNodes(qual, rt_index, 1, 0); - expand_security_qual(root, tlist, rt_index, rte, qual); + expand_security_qual(root, tlist, rt_index, rte, qual, + targetRelation); } } } @@ -160,7 +168,7 @@ expand_security_quals(PlannerInfo *root, List *tlist) */ static void expand_security_qual(PlannerInfo *root, List *tlist, int rt_index, - RangeTblEntry *rte, Node *qual) + RangeTblEntry *rte, Node *qual, bool targetRelation) { Query *parse = root->parse; Oid relid = rte->relid; @@ -256,6 +264,16 @@ expand_security_qual(PlannerInfo *root, List *tlist, int rt_index, } /* + * We need to handle the case where this is the target relation + * explicitly since it won't have any row marks, because we still + * need to lock the records coming back from the with-security-quals + * subquery. This might not appear obvious, but it matches what + * we're doing above and keeps FDWs happy too. 
+ */ + if (targetRelation) +applyLockingClause(subquery, 1, LCS_FORUPDATE, + LockWaitBlock, false); + /* * Replace any variables in the outer query that refer to the * original relation RTE with references to columns that we will * expose in the new subquery, building the subquery's targetlist diff --git a/src/test/regress/expected/rowsecurity.out b/src/test/regress/expected/rowsecurity.out index 21817d8..f41bef1 100644 --- a/src/test/regress/expected/rowsecurity.out +++ b/src/test/regress/expected/rowsecurity.out @@ -1034,22 +1034,25 @@ EXPLAIN (COSTS OFF) EXECUTE p2(2); -- SET SESSION AUTHORIZATION rls_regress_user1; EXPLAIN (COSTS OFF) UPDATE t1 SET b = b || b WHERE f_leak(b); - QUERY PLAN -- +QUERY PLAN +--- Update on t1 t1_3 -> Subquery Scan on t1 Filter: f_leak(t1.b) - -> Seq Scan on t1 t1_4 - Filter: ((a % 2) = 0) + -> LockRows + -> Seq Scan on t1 t1_4 + Filter: ((a % 2) = 0) -> Subquery Scan on t1_1 Filter: f_leak(t1_1.b) - -> Seq Scan on t2 -
Re: [HACKERS] Commit fest 2015-12 enters money time
On Tue, Feb 17, 2015 at 12:34 AM, Michael Paquier wrote: > On Tue, Feb 17, 2015 at 7:39 AM, Kohei KaiGai wrote: > > 2015-02-16 12:25 GMT+09:00 Michael Paquier : > >> > >> > >> On Fri, Feb 13, 2015 at 10:06 AM, Michael Paquier > >> wrote: > >>> > >>> In order to move on to the next CF, I am going to go through all the > >>> remaining patches and update their status accordingly. And sorry for > >>> slacking a bit regarding that. If you wish to complain, of course feel > >>> free to post messages on this thread or on the thread related to the > >>> patch itself. > >> > >> > >> As all the entries have been either moved or updated, CF 2014-12 is > closed, > >> and CF 2015-02 has been changed to "In Progress". > >> > > I tried to add my name on the CF entry for the "Join pushdown support > > for foreign tables" patch, however, it was unintentionally marked as > > rejected on the last CF, even though it should be "returned with > feedback". > > > > https://commitfest.postgresql.org/3/20/ > > > > Is it available to move it to the current CF? > > I don't recall you can... > You're correct, you can't. It would probably make sense for at least a CF manager to be able to do that - added to my TODO. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Commit fest 2015-12 enters money time
On Wed, Feb 18, 2015 at 12:34 AM, Michael Paquier wrote: > On Wed, Feb 18, 2015 at 5:55 AM, Corey Huinker > wrote: > > What is required to get the New Patch superpower? I'm also in need of it. > > CCing Magnus... I am not sure myself, but I would imagine that the > commit fest needs to be "Open" and not "In Process" to be able to add > patches. > Sorry, I was out on a long travel day with lots of airplane time. I had this thread on my watch list but hadn't time to fix it. So the basic reason is that there was no Open commitfest available. Only Commitfest Managers are allowed to "break the rules" and add patches to the wrong commitfest - this is why Michael was able to do that, and pretty much nobody else. There's supposed to always be one open commitfest. Since none was added when 2015-02 was switched to In Progress (probably because I didn't actually tell Michael about it), there was nowhere to post new patches. I have fixed this now by creating 2015-06, which is open for submissions. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] deparsing utility commands
Hi, On 2015-02-15 01:48:15 -0300, Alvaro Herrera wrote: > This is a repost of the patch to add CREATE command deparsing support to > event triggers. It now supports not only CREATE but also ALTER and > other commands such as GRANT/REVOKE, COMMENT ON and SECURITY LABEL. > This patch series is broken up in a rather large number of patches, but > my intention would be to commit separately only the first 6 patches; the > remaining parts are split up only so that it's easy to see how deparsing > each patch is implemented, and would be committed as a single changeset > (but before that I need a bunch of extra fixes and additions to other > parts.) I think you should just go ahead and commit 0001, 0002, 0006. Those have previously been discussed and seem like a good idea and uncontroversial even if the rest of the series doesn't go in. I think the GRANT/REVOKE, COMMENT ON and SECURITY LABEL changes are good independently as well, but there previously have been raised some concerns about shared objects. I think the answer in the patches which is to raise events when affecting database local objects makes sense, but others might disagree. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
Hi Fujii, Thanks for taking a look at the patch. Comments below: On 2/18/15 6:11 AM, Fujii Masao wrote: > On Wed, Feb 18, 2015 at 1:26 AM, David Steele wrote: >> On 2/17/15 10:23 AM, Simon Riggs wrote: >>> I vote to include pgaudit in 9.5, albeit with any changes. In >>> particular, David may have some changes to recommend, but I haven't >>> seen a spec or a patch, just a new version of code (which isn't how we >>> do things...). >> >> I submitted the new patch in my name under a separate thread "Auditing >> extension for PostgreSQL (Take 2)" (54e005cc.1060...@pgmasters.net) > > I played the patch version of pg_audit a bit and have basic comments about > its spec. > > The pg_audit doesn't log BIND parameter values when prepared statement is > used. > Seems this is an oversight of the patch. Or is this intentional? It's actually intentional - following the model I talked about in my earlier emails, the idea is to log statements only. This also follows on 2ndQuadrant's implementation. Logging values is interesting, but I'm sure the user would want to specify which columns to log, which I felt was beyond the scope of the patch. > The pg_audit cannot log the statement like "SELECT 1" which doesn't access to > the database object. Is this intentional? I think that there are many users > who > want to audit even such statement. I think I see how to make this work. I'll work on it for the next version of the patch. > > Imagine the case where you call the user-defined function which executes > many nested statements. In this case, pg_audit logs only top-level statement > (i.e., issued directly by client) every time nested statement is executed. > In fact, one call of such UDF can cause lots of *same* log messages. I think > this is problematic. I agree - not sure how to go about addressing it, though. I've tried to cut down on the verbosity of the logging in general, but of course it can still be a problem. 
Using security definer and a different logging GUC for the defining role might work. I'll add that to my unit tests and see what happens. I appreciate your feedback! -- - David Steele da...@pgmasters.net
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 02/17/2015 08:21 PM, Peter Eisentraut wrote: On 1/20/15 6:32 PM, David G Johnston wrote: In fact, as far as the database knows, the values provided to this function do represent an entire population and such a correction would be unnecessary. I guess it boils down to whether "future" queries are considered part of the population or whether the population changes upon each query being run and thus we are calculating the ever-changing population variance. I think we should be calculating the population variance. We are clearly taking the population to be all past queries (from the last reset point). Otherwise calculating min and max wouldn't make sense. The difference is likely to be very small in any case where you actually want to examine the standard deviation, so I feel we're rather arguing about how many angels can fit on the head of a pin, but if this is the consensus I'll change it. cheers andrew
Re: [HACKERS] Auditing extension for PostgreSQL (Take 2)
On 2/18/15 8:25 AM, Simon Riggs wrote: > On 15 February 2015 at 02:34, David Steele wrote: > >> I've posted a couple of messages over the last few weeks about the work >> I've been doing on the pg_audit extension. The lack of response could >> be due to either universal acclaim or complete apathy, but in any case I >> think this is a very important topic so I want to give it another try. > > You mentioned you had been following the thread for some time and yet > had not contributed to it. Did that indicate your acclaim for the > earlier patch, or was that apathy? I think neither. In my case it actually was acclaim. I was happy with the direction things were going and had nothing in particular to add - and I didn't think a +1 from me was going to carry any weight with the community. I can see now that everyone's opinion matters here, so I'll be more active about weighing in when I think something is valuable. > > People have been working on this feature for >9 months now, so you > having to wait 9 days for a response is neither universal acclaim, nor > apathy. I've waited much longer than that for Stephen to give the > review he promised, but have not bad mouthed him for that wait, nor do > I do so now. In your first post you had removed the author's email > addresses, so they were likely unaware of your post. I certainly was. I understand that, but with the CF closing I felt like I had to act. Abhijit's last comment on the thread was that he was no longer going to work on it in relation to 9.5. I felt that it was an important feature (and one that I have a lot of interest in), so that's when I got involved. I posted two messages, but I only addressed one of them directly to Abhijit. As you said, I'm new here and I'm still getting used to the way things are done. >> I've extensively reworked the code that was originally submitted by >> 2ndQuandrant. 
This is not an indictment of their work, but rather an >> attempt to redress concerns that were expressed by members of the >> community. I've used core functions to determine how audit events >> should be classified and simplified and tightened the code wherever >> possible. I've removed deparse and event triggers and opted for methods >> that rely only on existing hooks. In my last message I provided >> numerous examples of configuration, usage, and output which I hoped >> would alleviate concerns of complexity. I've also written a ton of unit >> tests to make sure that the code works as expected. > > Some people that have contributed ideas to this patch are from > 2ndQuadrant, some are not. The main point is that we work together on > things, rather than writing a slightly altered version and then > claiming credit. > > If you want to help, please do. We give credit where its due, not to > whoever touched the code last in some kind of bidding war. If we let > this happen, we'd generate a flood of confusing patch versions and > little would ever get committed. Agreed, and I apologize if I came off that way. It certainly wasn't my intention. I was hesitant because I had made so many changes and I wasn't sure how the authors would feel about it. I wrote to them privately to get their take on the situation. > Let's keep to one thread and work to include everybody's ideas then > we'll get something useful committed. I'm a little confused about how to proceed here. I created a new thread because the other patch had already been rejected. How should I handle that? -- - David Steele da...@pgmasters.net
Re: [HACKERS] Proposal : REINDEX xxx VERBOSE
> VACUUM [FULL] [FREEZE] ... [ANALYZE] [tname [(cname, ...)] > | VACUUM [({FULL [bool]|FREEZE [bool]|...}[,...])] [tname [(cname, ...)] > | VACUUM [tname [(cname, ...)] [[WITH ]({FULL [bool]|FREEZE [bool]|...})] > > REINDEX [{INDEX|TABLE|...}] name [[WITH] (VERBOSE [bool]|...)] > > EXPLAIN [[WITH] ({ANALYZE [bool]|VERBOSE [bool]|... [,...]})] > I don't think "(OptionName [bool])" style like "(VERBOSE on, FORCE on)" is needed for the REINDEX command. The EXPLAIN command has such an option style because its FORMAT option can take values other than ON/TRUE or OFF/FALSE (e.g., TEXT, XML). But a REINDEX command option can only be ON or OFF, so I think the option name alone is good enough. Next, regarding the location of the option: several maintenance commands like CLUSTER and VACUUM put options immediately after the command name. From a consistency perspective, I tend to agree with Robert that the option should go immediately after the command name, as follows. REINDEX [(VERBOSE | FORCE)] {INDEX | ...} name; Btw, how long will the FORCE option remain available? Regards, --- Sawada Masahiko
Re: [HACKERS] Expanding the use of FLEXIBLE_ARRAY_MEMBER for declarations like foo[1]
Andres Freund writes: > On 2015-02-16 21:34:57 -0500, Tom Lane wrote: >> Also, if we want to insist that these fields be accessed >> through heap_getattr, I'd be inclined to put them inside the "#ifdef >> CATALOG_VARLEN" to enforce that. > That we definitely should do. It's imo just a small bug that it was > omitted here. I'll fix it, but not backpatch unless you prefer? Agreed, that's really independent of FLEXIBLE_ARRAY_MEMBER, so fix it as its own patch. I also agree that back-patching is probably not appropriate. regards, tom lane
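For readers following the thread's title, the rule being enforced throughout this series — never take sizeof() of a struct ending in a flexible array member; compute allocation sizes with offsetof() instead — can be illustrated with a minimal standalone sketch (hypothetical struct, with malloc standing in for palloc):

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical struct following the FLEXIBLE_ARRAY_MEMBER pattern
 * discussed in this thread. */
typedef struct OidList
{
	int			nmembers;
	unsigned int members[];		/* FLEXIBLE_ARRAY_MEMBER */
} OidList;

/*
 * Allocate room for n members.  offsetof() gives the true start of the
 * array; sizeof(OidList) may include trailing padding and is not
 * consistent across platforms, which is exactly the bug class being
 * hunted in this patch series.
 */
static OidList *
oid_list_create(int n)
{
	OidList    *list = malloc(offsetof(OidList, members) +
							  n * sizeof(unsigned int));

	list->nmembers = n;
	memset(list->members, 0, n * sizeof(unsigned int));
	return list;
}
```

The same shape covers the FuncCandidateList and ParamListInfoData fixes upthread: the old "sizeof(struct) - sizeof(elem) + n * sizeof(elem)" arithmetic only worked by accident when the struct declared a one-element array.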
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
Stephen Frost wrote: > Abhijit, > > * Abhijit Menon-Sen (a...@2ndquadrant.com) wrote: > > I'm confused and unhappy about your characterisation of the state of > > this patch. You make it seem as though there was broad consensus about > > the changes that were needed, and that I left the patch sitting in the > > archives for a long time without addressing important issues. > > The specific issue which I was referring to there was the #ifdef's for > the deparse branch. I thought it was pretty clear, based on the > feedback from multiple people, that all of that really needed to be > taken out as we don't commit code to the main repo which has such > external dependencies. We can remove the #ifdef lines as soon as DDL deparse gets committed, of course. There is no external dependency here, only a dependency on a patch that's being submitted in parallel. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
Hello, >I think we should change the xlog format so that the block_id (which currently >is XLR_BLOCK_ID_DATA_SHORT/LONG or an actual block id) isn't the block id but >something like XLR_CHUNK_ID. Which is used as is for >XLR_CHUNK_ID_DATA_SHORT/LONG, but for backup blocks can be set to >XLR_CHUNK_BKP_WITH_HOLE, XLR_CHUNK_BKP_COMPRESSED, XLR_CHUNK_BKP_REFERENCE... >The BKP blocks will then follow, storing the block id following the chunk id. >Yes, that'll increase the amount of data for a backup block by 1 byte, but I >think that's worth it. I'm pretty sure we will be happy about the added >extensibility pretty soon. To clarify my understanding of the above change: instead of a block id to reference different fragments of an xlog record, a single byte field "chunk_id" should be used. chunk_id will be the same as XLR_BLOCK_ID_DATA_SHORT/LONG for main data fragments. But for block references, it will store the following values to carry information about the backup blocks. #define XLR_CHUNK_BKP_COMPRESSED 0x01 #define XLR_CHUNK_BKP_WITH_HOLE 0x02 ... The new xlog format should look as follows:
Fixed-size header (XLogRecord struct)
Chunk_id (add a field before the id field in XLogRecordBlockHeader struct)
XLogRecordBlockHeader
Chunk_id
XLogRecordBlockHeader
...
...
Chunk_id (rename the id field of the XLogRecordDataHeader struct)
XLogRecordDataHeader[Short|Long]
block data
block data
...
main data
I will post a patch based on this. Thank you, Rahila Syed -Original Message- From: Andres Freund [mailto:and...@2ndquadrant.com] Sent: Monday, February 16, 2015 5:26 PM To: Syed, Rahila Cc: Michael Paquier; Fujii Masao; PostgreSQL mailing lists Subject: Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes On 2015-02-16 11:30:20 +, Syed, Rahila wrote: > - * As a trivial form of data compression, the XLOG code is aware that > - * PG data pages usually contain an unused "hole" in the middle, > which
If hole_length > 0 then we have removed > - * such a "hole" from the stored data (and it's not counted in the > - * XLOG record's CRC, either). Hence, the amount of block data > actually > - * present is BLCKSZ - hole_length bytes. > + * Block images are able to do several types of compression: > + * - When wal_compression is off, as a trivial form of compression, > + the > + * XLOG code is aware that PG data pages usually contain an unused "hole" > + * in the middle, which contains only zero bytes. If length < BLCKSZ > + * then we have removed such a "hole" from the stored data (and it is > + * not counted in the XLOG record's CRC, either). Hence, the amount > + * of block data actually present is "length" bytes. The hole "offset" > + * on page is defined using "hole_offset". > + * - When wal_compression is on, block images are compressed using a > + * compression algorithm without their hole to improve compression > + * process of the page. "length" corresponds in this case to the > + length > + * of the compressed block. "hole_offset" is the hole offset of the > + page, > + * and the length of the uncompressed block is defined by > + "raw_length", > + * whose data is included in the record only when compression is > + enabled > + * and "with_hole" is set to true, see below. > + * > + * "is_compressed" is used to identify if a given block image is > + compressed > + * or not. Maximum page size allowed on the system being 32k, the > + hole > + * offset cannot be more than 15-bit long so the last free bit is > + used to > + * store the compression state of block image. If the maximum page > + size > + * allowed is increased to a value higher than that, we should > + consider > + * increasing this structure size as well, but this would increase > + the > + * length of block header in WAL records with alignment. > + * > + * "with_hole" is used to identify the presence of a hole in a block image. 
> + * As the length of a block cannot be more than 15-bit long, the > + extra bit in > + * the length field is used for this identification purpose. If the > + block image > + * has no hole, it is ensured that the raw size of a compressed block > + image is > + * equal to BLCKSZ, hence the contents of > + XLogRecordBlockImageCompressionInfo > + * are not necessary. > */ > typedef struct XLogRecordBlockImageHeader { > - uint16 hole_offset;/* number of bytes before "hole" */ > - uint16 hole_length;/* number of bytes in "hole" */ > + uint16 length:15, /* length of block data in > record */ > + with_hole:1;/* status of hole in the block > */ > + > + uint16 hole_offset:15, /* number of bytes before "hole" */ > + is_compressed:1;/* compression status of image */ > + > + /* Followed by the data related to compression if block is > +compressed */ > } XLogR
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
Abhijit, * Abhijit Menon-Sen (a...@2ndquadrant.com) wrote: > At 2015-02-17 13:01:46 -0500, sfr...@snowman.net wrote: > > I have to admit that I'm confused by this. Patches don't stabilise > > through sitting in the archives, they stabilise when the comments are > > being addressed, the patch updated, and further comments are > > addressing less important issues. The issues which Robert and I had > > both commented on didn't appear to be getting addressed. > > I'm confused and unhappy about your characterisation of the state of > this patch. You make it seem as though there was broad consensus about > the changes that were needed, and that I left the patch sitting in the > archives for a long time without addressing important issues. The specific issue which I was referring to there was the #ifdef's for the deparse branch. I thought it was pretty clear, based on the feedback from multiple people, that all of that really needed to be taken out as we don't commit code to the main repo which has such external dependencies. That wasn't meant as a characterization of the patch itself but rather a comment on the state of the process and that I, at least, had been hoping for a new version which addressed those bits. > You revised and refined your GRANT proposal in stages, and I tried to > adapt the code to suit. I'm sorry that my efforts were not fast enough > or responsive enough to make you feel that progress was being made. But > nobody else commented in detail on the GRANT changes except to express > general misgivings, and nobody else even disagreed when I inferred, in > light of the lack of consensus that Robert pointed out, that the code > was unlikely to make it into 9.5. Progress was being made and I certainly appreciate your efforts. I don't think anyone would want to stand up and say it's going to be in 9.5 or not be in 9.5 which is likely why you didn't get any reaction from your comment about it being unlikely. 
I'm certainly hopeful that something gets in along these lines as it's a pretty big capability that we're currently missing. > Given that I've maintained the code over the past year despite its being > rejected in an earlier CF, and given the circumstances outlined above, I > do not think it's reasonable to conclude after a couple of weeks without > a new version that it was abandoned. As I had mentioned earlier, there > are people who already use pgaudit as-is, and complain if I break it. For my part, at least, I didn't intend to characterize it as abandoned but rather that it simply didn't seem to be moving forward, perhaps due to a lack of time to work on it or other priorities; I didn't mean to imply that you wouldn't be continuing to maintain it. As for the comment about people depending on it, I'm not sure what that's getting at, but I don't think that's really a consideration for us as it relates to having a contrib module to provide this capability. We certainly want to consider what capabilities users are looking for and make sure we cover those cases, if possible, but this is at a development stage with regard to core and therefore things may break for existing users. If that's an issue for your users then it would probably be good to have a clean distinction between the stable code you're providing to them for auditing and what's being submitted for inclusion in core, with an expectation that users will need to make some changes when they switch to the version which is included in core. > Anyway, I guess there is no such thing as a constructive history > discussion, so I'll drop it. It's not my intent to continue this as I certainly agree that it seems unlikely to be a constructive use of time, but I wanted to reply and clarify that it wasn't my intent to characterize pgaudit as abandoned and to apologize for my comments coming across that way. Thanks! Stephen
Re: [HACKERS] postgres_fdw foreign keys with default sequence
Tim Kane wrote: > CREATE MATERIALISED VIEW local.devices; > > CREATE test_table (device_id bigint FOREIGN KEY (device_id) REFERENCES > local.devices (device_id) ); > > ERROR: referenced relation "devices" is not a table In the future, please show code that you have actually run. In this case it's pretty easy to know how to answer, but in others it may really draw out the process of helping you. At this time materialized views do not support constraints, and may not be referenced in foreign key definitions. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Auditing extension for PostgreSQL (Take 2)
On 15 February 2015 at 02:34, David Steele wrote: > I've posted a couple of messages over the last few weeks about the work > I've been doing on the pg_audit extension. The lack of response could > be due to either universal acclaim or complete apathy, but in any case I > think this is a very important topic so I want to give it another try. You mentioned you had been following the thread for some time and yet had not contributed to it. Did that indicate your acclaim for the earlier patch, or was that apathy? I think neither. People have been working on this feature for >9 months now, so you having to wait 9 days for a response is neither universal acclaim, nor apathy. I've waited much longer than that for Stephen to give the review he promised, but have not bad mouthed him for that wait, nor do I do so now. In your first post you had removed the author's email addresses, so they were likely unaware of your post. I certainly was. > I've extensively reworked the code that was originally submitted by > 2ndQuadrant. This is not an indictment of their work, but rather an > attempt to redress concerns that were expressed by members of the > community. I've used core functions to determine how audit events > should be classified and simplified and tightened the code wherever > possible. I've removed deparse and event triggers and opted for methods > that rely only on existing hooks. In my last message I provided > numerous examples of configuration, usage, and output which I hoped > would alleviate concerns of complexity. I've also written a ton of unit > tests to make sure that the code works as expected. Some people that have contributed ideas to this patch are from 2ndQuadrant, some are not. The main point is that we work together on things, rather than writing a slightly altered version and then claiming credit. If you want to help, please do. We give credit where it's due, not to whoever touched the code last in some kind of bidding war. 
If we let this happen, we'd generate a flood of confusing patch versions and little would ever get committed. Let's keep to one thread and work to include everybody's ideas then we'll get something useful committed. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, RemoteDBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Expanding the use of FLEXIBLE_ARRAY_MEMBER for declarations like foo[1]
On 2015-02-16 21:34:57 -0500, Tom Lane wrote: > Andres Freund writes: > > On 2015-02-17 05:51:22 +0900, Michael Paquier wrote: > >> diff --git a/src/include/catalog/pg_authid.h > >> b/src/include/catalog/pg_authid.h > >> index e01e6aa..d8789a5 100644 > >> --- a/src/include/catalog/pg_authid.h > >> +++ b/src/include/catalog/pg_authid.h > >> @@ -56,8 +56,10 @@ CATALOG(pg_authid,1260) BKI_SHARED_RELATION > >> BKI_ROWTYPE_OID(2842) BKI_SCHEMA_MAC > >> int32 rolconnlimit; /* max connections allowed (-1=no > >> limit) */ > >> > >> /* remaining fields may be null; use heap_getattr to read them! */ > >> - text rolpassword; /* password, if any */ > >> timestamptz rolvaliduntil; /* password expiration time, if any */ > >> +#ifdef CATALOG_VARLEN > >> + text rolpassword; /* password, if any */ > >> +#endif > >> } FormData_pg_authid; > > > That change IIRC is wrong, because it'll make rolvaliduntil NOT > > NULL (any column that's fixed width and has only fixed-width columns > > before it is marked as such). > > You were muttering about a BKI_FORCE_NOT_NULL option ... for symmetry, > maybe we could add BKI_FORCE_NULL as well, and then use that for cases > like this? Yea, I guess it'd not be too hard. > Also, if we want to insist that these fields be accessed > through heap_getattr, I'd be inclined to put them inside the "#ifdef > CATALOG_VARLEN" to enforce that. That we definitely should do. It's imo just a small bug that it was omitted here. I'll fix it, but not backpatch unless you prefer? > I'm generally -1 on reordering any catalog columns as part of this patch. > There should be zero user-visible change from it IMO. However, if we > stick both those columns inside the ifdef, we don't need to reorder. Agreed.
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
Neil, * Neil Tiffin (ne...@neiltiffin.com) wrote: > I meant it to go to the list, but hit the wrong button. No problem. > > On Feb 17, 2015, at 7:01 PM, Stephen Frost wrote: > > I noticed that you email'd me directly on this reply. Not sure if you > > intended to or not, but I'm fine with my response going to the list. > > > > This approach doesn't violate anything in PG and can be used with any of > > the pgaudit approaches being discussed. The fact that it's > > postgresql.conf, which represents GUCs, doesn't change anything > > regarding what you’re getting. > > > > It changes everything, pg superusers have complete control of files in the pg > data directory. If set up correctly pg superusers have no control in /etc. If set up correctly, postgresql.conf is in /etc. :) That's distribution-specific though. However, postgresql.auto.conf ends up causing problems since it can't be disabled and it's forced to live in the PG data directory. Even so, your argument was that, as long as the system is set up to be auditing initially, whatever action the superuser takes to disable auditing will itself be audited, and disabling auditing is against policy, it's sufficient to meet the requirement. That's true in either case. > > The issue is really around what we claim to provide with this auditing. > > We can't claim to provide *full* superuser auditing with any of these > > approaches since the superuser can disable auditing. We can, however, > > provide auditing up until that happens, which is likely to be sufficient > > in many environments. For those environments where full superuser > > auditing is required, an independent system must be used. > > > > Of course, it's important that any auditing mechanism which is used to > > audit superusers be impossible for the superuser to modify after the > > fact, meaning that syslog or similar needs to be used.
> > I’m still confused since you do not differentiate between db superuser and > os superuser and what you mean by an independent system? When I'm talking about 'superuser' here, it's the PG superuser, not the OS superuser. What I mean by an independent system is a process which is not owned by the same user that the database is running as, and isn't owned by the user who is connecting, that facilitates the connection from the user to PG, but which logs everything that happens on that connection. > With the scheme I described above, how does the db superuser disable auditing > without os root privileges? If they can, then pg security is fundamentally > broken, and I don’t believe it is. The DB superuser can modify the running process, through mechanisms as simple as changing GUCs, to creating functions in untrusted languages (including C) and then using them to change or break more-or-less anything that's happening in the backend. > How can an independent system monitor what commands are issued inside the > database? It can log everything which happens on the connection between the user and the database because it's in the middle. > I understand my comments do not cover what is being proposed or worked on and > that is fine. But saying it does not have value because the superuser could > disable any system in pg, is wrong IMO. Being able to reliably log db > superusers without their ability to change the logging would be a > fundamentally good security tool as companies become more serious about > internal security. This is, and will become more, important since a lot of > people consider insider breaches the biggest security challenge today. It'd be a great tool, certainly, but it's not actually possible to do completely inside the DB process given the capabilities the PG superuser has. Being able to create and run functions in untrusted languages means that the superuser can completely hijack the process; that's why those languages are untrusted.
Thanks, Stephen
Re: [HACKERS] Parallel Seq Scan
On 2015-02-18 16:59:26 +0530, Amit Kapila wrote: > On Tue, Feb 17, 2015 at 9:52 PM, Andres Freund > wrote: > > A query whose runtime is dominated by a sequential scan (+ attached > > filter) is certainly going to require a bigger prefetch size than one > > that does other expensive stuff. > > > > Imagine parallelizing > > SELECT * FROM largetable WHERE col = low_cardinality_value; > > and > > SELECT * > > FROM largetable JOIN gigantic_table ON (index_nestloop_condition) > > WHERE col = high_cardinality_value; > > > > The first query will be a simple sequential scan, and disk reads on largetable > > will be the major cost of executing it. In contrast the second query > > might very well sensibly be planned as a parallel sequential scan with > > the nested loop executing in the same worker. But the cost of the > > sequential scan itself will likely be completely drowned out by the > > nestloop execution - index probes are expensive/unpredictable. > I think the work/task given to each worker should be as granular > as possible to make it more predictable. > I think the better way to parallelize such a work (Join query) is that > first worker does sequential scan and filtering on large table and > then pass it to next worker for doing join with gigantic_table. I'm pretty sure that'll result in rather horrible performance. IPC is rather expensive; you want to do as little of it as possible. > > > > > > I think it makes sense to think of a set of tasks in which workers can > > > assist. So you have a query tree which is just one query tree, with no > > > copies of the nodes, and then there are certain places in that query > > > tree where a worker can jump in and assist that node. To do that, it > > > will have a copy of the node, but that doesn't mean that all of the > > > stuff inside the node becomes shared data at the code level, because > > > that would be stupid.
> > > > My only "problem" with that description is that I think workers will > > have to work on more than one node - it'll be entire subtrees of the > > executor tree. > There could be some cases where it could be beneficial for worker > to process a sub-tree, but I think there will be more cases where > it will just work on a part of node and send the result back to either > master backend or another worker for further processing. I think many parallelism projects start out that way, and then notice that it doesn't parallelize very efficiently. The most extreme, but common, example is aggregation over large amounts of data - unless you want to ship huge amounts of data between processes to parallelize it, you have to do the sequential scan and the pre-aggregate step (e.g. one that selects count() and sum() to implement an avg over all the workers) inside one worker. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
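The "pre-aggregate inside one worker" idea above can be sketched in C. This is a minimal illustration only, with invented names (AvgState, avg_accum, avg_combine) rather than PostgreSQL's actual executor code: for avg(), each worker ships a small (count, sum) pair to the leader instead of the rows it scanned, so almost no data crosses the process boundary.

```c
#include <stddef.h>

/* Hypothetical partial-aggregate state for avg(): only these two
 * numbers need to travel from a worker to the leader. */
typedef struct AvgState
{
    long    count;
    double  sum;
} AvgState;

/* Per-tuple accumulation, run inside each worker during its scan. */
static void
avg_accum(AvgState *st, double value)
{
    st->count++;
    st->sum += value;
}

/* Run once per worker in the leader: merge a worker's partial state. */
static void
avg_combine(AvgState *leader, const AvgState *worker)
{
    leader->count += worker->count;
    leader->sum += worker->sum;
}

/* Final step in the leader: compute the average from the merged state. */
static double
avg_final(const AvgState *st)
{
    return (st->count > 0) ? st->sum / st->count : 0.0;
}
```

The design point is that accumulation and combining are separate operations: workers run avg_accum over their share of the table, and only the leader runs avg_combine and avg_final.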
Re: [HACKERS] Expanding the use of FLEXIBLE_ARRAY_MEMBER for declarations like foo[1]
On 2015-02-18 17:15:18 +0900, Michael Paquier wrote: > >> - I don't think that the t_bits fields in htup_details.h should be > >> updated either. > > > > Why not? Any not-broken code should already use MinHeapTupleSize and > > similar macros. > > Changing t_bits impacts HeapTupleHeaderData, ReorderBufferTupleBuf and > similarly a couple of redo routines in heapam.c using > HeapTupleHeaderData in a couple of structures not placing it at the > end (compiler complains). The compiler will complain if you use a FLEXIBLE_ARRAY_MEMBER in the middle of a struct, but not when you embed a struct that uses it into the middle of another struct. At least gcc doesn't, and I think it'd be utterly broken if another compiler did that. If there's a compiler that does so, we need to make it define FLEXIBLE_ARRAY_MEMBER to 1. Code embedding structs using *either* [FLEXIBLE_ARRAY_MEMBER] or [1] for variable length obviously has to take care where the variable-length part goes. And that's what the structs you removed were doing - that's where the t_bits et al go: { ... HeapTupleHeaderData header; char data[MaxHeapTupleSize]; ... } the 'data' bit is just the t_bits + data together. And similar in - struct - { - struct varlena hdr; - char data[TOAST_MAX_CHUNK_SIZE]; /* make struct big enough */ - int32 align_it; /* ensure struct is aligned well enough */ - } chunk_data; The 'data' is where the varlena's vl_dat stuff is stored. > >> Places using a variable-length variable not at the end of a structure > >> are changed with workarounds, without impacting what those features do. > > > > I vote for rejecting most of this, except a (corrected) version of the > > pg_authid change. Which doesn't really belong to the rest of the patch > > anyway ;) > > Changing bytea to use FLEXIBLE_ARRAY_MEMBER requires those changes, or > at least some changes in this area, as something with > FLEXIBLE_ARRAY_MEMBER can only be placed at the end of a structure.
Again, I think you're confusing things that may not be done in a single struct, and structs that are embedded in other places. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
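The sizeof()-vs-offsetof() rule discussed in this thread can be shown with a toy struct. This is an illustration only (VarRec and varrec_alloc are invented names, not PostgreSQL structures): a C99 flexible array member must be the last field, and the allocation size should be computed from offsetof() of that field, because sizeof() of such a struct may include trailing padding (or, in the old items[1] style, one bogus array element) and is not consistent across platforms.

```c
#include <stddef.h>
#include <stdlib.h>

/* Toy header-plus-payload struct, mirroring the pattern under
 * discussion: fixed fields first, flexible array member last. */
typedef struct VarRec
{
    int nitems;
    int items[];            /* flexible array member, must come last */
} VarRec;

/* Allocate room for the header plus nitems array elements.  Using
 * offsetof(VarRec, items) counts exactly the fixed part, whereas
 * sizeof(VarRec) could silently over- or under-allocate depending on
 * the platform's padding and on whether items was declared [] or [1]. */
static VarRec *
varrec_alloc(int nitems)
{
    VarRec *r = malloc(offsetof(VarRec, items) + nitems * sizeof(int));

    if (r != NULL)
        r->nitems = nitems;
    return r;
}
```

To embed such a struct inside another one, the same trick as in the quoted HeapTupleHeaderData example applies: put it last, backed by a char buffer sized for the largest possible payload.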
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
Stephen, I meant it to go to the list, but hit the wrong button. > On Feb 17, 2015, at 7:01 PM, Stephen Frost wrote: > > Neil, > > I noticed that you email'd me directly on this reply. Not sure if you > intended to or not, but I'm fine with my response going to the list. > > * Neil Tiffin (ne...@neiltiffin.com) wrote: >>> On Feb 17, 2015, at 1:10 PM, Stephen Frost wrote: >>> If the DB account isn't a superuser then everything changes. There's no >>> point fighting with the OS user- they can run some other PG binary which >>> they've copied over locally and run SQL with that, or copy all the files >>> over to another server which they have complete access to. The fact >>> that they can also connect to the DB and run SQL isn’t really an issue. >> >> That's the point. If this environment matters then the db superuser would not >> be an authorized os superuser (with root or even close privileges). And no, >> they could not be running some other binary. >> >> One way to do pg superuser auditing is to utilize a file (outside of the pg >> data directory, which probably violates something in pg) like >> /etc/pg_su_audit that has os root rw and r for all others (or the equivalent >> for other os’s) containing a URL. If this file is present, send all db >> superuser usage to the URL. If this file is present with the wrong >> privileges, then don’t start pg. Done. No configuration in pg config files, >> no GUCs, no nothing for the pg superuser to mess around with, no tables, no >> ability for any pg superuser to configure or control. > > This approach doesn't violate anything in PG and can be used with any of > the pgaudit approaches being discussed. The fact that it's > postgresql.conf, which represents GUCs, doesn't change anything > regarding what you’re getting. > It changes everything, pg superusers have complete control of files in the pg data directory. If set up correctly pg superusers have no control in /etc.
>> If they can copy or install a PG binary, then the OS auditing and security >> failed. PG code need not be concerned. >> >> Sure someone could still break access to the URL, but again, not PG’s >> concern. Other security and auditing would have responsibility to pick that >> up. >> >> It is a really simple use case: record everything any db superuser does and >> send it to the audit system. Done. Then a designated role can control what >> gets audited in production. As long as everything the db superuser does can >> be written to an audit log, then it no longer matters technically if the db >> superuser can change the rest of the auditing. If they do and it violates >> policy, then when the audit picks it up, they lose their job plus, depending >> on the environment. > The issue is really around what we claim to provide with this auditing. > We can't claim to provide *full* superuser auditing with any of these > approaches since the superuser can disable auditing. We can, however, > provide auditing up until that happens, which is likely to be sufficient > in many environments. For those environments where full superuser > auditing is required, an independent system must be used. > > Of course, it's important that any auditing mechanism which is used to > audit superusers be impossible for the superuser to modify after the > fact, meaning that syslog or similar needs to be used. > I’m still confused since you do not differentiate between db superuser and os superuser and what you mean by an independent system? With the scheme I described above, how does the db superuser disable auditing without os root privileges? If they can, then pg security is fundamentally broken, and I don’t believe it is. How can an independent system monitor what commands are issued inside the database? I understand my comments do not cover what is being proposed or worked on and that is fine.
But saying it does not have value because the superuser could disable any system in pg is wrong IMO. Being able to reliably log db superusers without their ability to change the logging would be a fundamentally good security tool as companies become more serious about internal security. This is, and will become more, important since a lot of people consider insider breaches the biggest security challenge today. Neil
Re: [HACKERS] Odd behavior of updatable security barrier views on foreign tables
Etsuro, * Etsuro Fujita (fujita.ets...@lab.ntt.co.jp) wrote: > On 2015/02/18 7:44, Stephen Frost wrote: > >Attached is a patch which should address this. Would love your (or > >anyone else's) feedback on it. It appears to address the issue which > >you raised and the regression test changes are all in-line with > >inserting a LockRows into the subquery, as anticipated. > > I've looked into the patch. > > * The patch applies to the latest head, 'make' passes successfully, > but 'make check' fails in the rowsecurity test. Apologies for not being clear - the patch was against 9.4, where it passes all the regression tests (at least for me; if you see differently, please let me know!). > * I found one place in expand_security_qual that I'm concerned about: > > + if (targetRelation) > + applyLockingClause(subquery, 1, LCS_FORUPDATE, > + false, false); > > ISTM that it'd be better to use LockWaitBlock as the fourth argument > of applyLockingClause. LockWaitBlock isn't in 9.4. :) Otherwise, I'd agree, and it's what I plan to do for master. > Other than that, the patch looks good to me. Great, thanks! Stephen
Re: [HACKERS] Parallel Seq Scan
On Tue, Feb 17, 2015 at 9:52 PM, Andres Freund wrote: > > On 2015-02-11 15:49:17 -0500, Robert Haas wrote: > > A query whose runtime is dominated by a sequential scan (+ attached > filter) is certainly going to require a bigger prefetch size than one > that does other expensive stuff. > > Imagine parallelizing > SELECT * FROM largetable WHERE col = low_cardinality_value; > and > SELECT * > FROM largetable JOIN gigantic_table ON (index_nestloop_condition) > WHERE col = high_cardinality_value; > > The first query will be a simple sequential scan, and disk reads on largetable > will be the major cost of executing it. In contrast the second query > might very well sensibly be planned as a parallel sequential scan with > the nested loop executing in the same worker. But the cost of the > sequential scan itself will likely be completely drowned out by the > nestloop execution - index probes are expensive/unpredictable. > I think the work/task given to each worker should be as granular as possible to make it more predictable. I think the better way to parallelize such work (a join query) is that the first worker does the sequential scan and filtering on the large table and then passes it to the next worker for doing the join with gigantic_table. > > > > I think it makes sense to think of a set of tasks in which workers can > > assist. So you have a query tree which is just one query tree, with no > > copies of the nodes, and then there are certain places in that query > > tree where a worker can jump in and assist that node. To do that, it > > will have a copy of the node, but that doesn't mean that all of the > > stuff inside the node becomes shared data at the code level, because > > that would be stupid. > > My only "problem" with that description is that I think workers will > have to work on more than one node - it'll be entire subtrees of the > executor tree.
> There could be some cases where it could be beneficial for a worker to process a sub-tree, but I think there will be more cases where it will just work on a part of a node and send the result back to either the master backend or another worker for further processing. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
On Wed, Feb 18, 2015 at 1:26 AM, David Steele wrote: > On 2/17/15 10:23 AM, Simon Riggs wrote: >> I vote to include pgaudit in 9.5, albeit with any changes. In >> particular, David may have some changes to recommend, but I haven't >> seen a spec or a patch, just a new version of code (which isn't how we >> do things...). > > I submitted the new patch in my name under a separate thread "Auditing > extension for PostgreSQL (Take 2)" (54e005cc.1060...@pgmasters.net) I played with the patch version of pg_audit a bit and have basic comments about its spec. pg_audit doesn't log BIND parameter values when a prepared statement is used. Is this an oversight of the patch, or is it intentional? pg_audit cannot log a statement like "SELECT 1" which doesn't access any database object. Is this intentional? I think that there are many users who want to audit even such statements. Imagine the case where you call a user-defined function which executes many nested statements. In this case, pg_audit logs only the top-level statement (i.e., the one issued directly by the client) every time a nested statement is executed. In fact, one call of such a UDF can cause lots of the *same* log messages. I think this is problematic. Regards, -- Fujii Masao
Re: [HACKERS] GSoC 2015 - mentors, students and admins.
On 9 February 2015 at 20:52, Thom Brown wrote: > > Hi all, > > Google Summer of Code 2015 is approaching. I'm intending on registering > PostgreSQL again this year. > > Before I do that, I'd like to have an idea of how many people are > interested in being either a student or a mentor. > > I've volunteered to be admin again, but if anyone else has a strong > interest in seeing the projects through this year, please make yourself > known. > > Who would be willing to mentor projects this year? > > Project ideas are welcome, and I've copied many from last year's > submissions into this year's wiki page. Please feel free to add more (or > delete any that stand no chance or are redundant): > https://wiki.postgresql.org/wiki/GSoC_2015 > > Students can find more information about GSoC here: > http://www.postgresql.org/developer/summerofcode > I'd also like to ask whether anyone would be willing to be a backup admin for this year's GSoC? Please message me directly if you're interested. I need to know A.S.A.P. though. Thanks Thom
Re: [HACKERS] ExplainModifyTarget doesn't work as expected
On 2015/02/18 8:13, Tom Lane wrote: > So I went back to your v1 patch and have now committed that with some > cosmetic modifications. Sorry for making you put time into a dead end. I don't mind it. Thanks for committing the patch! Best regards, Etsuro Fujita
Re: [HACKERS] Odd behavior of updatable security barrier views on foreign tables
On 2015/02/18 7:44, Stephen Frost wrote: * Etsuro Fujita (fujita.ets...@lab.ntt.co.jp) wrote: On 2015/02/11 4:06, Stephen Frost wrote: I had been trying to work out an FDW-specific way to address this, but I think Dean's right that this should be addressed in expand_security_qual(), which means it'll apply to all cases and not just these FDW calls. I don't think that's actually an issue though and it'll match up to how SELECT FOR UPDATE is handled today. Sorry, my explanation was not accurate, but I also agree with Dean's idea. In the above, I just wanted to make it clear that such a lock request done by expand_security_qual() should be limited to the case where the relation that is a former result relation is a foreign table. Attached is a patch which should address this. Would love your (or anyone else's) feedback on it. It appears to address the issue which you raised and the regression test changes are all in-line with inserting a LockRows into the subquery, as anticipated. I've looked into the patch. * The patch applies to the latest head, 'make' passes successfully, but 'make check' fails in the rowsecurity test. * I found one place in expand_security_qual that I'm concerned about: + if (targetRelation) + applyLockingClause(subquery, 1, LCS_FORUPDATE, + false, false); ISTM that it'd be better to use LockWaitBlock as the fourth argument of applyLockingClause. Other than that, the patch looks good to me. Thanks for the work! Best regards, Etsuro Fujita
Re: [HACKERS] How about to have relnamespace and relrole?
Sorry, I sent the previous mail without patches by accident. The patches are attached to this mail. Hello, this is the patchset v2 of this feature. 0001-Add-regrole.patch Adding regrole, as the name says. 0002-Add-regnamespace.patch Adding regnamespace. This depends on the 0001 patch. 0003-Check-new-reg-types-are-not-used-as-default-values.patch Inhibiting the new OID alias types from being used as constants in default values, which would make dependencies on the other (existing) OID alias types. 0004-Documentation-for-new-OID-alias-types.patch Documentation patch for the new types. regards, At Tue, 17 Feb 2015 20:19:11 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI wrote in <20150217.201911.239516619.horiguchi.kyot...@lab.ntt.co.jp> > Hello, thank you for the comment. > > The current patch lacks changes in documentation and dependency > stuff. The current framework doesn't consider changing pg_shdepend > from column default expressions, so the possible measures are the > following, I think. > > 1. Make pg_shdepend have refobjsubid and add some deptypes of >pg_depend; specifically DEPENDENCY_NORMAL is needed. Then, >change the dependency stuff. > > 2. Simply inhibit specifying them in default > expressions. Integer or OID can act as the replacement. > >=# create table t1 (a int, b regrole default 'horiguti'::regrole); >ERROR: Type 'regrole' cannot have a default value > > 1 is quite overkill and hardly seems usable. So I will go > with 2 and post an additional patch. > > > At Thu, 12 Feb 2015 17:12:24 -0600, Jim Nasby > wrote in <54dd3358.9030...@bluetreble.com> > > On 2/12/15 5:28 AM, Kyotaro HORIGUCHI wrote: > > > Hello, I changed the subject. > > # Ouch! the subject remains wrong.. > > > > 1. Expected frequency of use > > ... > > > For that reason, although the current policy of deciding whether > > > to have reg* types seems to be whether they have schema-qualified > > > names, I think regrole and regnamespace are valuable to have.
> > > > Perhaps schema qualification was the original intent, but I think at > > this point everyone uses them for convenience. Why would I want to > > type out JOIN pg_namespace n ON n.oid = blah.???namespace when I could > > simply do ???namespace::regnamespace? > > Yes, that is the most important point of this patch. > > > > 2. Anticipated un-optimizability > > > > > > Tom pointed out that these reg* types prevent the planner from > > > optimizing the query, so we should refrain from having such > > > machinery. It should have been a long-standing issue, but reg*s > > > seem to be rather faster than joining the corresponding catalogs > > > for a moderate number of result rows, though this raises another > > > more annoying issue. > > > > > > > > > 3. Potential breakage of MVCC > > > > > > The other issue Tom pointed out is potential breakage of MVCC by > > > using these reg* types. Syscache is invalidated on updates so > > > this doesn't seem to be a problem in READ COMMITTED mode, but > > > breaks SERIALIZABLE mode. But IMHO it is not so serious a problem > > > as long as such inconsistency occurs only on system catalogs and > > > it is explicitly documented together with the un-optimizability > > > issue. I'll add a patch for this later. > > > > The way I look at it, all these arguments went out the window when > > regclass was introduced. > > > > The reality is that reg* is *way* more convenient than manual > > joins. The arguments about optimization and MVCC presumably apply to > > all reg* casts, which means that ship has long since sailed. > > I agree basically, but I think some caveat is needed. I'll > include that in the coming documentation patch. > > > If we offered views that made it easier to get at this data then maybe > > this wouldn't matter so much. But DBA's don't want to join 3 catalogs > > together to try and answer a simple question. > > It has been annoying me these days.
-- Kyotaro Horiguchi NTT Open Source Software Center >From 2a6f689afdc8197c2fe2fc235a4819ce1a5e9928 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Wed, 18 Feb 2015 14:38:32 +0900 Subject: [PATCH 1/4] Add regrole Add new regtype regrole. For the development reason, the system OIDs used for the new procs, types, casts are making a gap from existing OIDs. --- src/backend/bootstrap/bootstrap.c | 2 + src/backend/utils/adt/regproc.c | 101 ++ src/backend/utils/adt/selfuncs.c | 2 + src/backend/utils/cache/catcache.c| 1 + src/include/catalog/pg_cast.h | 7 +++ src/include/catalog/pg_proc.h | 10 src/include/catalog/pg_type.h | 5 ++ src/include/utils/builtins.h | 5 ++ src/test/regress/expected/regproc.out | 26 - src/test/regress/sql/regproc.sql | 7 +++ 10 files changed, 165 insertions(+), 1 deletion(-) diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c index bc66eac..11e40ee 100644 --
Re: [HACKERS] pg_basebackup may fail to send feedbacks.
Hello, this is the last patch for pg_basebackup/pg_receivexlog on master (9.5). Prior versions don't have this issue. 4. basebackup_reply_fix_mst_v2.patch receivelog.c patch applicable on master. This is based on the same design as walrcv_reply_fix_91_v2.patch with respect to gettimeofday(). regards, At Tue, 17 Feb 2015 19:45:19 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI wrote in <20150217.194519.58137941.horiguchi.kyot...@lab.ntt.co.jp> > Thank you for the comment. Three new patches are attached. I > forgot to give a revision number on the previous patch, but I > think this is the 2nd version. > > 1. walrcv_reply_fix_94_v2.patch > Walreceiver patch applicable on master/REL9_4_STABLE/REL9_3_STABLE > > 2. walrcv_reply_fix_92_v2.patch > Walreceiver patch applicable on REL9_2_STABLE > > 3. walrcv_reply_fix_91_v2.patch > Walreceiver patch applicable on REL9_1_STABLE > > > At Sat, 14 Feb 2015 03:01:22 +0900, Fujii Masao wrote > in > > On Tue, Feb 10, 2015 at 7:48 PM, Kyotaro HORIGUCHI > > wrote: > > >> Introduce the maximum number of records that we can receive and > > >> process between feedbacks? For example, we can change > > >> XLogWalRcvSendHSFeedback() so that it's called per at least > > >> 8 records. I'm not sure what number is good, though... > > > > > > At the beginning of the "while(len > 0){if (len > 0){" block, > > > last_recv_timestamp is always kept up to date (by using > > > gettimeofday()). So we can use the variable instead of > > > gettimeofday() in my previous proposal. > > > > Yes, but something like last_recv_timestamp doesn't exist in > > REL9_1_STABLE. So you don't think that your proposed fix > > should be back-patched to all supported versions. Right? > > Versions back to 9.3 have the loop with the same structure. 9.2 is in a bit > different shape but looks to have the same issue. 9.1 and 9.0 > have the same steps as 9.2, but 9.0 doesn't have a wal sender > timeout and 9.1 does not have a variable holding the current time. 
> > 9.3 and later: the previous patch works, but revised as per your > comment. > > 9.2: The time of the last reply is stored in the file-scope > variable reply_message, and the current time is stored in > walrcv->lastMsgReceiptTime. The timeout is determined using > these variables. > > 9.1: walrcv doesn't have lastMsgReceiptTime. The current time > should be acquired explicitly in the innermost loop. I tried > calling gettimeofday() once per several loops. The skip count > is determined by the anticipated worst duration to process a wal > record and wal_receiver_status_interval. However, I doubt the > necessity to do so.. The value of the worst duration is simply > a random guess. > > 9.0: No point in doing this kind of fix. > > > > > The start time of the timeout could be last_recv_timestamp at the > > > beginning of the while loop, since it is a bit earlier than the > > > actual time at that point. > > > > Sounds strange to me. I think that last_recv_timestamp should be > > compared with the time when the last feedback message was sent, > > e.g., maybe you can expose sendTime in XLogWalRcvSendReply() > > as a global variable, and use it to compare with last_recv_timestamp. > > You're right that that is exactly the right thing to do, but, as you mentioned, > exposing sendTime is required to do so. The solution I proposed is > the second best, assuming that wal_sender_timeout is recommended > to be long enough (several times longer) than > wal_receiver_status_interval. If no one minds exposing sendTime, > it is the best solution. The attached patch does it. > > # The added comment in the patch was wrong, though. > > > When we get out of the WAL receive loop due to the timeout check > > which your patch added, XLogWalRcvFlush() is always executed. > > I don't think that the current WAL file needs to be flushed at that time. > > Thank you for pointing it out. Run XLogWalRcvSendReply(force = > true) immediately instead of breaking from the loop. 
> > > > Another solution would be using > > > RegisterTimeout/enable_timeout_after, and it seemed to work for > > > me. It can throw out the time-counting stuff in the loop we are > > > talking about and that of XLogWalRcvSendHSFeedback and > > > XLogWalRcvSendReply, but it might be a bit too large for the > > > gain. > > > > Yes, sounds overkill. > > However, gettimeofday() for each recv is also overkill for most > cases. I'll post the patches for receivelog.c based on the patch > for the 9.1 wal receiver. -- Kyotaro Horiguchi NTT Open Source Software Center >From cfe01eadd4d8567f4410bccb8334c52fc897c002 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Wed, 18 Feb 2015 12:30:58 +0900 Subject: [PATCH] Make pg_basebackup and pg_receivexlog keep sending WAL receive feedback regularly on heavy load v2. pg_basebackup and pg_receivexlog fail to send the receiver reply message while they are receiving a continuous WAL stream caused by heavy load or something else. This patch makes them send reply messages regularly.
Re: [HACKERS] Expanding the use of FLEXIBLE_ARRAY_MEMBER for declarations like foo[1]
On Tue, Feb 17, 2015 at 9:02 AM, Andres Freund wrote: > On 2015-02-17 05:51:22 +0900, Michael Paquier wrote: >> -- inv_api.c uses bytea in an internal structure without putting it at >> the end of the structure. For clarity I think that we should just use >> a bytea pointer and do a sufficient allocation for the duration of the >> lobj manipulation. > > Hm. I don't really see the problem tbh. > > There actually is a reason the bytea is at the beginning - the other > fields *are* the data part of the bytea (which is just the varlena > header). > >> -- Similarly, tuptoaster.c needed to be patched for toast_save_datum > > I'm not a big fan of these two changes. This adds some not that small > memory allocations to a somewhat hot path. Without a big win in > clarity. And it doesn't seem to have anything to do with > FLEXIBLE_ARRAY_MEMBER. We could replace those palloc calls with a simple buffer that has a predefined size, but this somewhat reduces the readability of the code. >> There are as well a couple of things that are not changed on purpose: >> - in namespace.h for FuncCandidateList. I tried manipulating it but it >> resulted in allocation overflow in PortalHeapMemory > > Did you change the allocation in FuncnameGetCandidates()? Note the - > sizeof(Oid) there. Yeah. Missed it. >> - I don't think that the t_bits fields in htup_details.h should be >> updated either. > > Why not? Any not broken code should already use MinHeapTupleSize and > similar macros. Changing t_bits impacts HeapTupleHeaderData, ReorderBufferTupleBuf, and similarly a couple of redo routines in heapam.c that use HeapTupleHeaderData in structures without placing it at the end (the compiler complains). We could use for each of them a buffer with enough room, sizeof(HeapTupleHeaderData) + MaxHeapTupleSize, but wouldn't that reduce the readability of the current code? Opinions welcome. >> diff --git a/src/include/catalog/pg_attribute.h >> b/src/include/catalog/pg_attribute.h > [...] 
> Generally the catalog changes are much less clearly a benefit imo. OK, let's drop them then. >> From ad8f54cdb5776146f17d1038bb295b5f13b549f1 Mon Sep 17 00:00:00 2001 >> From: Michael Paquier >> Date: Mon, 16 Feb 2015 03:53:38 +0900 >> Subject: [PATCH 3/3] Update varlena in c.h to use FLEXIBLE_ARRAY_MEMBER >> >> Places using a variable-length variable not at the end of a structure >> are changed with a workaround, without impacting what those features do. > > I vote for rejecting most of this, except a (corrected) version of the > pg_authid change. Which doesn't really belong to the rest of the patch > anyway ;) Changing bytea to use FLEXIBLE_ARRAY_MEMBER requires those changes, or at least some changes in this area, as a member declared with FLEXIBLE_ARRAY_MEMBER can only be placed at the end of a structure. But I guess that we can do fine as well by replacing those structures with some buffers of a pre-defined size. I'll draft an additional patch on top of 0001 with all those less-trivial changes implemented. >> #define VARHDRSZ ((int32) sizeof(int32)) >> diff --git a/src/include/catalog/pg_authid.h >> b/src/include/catalog/pg_authid.h >> index e01e6aa..d8789a5 100644 >> --- a/src/include/catalog/pg_authid.h >> +++ b/src/include/catalog/pg_authid.h >> @@ -56,8 +56,10 @@ CATALOG(pg_authid,1260) BKI_SHARED_RELATION >> BKI_ROWTYPE_OID(2842) BKI_SCHEMA_MAC >> int32 rolconnlimit; /* max connections allowed (-1=no >> limit) */ >> >> /* remaining fields may be null; use heap_getattr to read them! */ >> - text rolpassword; /* password, if any */ >> timestamptz rolvaliduntil; /* password expiration time, if any */ >> +#ifdef CATALOG_VARLEN >> + text rolpassword; /* password, if any */ >> +#endif >> } FormData_pg_authid; > > That change IIRC is wrong, because it'll make rolvaliduntil NOT > NULL (any column that's fixed width and has only fixed-width columns > before it is marked as such). This sounds better as a separate patch... 
-- Michael
Re: [HACKERS] Combining Aggregates
Hi Rowley, Let me put some comments on this patch. This patch itself looks good as infrastructure towards the big picture; however, we have still not reached consensus on how combine functions are to be used in place of the usual transition functions. An aggregate function usually consumes one or more values extracted from a tuple, then accumulates its internal state according to the arguments. An existing transition function updates its internal state under the assumption of one function call per record. On the other hand, the new combine function updates its internal state with partially aggregated values produced by a preprocessor node. An aggregate function is represented by an Aggref node in the plan tree; however, we have no certain way to determine which function shall be called to update the internal state of the aggregate. For example, avg(float) has an internal state of type float[3] holding the number of rows, the sum of X, and the sum of X^2. If the combine function updates its internal state with partially aggregated values, its argument should be float[3]. That is obviously incompatible with float8_accum(float), which is the transition function of avg(float). I think we need a new flag on the Aggref node to inform the executor which function shall be called to update the internal state of the aggregate. The executor cannot decide it without this hint. Also, do you have an idea for pushing down aggregate functions across joins? Even though it is a bit old, I found research on a systematic approach to pushing down aggregates across joins: https://cs.uwaterloo.ca/research/tr/1993/46/file.pdf I think it would be great if the core functionality supported this query-rewriting feature based on cost estimation, without external modules. How about your opinions? 
Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei > -----Original Message----- > From: pgsql-hackers-ow...@postgresql.org > [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of David Rowley > Sent: Friday, December 19, 2014 8:39 PM > To: Simon Riggs > Cc: Kaigai Kouhei(海外 浩平); PostgreSQL-development; Amit Kapila > Subject: Re: [HACKERS] Combining Aggregates > > On 18 December 2014 at 02:48, Simon Riggs wrote: > > > David, if you can update your patch with some docs to explain the > behaviour, it looks complete enough to think about committing it > in > early January, to allow other patches that depend upon it to stand > a > chance of getting into 9.5. (It is not yet ready, but I see it could > be). > > > > > Sure, I've more or less added the docs from your patch. I still need to > trawl through and see if there's anywhere else that needs some additions. > > > The above example is probably the best description of the need, > since > user-defined aggregates must also understand this. > > Could we please call these "combine functions" or other? MERGE is > an > SQL Standard statement type that we will add later, so it will be > confusing if we use the "merge" word in this context. > > > > > Ok, changed. > > > David, your patch avoids creating any mergefuncs for existing > aggregates. We would need to supply working examples for at least > a > few of the builtin aggregates, so we can test the infrastructure. > We > can add examples for all cases later. > > > > > In addition to MIN(), MAX(), BIT_AND(), BIT_OR(), SUM() for floating point > types, cash and interval. I've now added combine functions for count(*) > and count(col). It seems that int8pl() is suitable for this. > > > Do you think it's worth adding any new functions at this stage, or is it > best to wait until there's a patch which can use these? 
> > Regards > > David Rowley