[HACKERS] Dereferenced pointers checked as NULL in btree_utils_var.c
Hi all,

Coverity is pointing out $subject, with the following stuff in gbt_var_same():

    GBT_VARKEY *t1 = (GBT_VARKEY *) DatumGetPointer(d1);
    GBT_VARKEY *t2 = (GBT_VARKEY *) DatumGetPointer(d2);
    GBT_VARKEY_R r1, r2;
    r1 = gbt_var_key_readable(t1);    <= t1 dereferenced
    r2 = gbt_var_key_readable(t2);    <= t2 dereferenced
    if (t1 && t2)
        result = ((*tinfo->f_cmp) (r1.lower, r2.lower, collation) == 0 &&
                  (*tinfo->f_cmp) (r1.upper, r2.upper, collation) == 0);
    else
        result = (t1 == NULL && t2 == NULL);    <= Coverity complains here
    return result;

As Heikki pointed out to me on IM, the lack of crash reports in this area, as well as the similar coding style in cube/, seem to be sufficient arguments to simply remove those NULL checks instead of making them more solid. Patch is attached.

Regards,
--
Michael

diff --git a/contrib/btree_gist/btree_utils_var.c b/contrib/btree_gist/btree_utils_var.c
index b7dd060..6ad3347 100644
--- a/contrib/btree_gist/btree_utils_var.c
+++ b/contrib/btree_gist/btree_utils_var.c
@@ -337,7 +337,6 @@
 bool
 gbt_var_same(Datum d1, Datum d2, Oid collation, const gbtree_vinfo *tinfo)
 {
-	bool		result;
 	GBT_VARKEY *t1 = (GBT_VARKEY *) DatumGetPointer(d1);
 	GBT_VARKEY *t2 = (GBT_VARKEY *) DatumGetPointer(d2);
 	GBT_VARKEY_R r1,
@@ -346,13 +345,8 @@ gbt_var_same(Datum d1, Datum d2, Oid collation,
 	r1 = gbt_var_key_readable(t1);
 	r2 = gbt_var_key_readable(t2);

-	if (t1 && t2)
-		result = ((*tinfo->f_cmp) (r1.lower, r2.lower, collation) == 0 &&
-				  (*tinfo->f_cmp) (r1.upper, r2.upper, collation) == 0);
-	else
-		result = (t1 == NULL && t2 == NULL);
-
-	return result;
+	return ((*tinfo->f_cmp) (r1.lower, r2.lower, collation) == 0 &&
+			(*tinfo->f_cmp) (r1.upper, r2.upper, collation) == 0);
 }

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Async execution of postgres_fdw.
Hello, thank you for the comment. I added an experimental adaptive fetch size feature in this v6 patch.

At Tue, 20 Jan 2015 04:51:13 +, Matt Kelly mkell...@gmail.com wrote in ca+kcukhluo+vaj4xr8gvsof_nw79udztdyhosdt13cfjkae...@mail.gmail.com

> I think its telling that varying the fetch size doubled the performance, even on localhost. If you were to repeat this test across a network, the performance difference would be far more drastic.

I think so, surely.

> I understand the desire to keep the fetch size small by default, but I think your results demonstrate how important the value is. At the very least, it is worth reconsidering this arbitrary value. However, I think the real solution is to make this configurable. It probably should be a new option on the foreign server or table, but an argument could be made for it to be global across the server just like work_mem.

The optimal fetch_count varies depending on the query. From the performance point of view alone, it should be the same as the table size for a simple scan on a table. Most joins also do not need to read the target relations simultaneously. (A local merge join on remote sorted results is not available, since the FDW is not aware of the sorted-ness.) But that would change in the near future. So I have found no appropriate policy for deciding the number.

Another point of view is the memory requirement. This wouldn't matter using the single-row mode of libpq, but that doesn't allow multiple simultaneous queries. The space needed for the fetch buffer varies widely in proportion to the average row length. If it is 1Kbytes, 10000 rows require over 10MBytes, which is larger than the default value of work_mem. I tried an adaptive fetch_size based on the fetch duration and the required buffer size of the previous turn in this version. But a hard limit cannot be imposed, since we cannot know the mean row length in advance. So, for example, if the average row length suddenly grows from 1KB to 10KB when fetch_size is 10000, 100MB is required for that turn.
I think, for the ordinary cases, the maximum fetch size cannot exceed 1000. The attached is the new version implementing the adaptive fetch size. Simple test runs showed the values below. A single scan was boosted by about 5% (no effect?) and a join by 33%. The former case is hard to understand, so I'll examine it tomorrow. This doesn't seem so promising, though..

=====
master=# EXPLAIN (ANALYZE ON, COSTS OFF) SELECT * FROM ft1;
                             QUERY PLAN
---------------------------------------------------------------------
 Foreign Scan on ft1 (actual time=1.741..10046.272 rows=100 loops=1)
 Planning time: 0.084 ms
 Execution time: 10145.730 ms
(3 rows)

patched=# EXPLAIN (ANALYZE ON, COSTS OFF) SELECT * FROM ft1;
                             QUERY PLAN
---------------------------------------------------------------------
 Foreign Scan on ft1 (actual time=1.072..9582.980 rows=100 loops=1)
 Planning time: 0.077 ms
 Execution time: 9683.164 ms
(3 rows)

postgres=# EXPLAIN (ANALYZE ON, COSTS OFF) SELECT x.a, x.c, y.c FROM ft1 AS x JOIN ft1 AS y ON x.a = y.a;
                             QUERY PLAN
---------------------------------------------------------------------
 Merge Join (actual time=18191.739..19534.001 rows=100 loops=1)
   Merge Cond: (x.a = y.a)
   -> Sort (actual time=9031.155..9294.465 rows=100 loops=1)
        Sort Key: x.a
        Sort Method: external sort  Disk: 142728kB
        -> Foreign Scan on ft1 x (actual time=1.156..6486.632 rows=100 loops=1)
   -> Sort (actual time=9160.577..9479.076 rows=100 loops=1)
        Sort Key: y.a
        Sort Method: external sort  Disk: 146632kB
        -> Foreign Scan on ft1 y (actual time=0.641..6517.594 rows=100 loops=1)
 Planning time: 0.203 ms
 Execution time: 19626.881 ms
(12 rows)

patched=# EXPLAIN (ANALYZE ON, COSTS OFF) SELECT x.a, x.c, y.c FROM ft1 AS x JOIN ft1 AS y ON x.a = y.a;
                             QUERY PLAN
---------------------------------------------------------------------
 Merge Join (actual time=11790.690..13134.071 rows=100 loops=1)
   Merge Cond: (x.a = y.a)
   -> Sort (actual time=8149.225..8413.611 rows=100 loops=1)
        Sort Key: x.a
        Sort Method: external sort  Disk: 142728kB
        -> Foreign Scan on ft1 x (actual time=0.679..3989.160 rows=100 loops=1)
   -> Sort (actual time=3641.457..3957.240 rows=100 loops=1)
        Sort Key: y.a
        Sort Method: external sort  Disk: 146632kB
        -> Foreign Scan on ft1 y (actual time=0.605..1852.655 rows=100 loops=1)
 Planning time: 0.203 ms
 Execution time: 13226.414 ms
(12 rows)

> Obviously, this shouldn't block your current patch but it's worth revisiting.

regards,

--
Kyotaro
Re: [HACKERS] PATCH: Reducing lock strength of trigger and foreign key DDL
On Fri, Jan 16, 2015 at 04:59:56PM +0100, Andres Freund wrote:
> On 2015-01-16 15:16:20 +0100, Andreas Karlsson wrote:
>> For this reason I opted to only lower the lock levels of ADD and ALTER TRIGGER, and not DROP TRIGGER. Neither of those requires MVCC of the WHEN clause.
>
> I'm unconvinced that this is true. Using a snapshot for part of getting a definition certainly opens the door for getting strange results.
>
> Acquiring a lock that prevents schema changes on the table and then getting the definition using the syscaches guarantees that that definition is at least self-consistent, because no further schema changes are possible and the catalog caches will be up2date.
>
> What you're doing though is doing part of the scan using the transaction's snapshot (as used by pg_dump that will usually be a repeatable read snapshot and possibly quite old) and the other using a fresh catalog snapshot. This can result in rather odd things. Just consider:
>
> S1: CREATE TABLE flubber(id serial primary key, data text);
> S1: CREATE FUNCTION blarg() RETURNS TRIGGER LANGUAGE plpgsql AS $$BEGIN RETURN NEW; END;$$;
> S1: CREATE TRIGGER flubber_blarg BEFORE INSERT ON flubber FOR EACH ROW WHEN (NEW.data IS NOT NULL) EXECUTE PROCEDURE blarg();
> S2: BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
> S2: SELECT 'dosomethingelse';
> S1: ALTER TABLE flubber RENAME TO wasflubber;
> S1: ALTER TABLE wasflubber RENAME COLUMN data TO wasdata;
> S1: ALTER TRIGGER flubber_blarg ON wasflubber RENAME TO wasflubber_blarg;
> S1: ALTER FUNCTION blarg() RENAME TO wasblarg;
> S2: SELECT pg_get_triggerdef(oid) FROM pg_trigger;
>
> This will give you: the old trigger name, the new table name, the new column name, the new function name.
>
> I don't think using a snapshot for tiny parts of these functions actually buys anything. Now, this isn't a pattern you introduced. But I think we should think about this for a second before expanding it further.

Fair enough.
It did reinforce pg_get_constraintdef() as a subroutine of pg_dump rather than an independent, rigorous interface. It perhaps made the function worse for non-pg_dump callers. In that vein, each one of these hacks has a cost. One could make a reasonable argument that ALTER TRIGGER RENAME locking is not important enough to justify spreading the hack from pg_get_constraintdef() to pg_get_triggerdef(). Lowering the CREATE TRIGGER lock level does not require any ruleutils.c change for the benefit of pg_dump, because pg_dump won't see the pg_trigger row of a too-recent trigger.

> Before you argue that this isn't relevant for pg_dump: It is. Precisely the above can happen - just replace the 'dosomethingelse' with several LOCK TABLEs as pg_dump does. The first blocks after a snapshot has been acquired. While waiting, all the ALTERs happen.

We wish pg_dump would take a snapshot of the database; that is, we wish its output always matched some serial execution of transactions. pg_dump has, since ancient times, failed to achieve that if non-table DDL commits during the dump or if table DDL commits between acquiring the dump transaction snapshot and acquiring the last table lock. My reviews have defended the standard that table DDL issued after pg_dump has acquired locks does not change the dump. That's what we bought with pg_get_constraintdef()'s use of the transaction snapshot and would buy with the same in pg_get_triggerdef(). My reviews have deliberately ignored effects on scenarios where pg_dump already fails to guarantee snapshot-like output.

> Arguably the benefit here is that the window for this issue is becoming smaller as pg_dump (and hopefully other possible callers) acquire exclusive locks on the table. I.e. that the lowering of the lock level doesn't introduce new races. But on the other side of the coin, this makes the result of pg_get_triggerdef() even *more* inaccurate in many cases.

What is this about pg_dump acquiring exclusive locks?
To summarize, the problem you raise has been out of scope, because it affects pg_dump only at times when pg_dump is already wrong.
Re: [HACKERS] documentation update for doc/src/sgml/func.sgml
> I had a look at this patch. This patch adds some text below a table of functions. Immediately above that table, there is this existing language: "The functions working with <type>double precision</type> data are mostly implemented on top of the host system's C library; accuracy and behavior in boundary cases can therefore vary depending on the host system." This seems to me to substantially duplicate the information added by the patch.

I would rather say that it makes the potential issues explicit. Taking that into account, maybe the part about floating point could be moved up after the above sentence, or the above sentence moved down as an introduction, with some pruning so that it fits in? The second paragraph about bitwise ops is not related to these.

--
Fabien.
Re: [HACKERS] New CF app: changing email sender
Kyotaro == Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp writes:

 Kyotaro Hmm. The mail address indeed *was* mine but is now obsolete,
 Kyotaro so that the email might bounce. But I haven't found how to
 Kyotaro change it within the app itself, nor on the PostgreSQL community
 Kyotaro account page.

Just being able to change the email address on the community account isn't enough; I for one am subscribed to the lists with a different email address than the one associated with my community account (for spam management reasons). There needs to be a way to have multiple addresses or to specify which one is to be used for the post.

--
Andrew (irc:RhodiumToad)
Re: [HACKERS] PATCH: decreasing memory needlessly consumed by array_agg
2015-01-20 18:17 GMT+07:00 Ali Akbar the.ap...@gmail.com:
> 2015-01-20 14:22 GMT+07:00 Jeff Davis pg...@j-davis.com:
>> The current patch, which I am evaluating for commit, does away with per-group memory contexts (it uses a common context for all groups), and reduces the initial array allocation from 64 to 8 (but preserves doubling behavior).
>
> Jeff & Tomas, spotted this comment in v8 patch:
>
> @@ -4713,6 +4733,11 @@ accumArrayResult(ArrayBuildState *astate,
>  /*
>   * makeArrayResult - produce 1-D final result of accumArrayResult
>   *
> + * If the array build state was initialized with a separate memory context,
> + * this also frees all the memory (by deleting the subcontext). If a parent
> + * context was used directly, the memory may be freed by an explicit pfree()
> + * call (unless it's meant to be freed by destroying the parent context).
> + *
>   * astate is working state (must not be NULL)
>   * rcontext is where to construct result
>   */
>
> A simple pfree(astate) call is not enough to free the memory. If it's scalar accumulation (initArrayResult), the user must pfree(astate->dvalues) and pfree(astate->dnulls) before astate. If it's array accumulation, pfree(astate->data) and pfree(astate->nullbitmap), and both can be NULL if no array is accumulated, and in some other cases. If it's any (scalar or array) accumulation, it's more complicated. I suggest it's simpler to just force the API user to destroy the parent context instead. So the comment becomes like this:
>
> @@ -4713,6 +4733,11 @@ accumArrayResult(ArrayBuildState *astate,
>  /*
>   * makeArrayResult - produce 1-D final result of accumArrayResult
>   *
> + * If the array build state was initialized with a separate memory context,
> + * this also frees all the memory (by deleting the subcontext). If a parent
> + * context was used directly, the memory is meant to be freed by destroying
> + * the parent context.
> + *
>   * astate is working state (must not be NULL)
>   * rcontext is where to construct result
>   */

Sorry, there is another comment, for makeMdArrayResult; I suggest also changing it like this:

@@ -4738,6 +4764,12 @@ makeArrayResult(ArrayBuildState *astate,
  * beware: no check that specified dimensions match the number of values
  * accumulated.
  *
+ * beware: if the astate was not initialized within a separate memory
+ * context (i.e. using subcontext=false when calling initArrayResult),
+ * using release=true is illegal as it releases the whole context,
+ * and that may include other memory still used elsewhere (instead use
+ * release=false and release the memory with the parent context later)
+ *
  * astate is working state (must not be NULL)
  * rcontext is where to construct result
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
Robert == Robert Haas robertmh...@gmail.com writes:

 Robert All right, it seems Tom is with you on that point, so after
 Robert some study, I've committed this with very minor modifications.

This caught my eye (thanks to a conflict with the GS patch):

    * In the future, we should consider forcing the
    * tuplesort_begin_heap() case when the abbreviated key
    * optimization can thereby be used, even when numInputs is 1.

The comment in tuplesort_begin_datum that abbreviation can't be used seems wrong to me; why is the copy of the original value pointed to by stup->tuple (in the case of by-reference types; abbreviation is obviously not needed for by-value types) not sufficient? Or what am I missing?

--
Andrew (irc:RhodiumToad)
Re: [HACKERS] PATCH: decreasing memory needlessly consumed by array_agg
2015-01-20 14:22 GMT+07:00 Jeff Davis pg...@j-davis.com:
> The current patch, which I am evaluating for commit, does away with per-group memory contexts (it uses a common context for all groups), and reduces the initial array allocation from 64 to 8 (but preserves doubling behavior).

Jeff & Tomas, spotted this comment in v8 patch:

@@ -4713,6 +4733,11 @@ accumArrayResult(ArrayBuildState *astate,
 /*
  * makeArrayResult - produce 1-D final result of accumArrayResult
  *
+ * If the array build state was initialized with a separate memory context,
+ * this also frees all the memory (by deleting the subcontext). If a parent
+ * context was used directly, the memory may be freed by an explicit pfree()
+ * call (unless it's meant to be freed by destroying the parent context).
+ *
  * astate is working state (must not be NULL)
  * rcontext is where to construct result
  */

A simple pfree(astate) call is not enough to free the memory. If it's scalar accumulation (initArrayResult), the user must pfree(astate->dvalues) and pfree(astate->dnulls) before astate. If it's array accumulation, pfree(astate->data) and pfree(astate->nullbitmap), and both can be NULL if no array is accumulated, and in some other cases. If it's any (scalar or array) accumulation, it's more complicated. I suggest it's simpler to just force the API user to destroy the parent context instead. So the comment becomes like this:

@@ -4713,6 +4733,11 @@ accumArrayResult(ArrayBuildState *astate,
 /*
  * makeArrayResult - produce 1-D final result of accumArrayResult
  *
+ * If the array build state was initialized with a separate memory context,
+ * this also frees all the memory (by deleting the subcontext). If a parent
+ * context was used directly, the memory is meant to be freed by destroying
+ * the parent context.
+ *
  * astate is working state (must not be NULL)
  * rcontext is where to construct result
  */

Regards,
--
Ali Akbar
Re: [HACKERS] Bug in pg_dump
Le 19/01/2015 14:41, Robert Haas a écrit :
> On Thu, Jan 15, 2015 at 6:26 AM, Gilles Darold gilles.dar...@dalibo.com wrote:
>> I attach a patch that solves the issue in pg_dump, let me know if it might be included in Commit Fest or if the three other solutions are a better choice.
>
> I think a fix in pg_dump is appropriate, so I'd encourage you to add this to the next CommitFest.

Ok, thanks Robert. The patch has been added to the next CommitFest under the Bug Fixes section. I've also done some review of the patch and more testing. I've rewritten some comments and added a check for TableInfo being NULL to avoid a potential pg_dump crash. New patch attached: pg_dump.c-extension-FK-v2.patch

Regards

--
Gilles Darold
Consultant PostgreSQL
http://dalibo.com - http://dalibo.org

--- ../postgresql/src/bin/pg_dump/pg_dump.c	2015-01-19 19:03:45.897706879 +0100
+++ src/bin/pg_dump/pg_dump.c	2015-01-20 11:22:01.144794489 +0100
@@ -209,6 +209,7 @@
 static void getDomainConstraints(Archive *fout, TypeInfo *tyinfo);
 static void getTableData(DumpOptions *dopt, TableInfo *tblinfo, int numTables, bool oids);
+static bool hasExtensionMember(TableInfo *tblinfo, int numTables);
 static void makeTableDataInfo(DumpOptions *dopt, TableInfo *tbinfo, bool oids);
 static void buildMatViewRefreshDependencies(Archive *fout);
 static void getTableDataFKConstraints(void);
@@ -730,9 +731,20 @@
 	if (!dopt.schemaOnly)
 	{
+		bool		has_ext_member;
+
 		getTableData(dopt, tblinfo, numTables, dopt.oids);
+
+		/* Search if there are dumpable table members in an extension */
+		has_ext_member = hasExtensionMember(tblinfo, numTables);
+
 		buildMatViewRefreshDependencies(fout);
-		if (dopt.dataOnly)
+
+		/*
+		 * Get FK constraints even with schema+data if extension
+		 * members have FKs, because in this case tables need to be
+		 * dump-ordered too.
+		 */
+		if (dopt.dataOnly || has_ext_member)
 			getTableDataFKConstraints();
 	}
@@ -1852,6 +1864,26 @@
 }

 /*
+ * hasExtensionMember -
+ *	  return true when one of the dumpable objects
+ *	  is an extension member
+ */
+static bool
+hasExtensionMember(TableInfo *tblinfo, int numTables)
+{
+	int			i;
+
+	for (i = 0; i < numTables; i++)
+	{
+		if (tblinfo[i].dobj.ext_member)
+			return true;
+	}
+
+	return false;
+}
+
+
+/*
 * Make a dumpable object for the data of this specific table
 *
 * Note: we make a TableDataInfo if and only if we are going to dump the
@@ -2026,10 +2058,12 @@
 * getTableDataFKConstraints -
 *	  add dump-order dependencies reflecting foreign key constraints
 *
- * This code is executed only in a data-only dump --- in schema+data dumps
- * we handle foreign key issues by not creating the FK constraints until
- * after the data is loaded. In a data-only dump, however, we want to
- * order the table data objects in such a way that a table's referenced
+ * This code is executed only in a data-only dump or when there is an
+ * extension member -- in schema+data dumps we handle foreign key issues
+ * by not creating the FK constraints until after the data is loaded. In a
+ * data-only dump or when there is an extension member to dump (schema
+ * dumps do not concern extension's objects, they are created during the
+ * CREATE EXTENSION), we want to order the table data objects in such a
+ * way that a table's referenced
 * tables are restored first. (In the presence of circular references or
 * self-references this may be impossible; we'll detect and complain about
 * that during the dependency sorting step.)
@@ -2930,9 +2964,14 @@
 	PQExpBuffer delqry;
 	const char *cmd;

+	/* Policy is SCHEMA only */
 	if (dopt->dataOnly)
 		return;

+	/* CREATE EXTENSION should take care of that */
+	if ((tbinfo != NULL) && tbinfo->dobj.ext_member)
+		return;
+
 	/*
 	 * If polname is NULL, then this record is just indicating that ROW
 	 * LEVEL SECURITY is enabled for the table. Dump as ALTER TABLE <table>
@@ -7884,6 +7923,10 @@
 	if (dopt->dataOnly)
 		return;

+	/* CREATE EXTENSION should take care of that */
+	if ((tbinfo != NULL) && tbinfo->dobj.ext_member)
+		return;
+
 	/* Search for comments associated with relation, using table */
 	ncomments = findComments(fout,
							 tbinfo->dobj.catId.tableoid,
@@ -13138,6 +13181,10 @@
 	if (dopt->dataOnly)
 		return;

+	/* CREATE EXTENSION should take care of that */
+	if ((tbinfo != NULL) && tbinfo->dobj.ext_member)
+		return;
+
 	/* Search for comments associated with relation, using table */
 	nlabels = findSecLabels(fout,
							tbinfo->dobj.catId.tableoid,
@@ -13345,7 +13392,7 @@
 static void
 dumpTable(Archive *fout, DumpOptions *dopt, TableInfo *tbinfo)
 {
-	if (tbinfo->dobj.dump && !dopt->dataOnly)
+	if (tbinfo->dobj.dump && !dopt->dataOnly && !tbinfo->dobj.ext_member)
 	{
 		char	   *namecopy;
@@ -13483,6 +13530,10 @@
 	int			j,
 				k;

+	/* CREATE EXTENSION should take care of that */
+	if ((tbinfo != NULL) && tbinfo->dobj.ext_member)
+		return;
+
 	/* Make sure we are in proper schema */
 	selectSourceSchema(fout, tbinfo->dobj.namespace->dobj.name);
Re: [HACKERS] parallel mode and parallel contexts
On Sat, Jan 17, 2015 at 3:40 AM, Robert Haas robertmh...@gmail.com wrote:
> New patch attached. I'm going to take the risk of calling this v1 (previous versions have been 0.x), since I've now done something about the heavyweight locking issue, as well as fixed the message-looping bug Amit pointed out. It doubtless needs more work, but it's starting to smell a bit more like a real patch.

I need some clarification regarding the below code:

+BgwHandleStatus
+WaitForBackgroundWorkerShutdown(BackgroundWorkerHandle *handle)
+{
+	BgwHandleStatus status;
+	int			rc;
+	bool		save_set_latch_on_sigusr1;
+
+	save_set_latch_on_sigusr1 = set_latch_on_sigusr1;
+	set_latch_on_sigusr1 = true;
+
+	PG_TRY();
+	{
+		for (;;)
+		{
+			pid_t		pid;
+
+			CHECK_FOR_INTERRUPTS();
+
+			status = GetBackgroundWorkerPid(handle, &pid);
+			if (status == BGWH_STOPPED)
+				return status;
+
+			rc = WaitLatch(&MyProc->procLatch,
+						   WL_LATCH_SET | WL_POSTMASTER_DEATH, 0);
+
+			if (rc & WL_POSTMASTER_DEATH)
+				return BGWH_POSTMASTER_DIED;

It seems this code has the possibility to wait forever. Assume one of the workers is not able to start (not able to attach to shared memory or some other reason); then the status returned by GetBackgroundWorkerPid() will be BGWH_NOT_YET_STARTED, and after that it can wait forever in WaitLatch.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] New CF app deployment
Michael Paquier michael.paqu...@gmail.com writes:
> On Tue, Jan 20, 2015 at 4:31 PM, Andres Freund and...@2ndquadrant.com wrote:
>> I think you misunderstood me ;). I was talking about the old CF application providing a RSS feed of all changes to all entries. https://commitfest-old.postgresql.org/action/commitfest_activity.rss
>
> Oh, I didn't know this one. That's indeed useful.

Personally, I never used the RSS feed as such, but I did often consult the activity log pages, which I think are equivalent:
https://commitfest-old.postgresql.org/action/commitfest_activity?id=25
That feature seems to be gone too :-(

			regards, tom lane
Re: [HACKERS] Parallel Seq Scan
On Thu, Jan 15, 2015 at 6:57 PM, Amit Kapila amit.kapil...@gmail.com wrote:
> On Mon, Jan 12, 2015 at 3:25 AM, Robert Haas robertmh...@gmail.com wrote:
>> Yeah, you need two separate global variables pointing to shm_mq objects, one of which gets used by pqmq.c for errors and the other of which gets used by printtup.c for tuples.
>
> Okay, I will try to change the way as suggested without doing switching, but this way we need to do it separately for 'T', 'D', and 'C' messages.

I have taken care of integrating the parallel sequence scan with the latest patch posted (parallel-mode-v1.patch) by Robert at the below location:
http://www.postgresql.org/message-id/ca+tgmozduk4k3xhbxc9vm-82khourezdvqwtfglhwsd2r2a...@mail.gmail.com

Changes in this version
-----------------------
1. As mentioned previously, I have exposed one parameter ParallelWorkerNumber as used in the parallel-mode patch.
2. Enabled the tuple queue to be used for passing tuples from the worker backend to the master backend along with the error queue, as per the suggestion by Robert in the mail above.
3. Involved the master backend to scan the heap directly when tuples are not available in any shared memory tuple queue.
4. Introduced 3 new parameters (cpu_tuple_comm_cost, parallel_setup_cost, parallel_startup_cost) for deciding the cost of a parallel plan. Currently, I have kept the default values for parallel_setup_cost and parallel_startup_cost as 0.0, as those require some experiments.
5. Fixed some issues (related to the memory increase reported upthread by Thom Brown, and general feature issues found during testing).

Note - I have yet to handle the new node types introduced at some of the places and need to verify prepared queries and some other things; however, I think it will be good if I can get some feedback at the current stage.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

parallel_seqscan_v4.patch
Description: Binary data
Re: [HACKERS] Dereferenced pointers checked as NULL in btree_utils_var.c
Michael Paquier michael.paqu...@gmail.com writes:
> Coverity is pointing out $subject, with the following stuff in gbt_var_same(): ...
> As Heikki pointed out to me on IM, the lack of crash reports in this area, as well as similar coding style in cube/, seem to be sufficient arguments to simply remove those NULL checks instead of doing more solid checks on them. Patch is attached.

The way to form a convincing argument that these checks are unnecessary would be to verify that (1) the SQL-accessible functions directly calling gbt_var_same() are all marked STRICT, and (2) the core GIST code never passes a NULL to these support functions. I'm prepared to believe that (1) and (2) are both true, but it merits checking.

			regards, tom lane
Re: [HACKERS] PATCH: decreasing memory needlessly consumed by array_agg
Jeff Davis pg...@j-davis.com writes:
> Tom (tgl), Is my reasoning above acceptable?

Uh, sorry, I've not been paying any attention to this thread for awhile. What are the remaining questions at issue?

			regards, tom lane
Re: [HACKERS] New CF app deployment
On Tue, Jan 20, 2015 at 3:43 PM, Tom Lane t...@sss.pgh.pa.us wrote: Michael Paquier michael.paqu...@gmail.com writes: On Tue, Jan 20, 2015 at 4:31 PM, Andres Freund and...@2ndquadrant.com wrote: I think you misunderstood me ;). I was talking about the old CF application providing a RSS feed of all changes to all entries. https://commitfest-old.postgresql.org/action/commitfest_activity.rss Oh, I didn't know this one. That's indeed useful. Personally, I never used the RSS feed as such, but I did often consult the activity log pages, which I think are equivalent: https://commitfest-old.postgresql.org/action/commitfest_activity?id=25 That feature seems to be gone too :-( Yeah, that's annoying. In fact, I'm the one who forced the RSS feed into the old system, and it's something I use all the time myself. Oops. Turns out I have those in a feature branch that it appears I forgot to merge :( And it also no longer merges cleanly. I've added this to my short-term TODO, and will look at it this evening. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] PATCH: decreasing memory needlessly consumed by array_agg
Hi,

On 20.1.2015 12:23, Ali Akbar wrote:
> 2015-01-20 18:17 GMT+07:00 Ali Akbar the.ap...@gmail.com:
>> Sorry, there is another comment, for makeMdArrayResult; I suggest also changing it like this:
>>
>> @@ -4738,6 +4764,12 @@ makeArrayResult(ArrayBuildState *astate,
>>  * beware: no check that specified dimensions match the number of values
>>  * accumulated.
>>  *
>> + * beware: if the astate was not initialized within a separate memory
>> + * context (i.e. using subcontext=false when calling initArrayResult),
>> + * using release=true is illegal as it releases the whole context,
>> + * and that may include other memory still used elsewhere (instead use
>> + * release=false and release the memory with the parent context later)
>> + *
>>  * astate is working state (must not be NULL)
>>  * rcontext is where to construct result

I think both comment fixes are appropriate. I'll wait a bit and then post an updated version of the patch (unless it gets committed with the comment fixes before that).

--
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Re: [COMMITTERS] pgsql: Disable -faggressive-loop-optimizations in gcc 4.8+ for pre-9.2
On Tue, Jan 20, 2015 at 10:30 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
> Tom Lane wrote:
>> Disable -faggressive-loop-optimizations in gcc 4.8+ for pre-9.2 branches.
>>
>> With this optimization flag enabled, recent versions of gcc can generate incorrect code that assumes variable-length arrays (such as oidvector) are actually fixed-length because they're embedded in some larger struct. The known instance of this problem was fixed in 9.2 and up by commit 8137f2c32322c624e0431fac1621e8e9315202f9 and followon work, which hides actually-variable-length catalog fields from the compiler altogether. And we plan to gradually convert variable-length fields to official flexible array member notation over time, which should prevent this type of bug from reappearing as gcc gets smarter. We're not going to try to back-port those changes into older branches, though, so apply this band-aid instead.
>
> Would anybody object to me pushing this commit to branches 8.2 and 8.3?

Since those branches are out of support, I am not sure what the point is. If we want people to be able to use those branches reasonably we need to back-port fixes for critical security and stability issues, not just this one thing. But maybe I am missing something.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Partitioning: issues/ideas (Was: Re: [HACKERS] On partitioning)
On Mon, Jan 19, 2015 at 8:48 PM, Amit Langote langote_amit...@lab.ntt.co.jp wrote: Specifically, do we regard partitions as pg_inherits children of their partitioning parent? I don't think this is totally an all-or-nothing decision. I think everyone is agreed that we need to not break things that work today -- e.g. Merge Append. What that implies for pg_inherits isn't altogether clear. One point is that an implementation may end up establishing the parent-partition hierarchy somewhere other than (or in addition to) pg_inherits. One intention would be to avoid tying the partitioning scheme to certain inheritance features that use pg_inherits. For example, consider call sites of find_all_inheritors(). One notable example is Append/MergeAppend, which would be of interest to partitioning. We would want to reuse that part of the infrastructure, but we might as well write an equivalent, say find_all_partitions(), which scans something other than pg_inherits to get all partitions. IMHO, there's little reason to avoid putting pg_inherits entries in for the partitions, and then this just works. We can find other ways to make it work if that turns out to be better, but if we don't have one, there's no reason to complicate things. Agreed that some concrete idea of the internal representation should help guide the catalog structure. If we are going to cache the partitioning info in the relcache (which we most definitely will), then we should make sure to consider the scenario of having a lot of partitioned tables with a lot of individual partitions. It looks like it would be similar to a scenario where there are a lot of inheritance hierarchies. But availability of the partitioning feature would definitely cause these numbers to grow larger. Perhaps this is an important point driving this discussion. Yeah, it would be good if the costs of supporting, say, 1000 partitions were negligible. A primary question for me about partition-pruning is when do we do it?
Should we model it after relation_excluded_by_constraints() and hence totally plan-time? But, the tone of the discussion is that we postpone partition-pruning to execution-time and hence my perhaps misdirected attempts to inject it into some executor machinery. It's useful to prune partitions at plan time, because then you only have to do the work once. But sometimes you don't know enough to do it at plan time, so it's useful to do it at execution time, too. Then, you can do it differently for every tuple based on the actual value you have. There's no point in doing 999 unnecessary relation scans if we can tell which partition the actual run-time value must be in. But I think execution-time pruning can be a follow-on patch. If you don't restrict the scope of the first patch as much as possible, you're not going to have much luck getting this committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
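The plan-time vs. run-time pruning distinction discussed above can be illustrated with a toy model (a hypothetical sketch, not PostgreSQL code): for consecutive range partitions, locating the single partition a value can fall into is a binary search over the sorted partition bounds, which is what makes skipping the other 999 relation scans cheap once the actual value is known.

```python
import bisect

def prune_to_partition(upper_bounds, value):
    # upper_bounds are the sorted, exclusive upper bounds of consecutive
    # range partitions; the result is the index of the only partition
    # whose range can contain value, so all other scans can be skipped.
    return bisect.bisect_right(upper_bounds, value)

# Four partitions: (-inf,10), [10,20), [20,30), [30,+inf)
bounds = [10, 20, 30]
```

At plan time this can run once when the comparison value is a constant; at execution time the same lookup can run per parameter value (or even per outer tuple), which is the follow-on work Robert describes.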
Re: [HACKERS] Parallel Seq Scan
On Tue, Jan 20, 2015 at 9:43 PM, Thom Brown t...@linux.com wrote: On 20 January 2015 at 14:29, Amit Kapila amit.kapil...@gmail.com wrote: On Thu, Jan 15, 2015 at 6:57 PM, Amit Kapila amit.kapil...@gmail.com wrote: On Mon, Jan 12, 2015 at 3:25 AM, Robert Haas robertmh...@gmail.com wrote: Yeah, you need two separate global variables pointing to shm_mq objects, one of which gets used by pqmq.c for errors and the other of which gets used by printtup.c for tuples. Okay, I will try to change the way as suggested without doing switching, but this way we need to do it separately for 'T', 'D', and 'C' messages. I have taken care of integrating the parallel sequence scan with the latest patch posted (parallel-mode-v1.patch) by Robert at below location: http://www.postgresql.org/message-id/ca+tgmozduk4k3xhbxc9vm-82khourezdvqwtfglhwsd2r2a...@mail.gmail.com Changes in this version --- 1. As mentioned previously, I have exposed one parameter ParallelWorkerNumber as used in parallel-mode patch. 2. Enabled tuple queue to be used for passing tuples from worker backend to master backend along with error queue as per suggestion by Robert in the mail above. 3. Involved master backend to scan the heap directly when tuples are not available in any shared memory tuple queue. 4. Introduced 3 new parameters (cpu_tuple_comm_cost, parallel_setup_cost, parallel_startup_cost) for deciding the cost of parallel plan. Currently, I have kept the default values for parallel_setup_cost and parallel_startup_cost as 0.0, as those require some experiments. 5. Fixed some issues (related to memory increase as reported upthread by Thom Brown and general feature issues found during test) Note - I have yet to handle the new node types introduced at some of the places and need to verify prepared queries and some other things, however I think it will be good if I can get some feedback at current stage. Which commit is this based against? 
I'm getting errors with the latest master: It seems to me that you have not applied parallel-mode patch before applying this patch, can you try once again by first applying the patch posted by Robert at below link: http://www.postgresql.org/message-id/ca+tgmozduk4k3xhbxc9vm-82khourezdvqwtfglhwsd2r2a...@mail.gmail.com commit-id used for this patch - 0b49642 With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Parallel Seq Scan
On 20 January 2015 at 16:55, Amit Kapila amit.kapil...@gmail.com wrote: On Tue, Jan 20, 2015 at 9:43 PM, Thom Brown t...@linux.com wrote: On 20 January 2015 at 14:29, Amit Kapila amit.kapil...@gmail.com wrote: On Thu, Jan 15, 2015 at 6:57 PM, Amit Kapila amit.kapil...@gmail.com wrote: On Mon, Jan 12, 2015 at 3:25 AM, Robert Haas robertmh...@gmail.com wrote: Yeah, you need two separate global variables pointing to shm_mq objects, one of which gets used by pqmq.c for errors and the other of which gets used by printtup.c for tuples. Okay, I will try to change the way as suggested without doing switching, but this way we need to do it separately for 'T', 'D', and 'C' messages. I have taken care of integrating the parallel sequence scan with the latest patch posted (parallel-mode-v1.patch) by Robert at below location: http://www.postgresql.org/message-id/ca+tgmozduk4k3xhbxc9vm-82khourezdvqwtfglhwsd2r2a...@mail.gmail.com Changes in this version --- 1. As mentioned previously, I have exposed one parameter ParallelWorkerNumber as used in parallel-mode patch. 2. Enabled tuple queue to be used for passing tuples from worker backend to master backend along with error queue as per suggestion by Robert in the mail above. 3. Involved master backend to scan the heap directly when tuples are not available in any shared memory tuple queue. 4. Introduced 3 new parameters (cpu_tuple_comm_cost, parallel_setup_cost, parallel_startup_cost) for deciding the cost of parallel plan. Currently, I have kept the default values for parallel_setup_cost and parallel_startup_cost as 0.0, as those require some experiments. 5. Fixed some issues (related to memory increase as reported upthread by Thom Brown and general feature issues found during test) Note - I have yet to handle the new node types introduced at some of the places and need to verify prepared queries and some other things, however I think it will be good if I can get some feedback at current stage. Which commit is this based against? 
I'm getting errors with the latest master: It seems to me that you have not applied parallel-mode patch before applying this patch, can you try once again by first applying the patch posted by Robert at below link: http://www.postgresql.org/message-id/ca+tgmozduk4k3xhbxc9vm-82khourezdvqwtfglhwsd2r2a...@mail.gmail.com commit-id used for this patch - 0b49642 D'oh. Yes, you're completely right. Works fine now. Thanks. Thom
Re: [HACKERS] Parallel Seq Scan
On 20 January 2015 at 14:29, Amit Kapila amit.kapil...@gmail.com wrote: On Thu, Jan 15, 2015 at 6:57 PM, Amit Kapila amit.kapil...@gmail.com wrote: On Mon, Jan 12, 2015 at 3:25 AM, Robert Haas robertmh...@gmail.com wrote: Yeah, you need two separate global variables pointing to shm_mq objects, one of which gets used by pqmq.c for errors and the other of which gets used by printtup.c for tuples. Okay, I will try to change the way as suggested without doing switching, but this way we need to do it separately for 'T', 'D', and 'C' messages. I have taken care of integrating the parallel sequence scan with the latest patch posted (parallel-mode-v1.patch) by Robert at below location: http://www.postgresql.org/message-id/ca+tgmozduk4k3xhbxc9vm-82khourezdvqwtfglhwsd2r2a...@mail.gmail.com Changes in this version --- 1. As mentioned previously, I have exposed one parameter ParallelWorkerNumber as used in parallel-mode patch. 2. Enabled tuple queue to be used for passing tuples from worker backend to master backend along with error queue as per suggestion by Robert in the mail above. 3. Involved master backend to scan the heap directly when tuples are not available in any shared memory tuple queue. 4. Introduced 3 new parameters (cpu_tuple_comm_cost, parallel_setup_cost, parallel_startup_cost) for deciding the cost of parallel plan. Currently, I have kept the default values for parallel_setup_cost and parallel_startup_cost as 0.0, as those require some experiments. 5. Fixed some issues (related to memory increase as reported upthread by Thom Brown and general feature issues found during test) Note - I have yet to handle the new node types introduced at some of the places and need to verify prepared queries and some other things, however I think it will be good if I can get some feedback at current stage. 
I'm getting an issue:

➤ psql://thom@[local]:5488/pgbench # set parallel_seqscan_degree = 8;
SET
Time: 0.248 ms

➤ psql://thom@[local]:5488/pgbench # explain select c1 from t1;
                          QUERY PLAN
--------------------------------------------------------------
 Parallel Seq Scan on t1  (cost=0.00..21.22 rows=100 width=4)
   Number of Workers: 8
   Number of Blocks Per Worker: 11
(3 rows)
Time: 0.322 ms

# explain analyse select c1 from t1;
                          QUERY PLAN
---------------------------------------------------------------------------
 Parallel Seq Scan on t1  (cost=0.00..21.22 rows=100 width=4) (actual time=0.024..13.468 rows=100 loops=1)
   Number of Workers: 8
   Number of Blocks Per Worker: 11
 Planning time: 0.040 ms
 Execution time: 13.862 ms
(5 rows)
Time: 14.188 ms

➤ psql://thom@[local]:5488/pgbench # set parallel_seqscan_degree = 10;
SET
Time: 0.219 ms

➤ psql://thom@[local]:5488/pgbench # explain select c1 from t1;
                          QUERY PLAN
--------------------------------------------------------------
 Parallel Seq Scan on t1  (cost=0.00..19.18 rows=100 width=4)
   Number of Workers: 10
   Number of Blocks Per Worker: 9
(3 rows)
Time: 0.375 ms

➤ psql://thom@[local]:5488/pgbench # explain analyse select c1 from t1;

So setting parallel_seqscan_degree above max_worker_processes causes the CPU to max out, and the query never returns, or at least not after waiting 2 minutes. Shouldn't it have a ceiling of max_worker_processes?

The original test I performed where I was getting OOM errors now appears to be fine:

# explain (analyse, buffers, timing) select distinct bid from pgbench_accounts;
                          QUERY PLAN
------------------------------------------------------------------------------
 HashAggregate  (cost=1400411.11..1400412.11 rows=100 width=4) (actual time=8504.333..8504.335 rows=13 loops=1)
   Group Key: bid
   Buffers: shared hit=32 read=18183
   ->  Parallel Seq Scan on pgbench_accounts  (cost=0.00..1375411.11 rows=1000 width=4) (actual time=0.054..7183.494 rows=1000 loops=1)
         Number of Workers: 8
         Number of Blocks Per Worker: 18215
         Buffers: shared hit=32 read=18183
 Planning time: 0.058 ms
 Execution time: 8876.967 ms
(9 rows)
Time: 8877.366 ms

Note that I increased seq_page_cost to force a parallel scan in this case.

Thom
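One plausible reading of the t1 numbers above (11 blocks per worker at 8 workers, 9 at 10 workers, consistent with a 99-block heap) is that the patch splits the heap among the workers plus the master backend, which also scans per change #3. The clamping that Thom's hang suggests is missing is a one-liner. This is a hypothetical sketch of both calculations, not the patch's actual code; the 99-block figure is an inference, not stated in the thread:

```python
def blocks_per_worker(total_blocks, num_workers):
    # The master backend participates in the scan as well (change #3 in
    # Amit's list), so the heap is divided num_workers + 1 ways.
    return total_blocks // (num_workers + 1)

def effective_workers(requested_degree, max_worker_processes):
    # The behavior Thom's hang suggests is missing: clamp the requested
    # parallel degree to the number of workers that can actually start,
    # instead of waiting forever on workers that will never be launched.
    return min(requested_degree, max_worker_processes)
```

With 99 blocks this reproduces both EXPLAIN outputs: 99 // 9 = 11 and 99 // 11 = 9.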
Re: [HACKERS] proposal: searching in array function - array_position
Hi I am sending an updated version - it allows a third optional argument that specifies where the search should start. With it, it is possible to call this function repeatedly. Regards Pavel 2015-01-17 23:43 GMT+01:00 Pavel Stehule pavel.steh...@gmail.com: Hi here is a proof of concept of the array_offset function possible question: * use comparison = or IS NOT DISTINCT FROM In this initial proof of concept I used the IS NOT DISTINCT FROM operator - but my opinion is not strong on this question. Both have some advantages and disadvantages. Regards Pavel 2015-01-16 19:12 GMT+01:00 Pavel Stehule pavel.steh...@gmail.com: 2015-01-16 18:37 GMT+01:00 Jim Nasby jim.na...@bluetreble.com: On 1/16/15 11:16 AM, Pavel Stehule wrote: 2015-01-16 17:57 GMT+01:00 Jim Nasby jim.na...@bluetreble.com mailto: jim.na...@bluetreble.com: On 1/16/15 3:39 AM, Pavel Stehule wrote: I am proposing a simple function that returns the position of an element in an array. Yes please! FUNCTION array_position(anyarray, anyelement) RETURNS int That won't work on a multi-dimensional array. Ideally it needs to accept a slice or an element and return the specifier for the slice. It is a question what the result should be - probably there can be a multidimensional variant, where the result will be an array array_position([1,2,3],2) -- 2 array_position([[1,2],[2,3],[3,4]], [2,3]) -- 2 /* 2nd parameter should have N-1 dimensions of the first parameter */ The problem with that is you can't actually use '2' to get [2,3] back: select (array[[1,2,3],[4,5,6],[7,8,9]])[1] IS NULL; ?column? -- t (1 row) yes, but when you are searching for an array in an array you can use a full slice selection: postgres=# select (ARRAY[[1,2],[4,5]])[1][1:2]; -- [1:2] should be a constant every time in this case -- so it should not be returned array - {{1,2}} (1 row) I think the bigger problem here is we need something better than slices for handling subsets of arrays.
Even if the function returned [2:2] it's still going to behave differently than it will in the non-array case because you won't be getting the expected number of dimensions back. :( you cannot return a slice and I don't propose it, although we can return a range type or an array of range types - but still we cannot use ranges for arrays. array_position_md([1,2,3],2) -- [2] array_position_md([[1,2],[2,3],[3,4]], 2) -- [2,1] another question is how to handle more than one occurrence of a value - probably two sets of functions - the first returns the first occurrence of the value, the second returns the set of occurrences Gee, if only we had some way to return multiple elements of something... ;P In other words, I think all of these should actually return an array of positions. I think it's OK for someone that only cares about the first instance to just do [1]. there can be two functions - position returns the first, and positions returns all as an array -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com

diff --git a/doc/src/sgml/array.sgml b/doc/src/sgml/array.sgml
new file mode 100644
index 9ea1068..b3630b4
*** a/doc/src/sgml/array.sgml
--- b/doc/src/sgml/array.sgml
*** SELECT * FROM sal_emp WHERE pay_by_quart
*** 600,605 ****
--- 600,614 ----
   index, as described in <xref linkend="indexes-types">.
  </para>
+ <para>
+  You can also search for any value in an array using the <function>array_offset</function>
+  function (it returns the position of the first occurrence of a value in the array):
+
+ <programlisting>
+ SELECT array_offset(ARRAY['sun','mon','tue','wen','thu','fri','sat'], 'mon');
+ </programlisting>
+ </para>
+
  <tip>
   <para>
    Arrays are not sets; searching for specific array elements
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
new file mode 100644
index 5e7b000..62c9f7f
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
*** SELECT NULLIF(value, '(none)') ...
*** 11474,11479 ****
--- 11474,11482 ----
    <primary>array_lower</primary>
   </indexterm>
   <indexterm>
+   <primary>array_offset</primary>
+  </indexterm>
+  <indexterm>
    <primary>array_prepend</primary>
   </indexterm>
   <indexterm>
*** SELECT NULLIF(value, '(none)') ...
*** 11592,11597 ****
--- 11595,11613 ----
      </row>
      <row>
       <entry>
+       <literal>
+        <function>array_offset</function>(<type>anyarray</type>, <type>anyelement</type> <optional>, <type>int</type></optional>)
+       </literal>
+      </entry>
+      <entry><type>int</type></entry>
+      <entry>returns the offset of the first occurrence of some element in an array. It uses
+       the <literal>IS NOT DISTINCT FROM</literal> operator for comparison. The third
+       optional argument can specify an initial offset where the search starts.</entry>
+      <entry><literal>array_offset(ARRAY['sun','mon','tue','wen','thu','fri','sat'], 'mon')</literal></entry>
+      <entry><literal>2</literal></entry>
+     </row>
+
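The semantics proposed in this thread — a 1-based position, comparison via IS NOT DISTINCT FROM (so NULL matches NULL), an optional starting offset for repeated calls, and Jim's return-all-positions variant — can be modeled in Python. This is a sketch of the behavior under discussion, not the C implementation; None stands in for SQL NULL, and Python's `==` on None happens to mimic IS NOT DISTINCT FROM, which treats two NULLs as equal:

```python
def array_offset(arr, elem, start=1):
    """1-based position of the first occurrence of elem in arr at or
    after position start, or None (SQL NULL) if not found."""
    for i in range(start - 1, len(arr)):
        if arr[i] == elem:  # None == None is True, like IS NOT DISTINCT FROM
            return i + 1
    return None

def array_positions(arr, elem):
    """All occurrences, per the suggestion of returning an array of positions."""
    return [i + 1 for i, v in enumerate(arr) if v == elem]
```

The third argument allows the repeated-call pattern Pavel describes: call with start one past the previous hit until the function returns NULL.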
Re: [HACKERS] Re: [COMMITTERS] pgsql: Disable -faggressive-loop-optimizations in gcc 4.8+ for pre-9.2
Robert Haas wrote: Would anybody object to me pushing this commit to branches 8.2 and 8.3? Since those branches are out of support, I am not sure what the point is. If we want people to be able to use those branches reasonably we need to back-port fixes for critical security and stability issues, not just this one thing. But maybe I am missing something. I just want to make it easy to compile those branches with current toolset so that I can study their behavior to suggest workarounds for customer problems etc -- nothing more. I am not proposing to open them up again for support. Of course, I can carry the patched branches locally if there is strong opposition, but since it's harmless, I don't see why would there be any such. Another easy workaround is to add -O0 to CFLAGS, and I can script that easily too. Without this patch or -O0, initdb fails with inicializando pg_authid ... FATAL: wrong number of index expressions SENTENCIA: CREATE TRIGGER pg_sync_pg_database AFTER INSERT OR UPDATE OR DELETE ON pg_database FOR EACH STATEMENT EXECUTE PROCEDURE flatfile_update_trigger(); There is the additional problem that contrib/cube fails to compile, but I don't care enough about that one: /pgsql/source/REL8_3_STABLE/contrib/cube/cubeparse.y:61:17: error: ‘result’ undeclared (first use in this function) *((void **)result) = write_box( dim, $2, $4 ); ^ -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Merging postgresql.conf and postgresql.auto.conf
On Mon, Jan 19, 2015 at 10:39 PM, Amit Kapila amit.kapil...@gmail.com wrote: I think whichever process reads postgresql.conf/postgresql.auto.conf has to do this (unless we restrict that this will be done at some other time) and the postmaster is one of them. It seems to me that it is not a good idea unless we do it at some other time, like populating entries for a view. Well, the postmaster does not have database access, and neither do most of the auxiliary processes. The autovacuum launcher has very limited database access (shared catalogs only). Making the postmaster access the database is a complete non-starter; we have worked very hard to avoid that, for reasons of overall system robustness. If the postmaster crashes or locks up, manual intervention is required; if any other process crashes, the postmaster can reset the whole system. Independently of that, it sounds like solving the problem from the wrong end. I think the idea of ALTER SYSTEM .. SET ought to be that you should EITHER edit postgresql.conf for all your configuration changes, OR you should use ALTER SYSTEM .. SET for all of your changes. If you mix-and-match, well, that's certainly allowed and it will work and everything, but you - as a human being - might get confused. Right, but we can't completely eliminate such a possibility (as an example we have some default settings like max_connections, shared_buffers, etc). I agree with you that users should use only one way of changing the settings, however for certain rare cases (default settings or others) we can provide a way for the user to know which of the settings are duplicates. I think if we can provide such information via some view with ease then it's worth having. I'd suggest abandoning the idea of storing anything on a persistent basis in a system catalog and looking for some way for the backend to which the user is connected to expose its own state instead. For example, pg_settings already exposes sourcefile and sourceline settings.
Actually, I'm not quite sure why that's not sufficient here, but if it isn't, it could be extended. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
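The pg_settings approach Robert mentions already exposes sourcefile/sourceline for the winning occurrence of each parameter; reporting duplicates only additionally needs the losing occurrences. A hypothetical sketch of such a duplicate report, using a deliberately simplified parser (real GUC parsing handles quoting, comments mid-value, include directives, etc.):

```python
def parse_gucs(text, filename):
    """Very simplified postgresql.conf parser: one name = value per line."""
    settings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.split('#', 1)[0].strip()   # drop trailing comments
        if '=' in line:
            name, value = line.split('=', 1)
            settings.append((name.strip(), value.strip(), filename, lineno))
    return settings

def duplicate_settings(conf_text, auto_conf_text):
    """Report parameters set in both files; the postgresql.auto.conf
    entry wins because that file is read last."""
    conf = {n: (v, f, l) for n, v, f, l in parse_gucs(conf_text, 'postgresql.conf')}
    dups = {}
    for n, v, f, l in parse_gucs(auto_conf_text, 'postgresql.auto.conf'):
        if n in conf:
            dups[n] = {'overridden': conf[n], 'effective': (v, f, l)}
    return dups
```

This is the backend-local, on-demand computation Robert is suggesting, as opposed to persisting anything in a catalog.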
Re: [HACKERS] parallel mode and parallel contexts
On Tue, Jan 20, 2015 at 9:41 AM, Amit Kapila amit.kapil...@gmail.com wrote: It seems [WaitForBackgroundWorkerShutdown] has the possibility to wait forever. Assume one of the workers is not able to start (not able to attach to shared memory, or some other reason); then the status returned by GetBackgroundWorkerPid() will be BGWH_NOT_YET_STARTED and after that it can wait forever in WaitLatch. I don't think that's right. The status only remains BGWH_NOT_YET_STARTED until the postmaster forks the child process. At that point it immediately changes to BGWH_STARTED. If it starts up and then dies early on, for example because it can't attach to shared memory or some such, the status will change to BGWH_STOPPED. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
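Robert's point is that BGWH_NOT_YET_STARTED is transient: every registered worker eventually moves to BGWH_STARTED at fork and to BGWH_STOPPED on exit, even when it dies during startup, so the shutdown wait always reaches a terminal state. A toy model of that state machine (purely illustrative, not the real handle code):

```python
BGWH_NOT_YET_STARTED, BGWH_STARTED, BGWH_STOPPED = 'not_yet_started', 'started', 'stopped'

def wait_for_shutdown(status_events):
    """Consume one status observation per latch wake-up until the worker
    is seen stopped, mirroring WaitForBackgroundWorkerShutdown's loop:
    any non-stopped status just means wait again."""
    waits = 0
    for status in status_events:
        if status == BGWH_STOPPED:
            return waits
        waits += 1
    raise RuntimeError("worker never reached BGWH_STOPPED")

# A worker that fails to attach to shared memory still transitions
# NOT_YET_STARTED -> STARTED -> STOPPED, so the loop terminates:
crash_early = [BGWH_NOT_YET_STARTED, BGWH_STARTED, BGWH_STOPPED]
```

The scenario Amit worries about would only arise if a status could stay NOT_YET_STARTED forever, which the postmaster's transitions rule out.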
[HACKERS] basebackups during ALTER DATABASE ... SET TABLESPACE ... not safe?
Hi, I'm analyzing a problem in which a customer had a 9.2 cluster, created by pg_basebackup from a standby, that failed with "WAL contains references to invalid pages". The failing record was an xlog redo visibility record, i.e. XLOG_HEAP2_VISIBLE. First I thought there might be another bug along the lines of 17fa4c321cc. Looking at the code and the WAL, that didn't seem to be the case (man, I miss pg_xlogdump). Other, slightly older, standbys didn't seem to have any problems. Logs show that an ALTER DATABASE ... SET TABLESPACE ... was running when the basebackup was started and finished *before* pg_basebackup finished. movedb() basically works in these steps:
1) lock out users of the database
2) RequestCheckpoint(IMMEDIATE|WAIT)
3) DropDatabaseBuffers()
4) copydir()
5) XLogInsert(XLOG_DBASE_CREATE)
6) RequestCheckpoint(CHECKPOINT_IMMEDIATE)
7) rmtree(src_dbpath)
8) XLogInsert(XLOG_DBASE_DROP)
9) unlock database
If a basebackup starts while 4) is in progress and continues until 7) happens, I think a pretty wide race opens: the basebackup can end up with a partial copy of the database in the old tablespace because rmtree(old_path) was concurrently in progress. Normally such races are fixed during replay. But in this case, the replay of the XLOG_DBASE_CREATE record will just try to do a rmtree(new); copydir(old, new); fixing nothing. Besides making AD .. ST use sane WAL logging, which doesn't seem backpatchable, I don't see what could be done against this except somehow making basebackups fail if an AD .. ST is in progress. Which doesn't look entirely trivial either. This is a pretty nasty bug imo, because you're in no way guaranteed to notice. If the problem happens only in some large, seldom-read table, the database might appear to be in a correct state.
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
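The race Andres describes can be illustrated with a toy filesystem model (sets of file names; purely illustrative, with made-up relfilenode names): a basebackup that copies the old directory while step 7's rmtree() is deleting it captures a partial copy, and replaying XLOG_DBASE_CREATE rebuilds the new path from that same damaged source, so recovery never repairs it.

```python
def replay_dbase_create(backup_old_dir):
    # WAL replay of XLOG_DBASE_CREATE is just rmtree(new); copydir(old, new):
    # it rebuilds the new path from whatever the backup captured of the old one.
    return set(backup_old_dir)

def basebackup_during_move(db_files, deleted_during_backup):
    # Step 7's rmtree(src_dbpath) runs concurrently with the base backup,
    # so the backup sees only part of the old directory...
    backup_old_dir = db_files - deleted_during_backup
    # ...and recovery "repairs" the new path from that same partial copy:
    return replay_dbase_create(backup_old_dir)

full_db = {'11111', '16384', '16385_fsm'}   # hypothetical relfilenodes
restored = basebackup_during_move(full_db, {'16384'})
```

The restored database silently lacks a file, which matches the observation that nothing fails until something (like a visibility-map redo) touches the missing pages.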
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Mon, Jan 19, 2015 at 9:29 PM, Peter Geoghegan p...@heroku.com wrote: I think that the attached patch should at least fix that much. Maybe the problem on the other animal is also explained by the lack of this, since there could also be a MinGW-ish strxfrm_l(), I suppose. Committed that, rather blindly, since it looks like a reasonable fix. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Better way of dealing with pgstat wait timeout during buildfarm runs?
On 25.12.2014 22:28, Tomas Vondra wrote: On 25.12.2014 21:14, Andres Freund wrote: That's indeed odd. Seems to have been lost when the statsfile was split into multiple files. Alvaro, Tomas? The goal was to keep the logic as close to the original as possible. IIRC there were pgstat wait timeout issues before, and in most cases the conclusion was that it's probably because of overloaded I/O. But maybe there actually was another bug, and it's entirely possible that the split introduced a new one, and that's what we're seeing now. The strange thing is that the split happened ~2 years ago, which is inconsistent with the sudden increase of this kind of issue. So maybe something changed on that particular animal (a failing SD card causing I/O stalls, perhaps)? Anyway, I happen to have a spare Raspberry Pi, so I'll try to reproduce and analyze the issue locally. But that won't happen until January. I've tried to reproduce this on my Raspberry Pi 'machine' and it's not very difficult to trigger this. About 7 out of 10 'make check' runs fail because of 'pgstat wait timeout'. All the occurrences I've seen were right after some sort of VACUUM (sometimes plain, sometimes ANALYZE or FREEZE), and the I/O at the time looked something like this:

Device:  rrqm/s  wrqm/s   r/s   w/s  rkB/s  wkB/s  avgrq-sz  avgqu-sz     await  r_await   w_await   svctm  %util
mmcblk0    0.00   75.00  0.00  8.00   0.00  36.00      9.00      5.73  15633.75     0.00  15633.75  125.00 100.00

So pretty terrible (this is a Class 4 SD card, supposedly able to handle 4 MB/s). If hamster had a faulty SD card, it might have been much worse, I guess. This of course does not prove the absence of a bug - I plan to dig into this a bit more. Feel free to point out some suspicious scenarios that might be worth reproducing and analyzing.
-- Tomas Vondrahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
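The failure mode Tomas reproduces is a polling loop hitting its deadline: a backend requests fresh statistics and repeatedly checks for a stats file newer than its request, giving up with the warning after a fixed budget. This is a hedged sketch of that loop's shape, not the actual pgstat.c code; the 10-second budget and the mutable fake clock (so the sketch runs without sleeping) are illustrative assumptions:

```python
def wait_for_fresh_stats(file_mtime, request_time, clock, max_wait_ms=10000, delay_ms=10):
    """Poll until the stats file is newer than our request, or give up
    with the familiar warning.  clock is a mutable [now_ms] so this
    sketch advances time explicitly instead of sleeping."""
    deadline = clock[0] + max_wait_ms
    while clock[0] < deadline:
        if file_mtime() >= request_time:
            return True
        clock[0] += delay_ms          # stand-in for pg_usleep between checks
    print('WARNING: pgstat wait timeout')
    return False

# A collector stalled behind ~15-second write latency (like the 15633 ms
# w_await above) never produces a fresh file inside the budget:
clock = [0]
ok = wait_for_fresh_stats(lambda: -1, 0, clock)
```

On hardware like the Class 4 SD card above, a single fsync stall longer than the budget is enough to trigger the warning with no bug anywhere in the stats code.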
Re: [HACKERS] Parallel Seq Scan
On 20 January 2015 at 14:29, Amit Kapila amit.kapil...@gmail.com wrote: On Thu, Jan 15, 2015 at 6:57 PM, Amit Kapila amit.kapil...@gmail.com wrote: On Mon, Jan 12, 2015 at 3:25 AM, Robert Haas robertmh...@gmail.com wrote: Yeah, you need two separate global variables pointing to shm_mq objects, one of which gets used by pqmq.c for errors and the other of which gets used by printtup.c for tuples. Okay, I will try to change the way as suggested without doing switching, but this way we need to do it separately for 'T', 'D', and 'C' messages. I have taken care of integrating the parallel sequence scan with the latest patch posted (parallel-mode-v1.patch) by Robert at below location: http://www.postgresql.org/message-id/ca+tgmozduk4k3xhbxc9vm-82khourezdvqwtfglhwsd2r2a...@mail.gmail.com Changes in this version --- 1. As mentioned previously, I have exposed one parameter ParallelWorkerNumber as used in parallel-mode patch. 2. Enabled tuple queue to be used for passing tuples from worker backend to master backend along with error queue as per suggestion by Robert in the mail above. 3. Involved master backend to scan the heap directly when tuples are not available in any shared memory tuple queue. 4. Introduced 3 new parameters (cpu_tuple_comm_cost, parallel_setup_cost, parallel_startup_cost) for deciding the cost of parallel plan. Currently, I have kept the default values for parallel_setup_cost and parallel_startup_cost as 0.0, as those require some experiments. 5. Fixed some issues (related to memory increase as reported upthread by Thom Brown and general feature issues found during test) Note - I have yet to handle the new node types introduced at some of the places and need to verify prepared queries and some other things, however I think it will be good if I can get some feedback at current stage. Which commit is this based against? 
I'm getting errors with the latest master: thom@swift:~/Development/postgresql$ patch -p1 ~/Downloads/parallel_seqscan_v4.patch patching file src/backend/access/Makefile patching file src/backend/access/common/printtup.c patching file src/backend/access/shmmq/Makefile patching file src/backend/access/shmmq/shmmqam.c patching file src/backend/commands/explain.c Hunk #1 succeeded at 721 (offset 8 lines). Hunk #2 succeeded at 918 (offset 8 lines). Hunk #3 succeeded at 1070 (offset 8 lines). Hunk #4 succeeded at 1337 (offset 8 lines). Hunk #5 succeeded at 2239 (offset 83 lines). patching file src/backend/executor/Makefile patching file src/backend/executor/execProcnode.c patching file src/backend/executor/execScan.c patching file src/backend/executor/execTuples.c patching file src/backend/executor/nodeParallelSeqscan.c patching file src/backend/executor/nodeSeqscan.c patching file src/backend/libpq/pqmq.c Hunk #1 succeeded at 23 with fuzz 2 (offset -3 lines). Hunk #2 FAILED at 63. Hunk #3 succeeded at 132 (offset -31 lines). 1 out of 3 hunks FAILED -- saving rejects to file src/backend/libpq/pqmq.c.rej patching file src/backend/optimizer/path/Makefile patching file src/backend/optimizer/path/allpaths.c patching file src/backend/optimizer/path/costsize.c patching file src/backend/optimizer/path/parallelpath.c patching file src/backend/optimizer/plan/createplan.c patching file src/backend/optimizer/plan/planner.c patching file src/backend/optimizer/plan/setrefs.c patching file src/backend/optimizer/util/pathnode.c patching file src/backend/postmaster/Makefile patching file src/backend/postmaster/backendworker.c patching file src/backend/postmaster/postmaster.c patching file src/backend/tcop/dest.c patching file src/backend/tcop/postgres.c Hunk #1 succeeded at 54 (offset -1 lines). Hunk #2 succeeded at 1132 (offset -1 lines). 
patching file src/backend/utils/misc/guc.c patching file src/backend/utils/misc/postgresql.conf.sample can't find file to patch at input line 2105 Perhaps you used the wrong -p or --strip option? The text leading up to this was: -- |diff --git a/src/include/access/parallel.h b/src/include/access/parallel.h |index 761ba1f..00ad468 100644 |--- a/src/include/access/parallel.h |+++ b/src/include/access/parallel.h -- File to patch: -- Thom
Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates
On Mon, Jan 19, 2015 at 10:53 PM, tim_wilson tim.wil...@telogis.com wrote: Was slightly optimistic that this comment in the release notes might mean that my bug with bloat on hot tables might have been fixed in 9.4: "Make VACUUM properly report dead but not-yet-removable rows to the statistics collector (Hari Babu). Previously these were reported as live rows." Unfortunately that is not the case, and we still have the problem in 9.4 As far as I can tell from reviewing the thread, what we need to do here is basically invent HEAPTUPLE_INSERT_AND_DELETE_IN_PROGRESS. There was a lot of concern about doing that in the back-branches, but I'm skeptical that the concern is justified. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
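The case Robert wants a new result code for — a tuple whose inserting and deleting transactions are both still in progress — can be shown with a toy classifier. This is an illustration of the proposed distinction, not the real HeapTupleSatisfiesVacuum logic (which deals in transaction IDs, hint bits, and cutoff xids rather than these simplified state strings):

```python
def htsv_classify(xmin_state, xmax_state):
    """xmin_state / xmax_state: 'committed', 'aborted', 'in_progress',
    or None for xmax when the tuple was never deleted."""
    if xmin_state == 'aborted':
        return 'DEAD'
    if xmin_state == 'in_progress':
        # The gap under discussion: an in-progress insert that has also
        # been deleted in progress is currently folded into one of the
        # existing states, miscounting dead-but-not-removable rows.
        if xmax_state == 'in_progress':
            return 'INSERT_AND_DELETE_IN_PROGRESS'   # the proposed new state
        return 'INSERT_IN_PROGRESS'
    # xmin committed from here on:
    if xmax_state is None or xmax_state == 'aborted':
        return 'LIVE'
    if xmax_state == 'in_progress':
        return 'DELETE_IN_PROGRESS'
    return 'RECENTLY_DEAD'   # delete committed; removable once old enough
```

The point of the extra state is that callers (like VACUUM's statistics reporting) can then count such tuples correctly instead of as plain inserts.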
Re: [HACKERS] New CF app deployment
2015-01-20 19:16 GMT+01:00 Pavel Stehule pavel.steh...@gmail.com: Hi I cannot to set my name as author for patch: https://commitfest.postgresql.org/4/112/ It is solved now - I don't understand a autocomplete in first moment All works well Regards Pavel Regards Pavel 2015-01-13 6:35 GMT+01:00 Magnus Hagander mag...@hagander.net: Hi! Last I said something about the new CF app I said I was planning to deploy it over the holidays, and that clearly did not happen. But I've been talking to Michael, as the current cf-chief, and doing some further testing with it, and we think now is a good time to go for it :) As the plan is to close out the current CF just a few days from now, we're going to use that and try to swap it out when traffic is at least at it's lowest (even if we're well aware it's not going to be zero). So based on this, we plan to: In the late evening on Jan 19th (evening European time that is), I will make the current CF app readonly, and move it to a new url where it will remain available for the foreseeable future (we have a lot of useful data in it after all). Once this is done, Michael (primarily) will start working on syncing up the information about the latest patches into the new app. Much info is already synced there, but to make sure all the latest changes are included. In the morning European time on the 20th, I'll take over and try to finish up what's left. And sometime during the day when things are properly in sync, we'll open up the new app for business to all users. There are surely some bugs remaining in the system, so please have some oversight with that over the coming days/weeks. I'll try to get onto fixing them as soon as I can - and some others can look at that as well if it's something critical at least. Further status updates to come as we start working on it... -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Reducing buildfarm disk usage: remove temp installs when done
On 01/19/2015 09:53 AM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: But I'm wondering if we should look at using the tricks git-new-workdir uses, setting up symlinks instead of a full clone. Then we'd have one clone with a bunch of different work dirs. That plus a bit of explicitly done garbage collection and possibly a periodic re-clone might do the trick. Yeah, I was wondering whether it'd be okay to depend on git-new-workdir. That would fix the problem pretty nicely. But in the installations I've seen, that's not in PATH but squirreled away in some hard-to-guess library directory ... We should move this discussion to the buildfarm members list. I'll be publishing a patch there. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [COMMITTERS] pgsql: Disable -faggressive-loop-optimizations in gcc 4.8+ for pre-9.2
Tom Lane wrote: Disable -faggressive-loop-optimizations in gcc 4.8+ for pre-9.2 branches. With this optimization flag enabled, recent versions of gcc can generate incorrect code that assumes variable-length arrays (such as oidvector) are actually fixed-length because they're embedded in some larger struct. The known instance of this problem was fixed in 9.2 and up by commit 8137f2c32322c624e0431fac1621e8e9315202f9 and followon work, which hides actually-variable-length catalog fields from the compiler altogether. And we plan to gradually convert variable-length fields to official flexible array member notation over time, which should prevent this type of bug from reappearing as gcc gets smarter. We're not going to try to back-port those changes into older branches, though, so apply this band-aid instead. Would anybody object to me pushing this commit to branches 8.2 and 8.3? -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [COMMITTERS] pgsql: Disable -faggressive-loop-optimizations in gcc 4.8+ for pre-9.2
On 2015-01-20 11:10:53 -0500, Robert Haas wrote: On Tue, Jan 20, 2015 at 10:30 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Tom Lane wrote: Disable -faggressive-loop-optimizations in gcc 4.8+ for pre-9.2 branches. With this optimization flag enabled, recent versions of gcc can generate incorrect code that assumes variable-length arrays (such as oidvector) are actually fixed-length because they're embedded in some larger struct. The known instance of this problem was fixed in 9.2 and up by commit 8137f2c32322c624e0431fac1621e8e9315202f9 and followon work, which hides actually-variable-length catalog fields from the compiler altogether. And we plan to gradually convert variable-length fields to official flexible array member notation over time, which should prevent this type of bug from reappearing as gcc gets smarter. We're not going to try to back-port those changes into older branches, though, so apply this band-aid instead. Would anybody object to me pushing this commit to branches 8.2 and 8.3? Since those branches are out of support, I am not sure what the point is. If we want people to be able to use those branches reasonably we need to back-port fixes for critical security and stability issues, not just this one thing. But maybe I am missing something. Supporting and being able to compile and run 'make check' (which doesn't complete with gcc >= 4.8) aren't the same thing though. And we e.g. try to provide pg_dump and libpq support for older versions, which is hard to ensure if you can't run them. I personally think that being able to at least compile/make check old versions a bit longer is a good idea. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [COMMITTERS] pgsql: Disable -faggressive-loop-optimizations in gcc 4.8+ for pre-9.2
On Tue, Jan 20, 2015 at 11:18 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Robert Haas wrote: Would anybody object to me pushing this commit to branches 8.2 and 8.3? Since those branches are out of support, I am not sure what the point is. If we want people to be able to use those branches reasonably we need to back-port fixes for critical security and stability issues, not just this one thing. But maybe I am missing something. I just want to make it easy to compile those branches with current toolset so that I can study their behavior to suggest workarounds for customer problems etc -- nothing more. I am not proposing to open them up again for support. Oh, I see. Well, that doesn't bother me, then. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New CF app deployment
Hi I cannot set my name as author for patch: https://commitfest.postgresql.org/4/112/ Regards Pavel 2015-01-13 6:35 GMT+01:00 Magnus Hagander mag...@hagander.net: Hi! Last I said something about the new CF app I said I was planning to deploy it over the holidays, and that clearly did not happen. But I've been talking to Michael, as the current cf-chief, and doing some further testing with it, and we think now is a good time to go for it :) As the plan is to close out the current CF just a few days from now, we're going to use that and try to swap it out when traffic is at least at its lowest (even if we're well aware it's not going to be zero). So based on this, we plan to: In the late evening on Jan 19th (evening European time that is), I will make the current CF app readonly, and move it to a new url where it will remain available for the foreseeable future (we have a lot of useful data in it after all). Once this is done, Michael (primarily) will start working on syncing up the information about the latest patches into the new app. Much info is already synced there, but to make sure all the latest changes are included. In the morning European time on the 20th, I'll take over and try to finish up what's left. And sometime during the day when things are properly in sync, we'll open up the new app for business to all users. There are surely some bugs remaining in the system, so please have some oversight with that over the coming days/weeks. I'll try to get onto fixing them as soon as I can - and some others can look at that as well if it's something critical at least. Further status updates to come as we start working on it... -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] PATCH: decreasing memory needlessly consumed by array_agg
On Tue, Jan 20, 2015 at 6:44 AM, Tom Lane t...@sss.pgh.pa.us wrote: Jeff Davis pg...@j-davis.com writes: Tom (tgl), Is my reasoning above acceptable? Uh, sorry, I've not been paying any attention to this thread for awhile. What are the remaining questions at issue? This patch is trying to improve the array_agg case where there are many arrays being constructed simultaneously, such as in HashAgg. You strongly suggested that a committable patch would differentiate between the grouped and ungrouped cases (or perhaps you meant differentiate between HashAgg and sorted aggregation?). Tomas's current patch does not do so; it does two main things: 1. Uses a common memory context for arrays being constructed by array_agg in any context (ungrouped, sorted agg, and HashAgg) 2. Reduces the initial array allocation to 8 elements from 64, but preserves doubling behavior I don't see either as a big problem, but perhaps there are some downsides in some cases. I think a worst case would be a slowdown in the sorted agg case where every group has 64 elements, so I'll try to see if that's a real problem or not. If you saw a bigger problem, please let me know; and if not, I'll proceed with the review. There are also some other concerns I have about the API ugliness, which I believe is the reason there's so much discussion about making the comments better. The reason the API is ugly is for backwards compatibility, so there's no perfect solution. Your opinion is welcome here too, but I mainly need to see if your objection above has been addressed. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
On Tue, Jan 20, 2015 at 1:05 AM, Abhijit Menon-Sen a...@2ndquadrant.com wrote: So when I'm trying to decide what to audit, I have to: (a) check if the current user is mentioned in .roles; if so, audit. (b) check if the current user is a descendant of one of the roles mentioned in .roles; if not, no audit. (c) check for permissions granted to the root role and see if that tells us to audit. Is that right? If so, it would work fine. I don't look forward to trying to explain it to people, but yes, it would work (for anything you could grant permissions for). I think this points to a fundamental weakness in the idea of doing this through the GRANT system. Some people are going to want to audit everything a particular user does, some people are going to want to audit all access to particular objects, and some people will have more complicated requirements. Some people will want to audit even super-users, others especially super-users, others only non super-users. None of this necessarily matches up to the usual permissions framework. Have you considered splitting pgaudit.c into multiple files, perhaps along the pre/post deparse lines? If such a split were done properly, then could the backwards-compatible version be more acceptable for inclusion in contrib in 9.5? If so, I'll look into it. We're not going to include code in contrib that has leftovers in it for compatibility with earlier source trees. That's been discussed on this mailing list many times and the policy is clear. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Merging postgresql.conf and postgresql.auto.conf
On 1/16/15 10:32 PM, David G Johnston wrote: One thought I have in this line is that currently there doesn't seem to be a way to know if the setting has an entry both in postgresql.conf and postgresql.auto.conf; if we can have some way of knowing the same (pg_settings?), then it could be convenient for the user to decide if the value in postgresql.auto.conf is useful or not, and if it's not useful then use the ALTER SYSTEM .. RESET command to remove the same from postgresql.auto.conf. I think one way is for pg_settings to include the file name for each variable, but that would not reflect the current status of postgresql.conf, so we would need to parse postgresql.conf again at that time. Yeah, that could be a possibility, but I think that will break the existing commands, as this is the common infrastructure used for SHOW .. commands as well, which display the guc value that is used by the current session rather than the value in postgresql.conf. You're right. pg_settings and the SHOW command use the value in the current session rather than the config file. It might break this common infrastructure. Two changes solve this problem in what seems to be a clean way. 1) Upon each parsing of postgresql.conf we store all assigned variables somewhere Parsing is relatively cheap, and it's not like we need high performance from this. So, -1 on permanent storage. 2) We display these assignments in a new pg_settings column named system_reset_val I would also extend this to include: a) upon each parsing of postgresql.auto.conf we store all assigned variables somewhere (maybe the same place as postgresql.conf and simply label the file source) You cannot assume there are only postgresql.conf and postgresql.auto.conf. Complex environments will have multiple included files. b) add an alter_system_val field to show that value (or null) c) add a db_role_val to show the current value for the session via pg_db_role_setting You're forgetting that there are also per-role settings. 
And I'm with Robert; what's wrong with sourcefile and sourceline? Perhaps we just need to teach those about ALTER ROLE SET and ALTER DATABASE SET (if they don't already know about them). c.1) add a db_role_id to show the named user that is being used for the db_role_val lookup The thinking for c.1 is that in situations with role hierarchies and SET ROLE usage it would be nice to be able to query what the connection role - the one used during variable lookup - is. I'm losing track of exactly what we're trying to solve here, but... If the goal is to figure out what settings would be in place for a specific user connecting to a specific database, then we should create a SRF that does just that (accepting a database name and role name). You could then do... SELECT * FROM pg_show_all_settings( 'database', 'role' ) a; I'm probably going overkill on this but there are not a lot of difference sources nor do they change frequently so extending the pg_settings view to be more of a one-stop-shopping for this information seems to make sense. Speaking of overkill... one thing that you currently can't do is find out what #includes have been processed. Perhaps it's worth having a SRF that would return that info... As it relates back to this thread the desired merging ends up being done inside this view and at least gives the DBA a central location (well, along with pg_db_role_setting) to go and look at the configuration landscape for the system. I think the goal is good, but the interface needs to be rethought. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Merging postgresql.conf and postgresql.auto.conf
On Tue, Jan 20, 2015 at 1:24 PM, Jim Nasby jim.na...@bluetreble.com wrote: On 1/16/15 10:32 PM, David G Johnston wrote: Two changes solve this problem in what seems to be a clean way. 1) Upon each parsing of postgresql.conf we store all assigned variables somewhere Parsing is relatively cheap, and it's not like we need high performance from this. So, -1 on permanent storage. OK 2) We display these assignments in a new pg_settings column named system_reset_val I would also extend this to include: a) upon each parsing of postgresql.auto.conf we store all assigned variables somewhere (maybe the same place as postgresql.conf and simply label the file source) You can not assume there are only postgresql.conf and postgresql.auto.conf. Complex environments will have multiple included files. #include files still appear to come from postgresql.conf; I'm not proposing we try and memorize every single instance of a variable declaration and provide a global overlaps query - though that piece already seems to be in place... b) add an alter_system_val field to show that value (or null) c) add a db_role_val to show the current value for the session via pg_db_role_setting You're forgetting that there are also per-role settings. And I'm with Robert; what's wrong with sourcefile and sourceline? Perhaps we just need to teach those about ALTER ROLE SET and ALTER DATABASE SET (if they don't already know about them). Actually, there are not separate PER ROLE and PER DATABASE settings even though there are different SQL commands. The catalog is simply pg_db_role_setting with the use of all tags (i.e., 0) as necessary. But I see where you are going and do not disagree that precedence information could be useful. 
sourceline and sourcefile pertain only to the current value while the point of adding these other pieces is to provide a snapshot of all the different mappings that the system knows about; instead of having to tell a user to go look in two different files (and associated includes) and a database catalog to find out what possible values are in place. That doesn't solve the need to scan the catalog to see other possible values - though you could at least put a counter in pg_settings that indicates how many pg_db_role_setting entries reference the given variable so that if non-zero the user is clued into the fact that they need to check out said catalog table. c.1) add a db_role_id to show the named user that is being used for the db_role_val lookup The thinking for c.1 is that in situations with role hierarchies and SET ROLE usage it would be nice to be able to query what the connection role - the one used during variable lookup - is. I'm losing track of exactly what we're trying to solve here, but... If the goal is to figure out what settings would be in place for a specific user connecting to a specific database, then we should create a SRF that does just that (accepting a database name and role name). You could then do... SELECT * FROM pg_show_all_settings( 'database', 'role' ) a; This is fine - but I'm thinking about situations where a user immediately SET ROLE on their session and typically operate as said user; if they try doing an ALTER ROLE SET for this SET ROLE user it will not work because their login user is what matters. This is probably a solution looking for a problem but it is a dynamic that one needs to be aware of. I'm probably going overkill on this but there are not a lot of difference sources nor do they change frequently so extending the pg_settings view to be more of a one-stop-shopping for this information seems to make sense. Speaking of overkill... one thing that you currently can't do is find out what #includes have been processed. 
Perhaps it's worth having a SRF that would return that info... As it relates back to this thread the desired merging ends up being done inside this view and at least gives the DBA a central location (well, along with pg_db_role_setting) to go and look at the configuration landscape for the system. I think the goal is good, but the interface needs to be rethought. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com My main concern with the UI would be too-much-information; but otherwise, if we accept the premise that we should include as much as possible and then let the user (or us) provide more useful subsets based upon common use-cases, the main issue is making sure we've identified all of the relevant information that needs to be captured - even if it is something as minor as the whole logon vs. current role dynamic. I'm ignoring the cost/benefit aspect of implementation for the moment, largely because I don't know enough about the backend to make reasonable comments. But much of the data is already present in various views and catalogs - I just think having a one-stop shop would be useful and go a long
Re: [HACKERS] logical column ordering
I've decided to abandon this patch. I have spent too much time looking at it now. If anyone is interested in trying to study, I can provide the patches I came up with, explanations, and references to prior discussion -- feel free to ask. My main motivation for this work is to enable a later patch for column stores. Right now, since columns have monotonically increasing attnums, it's rather difficult to have columns that are stored elsewhere. My plan for that now is much more modest, something like adding a constant 1 to attnums and that would let us identify columns that are outside the heap -- or something like that. I haven't fully worked it out yet. Just a few quick notes about this patch: last thing I was doing was mess with setrefs.c so that Var nodes carry varphysnum annotations, which are set to 0 during initial planner phases, and are turned into the correct attphysnum (the value extracted from catalogs) so that TupleDescs constructed from targetlists by ExecTypeFromTL and friends can have the correct attphysnum too. I think this part works correctly, with the horrible exception that I had to do a relation_open() in setrefs.c to get hold of the right attphysnum from a tupledesc obtained from catalogs. That's not acceptable at all; I think the right way to do this would be to store a list of numbers earlier (not sure when) and store it in RelOptInfo or RangeTableEntry; that would be accesible during setrefs.c. The other bit I did was modify all the heaptuple.c code so that it could deal correctly with tupledescs that have attphysnums and attlognum in an order different from stock attnum. That took some time to get right, but I think it's also correct now. One issue I had was finding places that use attnum as an index into the tupledesc attrs array. I had to examine all these places and change them to use a physattrs array, which is an array that has been sorted by physical number. 
I don't think all the changes are correct, and I'm not sure that I caught them all. Anyway it seems to me that this is mostly there. If somebody is interested in learning executor code, this project would be damn cool to get done. -- Álvaro Herrera    http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: decreasing memory needlessly consumed by array_agg
Hi, On 20.1.2015 21:13, Jeff Davis wrote: On Tue, Jan 20, 2015 at 6:44 AM, Tom Lane t...@sss.pgh.pa.us wrote: Jeff Davis pg...@j-davis.com writes: Tom (tgl), Is my reasoning above acceptable? Uh, sorry, I've not been paying any attention to this thread for awhile. What are the remaining questions at issue? This patch is trying to improve the array_agg case where there are many arrays being constructed simultaneously, such as in HashAgg. You strongly suggested that a committable patch would differentiate between the grouped and ungrouped cases (or perhaps you meant differentiate between HashAgg and sorted aggregation?). Tomas's current patch does not do so; it does two main things: I don't think that's entirely true. The problem with the initial (experimental) patch was that while it fixed aggregate queries, it mostly ignored all the other callers, and either resulted in memory corruption (unexpected pfree) or bloat (when not doing the pfree). Tom's message where he points that out is here: http://www.postgresql.org/message-id/20707.1396372...@sss.pgh.pa.us The current patch makes that distinction properly (IMHO) - all the callers using the underlying array functions will use the original approach (with subcontexts), and only array_agg uses the new API and forces subcontext=false. 1. Uses a common memory context for arrays being constructed by array_agg in any context (ungrouped, sorted agg, and HashAgg) 2. Reduces the initial array allocation to 8 elements from 64, but preserves doubling behavior Yes, that's true. I'd like to point out that while the current code uses 64 items, it also uses 8kB per-group contexts. That's slightly overkill I guess ... I don't see either as a big problem, but perhaps there are some downsides in some cases. I think a worst case would be a slowdown in the sorted agg case where every group has 64 elements, so I'll try to see if that's a real problem or not. 
If you saw a bigger problem, please let me know; and if not, I'll proceed with the review. FWIW I've done a fair number of measurements and not noticed any measurable difference (unless using a rather crazy testcase, IIRC). Certainly the issues with excessive memory consumption (and swapping) were much worse. There are also some other concerns I have about the API ugliness, which I believe is the reason there's so much discussion about making the comments better. The reason the API is ugly is for backwards compatibility, so there's no perfect solution. Your opinion is welcome here too, but I mainly need to see if your objection above has been addressed. I generally agree that having two API 'facets' with different behavior is slightly awkward and asymmetric, but I wouldn't call that ugly. I actually modified both APIs initially, but I think Ali is right that not breaking the existing API (and keeping the original behavior in that case) is better. We can break it any time we want in the future, but it's impossible to unbreak it ;-) regards -- Tomas Vondra    http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Tue, Jan 20, 2015 at 3:34 PM, Peter Geoghegan p...@heroku.com wrote: On Tue, Jan 20, 2015 at 3:34 PM, Robert Haas robertmh...@gmail.com wrote: Dear me. Peter, can you fix this RSN? Investigating. It's certainly possible to fix Andrew's test case with the attached. I'm not sure that that's the appropriate fix, though: there is probably a case to be made for not bothering with abbreviation once we've read tuples in for the final merge run. More likely, the strongest case is for storing the abbreviated keys on disk too, and reading those back. -- Peter Geoghegan

diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index 6d3aa88..adf4c4d 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -3148,6 +3148,7 @@ readtup_heap(Tuplesortstate *state, SortTuple *stup,
 	MinimalTuple tuple = (MinimalTuple) palloc(tuplen);
 	char	   *tupbody = (char *) tuple + MINIMAL_TUPLE_DATA_OFFSET;
 	HeapTupleData htup;
+	Datum		original;
 
 	USEMEM(state, GetMemoryChunkSpace(tuple));
 	/* read in the tuple proper */
@@ -3161,10 +3162,29 @@ readtup_heap(Tuplesortstate *state, SortTuple *stup,
 	/* set up first-column key value */
 	htup.t_len = tuple->t_len + MINIMAL_TUPLE_OFFSET;
 	htup.t_data = (HeapTupleHeader) ((char *) tuple - MINIMAL_TUPLE_OFFSET);
-	stup->datum1 = heap_getattr(&htup,
-								state->sortKeys[0].ssup_attno,
-								state->tupDesc,
-								&stup->isnull1);
+	/* Once again, store abbreviated key representation */
+	original = heap_getattr(&htup,
+							state->sortKeys[0].ssup_attno,
+							state->tupDesc,
+							&stup->isnull1);
+
+	if (!state->sortKeys->abbrev_converter || stup->isnull1)
+	{
+		/*
+		 * Store ordinary Datum representation, or NULL value.  If there is a
+		 * converter it won't expect NULL values, and cost model is not
+		 * required to account for NULL, so in that case we avoid calling
+		 * converter and just set datum1 to void representation (to be
+		 * consistent).
+		 */
+		stup->datum1 = original;
+	}
+	else
+	{
+		/* Store abbreviated key representation */
+		stup->datum1 = state->sortKeys->abbrev_converter(original,
+														 state->sortKeys);
+	}
 }
 
 /*

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Better way of dealing with pgstat wait timeout during buildfarm runs?
On 21.1.2015 00:38, Michael Paquier wrote: On Wed, Jan 21, 2015 at 1:08 AM, Tomas Vondra I've tried to reproduce this on my Raspberry PI 'machine' and it's not very difficult to trigger this. About 7 out of 10 'make check' runs fail because of 'pgstat wait timeout'. All the occurrences I've seen were right after some sort of VACUUM (sometimes plain, sometimes ANALYZE or FREEZE), and the I/O at the time looked something like this:

Device:  rrqm/s  wrqm/s   r/s   w/s  rkB/s  wkB/s avgrq-sz avgqu-sz    await  r_await  w_await  svctm  %util
mmcblk0    0.00   75.00  0.00  8.00   0.00  36.00     9.00     5.73 15633.75     0.00 15633.75 125.00 100.00

So pretty terrible (this is a Class 4 SD card, supposedly able to handle 4 MB/s). If hamster had a faulty SD card, it might have been much worse, I guess. By experience, a class 10 is at least necessary, with a minimum amount of memory to minimize the appearance of those warnings, hamster having now an 8GB class 10 card. Well, my goal was exactly to produce those warnings ;-) and see if I can identify some strange cases. That's why I chose just class 4. But even then it produces a rather low number of those warnings (one or two per check run), and mostly at the expected places with significant I/O overload. So I'm not any wiser :-( regards Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Tue, Jan 20, 2015 at 6:27 PM, Andrew Gierth and...@tao11.riddles.org.uk wrote: Robert == Robert Haas robertmh...@gmail.com writes: Robert All right, it seems Tom is with you on that point, so after Robert some study, I've committed this with very minor modifications. While hacking up a patch to demonstrate the simplicity of extending this to the Datum sorter, I seem to have run into a fairly major issue with this: there seems to be no attempt whatsoever to handle spilling to disk correctly. The data spilled to disk only has the un-abbreviated values, but nothing tries to re-abbreviate it (or disable abbreviations) when it is read back in, and chaos ensues: Dear me. Peter, can you fix this RSN? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Tue, Jan 20, 2015 at 3:46 AM, Andrew Gierth and...@tao11.riddles.org.uk wrote: The comment in tuplesort_begin_datum that abbreviation can't be used seems wrong to me; why is the copy of the original value pointed to by stup-tuple (in the case of by-reference types, and abbreviation is obviously not needed for by-value types) not sufficient? We haven't formalized the idea that pass-by-value types are not targets for abbreviation (it's just that the practical application of abbreviated keys is likely to be limited to pass-by-reference types, generating a compact pass-by-value abbreviated representation). That could be a useful restriction to formalize, and certainly seems likely to be a harmless one, but for now that's the way it is. It might be sufficient for some tuplesort_begin_datum() callers. Datum tuple sorts require the original values. Aside from the formalization of abbreviation only applying to pass-by-value types, you'd have to teach tuplesort_getdatum() to reconstruct the non-abbreviated representation transparently from each SortTuple's tuple proper. However, the actual tuplesort_getdatum() calls could be the dominant cost, not the sort (I'm not sure of that offhand - that's total speculation). Basically, the intersection of the datum sort case with abbreviated keys seems complicated. I tended to think that the solution was to force a heaptuple sort instead (where abbreviation naturally can be used), since clearly that could work in some important cases like nodeAgg.c, iff the gumption to do it that way was readily available. Rightly or wrongly, I preferred that idea to the idea of teaching the Datum case to handle abbreviation across the board. Maybe that's the wrong way of fixing that, but for now I don't think it's acceptable that abbreviation isn't always used in certain cases where it could make sense (e.g. not for simple GroupAggregates with a single attribute -- only multiple attribute GroupAggregates). After all, most sort cases (e.g. 
B-Tree builds) didn't use SortSupport for several years, simply because no one got around to it until I finally did a few months back. Note that most tuplesort non-users of abbreviation don't use abbreviation for sensible reasons. For example, abbreviation simply doesn't make sense for Top-N heap sorts, or MJExamineQuals(). The non-single-attribute GroupAggregate/nodeAgg.c case seems bad, but I don't have a good sense of how bad the orderedsetaggs.c non-use is...it might matter less than the other case. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 01/20/2015 01:26 PM, Arne Scheffer wrote: Interesting patch. I did a quick review looking only into the patch file. The sum of variances variable contains the sum of squared differences instead, I think. Umm, no. It's not. e->counters.sum_var_time += (total_time - old_mean) * (total_time - e->counters.mean_time); This is not a square that's being added. old_mean is not the same as e->counters.mean_time. Since the variance is this value divided by (n - 1), AIUI, I think sum of variances isn't a bad description. I'm open to alternative suggestions. And a very minor aspect: The term standard deviation in your code stands for (corrected) sample standard deviation, I think, because you divide by n-1 instead of n to keep the estimator unbiased. How about mentioning the prefix sample to indicate this being the estimator? I don't understand. I'm following pretty exactly the calculations stated at http://www.johndcook.com/blog/standard_deviation/ I'm not a statistician. Perhaps others who are more literate in statistics can comment on this paragraph. And I'm sure I'm missing C specifics (again) (or it's the reduced patch file scope), but you introduce sqrtd, but sqrt is called? Good catch. Will fix. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
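For readers following along, the recurrence from the John D. Cook page that the quoted line implements (Welford's method) can be written out as follows, with x_k the latest total_time, M_k the running mean, and S_k the quantity accumulated in sum_var_time:

```latex
% Welford's running-variance recurrence, matching
%   sum_var_time += (total_time - old_mean) * (total_time - mean_time)
M_k = M_{k-1} + \frac{x_k - M_{k-1}}{k}, \qquad
S_k = S_{k-1} + (x_k - M_{k-1})\,(x_k - M_k), \qquad
s^2 = \frac{S_n}{n-1}
```

Note that the two factors in the S_k update use different means (M_{k-1}, the old mean, and M_k, the updated one), which is exactly why the accumulated term is not a square; dividing S_n by n-1 then yields the sample variance.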
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
>>>>> "Robert" == Robert Haas robertmh...@gmail.com writes:

 Robert> All right, it seems Tom is with you on that point, so after
 Robert> some study, I've committed this with very minor modifications.

While hacking up a patch to demonstrate the simplicity of extending this to the Datum sorter, I seem to have run into a fairly major issue with this: there seems to be no attempt whatsoever to handle spilling to disk correctly. The data spilled to disk only has the un-abbreviated values, but nothing tries to re-abbreviate it (or disable abbreviations) when it is read back in, and chaos ensues:

set work_mem = 64;
select v, v > lag(v) over (order by v)
  from (select 'B'||i as v from generate_series(1,10000) i
        union all
        select 'a'||i from generate_series(1,10000) i
        offset 0) s
 order by v limit 20;

   v   | ?column?
-------+----------
 a1    |
 B1    | f
 a1000 | t
 a1001 | t
 a1002 | t
 a1003 | t
 B1000 | f
 B1001 | t
 B1002 | t
 B1003 | t
 B1004 | t
 B1005 | t
 a1004 | t
 a1005 | t
 a1006 | t
 a1007 | t
 a1008 | t
 B1    | f
 B10   | t
 B100  | t
(20 rows)

-- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Better way of dealing with pgstat wait timeout during buildfarm runs?
On Wed, Jan 21, 2015 at 1:08 AM, Tomas Vondra tomas.von...@2ndquadrant.com wrote: On 25.12.2014 22:28, Tomas Vondra wrote: On 25.12.2014 21:14, Andres Freund wrote: That's indeed odd. Seems to have been lost when the statsfile was split into multiple files. Alvaro, Tomas? The goal was to keep the logic as close to the original as possible. IIRC there were pgstat wait timeout issues before, and in most cases the conclusion was that it's probably because of overloaded I/O. But maybe there actually was another bug, and it's entirely possible that the split introduced a new one, and that's what we're seeing now. The strange thing is that the split happened ~2 years ago, which is inconsistent with the sudden increase of this kind of issues. So maybe something changed on that particular animal (a failing SD card causing I/O stalls, perhaps)? Anyway, I happen to have a spare Raspberry Pi, so I'll try to reproduce and analyze the issue locally. But that won't happen until January. I've tried to reproduce this on my Raspberry Pi 'machine' and it's not very difficult to trigger this. About 7 out of 10 'make check' runs fail because of 'pgstat wait timeout'. All the occurrences I've seen were right after some sort of VACUUM (sometimes plain, sometimes ANALYZE or FREEZE), and the I/O at the time looked something like this:

Device:  rrqm/s  wrqm/s   r/s   w/s   rkB/s   wkB/s avgrq-sz avgqu-sz    await r_await  w_await  svctm  %util
mmcblk0    0.00   75.00  0.00  8.00    0.00   36.00     9.00     5.73 15633.75    0.00 15633.75 125.00 100.00

So pretty terrible (this is a Class 4 SD card, supposedly able to handle 4 MB/s). If hamster had a faulty SD card, it might have been much worse, I guess. By experience, a class 10 is at least necessary, with a minimum amount of memory, to minimize the appearance of those warnings, hamster having now an 8GB class 10 card. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Tue, Jan 20, 2015 at 2:00 PM, Peter Geoghegan p...@heroku.com wrote: Maybe that's the wrong way of fixing that, but for now I don't think it's acceptable that abbreviation isn't always used in certain cases where it could make sense (e.g. not for simple GroupAggregates with a single attribute -- only multiple attribute GroupAggregates). After all, most sort cases (e.g. B-Tree builds) didn't use SortSupport for several years, simply because no one got around to it until I finally did a few months back. Excuse me. I mean that this *is* an acceptable restriction for the time being. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Tue, Jan 20, 2015 at 3:34 PM, Robert Haas robertmh...@gmail.com wrote: Dear me. Peter, can you fix this RSN? Investigating. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Tue, Jan 20, 2015 at 3:33 PM, Robert Haas robertmh...@gmail.com wrote: Peter, this made bowerbird (Windows 8/Visual Studio) build, but it's failing make check. Ditto hamerkop (Windows 2k8/VC++) and currawong (Windows XP Pro/MSVC++). jacana (Windows 8/gcc) and brolga (Windows XP Pro/cygwin) are unhappy too, although the failures are showing up in different build stages rather than in 'make check'. narwhal (Windows 2k3/mingw) and frogmouth (Windows XP Pro/gcc) are happy, though, so it's not affecting ALL of the Windows critters. Still, I'm leaning toward the view that we should disable this optimization across-the-board on Windows until somebody has time to do the legwork to figure out what it takes to make it work, and what makes it work on some of these critters and fail on others. We can't leave the buildfarm red for long periods of time. Fair enough. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 01/20/2015 06:32 PM, David G Johnston wrote: Andrew Dunstan wrote On 01/20/2015 01:26 PM, Arne Scheffer wrote: And a very minor aspect: The term standard deviation in your code stands for (corrected) sample standard deviation, I think, because you divide by n-1 instead of n to keep the estimator unbiased. How about mentioning the prefix sample to indicate this being the estimator? I don't understand. I'm following pretty exactly the calculations stated at <http://www.johndcook.com/blog/standard_deviation/> I'm not a statistician. Perhaps others who are more literate in statistics can comment on this paragraph. I'm largely in the same boat as Andrew but... I take it that Arne is referring to: http://en.wikipedia.org/wiki/Bessel's_correction but the mere presence of an (n-1) divisor does not mean that is what is happening. In this particular situation I believe the (n-1) simply is a necessary part of the recurrence formula and not any attempt to correct for sampling bias when estimating a population's variance. In fact, as far as the database knows, the values provided to this function do represent an entire population and such a correction would be unnecessary. I guess it boils down to whether future queries are considered part of the population or whether the population changes upon each query being run and thus we are calculating the ever-changing population variance. Note point 3 in the linked Wikipedia article. Thanks. Still not quite sure what to do, though :-) I guess in the end we want the answer to come up with similar results to the builtin stddev SQL function. I'll try to set up a test program, to see if we do. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
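The distinction being debated (Bessel's correction) can be made precise. With x-bar the mean of the n observed values:

```latex
% Biased (population) estimator vs. Bessel-corrected (sample) estimator:
\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad
s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2
    = \frac{n}{n-1}\,\hat{\sigma}^2
```

If the recorded executions are treated as the entire population, the 1/n form is the exact variance and no correction is needed; the n-1 divisor only serves to unbias the estimate when the observations are a sample of a larger population — which is precisely the question David raises about whether future queries count as part of the population.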
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Tue, Jan 20, 2015 at 3:57 PM, Peter Geoghegan p...@heroku.com wrote: It's certainly possible to fix Andrew's test case with the attached. I'm not sure that that's the appropriate fix, though: there is probably a case to be made for not bothering with abbreviation once we've read tuples in for the final merge run. More likely, the strongest case is for storing the abbreviated keys on disk too, and reading those back. Maybe not, though: An extra 8 bytes per tuple on disk is not free. OTOH, if we're I/O bound on the final merge, as we ought to be, then recomputing the abbreviated keys could make sense, since there may well be an idle CPU core anyway. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
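To make the failure mode concrete, here is a toy model of abbreviated keys: pack a string's first eight bytes into a big-endian integer so that unsigned comparison agrees with memcmp() on that prefix. This is purely illustrative (PostgreSQL's text opclass derives its abbreviated keys from strxfrm() blobs, not raw bytes); the point it demonstrates is that abbreviated-key comparisons are only valid while *both* sides are abbreviated, which is exactly the invariant the spill-to-disk path violated.

```c
#include <stdint.h>
#include <string.h>

/* Toy abbreviated key: pack the first 8 bytes of a string into a
 * big-endian uint64, so that unsigned integer comparison agrees with
 * memcmp() on the prefix.  Illustrative only -- not PostgreSQL's
 * actual text abbreviation code. */
uint64_t abbrev_key(const char *s)
{
    uint64_t k = 0;
    size_t len = strlen(s);

    for (int i = 0; i < 8; i++)
    {
        k <<= 8;
        if ((size_t) i < len)
            k |= (unsigned char) s[i];
    }
    return k;
}

/* Comparator on abbreviated keys: a correct proxy for memcmp() on the
 * first 8 bytes, as long as BOTH sides are abbreviated. */
int cmp_abbrev(uint64_t a, uint64_t b)
{
    return (a > b) - (a < b);
}
```

In the broken case, a SortTuple read back from tape carried a raw (un-abbreviated) value in datum1 while in-memory tuples carried abbreviated keys, so the comparator ended up comparing unrelated quantities — hence the shuffled output in Andrew's test case.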
Re: [HACKERS] New CF app deployment
On Tue, Jan 20, 2015 at 3:43 PM, Tom Lane t...@sss.pgh.pa.us wrote: Michael Paquier michael.paqu...@gmail.com writes: On Tue, Jan 20, 2015 at 4:31 PM, Andres Freund and...@2ndquadrant.com wrote: I think you misunderstood me ;). I was talking about the old CF application providing a RSS feed of all changes to all entries. https://commitfest-old.postgresql.org/action/commitfest_activity.rss Oh, I didn't know this one. That's indeed useful. Personally, I never used the RSS feed as such, but I did often consult the activity log pages, which I think are equivalent: https://commitfest-old.postgresql.org/action/commitfest_activity?id=25 That feature seems to be gone too :-( Activity log is back, clickable from the top right corner when viewing the frontpage or a commitfest. RSS feed is there as well. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Tue, Jan 20, 2015 at 10:54 AM, Robert Haas robertmh...@gmail.com wrote: On Mon, Jan 19, 2015 at 9:29 PM, Peter Geoghegan p...@heroku.com wrote: I think that the attached patch should at least fix that much. Maybe the problem on the other animal is also explained by the lack of this, since there could also be a MinGW-ish strxfrm_l(), I suppose. Committed that, rather blindly, since it looks like a reasonable fix. Peter, this made bowerbird (Windows 8/Visual Studio) build, but it's failing make check. Ditto hamerkop (Windows 2k8/VC++) and currawong (Windows XP Pro/MSVC++). jacana (Windows 8/gcc) and brolga (Windows XP Pro/cygwin) are unhappy too, although the failures are showing up in different build stages rather than in 'make check'. narwhal (Windows 2k3/mingw) and frogmouth (Windows XP Pro/gcc) are happy, though, so it's not affecting ALL of the Windows critters. Still, I'm leaning toward the view that we should disable this optimization across-the-board on Windows until somebody has time to do the legwork to figure out what it takes to make it work, and what makes it work on some of these critters and fail on others. We can't leave the buildfarm red for long periods of time. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
Andrew Dunstan wrote On 01/20/2015 01:26 PM, Arne Scheffer wrote: And a very minor aspect: The term standard deviation in your code stands for (corrected) sample standard deviation, I think, because you divide by n-1 instead of n to keep the estimator unbiased. How about mentioning the prefix sample to indicate this being the estimator? I don't understand. I'm following pretty exactly the calculations stated at <http://www.johndcook.com/blog/standard_deviation/> I'm not a statistician. Perhaps others who are more literate in statistics can comment on this paragraph. I'm largely in the same boat as Andrew but... I take it that Arne is referring to: http://en.wikipedia.org/wiki/Bessel's_correction but the mere presence of an (n-1) divisor does not mean that is what is happening. In this particular situation I believe the (n-1) simply is a necessary part of the recurrence formula and not any attempt to correct for sampling bias when estimating a population's variance. In fact, as far as the database knows, the values provided to this function do represent an entire population and such a correction would be unnecessary. I guess it boils down to whether future queries are considered part of the population or whether the population changes upon each query being run and thus we are calculating the ever-changing population variance. Note point 3 in the linked Wikipedia article. David J. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Tue, Jan 20, 2015 at 5:32 PM, Robert Haas robertmh...@gmail.com wrote: I was assuming we were going to fix this by undoing the abbreviation (as in the abort case) when we spill to disk, and not bothering with it thereafter. The spill-to-disk case is at least as compelling as the internal sort case. The overhead of comparisons is much higher for tapesort. Attached patch serializes keys. On reflection, I'm inclined to go with this approach. Even if the CPU overhead of reconstructing strxfrm() blobs is acceptable for text, it might be much more expensive for other types. I'm loath to throw away those abbreviated keys unnecessarily. We don't have to worry about having aborted abbreviation, since once we spill to disk we've effectively committed to abbreviation. This patch formalizes the idea that a pass-by-value representation is strictly required for such cases (but not that the original Datums must be of a pass-by-reference type, which is another thing entirely). I've tested it some, obviously with Andrew's testcase and the regression tests, but also with my B-Tree verification tool. Please review it. Sorry about this. -- Peter Geoghegan

diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index 6d3aa88..a442617 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -413,10 +413,13 @@ struct Tuplesortstate
  *
  * If state->randomAccess is true, then the stored representation of the
  * tuple must be followed by another unsigned int that is a copy of the
- * length --- so the total tape space used is actually sizeof(unsigned int)
- * more than the stored length value.  This allows read-backwards.  When
- * randomAccess is not true, the write/read routines may omit the extra
- * length word.
+ * length (less any abbreviated key that is stored, which is also possible) ---
+ * so the total tape space used is actually sizeof(unsigned int) more than the
+ * stored length value, as well as another sizeof(Datum) overhead for storing
+ * abbreviated keys.  This allows read-backwards, and avoids the need to
+ * re-compute abbreviated keys.  When randomAccess is not true, the write/read
+ * routines may omit the extra length word.  Also, when abbreviation is not in
+ * play, abbreviated keys are not stored.
  *
  * writetup is expected to write both length words as well as the tuple
  * data.  When readtup is called, the tape is positioned just after the
@@ -3124,11 +3127,15 @@ writetup_heap(Tuplesortstate *state, int tapenum, SortTuple *stup)
 	char	   *tupbody = (char *) tuple + MINIMAL_TUPLE_DATA_OFFSET;
 	unsigned int tupbodylen = tuple->t_len - MINIMAL_TUPLE_DATA_OFFSET;
-	/* total on-disk footprint: */
+	/* total on-disk footprint for tuple: */
 	unsigned int tuplen = tupbodylen + sizeof(int);

 	LogicalTapeWrite(state->tapeset, tapenum,
 					 (void *) &tuplen, sizeof(tuplen));
+	/* Store abbreviated key, if any (not accounted for by tuplen) */
+	if (state->sortKeys->abbrev_converter)
+		LogicalTapeWrite(state->tapeset, tapenum,
+						 (void *) &stup->datum1, sizeof(Datum));
 	LogicalTapeWrite(state->tapeset, tapenum,
 					 (void *) tupbody, tupbodylen);
 	if (state->randomAccess)	/* need trailing length word? */
@@ -3150,6 +3157,10 @@ readtup_heap(Tuplesortstate *state, SortTuple *stup,
 	HeapTupleData htup;

 	USEMEM(state, GetMemoryChunkSpace(tuple));
+	/* read in the tuple abbreviated key, if any */
+	if (state->sortKeys->abbrev_converter)
+		LogicalTapeReadExact(state->tapeset, tapenum, (void *) &stup->datum1,
+							 sizeof(Datum));
 	/* read in the tuple proper */
 	tuple->t_len = tuplen;
 	LogicalTapeReadExact(state->tapeset, tapenum,
@@ -3161,10 +3172,12 @@ readtup_heap(Tuplesortstate *state, SortTuple *stup,
 	/* set up first-column key value */
 	htup.t_len = tuple->t_len + MINIMAL_TUPLE_OFFSET;
 	htup.t_data = (HeapTupleHeader) ((char *) tuple - MINIMAL_TUPLE_OFFSET);
-	stup->datum1 = heap_getattr(&htup,
-								state->sortKeys[0].ssup_attno,
-								state->tupDesc,
-								&stup->isnull1);
+	/* set up first-column key value (for non-abbreviated case) */
+	if (!state->sortKeys->abbrev_converter)
+		stup->datum1 = heap_getattr(&htup,
+									state->sortKeys[0].ssup_attno,
+									state->tupDesc,
+									&stup->isnull1);
 }

 /*
@@ -3355,12 +3368,20 @@ writetup_cluster(Tuplesortstate *state, int tapenum, SortTuple *stup)
 {
 	HeapTuple	tuple = (HeapTuple) stup->tuple;
 	unsigned int tuplen = tuple->t_len + sizeof(ItemPointerData) + sizeof(int);
+	AttrNumber	leading = state->indexInfo->ii_KeyAttrNumbers[0];

-	/* We need to store t_self, but not other fields of HeapTupleData */
+	/*
+	 * We need to store t_self, but not other fields of HeapTupleData (although
+	 * possibly abbreviated key value)
+	 */
 	LogicalTapeWrite(state->tapeset, tapenum,
 					 &tuplen, sizeof(tuplen));
 	LogicalTapeWrite(state->tapeset, tapenum,
 					 &tuple->t_self, sizeof(ItemPointerData));
+	/* Store abbreviated key, if any (not accounted for by tuplen) */
+	if
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
* Robert Haas (robertmh...@gmail.com) wrote: On Tue, Jan 20, 2015 at 1:05 AM, Abhijit Menon-Sen a...@2ndquadrant.com wrote: So when I'm trying to decide what to audit, I have to:

(a) check if the current user is mentioned in .roles; if so, audit.
(b) check if the current user is a descendant of one of the roles mentioned in .roles; if not, no audit.
(c) check for permissions granted to the root role and see if that tells us to audit.

Is that right? If so, it would work fine. I don't look forward to trying to explain it to people, but yes, it would work (for anything you could grant permissions for). I think this points to a fundamental weakness in the idea of doing this through the GRANT system. Some people are going to want to audit everything a particular user does, some people are going to want to audit all access to particular objects, and some people will have more complicated requirements. Some people will want to audit even super-users, others especially super-users, others only non super-users. None of this necessarily matches up to the usual permissions framework. I'm hopeful that my email from a few minutes ago provides a way to cover all of the above, while still making it easy for users who just want to say audit everything this user does or audit everything which touches this column. Any auditing of superusers is going to be circumventable by those same superusers if it's happening in the context of the PG process, so I'm not sure why they would be any different under this scheme vs. some other scheme. Don't get me wrong- I agree completely that using the GRANT system isn't ideal, but I don't see anyone proposing another way for an extension to store metadata on catalog tables with the same granularity the GRANT system provides and its dependency handling and ability to have default values. We've been over that ground a number of times and I certainly don't feel like we're any closer to solving it and I'd hate to see that block pgaudit. 
Put another way, if the requirement for pgaudit is that it has to solve the metadata-on-catalogs-for-extensions problem, I think I'd rather just move pgaudit into core instead, give it catalog tables, make it part of the dependency system, provide syntax for it, etc. I'm pretty sure that'd be easier and happen a lot faster. Thanks! Stephen
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Tue, Jan 20, 2015 at 9:33 PM, Peter Geoghegan p...@heroku.com wrote: On Tue, Jan 20, 2015 at 6:30 PM, Robert Haas robertmh...@gmail.com wrote: I don't want to change the on-disk format for tapes without a lot more discussion. Can you come up with a fix that avoids that for now? A more conservative approach would be to perform conversion on-the-fly once more. That wouldn't be patently unacceptable from a performance perspective, and would also not change the on-disk format for tapes. Thoughts? That might be OK. Probably needs a bit of performance testing to see how it looks. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
On Tue, Jan 20, 2015 at 5:03 PM, Jim Nasby jim.na...@bluetreble.com wrote: +1. In particular I'm very concerned with the idea of doing this via roles, because that would make it trivial for any superuser to disable auditing. The only good option I could see to provide this kind of flexibility would be allowing the user to provide a function that accepts role, object, etc and make return a boolean. The performance of that would presumably suck with anything but a C function, but we could provide some C functions to handle simple cases. That said, I think the best idea at this stage is either log everything or nothing. We can always expand upon that later. I think this is throwing the baby out with the bathwater. Neither C functions nor all-or-nothing are going to be of any practical use. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Tue, Jan 20, 2015 at 8:39 PM, Peter Geoghegan p...@heroku.com wrote: On Tue, Jan 20, 2015 at 5:32 PM, Robert Haas robertmh...@gmail.com wrote: I was assuming we were going to fix this by undoing the abbreviation (as in the abort case) when we spill to disk, and not bothering with it thereafter. The spill-to-disk case is at least as compelling at the internal sort case. The overhead of comparisons is much higher for tapesort. First, we need to unbreak this. Then, we can look at optimizing it. The latter task will require performance testing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Merging postgresql.conf and postgresql.auto.conf
On Tue, Jan 20, 2015 at 4:07 PM, David Johnston david.g.johns...@gmail.com wrote: sourceline and sourcefile pertain only to the current value while the point of adding these other pieces is to provide a snapshot of all the different mappings that the system knows about; instead of having to tell a user to go look in two different files (and associated includes) and a database catalog to find out what possible values are in place. That doesn't solve the need to scan the catalog to see other possible values - though you could at least put a counter in pg_settings that indicates how many pg_db_role_setting entries reference the given variable so that if non-zero the user is clued into the fact that they need to check out said catalog table. This last proposal seems pointless to me. If the user knows about pg_db_role_setting, they will know to check it; if they don't, a counter won't fix that. I can see that there might be some utility to a query that would tell you, for a given setting, all sources of that setting the system knew about, whether in configuration files, pg_db_role_setting, or the current session. But I don't see that putting information that's already available via one system catalog query into a different system catalog query helps anything - we should presume DBAs know how to write SQL. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITH CHECK and Column-Level Privileges
* Noah Misch (n...@leadboat.com) wrote: On Mon, Jan 19, 2015 at 11:05:22AM -0500, Stephen Frost wrote: One remaining question is about single-column key violations. Should we special-case those and allow them to be shown or no? I can't see a reason not to currently but I wonder if we might have cause to act differently in the future (not that I can think of a reason we'd ever need to). Keep them hidden. Attempting to delete a PK row should not reveal otherwise-inaccessible values involved in any constraint violation. It's tempting to make an exception for single-column UNIQUE constraints, but some of the ensuing data disclosures are questionable. What if the violation arose from a column default expression or from index corruption? Interesting point. I've kept them hidden in this latest version. On Mon, Jan 19, 2015 at 11:46:35AM -0500, Stephen Frost wrote: Right, ExecBuildSlotValueDescription() was made to be consistent with BuildIndexValueDescription. The reason for BuildIndexValueDescription returning an empty string is different from why you hypothesize though- it's exported and I was a bit worried that existing external callers might not be prepared for it to return a NULL instead of a string of some kind. If this was a green field instead of a back-patch fix, I'd certainly return NULL instead. If others feel that's not a good reason to avoid returning NULL, I can certainly change it around. I won't lose sleep if it does return for that reason, but I'm relatively unconcerned about extension API compatibility here. That function is called from datatype-independent index uniqueness checks. This mailing list has discussed the difficulties of implementing an index access method in an extension, and no such extension has come forward. Alright, I've reworked this to have those functions return NULL instead. Your latest patch has whitespace errors; visit git diff --check. 
Yeah, I had done that but then made a few additional tweaks and didn't recheck, sorry about that. I'm playing around w/ my git workflow a bit and hopefully it won't happen again. :) Updated patch attached. Thanks! Stephen From d9f0e186a74e4d11e9c7f3181dbf98bae3224111 Mon Sep 17 00:00:00 2001 From: Stephen Frost sfr...@snowman.net Date: Mon, 12 Jan 2015 17:04:11 -0500 Subject: [PATCH] Fix column-privilege leak in error-message paths While building error messages to return to the user, BuildIndexValueDescription, ExecBuildSlotValueDescription and ri_ReportViolation would happily include the entire key or entire row in the result returned to the user, even if the user didn't have access to view all of the columns being included. Instead, include only those columns which the user is providing or which the user has select rights on. If the user does not have any rights to view the table or any of the columns involved then no detail is provided and a NULL value is returned from BuildIndexValueDescription and ExecBuildSlotValueDescription. Note that, for key cases, the user must have access to all of the columns for the key to be shown; a partial key will not be returned. Back-patch all the way, as column-level privileges are now in all supported versions. This has been assigned CVE-2014-8161, but since the issue and the patch have already been publicized on pgsql-hackers, there's no point in trying to hide this commit. 
---
 src/backend/access/index/genam.c         |  59 ++-
 src/backend/access/nbtree/nbtinsert.c    |  12 ++-
 src/backend/commands/copy.c              |   6 +-
 src/backend/commands/matview.c           |   7 ++
 src/backend/executor/execMain.c          | 171 ---
 src/backend/executor/execUtils.c         |  12 ++-
 src/backend/utils/adt/ri_triggers.c      |  78 ++
 src/backend/utils/sort/tuplesort.c       |   9 +-
 src/test/regress/expected/privileges.out |  31 ++
 src/test/regress/sql/privileges.sql      |  24 +
 10 files changed, 338 insertions(+), 71 deletions(-)

diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 850008b..69cbbd5 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -25,10 +25,12 @@
 #include "lib/stringinfo.h"
 #include "miscadmin.h"
 #include "storage/bufmgr.h"
+#include "utils/acl.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
+#include "utils/syscache.h"
 #include "utils/tqual.h"

@@ -154,6 +156,9 @@ IndexScanEnd(IndexScanDesc scan)
  * form "(key_name, ...)=(key_value, ...)".  This is currently used
  * for building unique-constraint and exclusion-constraint error messages.
  *
+ * Note that if the user does not have permissions to view the columns
+ * involved, a NULL is returned.
+ *
  * The passed-in values/nulls arrays are the raw input to the index AM,
 * e.g. results of FormIndexDatum --- this is not necessarily what is stored
 * in
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Tue, Jan 20, 2015 at 6:30 PM, Robert Haas robertmh...@gmail.com wrote: I don't want to change the on-disk format for tapes without a lot more discussion. Can you come up with a fix that avoids that for now? A more conservative approach would be to perform conversion on-the-fly once more. That wouldn't be patently unacceptable from a performance perspective, and would also not change the on-disk format for tapes. Thoughts? -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
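A sketch of what that conservative approach could look like, as a toy model rather than actual tuplesort.c code (the names `tape_read` and `recompute_abbrev` are hypothetical): each value read back from tape has its abbreviated key recomputed on the fly, so in-memory comparisons never mix abbreviated and raw datums.

```c
#include <stdint.h>
#include <string.h>

/* Toy model of "conversion on-the-fly": when a spilled value is read
 * back without its abbreviated key, recompute the abbreviation
 * immediately so that subsequent comparisons remain valid.  A concept
 * sketch only -- these names are not tuplesort.c's API. */

typedef struct
{
    const char *tuple;   /* the un-abbreviated value read from "tape" */
    uint64_t    datum1;  /* abbreviated key, recomputed on read */
} SortTupleToy;

/* Same toy abbreviation as before: big-endian 8-byte prefix. */
static uint64_t recompute_abbrev(const char *s)
{
    uint64_t k = 0;
    int len = (int) strlen(s);

    for (int i = 0; i < 8; i++)
        k = (k << 8) | (i < len ? (unsigned char) s[i] : 0);
    return k;
}

/* Read one value back from a simulated tape, re-abbreviating as we go. */
SortTupleToy tape_read(const char *spilled_value)
{
    SortTupleToy st;

    st.tuple = spilled_value;
    st.datum1 = recompute_abbrev(spilled_value);  /* the on-the-fly step */
    return st;
}
```

The cost is one extra conversion per tuple read back from tape, which is the CPU-vs-disk-format trade-off being weighed against serializing the keys.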
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
Abhijit, * Abhijit Menon-Sen (a...@2ndquadrant.com) wrote: At 2015-01-19 08:26:59 -0500, sfr...@snowman.net wrote: I'm confused by this statement.. Let me see if I've understood your clarification: Thanks much for the example use-case and for working this through with me. I actually think I've come up with a further specification which might allow us to make this extremely flexible, but also simple for those who want to keep it simple. Consider this: Everything is single-level to the roles mentioned in the GUC. Is the logged in role a member of one of the GUC roles? Yes - Audit Now to cover the user X for table Y case: Did any of the GUC value roles grant SELECT rights for this table to the current role? Yes - Audit SELECT on the table by the current role Did any of the GUC value roles grant INSERT rights for this table to the current role? Yes - Audit INSERT on the table by the current role etc. For the 'log all access to an object' case, under this scheme, I'm afraid we'd need some special role to GRANT the access to. We wouldn't want that to simply be 'public' since then it might actually be granted access rights that we don't want to. We can't simply use the same role because you need to grant that role whatever access 'with grant option' in order for it to be able to re-grant the privilege. With the special role, it becomes: Does the special role have SELECT rights on the table? Yes - Audit SELECTs on the table Does the special role have INSERT rights on the table? Yes - Audit INSERTs on the table Suppose you have pgaudit.roles set to 'r0, r1', and that you have granted r0 to u0. Not quite- I wasn't thinking you'd grant r0 to u0 but rather the other way around- u0 is granted to r0. If you granted r0 to u0, then u0 would have all of r0's rights which could be quite a bit larger than you want u0 to have. It only works in the other direction. Now, if you're connected to the database as r0 or r1, then everything you do is logged. 
But if you're connected as u0, then only those things are logged that can be derived (in a manner discussed later) from the permissions explicitly granted to r0 (but not u0)? So when I'm trying to decide what to audit, I have to: (a) check if the current user is mentioned in .roles; if so, audit. (b) check if the current user is a descendant of one of the roles mentioned in .roles; if not, no audit. (c) check for permissions granted to the root role and see if that tells us to audit. Is that right? If so, it would work fine. I don't look forward to trying to explain it to people, but yes, it would work (for anything you could grant permissions for). This is pretty close- (a) and (b) are mostly correct, though I would strongly discourage users from actually logging in as an audit role. The one caveat with (b) is that 'if not, no audit' is not correct- all cases are essentially OR'd together when it comes to auditing. Roles can be audited even if they are not descendants of the roles mentioned in .roles. Review the opening of this email though and consider that we could look at what privileges has the audit role granted to the current role? as defining what is to be audited. You can't say that you want r1 to have X actions logged, with r2 having Y actions logged. Right. But how do you do that for non-GRANT-able actions in your scheme? For example, what if I want to see all the tables created and dropped by a particular user? I hadn't been intending to address that with this scheme, but I think we have that by looking for privilege grants where the audit role is the grantee and the role-to-be-audited the grantor. Have you considered splitting pgaudit.c into multiple files, perhaps along the pre/post deparse lines? If such a split were done properly, then could the backwards-compatible version be more acceptable for inclusion in contrib in 9.5? If so, I'll look into it. As Robert says, the short answer is 'no'- but it might make it easier to get the 9.5 bits into 9.5.. 
:) The key part above is any. We're using the grant system as a proxy for saying I want to audit column X. That's different from I want to audit commands which would be allowed if I *only* had access to column X. If I want to audit access to column X, then: select A, B, X from mytable; Should be audited, even if the audit role doesn't have access to columns A or B. Yes, that's what the code already does (modulo handling of the audit role's oid, which I tweaked to match the behaviour described earlier in this mail). I did the following: create table x(a int,b int,c int); insert into x(a,b,c) values (1,2,3); create user y; grant select on x to y; create role x; grant select (a) on table x to x; grant x to y; Then, as user y, I did «select a,b,c from x» and «select b,c from x». Only the former was logged: LOG: AUDIT,2015-01-20
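Abhijit's three-step decision procedure, quoted above, can be sketched as a small model. This is a hypothetical Python illustration of the proposal as he summarized it (before Stephen's caveat that the cases are effectively OR'd together), not pgaudit's actual code; `is_descendant` and `audits_via_grants` stand in for the catalog lookups:

```python
def should_audit(user, guc_roles, is_descendant, audits_via_grants):
    # (a) the connected role is itself listed in pgaudit.roles -> audit everything
    if user in guc_roles:
        return True
    # (b) otherwise look for a listed audit role the user is related to, and
    # (c) consult the permissions attached to that role to decide whether
    #     this particular action is audited
    return any(is_descendant(user, r) and audits_via_grants(r, user)
               for r in guc_roles)
```

The sketch makes the flat, single-level nature of the scheme visible: there is no per-action configuration beyond what the grant system can express, which is exactly the limitation the thread goes on to debate.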
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Tue, Jan 20, 2015 at 7:07 PM, Peter Geoghegan p...@heroku.com wrote: On Tue, Jan 20, 2015 at 3:57 PM, Peter Geoghegan p...@heroku.com wrote: It's certainly possible to fix Andrew's test case with the attached. I'm not sure that that's the appropriate fix, though: there is probably a case to be made for not bothering with abbreviation once we've read tuples in for the final merge run. More likely, the strongest case is for storing the abbreviated keys on disk too, and reading those back. Maybe not, though: An extra 8 bytes per tuple on disk is not free. OTOH, if we're I/O bound on the final merge, as we ought to be, then recomputing the abbreviated keys could make sense, since there may well be an idle CPU core anyway. I was assuming we were going to fix this by undoing the abbreviation (as in the abort case) when we spill to disk, and not bothering with it thereafter. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
Andrew Dunstan schrieb am 2015-01-20: On 01/20/2015 01:26 PM, Arne Scheffer wrote: Interesting patch. I did a quick review looking only into the patch file. The sum of variances variable contains the sum of squared differences instead, I think. Umm, no. It's not. Umm, yes, I think, it is ;-) e->counters.sum_var_time += (total_time - old_mean) * (total_time - e->counters.mean_time); This is not a square that's being added. That's correct. Nevertheless it's the difference between the computed sum of squared differences and the preceding one, added in every step. old_mean is not the same as e->counters.mean_time. Since the variance is this value divided by (n - 1), AIUI, I think sum of variances isn't a bad description. I'm open to alternative suggestions. And a very minor aspect: The term standard deviation in your code stands for the (corrected) sample standard deviation, I think, because you divide by n-1 instead of n to keep the estimator unbiased. How about mentioning the prefix sample to indicate this being the estimator? I don't understand. I'm following pretty exactly the calculations stated at http://www.johndcook.com/blog/standard_deviation/ (There is nothing bad about those calculations; Welford's algorithm simply adds the differences mentioned above sequentially.) VlG-Arne I'm not a statistician. Perhaps others who are more literate in statistics can comment on this paragraph. And I'm sure I'm missing C specifics (again) (or it's the reduced patch file scope), but you introduce sqrtd, but sqrt is called? Good catch. Will fix. cheers andrew
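The update being debated is Welford's online algorithm: the accumulator holds the running sum of squared deviations from the mean, even though no square appears in the increment. A minimal Python stand-in for the C code (names are illustrative, not the patch's):

```python
def welford(values):
    # mirrors the patch's update: sum_var_time += (x - old_mean) * (x - new_mean)
    n, mean, sum_sq_diff = 0, 0.0, 0.0
    for x in values:
        n += 1
        old_mean = mean
        mean += (x - old_mean) / n                  # new running mean
        sum_sq_diff += (x - old_mean) * (x - mean)  # no square, yet sums squares
    # sample variance divides by (n - 1); its square root is the
    # (corrected) sample standard deviation Arne refers to
    sample_var = sum_sq_diff / (n - 1) if n > 1 else 0.0
    return mean, sum_sq_diff, sample_var
```

The increment multiplies the deviation from the old mean by the deviation from the new mean, and that product telescopes to the sum of squared deviations — which is why both "sum of squared differences" and "sum of variances" describe the same accumulator.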
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
Jim, * Jim Nasby (jim.na...@bluetreble.com) wrote: +1. In particular I'm very concerned with the idea of doing this via roles, because that would make it trivial for any superuser to disable auditing. The only good option I could see to provide this kind of flexibility would be allowing the user to provide a function that accepts role, object, etc and make return a boolean. The performance of that would presumably suck with anything but a C function, but we could provide some C functions to handle simple cases. Superusers will be able to bypass, trivially, anything that's done in the process space of PG. The only possible exception to that being an SELinux or similar solution, but I don't think that's what you were getting at. I certainly don't think having the user provide a C function to specify what should be audited as making any sense- if they can do that, they can use the same hooks pgaudit is using and skip the middle-man. As for the performance concern you raise, I actually don't buy into it at all. It's not like we worry about the performance of checking permissions on objects in general and, for my part, I like to think that's because it's pretty darn quick already. That said, I think the best idea at this stage is either log everything or nothing. We can always expand upon that later. We've already got those options and they are, clearly, insufficient for a large number of our users. Thanks, Stephen
Re: [HACKERS] Parallel Seq Scan
On Tue, Jan 20, 2015 at 10:59 PM, Thom Brown t...@linux.com wrote: On 20 January 2015 at 14:29, Amit Kapila amit.kapil...@gmail.com wrote: Note - I have yet to handle the new node types introduced at some of the places and need to verify prepared queries and some other things, however I think it will be good if I can get some feedback at current stage. I'm getting an issue: # set parallel_seqscan_degree = 10; SET Time: 0.219 ms # explain analyse select c1 from t1; So setting parallel_seqscan_degree above max_worker_processes causes the CPU to max out, and the query never returns, or at least not after waiting 2 minutes. Shouldn't it have a ceiling of max_worker_processes? Yes, it should behave that way, but this is not handled in the patch, as we still have to decide on the best execution strategy (block-by-block or fixed chunks for different workers), and based on that I can handle this scenario in the patch. I could return an error for such a scenario or do some work to handle it seamlessly, but it seems to me that I would have to rework the same if we select a different execution approach than the one used in the patch, so I am waiting for that to get decided. I am planning to work on getting the performance data for both approaches, so that we can decide which is the better way to go ahead. The original test I performed where I was getting OOM errors now appears to be fine: Thanks for confirming the same. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
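The two candidate strategies Amit mentions — fixed chunks versus block-by-block — can be sketched as follows. This is an illustrative Python model only; the patch's actual distribution logic was still undecided at this point:

```python
def fixed_chunks(nblocks, nworkers):
    # each worker scans one contiguous range (compare the
    # "Number of Blocks Per Worker" line in the EXPLAIN output)
    per = -(-nblocks // nworkers)  # ceiling division
    return [list(range(i * per, min((i + 1) * per, nblocks)))
            for i in range(nworkers)]

def block_by_block(nblocks, nworkers):
    # round-robin: worker i takes blocks i, i + nworkers, i + 2*nworkers, ...
    return [list(range(i, nblocks, nworkers)) for i in range(nworkers)]
```

Either way every block is scanned exactly once; the trade-off is longer sequential I/O runs per worker (fixed chunks) versus better load balancing when some blocks are more expensive than others (block-by-block).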
Re: [HACKERS] Merging postgresql.conf and postgresql.auto.conf
On Tue, Jan 20, 2015 at 8:46 PM, Robert Haas robertmh...@gmail.com wrote: On Tue, Jan 20, 2015 at 4:07 PM, David Johnston david.g.johns...@gmail.com wrote: sourceline and sourcefile pertain only to the current value while the point of adding these other pieces is to provide a snapshot of all the different mappings that the system knows about; instead of having to tell a user to go look in two different files (and associated includes) and a database catalog to find out what possible values are in place. That doesn't solve the need to scan the catalog to see other possible values - though you could at least put a counter in pg_settings that indicates how many pg_db_role_setting entries reference the given variable so that if non-zero the user is clued into the fact that they need to check out said catalog table. This last proposal seems pointless to me. If the user knows about pg_db_role_setting, they will know to check it; if they don't, a counter won't fix that. I can see that there might be some utility to a query that would tell you, for a given setting, all sources of that setting the system knew about, whether in configuration files, pg_db_role_setting, or the current session. But I don't see that putting information that's already available via one system catalog query into a different system catalog query helps anything - we should presume DBAs know how to write SQL. While this is not exactly a high-traffic catalog/view why have them write a likely 4-way join query (pg_db_role_setting is not the friendly in its current form), or even two separate queries, when we can give them a solid and comprehensive view standard. I guess part of the mixup is that I am basically talking about a view of multiple catalogs as a DBA UI as opposed to really even caring what specific catalogs (existing or otherwise) or functions are needed to make the whole thing work. 
Maybe it does all fit directly on pg_settings but tacking on some read-only columns to this updateable view/table doesn't come across as something that should be forbidden in general. Maybe I am imagining a use-case that just isn't there but if there are two separate queries needed, and we call one consolidated, then having that query indicate whether the other query has useful information, and it is quite likely that it will not, avoids the user expending the effort to run the wasteful secondary query. David J.
Re: [HACKERS] Merging postgresql.conf and postgresql.auto.conf
On Tue, Jan 20, 2015 at 9:38 PM, Robert Haas robertmh...@gmail.com wrote: On Mon, Jan 19, 2015 at 10:39 PM, Amit Kapila amit.kapil...@gmail.com wrote: I think whichever process reads postgresql.conf/postgresql.auto.conf have to do this (unless we restrict that this will be done at some other time) and postmaster is one of them. It seems to me that it is not good idea unless we do it at other time like populating entries for a view. Well, the postmaster does not have database access, and neither do most of the auxiliary processes. The autovacuum launcher has very limited database access (shared catalogs only). Making the postmaster access the database is a complete non-starter; Okay and I was also not in favour of this approach. Right, but we can't completely eliminate such a possibility (as an example we have some default settings like max_connections, shared_buffers, etc). I agree with you that users should use only way of changing the settings, however for certain rare cases (default settings or some other) we can provide a way for user to know which of the settings are duplicate. I think if we can provide such an information via some view with ease then it's worth to have it. I'd suggest abandoning the idea of storing anything in a persistent basis in a system catalog and look for some way for the backend to which the user is connected to expose its own state instead. Agreed. For example, pg_settings already exposes sourcefile and sourceline settings. Actually, I'm not quite sure why that's not sufficient here, but if it isn't, it could be extended. The reason why sourcefile and sourceline are not sufficient is that they can only give the information about the setting in last file it is present. 
Assume max_connections (or any other setting) is available in both postgresql.conf and postgresql.auto.conf, then it will display the information about the setting in postgresql.auto.conf, so now user might not be able to decide whether that is the setting he want to retain unless he knows the information about setting in postgresql.conf. Now as I have suggested upthread, that we can have a new view pg_file_settings which will display information about settings even when there exists multiple entries for the same in different files. I think adding such information to existing view pg_settings would be difficult as the same code is used for show commands which we don't want to change. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
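The pg_file_settings view Amit proposes would surface every occurrence of a setting across the configuration files, not just the last one that sourcefile/sourceline can report. A rough Python model of that scan (hypothetical and simplified; real GUC file parsing handles includes, quoting, and more):

```python
def file_settings(files):
    """files: (filename, text) pairs in the order the server reads them."""
    entries = []
    for fname, text in files:
        for lineno, raw in enumerate(text.splitlines(), start=1):
            line = raw.split('#', 1)[0].strip()   # drop comments
            if '=' in line:
                name, value = (p.strip() for p in line.split('=', 1))
                entries.append((name, value, fname, lineno))
    # every occurrence is kept; the last entry for a given name is the
    # one that actually takes effect
    return entries
```

Run against a postgresql.conf and postgresql.auto.conf that both set max_connections, this returns both rows, letting the DBA see the shadowed value as well as the effective one.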
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
On 1/20/15 2:20 PM, Robert Haas wrote: On Tue, Jan 20, 2015 at 1:05 AM, Abhijit Menon-Sena...@2ndquadrant.com wrote: So when I'm trying to decide what to audit, I have to: (a) check if the current user is mentioned in .roles; if so, audit. (b) check if the current user is a descendant of one of the roles mentioned in .roles; if not, no audit. (c) check for permissions granted to the root role and see if that tells us to audit. Is that right? If so, it would work fine. I don't look forward to trying to explain it to people, but yes, it would work (for anything you could grant permissions for). I think this points to fundamental weakness in the idea of doing this through the GRANT system. Some people are going want to audit everything a particular user does, some people are going to want to audit all access to particular objects, and some people will have more complicated requirements. Some people will want to audit even super-users, others especially super-users, others only non super-users. None of this necessarily matches up to the usual permissions framework. +1. In particular I'm very concerned with the idea of doing this via roles, because that would make it trivial for any superuser to disable auditing. The only good option I could see to provide this kind of flexibility would be allowing the user to provide a function that accepts role, object, etc and make return a boolean. The performance of that would presumably suck with anything but a C function, but we could provide some C functions to handle simple cases. That said, I think the best idea at this stage is either log everything or nothing. We can always expand upon that later. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] proposal: searching in array function - array_position
2015-01-20 21:32 GMT+01:00 Jim Nasby jim.na...@bluetreble.com: On 1/20/15 11:12 AM, Pavel Stehule wrote: I am sending an updated version - it allows a third optional argument that specifies where searching should start. With it, it is possible to call this function repeatedly. What happened to returning an array of offsets? I think that would be both easier to use than this version as well as performing better. I am still thinking about this idea. It needs a different function and I didn't start with this. Implementing an optional start parameter for array_offset is cheap - and I think it can be useful for some use cases. I see you dropped multi-dimension support, but I think that's fine. It can be implemented later. There are no barriers to later implementation. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Tue, Jan 20, 2015 at 6:39 PM, Peter Geoghegan p...@heroku.com wrote: On Tue, Jan 20, 2015 at 6:34 PM, Robert Haas robertmh...@gmail.com wrote: That might be OK. Probably needs a bit of performance testing to see how it looks. Well, we're still only doing it when we do our final merge. So that's only doubling the number of conversions required, which if we're blocked on I/O might not matter at all. You'll probably prefer the attached. This patch works by disabling abbreviation, but only after writing out runs, with the final merge left to go. That way, it doesn't matter when abbreviated keys are not read back from disk (or regenerated). I believe this bug was missed because it only occurs when there are multiple runs, and not in the common case where there is one big initial run that is found already sorted when we reach mergeruns(). -- Peter Geoghegan

diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index 6d3aa88..b6e302f 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -2172,6 +2172,22 @@ mergeruns(Tuplesortstate *state)
 		return;
 	}
 
+	if (state->sortKeys->abbrev_converter)
+	{
+		/*
+		 * If there are multiple runs to be merged, when we go to read back
+		 * tuples from disk, abbreviated keys will not have been stored, and we
+		 * don't care to regenerate them. Disable abbreviation from this point
+		 * on.
+		 */
+		state->sortKeys->abbrev_converter = NULL;
+		state->sortKeys->comparator = state->sortKeys->abbrev_full_comparator;
+
+		/* Not strictly necessary, but be tidy */
+		state->sortKeys->abbrev_abort = NULL;
+		state->sortKeys->abbrev_full_comparator = NULL;
+	}
+
 	/* End of step D2: rewind all output tapes to prepare for merging */
 	for (tapenum = 0; tapenum < state->tapeRange; tapenum++)
 		LogicalTapeRewind(state->tapeset, tapenum, false);
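The comparator switch above matters because an abbreviated key is only a cheap first pass: equality of abbreviated keys proves nothing, so the full comparator is the tie-breaker. Once tuples come back from tape without their abbreviated keys, the two-level compare has nothing to work with. A toy Python model of the idea, with simple prefix truncation standing in for strxfrm()-style abbreviation (purely illustrative, not PostgreSQL's code):

```python
from functools import cmp_to_key

def full_cmp(a, b):
    # authoritative comparator over the full keys
    return (a > b) - (a < b)

def abbrev(s):
    # stand-in for a strxfrm()-based abbreviated key
    return s[:4]

def abbrev_cmp(a, b):
    # cheap comparison on abbreviated keys; authoritative only on inequality,
    # so a tie falls through to the full comparator
    c = full_cmp(abbrev(a), abbrev(b))
    return c if c != 0 else full_cmp(a, b)

words = ["abcdef", "abcdzz", "abczzz", "aaa", "zz"]
assert sorted(words, key=cmp_to_key(abbrev_cmp)) == sorted(words)
```

If tuples read back from disk carried no abbreviated key, abbrev() here would have nothing to consult — hence the choice in the thread between disabling abbreviation before the final merge (as the patch does) and serializing the abbreviated keys to tape.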
Re: [HACKERS] Async execution of postgres_fdw.
I'm trying to compare v5 and v6 on my laptop right now. Apparently my laptop is quite a bit faster than your machine because the tests complete in roughly 3.3 seconds. I added more data and didn't see anything other than noise. (Then again, the queries were dominated by the disk sort, so I should retry with larger work_mem). I'll try it again when I have more time to play with it. I suspect the benefits would be more clear over a network. Larger than default work_mem, yes, but I think one of the prime use cases for the fdw is more warehouse-style situations (PostgresXL-style use cases). In those cases, work_mem might reasonably be set to 1GB. Then even if you have 10KB rows you can fetch a million rows and still be using less than work_mem. A simpler change would be to vary it with respect to work_mem. Half-baked idea: I know it's the wrong time in the execution phase, but if you are using remote estimates for cost there should also be a row width estimate, which I believe is based on pg_statistic and its mean column width. It's actually a pity that there is no way to set fetch sizes based on "give me as many tuples as will fit in less than x amount of memory", because that is almost always exactly what you want. Even when writing application code, I've never actually wanted precisely 10,000 rows; I've always wanted a reasonably sized chunk that could fit into memory and then backed my way into how many rows I wanted. If we were to extend FETCH to support syntax like: FETCH FORWARD '10MB' FROM ...; then we would eliminate the need to estimate the value on the fly. The async stuff, however, is a huge improvement over the last time I played with the fdw. The two active postgres processes were easily consuming a core and a half of CPU. I think it's not worth tying these two things together. It's probably worth making these two separate discussions and separate patches. 
- Matt Kelly *Just sanity checking myself: Shutting down the server, applying the different patch, 'make clean install' in postgres_fdw, and then restarting the server should obviously be sufficient to make sure its running the new code because that is all linked at runtime, right?
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
At 2015-01-20 20:36:39 -0500, robertmh...@gmail.com wrote: I think this is throwing the baby out with the bathwater. Neither C functions nor all-or-nothing are going to be of any practical use. Do you see some approach that has a realistic chance of making 9.5 and would also actually be worth putting into 9.5? Suggestions very welcome. -- Abhijit
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Tue, Jan 20, 2015 at 5:42 PM, Robert Haas robertmh...@gmail.com wrote: On Tue, Jan 20, 2015 at 8:39 PM, Peter Geoghegan p...@heroku.com wrote: On Tue, Jan 20, 2015 at 5:32 PM, Robert Haas robertmh...@gmail.com wrote: I was assuming we were going to fix this by undoing the abbreviation (as in the abort case) when we spill to disk, and not bothering with it thereafter. The spill-to-disk case is at least as compelling as the internal sort case. The overhead of comparisons is much higher for tapesort. First, we need to unbreak this. Then, we can look at optimizing it. The latter task will require performance testing. I don't see that any alternative isn't a performance trade-off. My patch accomplishes unbreaking this. I agree that it still needs review from that perspective, but it doesn't seem any worse than any other alternative. Would you prefer it if the spill-to-disk case aborted in the style of low-entropy keys? That doesn't seem significantly safer than this, and it's certainly not acceptable from a performance perspective. -- Peter Geoghegan
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Tue, Jan 20, 2015 at 5:46 PM, Peter Geoghegan p...@heroku.com wrote: Would you prefer it if the spill-to-disk case aborted in the style of low-entropy keys? That doesn't seem significantly safer than this, and it's certainly not acceptable from a performance perspective. BTW, I can write that patch if that's your preference. Should I? I just don't favor that even as a short-term correctness fix, because it seems unacceptable to throw away all the strxfrm() work where that's a very predictable and even likely outcome. I suggest reviewing and committing my fix as a short-term fix, which may well turn out to be generally acceptable upon further consideration. Yes, we'll need to make a point of reviewing an already-committed patch, but there is a precedent for that. -- Peter Geoghegan
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Tue, Jan 20, 2015 at 8:39 PM, Peter Geoghegan p...@heroku.com wrote: On Tue, Jan 20, 2015 at 5:32 PM, Robert Haas robertmh...@gmail.com wrote: I was assuming we were going to fix this by undoing the abbreviation (as in the abort case) when we spill to disk, and not bothering with it thereafter. The spill-to-disk case is at least as compelling as the internal sort case. The overhead of comparisons is much higher for tapesort. Attached patch serializes keys. On reflection, I'm inclined to go with this approach. Even if the CPU overhead of reconstructing strxfrm() blobs is acceptable for text, it might be much more expensive for other types. I'm loath to throw away those abbreviated keys unnecessarily. We don't have to worry about having aborted abbreviation, since once we spill to disk we've effectively committed to abbreviation. This patch formalizes the idea that there is strictly a pass-by-value representation required for such cases (but not that the original Datums must be of a pass-by-reference type, which is another thing entirely). I've tested it some, obviously with Andrew's testcase and the regression tests, but also with my B-Tree verification tool. Please review it. Sorry about this. I don't want to change the on-disk format for tapes without a lot more discussion. Can you come up with a fix that avoids that for now? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Tue, Jan 20, 2015 at 6:34 PM, Robert Haas robertmh...@gmail.com wrote: That might be OK. Probably needs a bit of performance testing to see how it looks. Well, we're still only doing it when we do our final merge. So that's only doubling the number of conversions required, which if we're blocked on I/O might not matter at all. -- Peter Geoghegan
Re: [HACKERS] Dereferenced pointers checked as NULL in btree_utils_var.c
On Tue, Jan 20, 2015 at 11:38 PM, Tom Lane t...@sss.pgh.pa.us wrote: Michael Paquier michael.paqu...@gmail.com writes: Coverity is pointing out $subject, with the following stuff in gbt_var_same(): ... As Heikki pointed me out on IM, the lack of crash report in this area, as well as similar coding style in cube/ seem to be sufficient arguments to simply remove those NULL checks instead of doing more solid checks on them. Patch is attached. The way to form a convincing argument that these checks are unnecessary would be to verify that (1) the SQL-accessible functions directly calling gbt_var_same() are all marked STRICT, and (2) the core GIST code never passes a NULL to these support functions. I'm prepared to believe that (1) and (2) are both true, but it merits checking. Sure. gbt_var_same is called by those functions in btree_gist/:
- gbt_bit_same
- gbt_bytea_same
- gbt_numeric_same
- gbt_text_same

=# select proname, proisstrict from pg_proc
   where proname in ('gbt_bit_same', 'gbt_bytea_same', 'gbt_numeric_same', 'gbt_text_same');
     proname      | proisstrict
------------------+-------------
 gbt_text_same    | t
 gbt_bytea_same   | t
 gbt_numeric_same | t
 gbt_bit_same     | t
(4 rows)

For the second point, I have run regression tests with an assertion in gbt_var_same checking if t1 or t2 are NULL and tests worked. Also, looking at the code of gist, gbt_var_same is called through gistKeyIsEQ, which is used for an insertion or for a split. The point is that when doing an insertion, a call to gistgetadjusted is done and we look if one of the keys is NULL. If one of them is, code does not go through gistKeyIsEQ, so the NULL checks do not seem necessary in gbt_var_same. Btw, looking at the code of gist, I think that it would be interesting to add an assertion in gistKeyIsEQ like that:

    Assert(DatumGetPointer(a) != NULL && DatumGetPointer(b) != NULL);

Thoughts? 
-- Michael
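Point (1) in Tom's checklist — that every SQL-accessible caller is marked STRICT — is what makes the NULL checks dead code: the executor returns NULL itself and never invokes a strict function's body with a NULL argument. A sketch of that contract in Python (a hypothetical model of proisstrict handling, not PostgreSQL's executor):

```python
def strict(fn):
    # model of proisstrict = true: when any argument is NULL, the result is
    # NULL and the function body is never entered
    def wrapper(*args):
        if any(a is None for a in args):
            return None
        return fn(*args)
    return wrapper

@strict
def gbt_same(t1, t2):
    # the body may "dereference" t1/t2 freely; under the strict contract
    # they are guaranteed non-NULL here
    return t1 == t2
```

Combined with point (2) — the core GiST code filtering NULL keys before gistKeyIsEQ — the in-function NULL checks that Coverity flagged can never fire, which is the argument for deleting them.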
Re: [HACKERS] parallel mode and parallel contexts
On Tue, Jan 20, 2015 at 9:52 PM, Robert Haas robertmh...@gmail.com wrote: On Tue, Jan 20, 2015 at 9:41 AM, Amit Kapila amit.kapil...@gmail.com wrote: It seems [WaitForBackgroundWorkerShutdown] has possibility to wait forever. Assume one of the worker is not able to start (not able to attach to shared memory or some other reason), then status returned by GetBackgroundWorkerPid() will be BGWH_NOT_YET_STARTED and after that it can wait forever in WaitLatch. I don't think that's right. The status only remains BGWH_NOT_YET_STARTED until the postmaster forks the child process. I think the control flow can reach the above location before postmaster could fork all the workers. Consider a case that we have launched all workers during ExecutorStart phase and then before postmaster could start all workers an error occurs in master backend and then it try to Abort the transaction and destroy the parallel context, at that moment it will get the above status and wait forever in above code. I am able to reproduce above scenario with debugger by using parallel_seqscan patch. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Parallel Seq Scan
On 20-01-2015 PM 11:29, Amit Kapila wrote: I have taken care of integrating the parallel sequence scan with the latest patch posted (parallel-mode-v1.patch) by Robert at below location: http://www.postgresql.org/message-id/ca+tgmozduk4k3xhbxc9vm-82khourezdvqwtfglhwsd2r2a...@mail.gmail.com

Changes in this version
---
1. As mentioned previously, I have exposed one parameter ParallelWorkerNumber as used in parallel-mode patch.
2. Enabled tuple queue to be used for passing tuples from worker backend to master backend along with error queue as per suggestion by Robert in the mail above.
3. Involved master backend to scan the heap directly when tuples are not available in any shared memory tuple queue.
4. Introduced 3 new parameters (cpu_tuple_comm_cost, parallel_setup_cost, parallel_startup_cost) for deciding the cost of parallel plan. Currently, I have kept the default values for parallel_setup_cost and parallel_startup_cost as 0.0, as those require some experiments.
5. Fixed some issues (related to memory increase as reported upthread by Thom Brown and general feature issues found during test)

Note - I have yet to handle the new node types introduced at some of the places and need to verify prepared queries and some other things, however I think it will be good if I can get some feedback at current stage.

I got an assertion failure in src/backend/executor/execTuples.c: ExecStoreTuple():

    /* passing shouldFree=true for a tuple on a disk page is not sane */
    Assert(BufferIsValid(buffer) ? (!shouldFree) : true);

when called from ParallelSeqNext() in src/backend/executor/nodeParallelSeqscan.c. I think something like the following would be necessary (reading from comments in the code):

--- a/src/backend/executor/nodeParallelSeqscan.c
+++ b/src/backend/executor/nodeParallelSeqscan.c
@@ -85,7 +85,7 @@ ParallelSeqNext(ParallelSeqScanState *node)
 	if (tuple)
 		ExecStoreTuple(tuple,
 					   slot,
-					   scandesc->rs_cbuf,
+					   fromheap ? scandesc->rs_cbuf : InvalidBuffer,

After fixing this, the assertion failure seems to be gone, though I observed the blocked (CPU maxed out) state as reported elsewhere by Thom Brown. What I was doing:

CREATE TABLE test(a) AS SELECT generate_series(1, 1000);

postgres=# SHOW max_worker_processes;
 max_worker_processes
----------------------
 8
(1 row)

postgres=# SET seq_page_cost TO 100;
SET
postgres=# SET parallel_seqscan_degree TO 4;
SET
postgres=# EXPLAIN SELECT * FROM test;
                                QUERY PLAN
--------------------------------------------------------------------------
 Parallel Seq Scan on test  (cost=0.00..1801071.27 rows=8981483 width=4)
   Number of Workers: 4
   Number of Blocks Per Worker: 8849
(3 rows)

Though, EXPLAIN ANALYZE caused the thing.

Thanks, Amit
Re: [HACKERS] Async execution of postgres_fdw.
Hello, thank you for looking at this, but sorry that the last patch was buggy, so the adaptive fetch size did not work. The attached is the fixed patch. It apparently improves the performance for the test case shown in the previous mail, in which the average tuple length is about 140 bytes.

On 21 Jan 2015 05:22:34 +, Matt Kelly mkell...@gmail.com wrote in ca+kcukg4cvdlf4v0m9_rvv_zuasg1odhpj_yvczja6w2nsk...@mail.gmail.com: I'm trying to compare v5 and v6 on my laptop right now. Apparently my laptop is quite a bit faster than your machine, because the tests complete in roughly 3.3 seconds. I added more data and didn't see anything other than noise. (Then again, the queries were dominated by the disk sort, so I should retry with larger work_mem.) I'll try it again when I have more time to play with it. I suspect the benefits would be more clear over a network. Larger than default work_mem, yes, but I think one of the prime use cases for the fdw is more warehouse-style situations (PostgresXL-style use cases). In those cases, work_mem might reasonably be set to 1GB. Then even if you have 10KB rows, you can fetch a million rows and still be using less than work_mem. A simpler change would be to vary it with respect to work_mem.

Agreed about the nature of the typical workload for the postgres FDW. But I think the server itself, including postgres_fdw, should not crash even on a sudden explosion of tuple length. The number 100 seems to be safe enough, but 1000 seems suspicious, and 10000 looks dangerous from that standpoint.

Half-baked idea: I know it's the wrong time in the execution phase, but if you are using remote estimates for cost, there should also be a row width estimate, which I believe is based on pg_statistic and its mean column width.

It reduces the chance of claiming an unexpected amount of memory, but the chance still remains. It's actually a pity that there is no way to set fetch sizes based on "give me as many tuples as will fit in less than x amount of memory."
Because that is almost always exactly what you want. Even when writing application code, I've never actually wanted precisely 10,000 rows; I've always wanted a reasonable-size chunk that could fit into memory and then backed my way into how many rows I wanted. If we were to extend FETCH to support syntax like: FETCH FORWARD '10MB' FROM ...; then we would eliminate the need to estimate the value on the fly.

I didn't think about that. It makes sense, at least to me :) There would be many cases where the *amount* of data is more crucial than the number of rows. I'll work on it.

The async stuff, however, is a huge improvement over the last time I played with the fdw. The two active postgres processes were easily consuming a core and a half of CPU. I think it's not worth tying these two things together. It's probably worth making these two separate discussions and separate patches.

Yes, they can be separated and also should be. I'll split them after this. - Matt Kelly

*Just sanity-checking myself: shutting down the server, applying the different patch, 'make clean install' in postgres_fdw, and then restarting the server should obviously be sufficient to make sure it's running the new code, because that is all linked at runtime, right?

Yes, it's enough, and I also did so. This patch touches only postgres_fdw.

regards, -- Kyotaro Horiguchi NTT Open Source Software Center

From 12d0e8666871e2272beb1b85903baf0be92881b5 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi horiguchi.kyot...@lab.ntt.co.jp
Date: Tue, 13 Jan 2015 19:20:35 +0900
Subject: [PATCH] Asynchronous execution of postgres_fdw v7

This is the bug-fixed version of v6, which was a modified version of "Asynchronous execution of postgres_fdw."
---
 contrib/postgres_fdw/Makefile       |   2 +-
 contrib/postgres_fdw/PgFdwConn.c    | 200 ++
 contrib/postgres_fdw/PgFdwConn.h    |  61 ++
 contrib/postgres_fdw/connection.c   |  82
 contrib/postgres_fdw/postgres_fdw.c | 391 +---
 contrib/postgres_fdw/postgres_fdw.h |  15 +-
 6 files changed, 629 insertions(+), 122 deletions(-)
 create mode 100644 contrib/postgres_fdw/PgFdwConn.c
 create mode 100644 contrib/postgres_fdw/PgFdwConn.h

diff --git a/contrib/postgres_fdw/Makefile b/contrib/postgres_fdw/Makefile
index d2b98e1..d0913e2 100644
--- a/contrib/postgres_fdw/Makefile
+++ b/contrib/postgres_fdw/Makefile
@@ -1,7 +1,7 @@
 # contrib/postgres_fdw/Makefile

 MODULE_big = postgres_fdw
-OBJS = postgres_fdw.o option.o deparse.o connection.o $(WIN32RES)
+OBJS = postgres_fdw.o PgFdwConn.o option.o deparse.o connection.o $(WIN32RES)
 PGFILEDESC = "postgres_fdw - foreign data wrapper for PostgreSQL"

 PG_CPPFLAGS = -I$(libpq_srcdir)
diff --git a/contrib/postgres_fdw/PgFdwConn.c b/contrib/postgres_fdw/PgFdwConn.c
new file mode 100644
index 000..b13b597
--- /dev/null
+++ b/contrib/postgres_fdw/PgFdwConn.c
@@ -0,0 +1,200 @@
Re: [HACKERS] hamerkop is stuck
On Mon, Jan 19, 2015 at 10:44:37AM +0900, TAKATSUKA Haruka wrote: On Fri, 16 Jan 2015 03:25:00 -0500, Noah Misch n...@leadboat.com wrote: Buildfarm member hamerkop stopped reporting in after commit f6dc6dd. After commit 8d9cb0b, it resumed reporting in for 9.3 and earlier branches. It is still silent for HEAD and REL9_4_STABLE. It may have stale processes or stale lock files requiring manual cleanup. Would you check it?

Sorry about giving corrupted reports. We are examining what caused it now.

It is good now. Thanks.
Re: [HACKERS] proposal: searching in array function - array_position
On 1/20/15 11:12 AM, Pavel Stehule wrote: I am sending an updated version - it allows a third optional argument that specifies where searching should start. With it, it is possible to call this function repeatedly.

What happened to returning an array of offsets? I think that would be both easier to use and better-performing than this version. I see you dropped multi-dimension support, but I think that's fine. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] Parallel Seq Scan
On 1/19/15 7:20 AM, Robert Haas wrote: Another thing is that prefetching is not supported on all platforms (Windows), and for such systems, as per the above algorithm, we need to rely on the block-by-block method.

Well, I think we should try to set up a test to see if this is hurting us. First, do a sequential scan of a relation at least twice as large as RAM. Then, do a parallel sequential scan of the same relation with 2 workers. Repeat these in alternation several times. If the operating system is accomplishing meaningful readahead, and the parallel sequential scan is breaking it, then since the test is I/O-bound I would expect to see the parallel scan actually being slower than the normal way. Or perhaps there is some other test that would be better (ideas welcome), but the point is we may need something like this, and we should try to figure out whether we need it before spending too much time on it.

I'm guessing that not all supported platforms have prefetching that actually helps us... but it would be good to actually know if that's the case.

Where I think this gets a lot more interesting is if we could apply this to an index scan. My thought is that this would result in one worker mostly being responsible for advancing the index scan itself while the other workers were issuing (and waiting on) heap IO. So even if this doesn't turn out to be a win for seqscan, there are other places we might well want to use it. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
Interesting patch. I did a quick review looking only at the patch file. The "sum of variances" variable contains the sum of squared differences instead, I think. And a very minor aspect: the term "standard deviation" in your code stands for the (corrected) sample standard deviation, I think, because you divide by n-1 instead of n to keep the estimator unbiased. How about adding the prefix "sample" to indicate this being that estimator? And I'm sure I'm missing C specifics (again) (or it's the reduced patch-file scope), but you introduce sqrtd, yet sqrt is called? VlG Arne