Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-28 Thread Merlin Moncure
On Thu, Jan 22, 2015 at 3:50 PM, Merlin Moncure mmonc...@gmail.com wrote:
 I still haven't categorically ruled out pl/sh yet; that's something to
 keep in mind.

Well, after bisection proved not to be fruitful, I replaced the pl/sh
calls with dummy calls that approximated the same behavior and the
problem went away.  So again, it looks like this might have been a
false alarm.  A pl/sh driven failure might still be interesting if
it's coming from the internal calls it's making, so I'm still chasing
it down.

merlin




Re: [HACKERS] Parallel Seq Scan

2015-01-28 Thread Thom Brown
On 28 January 2015 at 14:03, Robert Haas robertmh...@gmail.com wrote:

 The problem here, as I see it, is that we're flying blind.  If there's
 just one spindle, I think it's got to be right to read the relation
 sequentially.  But if there are multiple spindles, it might not be,
 but it seems hard to predict what we should do.  We don't know what
 the RAID chunk size is or how many spindles there are, so any guess as
 to how to chunk up the relation and divide up the work between workers
 is just a shot in the dark.


Can't the planner take effective_io_concurrency into account?

Thom


Re: [HACKERS] Safe memory allocation functions

2015-01-28 Thread Michael Paquier
On Tue, Jan 27, 2015 at 5:34 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2015-01-27 17:27:53 +0900, Michael Paquier wrote:
 Alvaro Herrera wrote:
  So how about something like
 
  #define ALLOCFLAG_HUGE  0x01
  #define ALLOCFLAG_NO_ERROR_ON_OOM   0x02
  void *
  MemoryContextAllocFlags(MemoryContext context, Size size, int flags);
 The flag for huge allocations may be useful, but I don't actually see
 much value in the flag ALLOC_NO_OOM if the stuff in aset.c returns
 unconditionally NULL in case of an OOM and we let palloc complain
 about an OOM when allocation returns NULL. Something I am missing
 perhaps?

 I guess the idea is to have *user facing* MemoryContextAllocExtended()
 that can do both huge and no-oom allocations. Otherwise we need palloc
 like wrappers for all combinations.
 We're certainly not just going to ignore memory allocation failures
 generally in MemoryContextAllocExtended()
As a result of all the comments on this thread, here are 3 patches
implementing incrementally the different ideas from everybody:
1) 0001 modifies aset.c to return unconditionally NULL in case of an
OOM instead of reporting an error. All the OOM error reports are moved
to mcxt.c (MemoryContextAlloc* and palloc*)
2) 0002 adds the noerror routines for frontend and backend.
3) 0003 adds MemoryContextAllocExtended that can be called with the
following control flags:
#define ALLOC_HUGE     0x01    /* huge allocation */
#define ALLOC_ZERO     0x02    /* clear allocated memory */
#define ALLOC_NO_OOM   0x04    /* no failure if out-of-memory */
#define ALLOC_ALIGNED  0x08    /* request length suitable for MemSetLoop */
This groups MemoryContextAlloc, MemoryContextAllocHuge,
MemoryContextAllocZero and MemoryContextAllocZeroAligned under the
same central routine.
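
For illustration only, here is a minimal sketch of how a caller might use the
proposed API. The function and flag names follow the proposal above; the
helper function and the request_size variable are hypothetical, not part of
the patch.

#include "postgres.h"
#include "utils/memutils.h"

/*
 * Hedged sketch, not from the patch: try a huge, no-error-on-OOM allocation
 * and let the caller degrade gracefully instead of aborting the transaction.
 */
static char *
try_alloc_big_buffer(Size request_size)
{
    char   *buf;

    buf = (char *) MemoryContextAllocExtended(CurrentMemoryContext,
                                              request_size,
                                              ALLOC_HUGE | ALLOC_NO_OOM);
    if (buf == NULL)
        elog(LOG, "could not allocate %zu bytes, falling back to a smaller buffer",
             request_size);

    return buf;                 /* may be NULL; caller must cope */
}
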
Regards,
-- 
Michael
From 337c439554ce66486cf9d29dced6c72d034b8f8d Mon Sep 17 00:00:00 2001
From: Michael Paquier mich...@otacoo.com
Date: Wed, 28 Jan 2015 22:10:13 +0900
Subject: [PATCH 1/3] Make allocation return functions return NULL on OOM

In counterpart, the higher-level APIs in mcxt.c report an explicit error
when memory requests cannot be completed.
---
 src/backend/utils/mmgr/aset.c | 30 ---
 src/backend/utils/mmgr/mcxt.c | 48 +++
 2 files changed, 61 insertions(+), 17 deletions(-)

diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index 85b3c9a..bf6f09a 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -642,8 +642,8 @@ AllocSetDelete(MemoryContext context)
 
 /*
  * AllocSetAlloc
- *		Returns pointer to allocated memory of given size; memory is added
- *		to the set.
+ *		Returns pointer to allocated memory of given size or NULL if
+ *		request could not be completed; memory is added to the set.
  *
  * No request may exceed:
  *		MAXALIGN_DOWN(SIZE_MAX) - ALLOC_BLOCKHDRSZ - ALLOC_CHUNKHDRSZ
@@ -673,10 +673,7 @@ AllocSetAlloc(MemoryContext context, Size size)
 		if (block == NULL)
 		{
 			MemoryContextStats(TopMemoryContext);
-			ereport(ERROR,
-					(errcode(ERRCODE_OUT_OF_MEMORY),
-					 errmsg("out of memory"),
-					 errdetail("Failed on request of size %zu.", size)));
+			return NULL;
 		}
 		block->aset = set;
 		block->freeptr = block->endptr = ((char *) block) + blksize;
@@ -867,10 +864,7 @@ AllocSetAlloc(MemoryContext context, Size size)
 		if (block == NULL)
 		{
 			MemoryContextStats(TopMemoryContext);
-			ereport(ERROR,
-					(errcode(ERRCODE_OUT_OF_MEMORY),
-					 errmsg("out of memory"),
-					 errdetail("Failed on request of size %zu.", size)));
+			return NULL;
 		}
 
 		block->aset = set;
@@ -1002,9 +996,10 @@ AllocSetFree(MemoryContext context, void *pointer)
 
 /*
  * AllocSetRealloc
- *		Returns new pointer to allocated memory of given size; this memory
- *		is added to the set.  Memory associated with given pointer is copied
- *		into the new memory, and the old memory is freed.
+ *		Returns new pointer to allocated memory of given size or NULL if
+ *		request could not be completed; this memory is added to the set.
+ *		Memory associated with given pointer is copied into the new memory,
+ *		and the old memory is freed.
  *
  * Without MEMORY_CONTEXT_CHECKING, we don't know the old request size.  This
  * makes our Valgrind client requests less-precise, hazarding false negatives.
@@ -1109,10 +1104,7 @@ AllocSetRealloc(MemoryContext context, void *pointer, Size size)
 		if (block == NULL)
 		{
 			MemoryContextStats(TopMemoryContext);
-			ereport(ERROR,
-					(errcode(ERRCODE_OUT_OF_MEMORY),
-					 errmsg("out of memory"),
-					 errdetail("Failed on request of size %zu.", size)));
+			return NULL;
 		}
 		block->freeptr = block->endptr = ((char *) block) + blksize;
 
@@ -1179,6 +1171,10 @@ AllocSetRealloc(MemoryContext context, void *pointer, Size size)
 		/* allocate new chunk */
 		newPointer = AllocSetAlloc((MemoryContext) set, size);
 
+		/* leave immediately if request 

Re: [HACKERS] TABLESAMPLE patch

2015-01-28 Thread Petr Jelinek

On 28/01/15 09:41, Kyotaro HORIGUCHI wrote:


As an issue related to size estimation, I got an explain result like the
following,

=# explain (analyze on, buffers on) select a from t1 tablesample system(10) where a < 5;
                              QUERY PLAN
 Sample Scan on t1  (cost=0.00..301.00 rows=1 width=4) (actual time=0.015..2.920 rows=4294 loops=1)
   Filter: (a < 5)
   Rows Removed by Filter: 5876
   Buffers: shared hit=45

The actual row count is about twice the estimate. tsm_system_cost
estimates the number of result rows using baserel->tuples, not
baserel->rows, so it doesn't reflect the scan quals. Did the code come
from some other intention?



No, that's a bug.


 4.
 SampleNext()
 a.
 Isn't it better to code SampleNext() similar to SeqNext() and
 IndexNext(), which encapsulate the logic to get the tuple in
 another function(tbs_next() or something like that)?


 Possibly, my thinking was that unlike the index_getnext() and
 heap_getnext(), this function would not be called from any other
 place so adding one more layer of abstraction didn't seem useful.
 And it would mean creating new ScanDesc struct, etc.


I think adding additional abstraction would simplify the function
and reduce the chance of discrepancy; I think we otherwise need to
create an almost duplicate version of the code in heapgetpage().
Yeah, I agree that we need to build a structure like ScanDesc, but
still it will be worth it to keep the code consistent.


Well, similar, but not the same, as we are not always fetching the whole page
or doing visibility checks on all tuples in the page, etc. But I don't see why
it can't be inside nodeSamplescan. If you look at bitmap heap scan, that
one is also essentially a somewhat modified sequential scan and does
everything within the node nodeBitmapHeapscan because the algorithm is
not used anywhere else, just like sample scan.


As a general opinion, I'll vote for Amit's comment, since three
or four similar instances seem to be enough reason to abstract
it. On the other hand, since the scan processes are dispatched
in ExecProcNode by the nodeTag of the scan nodes, reuniting
control in an abstraction layer after that point could be said to
introduce useless annoyance. In short, abstraction at the level
of *Next() would require additional changes around the
execution node dispatch.



Yes, that's my view too. I would generally be for that change also and 
it would be worth it if the code was used in more than one place, but as 
it is it seems like it will just add code/complexity for no real 
benefit. It would make sense in case we used sequential scan node 
instead of the new node as Amit also suggested, but I remain unconvinced 
that mixing sampling and sequential scan into single scan node would be 
a good idea.




How will it get smoothed out in cases where, let us say,
more than 50% of the tuples are not visible?  I think it's
due to Postgres's non-overwriting storage architecture
that we maintain multiple versions of rows in the same heap;
for a different kind of architecture (like mysql/oracle)
where multiple row versions are not maintained in the same
heap, the same function (with the same percentage) can return
an entirely different number of rows.



I don't know how else to explain, if we loop over every tuple in the
relation and return it with equal probability then visibility checks
don't matter as the percentage of visible tuples will be same in the
result as in the relation.


Surely it might effectively yield the same result. Even so, this
code needs an explanatory comment about its nature, or should be
written so that MVCC-minded people won't get confused, I suppose.



Yes it does, but as I am failing to explain even here, it's not clear to
me what to write there. From my point of view it's just an effect of the
essential property of bernoulli sampling, which is that every element has
an equal probability of being included in the sample. It comes from the
fact that we do a bernoulli trial (in the code it's the while
(sampler_random_fract(sampler->randstate) < probability) loop) on every
individual tuple.


This means that the ratio of visible and invisible tuples should be the same
in the sample as it is in the relation. We then just skip the invisible
tuples and get the correct percentage of the visible ones (this has the
performance benefit of not having to do a visibility check on all tuples).
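
For illustration only, here is a minimal, self-contained sketch of the
per-tuple bernoulli trial described above. It is not the patch's code;
sampler_random_fract and the sampler state are replaced by a plain rand()
stand-in.

#include <stdbool.h>
#include <stdlib.h>

/*
 * Sketch: each tuple faces the same trial with probability p, regardless of
 * whether it later turns out to be visible, so the visible fraction of the
 * sample matches the visible fraction of the relation (on average).
 */
static bool
include_in_sample(double probability)
{
    /* stand-in for sampler_random_fract(sampler->randstate) */
    double r = (double) rand() / ((double) RAND_MAX + 1.0);

    return r < probability;
}

/* Usage: run the trial first, then do the (more expensive) visibility
 * check only for tuples that were actually selected. */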


If that weren't true, then bernoulli sampling would simply not work as
intended, as this same property is the reason why it's used in
statistics - the trends should look the same in the sample as they do in
the source population.


Obviously there is some variation in practice, as we don't have a perfect
random generator, but that's independent of the algorithm.



--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 

Re: [HACKERS] pg_upgrade and rsync

2015-01-28 Thread Stephen Frost
* Jim Nasby (jim.na...@bluetreble.com) wrote:
 On 1/27/15 9:29 AM, Stephen Frost wrote:
 My point is that Bruce's patch suggests looking for remote_dir in
 the rsync documentation, but no such term appears there.
 Ah, well, perhaps we could simply add a bit of clarification to this:
 
  for details on specifying <option>remote_dir</option>
 
 The whole remote_dir discussion made me think of something... would 
 --link-dest be any help here?

No.

Thanks,

Stephen




Re: [HACKERS] Parallel Seq Scan

2015-01-28 Thread Robert Haas
On Wed, Jan 28, 2015 at 2:08 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 OTOH, spreading the I/O across multiple files is not a good thing, if you
 don't have a RAID setup like that. With a single spindle, you'll just induce
 more seeks.

 Perhaps the OS is smart enough to read in large-enough chunks that the
 occasional seek doesn't hurt much. But then again, why isn't the OS smart
 enough to read in large-enough chunks to take advantage of the RAID even
 when you read just a single file?

Suppose we have N spindles and N worker processes and it just so
happens that the amount of computation is such that each spindle can
keep one CPU busy.  Let's suppose the chunk size is 4MB.  If you read
from the relation at N staggered offsets, you might be lucky enough
that each one of them keeps a spindle busy, and you might be lucky
enough to have that stay true as the scans advance.  You don't need
any particularly large amount of read-ahead; you just need to stay at
least one block ahead of the CPU.  But if you read the relation in one
pass from beginning to end, you need at least N*4MB of read-ahead to
have data in cache for all N spindles, and the read-ahead will
certainly fail you at the end of every 1GB segment.
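
For illustration only, a minimal sketch of the "N staggered offsets" division
described in the previous paragraph. The struct and function names are made
up; real code would also have to cope with workers finishing at different
times.

#include <stdint.h>

/*
 * Sketch: give each of nworkers a contiguous run of blocks starting at a
 * staggered offset, so that with luck each worker's sequential stream lands
 * on a different spindle.
 */
typedef struct WorkerChunk
{
    uint32_t    start_block;    /* first block this worker reads */
    uint32_t    nblocks;        /* number of blocks assigned */
} WorkerChunk;

static WorkerChunk
assign_chunk(uint32_t total_blocks, int nworkers, int worker_id)
{
    WorkerChunk chunk;
    uint32_t    per_worker = total_blocks / nworkers;

    chunk.start_block = per_worker * worker_id;
    /* the last worker also picks up the remainder */
    chunk.nblocks = (worker_id == nworkers - 1)
        ? total_blocks - chunk.start_block
        : per_worker;
    return chunk;
}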

The problem here, as I see it, is that we're flying blind.  If there's
just one spindle, I think it's got to be right to read the relation
sequentially.  But if there are multiple spindles, it might not be,
but it seems hard to predict what we should do.  We don't know what
the RAID chunk size is or how many spindles there are, so any guess as
to how to chunk up the relation and divide up the work between workers
is just a shot in the dark.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pg_upgrade and rsync

2015-01-28 Thread Stephen Frost
Bruce,

* Bruce Momjian (br...@momjian.us) wrote:
 On Tue, Jan 27, 2015 at 09:36:58AM -0500, Stephen Frost wrote:
  The example listed works, but only when it's a local rsync:
  
  rsync --archive --hard-links --size-only old_dir new_dir remote_dir
  
  Perhaps a better example (or additional one) would be with a remote
  rsync, including clarification of old and new dir, like so:
  
  (run in /var/lib/postgresql)
  rsync --archive --hard-links --size-only \
9.3/main \
9.4/main \
server:/var/lib/postgresql/
  
  Note that 9.3/main and 9.4/main are two source directories for rsync to
  copy over, while server:/var/lib/postgresql/ is a remote destination
  directory.  The above directories match a default Debian/Ubuntu install.
 
 OK, sorry everyone was confused by 'remote_dir'.  Does this new patch
 help?

Looks better, but --links is not the same as --hard-links.  The example
is right, but the documentation below it mentions <option>--link</option>,
which is for symlinks, not hard links.

This also should really include a discussion about dealing with
tablespaces, since the example command won't deal with them.

Thanks,

Stephen




Re: [HACKERS] pg_upgrade and rsync

2015-01-28 Thread Stephen Frost
* Bruce Momjian (br...@momjian.us) wrote:
 Interesting problem, but doesn't rsync use sub-second accuracy?

No.  Simple test will show:

touch xx/aa ; rsync -avv xx yy ; sleep 0.5 ; touch xx/aa ; rsync -avv xx yy

Run that a few times and you'll see it report "xx/aa is uptodate"
sometimes, depending on exactly where the sleep falls within the second.

Thanks,

Stephen




[HACKERS] ya comment typo

2015-01-28 Thread Erik Rijkers
'the the' in bufmgr.c

--- src/backend/storage/buffer/bufmgr.c.orig	2015-01-28 09:13:39.681366670 +0100
+++ src/backend/storage/buffer/bufmgr.c	2015-01-28 09:14:14.225690845 +0100
@@ -138,7 +138,7 @@
 static void ForgetPrivateRefCountEntry(PrivateRefCountEntry *ref);
 
 /*
- * Ensure that the the PrivateRefCountArray has sufficient space to store one
+ * Ensure that the PrivateRefCountArray has sufficient space to store one
  * more entry. This has to be called before using NewPrivateRefCountEntry() to
  * fill a new entry - but it's perfectly fine to not use a reserved entry.
  */


Re: [HACKERS] ya comment typo

2015-01-28 Thread Heikki Linnakangas

On 01/28/2015 10:16 AM, Erik Rijkers wrote:

'the the' in bufmgr.c


Fixed, thanks.

- Heikki





Re: [HACKERS] TABLESAMPLE patch

2015-01-28 Thread Kyotaro HORIGUCHI
Hello,

 On 19/01/15 07:08, Amit Kapila wrote:
  On Sun, Jan 18, 2015 at 12:46 AM, Petr Jelinek p...@2ndquadrant.com wrote:
  No issues, but it seems we should check other paths where
  different handling could be required for tablesample scan.
  In set_rel_size(), it uses normal path for heapscan (set_plain_rel_size())
  for rel size estimates where it checks the presence of partial indexes
  and that might affect the size estimates, and that doesn't seem to
  be required when we have to perform scan based on TableSample
  method.
 
 I think that's actually good to have, because we still do costing and 
 the partial index might help produce better estimate of number of 
 returned rows for tablesample as well.

As an issue related to size estimation, I got an explain result like the
following,

=# explain (analyze on, buffers on) select a from t1 tablesample system(10)
where a < 5;
   QUERY PLAN
 Sample Scan on t1  (cost=0.00..301.00 rows=1 width=4) (actual time=0.015..2.920 rows=4294 loops=1)
   Filter: (a < 5)
   Rows Removed by Filter: 5876
   Buffers: shared hit=45

The actual row count is about twice the estimate. tsm_system_cost
estimates the number of result rows using baserel->tuples, not
baserel->rows, so it doesn't reflect the scan quals. Did the code come
from some other intention?

  4.
  SampleNext()
  a.
  Isn't it better to code SampleNext() similar to SeqNext() and
  IndexNext(), which encapsulate the logic to get the tuple in
  another function(tbs_next() or something like that)?
 
 
  Possibly, my thinking was that unlike the index_getnext() and
  heap_getnext(), this function would not be called from any other
  place so adding one more layer of abstraction didn't seem useful.
  And it would mean creating new ScanDesc struct, etc.
 
 
  I think adding additional abstraction would simplify the function
  and reduce the chance of discrepancy; I think we otherwise need to
  create an almost duplicate version of the code in heapgetpage().
  Yeah, I agree that we need to build a structure like ScanDesc, but
  still it will be worth it to keep the code consistent.
 
 Well, similar, but not the same, as we are not always fetching the whole page
 or doing visibility checks on all tuples in the page, etc. But I don't see why
 it can't be inside nodeSamplescan. If you look at bitmap heap scan, that
 one is also essentially a somewhat modified sequential scan and does
 everything within the node nodeBitmapHeapscan because the algorithm is
 not used anywhere else, just like sample scan.

As a general opinion, I'll vote for Amit's comment, since three
or four similar instances seem to be enough reason to abstract
it. On the other hand, since the scan processes are dispatched
in ExecProcNode by the nodeTag of the scan nodes, reuniting
control in an abstraction layer after that point could be said to
introduce useless annoyance. In short, abstraction at the level
of *Next() would require additional changes around the
execution node dispatch.

  Another one is PageIsAllVisible() check.
 
 
 Done.
 
  Another thing is: don't we want to do anything about sync scans
  for these methods, as they are doing a heap scan?
 
 
  I don't follow this one tbh.
 
 
  Refer to the parameter (HeapScanDesc->rs_syncscan) and syncscan.c.
  Basically, during sequential scans on the same table by different
  backends, we attempt to keep them synchronized to reduce the I/O.
 
 
 Ah this, yes, it makes sense for bernoulli, not for system though. I 
 guess it should be used for sampling methods that use SAS_SEQUENTIAL 
 strategy.
 
 
 
  c.
  for the bernoulli method, it will first get the tupoffset with
  the help of the function and then apply the visibility check; it seems
  that could reduce the sample set way below what is expected
  in case a lot of tuples are not visible. Shouldn't it be done the
  reverse way (first the visibility check, then call the function to see
  if we want to include it in the sample)?
  I think something similar is done in acquire_sample_rows, which
  seems right to me.
 
 
  I don't think so, the way bernoulli works is that it returns every
  tuple with equal probability, so the visible tuples have same chance
  of being returned as the invisible ones so the issue should be
  smoothed away automatically (IMHO).
 
 
  How will it get smoothed out in cases where, let us say,
  more than 50% of the tuples are not visible?  I think it's
  due to Postgres's non-overwriting storage architecture
  that we maintain multiple versions of rows in the same heap;
  for a different kind of architecture (like mysql/oracle)
  where multiple row versions are not maintained in the same
  heap, the same function (with the same percentage) can return
  

Re: [HACKERS] Dereferenced pointers checked as NULL in btree_utils_var.c

2015-01-28 Thread Heikki Linnakangas

On 01/28/2015 02:47 AM, Michael Paquier wrote:

On Wed, Jan 28, 2015 at 3:15 AM, Tom Lane t...@sss.pgh.pa.us wrote:

So I'm fine with taking out both this documentation text and the dead
null-pointer checks; but let's do that all in one patch not piecemeal.
In any case, this is just cosmetic cleanup; there's no actual hazard
here.

Attached is a patch with all those things done.


Thanks, applied.


I added as well an assertion in gistKeyIsEQ checking if the input
datums are NULL. I believe that this is still useful for developers,
feel free to rip it out from the patch if you think otherwise.


I ripped it out because I think it was wrong. It assumed that the input
Datums are pass-by-reference, which is not a given. It looks like that's true
for all the current opclasses, so I wouldn't be surprised if there are
hidden assumptions about that elsewhere in the code, but it was wrong
nevertheless.


- Heikki





Re: [HACKERS] WITH CHECK and Column-Level Privileges

2015-01-28 Thread Dean Rasheed
On 27 January 2015 at 22:45, Stephen Frost sfr...@snowman.net wrote:
 Here's the latest set with a few additional improvements (mostly
 comments but also a couple missed #include's and eliminating unnecessary
 whitespace changes).  Unless there are issues with my testing tonight or
 concerns raised, I'll push these tomorrow.


I spotted a couple of minor things reading the patches:

- There's a typo in the comment for the GetModifiedColumns() macros
(...stick in into...).

- The new regression test is not tidying up properly after itself,
because it's trying to drop the table t1 as the wrong user.

Other than that, this looks reasonable to me, and I think that for
most common situations it won't reduce the detail in errors.

Regards,
Dean




Re: [HACKERS] EvalPlanQual behaves oddly for FDW queries involving system columns

2015-01-28 Thread Etsuro Fujita

On 2015/01/19 17:10, Etsuro Fujita wrote:

Attached is an updated version of the patch.


I'll add this to the next CF.

Best regards,
Etsuro Fujita




Re: [HACKERS] Misaligned BufferDescriptors causing major performance problems on AMD

2015-01-28 Thread Andres Freund
On 2015-01-26 21:13:31 -0500, Robert Haas wrote:
 On Mon, Jan 26, 2015 at 9:08 PM, Robert Haas robertmh...@gmail.com wrote:
  Contrary opinions? Robert?
 
  I'm totally OK with further aligning just that one allocation.
 
 Of course, now that I think about it, aligning it probably works
 mostly because the size is almost exactly one cache line.

Not just almost - it's precisely 64 bytes on x86-64 Linux. And I think
it'll also be that on other 64-bit platforms. The only architecture-dependent
thing in there is the spinlock, and that already can be between
1 and 4 bytes without changing the size of the struct.

 If it were any bigger or smaller, aligning it more wouldn't help.

Yea.

 So maybe we should also do something like what LWLocks do, and make a
 union between the actual structure and an appropriate array of padding
 bytes - say either 64 or 128 of them.

Hm. That's a bit bigger patch. I'm inclined to just let it slide for the
moment. I still have plans to downsize some of sbufdesc's content (move
the io lock out) and move the content lwlock inline. Then we're not
going to have much choice but to do this...
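
For illustration only, a minimal sketch of the union-padding idea Robert
describes, in the same spirit as LWLockPadded. The names BufferDescPadded and
BUFFERDESC_PADDED_SIZE are hypothetical; BufferDesc is assumed to be the
descriptor typedef from buf_internals.h.

#include "postgres.h"
#include "storage/buf_internals.h"

/* pad each descriptor out to a full cache line (64 bytes on x86-64) */
#define BUFFERDESC_PADDED_SIZE  64

typedef union BufferDescPadded
{
    BufferDesc  desc;
    char        pad[BUFFERDESC_PADDED_SIZE];
} BufferDescPadded;

/* The shared array would then hold BufferDescPadded elements, and callers
 * would access &BufferDescriptors[i].desc. */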

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] TABLESAMPLE patch

2015-01-28 Thread Petr Jelinek

On 28/01/15 08:23, Kyotaro HORIGUCHI wrote:

Hi, I took a look on this and found nice.

By the way, the parameter for REPEATABLE seems to be allowed to be an
expression in ParseTableSample, but the grammar rejects it.



It wasn't my intention to support it, but you are correct, the code is 
generic enough that we can support it.



The following change seems enough.



Seems about right, thanks.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] PQgetssl() and alternative SSL implementations

2015-01-28 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
 Stephen Frost sfr...@snowman.net writes:
  What bothers me about this is that it punts SSL work to the application
  and requires that they be coded to work with both OpenSSL and whatever
  else we implement (eg: GnuTLS) to do anything but the most simple
  checks.  That's a problem because people are *not* going to want to
  #include both OpenSSL and GnuTLS headers into their applications because
  they don't know which PG will be compiled with..  Not to mention that
  it'd be darn awkward to do so.
 
 The point of this is to provide an escape hatch for people who really
 want to do XYZ even though we provide no API for XYZ in libpq.  Hopefully,
 those people will be few and far between, because anything that's a really
 common requirement should be catered for by libpq.

I understand that, but 4 variables is pretty darn far from what an
application developer working with SSL is going to want.  As I've mentioned
before when this has been brought up, I'm of the opinion that we should
be providing, from the start, the same set as Apache's SSL environment
variables:

The mod_ssl (OpenSSL-based) documentation:
http://httpd.apache.org/docs/2.2/mod/mod_ssl.html

For mod_gnutls, this is the list of SSL variables provided:
http://www.outoforder.cc/projects/apache/mod_gnutls/docs/#environment-variables

Note that they're pretty much the same set, so providing them for
OpenSSL isn't closing off the ability to provide GnuTLS in the future.

To be clear, I'm not asking for all of this to happen in the first
patch, but I'd like whomever is going forward with this to at least
agree that they're going to try and cover the Apache set for whatever
libraries are supported in the first major release we put out with this.
Considering the example is already there, I'm really hopeful that isn't
too difficult to do..

Thanks,

Stephen




Re: [HACKERS] jsonb, unicode escapes and escaped backslashes

2015-01-28 Thread Tom Lane
Noah Misch n...@leadboat.com writes:
 On Tue, Jan 27, 2015 at 03:56:22PM -0500, Tom Lane wrote:
 So at this point I propose that we reject \u0000 when de-escaping JSON.

 I would have agreed on 2014-12-09, and this release is the last chance to make
 such a change.  It is a bold wager that could pay off, but -1 from me anyway.

You only get to vote -1 if you have a credible alternative.  I don't see
one.

 I can already envision the blog post from the DBA staying on 9.4.0 because
 9.4.1 pulled his ability to store U+0000 in jsonb.

Those will be more or less the same people who bitch about text not
storing NULs; the world has not fallen.

 jsonb was *the* top-billed
 9.4 feature, and this thread started with Andrew conveying a field report of a
 scenario more obscure than storing U+0000.

There is a separate issue, which is that our earlier attempt to make the
world safe for \u0000 actually broke other things.  We still need to fix
that, but I think the fix probably consists of reverting that patch and
instead disallowing \u0000.

 Anybody who's seriously unhappy with that can propose a patch to fix it
 properly in 9.5 or later.

 Someone can still do that by introducing a V2 of the jsonb binary format and
 preserving the ability to read both formats.  (Too bad Andres's proposal to
 include a format version didn't inform the final format, but we can wing it.)
  I agree that storing U+0000 as 0x00 is the best end state.

We will not need a v2 format, merely values that contain NULs.  Existing
data containing \u0000 will be read as those six ASCII characters, which
is not really wrong since that might well be what it was anyway.

 We probably need to rethink the re-escaping behavior as well; I'm not
 sure if your latest patch is the right answer for that.

 Yes, we do.  No change to the representation of U+ is going to fix the
 following bug, but that patch does fix it:

 [local] test=# select
 test-# $$"\\u05e2"$$::jsonb = $$"\\u05e2"$$::jsonb,
 test-# $$"\\u05e2"$$::jsonb = $$"\\u05e2"$$::jsonb::text::jsonb;

The cause of this bug is commit 0ad1a816320a2b539a51628e2a0b1e83ff096b1d,
which I'm inclined to think we need to simply revert, not render even
more squirrely.

regards, tom lane




Re: [HACKERS] Sequence Access Method WIP

2015-01-28 Thread Heikki Linnakangas

On 01/23/2015 02:34 AM, Petr Jelinek wrote:

On 22/01/15 17:02, Petr Jelinek wrote:


The new version (the one that is not submitted yet) of the gapless sequence
is way uglier and probably not the best example either, but it does guarantee
gaplessness (it stores the last value in its own values table). So I am
not sure if it should be included either...


Here it is as promised.


I generally like the division of labour between the seqam 
implementations and the API now.


I don't like the default_sequenceam GUC. That seems likely to create 
confusion. And it's not really enough for something like a remote 
sequence AM anyway that definitely needs some extra reloptions, like the 
hostname of the remote server. The default should be 'local', unless you 
specify something else with CREATE SEQUENCE USING - no GUCs.


Some comments on pg_dump:

* In pg_dump's dumpSequence function, check the server version number 
instead of checking whether pg_sequence_dump_state() function exists. 
That's what we usually do when new features are introduced. And there's 
actually a bug there: you have the check backwards. (try dumping a 
database with any sequences in it; it fails)


* pg_dump should not output a USING clause when a sequence uses the 
'local' sequence. No point in adding such noise - making the SQL command 
not standard-compatible - for the 99% of databases that don't use other 
sequence AMs.


* Be careful to escape all strings correctly in pg_dump. I think you're 
missing escaping for the 'state' variable at least.


In sequence_save_tuple:

else
{
    /*
     * New tuple was not sent, so the original tuple was probably just
     * changed inline, all we need to do is mark the buffer dirty and
     * optionally log the update tuple.
     */
    START_CRIT_SECTION();

    MarkBufferDirty(seqh->buf);

    if (do_wal)
        log_sequence_tuple(seqh->rel, seqh->tup, seqh->buf, page);

    END_CRIT_SECTION();
}


This is wrong when checksums are enabled and !do_wal. I believe this 
should use MarkBufferDirtyHint().



Notable changes:
- The gapless sequence rewritten to use the transactional storage as
that's the only way to guarantee gaplessness between dump and restore.


Can you elaborate?

Using the auxiliary seqam_gapless_values is a bit problematic. First of 
all, the event trigger on DROP SEQUENCE doesn't fire if you drop the 
whole schema containing the sequence with DROP SCHEMA CASCADE. Secondly, 
updating a row in a table for every nextval() call is pretty darn 
expensive. But I don't actually see a problem with dump and restore.


Can you rewrite it without using the values table? AFAICS, there are
two problems to solve:


1. If the top-level transaction aborts, you need to restore the old 
value. I suggest keeping two values in the sequence tuple: the old, 
definitely committed one, and the last value. The last value is only 
considered current if the associated XID committed; otherwise the old 
value is current. When a transaction is about to commit, it stores its 
top-level XID and the new value in the last field, and copies the 
previously current value to the old field.


2. You need to track the last value on a per-subtransaction basis, until 
the transaction commits/rolls back, in order to have rollback to 
savepoint to retreat the sequence's value. That can be done in 
backend-private memory, maintaining a stack of subtransactions and the 
last value of each. Use RegisterSubXactCallback to hook into subxact 
commit/abort. Just before top-level commit (in pre-commit callback), 
update the sequence tuple with the latest value, so that it gets 
WAL-logged before the commit record.
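
For illustration only, a rough sketch of the per-sequence state that point 1
above would add. The struct and field names are made up; only the
old-value/last-value/XID idea comes from the suggestion itself.

#include "postgres.h"
#include "access/transam.h"

/*
 * Hypothetical on-tuple state for a gapless sequence: "last_value" is
 * current only if last_xid committed; otherwise "old_value" is the
 * definitely-committed value.
 */
typedef struct GaplessValue
{
    int64           old_value;      /* last definitely committed value */
    int64           last_value;     /* value written by last_xid */
    TransactionId   last_xid;       /* top-level XID that wrote last_value */
} GaplessValue;

static int64
gapless_current_value(GaplessValue *v)
{
    if (TransactionIdIsValid(v->last_xid) &&
        TransactionIdDidCommit(v->last_xid))
        return v->last_value;
    return v->old_value;
}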


- Heikki





Re: [HACKERS] jsonb, unicode escapes and escaped backslashes

2015-01-28 Thread Andrew Dunstan


On 01/28/2015 12:50 AM, Noah Misch wrote:

On Tue, Jan 27, 2015 at 03:56:22PM -0500, Tom Lane wrote:

Andrew Dunstan and...@dunslane.net writes:

On 01/27/2015 02:28 PM, Tom Lane wrote:

Well, we can either fix it now or suffer with a broken representation
forever.  I'm not wedded to the exact solution I described, but I think
we'll regret it if we don't change the representation.

So at this point I propose that we reject \u0000 when de-escaping JSON.

I would have agreed on 2014-12-09, and this release is the last chance to make
such a change.  It is a bold wager that could pay off, but -1 from me anyway.
I can already envision the blog post from the DBA staying on 9.4.0 because
9.4.1 pulled his ability to store U+0000 in jsonb.  jsonb was *the* top-billed
9.4 feature, and this thread started with Andrew conveying a field report of a
scenario more obscure than storing U+0000.  Therefore, we have to assume many
users will notice the change.  This move would also add to the growing
evidence that our .0 releases are really beta(N+1) releases in disguise.


Anybody who's seriously unhappy with that can propose a patch to fix it
properly in 9.5 or later.

Someone can still do that by introducing a V2 of the jsonb binary format and
preserving the ability to read both formats.  (Too bad Andres's proposal to
include a format version didn't inform the final format, but we can wing it.)
I agree that storing U+0000 as 0x00 is the best end state.





We need to make up our minds about this pretty quickly. The more radical 
move is likely to involve quite a bit of work, ISTM.


It's not clear to me how we should represent a unicode null. i.e. given 
a json of '["foo\u0000bar"]', I get that we'd store the element as 
'foo\x00bar', but what is the result of


   (jsonb '["foo\u0000bar"]')->>0

It's defined to be text so we can't just shove a binary null in the 
middle of it. Do we throw an error?


And I still want to hear more voices on the whole direction we want to 
take this.


cheers

andrew




Re: [HACKERS] PQgetssl() and alternative SSL implementations

2015-01-28 Thread Heikki Linnakangas

On 01/28/2015 06:58 PM, Stephen Frost wrote:

Although I think "OpenSSL SSL" is a little bit duplicatively
redundant.  Why not just "OpenSSL"?

I wondered also, but figured it was probably because it's OpenSSL's
ssl structure, which then made sense.


Right, that was the idea. I wanted it to include the word "OpenSSL", to 
make it clear in the callers that it's specific to OpenSSL. And "SSL", 
because that's the name of the struct. I agree it looks silly, though. 
One idea is to have two separate arguments: the implementation name, and 
the struct name. PQgetSSLstruct(ssl, "OpenSSL", "SSL") would look less 
silly.


- Heikki





Re: [HACKERS] jsonb, unicode escapes and escaped backslashes

2015-01-28 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 It's not clear to me how we should represent a unicode null. i.e. given 
 a json of '["foo\u0000bar"]', I get that we'd store the element as 
 'foo\x00bar', but what is the result of

 (jsonb '["foo\u0000bar"]')->>0

 It's defined to be text so we can't just shove a binary null in the 
 middle of it. Do we throw an error?

Yes, that is what I was proposing upthread.  Obviously, this needs some
thought to ensure that there's *something* useful you can do with a field
containing a nul, but we'd have little choice but to throw an error if
the user asks us to convert such a field to unescaped text.

I'd be a bit inclined to reject nuls in object field names even if we
allow them in field values, since just about everything you can usefully
do with a field name involves regarding it as text.

Another interesting implementation problem is what does indexing do with
such values --- ISTR there's an implicit conversion to C strings in there
too, at least in GIN indexes.

Anyway, there is a significant amount of work involved here, and there's
no way we're getting it done for 9.4.1, or probably 9.4.anything.  I think
our only realistic choice right now is to throw an error for \u0000 so that
we can preserve our options for doing something useful with it later.

regards, tom lane




Re: [HACKERS] PATCH: decreasing memory needlessly consumed by array_agg

2015-01-28 Thread Tom Lane
Jeff Davis pg...@j-davis.com writes:
 On Tue, Jan 20, 2015 at 6:44 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Uh, sorry, I've not been paying any attention to this thread for awhile.
 What's the remaining questions at issue?

 This patch is trying to improve the array_agg case where there are
 many arrays being constructed simultaneously, such as in HashAgg. You
 strongly suggested that a commitable patch would differentiate between
 the grouped and ungrouped cases (or perhaps you meant differentiate
 between HashAgg and sorted aggregation?). Tomas's current patch does
 not do so; it does two main things:
1. Uses a common memory context for arrays being constructed by
 array_agg in any context (ungrouped, sorted agg, and HashAgg)
2. Reduces the initial array allocation to 8 elements from 64, but
 preserves doubling behavior

Sorry for slow response on this.  I took a look at the v8 patch (that's
still latest no?).  I see that it only gets rid of the separate context
for the two callers array_agg_transfn and array_agg_array_transfn, for
which no memory release can happen anyway until the parent context is
reset (cf comments in corresponding finalfns).  So that's fine --- it is
not making any bloat case any worse, at least.

I'm not sure about whether reducing the initial Datum array size
across-the-board is a good thing or not.  One obvious hack to avoid
unexpected side effects on other callers would be

astate->alen = (subcontext ? 64 : 8);

The larger initial size is basically free when subcontext is true, anyway,
considering it will be coming out of an 8K initial subcontext allocation.

This still leaves us wondering whether the smaller initial size will
hurt array_agg itself, but that's at least capable of being tested
with a reasonably small number of test cases.

I strongly object to removing initArrayResultArr's element_type argument.
That's got nothing to do with the stated purpose of the patch, and it
forces a non-too-cheap catalog lookup to be done inside
initArrayResultArr.  It's true that some of the current call sites are
just doing the same lookup themselves anyway, but we should not foreclose
the possibility that the caller has the data already (as some others do)
or is able to cache it across multiple uses.  A possible compromise is
to add the convention that callers can pass InvalidOid if they want
initArrayResultArr to do the lookup.  (In any case, the function's header
comment needs adjustment if the API spec changes.)

Other than that, I think the API changes here are OK.  Adding a new
argument to existing functions is something we do all the time, and it's
clear how to modify any callers to get the same behavior as before.
We could perhaps clean things up with a more invasive redefinition, but
I doubt it's worth inflicting more pain on third party callers.

Another thing I'd change is this:
 
+   /* we can only release the context if it's a private one. */
+   Assert(!(release && !astate->private_cxt));
+
    /* Clean up all the junk */
    if (release)
        MemoryContextDelete(astate->mcontext);

Why not this way:

    /* Clean up all the junk */
    if (release)
+   {
+       Assert(astate->private_cxt);
        MemoryContextDelete(astate->mcontext);
+   }

Seems a lot more understandable, and less code too.

I concur with the concerns that the comments could do with more work,
but haven't attempted to improve them myself.

regards, tom lane




Re: [HACKERS] jsonb, unicode escapes and escaped backslashes

2015-01-28 Thread Tom Lane
Jim Nasby jim.na...@bluetreble.com writes:
 While I sympathize with Noah's sentiments, the only thing that makes sense to 
 me is that a JSON text field is treated the same way as we treat text. Right 
 now, that means NUL is not allowed, period.

 If no one has bitched about this with text, is it really that big a problem 
 with JSON?

Oh, people have bitched about it all right ;-).  But I think your real
question is how many applications find that restriction to be fatal.
The answer clearly is not a lot for text, and I don't see why the
answer would be different for JSON.

regards, tom lane




Re: [HACKERS] pg_dump with both --serializable-deferrable and -j

2015-01-28 Thread Andres Freund
On 2015-01-28 15:32:15 +0000, Kevin Grittner wrote:
 Andres Freund and...@2ndquadrant.com wrote:
  ISTM that the check is just overzealous and/or needs to be moved into
  ImportSnapshot(). There it then could be made to check if the exporting
  xact was also deferrable.
 
 That would be great if ImportSnapshot had access to that
 information; I don't see it, though.  Having pg_dump use repeatable
 read transactions for the processes that import the snapshot would
 work fine, as long as they are reading a snapshot which was
 captured by a serializable read only deferrable transaction.

Then add that information? The disk format for snapshot isn't persistent
across restarts, so we can just extend it.

I really don't like adding hacks like using a lower serializability
level than what's actually requested just because it happens to be
easier. Even if it's just in some backend.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Make hba available to client code

2015-01-28 Thread Andrew Dunstan


On 01/28/2015 04:26 PM, David Fetter wrote:

On Wed, Jan 28, 2015 at 04:10:42PM -0500, Tom Lane wrote:

David Fetter da...@fetter.org writes:

While investigating another project, namely adding pg_hba.conf support
to pgbouncer, I ran into a stumbling block others probably will, too:
the hba code is backend-only, which means that if I were to do this
as-is, I would be cooking a batch of very unappetizing copypasta.
I'm allergic to copypasta, so unless there are big objections, I'd
like to export those functions to make hba available to other code.

How exactly would exporting those functions help anything client-side?

Right now, pgbouncer, and aspirational things like it--other
connection poolers, maybe distributed transaction managers, etc.--can
fairly easily act almost like a direct connection to PostgreSQL,
except for some important exceptions. One that's cropped up several
times is the ability to gate auth by network and user, that being what
pg_hba.conf allows.

A conversation with Andrew Dunstan since I posted convinced me that
the amount of work to separate this cleanly and have it perform
somewhere in the close range of as well as it does now could be pretty
significant.




I should add that I am working on an hba facility for pgbouncer. It's 
currently in trial. I did pull in a few pieces of the core hba code, but 
not a great deal - mainly to do with handling IP addresses and masks.


cheers

andrew




Re: [HACKERS] Parallel Seq Scan

2015-01-28 Thread Jim Nasby

On 1/28/15 9:56 AM, Stephen Frost wrote:

* Robert Haas (robertmh...@gmail.com) wrote:

On Wed, Jan 28, 2015 at 10:40 AM, Tom Lane t...@sss.pgh.pa.us wrote:

I thought the proposal to chunk on the basis of each worker processes
one 1GB-sized segment should work all right.  The kernel should see that
as sequential reads of different files, issued by different processes;
and if it can't figure out how to process that efficiently then it's a
very sad excuse for a kernel.


Agreed.


I agree.  But there's only value in doing something like that if we
have evidence that it improves anything.  Such evidence is presently a
bit thin on the ground.


You need an i/o subsystem that's fast enough to keep a single CPU busy,
otherwise (as you mentioned elsewhere), you're just going to be i/o
bound and having more processes isn't going to help (and could hurt).

Such i/o systems do exist, but a single RAID5 group over spinning rust
with a simple filter isn't going to cut it with a modern CPU - we're just
too darn efficient to end up i/o bound in that case.  A more complex
filter might be able to change it over to being more CPU bound than i/o
bound and produce the performance improvements you're looking for.


Except we're nowhere near being IO efficient. The vast difference between 
Postgres IO rates and dd shows this. I suspect that's because we're not giving 
the OS a list of IO to perform while we're doing our thing, but that's just a 
guess.


The caveat to this is if you have multiple i/o *channels* (which it
looks like you don't in this case) where you can parallelize across
those channels by having multiple processes involved.


Keep in mind that multiple processes is in no way a requirement for that. Async 
IO would do that, or even just requesting stuff from the OS before we need it.
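
For illustration only, a generic, self-contained sketch of "requesting stuff
from the OS before we need it" via posix_fadvise hints, in the same spirit as
the prefetching that effective_io_concurrency drives for bitmap heap scans.
It is not PostgreSQL code; the block size and prefetch distance are arbitrary.

#include <fcntl.h>
#include <unistd.h>

#define BLOCK_SIZE     8192
#define PREFETCH_DIST  32           /* blocks to stay ahead of the reader */

/*
 * Sketch: hint the kernel about the next PREFETCH_DIST blocks before
 * synchronously reading the current one, so the read-ahead can overlap
 * with our processing.
 */
static ssize_t
read_block_with_prefetch(int fd, off_t blockno, char *buf)
{
#ifdef POSIX_FADV_WILLNEED
    (void) posix_fadvise(fd, (blockno + 1) * BLOCK_SIZE,
                         PREFETCH_DIST * BLOCK_SIZE, POSIX_FADV_WILLNEED);
#endif
    return pread(fd, buf, BLOCK_SIZE, blockno * BLOCK_SIZE);
}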


 We only support
multiple i/o channels today with tablespaces and we can't span tables
across tablespaces.  That's a problem when working with large data sets,
but I'm hopeful that this work will eventually lead to a parallelized
Append node that operates against a partitioned/inheirited table to work
across multiple tablespaces.


Until we can get a single seqscan close to dd performance, I fear worrying 
about tablespaces and IO channels is entirely premature.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] pg_dump with both --serializable-deferrable and -j

2015-01-28 Thread Kevin Grittner
Kevin Grittner kgri...@ymail.com wrote:

 Having pg_dump use repeatable read transactions for the processes
 that import the snapshot would work fine, as long as they are
 reading a snapshot which was captured by a serializable read only
 deferrable transaction.

It looks like the attached patch does it (although it is only
lightly tested so far and only on the master branch).  This seems
like a back-patchable bug fix (to 9.3).


Thoughts?


--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index c3ebb3a..65ccc93 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -1011,7 +1011,7 @@ setup_connection(Archive *AH, DumpOptions *dopt, const char *dumpencoding,
 	ExecuteSqlStatement(AH, "BEGIN");
 	if (AH->remoteVersion >= 90100)
 	{
-		if (dopt->serializable_deferrable)
+		if (dopt->serializable_deferrable && AH->sync_snapshot_id == NULL)
 			ExecuteSqlStatement(AH,
 								"SET TRANSACTION ISOLATION LEVEL "
 								"SERIALIZABLE, READ ONLY, DEFERRABLE");



[HACKERS] pg_dump issue - push useless statements REVOKE, GRANT

2015-01-28 Thread Pavel Stehule
Hi

I have a dump with thousands of objects. I often find a pattern in the dump
that doesn't make any sense. These operations have no effect, but they slow
down the database restore.

Is this the expected behavior?

REVOKE ALL ON TABLE zobjrozp FROM PUBLIC;
REVOKE ALL ON TABLE zobjrozp FROM sitkhaso;
GRANT ALL ON TABLE zobjrozp TO sitkhaso;
GRANT ALL ON TABLE zobjrozp TO PUBLIC;

REVOKE ALL ON TABLE zppolozky FROM PUBLIC;
REVOKE ALL ON TABLE zppolozky FROM sitkhaso;
GRANT ALL ON TABLE zppolozky TO sitkhaso;
GRANT ALL ON TABLE zppolozky TO PUBLIC;

...

5000x

Regards

Pavel Stehule


Re: [HACKERS] Make hba available to client code

2015-01-28 Thread Tom Lane
David Fetter da...@fetter.org writes:
 While investigating another project, namely adding pg_hba.conf support
 to pgbouncer, I ran into a stumbling block others probably will, too:
 the hba code is backend-only, which means that if I were to do this
 as-is, I would be cooking a batch of very unappetizing copypasta.

 I'm allergic to copypasta, so unless there are big objections, I'd
 like to export those functions to make hba available to other code.

How exactly would exporting those functions help anything client-side?

regards, tom lane




Re: [HACKERS] jsonb, unicode escapes and escaped backslashes

2015-01-28 Thread Jim Nasby

On 1/28/15 11:36 AM, Tom Lane wrote:

Andrew Dunstan and...@dunslane.net writes:

It's not clear to me how we should represent a unicode null. i.e. given
a json of '["foo\u0000bar"]', I get that we'd store the element as
'foo\x00bar', but what is the result of



    (jsonb '["foo\u0000bar"]')->>0



It's defined to be text so we can't just shove a binary null in the
middle of it. Do we throw an error?


Yes, that is what I was proposing upthread.  Obviously, this needs some
thought to ensure that there's *something* useful you can do with a field
containing a nul, but we'd have little choice but to throw an error if
the user asks us to convert such a field to unescaped text.

I'd be a bit inclined to reject nuls in object field names even if we
allow them in field values, since just about everything you can usefully
do with a field name involves regarding it as text.

Another interesting implementation problem is what does indexing do with
such values --- ISTR there's an implicit conversion to C strings in there
too, at least in GIN indexes.

Anyway, there is a significant amount of work involved here, and there's
no way we're getting it done for 9.4.1, or probably 9.4.anything.  I think
our only realistic choice right now is to throw an error for \u0000 so that
we can preserve our options for doing something useful with it later.


My $0.01:

While I sympathize with Noah's sentiments, the only thing that makes sense to 
me is that a JSON text field is treated the same way as we treat text. Right 
now, that means NUL is not allowed, period.

If no one has bitched about this with text, is it really that big a problem 
with JSON?
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Make hba available to client code

2015-01-28 Thread Tom Lane
David Fetter da...@fetter.org writes:
 On Wed, Jan 28, 2015 at 04:10:42PM -0500, Tom Lane wrote:
 How exactly would exporting those functions help anything client-side?

 Right now, pgbouncer, and aspirational things like it--other
 connection poolers, maybe distributed transaction managers, etc.--can
 fairly easily act almost like a direct connection to PostgreSQL,
 except for some important exceptions. One that's cropped up several
 times is the ability to gate auth by network and user, that being what
 pg_hba.conf allows.

 A conversation with Andrew Dunstan since I posted convinced me that
 the amount of work to separate this cleanly and have it perform
 somewhere in the close range of as well as it does now could be pretty
 significant.

I'm not sure I understand what you mean by separate this cleanly,
but if what you mean is rewrite hba.c so that it works either in
frontend or backend, I don't think I'm going to like the result;
and I'm not convinced that client-side code would find it all that
useful either.  The API, error handling, and memory management would
probably all need to be a great deal different on client side.  And
serving two masters like that would result in an unreadable mess.

regards, tom lane




Re: [HACKERS] pg_dump with both --serializable-deferrable and -j

2015-01-28 Thread Tom Lane
Kevin Grittner kgri...@ymail.com writes:
 Kevin Grittner kgri...@ymail.com wrote:
 Having pg_dump use repeatable read transactions for the processes
 that import the snapshot would work fine, as long as they are
 reading a snapshot which was captured by a serializable read only
 deferrable transaction.

 It looks like the attached patch does it (although it is only
 lightly tested so far and only on the master branch).  This seems
 like a back-patchable bug fix (to 9.3).

 Thoughts?

A comment seems essential here, because as written anybody would think
the test for a snapshot is a bug.

regards, tom lane




Re: [HACKERS] pg_dump with both --serializable-deferrable and -j

2015-01-28 Thread Kevin Grittner
Andres Freund and...@2ndquadrant.com wrote:
  On 2015-01-28 15:32:15 +0000, Kevin Grittner wrote:
 Andres Freund and...@2ndquadrant.com wrote:
 ISTM that the check is just overzelous and/or needs to be moved into
 ImportSnapshot(). There it then could be made to check if the exporting
 xact was also deferrable.

 That would be great if ImportSnapshot had access to that
 information; I don't see it, though. Having pg_dump use repeatable
 read transactions for the processes that import the snapshot would
 work fine, as long as they are reading a snapshot which was
 captured by a serializable read only deferrable transaction.

 Then add that information? The disk format for snapshot isn't persistent
 across restarts, so we can just extend it.

 I really don't like adding hacks like using a lower serializability
 level than what's actually requested just because it happens to be
 easier. Even if it's just in some backend.

I see your point, and will look at that for the master branch, but
it hardly seems like something to back-patch; and the messy failure
of this combination of options on 9.3 and 9.4 seems like it
deserves a fix.


Tom Lane t...@sss.pgh.pa.us wrote:
 Kevin Grittner kgri...@ymail.com writes:

 Kevin Grittner kgri...@ymail.com wrote:
 Having pg_dump use repeatable read transactions for the processes
 that import the snapshot would work fine, as long as they are
 reading a snapshot which was captured by a serializable read only
 deferrable transaction.

 It looks like the attached patch does it (although it is only
 lightly tested so far and only on the master branch).  This seems
 like a back-patchable bug fix (to 9.3).

 Thoughts?

 A comment seems essential here, because as written anybody would think
 the test for a snapshot is a bug.

Good point.


--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Using 128-bit integers for sum, avg and statistics aggregates

2015-01-28 Thread Peter Geoghegan
On Mon, Jan 26, 2015 at 11:21 PM, Andreas Karlsson andr...@proxel.se wrote:
 Do you also think the SQL functions should be named numeric_int128_sum,
 numeric_int128_avg, etc?

Some quick review comments. These apply to int128-agg-v5.patch.

* Why is there no declaration of the function
numeric_int16_stddev_internal() within numeric.c?

* I concur with others - we should stick to int16 for the SQL
interface. The inconsistency there is perhaps slightly confusing, but
that's historic.

* I'm not sure about the idea of polymorphic catalog functions (that
return the type internal, but the actual struct returned varying
based on build settings).

I tend to think that things would be better if there was always a
uniform return type for such internal type returning functions, but
that *its* structure varied according to the availability of int128
(i.e. HAVE_INT128) at compile time. What we should probably do is have
a third aggregate struct, that encapsulates this idea of (say)
int2_accum() piggybacking on one of either Int128AggState or
NumericAggState directly. Maybe that would be called PolyNumAggState.
Then this common code is all that is needed on both types of build (at
the top of int4_accum(), for example):

PolyNumAggState *state;

state = PG_ARGISNULL(0) ? NULL : (PolyNumAggState *) PG_GETARG_POINTER(0);

I'm not sure if I'd do this with a containing struct or a simple
#ifdef HAVE_INT128 typedef #else ... , but I think it's an
improvement either way.
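
For illustration, here's a minimal sketch of the simple-typedef variant of
that idea (PolyNumAggState is just the name suggested above; none of this is
in the posted patch):

    /* One name for both kinds of build; which accumulator backs it depends
     * on whether 128-bit integer support is available at compile time. */
    #ifdef HAVE_INT128
    typedef Int128AggState PolyNumAggState;
    #else
    typedef NumericAggState PolyNumAggState;
    #endif

With that, the common prologue above compiles unchanged on either kind of
build, and only the code that actually touches the accumulator fields needs
to care about HAVE_INT128.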

* You didn't update this unequivocal comment to not be so strong:

 * Integer data types all use Numeric accumulators to share code and
 * avoid risk of overflow.

That's all I have for now...
-- 
Peter Geoghegan




[HACKERS] Make hba available to client code

2015-01-28 Thread David Fetter
Folks,

While investigating another project, namely adding pg_hba.conf support
to pgbouncer, I ran into a stumbling block others probably will, too:
the hba code is backend-only, which means that if I were to do this
as-is, I would be cooking a batch of very unappetizing copypasta.

I'm allergic to copypasta, so unless there are big objections, I'd
like to export those functions to make hba available to other code.

Objections?

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] PATCH: decreasing memory needlessly consumed by array_agg

2015-01-28 Thread Tomas Vondra
On 28.1.2015 21:28, Tom Lane wrote:
 Jeff Davis pg...@j-davis.com writes:
 On Tue, Jan 20, 2015 at 6:44 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Uh, sorry, I've not been paying any attention to this thread for awhile.
 What's the remaining questions at issue?
 
 This patch is trying to improve the array_agg case where there are
 many arrays being constructed simultaneously, such as in HashAgg. You
 strongly suggested that a commitable patch would differentiate between
 the grouped and ungrouped cases (or perhaps you meant differentiate
 between HashAgg and sorted aggregation?). Tomas's current patch does
 not do so; it does two main things:
1. Uses a common memory context for arrays being constructed by
 array_agg in any context (ungrouped, sorted agg, and HashAgg)
2. Reduces the initial array allocation to 8 elements from 64, but
 preserves doubling behavior
 
 Sorry for slow response on this. I took a look at the v8 patch
 (that's still latest no?). I see that it only gets rid of the

Yes, v8 is the latest version I submitted.

 separate context for the two callers array_agg_transfn and
 array_agg_array_transfn, for which no memory release can happen
 anyway until the parent context is reset (cf comments in
 corresponding finalfns). So that's fine --- it is not making any
 bloat case any worse, at least.
 
 I'm not sure about whether reducing the initial Datum array size 
 across-the-board is a good thing or not. One obvious hack to avoid 
 unexpected side effects on other callers would be
 
   astate->alen = (subcontext ? 64 : 8);
 
 The larger initial size is basically free when subcontext is true,
 anyway, considering it will be coming out of an 8K initial subcontext
 allocation.

Seems like a good idea. If we're using 8kB contexts, we can preallocate
64 elements right away.

But maybe we could decrease the 8kB context size to 1kB? For 64 elements
it's just 512B + 64B for the arrays, and a bit of space for the
ArrayBuildState. So maybe ~600B in total.

Of course, this does not include space for pass-by-ref values.

Anyway, I have no  problem with this - I mostly care just about the
array_agg() case. All the other places may adopt the  'common context'
approach to get the benefit.

 This still leaves us wondering whether the smaller initial size will
  hurt array_agg itself, but that's at least capable of being tested 
 with a reasonably small number of test cases.

I plan to do more testing to confirm this - my initial testing seemed to
confirm this, but I'll repeat that with the current patch.

 I strongly object to removing initArrayResultArr's element_type
 argument. That's got nothing to do with the stated purpose of the
 patch, and it forces a non-too-cheap catalog lookup to be done
 inside initArrayResultArr.  It's true that some of the current call
 sites are just doing the same lookup themselves anyway, but we should
 not foreclose the possibility that the caller has the data already
 (as some others do) or is able to cache it across multiple uses.  A
 possible compromise is to add the convention that callers can pass
 InvalidOid if they want initArrayResultArr to do the lookup.  (In any
 case, the function's header comment needs adjustment if the API spec
 changes.)

Fair point, and the InvalidOid approach seems sane to me.
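
As a sketch, the fallback inside initArrayResultArr() could be as small as
this (purely illustrative, not a final patch):

    /* Callers may pass InvalidOid if they don't already know the element type */
    if (!OidIsValid(element_type))
        element_type = get_element_type(array_type);

    if (!OidIsValid(element_type))
        ereport(ERROR,
                (errcode(ERRCODE_DATATYPE_MISMATCH),
                 errmsg("data type %s is not an array type",
                        format_type_be(array_type))));

Callers that already have the element type, or can cache it across calls,
keep passing it in and skip the syscache lookup entirely.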

 
 Other than that, I think the API changes here are OK. Adding a new 
 argument to existing functions is something we do all the time, and
 it's clear how to modify any callers to get the same behavior as
 before. We could perhaps clean things up with a more invasive
 redefinition, but I doubt it's worth inflicting more pain on third
 party callers.
 
 Another thing I'd change is this:
  
 + /* we can only release the context if it's a private one. */
 + Assert(! (release && !astate->private_cxt));
 +
   /* Clean up all the junk */
   if (release)
   MemoryContextDelete(astate->mcontext);
 
 Why not this way:
 
   /* Clean up all the junk */
   if (release)
 + {
 + Assert(astate->private_cxt);
   MemoryContextDelete(astate->mcontext);
 + }
 
 Seems a lot more understandable, and less code too.

Yeah, I agree it's easier to understand.

 I concur with the concerns that the comments could do with more
 work, but haven't attempted to improve them myself.

There were a few comments about this, after the v8 patch, with
recommended comment changes.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Make hba available to client code

2015-01-28 Thread David Fetter
On Wed, Jan 28, 2015 at 04:10:42PM -0500, Tom Lane wrote:
 David Fetter da...@fetter.org writes:
  While investigating another project, namely adding pg_hba.conf support
  to pgbouncer, I ran into a stumbling block others probably will, too:
  the hba code is backend-only, which means that if I were to do this
  as-is, I would be cooking a batch of very unappetizing copypasta.
 
  I'm allergic to copypasta, so unless there are big objections, I'd
  like to export those functions to make hba available to other code.
 
 How exactly would exporting those functions help anything client-side?

Right now, pgbouncer, and aspirational things like it--other
connection poolers, maybe distributed transaction managers, etc.--can
fairly easily act almost like a direct connection to PostgreSQL,
except for some important exceptions. One that's cropped up several
times is the ability to gate auth by network and user, that being what
pg_hba.conf allows.

A conversation with Andrew Dunstan since I posted convinced me that
the amount of work to separate this cleanly and have it perform
somewhere close to as well as it does now could be pretty
significant.

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] File based Incremental backup v7

2015-01-28 Thread Giuseppe Broccolo
Hi Marco,

2015-01-27 19:04 GMT+01:00 Marco Nenciarini marco.nenciar...@2ndquadrant.it
:

 I've done some tests and it looks like FSM nodes always have
 InvalidXLogRecPtr as LSN.

 I've updated the patch to always include files if all their pages have
 LSN == InvalidXLogRecPtr

 Updated patch v7 attached.

 Regards,
 Marco

 --
 Marco Nenciarini - 2ndQuadrant Italy
 PostgreSQL Training, Services and Support
 marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it


I've tried to run a new test of the incremental backup again, introducing
a new tablespace after a base backup, using version 7 of the
patch and the new version of the restore script attached in
http://www.postgresql.org/message-id/54c7cdad.6060...@2ndquadrant.it:

# define here your work dir
WORK_DIR='/home/gbroccolo/pgsql'

# preliminary steps
rm -rf /tmp/data /tmp/tbls tbls/ backups/

# create a test db and a backup repository
psql -c DROP DATABASE IF EXISTS pgbench
psql -c CREATE DATABASE pgbench
pgbench -U postgres -i -s 5 -F 80 pgbench
mkdir -p backups

# a first base backup with pg_basebackup
BASE=$(mkdir -vp backups/$(date '+%d%m%y%H%M') | awk -F'[’‘]' '{print $2}')
echo start a base backup: $BASE
mkdir -vp $BASE/data
pg_basebackup -v -F p -D $BASE/data -x -c fast

# creation of a new tablespace, alter the table pgbench_accounts to
set the new tablespace
mkdir -p $WORK_DIR/tbls
CREATE_CMD=CREATE TABLESPACE tbls LOCATION '$WORK_DIR/tbls'
psql -c $CREATE_CMD
psql -c ALTER TABLE pgbench_accounts SET TABLESPACE tbls pgbench

# Doing some work on the database
pgbench -U postgres -T 120 pgbench

# a second incremental backup with pg_basebackup specifying the new
location for the tablespace through the tablespace mapping
INCREMENTAL=$(mkdir -vp backups/$(date '+%d%m%y%H%M') | awk -F'[’‘]'
'{print $2}')
echo start an incremental backup: $INCREMENTAL
mkdir -vp $INCREMENTAL/data $INCREMENTAL/tbls
pg_basebackup -v -F p -D $INCREMENTAL/data -x -I $BASE/data -T
$WORK_DIR/tbls=$WORK_DIR/$INCREMENTAL/tbls -c fast

# restore the database
./pg_restorebackup.py -T $WORK_DIR/$INCREMENTAL/tbls=/tmp/tbls
/tmp/data $BASE/data $INCREMENTAL/data
chmod 0700 /tmp/data/
echo port=  /tmp/data/postgresql.conf
pg_ctl -D /tmp/data start


Now the restore works fine, and the tablespace locations are preserved
in the restored instance as well:

gbroccolo@arnold:~/pgsql (master %)$ psql -c \db+
                                          List of tablespaces
    Name    |  Owner   |          Location          | Access privileges | Options |  Size  | Description
------------+----------+----------------------------+-------------------+---------+--------+-------------
 pg_default | postgres |                            |                   |         | 37 MB  |
 pg_global  | postgres |                            |                   |         | 437 kB |
 tbls       | postgres | /home/gbroccolo/pgsql/tbls |                   |         | 80 MB  |
(3 rows)

gbroccolo@arnold:~/pgsql (master %)$ psql -p  -c \db+
                                 List of tablespaces
    Name    |  Owner   | Location  | Access privileges | Options |  Size  | Description
------------+----------+-----------+-------------------+---------+--------+-------------
 pg_default | postgres |           |                   |         | 37 MB  |
 pg_global  | postgres |           |                   |         | 437 kB |
 tbls       | postgres | /tmp/tbls |                   |         | 80 MB  |
(3 rows)

Thanks Marco for your reply.

Giuseppe.
-- 
Giuseppe Broccolo - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
giuseppe.brocc...@2ndquadrant.it | www.2ndQuadrant.it


Re: [HACKERS] Parallel Seq Scan

2015-01-28 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 The problem here, as I see it, is that we're flying blind.  If there's
 just one spindle, I think it's got to be right to read the relation
 sequentially.  But if there are multiple spindles, it might not be,
 but it seems hard to predict what we should do.  We don't know what
 the RAID chunk size is or how many spindles there are, so any guess as
 to how to chunk up the relation and divide up the work between workers
 is just a shot in the dark.

I thought the proposal to chunk on the basis of each worker processes
one 1GB-sized segment should work all right.  The kernel should see that
as sequential reads of different files, issued by different processes;
and if it can't figure out how to process that efficiently then it's a
very sad excuse for a kernel.
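
Just to illustrate the shape of that (the helper and its name are made up,
not something from the patch): a 1GB segment is RELSEG_SIZE blocks, so
handing out whole segments round-robin needs little more than

    /* Hypothetical helper: block where worker_id's pass'th chunk starts,
     * or InvalidBlockNumber once the relation is exhausted. */
    static BlockNumber
    chunk_start(BlockNumber nblocks, int nworkers, int worker_id, int pass)
    {
        BlockNumber start =
            ((BlockNumber) pass * nworkers + worker_id) * RELSEG_SIZE;

        return (start < nblocks) ? start : InvalidBlockNumber;
    }

with each worker scanning blocks [start, Min(start + RELSEG_SIZE, nblocks))
before asking for its next chunk.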

You are right that trying to do any detailed I/O scheduling by ourselves
is a doomed exercise.  For better or worse, we have kept ourselves at
sufficient remove from the hardware that we can't possibly do that
successfully.

regards, tom lane




Re: [HACKERS] Parallel Seq Scan

2015-01-28 Thread Robert Haas
On Wed, Jan 28, 2015 at 10:40 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 The problem here, as I see it, is that we're flying blind.  If there's
 just one spindle, I think it's got to be right to read the relation
 sequentially.  But if there are multiple spindles, it might not be,
 but it seems hard to predict what we should do.  We don't know what
 the RAID chunk size is or how many spindles there are, so any guess as
 to how to chunk up the relation and divide up the work between workers
 is just a shot in the dark.

 I thought the proposal to chunk on the basis of each worker processes
 one 1GB-sized segment should work all right.  The kernel should see that
 as sequential reads of different files, issued by different processes;
 and if it can't figure out how to process that efficiently then it's a
 very sad excuse for a kernel.

I agree.  But there's only value in doing something like that if we
have evidence that it improves anything.  Such evidence is presently a
bit thin on the ground.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel Seq Scan

2015-01-28 Thread Amit Kapila
On Wed, Jan 28, 2015 at 7:46 AM, Robert Haas robertmh...@gmail.com wrote:


 All that aside, I still can't account for the numbers you are seeing.
 When I run with your patch and what I think is your test case, I get
 different (slower) numbers.  And even if we've got 6 drives cranking
 along at 400MB/s each, that's still only 2.4 GB/s, not 6 GB/s.  So
 I'm still perplexed.


I have tried the tests again and found that I had forgotten to increase
max_worker_processes, which is why the data looked so different.  Basically,
at higher client counts the fixed-chunk approach was simply scanning a
smaller number of blocks.  So today I tried again after changing
max_worker_processes and found that there is not much difference in
performance at higher client counts.  I will take some more data for
both the block-by-block and fixed-chunk approaches and repost it.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Misaligned BufferDescriptors causing major performance problems on AMD

2015-01-28 Thread Robert Haas
On Wed, Jan 28, 2015 at 10:35 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Andres Freund and...@2ndquadrant.com writes:
 On 2015-01-26 21:13:31 -0500, Robert Haas wrote:
 So maybe we should also do something like what LWLocks do, and make a
 union between the actual structure and an appropriate array of padding
 bytes - say either 64 or 128 of them.

 Hm. That's a bit bigger patch. I'm inclined to just let it slide for the
 moment. I still have plans to downsize some of sbufdesc's content (move
 the io lock out) and move the content lwlock inline. Then we're not
 going to have much choice but do this...

 Even if you didn't have plans like that, it would be entire folly to
 imagine that buffer headers will be exactly 64 bytes without some forcing
 function for that.  Accordingly, I vote against applying any patch that
 pretends to improve their alignment unless it also does something to
 ensure that the size is a power of 2.  Any notational changes that are
 forced by that would be much better done in a patch that does only that
 than in a patch that also makes functional changes to the header contents.

I entirely agree.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pg_dump with both --serializable-deferrable and -j

2015-01-28 Thread Andres Freund
On 2015-01-28 14:54:15 +, Kevin Grittner wrote:
 Kevin Grittner kgri...@ymail.com wrote:
  Alexander Korotkov aekorot...@gmail.com wrote:
 
  Could we start snapshot-importing transaction with repeatable
  read isolation level?
 
  You can if you don't use the option which specifies that you want
  serializable behavior.  Why specify --serializable-deferrable if
  you don't?
 
  AFAICS, they should read exactly same data as snapshot-exporting
  serializable transaction.
 
  Sort of.  The behavior once they have a snapshot and are running is
  the same; the difference is whether the snapshot can see a
  transient state which would not be consistent with some serial
  order of transaction execution.
 
 Oh, wait; on a re-read I think I may have misunderstood the question.
 
 If you are talking about having pg_dump acquire a safe snapshot and
 have cooperating processes in the same pg_dump run use that
 snapshot in repeatable read transactions, then yes -- that would
 work.  As long as a repeatable read transaction is using a safe
 snapshot it will not see any anomalies.  That would be a better
 solution if it can be done.  Do you have any code to suggest, or
 should I look at writing it?

Why could it be unsafe to import a snapshot that's been generated as
serializable deferrable into another backend? Doesn't the fact that it
has been exported out of a deferrable xact that's still running pretty
much guarantee that the other xact is also safe?

ISTM that the check is just overzealous and/or needs to be moved into
ImportSnapshot(). There it then could be made to check if the exporting
xact was also deferrable.
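
A rough sketch of what that could look like (entirely hypothetical -- the
exported snapshot file doesn't carry such a flag today, and none of these
names exist in the tree):

    /* ExportSnapshot(): also record whether the exporting transaction is
     * SERIALIZABLE READ ONLY DEFERRABLE, i.e. the snapshot is known safe. */
    appendStringInfo(&buf, "sdf:%d\n",
                     (XactIsoLevel == XACT_SERIALIZABLE &&
                      XactReadOnly && XactDeferrable) ? 1 : 0);

    /* ImportSnapshot(): a serializable importer only accepts a snapshot
     * exported by such a transaction. */
    if (IsolationIsSerializable() && !exported_snapshot_is_safe)
        ereport(ERROR,
                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                 errmsg("a serializable transaction cannot import a snapshot that was not taken by a SERIALIZABLE READ ONLY DEFERRABLE transaction")));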

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] pg_dump with both --serializable-deferrable and -j

2015-01-28 Thread Kevin Grittner
Andres Freund and...@2ndquadrant.com wrote:
 Kevin Grittner kgri...@ymail.com wrote:

 Alexander Korotkov aekorot...@gmail.com wrote:

 Could we start snapshot-importing transaction with repeatable
 read isolation level?

 If you are talking about having pg_dump acquire a safe snapshot and
 have cooperating processes in the same pg_dump run use that
 snapshot in repeatable read transactions, then yes -- that would
 work.  As long as a repeatable read transaction is using a safe
 snapshot it will not see any anomalies.

 Why could it be unsafe to import a snapshot that's been generated as
 serializable deferrable into another backend?

That wouldn't be unsafe, which is what I was saying in the post you
responded to.

 Doesn't the fact that it
 has been exported out of a deferrable xact that's still running pretty
 much guarantee that the other xact is also safe?

As long as it is at least repeatable read, yes.

 ISTM that the check is just overzealous and/or needs to be moved into
 ImportSnapshot(). There it then could be made to check if the exporting
 xact was also deferrable.

That would be great if ImportSnapshot had access to that
information; I don't see it, though.  Having pg_dump use repeatable
read transactions for the processes that import the snapshot would
work fine, as long as they are reading a snapshot which was
captured by a serializable read only deferrable transaction.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Misaligned BufferDescriptors causing major performance problems on AMD

2015-01-28 Thread Andres Freund
On 2015-01-28 10:35:28 -0500, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2015-01-26 21:13:31 -0500, Robert Haas wrote:
  So maybe we should also do something like what LWLocks do, and make a
  union between the actual structure and an appropriate array of padding
  bytes - say either 64 or 128 of them.
 
  Hm. That's a bit bigger patch. I'm inclined to just let it slide for the
  moment. I still have plans to downsize some of sbufdesc's content (move
  the io lock out) and move the content lwlock inline. Then we're not
  going to have much choice but do this...
 
 Even if you didn't have plans like that, it would be entire folly to
 imagine that buffer headers will be exactly 64 bytes without some forcing
 function for that.

Meh. The 128 bytes additionally used by the alignment don't hurt in any
case. But forcing all buffer descriptors to 64 bytes on a 32bit platform
isn't guaranteed to be performance neutral.

So, no I don't think it's a folly to do so.

I'd rather make actual progress that improves the absurd situation today
(a factor of 2-3 by changing max_connections by one...) than argue
whether the impact on 32bit platforms is acceptable before doing so. If
we additionally decide to pad the struct, fine. But I don't see why this
has to be done at the same time.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Parallel Seq Scan

2015-01-28 Thread Stephen Frost
* Stephen Frost (sfr...@snowman.net) wrote:
 Such i/o systems do exist, but a single RAID5 group over spinning rust
 with a simple filter isn't going to cut it with a modern CPU- we're just
 too darn efficient to end up i/o bound in that case.

err, to *not* end up i/o bound.

Thanks,

Stephen




Re: [HACKERS] pg_dump with both --serializable-deferrable and -j

2015-01-28 Thread Kevin Grittner
Kevin Grittner kgri...@ymail.com wrote:
 Alexander Korotkov aekorot...@gmail.com wrote:

 Could we start snapshot-importing transaction with repeatable
 read isolation level?

 You can if you don't use the option which specifies that you want
 serializable behavior.  Why specify --serializable-deferrable if
 you don't?

 AFAICS, they should read exactly same data as snapshot-exporting
 serializable transaction.

 Sort of.  The behavior once they have a snapshot and are running is
 the same; the difference is whether the snapshot can see a
 transient state which would not be consistent with some serial
 order of transaction execution.

Oh, wait; on a re-read I think I may have misunderstood the question.

If you are talking about having pg_dump acquire a safe snapshot and
have cooperating processes in the same pg_dump run use that
snapshot in repeatable read transactions, then yes -- that would
work.  As long as a repeatable read transaction is using a safe
snapshot it will not see any anomalies.  That would be a better
solution if it can be done.  Do you have any code to suggest, or
should I look at writing it?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Misaligned BufferDescriptors causing major performance problems on AMD

2015-01-28 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2015-01-26 21:13:31 -0500, Robert Haas wrote:
 So maybe we should also do something like what LWLocks do, and make a
 union between the actual structure and an appropriate array of padding
 bytes - say either 64 or 128 of them.

 Hm. That's a bit bigger patch. I'm inclined to just let it slide for the
 moment. I still have plans to downsize some of sbufdesc's content (move
 the io lock out) and move the content lwlock inline. Then we're not
 going to have much choice but do this...

Even if you didn't have plans like that, it would be entire folly to
imagine that buffer headers will be exactly 64 bytes without some forcing
function for that.  Accordingly, I vote against applying any patch that
pretends to improve their alignment unless it also does something to
ensure that the size is a power of 2.  Any notational changes that are
forced by that would be much better done in a patch that does only that
than in a patch that also makes functional changes to the header contents.
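
For concreteness, the LWLock-style arrangement Robert describes would look
roughly like this (BUFFERDESC_PAD_TO_SIZE and the value 64 are placeholders;
picking the right size is exactly the open question):

    /* Pad each descriptor to a fixed power-of-2 size so that no descriptor
     * ever straddles a cache line, whatever fields sbufdesc ends up with. */
    #define BUFFERDESC_PAD_TO_SIZE 64

    typedef union BufferDescPadded
    {
        BufferDesc  bufferdesc;
        char        pad[BUFFERDESC_PAD_TO_SIZE];
    } BufferDescPadded;

    #define GetBufferDescriptor(id) (&BufferDescriptors[(id)].bufferdesc)

plus a StaticAssertStmt() that sizeof(BufferDesc) still fits within the
padding.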

regards, tom lane




Re: [HACKERS] pg_upgrade and rsync

2015-01-28 Thread Josh Berkus
Bruce, Stephen, etc.:

So, I did a test trial of this and it seems like it didn't solve the
issue of huge rsyncs.

That is, the only reason to do this whole business via rsync, instead of
doing a new basebackup of each replica, is to cut down on data transfer
time by not resyncing the data from the old base directory.  But in
practice, the majority of the database files seem like they get
transmitted anyway.  Maybe I'm misreading the rsync output?

Here's the setup:

3 Ubuntu 14.04 servers on AWS (tiny instance)
Running PostgreSQL 9.3.5
Set up in cascading replication

108 -- 107 -- 109

The goal was to test this with cascading, but I didn't get that far.

I set up a pgbench workload, read-write on the master and read-only on
the two replicas, to simulate a load-balanced workload.  I was *not*
logging hint bits.

I then followed this sequence:

1) Install 9.4 packages on all servers.
2) Shut down the master.
3) pg_upgrade the master using --link
4) shut down replica 107
5) rsync the master's $PGDATA from the replica:

rsync -aHv --size-only -e ssh --itemize-changes
172.31.4.108:/var/lib/postgresql/ /var/lib/postgresql/

... and got:

.d..t.. 9.4/main/pg_xlog/
f+ 9.4/main/pg_xlog/0007000100CB
.d..t.. 9.4/main/pg_xlog/archive_status/

sent 126892 bytes  received 408645000 bytes  7640596.11 bytes/sec
total size is 671135675  speedup is 1.64

So that's 390MB of data transfer.

If I look at the original directory:

postgres@paul: du --max-depth=1 -h
4.0K./.cache
20K ./.ssh
424M./9.3
4.0K./.emacs.d
51M ./9.4
56K ./bench
474M.

So 390MB were transferred out of a possible 474MB.  That certainly seems
like we're still transferring the majority of the data, even though I
verified that the hard links are being sent as hard links.  No?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] PATCH: decreasing memory needlessly consumed by array_agg

2015-01-28 Thread Tomas Vondra
Hi,

attached is v9 of the patch, modified along the lines of Tom's comments:

1) uses alen=64 for cases with private context, 8 otherwise

2) reverts removal of element_type from initArrayResultArr()

   When element_type=InvalidOid is passed to initArrayResultArr, it
   performs lookup using get_element_type(), otherwise reuses the value
   it receives from the caller.

3) moves the assert into the 'if (release)' branch

4) includes the comments proposed by Ali Akbar in his reviews

   Warnings at makeArrayResult/makeMdArrayResult about freeing memory
   with private subcontexts.


Regarding the performance impact of decreasing the size of the
preallocated array from 64 to just 8 elements, I tried this.

  CREATE TABLE test AS
  SELECT mod(i,10) a, i FROM generate_series(1,64*10) s(i);

  SELECT a, array_agg(i) AS x FROM test GROUP BY 1;

or actually (to minimize transfer overhead):

  SELECT COUNT(x) FROM (
     SELECT a, array_agg(i) AS x FROM test GROUP BY 1
  ) foo;

with work_mem=2GB (so that it really uses HashAggregate). The dataset is
constructed to have exactly 64 items per group, thus exploiting the
difference between alen=8 and alen=64.

With alen=8 I get these timings:

Time: 1892,681 ms
Time: 1879,046 ms
Time: 1892,626 ms
Time: 1892,155 ms
Time: 1880,282 ms
Time: 1868,344 ms
Time: 1873,294 ms

and with alen=64:

Time: 1888,244 ms
Time: 1882,991 ms
Time: 1885,157 ms
Time: 1868,935 ms
Time: 1878,053 ms
Time: 1894,871 ms
Time: 1871,571 ms

That's 1880 vs 1882 on average, so pretty much no difference. Would be
nice if someone else could try this on their machine(s).

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/executor/nodeSubplan.c b/src/backend/executor/nodeSubplan.c
index f3ce1d7..9eb4d63 100644
--- a/src/backend/executor/nodeSubplan.c
+++ b/src/backend/executor/nodeSubplan.c
@@ -262,7 +262,7 @@ ExecScanSubPlan(SubPlanState *node,
 	/* Initialize ArrayBuildStateAny in caller's context, if needed */
 	if (subLinkType == ARRAY_SUBLINK)
 		astate = initArrayResultAny(subplan->firstColType,
-	CurrentMemoryContext);
+	CurrentMemoryContext, true);
 
 	/*
 	 * We are probably in a short-lived expression-evaluation context. Switch
@@ -964,7 +964,7 @@ ExecSetParamPlan(SubPlanState *node, ExprContext *econtext)
 	/* Initialize ArrayBuildStateAny in caller's context, if needed */
 	if (subLinkType == ARRAY_SUBLINK)
 		astate = initArrayResultAny(subplan->firstColType,
-	CurrentMemoryContext);
+	CurrentMemoryContext, true);
 
 	/*
 	 * Must switch to per-query memory context.
diff --git a/src/backend/utils/adt/array_userfuncs.c b/src/backend/utils/adt/array_userfuncs.c
index 600646e..49fc23a 100644
--- a/src/backend/utils/adt/array_userfuncs.c
+++ b/src/backend/utils/adt/array_userfuncs.c
@@ -498,8 +498,13 @@ array_agg_transfn(PG_FUNCTION_ARGS)
 		elog(ERROR, array_agg_transfn called in non-aggregate context);
 	}
 
-	state = PG_ARGISNULL(0) ? NULL : (ArrayBuildState *) PG_GETARG_POINTER(0);
+	if (PG_ARGISNULL(0))
+		state = initArrayResult(arg1_typeid, aggcontext, false);
+	else
+		state = (ArrayBuildState *) PG_GETARG_POINTER(0);
+
 	elem = PG_ARGISNULL(1) ? (Datum) 0 : PG_GETARG_DATUM(1);
+
 	state = accumArrayResult(state,
 			 elem,
 			 PG_ARGISNULL(1),
@@ -573,7 +578,12 @@ array_agg_array_transfn(PG_FUNCTION_ARGS)
 		elog(ERROR, array_agg_array_transfn called in non-aggregate context);
 	}
 
-	state = PG_ARGISNULL(0) ? NULL : (ArrayBuildStateArr *) PG_GETARG_POINTER(0);
+
+	if (PG_ARGISNULL(0))
+		state = initArrayResultArr(arg1_typeid, InvalidOid, aggcontext, false);
+	else
+		state = (ArrayBuildStateArr *) PG_GETARG_POINTER(0);
+
 	state = accumArrayResultArr(state,
 PG_GETARG_DATUM(1),
 PG_ARGISNULL(1),
diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c
index 5591b46..b8c4fba 100644
--- a/src/backend/utils/adt/arrayfuncs.c
+++ b/src/backend/utils/adt/arrayfuncs.c
@@ -4606,6 +4606,7 @@ array_insert_slice(ArrayType *destArray,
  *
  *	element_type is the array element type (must be a valid array element type)
  *	rcontext is where to keep working state
+ *	subcontext is a flag determining whether to use a separate memory context
  *
  * Note: there are two common schemes for using accumArrayResult().
  * In the older scheme, you start with a NULL ArrayBuildState pointer, and
@@ -4615,24 +4616,43 @@ array_insert_slice(ArrayType *destArray,
  * once per element.  In this scheme you always end with a non-NULL pointer
  * that you can pass to makeArrayResult; you get an empty array if there
  * were no elements.  This is preferred if an empty array is what you want.
+ *
+ * It's possible to choose whether to create a separate memory context for the
+ * array build state, or whether to allocate it directly within rcontext
+ * (along with various other pieces). This influences memory 

Re: [HACKERS] pg_upgrade and rsync

2015-01-28 Thread Josh Berkus
On 01/28/2015 02:10 PM, Josh Berkus wrote:
 So 390MB were transferred out of a possible 474MB.  That certainly seems
 like we're still transferring the majority of the data, even though I
 verified that the hard links are being sent as hard links.  No?

Looks like the majority of that was pg_xlog.  Going to tear this down
and start over, and --exclude pg_xlog.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission

2015-01-28 Thread Jim Nasby

On 1/28/15 12:46 AM, Haribabu Kommi wrote:

Also, what happens if someone reloads the config in the middle of running
the SRF?

hba entries are reloaded only in postmaster process, not in every backend.
So there shouldn't be any problem with config file reload. Am i
missing something?


Ahh, good point. That does raise another issue though... there might well be 
some confusion about that. I think people are used enough to the various ways 
settings come into effect that perhaps this isn't an issue, just as with 
pg_settings, but it's probably worth at least mentioning in the docs for a 
pg_hba view.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] compiler warnings in copy.c

2015-01-28 Thread Stephen Frost
* Andres Freund (and...@2ndquadrant.com) wrote:
 On 2015-01-28 15:05:11 -0500, Stephen Frost wrote:
   Also, you seem to have pushed these commits with a date more than two
   weeks in the past.  Please don't do that!
  
  Oh, wow, sorry about that.  I had expected a rebase to update the date.
 
 It updates the committer, but not the author date. Use --pretty=fuller
 to see all the details. You can pass rebase --ignore-date to also reset
 the author date. Or commit --amend --reset-author

Thanks for the clarification as to what was happening.  I've modified my
aliases to use 'git am --ignore-date' which appears to have fixed this.

Thanks!

Stephen




Re: [HACKERS] Possible typo in create_policy.sgml

2015-01-28 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
 On Fri, Jan 9, 2015 at 3:46 PM, Stephen Frost sfr...@snowman.net wrote:
 A policy permits SELECT, INSERT, UPDATE or DELETE commands to access rows
 in a table that has row level security enabled.  Access to existing table
 rows is granted if they match a policy expression specified via USING,
 while new rows that would be created via INSERT or UPDATE are checked
 against policy expressions specified via WITH CHECK.  For policy
 expressions specified via USING which grant access to existing rows, the
 system will generally test the policy expressions prior to any
  qualifications that appear in the query itself, in order to prevent the
  inadvertent exposure of the protected data to user-defined functions which
  might not be trustworthy.  However, functions and operators marked by the
 system (or the system administrator) as LEAKPROOF may be evaluated before
 policy expressions, as they are assumed to be trustworthy.
 
 I think that sticking while new rows that would be created via INSERT
 or UPDATE are checked against policy expressions specified via WITH
 CHECK into the middle of this is horribly confusing, as it's a
 completely separate mechanism from the rest of what's being discussed
 here.  I think there needs to be some initial language that clarifies
 that USING expressions apply to old rows and WITH CHECK expressions to
 new rows, and then you can go into more detail.  But mentioning WITH
 CHECK parenthetically in the middle of the rest of this I think will
 not lead to clarity.

I agree, especially after going back and re-reading this while fixing
the issue mentioned earlier by Peter (which was an orthogonal complaint
about the shadowing of WITH CHECK by USING, if WITH CHECK isn't
specified).  We really need a paragraph on USING policies and another
on WITH CHECK policies.  How about a reword along these lines:

  When row level security is enabled on a table, all access to that
  table by users other than the owner or a superuser must be through a
  policy.  This requirement applies to both selecting out existing rows
  from the table and to adding rows to the table (through either INSERT
  or UPDATE).

  Granting access to existing rows in a table is done by specifying a
  USING expression which will be added to queries which reference the
  table.  Every row in the table for which the USING expression returns
  true will be visible.

  Granting access to add rows to a table is done by specifying a WITH
  CHECK expression.  A WITH CHECK expression must return true for
  every row being added to the table or an error will be returned and
  the command will be aborted.
  
  For policy expressions specified via USING which grant access to
  existing rows, the system will generally test the policy expressions
  prior to any qualifications that appear in the query itself, in order
  to prevent the inadvertent exposure of the protected data to
  user-defined functions which might not be trustworthy.  However,
  functions and operators marked by the system (or the system
  administrator) as LEAKPROOF may be evaluated before policy
  expressions, as they are assumed to be trustworthy.

Thoughts?

Thanks!

Stephen




Re: [HACKERS] Small bug on CREATE EXTENSION pgq...

2015-01-28 Thread Stephen Frost
* David Johnston (david.g.johns...@gmail.com) wrote:
 On Wednesday, January 28, 2015, Stephen Frost sfr...@snowman.net wrote:
  Ehh..  Shouldn't we try to take a bit more care that we reset things
  after a CREATE EXTENSION is run?  Not really sure how much effort we
  want to put into it, but I see a bit of blame on both sides here.
 
 Fair enough but reset to what?  I don't know the internal mechanics but
 if the session default is warning and a local change sets it to notice
 then an unconditional reset would not get us back to the intended value.

Yeah, we'd really want to reset it to what it was before.

 Now, if extensions can be handled similarly to how functions operate, where
 one can define function/extension -local settings and have them revert
 after resolution that might be ok.

That, imv, is really what should be happening inside the create
extension script..  I'm not anxious to look into how to make that happen
though.

 That said, while there isn't any way to prevent it the log_min GUCs should
 not be changed by extensions; that decision should be left up to the user.
 The complaint is not that it should be reset but that the installation
 script should not even care what the setting is.

I agree that the script really shouldn't be changing it.

Thanks!

Stephen




Re: [HACKERS] Small bug on CREATE EXTENSION pgq...

2015-01-28 Thread David Johnston
On Wednesday, January 28, 2015, Stephen Frost sfr...@snowman.net wrote:

 * David G Johnston (david.g.johns...@gmail.com javascript:;) wrote:
  Jerry Sievers-3 wrote
   Hackers; I noticed this trying to import a large pg_dump file with
   warnings suppressed.
  
   It seems loading pgq sets client_min_messages to warning and leaves it
   this way which defeats an attempt to change the setting prior and have
   it stick.
  
   I tested with several other extensions in same DB and only pgq has the
   problem.
  
   Sorry if this is known/fixed already.
 
  This is not part of PostgreSQL proper and thus not supported by -hackers;
  you should report this directly to the authors.

 Ehh..  Shouldn't we try to take a bit more care that we reset things
 after a CREATE EXTENSION is run?  Not really sure how much effort we
 want to put into it, but I see a bit of blame on both sides here.



Fair enough but reset to what?  I don't know the internal mechanics but
if the session default is warning and a local change sets it to notice
then an unconditional reset would not get us back to the intended value.

Now, if extensions can be handled similarly to how functions operate, where
one can define function/extension -local settings and have them revert
after resolution that might be ok.

That said, while there isn't any way to prevent it the log_min GUCs should
not be changed by extensions; that decision should be left up to the user.
The complaint is not that it should be reset but that the installation
script should not even care what the setting is.

David J.


Re: [HACKERS] Possible typo in create_policy.sgml

2015-01-28 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
 On Wed, Jan 7, 2015 at 3:06 PM, Stephen Frost sfr...@snowman.net wrote:
  If I'm following correctly, Peter's specifically talking about:
 
  [ USING ( replaceable class=parameterexpression/replaceable ) ]
  [ WITH CHECK ( replaceable 
  class=parametercheck_expression/replaceable ) ]
 
  Where the USING parameter is 'expression' but the WITH CHECK parameter
  is 'check_expression'.  He makes a good point, I believe, as
  expression is overly generic.  I don't like the idea of using
  barrier_expression though as that ends up introducing a new term- how
  about using_expression?
 
 Oh.  Well, I guess we could change that.  I don't think it's a
 problem, myself.  I thought he was talking about something in the SGML
 markup.

I agree that it's not a big deal, but I agree with Peter that it's
worthwhile to clarify, especially since this will be seen in psql's \h
w/o the rest of the context of the CREATE POLICY documentation.

I've gone ahead and made this minor change.

Thanks!

Stephen




Re: [HACKERS] Possible typo in create_policy.sgml

2015-01-28 Thread Stephen Frost
Dean,

* Dean Rasheed (dean.a.rash...@gmail.com) wrote:
 [There's also a typo further down -- filter out the records which are
 visible, should be not visible]

I agree, that's not really worded quite right.  I've reworded this along
the lines of what you suggested (though not exactly- if you get a chance
to review it, that'd be great) and pushed it, so we don't forget to do
so later.

 What do you think of the attached rewording?

Regarding the larger/earlier paragraph, Would be great if you could
comment on my latest suggestion.

Thanks!

Stephen




Re: [HACKERS] Small bug on CREATE EXTENSION pgq...

2015-01-28 Thread Tom Lane
Stephen Frost sfr...@snowman.net writes:
 * David Johnston (david.g.johns...@gmail.com) wrote:
 Fair enough but reset to what?  I don't know the internal mechanics but
 if the session default is warning and a local change sets it to notice
 then an unconditional reset would not get us back to the intended value.

 Yeah, we'd really want to reset it to what it was before.

An extension script runs as a single transaction, so SET LOCAL could've
been used to accomplish the result without trashing the session-lifespan
setting.

I'm not sure whether or not there was good reason to be changing the
setting at all, but it's entirely on the extension script's head that
it didn't do this in a less invasive way.

regards, tom lane




Re: [HACKERS] Parallel Seq Scan

2015-01-28 Thread Stephen Frost
Jim,

* Jim Nasby (jim.na...@bluetreble.com) wrote:
 On 1/28/15 9:56 AM, Stephen Frost wrote:
 Such i/o systems do exist, but a single RAID5 group over spinning rust
 with a simple filter isn't going to cut it with a modern CPU- we're just
 too darn efficient to end up i/o bound in that case.  A more complex
 filter might be able to change it over to being more CPU bound than i/o
 bound and produce the performance improvments you're looking for.
 
 Except we're nowhere near being IO efficient. The vast difference between 
 Postgres IO rates and dd shows this. I suspect that's because we're not 
 giving the OS a list of IO to perform while we're doing our thing, but that's 
 just a guess.

Uh, huh?  The dd was ~321000 and the slowest uncached PG run from
Robert's latest tests was 337312.554, based on my inbox history at
least.  I don't consider ~4-5% difference to be vast.

 The caveat to this is if you have multiple i/o *channels* (which it
 looks like you don't in this case) where you can parallelize across
 those channels by having multiple processes involved.
 
 Keep in mind that multiple processes is in no way a requirement for that. 
 Async IO would do that, or even just requesting stuff from the OS before we 
 need it.

While I agree with this in principle, experience has shown that it
doesn't tend to work out as well as we'd like with a single process.

  We only support
 multiple i/o channels today with tablespaces and we can't span tables
 across tablespaces.  That's a problem when working with large data sets,
 but I'm hopeful that this work will eventually lead to a parallelized
 Append node that operates against a partitioned/inheirited table to work
 across multiple tablespaces.
 
 Until we can get a single seqscan close to dd performance, I fear worrying 
 about tablespaces and IO channels is entirely premature.

I feel like one of us is misunderstanding the numbers, which is probably
in part because they're a bit piecemeal over email, but the seqscan
speed in this case looks pretty close to dd performance for this
particular test, when things are uncached.  Cached numbers are
different, but that's not what we're discussing here, I don't think.

Don't get me wrong- I've definitely seen cases where we're CPU bound
because of complex filters, etc, but that doesn't seem to be the case
here.

Thanks!

Stephen




Re: [HACKERS] Proposal: two new role attributes and/or capabilities?

2015-01-28 Thread Stephen Frost
Jim,

* Jim Nasby (jim.na...@bluetreble.com) wrote:
 On 12/23/14 12:52 PM, Stephen Frost wrote:
 Autovacuum can certainly run vacuum/analyze on a few tables every 12
 hours, so I'm not really following where you see autovacuum being unable
 to cope.  I agree that there*are*  such cases, but getting more
 information about those cases and exactly what solution*does*  work
 would really help us improve autovacuum to address those use-cases.
 
 (going through some old email...)
 
 The two cases I've dealt with recently are:
 
 - Tables with a fair update/delete rate that should always stay small
 
 The problem with these tables is if anything happens to upset vacuuming you 
 can end up with a significantly larger than expected table that's now 
 essentially impossible to shrink. This could be caused by a single 
 long-running transaction that happens to be in play when autovac kicks off, 
 or for other reasons. Even once you manage to get all the tuples off the end 
 of the heap it can still be extremely difficult to grab the lock you need to 
 truncate it. Running a vacuum every minute from cron seems to help control 
 this. Sadly, your indexes still get bloated, so occasionally you want to 
 re-cluster too.

The difference between the autovacuum-run vacuum and the cron-run vacuum
is that the one running out of cron will just keep holding the lock
until it's actually able to truncate the end of the relation, no?  I
recall discussion previously that we need a way to either support that
in autovacuum for (a configurable set of) regular relations or come up
with a solution that doesn't require that lock.

 - Preemptively vacuuming during off-hours
 
 Many sites have either nightly or weekend periods of reduced load. Such sites 
 can gain a great benefit from scheduling preemptive vacuums to reduce the 
 odds of disruptive vacuuming activity during heavy activity periods. This is 
 especially true when it comes to a scan_all vacuum of a large table; having 
 autovac do one of those at a peak period can really hose things.

Having preferable times for autovacuum to run vacuums would certainly
be nice to support this use-case.

All that said, I'm not against a role attribute which allows the user to
vacuum/analyze anything.  I do think that's a bit different from the
existing effort to reduce the activities which require superuser, since in
the vacuum/analyze case you *could* have a single role that's a member
of every role that owns the relations which you want to vacuum/analyze.
I grant that it's a bit awkward though.

Thanks,

Stephen




[HACKERS] Small bug on CREATE EXTENSION pgq...

2015-01-28 Thread Jerry Sievers
Hackers; I noticed this trying to import a large pg_dump file with
warnings suppressed.

It seems loading pgq sets client_min_messages to warning and leaves it
this way which defeats an attempt to change the setting prior and have
it stick.

I tested with several other extensions in same DB and only pgq has the
problem.

Sorry if this is known/fixed already.

Thanks


sj$ cat q
select version();

create database foo template template0;
\c foo

show client_min_messages;

create extension pgq;

show client_min_messages;
reset client_min_messages;
show client_min_messages;

create extension pgq_node;

show client_min_messages;

\c postgres

drop database foo;


sj$ /usr/local/postgresql-9.3/bin/psql -ef q  --no-psqlrc
select version();
                                          version
----------------------------------------------------------------------------------------------
 PostgreSQL 9.3.4 on x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.7.2-5) 4.7.2, 64-bit
(1 row)

create database foo template template0;
CREATE DATABASE
psql (9.3.5, server 9.3.4)
You are now connected to database foo as user jsievers.
show client_min_messages;
 client_min_messages 
-
 notice
(1 row)

create extension pgq;
CREATE EXTENSION
show client_min_messages;
 client_min_messages 
-
 warning
(1 row)

reset client_min_messages;
RESET
show client_min_messages;
 client_min_messages 
-
 notice
(1 row)

create extension pgq_node;
CREATE EXTENSION
show client_min_messages;
 client_min_messages 
-
 notice
(1 row)

psql (9.3.5, server 9.3.4)
You are now connected to database postgres as user jsievers.
drop database foo;
DROP DATABASE
sj$ 
-- 
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consult...@comcast.net
p: 312.241.7800




Re: [HACKERS] Small bug on CREATE EXTENSION pgq...

2015-01-28 Thread David G Johnston
Jerry Sievers-3 wrote
 Hackers; I noticed this trying to import a large pg_dump file with
  warnings suppressed.
 
 It seems loading pgq sets client_min_messages to warning and leaves it
 this way which defeats an attempt to change the setting prior and have
 it stick.
 
 I tested with several other extensions in same DB and only pgq has the
 problem.
 
 Sorry if this is known/fixed already.

This is not part of PostgreSQL proper and thus not supported by -hackers;
you should report this directly to the authors.

David J.







Re: [HACKERS] Possible typo in create_policy.sgml

2015-01-28 Thread Stephen Frost
Peter,

* Peter Geoghegan (p...@heroku.com) wrote:
 I also don't see this behavior documented (this is from process_policies()):
[...]
 But is that really the right place for it? Does it not equally well
 apply to FOR UPDATE policies, that can on their own have both barrier
 quals and WITH CHECK options? Basically, that seems to me like a
 *generic* property of policies (it's a generic thing that the WITH
 CHECK options/expressions shadow the USING security barrier quals as
 check options), and so should be documented as such.

Thanks, you're right, the documentation there can be improved.  I've
pushed up a change to clarify that the USING quals will be used for the
WITH CHECK case if no WITH CHECK quals are specified.  Hopefully that's
clear now, but please let me know if you feel further improvements to
this would help.

I do think we will be making additional changes in this area before too
long, but good to get it cleaned up now anyway, so we don't forget to
do so later.

Thanks!

Stephen




Re: [HACKERS] Small bug on CREATE EXTENSION pgq...

2015-01-28 Thread Stephen Frost
* David G Johnston (david.g.johns...@gmail.com) wrote:
 Jerry Sievers-3 wrote
  Hackers; I noticed this trying to import a large pg_dump file with
   warnings suppressed.
  
  It seems loading pgq sets client_min_messages to warning and leaves it
  this way which defeats an attempt to change the setting prior and have
  it stick.
  
  I tested with several other extensions in same DB and only pgq has the
  problem.
  
  Sorry if this is known/fixed already.
 
 This is not part of PostgreSQL proper and thus not supported by -hackers;
 you should report this directly to the authors.

Ehh..  Shouldn't we try to take a bit more care that we reset things
after a CREATE EXTENSION is run?  Not really sure how much effort we
want to put into it, but I see a bit of blame on both sides here.

Thanks,

Stephen




Re: [HACKERS] Parallel Seq Scan

2015-01-28 Thread Robert Haas
On Wed, Jan 28, 2015 at 8:27 PM, Stephen Frost sfr...@snowman.net wrote:
 I feel like one of us is misunderstanding the numbers, which is probably
 in part because they're a bit piecemeal over email, but the seqscan
 speed in this case looks pretty close to dd performance for this
 particular test, when things are uncached.  Cached numbers are
 different, but that's not what we're discussing here, I don't think.

 Don't get me wrong- I've definitely seen cases where we're CPU bound
 because of complex filters, etc, but that doesn't seem to be the case
 here.

To try to clarify a bit: What we've been testing here is a function I wrote
called parallel_count(regclass), which counts all the visible tuples
in a named relation.  That runs as fast as dd, and giving it extra
workers or prefetching or the ability to read the relation with
different I/O patterns never seems to speed anything up very much.

The story with parallel sequential scan itself may well be different,
since that has a lot more CPU overhead than a dumb-simple
tuple-counter.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Missing updates at few places for row level security

2015-01-28 Thread Stephen Frost
Amit,

* Amit Kapila (amit.kapil...@gmail.com) wrote:
 There is a new column added in pg_authid (rolbypassrls)
 and the update for the same is missing in the places below:
 
 a. System catalog page for pg_authid
 http://www.postgresql.org/docs/devel/static/catalog-pg-authid.html
 b. Do we want to add this new column in pg_shadow view?

Thanks for this.  Pushed with a few additional changes (included it in
both pg_user and pg_shadow, few other doc updates, etc).

Thanks again!

Stephen




Re: [HACKERS] pg_upgrade and rsync

2015-01-28 Thread Josh Berkus
On 01/28/2015 02:28 PM, Josh Berkus wrote:
 On 01/28/2015 02:10 PM, Josh Berkus wrote:
 So 390MB were transferred out of a possible 474MB.  That certainly seems
 like we're still transferring the majority of the data, even though I
 verified that the hard links are being sent as hard links.  No?
 
 Looks like the majority of that was pg_xlog.  Going to tear this down
 and start over, and --exclude pg_xlog.
 

So, having redone this without the pg_xlog lag, this appears to work in
terms of cutting down the rsync volume.

I'm concerned about putting this in the main docs, though.  This is a
complex, and fragile procedure, which is very easy to get wrong, and
hard to explain for a generic case.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] pg_upgrade and rsync

2015-01-28 Thread Stephen Frost
* Josh Berkus (j...@agliodbs.com) wrote:
 On 01/28/2015 02:28 PM, Josh Berkus wrote:
  On 01/28/2015 02:10 PM, Josh Berkus wrote:
  So 390MB were transferred out of a possible 474MB.  That certainly seems
  like we're still transferring the majority of the data, even though I
  verified that the hard links are being sent as hard links.  No?
  
  Looks like the majority of that was pg_xlog.  Going to tear this down
  and start over, and --exclude pg_xlog.
  
 
 So, having redone this without the pg_xlog lag, this appears to work in
 terms of cutting down the rsync volume.
 
 I'm concerned about putting this in the main docs, though.  This is a
 complex, and fragile procedure, which is very easy to get wrong, and
 hard to explain for a generic case.

So, for my 2c, I'm on the fence about it.  On the one hand, I agree,
it's a bit of a complex process to get right.  On the other hand, it's
far better if we put something out there along the lines of "if you
really want to, this is how to do it" than having folks try to fumble
through to find the correct steps themselves.

Thanks,

Stephen




[HACKERS] Exposing the stats snapshot timestamp to SQL

2015-01-28 Thread Matt Kelly
In a previous thread Tom Lane said:

(I'm also wondering if it'd make sense to expose the stats timestamp
 as a callable function, so that the case could be dealt with
 programmatically as well.  But that's future-feature territory.)


(http://www.postgresql.org/message-id/27251.1421684...@sss.pgh.pa.us)

It seemed the appropriate scope for my first submission, and that feature
has been on my wish list for a while, so I thought I'd grab it.

Main purpose of this patch is to expose the timestamp of the current stats
snapshot so that it can be leveraged by monitoring code.  The obvious case
is alerting if the stats collector becomes unresponsive.  The common use
case is to smooth out graphs which are built from high frequency monitoring
of the stats collector.

The timestamp is already available as part of PgStat_GlobalStats.  This
patch is just the boilerplate (+docs & tests) needed to expose that value
to SQL.  It shouldn't impact anything else in the server.

I'm not particularly attached to the function name, but I didn't have a
better idea.
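
For what it's worth, a monitoring client could then do something like the
sketch below.  It assumes the SQL-callable function ends up exposed under the
name used in the patch file, pg_stat_snapshot_timestamp(); adjust if the
final name differs.

#include <stdio.h>
#include <string.h>
#include <libpq-fe.h>

int
main(void)
{
	PGconn	   *conn = PQconnectdb("");
	PGresult   *res;
	int			stale = 0;

	if (PQstatus(conn) != CONNECTION_OK)
		return 1;

	/* Alert if the stats snapshot is more than a minute behind now(). */
	res = PQexec(conn,
				 "SELECT now() - pg_stat_snapshot_timestamp() > interval '1 min'");
	if (PQresultStatus(res) == PGRES_TUPLES_OK)
		stale = (strcmp(PQgetvalue(res, 0, 0), "t") == 0);
	if (stale)
		fprintf(stderr, "stats collector looks unresponsive\n");

	PQclear(res);
	PQfinish(conn);
	return stale;
}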

The patch should apply cleanly to master.

- Matt K


pg_stat_snapshot_timestamp_v1.patch
Description: Binary data



Re: [HACKERS] pg_upgrade and rsync

2015-01-28 Thread Josh Berkus

 So, for my 2c, I'm on the fence about it.  On the one hand, I agree,
 it's a bit of a complex process to get right.  On the other hand, it's
 far better if we put something out there along the lines of "if you
 really want to, this is how to do it" than having folks try to fumble
 through to find the correct steps themselves.

So, here are the correct steps for Bruce, because his current doc does not
cover all of these.  I really think this should go in as a numbered set
of steps; the current doc has some steps as steps, and other stuff
buried in paragraphs.

1. Install the new version binaries on both servers, alongside the old
version.

2. If not done by the package install, initdb the new version's data
directory.

3. Check that the replica is not very lagged.  If it is, wait for
traffic to die down and for it to catch up.

4. Shut down the master using -m fast or -m smart for a clean shutdown.
 It is not necessary to shut down the replicas yet.

5. pg_upgrade the master using the --link option.  Do not start the new
version yet.

6. create a data directory for the new version on the replica.  This
directory should be empty; if it was initdb'd by the installation
package, then delete its contents.

7. shut down postgres on the replica.

8. rsync both the old and new data directories from the master to the
replica, using the --size-only and -H hard links options.  For example,
if both 9.3 and 9.4 are in /var/lib/postgresql, do:

rsync -aHv --size-only -e ssh --itemize-changes /var/lib/postgresql/
replica-host:/var/lib/postgresql/

9. Create a recovery.conf file in the replica's data directory with the
appropriate parameters.

10. Start the master, then the replica


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Parallel Seq Scan

2015-01-28 Thread Daniel Bausch
Robert Haas robertmh...@gmail.com writes:

 On Wed, Jan 28, 2015 at 9:12 AM, Thom Brown t...@linux.com wrote:
 On 28 January 2015 at 14:03, Robert Haas robertmh...@gmail.com wrote:
 The problem here, as I see it, is that we're flying blind.  If there's
 just one spindle, I think it's got to be right to read the relation
 sequentially.  But if there are multiple spindles, it might not be,
 but it seems hard to predict what we should do.  We don't know what
 the RAID chunk size is or how many spindles there are, so any guess as
 to how to chunk up the relation and divide up the work between workers
 is just a shot in the dark.

 Can't the planner take effective_io_concurrency into account?

 Maybe.  It's sort of answering the right question -- to tell us how
 many parallel I/O channels we think we've got.  But I'm not quite sure
 what to do with that information in this case.  I mean, if we've
 got effective_io_concurrency = 6, does that mean it's right to start
 scans in 6 arbitrary places in the relation and hope that keeps all
 the drives busy?  That seems like throwing darts at the wall.  We have
 no idea which parts are on which underlying devices.  Or maybe it means
 we should prefetch 24MB, on the assumption that the RAID stripe is
 4MB?  That's definitely blind guesswork.

 Considering the email Amit just sent, it looks like on this machine,
 regardless of what algorithm we used, the scan took between 3 minutes
 and 5.5 minutes, and most of them took between 4 minutes and 5.5
 minutes.  The results aren't very predictable, more workers don't
 necessarily help, and it's not really clear that any algorithm we've
 tried is clearly better than any other.  I experimented with
 prefetching a bit yesterday, too, and it was pretty much the same.
 Some settings made it slightly faster.  Others made it slower.  Whee!

I researched this topic quite some time ago.  One notable fact is
that active prefetching disables automatic readahead prefetching (by the
Linux kernel), which can occur in larger granularities than 8K.
Automatic readahead prefetching occurs when consecutive addresses are
read, which may happen in a seqscan but also by accident through an
indexscan in correlated cases.

My conclusion was to NOT prefetch seqscans, because the OS does well enough
without advice.  Prefetching indexscan heap accesses is very valuable
though, but you need to detect the accidental sequential accesses to
not hurt your performance in correlated cases.
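
For illustration, here is a minimal standalone sketch of the kind of explicit
per-block prefetch advice I mean, as opposed to relying on the kernel's
sequential readahead (plain C against the POSIX API, not PostgreSQL code;
the file name and block numbers are made up):

#define _XOPEN_SOURCE 600		/* for posix_fadvise */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define BLCKSZ 8192

/* Hint the kernel about 8K blocks an (index) scan is about to read. */
static void
prefetch_blocks(int fd, const long *blocks, int nblocks)
{
	for (int i = 0; i < nblocks; i++)
		if (posix_fadvise(fd, blocks[i] * (off_t) BLCKSZ, BLCKSZ,
						  POSIX_FADV_WILLNEED) != 0)
			perror("posix_fadvise");
}

int
main(void)
{
	int			fd = open("somefile.dat", O_RDONLY);	/* made-up file name */
	long		blocks[] = {17, 42, 4711};	/* made-up heap block numbers */
	char		buf[BLCKSZ];

	if (fd < 0)
		return 1;
	prefetch_blocks(fd, blocks, 3);

	/* Later reads should find the pages already in the OS cache. */
	for (int i = 0; i < 3; i++)
		pread(fd, buf, BLCKSZ, blocks[i] * (off_t) BLCKSZ);
	close(fd);
	return 0;
}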

In general I can give you the hint to not only focus on HDDs with their
single spindle.  A single SATA SSD scales up to 32 (31 on Linux)
requests in parallel (without RAID or anything else).  The difference in
throughput is extreme for this type of storage device.  While single
spinning HDDs can only gain up to ~20% by NCQ, SATA SSDs can easily gain
up to 700%.

+1 for using effective_io_concurrency to tune for this, since
prefetching random addresses is effectively a type of parallel I/O.

Regards,
Daniel
-- 
MSc. Daniel Bausch
Research Assistant (Computer Science)
Technische Universität Darmstadt
http://www.dvs.tu-darmstadt.de/staff/dbausch




Re: [HACKERS] jsonb, unicode escapes and escaped backslashes

2015-01-28 Thread Jeff Janes
On Wed, Jan 28, 2015 at 1:13 PM, Jim Nasby jim.na...@bluetreble.com wrote:


 My $0.01:

 While I sympathize with Noah's sentiments, the only thing that makes sense
 to me is that a JSON text field is treated the same way as we treat text.
 Right now, that means NUL is not allowed, period.

 If no one has bitched about this with text, is it really that big a
 problem with JSON?


I would bitch about it for text if I thought that it would do any good.


Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]

2015-01-28 Thread Pavel Stehule
Hi

I am testing this feature on a relatively complex schema (38619 tables in the
db) and I got a deadlock

[pavel@localhost bin]$ /usr/local/pgsql/bin/vacuumdb test2 -fz -j 4
vacuumdb: vacuuming database test2
vacuumdb: vacuuming of database test2 failed: ERROR:  deadlock detected
DETAIL:  Process 24689 waits for RowExclusiveLock on relation 2608 of
database 194769; blocked by process 24690.
Process 24690 waits for AccessShareLock on relation 1249 of database
194769; blocked by process 24689.
HINT:  See server log for query details.

ERROR:  deadlock detected
DETAIL:  Process 24689 waits for RowExclusiveLock on relation 2608 of
database 194769; blocked by process 24690.
Process 24690 waits for AccessShareLock on relation 1249 of
database 194769; blocked by process 24689.
Process 24689: VACUUM (FULL, ANALYZE) pg_catalog.pg_attribute;
Process 24690: VACUUM (FULL, ANALYZE) pg_catalog.pg_depend;
HINT:  See server log for query details.
STATEMENT:  VACUUM (FULL, ANALYZE) pg_catalog.pg_attribute;
ERROR:  canceling statement due to user request
STATEMENT:  VACUUM (FULL, ANALYZE) pg_catalog.pg_depend;
ERROR:  canceling statement due to user request
STATEMENT:  VACUUM (FULL, ANALYZE) pg_catalog.pg_class;
ERROR:  canceling statement due to user request
STATEMENT:  VACUUM (FULL, ANALYZE) pg_catalog.pg_proc;
LOG:  could not send data to client: Broken pipe
STATEMENT:  VACUUM (FULL, ANALYZE) pg_catalog.pg_proc;
FATAL:  connection to client lost
LOG:  could not send data to client: Broken pipe
ERROR:  canceling statement due to user request
FATAL:  connection to client lost

   Schema   |      Name       | Type  |  Owner   |  Size  | Description
------------+-----------------+-------+----------+--------+-------------
 pg_catalog | pg_attribute    | table | postgres | 439 MB |
 pg_catalog | pg_rewrite      | table | postgres | 314 MB |
 pg_catalog | pg_proc         | table | postgres | 136 MB |
 pg_catalog | pg_depend       | table | postgres | 133 MB |
 pg_catalog | pg_class        | table | postgres | 69 MB  |
 pg_catalog | pg_attrdef      | table | postgres | 55 MB  |
 pg_catalog | pg_trigger      | table | postgres | 47 MB  |
 pg_catalog | pg_type         | table | postgres | 31 MB  |
 pg_catalog | pg_description  | table | postgres | 23 MB  |
 pg_catalog | pg_index        | table | postgres | 20 MB  |
 pg_catalog | pg_constraint   | table | postgres | 17 MB  |
 pg_catalog | pg_shdepend     | table | postgres | 17 MB  |
 pg_catalog | pg_statistic    | table | postgres | 928 kB |
 pg_catalog | pg_operator     | table | postgres | 552 kB |
 pg_catalog | pg_collation    | table | postgres | 232 kB |

Regards

Pavel Stehule


2015-01-27 3:26 GMT+01:00 Dilip kumar dilip.ku...@huawei.com:

 On 23 January 2015 21:10, Alvaro Herrera Wrote,


  In case you're up for doing some more work later on, there are two
  ideas
  here: move the backend's TranslateSocketError to src/common, and try to
  merge pg_dump's select_loop function with the one in this new code.
  But that's for another patch anyway (and this new function needs a
  little battle-testing, of course.)
 
 Thanks for your effort, I will take it up for next commitfest..





Re: [HACKERS] PQgetssl() and alternative SSL implementations

2015-01-28 Thread Robert Haas
On Wed, Jan 28, 2015 at 10:13 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Here's a patch to implement the above scheme. It adds four functions to
 libpq, to interrogate the SSL status:

 int PQsslInUse(const PGconn *conn)
 Returns true (1) if the connection uses SSL, false (0) if not.

 const char *PQsslAttribute(const PGconn *conn, const char *attribute_name)
 Returns a piece of information. The list of attributes depends on the
 implementation, but there are a few that are expected to be supported by all
 of them. See docs for details.

 const char **PQsslAttributes(const PGconn *conn);
 Return an array of SSL attribute names available.

 void *PQsslStruct(const PGconn *conn, const char *struct_name)
 Return a pointer to an SSL-implementation specific object describing the
 connection. PQsslStruct(conn, "OpenSSL SSL") is equivalent to
 PQgetssl(conn).

 I think this is expandable enough, because you can easily add attributes
 later on, and different implementations can support different attributes. It
 contains the escape hatch for applications that need to do more, and have
 intimate knowledge of OpenSSL structs. It's also pretty easy to use.

I like it!

Although I think "OpenSSL SSL" is a little bit duplicatively
redundant.  Why not just "OpenSSL"?
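
For what it's worth, client code using the proposed functions would look
roughly like the sketch below.  The attribute names and the NULL-termination
of the PQsslAttributes() result are assumptions on my part, not something the
patch necessarily guarantees:

#include <stdio.h>
#include <libpq-fe.h>

static void
report_ssl(PGconn *conn)
{
	const char **names;

	if (!PQsslInUse(conn))
	{
		printf("connection is not encrypted\n");
		return;
	}

	/* Enumerate whatever attributes this SSL implementation provides. */
	names = PQsslAttributes(conn);
	for (int i = 0; names && names[i] != NULL; i++)
	{
		const char *val = PQsslAttribute(conn, names[i]);

		printf("%s = %s\n", names[i], val ? val : "(null)");
	}
}

int
main(void)
{
	PGconn	   *conn = PQconnectdb("sslmode=require");

	if (PQstatus(conn) == CONNECTION_OK)
		report_ssl(conn);
	PQfinish(conn);
	return 0;
}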

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Sequence Access Method WIP

2015-01-28 Thread Petr Jelinek

On 28/01/15 18:09, Heikki Linnakangas wrote:

On 01/23/2015 02:34 AM, Petr Jelinek wrote:

On 22/01/15 17:02, Petr Jelinek wrote:


The new version (the one that is not submitted yet) of the gapless sequence
is much uglier and probably not the best example either, but it does
guarantee gaplessness (it stores the last value in its own value table). So
I am not sure if it should be included either...


Here it is as promised.


I generally like the division of labour between the seqam
implementations and the API now.

I don't like the default_sequenceam GUC. That seems likely to create
confusion. And it's not really enough for something like a remote
sequence AM anyway that definitely needs some extra reloptions, like the
hostname of the remote server. The default should be 'local', unless you
specify something else with CREATE SEQUENCE USING - no GUCs.



Hmm, I am not too happy about this; I want SERIAL to work with custom 
sequences (as long as it's possible for the sequence to work that way at 
least). That's actually an important feature for me. Your argument about 
this being a potential problem for some sequenceAMs is valid though.



Some comments on pg_dump:

* In pg_dump's dumpSequence function, check the server version number
instead of checking whether pg_sequence_dump_state() function exists.
That's what we usually do when new features are introduced. And there's
actually a bug there: you have the check backwards. (try dumping a
database with any sequences in it; it fails)


Right.


* pg_dump should not output a USING clause when a sequence uses the
'local' sequence. No point in adding such noise - making the SQL command
not standard-compatible - for the 99% of databases that don't use other
sequence AMs.


Well, this only works if we remove the GUC, because otherwise, if the GUC is 
there, you always need to either add a USING clause or set the GUC in 
advance (like we do with search_path for example).



* Be careful to escape all strings correctly in pg_dump. I think you're
missing escaping for the 'state' variable at least.


Ouch.


In sequence_save_tuple:

else
{
    /*
     * New tuple was not sent, so the original tuple was probably just
     * changed inline, all we need to do is mark the buffer dirty and
     * optionally log the update tuple.
     */
    START_CRIT_SECTION();

    MarkBufferDirty(seqh->buf);

    if (do_wal)
        log_sequence_tuple(seqh->rel, seqh->tup, seqh->buf, page);

    END_CRIT_SECTION();
}


This is wrong when checksums are enabled and !do_wal. I believe this
should use MarkBufferDirtyHint().



Oh ok, didn't realize.


Notable changes:
- The gapless sequence rewritten to use the transactional storage as
that's the only way to guarantee gaplessness between dump and restore.


Can you elaborate?

Using the auxiliary seqam_gapless_values is a bit problematic. First of
all, the event trigger on DROP SEQUENCE doesn't fire if you drop the
whole schema containing the sequence with DROP SCHEMA CASCADE. Secondly,


Yeah, but at worst there are some unused rows there; it's not too bad. I 
could also create a relation per sequence so that the dependency code would 
handle everything correctly, but it seems a bit too expensive to create not 
one but two relations per sequence...



updating a row in a table for every nextval() call is pretty darn
expensive.


Yes, it's expensive, but the gapless sequence also serializes access, so 
it will never be a speed demon.



But I don't actually see a problem with dump and restore.


The problem is that the tuple stored in the sequence relation is always 
the one with the latest state (and with a frozen xid), so pg_dump has no way 
of getting the value as it was at the time it took the snapshot. This 
means that after a dump/restore cycle, the sequence can be further ahead 
than the table it's attached to and you end up with a gap there.




Can you rewrite it without using the values table? AFAICS, there are a
two of problems to solve:

1. If the top-level transaction aborts, you need to restore the old
value. I suggest keeping two values in the sequence tuple: the old,
definitely committed one, and the last value. The last value is only
considered current if the associated XID committed; otherwise the old
value is current. When a transaction is about to commit, it stores its
top-level XID and the new value in the last field, and copies the
previously current value to the old field.



Yes, this is how the previous implementation worked.


2. You need to track the last value on a per-subtransaction basis, until
the transaction commits/rolls back, in order for rollback to a
savepoint to retreat the sequence's value. That can be done in
backend-private memory, maintaining a stack of subtransactions and the
last value of each. Use RegisterSubXactCallback to hook into subxact
commit/abort. Just before top-level commit (in pre-commit callback),
update the sequence tuple with the latest value, so that it gets
WAL-logged 
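
To make point 2 concrete, here is a schematic, standalone sketch of such a
per-subtransaction stack (plain C with made-up names; the real thing would
hook into PostgreSQL's subtransaction callbacks and allocate in a suitable
memory context):

#include <stdlib.h>

/* One stack entry per open subtransaction that has called nextval(). */
typedef struct SeqSubxactEntry
{
	unsigned int subxid;		/* subtransaction this value belongs to */
	long long	last_value;		/* last value handed out in that subxact */
	struct SeqSubxactEntry *parent;
} SeqSubxactEntry;

static SeqSubxactEntry *seq_stack = NULL;

/* nextval() pushes or updates the entry for the current subtransaction. */
static void
seq_remember(unsigned int subxid, long long value)
{
	if (seq_stack == NULL || seq_stack->subxid != subxid)
	{
		SeqSubxactEntry *e = malloc(sizeof(*e));

		e->subxid = subxid;
		e->parent = seq_stack;
		seq_stack = e;
	}
	seq_stack->last_value = value;
}

/* On subxact commit the value is inherited by the parent level... */
static void
seq_subxact_commit(void)
{
	SeqSubxactEntry *top = seq_stack;

	if (top == NULL)
		return;
	seq_stack = top->parent;
	if (seq_stack != NULL)
		seq_stack->last_value = top->last_value;
	free(top);
}

/* ...and on subxact abort it is simply discarded, retreating the value. */
static void
seq_subxact_abort(void)
{
	SeqSubxactEntry *top = seq_stack;

	if (top == NULL)
		return;
	seq_stack = top->parent;
	free(top);
}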

Re: [HACKERS] PQgetssl() and alternative SSL implementations

2015-01-28 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes:
 Right, that was the idea. I wanted it to include the word "OpenSSL", to 
 make it clear in the callers that it's specific to OpenSSL. And "SSL", 
 because that's the name of the struct. I agree it looks silly, though. 
 One idea is to have two separate arguments: the implementation name, and 
 the struct name. PQgetSSLstruct(&ssl, "OpenSSL", "SSL") would look less 
 silly.

That's probably overkill.  Why not establish a convention that the main
API struct for the library doesn't have to be named?  So it's just
PQgetSSLstruct(&ssl, "OpenSSL"), and you only need strange naming if
you're dealing with a library that actually has more than one API object
that needs to be fetched this way.  (That set is likely empty...)

regards, tom lane




Re: [HACKERS] jsonb, unicode escapes and escaped backslashes

2015-01-28 Thread Tom Lane
David G Johnston david.g.johns...@gmail.com writes:
 Am I missing something or has there been no consideration in this forbid
 plan on whether users will be able to retrieve, even if partially
 incorrectly, any jsonb data that has already been stored?

Data that's already been stored would look like the six characters \u0000,
which indeed might have been what it was anyway, since part of the
problem here is that we fail to distinguish \\u0000 from \u0000.

regards, tom lane




Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-28 Thread Tom Lane
Merlin Moncure mmonc...@gmail.com writes:
 ...hm, I spoke too soon.  So I deleted everything, and booted up a new
 instance 9.4 vanilla with asserts on and took no other action.
 Applying the script with no data activity fails an assertion every
 single time:

 TRAP: FailedAssertion("!(flags & 0x0010)", File: "dynahash.c", Line: 330)

There's no Assert at line 330 in 9.4, though there is in HEAD.  I suspect
what you've got here is a version mismatch; in particular commit
4a14f13a0abfbf7e7d44a3d2689444d1806aa9dc changed the API for dynahash.c
such that external modules would need to be recompiled to use it without
error.  I'm not real sure though how you are getting past the
loadable-module version check.  Anyway, I'd try make distclean and
full rebuild before anything else.

regards, tom lane




Re: [HACKERS] Misaligned BufferDescriptors causing major performance problems on AMD

2015-01-28 Thread Andres Freund
On 2015-01-28 13:38:51 -0500, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  I personally still think that a comment above sbufdesc's definition
  would be sufficient for now. But whatever. I'll enforce 64byte padding
  on 64bit platforms, and do nothing on 32bit platforms.
 
  Patch doing that attached.
 
 Surely the sizeof() in BufferShmemSize needs to be
 sizeof(BufferDescPadded) now?

Good catch. It'd surely fail nastily later, once somebody actually made
the size change away from 64 bytes.

 Also, the macro is unnecessarily unsafe for use in #if tests; why not
 something like

Out of curiosity: What's the danger you're seing?

I'm perfectly fine with using SIZEOF_VOID_P, I actually looked for a
macro like it and didn't find anything.

 #define BUFFERDESC_PADDED_SIZE    (SIZEOF_VOID_P == 8 ? 64 : 32)

Hm, did you intentionally put a 32 in there or was that just the logical
continuation of 64? Because there's no way it'll ever fit into 32 bytes
in the near future. That's why I had put the sizeof(BufferDesc)
there. We could just make it 1 as well, to indicate that we don't want
any padding...

 Otherwise it looks reasonably sane to me, just in a quick once-over;
 I didn't test.

I tested it on both 32/64 bit linux and it indeed fixes the issue of
changing performance due to max_connections+-1.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] jsonb, unicode escapes and escaped backslashes

2015-01-28 Thread David G Johnston
Tom Lane-2 wrote
 Andrew Dunstan andrew@ writes:
 On 01/27/2015 02:28 PM, Tom Lane wrote:
 Well, we can either fix it now or suffer with a broken representation
 forever.  I'm not wedded to the exact solution I described, but I think
 we'll regret it if we don't change the representation.
 
 The only other plausible answer seems to be to flat out reject \u0000.
 But I assume nobody likes that.
 
 I don't think we can be in the business of rejecting valid JSON.
 
 Actually, after studying the code a bit, I wonder if we wouldn't be best
 off to do exactly that, at least for 9.4.x.  At minimum we're talking
 about an API change for JsonSemAction functions (which currently get the
 already-de-escaped string as a C string; not gonna work for embedded
 nulls).  I'm not sure if there are already third-party extensions using
 that API, but it seems possible, in which case changing it in a minor
 release wouldn't be nice.  Even ignoring that risk, making sure
 we'd fixed everything seems like more than a day's work, which is as
 much as I for one could spare before 9.4.1.
 
 So at this point I propose that we reject \u0000 when de-escaping JSON.
 Anybody who's seriously unhappy with that can propose a patch to fix it
 properly in 9.5 or later.

The hybrid option is to reject the values for 9.4.1 but then commit to
removing that hack and fixing this properly in 9.4.2; we can always call
that release 9.5...


 I think the "it would mean rejecting valid JSON" argument is utter
 hogwash.  We already reject, eg, \u00A0 if you're not using a UTF8
 encoding.  And we reject 1e1, not because that's invalid JSON
 but because of an implementation restriction of our underlying numeric
 type.  I don't see any moral superiority of that over rejecting \u0000
 because of an implementation restriction of our underlying text type.

Am I missing something, or has there been no consideration in this "forbid"
plan of whether users will be able to retrieve, even if partially
incorrectly, any jsonb data that has already been stored?  If we mangled
their data on input we should at least return the data and provide them a
chance to manually (or automatically, depending on their data) fix our
mistake.

Given we already disallow NUL in text ISTM that allowing said data in other
text-like areas is asking for just the kind of trouble we are seeing here. 
I'm OK with the proposition that those wishing to utilize NUL are relegated
to working with bytea.

From the commit Tom references down-thread:

However, this led to some perverse results in the case of Unicode
sequences.

Given that said results are not documented in the commit, it's hard to judge
whether a complete revert is being overly broad...

Anyway, just some observations since I'm not currently a user of JSON. 
Tom's arguments and counter-arguments ring true to me in the general sense. 
The DBA staying on 9.4.0 because of this change probably just needs to be
told to go back to using json and then run the update.  Their data has
issues even if they stay on 9.4.0 with the more accepting version of jsonb.

David J.








Re: [HACKERS] Misaligned BufferDescriptors causing major performance problems on AMD

2015-01-28 Thread Andres Freund
On 2015-01-28 17:08:46 +0100, Andres Freund wrote:
 I just have no idea whether it'd be beneficial to use more space on
 32bit to pad the individual entries. Since this mostly is beneficial on
 multi-socket, highly concurrent workloads, I doubt it really matter.

 I personally still think that a comment above sbufdesc's definition
 would be sufficient for now. But whatever. I'll enforce 64byte padding
 on 64bit platforms, and do nothing on 32bit platforms.

Patch doing that attached. I've for now dealt with the 32bit issue by:

#define BUFFERDESC_PADDED_SIZE  (sizeof(void*) == 8 ? 64 : sizeof(BufferDesc))
and
 * XXX: As this primarily matters in highly concurrent workloads which
 * probably all are 64bit these days, and the space wastage would be a bit
 * more noticeable on 32bit systems, we don't force the stride to be cache
 * line sized. If somebody does actual performance testing, we can reevaluate.

I've replaced direct accesses to the (Local)BufferDescriptors array by
wrapper macros to hide the union indirection and allow potential future
changes to be made more easily.
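
For reference, the padded-union idiom being discussed looks roughly like this
standalone sketch (made-up names, not the actual bufmgr definitions):

#include <stddef.h>

#define CACHE_LINE_SIZE 64

typedef struct MyBufferDesc
{
	int			buf_id;
	int			flags;
	int			usage_count;
	int			refcount;
	void	   *content_lock;
} MyBufferDesc;

/* Pad each element to a full cache line so array entries never share one. */
typedef union MyBufferDescPadded
{
	MyBufferDesc desc;
	char		pad[CACHE_LINE_SIZE];
} MyBufferDescPadded;

/* Accessor macro hides the union, so callers keep using plain MyBufferDesc. */
#define GetMyBufferDescriptor(array, i) (&(array)[(i)].desc)

/* Compile-time check that the struct actually fits in the padded size. */
typedef char desc_fits_in_pad[sizeof(MyBufferDesc) <= CACHE_LINE_SIZE ? 1 : -1];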

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From caeb98e4ab48748556f4983fdaa52098a190aa57 Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Wed, 28 Jan 2015 18:51:50 +0100
Subject: [PATCH 1/3] Fix #ifdefed'ed out code to compile again.

---
 src/backend/storage/buffer/bufmgr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 7eb2d22..f92d0ef 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -2764,7 +2764,7 @@ PrintPinnedBufs(void)
 			 "[%02d] (freeNext=%d, rel=%s, "
 			 "blockNum=%u, flags=0x%x, refcount=%u %d)",
 			 i, buf->freeNext,
-			 relpath(buf->tag.rnode, buf->tag.forkNum),
+			 relpathperm(buf->tag.rnode, buf->tag.forkNum),
 			 buf->tag.blockNum, buf->flags,
 			 buf->refcount, GetPrivateRefCount(i + 1));
 		}
-- 
2.0.0.rc2.4.g1dc51c6.dirty

From 2bb5f85d8c46cea30b3d1d80b781574c2f2a3903 Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Wed, 28 Jan 2015 13:55:36 +0100
Subject: [PATCH 2/3] Align buffer descriptors to cache line boundaries.

Benchmarks have shown that aligning the buffer descriptor array to
cache lines is important for scalability, especially on bigger,
multi-socket, machines.

Currently the array sometimes already happens to be aligned by
happenstance, depending how large previous shared memory allocations
were. That can lead to wildly varying performance results, after minor
configuration changes.

In addition to aligning the start of the array, also force the size
of individual descriptors to be of common cache line size (64
bytes). That happens to already be the case on 64bit platforms, but
this way we can change the definition more easily.

As the alignment primarily matters in highly concurrent workloads
which probably all are 64bit these days, and the space wastage of
element alignment would be a bit more noticeable on 32bit systems, we
don't force the stride to be cacheline sized for now. If somebody does
actual performance testing, we can reevaluate that decision by
changing the definition of BUFFERDESC_PADDED_SIZE.

Discussion: 20140202151319.gd32...@awork2.anarazel.de

Per discussion with Bruce Momjan, Robert Haas, Tom Lane and Peter
Geoghegan.
---
 contrib/pg_buffercache/pg_buffercache_pages.c |  6 ++-
 src/backend/storage/buffer/buf_init.c | 19 
 src/backend/storage/buffer/bufmgr.c   | 66 ++-
 src/backend/storage/buffer/freelist.c |  6 +--
 src/backend/storage/buffer/localbuf.c | 12 ++---
 src/include/c.h   |  1 +
 src/include/storage/buf_internals.h   | 34 +-
 7 files changed, 91 insertions(+), 53 deletions(-)

diff --git a/contrib/pg_buffercache/pg_buffercache_pages.c b/contrib/pg_buffercache/pg_buffercache_pages.c
index d00f985..98016fc 100644
--- a/contrib/pg_buffercache/pg_buffercache_pages.c
+++ b/contrib/pg_buffercache/pg_buffercache_pages.c
@@ -73,7 +73,6 @@ pg_buffercache_pages(PG_FUNCTION_ARGS)
 	if (SRF_IS_FIRSTCALL())
 	{
 		int			i;
-		volatile BufferDesc *bufHdr;
 
 		funcctx = SRF_FIRSTCALL_INIT();
 
@@ -146,8 +145,11 @@ pg_buffercache_pages(PG_FUNCTION_ARGS)
 		 * Scan though all the buffers, saving the relevant fields in the
 		 * fctx->record structure.
 		 */
-		for (i = 0, bufHdr = BufferDescriptors; i < NBuffers; i++, bufHdr++)
+		for (i = 0; i < NBuffers; i++)
 		{
+			volatile BufferDesc *bufHdr;
+
+			bufHdr = GetBufferDescriptor(i);
 			/* Lock each buffer header before inspecting. */
 			LockBufHdr(bufHdr);
 
diff --git a/src/backend/storage/buffer/buf_init.c b/src/backend/storage/buffer/buf_init.c
index 4434bc3..c013e10 100644
--- a/src/backend/storage/buffer/buf_init.c
+++ 

Re: [HACKERS] Misaligned BufferDescriptors causing major performance problems on AMD

2015-01-28 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 I personally still think that a comment above sbufdesc's definition
 would be sufficient for now. But whatever. I'll enforce 64byte padding
 on 64bit platforms, and do nothing on 32bit platforms.

 Patch doing that attached.

Surely the sizeof() in BufferShmemSize needs to be
sizeof(BufferDescPadded) now?

Also, the macro is unnecessarily unsafe for use in #if tests; why not
something like

#define BUFFERDESC_PADDED_SIZE  (SIZEOF_VOID_P == 8 ? 64 : 32)

Otherwise it looks reasonably sane to me, just in a quick once-over;
I didn't test.

regards, tom lane




Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-28 Thread Merlin Moncure
On Wed, Jan 28, 2015 at 8:05 AM, Merlin Moncure mmonc...@gmail.com wrote:
 On Thu, Jan 22, 2015 at 3:50 PM, Merlin Moncure mmonc...@gmail.com wrote:
 I still haven't categorically ruled out pl/sh yet; that's something to
 keep in mind.

 Well, after bisection proved not to be fruitful, I replaced the pl/sh
 calls with dummy calls that approximated the same behavior and the
 problem went away.  So again, it looks like this might be a lot of
 false alarm.  A pl/sh driven failure might still be interesting if
 it's coming from the internal calls it's making, so I'm still chasing
 it down.

...hm, I spoke too soon.  So I deleted everything, and booted up a new
instance 9.4 vanilla with asserts on and took no other action.
Applying the script with no data activity fails an assertion every
single time:


mmoncure@mernix2 12:25 PM (REL9_4_STABLE) ~/src/p94$ cat
/mnt/ssd/data/pg_log/postgresql-28.log
[ 12287 2015-01-28 12:24:24.080 CST 0]LOG:  received smart shutdown request
[ 13516 2015-01-28 12:24:24.080 CST 0]LOG:  autovacuum launcher shutting down
[ 13513 2015-01-28 12:24:24.081 CST 0]LOG:  shutting down
[ 13513 2015-01-28 12:24:24.083 CST 0]LOG:  database system is shut down
[ 14481 2015-01-28 12:24:25.127 CST 0]LOG:  database system was shut
down at 2015-01-28 12:24:24 CST
[ 14457 2015-01-28 12:24:25.129 CST 0]LOG:  database system is ready
to accept connections
[ 14485 2015-01-28 12:24:25.129 CST 0]LOG:  autovacuum launcher started
TRAP: FailedAssertion("!(flags & 0x0010)", File: "dynahash.c", Line: 330)
[ 14457 2015-01-28 12:24:47.983 CST 0]LOG:  server process (PID 14545)
was terminated by signal 6: Aborted
[ 14457 2015-01-28 12:24:47.983 CST 0]DETAIL:  Failed process was
running: SELECT CDSStartRun()
[ 14457 2015-01-28 12:24:47.983 CST 0]LOG:  terminating any other
active server processes
[cds2 14546 2015-01-28 12:24:47.983 CST 0]WARNING:  terminating
connection because of crash of another server process
[cds2 14546 2015-01-28 12:24:47.983 CST 0]DETAIL:  The postmaster has
commanded this server process to roll back the current transaction and
exit, because another server process exited abnormally and possibly
corrupted shared memory.
[cds2 14546 2015-01-28 12:24:47.983 CST 0]HINT:  In a moment you
should be able to reconnect to the database and repeat your command.
[ 14485 2015-01-28 12:24:47.983 CST 0]WARNING:  terminating connection
because of crash of another server process
[ 14485 2015-01-28 12:24:47.983 CST 0]DETAIL:  The postmaster has
commanded this server process to roll back the current transaction and
exit, because another server process exited abnormally and possibly
corrupted shared memory.
[ 14485 2015-01-28 12:24:47.983 CST 0]HINT:  In a moment you should be
able to reconnect to the database and repeat your command.
[ 14457 2015-01-28 12:24:47.984 CST 0]LOG:  all server processes
terminated; reinitializing
[ 14554 2015-01-28 12:24:47.995 CST 0]LOG:  database system was
interrupted; last known up at 2015-01-28 12:24:25 CST
[ 14554 2015-01-28 12:24:47.995 CST 0]LOG:  database system was not
properly shut down; automatic recovery in progress
[ 14554 2015-01-28 12:24:47.997 CST 0]LOG:  invalid magic number 
in log segment 00010001, offset 13000704
[ 14554 2015-01-28 12:24:47.997 CST 0]LOG:  redo is not required
[ 14457 2015-01-28 12:24:48.000 CST 0]LOG:  database system is ready
to accept connections
[ 14558 2015-01-28 12:24:48.000 CST 0]LOG:  autovacuum launcher started


merlin




Re: [HACKERS] pg_dump with both --serializable-deferrable and -j

2015-01-28 Thread Kevin Grittner
Alexander Korotkov aekorot...@gmail.com wrote:

 when pg_dump is run with both --serializable-deferrable and -j
 options to pg_dump, it returns errors:

 pg_dump: [archiver (db)] query failed: ERROR:  a snapshot-importing 
 transaction must not be READ ONLY DEFERRABLE
 pg_dump: [archiver (db)] query failed: ERROR:  a snapshot-importing 
 transaction must not be READ ONLY DEFERRABLE
 pg_dump: [archiver (db)] query failed: ERROR:  a snapshot-importing 
 transaction must not be READ ONLY DEFERRABLE
 pg_dump: [parallel archiver] query was: SET TRANSACTION SNAPSHOT '0001E300-1'
 pg_dump: [archiver (db)] query failed: ERROR:  a snapshot-importing 
 transaction must not be READ ONLY DEFERRABLE

 I've checked it on 9.4.0 and 9.3.5.

 So, these options are incompatible now.

The only way to make them compatible would be to have some way for
an exported snapshot to indicate that it is safe against
serialization anomalies if used for a serializable transaction, and
to only generate an error if a non-safe snapshot is used with the
--serializable-deferrable option.

 Could we start snapshot-importing transaction with repeatable
 read isolation level?

You can if you don't use the option which specifies that you want
serializable behavior.  Why specify --serializable-deferrable if
you don't?

 AFAICS, they should read exactly the same data as the snapshot-exporting
 serializable transaction.

Sort of.  The behavior once they have a snapshot and are running is
the same; the difference is whether the snapshot can see a
transient state which would not be consistent with some serial
order of transaction execution.  For examples, see:

https://wiki.postgresql.org/wiki/SSI#Read_Only_Transactions

It's really a question of why you want the dump.  If it is for
crash recovery, you don't need --serializable-deferrable.  If you
want to restore it to run reports that only show states consistent
with some serial execution of serializable transactions you *do*
need --serializable-deferrable, so that the snapshot used for the
dump reflects a point in the commit stream when no serialization
anomalies can be visible to a read-only transaction.

 If not, could pg_dump return some more friendly error before
 actually trying to dump?

Sure, until such time as it is possible to request a
serializable-safe snapshot and flag it as such on export (and check
for that in pg_dump), we could add a validation that these two
options are incompatible.  If there are no objections, I can push
something to do that.
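
The validation itself would amount to something like the sketch below
(variable names and message wording are illustrative, not pg_dump's actual
option-handling code):

#include <stdio.h>
#include <stdlib.h>

/* Reject the combination up front instead of failing mid-dump. */
static void
check_dump_options(int numWorkers, int serializable_deferrable)
{
	if (numWorkers > 1 && serializable_deferrable)
	{
		fprintf(stderr,
				"options -j/--jobs and --serializable-deferrable cannot be used together\n");
		exit(1);
	}
}

int
main(void)
{
	/* Example: pretend the user asked for -j 4 --serializable-deferrable. */
	check_dump_options(4, 1);
	return 0;
}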

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] PQgetssl() and alternative SSL implementations

2015-01-28 Thread Heikki Linnakangas

On 08/20/2014 12:58 AM, Heikki Linnakangas wrote:

On 08/19/2014 10:31 PM, Robert Haas wrote:

On Tue, Aug 19, 2014 at 3:16 PM, Magnus Hagander mag...@hagander.net wrote:

On Tue, Aug 19, 2014 at 9:09 PM, Tom Lane t...@sss.pgh.pa.us wrote:

Robert's got a point though: there is always going to be somebody who
wants something we fail to expose.  It's better to be able to say "well,
you can do PQgetssl and then munge it for yourself" than to have to say
"sorry, you're screwed".  So if we're going to define PQgetssl as
returning NULL when you're not using OpenSSL, I don't see why we
shouldn't expose a similarly-defined PQgetXXX for each other underlying
implementation we support.  There will not be that many of 'em, and
I suspect the people with very specific needs will not care about more
than one underlying library anyway.

This does not say that we shouldn't also try to have some
library-independent functionality for interrogating certificate state
etc.  Just that having an escape hatch isn't a bad thing.


Yeah, wouldn't hurt I guess.


I do agree that having both would be useful. We could have something like
int PQgetSSLstruct(void **sslstruct)


I think it's likely smarter to have totally separate functions.
First, to make it less likely that users will try to use a pointer to
one type of object as a pointer to some other kind of object.  And
second, because you might, for example, someday have an SSL
implementation that wants to return two pointers.  May as well make
that kind of thing easy.


The struct it returns is totally SSL-implementation specific anyway, so
for an implementation that would like to return two structs, you could
well define it to return a struct like:

struct {
  CoolStructA *a;
  CoolStructB *b;
} CoolSSLStruct;

I don't much like adding a separate function for every SSL
implementation, but you've got a point that it would be nice to make it
difficult to call PQgetSSLstruct() and just assume that the returned
struct is e.g an OpenSSL struct, while it's actually something else.
Perhaps:

int PQgetSSLstruct(void **sslstruct, char *structname)

You'd call it like PQgetSSLStruct(&mystruct, "openssl"), and it checks
that the argument matches the library actually being used; otherwise it
returns an error. And if you need to return two structs, you'd call it
twice: PQgetSSLStruct(&a, "cool_a") and PQgetSSLStruct(&b, "cool_b").


Here's a patch to implement the above scheme. It adds four functions to 
libpq, to interrogate the SSL status:


int PQsslInUse(const PGconn *conn)
Returns true (1) if the connection uses SSL, false (0) if not.

const char *PQsslAttribute(const PGconn *conn, const char *attribute_name)
Returns a piece of information. The list of attributes depends on the 
implementation, but there are a few that are expected to be supported by 
all of them. See docs for details.


const char **PQsslAttributes(const PGconn *conn);
Return an array of SSL attribute names available.

void *PQsslStruct(const PGconn *conn, const char *struct_name)
Return a pointer to an SSL-implementation specific object describing the 
connection. PQsslStruct(conn, "OpenSSL SSL") is equivalent to 
PQgetssl(conn).



I think this is expandable enough, because you can easily add attributes 
later on, and different implementations can support different 
attributes. It contains the escape hatch for applications that need to 
do more, and have intimate knowledge of OpenSSL structs. It's also 
pretty easy to use.


Thoughts?

- Heikki

From 0e0f2ceb86cf99611c00e57efdbc84615347542e Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas heikki.linnakan...@iki.fi
Date: Thu, 27 Nov 2014 20:00:36 +0200
Subject: [PATCH 1/1] Add API functions to libpq to interrogate SSL related
 stuff.

This makes it possible to query for things like the SSL version and cipher
used, without depending on OpenSSL functions or macros. That is a good
thing if we ever get another SSL implementation.
---
 doc/src/sgml/libpq.sgml  | 154 +++
 src/bin/psql/command.c   |  35 +++
 src/interfaces/libpq/exports.txt |   4 +
 src/interfaces/libpq/fe-secure-openssl.c |  68 ++
 src/interfaces/libpq/fe-secure.c |  20 
 src/interfaces/libpq/libpq-fe.h  |   6 ++
 6 files changed, 250 insertions(+), 37 deletions(-)

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index de272c5..b96686a 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -1848,33 +1848,130 @@ int PQconnectionUsedPassword(const PGconn *conn);
   </para>
  </listitem>
 </varlistentry>
+   </variablelist>
+  </para>
 
-<varlistentry id="libpq-pqgetssl">
- <term><function>PQgetssl</function><indexterm><primary>PQgetssl</></></term>
+
+  <para>
+The following functions return information related to SSL. This information
+usually doesn't change after a connection is established.
+
+<variablelist>
+<varlistentry id="libpq-pqsslinuse">

Re: [HACKERS] Parallel Seq Scan

2015-01-28 Thread Robert Haas
On Wed, Jan 28, 2015 at 9:12 AM, Thom Brown t...@linux.com wrote:
 On 28 January 2015 at 14:03, Robert Haas robertmh...@gmail.com wrote:
 The problem here, as I see it, is that we're flying blind.  If there's
 just one spindle, I think it's got to be right to read the relation
 sequentially.  But if there are multiple spindles, it might not be,
 but it seems hard to predict what we should do.  We don't know what
 the RAID chunk size is or how many spindles there are, so any guess as
 to how to chunk up the relation and divide up the work between workers
 is just a shot in the dark.

 Can't the planner take effective_io_concurrency into account?

Maybe.  It's sort of answering the right question -- to tell us how
many parallel I/O channels we think we've got.  But I'm not quite sure
what to do with that information in this case.  I mean, if we've
got effective_io_concurrency = 6, does that mean it's right to start
scans in 6 arbitrary places in the relation and hope that keeps all
the drives busy?  That seems like throwing darts at the wall.  We have
no idea which parts are on which underlying devices.  Or maybe it means
we should prefetch 24MB, on the assumption that the RAID stripe is
4MB?  That's definitely blind guesswork.

Considering the email Amit just sent, it looks like on this machine,
regardless of what algorithm we used, the scan took between 3 minutes
and 5.5 minutes, and most of them took between 4 minutes and 5.5
minutes.  The results aren't very predictable, more workers don't
necessarily help, and it's not really clear that any algorithm we've
tried is clearly better than any other.  I experimented with
prefetching a bit yesterday, too, and it was pretty much the same.
Some settings made it slightly faster.  Others made it slower.  Whee!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

2015-01-28 Thread Merlin Moncure
On Wed, Jan 28, 2015 at 12:47 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Merlin Moncure mmonc...@gmail.com writes:
 ...hm, I spoke too soon.  So I deleted everything, and booted up a new
 instance 9.4 vanilla with asserts on and took no other action.
 Applying the script with no data activity fails an assertion every
 single time:

 TRAP: FailedAssertion("!(flags & 0x0010)", File: "dynahash.c", Line: 330)

 There's no Assert at line 330 in 9.4, though there is in HEAD.  I suspect
 what you've got here is a version mismatch; in particular commit
 4a14f13a0abfbf7e7d44a3d2689444d1806aa9dc changed the API for dynahash.c
 such that external modules would need to be recompiled to use it without
 error.  I'm not real sure though how you are getting past the
 loadable-module version check.  Anyway, I'd try make distclean and
 full rebuild before anything else.

you're right -- git got confused and built a 9.5 devel at some random
commit.  I reran it again on 9.4 and it worked ok :(.

merlin




Re: [HACKERS] Misaligned BufferDescriptors causing major performance problems on AMD

2015-01-28 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2015-01-28 13:38:51 -0500, Tom Lane wrote:
 #define BUFFERDESC_PADDED_SIZE   (SIZEOF_VOID_P == 8 ? 64 : 32)

 Hm, did you intentionally put a 32 in there or was that just the logical
 continuation of 64? Because there's no way it'll ever fit into 32 bytes
 in the near future. That's why I had put the sizeof(BufferDesc)
 there. We could just make it 1 as well, to indicate that we don't want
 any padding...

Yeah, 1 would be fine too.  Maybe better to call it BUFFERDESC_MIN_SIZE,
because as this stands it's enforcing a min size not exact size.  (I'm
not going to whinge about that aspect of it; the main point here is to
put in the union and fix the ensuing notational fallout.  We can worry
about exactly what size to pad to as a separate discussion.)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission

2015-01-28 Thread Fabrízio de Royes Mello
On Wed, Jan 28, 2015 at 4:46 AM, Haribabu Kommi kommi.harib...@gmail.com
wrote:

 On Wed, Jan 28, 2015 at 9:47 AM, Jim Nasby jim.na...@bluetreble.com
wrote:
  On 1/27/15 1:04 AM, Haribabu Kommi wrote:
 
  Here I attached the latest version of the patch.
  I will add this patch to the next commitfest.
 
 
  Apologies if this was covered, but why isn't the IP address an inet
instead
  of text?

 Corrected the address type as inet instead of text. updated patch is
attached.

  Also, what happens if someone reloads the config in the middle of
running
  the SRF?

 hba entries are reloaded only in the postmaster process, not in every backend.
 So there shouldn't be any problem with a config file reload. Am I
 missing something?


I don't have anything to disagree with in the patch (idea and code).

But I'm thinking about this patch, and wouldn't it be interesting to have an
FDW to manipulate the hba file? Imagine if we were able to manipulate the
HBA file using INSERT/UPDATE/DELETE.

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog: http://fabriziomello.github.io
 Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello
 Github: http://github.com/fabriziomello


Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission

2015-01-28 Thread Tom Lane
=?UTF-8?Q?Fabr=C3=ADzio_de_Royes_Mello?= fabriziome...@gmail.com writes:
 But I'm thinking about this patch and would not be interesting to have a
 FDW to manipulate the hba file? Imagine if we are able to manipulate the
 HBA file using INSERT/UPDATE/DELETE.

Since the HBA file is fundamentally order-dependent, while SQL tables
are fundamentally not, that doesn't seem like a great API match.  You
could probably brute-force something that would work, but it would very
much be a case of using a hammer to solve a screwdriver problem.

regards, tom lane




[HACKERS] compiler warnings in copy.c

2015-01-28 Thread Robert Haas
My compiler is unhappy with the latest changes to copy.c:

gcc -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
-fwrapv -fexcess-precision=standard -g -O2 -Wall -Werror
-I../../../src/include -D_GNU_SOURCE -I/usr/include/libxml2   -c -o
copy.o copy.c -MMD -MP -MF .deps/copy.Po
copy.c: In function ‘DoCopy’:
copy.c:924:30: error: ‘rte’ may be used uninitialized in this function
[-Werror=uninitialized]
copy.c:793:17: note: ‘rte’ was declared here
cc1: all warnings being treated as errors

From what I can see, this is a pretty legitimate complaint.  If
stmt->relation == NULL, then rte never gets initialized, but we still
do cstate->range_table = list_make1(rte).  That can't be good.

Also, you seem to have pushed these commits with a date more than two
weeks in the past.  Please don't do that!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] compiler warnings in copy.c

2015-01-28 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
 My compiler is unhappy with the latest changes to copy.c:
 
 gcc -Wall -Wmissing-prototypes -Wpointer-arith
 -Wdeclaration-after-statement -Wendif-labels
 -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
 -fwrapv -fexcess-precision=standard -g -O2 -Wall -Werror
 -I../../../src/include -D_GNU_SOURCE -I/usr/include/libxml2   -c -o
 copy.o copy.c -MMD -MP -MF .deps/copy.Po
 copy.c: In function ‘DoCopy’:
 copy.c:924:30: error: ‘rte’ may be used uninitialized in this function
 [-Werror=uninitialized]
 copy.c:793:17: note: ‘rte’ was declared here
 cc1: all warnings being treated as errors

Huh, interesting that mine didn't.

 From what I can see, this is a pretty legitimate complaint.  If
 stmt->relation == NULL, then rte never gets initialized, but we still
 do cstate->range_table = list_make1(rte).  That can't be good.

Yeah, I'll fix that.

 Also, you seem to have pushed these commits with a date more than two
 weeks in the past.  Please don't do that!

Oh, wow, sorry about that.  I had expected a rebase to update the date.

Thanks!

Stephen




Re: [HACKERS] compiler warnings in copy.c

2015-01-28 Thread Andres Freund
On 2015-01-28 15:05:11 -0500, Stephen Frost wrote:
  Also, you seem to have pushed these commits with a date more than two
  weeks in the past.  Please don't do that!
 
 Oh, wow, sorry about that.  I had expected a rebase to update the date.

It updates the committer date, but not the author date. Use --pretty=fuller
to see all the details. You can pass rebase --ignore-date to also reset
the author date, or commit --amend --reset-author.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

