Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2016-11-17 Thread Kyotaro HORIGUCHI
Hello,

At Mon, 14 Nov 2016 11:41:52 +0100, Emre Hasegeli  wrote in 

> > What way to deal with it is in your mind? The problem hides
> > behind operators. To fix it a user should rewrite a expression
> > using more primitive operators. For example, (line_a # point_a)
> > should be rewritten as ((point_a <-> lineseg_a) < EPSILON), or in
> > more primitive way. I regared this that the operator # just
> > become useless.
> 
> Simple equations like this works well with the current algorithm:
> 
> > select '(0.1,0.1)'::point @ '(0,0),(1,1)'::line;

That's too simple for this discussion. If it satisfies someone's
requirement, a geometiric type with integer would go. Any
trigonometric calculations (by other than right angles) would
break it.

> The operator does what you expect from it.  Users can use something
> like you described to get fuzzy behaviour with an epsilon they choose.
> 
> > Regarding optimization, at least gcc generates seemingly not so
> > different code for the two. The both generally generates extended
> > code directly calling isnan() and so. Have you measured the
> > performance of the two implement (with -O2, without
> > --enable-cassert)?  This kind of hand-optimization gets
> > legitimacy when we see a siginificant difference, according to
> > the convention here.. I suppose.
> 
> I tested it with this program:
> 
> > int main()
> > {
> >double  i,
> >j;
> >int result = 0;
> >
> >for (i = 0.1; i < 1.0; i += 1.0)
> >for (j = 0.1; j < 1.0; j += 1.0)
> >if (float8_lt(i, j))
> >result = (result + 1) % 10;
> >
> >return result;
> > }
> 
> The one calling cmp() was noticeable slower.
> 
> ./test1  0.74s user 0.00s system 99% cpu 0.748 total
> ./test2  0.89s user 0.00s system 99% cpu 0.897 total
> 
> This would probably be not much noticeable by calling SQL functions
> which are doing a few comparisons only, but it may be necessary to do
> many more comparisons on some places.  I don't find the optimised
> versions less clear than calling the cmp().  I can change it the other
> way, if you find it more clear.

Thanks for the measurment. I had similar numbers (14:13) for
separate code from Pg, but probably this difference is buried
under other steps. The law of stay-stupid says that this kind of
optimization should be consider only after it found to be
problematic, I think.


> > At least the comment you dropped by the patch,
> >
> >  int
> >  float4_cmp_internal(float4 a, float4 b)
> >  {
> > -   /*
> > -* We consider all NANs to be equal and larger than any non-NAN. 
> > This is
> > -* somewhat arbitrary; the important thing is to have a consistent 
> > sort
> > -* order.
> > -*/
> >
> > seems very significant and should be kept anywhere relevant.
> 
> I will add it back on the next version.
> 
> > I seached pgsql-general ML but counldn't find a complaint about
> > the current behavior. Even though I'm not familar with PostGIS, I
> > found that it uses exactly the same EPSILON method with
> > PostgreSQL.
> 
> Is it?  I understood from Paul Ramsey's comment on this thread [1]
> that they don't.

I saw the repository of PostGIS by myself, but only in
box2d.c. Other objects have their own EPSILONs, EPSILOG_SQLMM,
DBL_EPSLON, FLT_EPSILON... Maybe that's all

|  * Copyright (C) 2008-2011 Paul Ramsey 
|  * Copyright (C) 2008 Mark Cave-Ayland 
| 
| #ifndef EPSILON
| #define EPSILON1.0E-06
| #endif
| #ifndef FPeq
| #define FPeq(A,B) (fabs((A) - (B)) <= EPSILON)
| #endif

I agree that we should have our own tolerance method, and it migh
be an explicit one. But anyway we need any kind of tolerance on
comparing FP numbers. Removing such tolerance as preparation for
implementing another one is quite natural, but released version
should have any kind of tolerance.

> > If we had an apparent plan to use them for other than
> > earth-scale(?)  geometric usage, we could design what they should
> > be. But without such a plan it is just a breakage of the current
> > usage.
> 
> We give no promises about the geometric types being useful in earth scale.

So, we shouldn't just remove it.

> > About What kind of (needless) complication you are saying? The
> > fuzziness seems to me essential for geometric comparisons to work
> > practically. Addition to that, I don't think that we're not
> > allowed to change the behavior in such area of released versions
> > the time after time.
> 
> Even when it is a total mess?

Yes. The seemingly 'total-mess' is intended for a use with
lon/lat values and probably gave useful result at the time of
development. The mess is rather better than just ruined.

> > I don't think index scan and tolerant comparison are not
> > contradicting. Could you let me have an description about the
> > indexing capabilities and the 

Re: [HACKERS] pg_hba_file_settings view patch

2016-11-17 Thread Haribabu Kommi
On Thu, Nov 17, 2016 at 10:13 PM, Ashutosh Bapat <
ashutosh.ba...@enterprisedb.com> wrote:

> On Wed, Nov 16, 2016 at 4:40 PM, Ashutosh Bapat
>  wrote:
> > make check run with this patch shows server crashes. regression.out
> > attached. I have run make check after a clean build, tried building it
> > after running configure, but the problem is always reproducible. Do
> > you see this problem?
>

Thanks for reviewing the patch.

No. I am not able to reproduce this problem.
make check works fine in my system.

>From the regression.out file, the crash occurred in select_parallel.out,
I don't think this patch has any affect on that test.

> Also the patch has a white space error.
> > git diff --check
> > src/backend/utils/init/postinit.c:729: space before tab in indent.
> > +   /*
> >
>

corrected.


> I looked at the patch in some more details and here are some more comments
> 1. In catalog.sgml, the sentence "If the configuration file contains any
> errors
> ..." looks redundant, as description of "error" field says so. Removed it
> in
> the attached patch. In that example, you might want to provide pg_hba.conf
> contents to help understand the view output.
>

updated details, but not exactly what you said. check it once.

2. I think the view will be useful, if loading the file did not have the
> desired effects, whether because of SIGHUP or a fresh start. So, may be the
> sentence "for diagnosing problems if a SIGHUP signal did not have the
> desired
> effects.", should be changed to be more generic e.g. ... if loading file
> did
> not have ... .
>

changed.


> 3. Something wrong with the indentation, at least not how pg_indent would
> indent
> the variable names.
> +typedef struct LookupHbaLineCxt
> +{
> +MemoryContext memcxt;
> +TupleDesctupdesc;
> +Tuplestorestate *tuple_store;
> +} LookupHbaLineCxt;
>

corrected.


> +static void lookup_hba_line_callback(void *context, int lineno,
> HbaLine *hba, const char *err_msg);
> Overflows 80 character limit.
>

corrected.


> in parse_hba_line()
> -ereport(LOG,
> +ereport(level,
>  (errcode(ERRCODE_CONFIG_FILE_ERROR),
>   errmsg("invalid connection type \"%s\"",
>  token->string),
>   errcontext("line %d of configuration file \"%s\"",
>  line_num, HbaFileName)));
> +*err_msg = pstrdup(_("invalid connection type"));
>
> Is the difference between error reported to error log and that in the view
> intentional? That brings to another question. Everywhere, in similar code,
> the
> patch adds a line *err_msg = pstrdup() or psprinf() and copies the
> arguements
> to errmsg(). Someone modifying the error message has to duplicate the
> changes.
> Since the code is just few lines away, it may not be hard to duplicate the
> changes, but still that's a maintenance burder. Isn't there a way to
> compute
> the message once and use it twice? show_all_file_settings() used for
> pg_file_settings also has similar problem, so may be it's an accepted
> practice.
> There are multiple instances of such a difference, but may be the invalid
> value
> can be found out from the value of the referenced field (which will be
> part of
> the view). So, may be it's ok. But that not true with the difference below.
> gai_strerror() may not be obvious from the referenced field.
> -ereport(LOG,
> +ereport(level,
>  (errcode(ERRCODE_CONFIG_FILE_ERROR),
>   errmsg("invalid IP address \"%s\": %s",
>  str, gai_strerror(ret)),
>   errcontext("line %d of configuration file
> \"%s\"",
>  line_num, HbaFileName)));
>  if (gai_result)
>  pg_freeaddrinfo_all(hints.ai_family, gai_result);
> +*err_msg = pstrdup(_("invalid IP address"));
>

Reused the error string once, as in this patch it chances in many places
compared
to pg_file_settings, so I tend to reuse it.


> 4.
> +if (!rsi || !IsA(rsi, ReturnSetInfo) ||
> +(rsi->allowedModes & SFRM_Materialize) == 0)
> +ereport(ERROR,
> +(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> + errmsg("set-valued function called in context that
> cannot accept a set")));
> show_all_file_settings(), a function similar to this one, splits the
> condition
> above into two and throws different error message for each of them.
> /* Check to see if caller supports us returning a tuplestore */
> if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
> ereport(ERROR,
> (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>  errmsg("set-valued function called in context that
> cannot accept a set")));
> if (!(rsinfo->allowedModes & SFRM_Materialize))
> ereport(ERROR,
> 

Re: [HACKERS] Hash Indexes

2016-11-17 Thread Amit Kapila
On Thu, Nov 17, 2016 at 10:54 PM, Robert Haas  wrote:
> On Thu, Nov 17, 2016 at 12:05 PM, Amit Kapila  wrote:
>
>>> I think this comment is saying that we'll release the pin on the
>>> primary bucket page for now, and then reacquire it later if the user
>>> reverses the scan direction.  But that doesn't sound very safe,
>>> because the bucket could be split in the meantime and the order in
>>> which tuples are returned could change.  I think we want that to
>>> remain stable within a single query execution.
>>
>> Isn't that possible even without the patch?  Basically, after reaching
>> end of forward scan and for doing backward *all* scan, we need to
>> perform portal rewind which will in turn call hashrescan where we will
>> drop the lock on bucket and then again when we try to move cursor
>> forward we acquire lock in _hash_first(), so in between when we don't
>> have the lock, the split could happen and next scan results could
>> differ.
>
> Well, the existing code doesn't drop the heavyweight lock at that
> location, but your patch does drop the pin that serves the same
> function, so I feel like there must be some difference.
>

Yes, but I am not sure if existing code is right.  Consider below scenario,

Session-1

Begin;
Declare c cursor for select * from t4 where c1=1;
Fetch forward all from c;  --here shared heavy-weight lock count becomes 1
Fetch prior from c; --here shared heavy-weight lock count becomes 2
close c; -- here, lock release will reduce the lock count and shared
heavy-weight lock count becomes 1

Now, if we try to insert from another session, such that it leads to
bucket-split of the bucket for which session-1 had used a cursor, it
will wait for session-1. The insert can only proceed after session-1
performs the commit.  I think after the cursor is closed in session-1,
the insert from another session should succeed, don't you think so?

>>> Come to think of it, I'm a little worried about the locking in
>>> _hash_squeezebucket().  It seems like we drop the lock on each "write"
>>> bucket page before taking the lock on the next one.  So a concurrent
>>> scan could get ahead of the cleanup process.  That would be bad,
>>> wouldn't it?
>>
>> Yeah, that would be bad if it happens, but no concurrent scan can
>> happen during squeeze phase because we take an exclusive lock on a
>> bucket page and maintain it throughout the operation.
>
> Well, that's completely unacceptable.  A major reason the current code
> uses heavyweight locks is because you can't hold lightweight locks
> across arbitrary amounts of work -- because, just to take one example,
> a process holding or waiting for an LWLock isn't interruptible.  The
> point of this redesign was to get rid of that, so that LWLocks are
> only held for short periods.  I dislike the lock-chaining approach
> (take the next lock before releasing the previous one) quite a bit and
> really would like to find a way to get rid of that, but the idea of
> holding a buffer lock across a complete traversal of an unbounded
> number of overflow buckets is far worse.  We've got to come up with a
> design that doesn't require that, or else completely redesign the
> bucket-squeezing stuff.
>

I think we can use the idea of lock-chaining (take the next lock
before releasing the previous one) for squeeze-phase to solve this
issue.  Basically for squeeze operation, what we need to take care is
that there shouldn't be any scan before we start the squeeze and then
afterward if the scan starts, it should be always behind write-end of
a squeeze.  If we follow this, then there shouldn't be any problem
even for backward scans because to start backward scans, it needs to
start with the first bucket and reach last bucket page by locking each
bucket page in read mode.

> (Would it make any sense to change the order of the hash index patches
> we've got outstanding?  For instance, if we did the page-at-a-time
> stuff first, it would make life simpler for this patch in several
> ways, possibly including this issue.)
>

I agree that page-at-a-time can help hash indexes, but I don't think
it can help with this particular issue of squeeze operation.  While
cleaning dead-tuples, it would be okay even if scan went ahead of
cleanup (considering we have page-at-a-time mode), but for squeeze, we
can't afford that because it can move some tuples to a prior bucket
page and scan would miss those tuples.  Also, page-at-a-time will help
cleaning tuples only for MVCC scans (it might not help for unlogged
tables scan or non-MVCC scans).  Another point is that we don't have a
patch for page-at-a-time scan ready at this stage.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2016-11-17 Thread Tsunakawa, Takayuki
Hello, Craig,

I'm sorry to be late to review your patch.  I've just been able to read the 
HTML doc first.  Can I get the latest .patch file for reading and running the 
code?

Here are some comments and questions.  I tried to avoid the same point as other 
reviewers, but there may be an overlap.


(1)
The example
 UPDATE mytable SET x = x + 1;
should be
 UPDATE mytable SET x = x + 1 WHERE id = 42;


(2)
"The server usually begins executing the batch before all commands in the batch 
are queued and the end of batch command is sent."

Does this mean that the app developer cannot control or predict how many TCP 
transmissions a batch is sent with?  For example, if I want to insert 10 rows 
into a table in bulk, can I send those 10 rows (and the end of batch command) 
efficiently in one TCP transmission, or are they split by libpq into multiple 
TCP transmissions?


(3)
"To avoid deadlocks on large batches the client should be structured around a 
nonblocking I/O loop using a function like select, poll, epoll, 
WaitForMultipleObjectEx, etc."

Can't we use some (new) platform-independent API instead of using poll() or 
WaitForMultipleObject()?  e.g. some thin wrapper around pqWait().  It seems a 
bit burdonsome to have to use an OS-specific API to just wait for libpq.  Apart 
from that, it does not seem possible to wait for the socket in 64-bit apps on 
Windows, because SOCKET is 64-bit while PQsocket() returns int.

[winsock2.h]
/*
 * The new type to be used in all
 * instances which refer to sockets.
 */
typedef UINT_PTRSOCKET;

Regards
Takayuki Tsunakawa


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] WAL recycle retading based on active sync rep.

2016-11-17 Thread Kyotaro HORIGUCHI
Hello.

We had too-early WAL recycling during a test we had on a sync
replication set. This is not a bug and a bit extreme case but is
contrary to expectation on synchronous replication.

> FATAL:  could not receive data from WAL stream: ERROR:  requested WAL segment 
> 00010088 has already been removed

This is because sync replication doesn't wait non-commit WALs to
be replicated. This situation is artificially caused with the
first patch attached and the following steps.

- Configure a master with max_wal_size=80MB and
  min_wal_size=48MB, and synchronous_standby_names='*' then run.

- Configure a replica using pg_basebackup and run it.
  Make a file /tmp/slow to delay replication.

- On the master do 
  =# create table t (a int);
  =# insert into t (select * from generate_series(0, 200));

I could guess the following two approaches for this.

A. Retard wal recycling back to where sync replication reached.

B. Block wal insertion until sync replication reaches to the
   first surviving segments.

The second attached patch implements the first measure. It makes
CreateCheckPoint consider satisfied sync replication on WAL
recycling. If WAL segments to be recycled is required by the
currently satisfied sync-replication, it keeps the required
segments and emit the following message.

> WARNING:  sync replication too retarded. 2 extra WAL segments are preserved 
> (last segno to preserve is moved from 185 to 183)
> HINT:  If you see this message too frequently, consider increasing 
> wal_keep_segments or max_wal_size.

This is somewhat simliar to what repl-slot does but this doesn't
anything when synchronous replication is not satisfied. Perhaps
max_temporary_preserve_segments or similar GUC is required to
limit amount of extra segments.


- Is this situation required to be saved? This is caused by a
  large transaction, spans over two max_wal_size segments, or
  replication stall lasts for a chackepoint period.

- Is the measure acceptable?  For the worst case, a master
  crashes from WAL space exhaustion. (But such large transaction
  won't/shouldn't exist?)

Or other comments?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 7234376b2d6fa6b86cc9e4ed95a52af7bd6225e6 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Fri, 18 Nov 2016 12:44:25 +0900
Subject: [PATCH 1/2] Slows replication processing.

To cause sync-rep desync artificially, sleep 100ms for each
replication message if /tmp/slow exists.
---
 src/backend/replication/walreceiver.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 2bb3dce..57f3ad2 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -439,6 +439,11 @@ WalReceiverMain(void)
 			 */
 			last_recv_timestamp = GetCurrentTimestamp();
 			ping_sent = false;
+			{
+struct stat b;
+if (stat("/tmp/slow", ) >= 0)
+	usleep(10);
+			}
 			XLogWalRcvProcessMsg(buf[0], [1], len - 1);
 		}
 		else if (len == 0)
-- 
2.9.2

>From 3368c16e7a8f30216e7d9579f5d2ca3b923259d5 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Fri, 18 Nov 2016 13:14:55 +0900
Subject: [PATCH 2/2] Preserve WAL segments requred by synchronous standbys.

Since synchronous standby doesn't sync non-commit records, a large
transaction may unexpectedly break a sync replication. This patch
makes CreateCheckPoint to preserve all WAL segments required by the
currently established synchronous replication.
---
 src/backend/access/transam/xlog.c | 26 ++
 src/backend/replication/syncrep.c | 23 ++-
 src/include/replication/syncrep.h |  4 
 3 files changed, 40 insertions(+), 13 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 6cec027..195272e 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -49,6 +49,7 @@
 #include "replication/slot.h"
 #include "replication/origin.h"
 #include "replication/snapbuild.h"
+#include "replication/syncrep.h"
 #include "replication/walreceiver.h"
 #include "replication/walsender.h"
 #include "storage/barrier.h"
@@ -8628,12 +8629,37 @@ CreateCheckPoint(int flags)
 	if (PriorRedoPtr != InvalidXLogRecPtr)
 	{
 		XLogSegNo	_logSegNo;
+		bool		in_sync, am_sync;
+		XLogRecPtr	repwriteptr, repflushptr, repapplyptr;
 
 		/* Update the average distance between checkpoints. */
 		UpdateCheckPointDistanceEstimate(RedoRecPtr - PriorRedoPtr);
 
 		XLByteToSeg(PriorRedoPtr, _logSegNo);
 		KeepLogSeg(recptr, &_logSegNo);
+
+		/*
+		 * If I am under satisfied synchronous replication, refrain from
+		 * removing segments apparently required by them. Refferring to write
+		 * pointer is enough.
+		 */
+		in_sync = SyncRepGetOldestSyncRecPtr(, ,
+			 , _sync, true);
+		if 

Re: [HACKERS] Parallel bitmap heap scan

2016-11-17 Thread Amit Khandekar
On 19 October 2016 at 09:47, Dilip Kumar  wrote:
> On Tue, Oct 18, 2016 at 1:45 AM, Andres Freund  wrote:
>> I don't quite understand why the bitmap has to be parallel at all. As
>> far as I understand your approach as described here, the only thing that
>> needs to be shared are the iteration arrays.  Since they never need to
>> be resized and such, it seems to make a lot more sense to just add an
>> API to share those, instead of the whole underlying hash.
>
> You are right that we only share iteration arrays. But only point is
> that each entry of iteration array is just a pointer to hash entry.
> So either we need to build hash in shared memory (my current approach)
> or we need to copy each hash element at shared location (I think this
> is going to be expensive).

While the discussion is going on regarding implementation for creating
shared tidbitmap, meanwhile I am starting with review of the bitmap
heap scan part, i.e. nodeBitmapHeapscan.c, since this looks mostly
independent of tidbitmap implementation.

> Brief design idea:
> ---
> #1. Shared TIDBitmap creation and initialization
>   First worker to see the state as parallel bitmap info as PBM_INITIAL
> become leader and set the state to PBM_INPROGRESS All other   workers
> see the state as PBM_INPROGRESS will wait for leader to complete the
> TIDBitmap.
>
> #2 At this level TIDBitmap is ready and all workers are awake.

As far as correctness is concerned, the logic where the first worker
becomes leader while others synchronously wait, looks good. Workers
get allocated right from the beginning even though they would stay
idle for some percentage of time (5-20% ?) , but I guess there is
nothing we can do about it with the current parallel query
infrastructure.

In pbms_is_leader() , I didn't clearly understand the significance of
the for-loop. If it is a worker, it can call
ConditionVariablePrepareToSleep() followed by
ConditionVariableSleep(). Once it comes out of
ConditionVariableSleep(), isn't it guaranteed that the leader has
finished the bitmap ? If yes, then it looks like it is not necessary
to again iterate and go back through the pbminfo->state checking.
Also, with this, variable queuedSelf also might not be needed. But I
might be missing something here. Not sure what happens if worker calls
ConditionVariable[Prepare]Sleep() but leader has already called
ConditionVariableBroadcast(). Does the for loop have something to do
with this ? But this can happen even with the current for-loop, it
seems.


> #3. Bitmap processing (Iterate and process the pages).
> In this phase each worker will iterate over page and chunk array and
> select heap pages one by one. If prefetch is enable then there will be
> two iterator. Since multiple worker are iterating over same page and
> chunk array we need to have a shared iterator, so we grab a spin lock
> and iterate within a lock, so that each worker get and different page
> to process.

tbm_iterate() call under SpinLock :
For parallel tbm iteration, tbm_iterate() is called while SpinLock is
held. Generally we try to keep code inside Spinlock call limited to a
few lines, and that too without occurrence of a function call.
Although tbm_iterate() code itself looks safe under a spinlock, I was
checking if we can squeeze SpinlockAcquire() and SpinLockRelease()
closer to each other. One thought is :  in tbm_iterate(), acquire the
SpinLock before the while loop that iterates over lossy chunks. Then,
if both chunk and per-page data remain, release spinlock just before
returning (the first return stmt). And then just before scanning
bitmap of an exact page, i.e. just after "if (iterator->spageptr <
tbm->npages)", save the page handle, increment iterator->spageptr,
release Spinlock, and then use the saved page handle to iterate over
the page bitmap.

prefetch_pages() call under Spinlock :
Here again, prefetch_pages() is called while pbminfo->prefetch_mutex
Spinlock is held. Effectively, heavy functions like PrefetchBuffer()
would get called while under the Spinlock. These can even ereport().
One option is to use mutex lock for this purpose. But I think that
would slow things down. Moreover, the complete set of prefetch pages
would be scanned by a single worker, and others might wait for this
one. Instead, what I am thinking is: grab the pbminfo->prefetch_mutex
Spinlock only while incrementing pbminfo->prefetch_pages. The rest
part viz : iterating over the prefetch pages, and doing the
PrefetchBuffer() need not be synchronised using this
pgbinfo->prefetch_mutex Spinlock. pbms_parallel_iterate() already has
its own iterator spinlock. Only thing is, workers may not do the
actual PrefetchBuffer() sequentially. One of them might shoot ahead
and prefetch 3-4 pages while the other is lagging with the
sequentially lesser page number; but I believe this is fine, as long
as they all prefetch all the required blocks.


-- 
Sent via pgsql-hackers mailing list 

Re: [HACKERS] Mail thread references in commits

2016-11-17 Thread Joshua Drake
Why not hash the URL? Something like:

Http://postgresopen.org/archive/743257890976432

Where the hash is derived from the message if?

On Nov 17, 2016 17:40, "Alvaro Herrera"  wrote:
>
> Tom Lane wrote:
> > Andrew Dunstan  writes:
> > > I love seeing references to email threads in commit messages. It would
> > > make them a lot friendlier though if a full http URL were included
> > > instead of just a Message-ID,
> >
> > I've intentionally not done that, because it does not seem very
> > future-proof.  The message ids will hopefully be unique indefinitely far
> > into the future, but the location of our archives could move.
>
> It won't.  We have far too many in the archives to risk breaking them.
> In fact, we (Magnus) went great lengths to implement a system so that
> old-style PHP links (of which we have a bunch in very old archives,
> including in commit messages) continue to work to this day.  We're far
> too invested in the system now, because of how successful it has proved
> to be.
>
> --
> Álvaro Herrerahttps://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Push down more UPDATEs/DELETEs in postgres_fdw

2016-11-17 Thread Etsuro Fujita

On 2016/11/16 16:38, Etsuro Fujita wrote:

On 2016/11/16 13:10, Ashutosh Bapat wrote:

I don't see any reason why DML/UPDATE pushdown should depend upon
subquery deparsing or least PHV patch. Combined together they can help
in more cases, but without those patches, we will be able to push-down
more stuff. Probably, we should just restrict push-down only for the
cases when above patches are not needed. That makes reviews easy. Once
those patches get committed, we may add more functionality depending
upon the status of this patch. Does that make sense?



OK, I'll extract from the patch the minimal part that wouldn't depend on
the two patches.


Here is a patch for that.  Todo items are: (1) add more comments and (2) 
add more regression tests.  I'll do that in the next version.


Best regards,
Etsuro Fujita
*** a/contrib/postgres_fdw/deparse.c
--- b/contrib/postgres_fdw/deparse.c
***
*** 130,142  static void deparseTargetList(StringInfo buf,
    Bitmapset *attrs_used,
    bool qualify_col,
    List **retrieved_attrs);
! static void deparseExplicitTargetList(List *tlist, List **retrieved_attrs,
  		  deparse_expr_cxt *context);
  static void deparseReturningList(StringInfo buf, PlannerInfo *root,
  	 Index rtindex, Relation rel,
  	 bool trig_after_row,
  	 List *returningList,
  	 List **retrieved_attrs);
  static void deparseColumnRef(StringInfo buf, int varno, int varattno,
   PlannerInfo *root, bool qualify_col);
  static void deparseRelation(StringInfo buf, Relation rel);
--- 130,154 
    Bitmapset *attrs_used,
    bool qualify_col,
    List **retrieved_attrs);
! static void deparseExplicitTargetList(bool is_returning,
! 		  List *tlist,
! 		  List **retrieved_attrs,
  		  deparse_expr_cxt *context);
  static void deparseReturningList(StringInfo buf, PlannerInfo *root,
  	 Index rtindex, Relation rel,
  	 bool trig_after_row,
  	 List *returningList,
  	 List **retrieved_attrs);
+ static void deparseExplicitReturningList(List *rlist,
+ 			 List **retrieved_attrs,
+ 			 deparse_expr_cxt *context);
+ static List *make_explicit_returning_list(Index rtindex, Relation rel,
+ 			 List *returningList,
+ 			 List **fdw_scan_tlist);
+ static void pull_up_target_conditions(PlannerInfo *root, RelOptInfo *foreignrel,
+ 		  Index target_rel, List **target_conds);
+ static void extract_target_conditions(List **joinclauses, Index target_rel,
+ 		  List **target_conds);
  static void deparseColumnRef(StringInfo buf, int varno, int varattno,
   PlannerInfo *root, bool qualify_col);
  static void deparseRelation(StringInfo buf, Relation rel);
***
*** 164,171  static void deparseSelectSql(List *tlist, List **retrieved_attrs,
  static void deparseLockingClause(deparse_expr_cxt *context);
  static void appendOrderByClause(List *pathkeys, deparse_expr_cxt *context);
  static void appendConditions(List *exprs, deparse_expr_cxt *context);
! static void deparseFromExprForRel(StringInfo buf, PlannerInfo *root,
! 	RelOptInfo *joinrel, bool use_alias, List **params_list);
  static void deparseFromExpr(List *quals, deparse_expr_cxt *context);
  static void deparseAggref(Aggref *node, deparse_expr_cxt *context);
  static void appendGroupByClause(List *tlist, deparse_expr_cxt *context);
--- 176,183 
  static void deparseLockingClause(deparse_expr_cxt *context);
  static void appendOrderByClause(List *pathkeys, deparse_expr_cxt *context);
  static void appendConditions(List *exprs, deparse_expr_cxt *context);
! static void deparseFromExprForRel(StringInfo buf, PlannerInfo *root, RelOptInfo *foreignrel,
! 	  bool use_alias, Index target_rel, List **params_list);
  static void deparseFromExpr(List *quals, deparse_expr_cxt *context);
  static void deparseAggref(Aggref *node, deparse_expr_cxt *context);
  static void appendGroupByClause(List *tlist, deparse_expr_cxt *context);
***
*** 994,1000  deparseSelectSql(List *tlist, List **retrieved_attrs, deparse_expr_cxt *context)
  		foreignrel->reloptkind == RELOPT_UPPER_REL)
  	{
  		/* For a join relation use the input tlist */
! 		deparseExplicitTargetList(tlist, retrieved_attrs, context);
  	}
  	else
  	{
--- 1006,1012 
  		foreignrel->reloptkind == RELOPT_UPPER_REL)
  	{
  		/* For a join relation use the input tlist */
! 		deparseExplicitTargetList(false, tlist, retrieved_attrs, context);
  	}
  	else
  	{
***
*** 1037,1043  deparseFromExpr(List *quals, deparse_expr_cxt *context)
  	appendStringInfoString(buf, " FROM ");
  	deparseFromExprForRel(buf, context->root, scanrel,
  		  (bms_num_members(scanrel->relids) > 1),
! 		  context->params_list);
  
  	/* Construct WHERE clause */
  	if (quals != NIL)
--- 1049,1055 
  	appendStringInfoString(buf, " FROM ");
  	deparseFromExprForRel(buf, context->root, scanrel,
  		  (bms_num_members(scanrel->relids) > 1),
! 		  (Index) 0, 

Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-11-17 Thread Craig Ringer
On 17 November 2016 at 10:57, Robert Haas  wrote:
> On Wed, Nov 16, 2016 at 9:00 PM, Tsunakawa, Takayuki
>  wrote:
>> Do we really want to enable libpq failover against pre-V10 servers?  I don't 
>> think so, as libpq is a part of PostgreSQL and libpq failover is a new 
>> feature in PostgreSQL 10.  At least, as one user, I don't want PostgreSQL to 
>> sacrifice another round trip to establish a connection.  As a developer, I 
>> don't want libpq code more complex than necessary (the proposed patch adds a 
>> new state to the connection state machine.)  And I think it's natural for 
>> the server to return the server attribute (primary/standby, writable, etc.) 
>> as a response to the Startup message like server_version, 
>> standard_conforming_strings and server_encoding.
>
> Well, generally speaking, a new feature that works against older
> server is better than one that doesn't.  Of course, if that entails
> making other compromises then you have to decide whether it's worth
> it, but SELECT pg_is_in_recovery() and SHOW transaction_read_only
> exist in older versions so if we pick either of those methods then it
> will just work.  If we decide to invent some completely new method of
> distinguishing masters from standbys, then it might not, but that
> would be a drawback of such a choice, not a benefit.

We can and probably should have both.

If the server tells us on connect whether it's a standby or not, use that.

Otherwise, ask it.

That way we don't pay the round-trip cost and get the log spam when
talking to newer servers that send us something useful in the startup
packet, but we can still query it on older servers. Graceful fallback.

Every round trip is potentially very expensive. Having libpq do them
unnecessarily is bad.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Use procsignal_sigusr1_handler and RecoveryConflictInterrupt() from walsender?

2016-11-17 Thread Craig Ringer
Hi all

While adding support for logical decoding on standby I noticed that
the walsender doesn't respect
SIGUSR1 with PROCSIG_RECOVERY_CONFLICT_DATABASE set.

It blindly assumes that it means there's new WAL:

WalSndSignals(void)
{
...
pqsignal(SIGUSR1, WalSndXLogSendHandler);   /* request WAL
sending */
...
}

since all WalSndXLogSendHandler does is set the latch.

Handling recovery conflicts in the walsender is neccessary for logical
decoding on standby, so that we can replay drop database.

All the recovery conflict machinery is currently contained in
postgres.c and not used by, or accessible to, the walsender. It
actually works to just set procsignal_sigusr1_handler as walsender's
SIGUSR1 handler, but I'm not convinced it's safe:

Most of the procsignals don't make sense for walsender and could
possibly attempts things that use state the walsender doesn't have set
up. The comments on procsignal say that callers should tolerate
getting the wrong signal due to possible races:

 * Also, because of race conditions, it's important that all the signals be
 * defined so that no harm is done if a process mistakenly receives one.

(procsignal.h)

I'm wondering about adding a new state flag IsWalSender and have
RecoveryConflictInterrupt() ignore most conflict reasons if
IsWalSender is true. Though it strikes me that during logical decoding
on standby, the walsender could quite possibly conflict with other
things too, so it'd be better to make it safe to handle all the
conflict cases within the walsender.

Anyway, this PoC passes regression tests and allows drop database on a
standby to succeed when a slot is in-use. Not for commit as-is.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 7b19d2375657bf9b360e4c8feeeaf5f24b7f615a Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Fri, 18 Nov 2016 10:24:55 +0800
Subject: [PATCH 1/2] Allow walsender to exit on conflict with recovery

Now that logical decoding on standby is supported, the walsender needs to be
able to exit in response to conflict with recovery so that it can terminate to
allow replay of a DROP DATABASE to proceed.

WIP. The comments on RecoveryConflictInterrupt() still say it's only called
by normal user backends. There's no safeguard to stop walsender from invoking
other recovery conflict clauses that may be unsafe for it to call.
---
 src/backend/replication/walsender.c | 14 +-
 1 file changed, 1 insertion(+), 13 deletions(-)

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 327dbb2..65b38a2 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -187,7 +187,6 @@ static XLogRecPtr logical_startptr = InvalidXLogRecPtr;
 
 /* Signal handlers */
 static void WalSndSigHupHandler(SIGNAL_ARGS);
-static void WalSndXLogSendHandler(SIGNAL_ARGS);
 static void WalSndLastCycleHandler(SIGNAL_ARGS);
 
 /* Prototypes for private functions */
@@ -2666,17 +2665,6 @@ WalSndSigHupHandler(SIGNAL_ARGS)
 	errno = save_errno;
 }
 
-/* SIGUSR1: set flag to send WAL records */
-static void
-WalSndXLogSendHandler(SIGNAL_ARGS)
-{
-	int			save_errno = errno;
-
-	latch_sigusr1_handler();
-
-	errno = save_errno;
-}
-
 /* SIGUSR2: set flag to do a last cycle and shut down afterwards */
 static void
 WalSndLastCycleHandler(SIGNAL_ARGS)
@@ -2710,7 +2698,7 @@ WalSndSignals(void)
 	pqsignal(SIGQUIT, quickdie);	/* hard crash time */
 	InitializeTimeouts();		/* establishes SIGALRM handler */
 	pqsignal(SIGPIPE, SIG_IGN);
-	pqsignal(SIGUSR1, WalSndXLogSendHandler);	/* request WAL sending */
+	pqsignal(SIGUSR1, procsignal_sigusr1_handler);
 	pqsignal(SIGUSR2, WalSndLastCycleHandler);	/* request a last cycle and
  * shutdown */
 
-- 
2.5.5


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Mail thread references in commits

2016-11-17 Thread Alvaro Herrera
Tom Lane wrote:
> Andrew Dunstan  writes:
> > I love seeing references to email threads in commit messages. It would 
> > make them a lot friendlier though if a full http URL were included 
> > instead of just a Message-ID,
> 
> I've intentionally not done that, because it does not seem very
> future-proof.  The message ids will hopefully be unique indefinitely far
> into the future, but the location of our archives could move.

It won't.  We have far too many in the archives to risk breaking them.
In fact, we (Magnus) went great lengths to implement a system so that
old-style PHP links (of which we have a bunch in very old archives,
including in commit messages) continue to work to this day.  We're far
too invested in the system now, because of how successful it has proved
to be.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning - another take

2016-11-17 Thread Amit Langote
On 2016/11/18 1:43, Robert Haas wrote:
> On Thu, Nov 17, 2016 at 6:27 AM, Amit Langote wrote:
>> OK, I will share the performance results soon.
> 
> Thanks.
> 
>>> Also, in 0006:
>>>
>>> - I doubt that PartitionTreeNodeData's header comment will survive
>>> contact with pgindent.
>>
>> Fixed by adding "/* " at the top of the comment.
> 
> OK.  Hopefully you also tested that.  In general, with a patch set
> this large, the more you can do to make it pgindent-clean, the better.
> Ideally none of your code would get changed by pgindent, or at least
> not the new files.  But note that you will probably have have to
> update typedefs.list if you actually want to be able to run it without
> having it mangle things that involve your new structs.

OK, I was afraid that running pgindent might end up introducing unrelated
diffs in the patch, but I'm sure any code committed to HEAD would have
been through pgindent and my worry might be pointless after all.  Also,
thanks for the tip about the typedefs.list.  I will go try pgindent'ing
the patches.

>>> - The code in make_modifytable() to swap out the rte_array for a fake
>>> one looks like an unacceptable kludge.  I don't know offhand what a
>>> better design would look like, but what you've got is really ugly.
>>
>> Agree that it looks horrible.  The problem is we don't add partition
>> (child table) RTEs when planning an insert on the parent and FDW
>> partitions can't do without some planner handling - planForeignModify()
>> expects a valid PlannerInfo for deparsing target lists (basically, to be
>> able to use planner_rt_fetch()).
>>
>> Any design beside this kludge would perhaps mean that we will be adding
>> valid partition RTEs at some earlier planning stage much like what
>> expand_inherited_rtentry() does during non-insert queries.  Perhaps, we
>> could have expand_inherited_rtentry() make an exception in partitioned
>> tables' case and create root->append_rel_list members even in the insert
>> case.  We could then go over the append_rel_list in make_modifytable() to
>> get the valid RT index of foreign child tables to pass to
>> PlanForeignModify().  Using append_rel_list for insert planning is perhaps
>> equally ugly though. Thoughts?
> 
> If it's only needed for foreign tables, how about for v1 we just throw
> an error and say errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> errmsg("cannot route inserted tuples to a foreign table") for now.  We
> can come back and fix it later.  Doing more inheritance expansion
> early is probably not a good idea because it likely sucks for
> performance, and that's especially unfortunate if it's only needed for
> foreign tables.  Coming up with some new FDW API or some modification
> to the existing one is probably better, but I don't really want to get
> hung up on that right now.

OK, I agree with the decision to make tuple-routing a unsupported feature
in the first cut as far as foreign partitions are concerned.  Once can
still insert data into partitions directly.

I am assuming that the error should be thrown all the way below
ExecInsert(), not in the planner.  Which means I should revert any changes
I've made to the planner in this patch.

>>> - I don't understand how it can be right for get_partition_for_tuple()
>>> to be emitting an error that says "range partition key contains null".
>>> A defective partition key should be detected at partition creation
>>> time, not in the middle of tuple routing.
>>
>> That error is emitted when the partition key of the *input tuple* is found
>> to contain NULLs.  Maybe we need to consider something like how btree
>> indexes treat NULLs - order NULLs either before or after all the non-NULL
>> values based on NULLS FIRST/LAST config.  Thoughts?
> 
> Well, I'd say my confusion suggests that the error message needs to be
> clearer about what the problem is.  I think this is basically a case
> of there being no partition for the given row, so maybe just arrange
> to use that same message here ("no partition of relation \"%s\" found
> for row").

The reason NULLs in an input row are caught and rejected (with the current
message) before control reaches range_partition_for_tuple() is because
it's not clear to me whether the range bound comparison logic in
partition_rbound_datum_cmp() should be prepared to handle NULLs and what
the results of comparisons should look like.  Currently, all it ever
expects to see in the input tuple's partition key is non-NULL datums.
Comparison proceeds as follows: if a range bound datum is a finite value,
we invoke the comparison proc or if it is infinite, we conclude that the
input tuple is > or < the bound in question based on whether the bound is
a lower or upper bound, respectively.

Or are you saying that get_tuple_for_partition() should simply return -1
(partition not found) in case of encountering a NULL in range partition
key to the caller instead of throwing error as is now?  If the user sees
the message and decides to create a new range partition 

Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-11-17 Thread Tsunakawa, Takayuki
From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Mithun Cy
> I am adding next version of the patch it have following fixes.
> Tsunakawa's comments
> 
> 1.  PGconn->target_server_type is now freed in freePGconn() 2.  Added
> PGTARGETSERVERTYPE.
> 
> 
> Additional comments from others
> 3.  Moved from SELECT pg_is_in_recovery() to SHOW transaction_read_only
> now should handle different kind of replication, as we recognise server
> to which writable connection can be made as primary. Very exactly like JDBC
> driver. Also documented about it.
> 4. renamed words from master to primary.

Thank you.  The following items need addressing.  Some of them require some 
more discussion to reach consensus, and I hope they will settle down soon.  
After checking the progress for a week or so, I'll mark the CommitFest entry as 
"ready for committer" or "waiting on author".

(1)
+server. Set this to any, if you want to connect to
+A server is recognized as a primary/standby by observering whether it

Typo.   , and "observering" -> "observing".


(2)
+   {"target_server_type", "PGTARGETSERVERTYPE", NULL, NULL,
+   "Target server type", "", 6,

Looking at existing parameters, the default value is defined as a macro, and 
the display label is a sequence of words separated by "-".  i.e.

+   {"target_server_type", "PGTARGETSERVERTYPE", DefaultTargetServerType, 
NULL,
+   "Target-Server-Type", "", 6,


(3)
Please avoid adding another round trip by using a GUC_REPORTed variable 
(ParameterStatus entry).  If you want to support this libpq failover with 
pre-10 servers, you can switch the method of determining the primary based on 
the server version.  But I don't think it's worth supporting older servers at 
the price of libpq code complexity.


(4)
Please consider supporting "standby" and "prefer_standby" like PgJDBC.  They 
are useful without load balancing when multiple standbys are used for HA.


(5)
I haven't tracked the progress of logical replication, but will 
target_server_type be likely to be usable with it?  How will target_server_type 
fit logical replication?

Regards
Takayuki Tsunakawa



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Document how to set up TAP tests for Perl 5.8.8

2016-11-17 Thread Craig Ringer
On 18 November 2016 at 07:10, Michael Paquier  wrote:
> On Thu, Nov 17, 2016 at 2:39 PM, Craig Ringer  wrote:
>> I wasted a bunch of time getting set up to test for such an ancient
>> Perl and would love to save the next person along the hassle by
>> documenting the easy way.
>
> It looks sensible to mention that in the README, so +1.
>
> By the way, would it matter to mention ways to install perlbrew? For
> example, I just bumped into http://install.perlbrew.pl by looking
> around, though I don't doubt that most people would just use cpan with
> "install App::perlbrew" for example. For OSX users with brew and
> without macports that would matter.

The docs linked mention it, so I figured that'd probably do. Though I
notice it's only mentioned in the "configuration" section not under
"installation", so maybe it's worth adding yeah.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] amcheck (B-Tree integrity checking tool)

2016-11-17 Thread Peter Geoghegan
On Wed, Nov 16, 2016 at 5:06 PM, Thomas Munro
 wrote:
> + * Ideally, we'd compare every item in the index against every other
> + * item in the index, and not trust opclass obedience of the transitive
> + * law to bridge the gap between children and their grandparents (as
> + * well as great-grandparents, and so on).  We don't go to those
> + * lengths because that would be prohibitively expensive, and probably
> + * not markedly more effective in practice.
> + */
>
> I skimmed the Modern B-Tree Techniques paper recently, and there was a
> section on "the cousin problem" when verifying btrees, which this
> comment reminded me of.

I'm impressed that you made the connection. This comment reminded you
of that section because it was directly influenced by it. I actually
use the term "cousin page" elsewhere, because it's really useful to
distinguish cousins from "true siblings" when talking about page
deletion in particular...and I have a lot to say about page deletion.

> I tried to come up with an example of a pair
> of characters being switched in a collation that would introduce
> undetectable corruption:
>
> T1.  Order   = a < b < c < d
>
>  Btree   =[a|c]
>   /   \
>  / \
> /   \
>   [a]---[c]
>| |
>| |
>   [b]---[d]
>
> T2.  Order   = a < c < b < d

So, the Btree you show is supposed to be good? You've left it up to me
to draw T2, the corrupt version? If so, here is a drawing of T2 for
the sake of clarity:

  Btree   =[a|b]
   /   \
  / \
 /   \
   [a]---[b]
| |
| |
   [c]---[d]

> Now c and b have been switched around.  Won't amcheck still pass since
> we only verify immediate downlinks and siblings?  Yet searches for b
> will take a wrong turn at the root.  Do I have this right?  I wonder
> if there is a way to verify that each page's high key < the 'next' key
> for all ancestors.

I don't think you have it right, because your argument is predicated
on classic L, not the Postgres variant. The high key is literally
guaranteed to be *equal* to the initial first value of the right half
of a split page post-split (at least initially), which is where the
new downlink for the new right half comes from in turn, so it will be
exactly equal as well. There is a comment which says all this within
_bt_insert_parent(), actually.

In general, I think it's pretty bad that we're so loose about
representing the key space in downlinks compared to classic L,
because it probably makes index bloat a lot worse in internal pages,
messing up the key space long term. As long as we're doing that,
though, we clearly cannot usefully "verify that each page's high key <
the 'next' key for all ancestors", because we deviate from L in a
way that makes that check always fail. It's okay that it always fails
for us (it doesn't actually indicate corruption), because of our
duplicate-aware index scan logic (change to invariant).

In summary, I'm pretty sure that our "Ki <= v <= Ki+1" thing makes the
cousin problem a non-issue, because we don't follow downlinks that are
equal to our scankey -- we go to their immediate left and start
*there*, in order to support duplicates in leaf pages. Index scans for
'b' work for both T1 and T2 in Postgres, because we refuse to follow
the 'b' downlinks when doing an index scan for 'b' (relevant to T2). I
would actually like to change things to make the invariant the classic
L "Ki < v <= Ki+1", to avoid bloat in the internal pages and to make
suffix truncation in internal pages work.

So, we don't have the cousin problem, but since I wish for us to adopt
the stricter classic L invariant at some point, you could say that I
also wished that we had the cousin problem.

Confused yet? :-)

[1] https://archive.org/stream/symmetricconcurr00lani#page/6/mode/2up
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Document how to set up TAP tests for Perl 5.8.8

2016-11-17 Thread Michael Paquier
On Thu, Nov 17, 2016 at 2:39 PM, Craig Ringer  wrote:
> I wasted a bunch of time getting set up to test for such an ancient
> Perl and would love to save the next person along the hassle by
> documenting the easy way.

It looks sensible to mention that in the README, so +1.

By the way, would it matter to mention ways to install perlbrew? For
example, I just bumped into http://install.perlbrew.pl by looking
around, though I don't doubt that most people would just use cpan with
"install App::perlbrew" for example. For OSX users with brew and
without macports that would matter.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Document how to set up TAP tests for Perl 5.8.8

2016-11-17 Thread Craig Ringer
On 18 November 2016 at 05:41, Robert Haas  wrote:
> On Wed, Nov 16, 2016 at 9:54 PM, Craig Ringer  wrote:
>> On 17 November 2016 at 10:42, Craig Ringer  wrote:
>>> But sure, if it's easier, we can have 5.8.0 in the README. What's five
>>> extra years matter anyway? Hey, while we're at it, lets change Pg to
>>> build on Borland C and require K style!
>>
>> Sorry. That was unnecessary. I should've had the sense to save that as
>> a draft and come back later.
>
> Well, *I* thought it was pretty funny...
>
> I remember K style rather fondly.  Of course it was inferior
> technically, but if you could get things to compile that way you could
> feel like a wizard for being able to make the old compiler work.

Rather. I remember getting modern software to compile on SCO
OpenServer 5.0.5 that way. *Twich* *twitch*.

Anyway, here's an updated patch with version changed to 5.8.4 since
that's the newest we test on the buildfarm (and it's from 2004).

I wasted a bunch of time getting set up to test for such an ancient
Perl and would love to save the next person along the hassle by
documenting the easy way.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 2a911dda796e88c88703095ad2f34cb73ccf919b Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Tue, 15 Nov 2016 15:45:41 +0800
Subject: [PATCH] Document that perl 5.8.4 is required for TAP tests

We don't test with anything older than Perl 5.8.4 in the buildfarm
and it's from 2004, so document that the TAP tests are expected to
work with Perl 5.8.4 and explain how to get it easily.

This does not affect configure tests; we'll still compile against
an older Perl 5.8.x if there's one in the wild.
---
 src/test/perl/README | 17 +
 1 file changed, 17 insertions(+)

diff --git a/src/test/perl/README b/src/test/perl/README
index cfb45a1..9c8bd05 100644
--- a/src/test/perl/README
+++ b/src/test/perl/README
@@ -64,3 +64,20 @@ For available PostgreSQL-specific test methods and some example tests read the
 perldoc for the test modules, e.g.:
 
 perldoc src/test/perl/PostgresNode.pm
+
+Required Perl
+-
+
+Tests must run on perl 5.8.4 and newer. perlbrew is a good way to obtain
+such a Perl; see https://metacpan.org/pod/distribution/App-perlbrew/bin/perlbrew .
+Just install and
+
+perlbrew --force install 5.8.4
+perlbrew use 5.8.4
+perlbrew install-cpanm
+cpanm install IPC::Run
+
+then re-run configure to ensure the correct Perl is used when running tests. To verify
+that the right Perl was found:
+
+grep ^PERL= config.log
-- 
2.5.5


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Contains and is contained by operators of inet datatypes

2016-11-17 Thread Tom Lane
Andreas Karlsson  writes:
> Emre Hasegeli wrote:
> Attached patch adds <@, @>, <<@, and @>> operator symbols for inet
> datatype to replace <<=, >>=, <<, and >>.

> Nice, I am fine with this version of the patch. Setting it to ready for 
> committer!

I looked at this for awhile and TBH I am not very excited about it.
I am not sure it makes anything better, but I am sure it makes things
different.  People tend not to like that.

The new names might be better if we were starting in a green field,
but in themselves they are not any more mnemonic than what we had, and
what we had has been there for a lot of years.  Also, if we accept both
old names and new (which it seems like we'd have to), that creates new
opportunities for confusion, which is a cost that should not be
disregarded.

The original post proposed that we'd eventually get some benefit by
being able to repurpose << and >> to mean something else, but the
time scale over which that could happen is so long as to make it
unlikely to ever happen.  I think we'd need to deprecate these names
for several years, then actually remove them and have nothing there for
a few years more, before we could safely install new operators that
take the same arguments but do something different.  (For comparison's
sake, it took us five years to go from deprecating => as a user operator
to starting to use it as parameter naming syntax ... and that was a
case where conflicting use could be expected to throw an error, not
silently misbehave, so we could force it with little risk of silently
breaking peoples' applications.  To repurpose << and >> in this way
we would need to move much slower.)

I'm inclined to think we should just reject this patch.  I'm certainly not
going to commit it without seeing positive votes from multiple people.

regards, tom lane


Re: [HACKERS] quieting DEBUG3

2016-11-17 Thread Robert Haas
On Wed, Nov 16, 2016 at 12:00 PM, Alvaro Herrera
 wrote:
> Robert Haas wrote:
>
>> Right: me either.  So, here is a series of three patches.
>
> +1 to the general idea of the three patches.  I didn't review nor test
> them in detail.

Thanks.  Hearing no objections, committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Mail thread references in commits

2016-11-17 Thread Andrew Dunstan



On 11/17/2016 04:06 PM, Andres Freund wrote:
> On November 17, 2016 1:02:38 PM PST, Andrew Dunstan  wrote:
>> I love seeing references to email threads in commit messages. It would
>> make them a lot friendlier though if a full http URL were included
>> instead of just a Message-ID, i.e. instead of  put
>> . I know this is
>> a bit more trouble, but not that much, and it would make it very easy
>> to follow with just a mouse click.
>
> They're really hard to read though, because lines with e.g. gmail message IDs
> get very long even without that prefix.  Do you look at these in the e-mail or
> gitweb?




Mainly in Thunderbird. But it's cumbersome either way.

I agree gmail message IDs are ugly and insanely long. Personally I would 
rather pay the price of one line of ugliness for nice clickability.


To Tom's point that the URL might be ephemeral while the Message-ID is 
not, the URL would contain the Message-ID, so even if the base changed 
you could adjust to it.


cheers

andrew


Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2016-11-17 Thread Claudio Freire
On Thu, Nov 17, 2016 at 6:34 PM, Robert Haas  wrote:
> On Thu, Nov 17, 2016 at 1:42 PM, Claudio Freire  
> wrote:
>> Attached is patch 0002 with pgindent applied over it
>>
>> I don't think there's any other formatting issue, but feel free to
>> point a finger to it if I missed any
>
> Hmm, I had imagined making all of the segments the same size rather
> than having the size grow exponentially.  The whole point of this is
> to save memory, and even in the worst case you don't end up with that
> many segments as long as you pick a reasonable base size (e.g. 64MB).

Wastage is bounded by a fraction of the total required RAM, that is,
it's proportional to the amount of required RAM, not the amount
allocated. So it should still be fine, and the exponential strategy
should improve lookup performance considerably.
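
To make that concrete, here is a toy, self-contained sketch of the argument
(plain ints and invented names, not the patch's DeadTuplesSegment structs):
because each new segment doubles the previous capacity, the unused tail of the
last segment can never exceed roughly what is already stored, and a lookup
only has to pick one of the O(log n) segments before a single bsearch() inside
it.

#include <stdio.h>
#include <stdlib.h>

#define MAX_SEGS 32             /* plenty for this toy; no overflow check */

typedef struct Segment
{
    size_t  used;
    size_t  cap;
    int    *items;              /* kept sorted, like the dead-TID segments */
} Segment;

static Segment segs[MAX_SEGS];
static int nsegs = 0;

static void
append_sorted(int value)        /* caller appends values in ascending order */
{
    if (nsegs == 0 || segs[nsegs - 1].used == segs[nsegs - 1].cap)
    {
        size_t  cap = (nsegs == 0) ? 4 : segs[nsegs - 1].cap * 2;

        segs[nsegs].items = malloc(cap * sizeof(int));
        segs[nsegs].cap = cap;
        segs[nsegs].used = 0;
        nsegs++;
    }
    segs[nsegs - 1].items[segs[nsegs - 1].used++] = value;
}

static int
cmp_int(const void *a, const void *b)
{
    int     x = *(const int *) a;
    int     y = *(const int *) b;

    return (x > y) - (x < y);
}

static int
lookup(int value)
{
    int     s;

    /* doubling keeps the segment count at O(log n), so this scan stays cheap */
    for (s = 0; s < nsegs; s++)
        if (segs[s].used > 0 && value <= segs[s].items[segs[s].used - 1])
            return bsearch(&value, segs[s].items, segs[s].used,
                           sizeof(int), cmp_int) != NULL;
    return 0;
}

int
main(void)
{
    size_t  used = 0;
    size_t  allocated = 0;
    int     i;

    for (i = 0; i < 1000; i++)
        append_sorted(i * 2);
    for (i = 0; i < nsegs; i++)
    {
        used += segs[i].used;
        allocated += segs[i].cap;
    }
    /* allocated never exceeds 2x used plus the first segment's size */
    printf("segments=%d used=%zu allocated=%zu lookup(500)=%d lookup(501)=%d\n",
           nsegs, used, allocated, lookup(500), lookup(501));
    return 0;
}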


Re: [HACKERS] Document how to set up TAP tests for Perl 5.8.8

2016-11-17 Thread Robert Haas
On Wed, Nov 16, 2016 at 9:54 PM, Craig Ringer  wrote:
> On 17 November 2016 at 10:42, Craig Ringer  wrote:
>> But sure, if it's easier, we can have 5.8.0 in the README. What's five
>> extra years matter anyway? Hey, while we're at it, let's change Pg to
>> build on Borland C and require K&R style!
>
> Sorry. That was unnecessary. I should've had the sense to save that as
> a draft and come back later.

Well, *I* thought it was pretty funny...

I remember K&R style rather fondly.  Of course it was inferior
technically, but if you could get things to compile that way you could
feel like a wizard for being able to make the old compiler work.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2016-11-17 Thread Robert Haas
On Thu, Nov 17, 2016 at 1:42 PM, Claudio Freire  wrote:
> Attached is patch 0002 with pgindent applied over it
>
> I don't think there's any other formatting issue, but feel free to
> point a finger to it if I missed any

Hmm, I had imagined making all of the segments the same size rather
than having the size grow exponentially.  The whole point of this is
to save memory, and even in the worst case you don't end up with that
many segments as long as you pick a reasonable base size (e.g. 64MB).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Mail thread references in commits

2016-11-17 Thread Tom Lane
Andrew Dunstan  writes:
> I love seeing references to email threads in commit messages. It would 
> make them a lot friendlier though if a full http URL were included 
> instead of just a Message-ID,

I've intentionally not done that, because it does not seem very
future-proof.  The message ids will hopefully be unique indefinitely far
into the future, but the location of our archives could move.

Andres' point about line length is also an issue.

FWIW, I have my home page set up so that I can paste in a message ID and
go directly to the archived thread; there's more than one way to skin
that cat.

regards, tom lane


Re: [HACKERS] Fun fact about autovacuum and orphan temp tables

2016-11-17 Thread Michael Paquier
On Thu, Nov 17, 2016 at 11:16 AM, Robert Haas  wrote:
> On Thu, Nov 17, 2016 at 1:41 PM, Michael Paquier
>  wrote:
>> Okay, let's remove the documentation then. What do you think about the
>> updated version attached?
>> (Even this patch enters into "Needs Review" state).
>
> LGTM.  I'll commit it if there are not objections.

Thank you.
-- 
Michael


Re: [HACKERS] Mail thread references in commits

2016-11-17 Thread Alvaro Herrera
Robert Haas wrote:
> On Thu, Nov 17, 2016 at 4:06 PM, Andres Freund  wrote:
> > On November 17, 2016 1:02:38 PM PST, Andrew Dunstan  
> > wrote:
> >>I love seeing references to email threads in commit messages. It would
> >>make them a lot friendlier though if a full http URL were included
> >>instead of just a Message-ID, i.e. instead of  put
> >>. I know this is
> >>
> >>a bit more trouble. but not that much, and it would make it very easy
> >>to
> >>follow with just a mouse click.
> >
> > They're really hard to read though, because lines with e.g. gmail message 
> > IDs get very long even without that prefix.  Do you look at these in the 
> > e-mail or gitweb?
> 
> Yeah.  I really hate having commit messages that are more than 70-75
> characters wide, and a message-ID takes you well above that.  I wish
> we had something like archives.postgresql.org/sha/5bcd5142 instead of
> basing everything off the message-ID.

The advantage of using the message-ID is that you can use it while
offline too (very common among this crowd to do stuff while on planes,
since conferences and such are so frequent now), so I wouldn't make that
trade.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Mail thread references in commits

2016-11-17 Thread Alvaro Herrera
Andrew Dunstan wrote:
> I love seeing references to email threads in commit messages. It would make
> them a lot friendlier though if a full http URL were included instead of
> just a Message-ID, i.e. instead of  put
> . I know this is a
> bit more trouble. but not that much, and it would make it very easy to
> follow with just a mouse click.

I completely agree.  I would favor something even more convenient, such
as
https://postgr.es/m/foo1...@bar.com
which just redirects to the full URL.  That saves a decent number of
chars.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] [BUG?] pg_event_trigger_ddl_commands() error with ALTER TEXT SEARCH CONFIGURATION

2016-11-17 Thread Alvaro Herrera
Artur Zakirov wrote:
> Hello hackers,
> 
> I attached the test extension. It just creates event trigger on
> ddl_command_end. And pg_catalog.pg_event_trigger_ddl_commands() is executed
> from extension's C-function handler.
> 
> The pg_event_trigger_ddl_commands() returns the error:
> 
> ERROR:  unrecognized object class: 3603
> 
> if I try to execute the following commands:
> 
> => create text search CONFIGURATION en (copy=english);
> 
> => alter text search CONFIGURATION en ALTER MAPPING for host, email, url,
> sfloat with simple;
> ERROR:  unrecognized object class: 3603
> CONTEXT:  SQL statement "SELECT objid, UPPER(object_type), object_identity,
>   UPPER(command_tag)
>   FROM pg_catalog.pg_event_trigger_ddl_commands()"
> 
> This error is raised from getObjectClass() function. It seems that we need
> new OCLASS_TSCONFIG_MAP object class.
> 
> Is this a bug? Or it wasn't added intentionally?

It's a bug.  You're right that we need to handle the object class
somewhere.  Perhaps I failed to realize that tsconfigs could get
altered.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Mail thread references in commits

2016-11-17 Thread Robert Haas
On Thu, Nov 17, 2016 at 4:06 PM, Andres Freund  wrote:
> On November 17, 2016 1:02:38 PM PST, Andrew Dunstan  
> wrote:
>>I love seeing references to email threads in commit messages. It would
>>make them a lot friendlier though if a full http URL were included
>>instead of just a Message-ID, i.e. instead of  put
>>. I know this is
>>
>>a bit more trouble. but not that much, and it would make it very easy
>>to
>>follow with just a mouse click.
>
> They're really hard to read though, because lines with e.g. gmail message IDs 
> get very long even without that prefix.  Do you look at these in the e-mail 
> or gitweb?

Yeah.  I really hate having commit messages that are more than 70-75
characters wide, and a message-ID takes you well above that.  I wish
we had something like archives.postgresql.org/sha/5bcd5142 instead of
basing everything off the message-ID.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


[HACKERS] Mail thread references in commits

2016-11-17 Thread Andrew Dunstan
I love seeing references to email threads in commit messages. It would 
make them a lot friendlier though if a full http URL were included 
instead of just a Message-ID, i.e. instead of  put 
. I know this is 
a bit more trouble, but not that much, and it would make it very easy to 
follow with just a mouse click.


cheers

andrew



Re: [HACKERS] Mail thread references in commits

2016-11-17 Thread Andres Freund


On November 17, 2016 1:02:38 PM PST, Andrew Dunstan  wrote:
>I love seeing references to email threads in commit messages. It would 
>make them a lot friendlier though if a full http URL were included 
>instead of just a Message-ID, i.e. instead of  put 
>. I know this is
>
>a bit more trouble. but not that much, and it would make it very easy
>to 
>follow with just a mouse click.

They're really hard to read though, because lines with e.g. gmail message IDs 
get very long even without that prefix.  Do you look at these in the e-mail or 
gitweb?

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [HACKERS] amcheck (B-Tree integrity checking tool)

2016-11-17 Thread Peter Geoghegan
On Wed, Nov 16, 2016 at 1:03 PM, Andres Freund  wrote:
> I think the patch could use a pgindent run.

Okay.

> I'd really want a function that runs all check on a table.

Why not just use a custom SQL query? The docs have an example of one
such query, that verifies all user indexes, but there could be another
example.

>> +#define CHECK_SUPERUSER() { \
>> + if (!superuser()) \
>> + ereport(ERROR, \
>> + 
>> (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), \
>> +  (errmsg("must be superuser to use 
>> verification functions")))); }
>
> Why is this a macro?

This was just copied from pageinspect, TBH. What would you prefer?

>> +#ifdef NOT_USED
>> +#define CORRUPTION   PANIC
>> +#define CONCERN  WARNING
>> +#define POSITION NOTICE
>> +#else
>> +#define CORRUPTION   ERROR
>> +#define CONCERN  DEBUG1
>> +#define POSITION DEBUG2
>> +#endif
>
> Hm, if we want that - and it doesn't seem like a bad idea - I think we
> should make it available without recompiling.

I suppose, provided it doesn't let CORRUPTION elevel be < ERROR. That
might be broken if it was allowed.

> Hm, I think it'd be, for the long term, better if we'd move the btree
> check code to amcheck_nbtree.c or such.

Agreed.

>> +Datum
>> +bt_index_parent_check(PG_FUNCTION_ARGS)
>> +{
>> + Oid indrelid = PG_GETARG_OID(0);

> Theoretically we should recheck that the oids still match, theoretically
> the locking here allows the index->table mapping to change. It's not a
> huge window tho...

>> + /* Check for active uses of the index in the current transaction */
>> + CheckTableNotInUse(indrel, "bt_index_parent_check");
>
> Why do we actually care?

This was left-over from duplicating REINDEX code. Indeed, we have no
reason to care at all.

>> + if (!rel->rd_index->indisready)
>> + ereport(ERROR,
>> + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>> +  errmsg("cannot check index \"%s\"",
>> + RelationGetRelationName(rel)),
>> +  errdetail("Index is not yet ready for 
>> insertions")));
>
> Why can't we check this? Seems valuable to allow checking after a
> partial CONCURRENTLY index build?

Okay.

> Why don't you use IndexIsReady et al? Not sure if the patch compiles
> against an earlier release (there weren't valid/ready/live dead fields,
> but instead it was compressed into fewer)?

This version is almost identical to the Github version of amcheck that
we've been using with 9.4+. Agreed on IndexIsReady().

>> +/*
>> + * bt_check_every_level()
>> + *
>> + * Main entry point for B-Tree SQL-callable functions.  Walks the B-Tree in
>> + * logical order, verifying invariants as it goes.
>> + *
>> + * It is the caller's responsibility to acquire appropriate heavyweight 
>> lock on
>> + * the index relation, and advise us if extra checks are safe when an
>> + * ExclusiveLock is held.  An ExclusiveLock is generally assumed to prevent 
>> any
>> + * kind of physical modification to the index structure, including
>> + * modifications that VACUUM may make.
>> + */
>
> Hm, what kind of locking does the kill_prior_tuple stuff use?

The killing itself, or the setting of LP_DEAD bit? Ordinary index
scans could concurrently set LP_DEAD with only an AccessShareLock.
But, that's just metadata that amcheck doesn't care about. Anything
that needs to actually recycle space based on finding an LP_DEAD bit
set is a write operation, which is precluded from running when
bt_index_parent_check() is called by its ExclusiveLock.
_bt_vacuum_one_page() is only called by one nbtinsert.c code path,
when it looks like a page split might be needed.

>> +static void
>> +bt_check_every_level(Relation rel, bool exclusivelock)
>> +{
>
>> + /*
>> +  * Certain deletion patterns can result in "skinny" B-Tree indexes, 
>> where
>> +  * the fast root and true root differ.
>> +  *
>> +  * Start from the true root, not the fast root, unlike conventional 
>> index
>> +  * scans.  This approach is more thorough, and removes the risk of
>> +  * following a stale fast root from the meta page.
>> +  */
>> + if (metad->btm_fastroot != metad->btm_root)
>> + ereport(CONCERN,
>> + (errcode(ERRCODE_DUPLICATE_OBJECT),
>> +  errmsg("fast root mismatch in index %s",
>> + RelationGetRelationName(rel)),
>> +  errdetail_internal("Fast block %u (level %u) "
>> + 
>> "differs from true root "
>> + "block 
>> %u (level %u).",
>> +

Re: [HACKERS] DECLARE STATEMENT setting up a connection in ECPG

2016-11-17 Thread Michael Meskes
> [System Design Plan]
> To support above functionality, ecpg precompiler should support:
>  - To understand the DECLARE STATEMENT syntax 

Already does, albeit as a noop.

>  - Translate the DECLARE STATEMENT into a new function with parameters. 
>These parameters carry the information like connection_name and 
> statement_name. 
>  - The function is a new function defined in the ECPG library.

Why's that? I haven't checked if the standard says anything about this and my
memory might be wrong, but isn't "DECLARE STATEMENT" supposed to be purely
declarative, i.e. not executed at run time?  

Michael
-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Meskes at (Debian|Postgresql) dot Org
Jabber: michael at xmpp dot meskes dot org
VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL


Re: [HACKERS] Fun fact about autovacuum and orphan temp tables

2016-11-17 Thread Robert Haas
On Thu, Nov 17, 2016 at 1:41 PM, Michael Paquier
 wrote:
> Okay, let's remove the documentation then. What do you think about the
> updated version attached?
> (Even this patch enters into "Needs Review" state).

LGTM.  I'll commit it if there are not objections.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Declarative partitioning - another take

2016-11-17 Thread Robert Haas
On Thu, Nov 17, 2016 at 6:27 AM, Amit Langote
 wrote:
> Meanwhile, here are updated patches that address most of the following comments.

OK, I have re-reviewed 0005 and it looks basically fine to me, modulo
a few minor nitpicks. "This is called, *iff*" shouldn't have a comma
there, and I think the entire comment block that starts with "NOTE:
SQL specifies that a NULL" and ends with "it's unlikely that NULL
would result." should be changed to say something like /* As for
catalogued constraints, we treat a NULL result as a success, not a
failure. */ rather than duplicating an existing comment that doesn't
quite apply here.  Finally, ExecConstraints() contains a new if-block
whose sole contents are another if-block.  Perhaps if (this && that)
would be better.
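
(I.e., something along these lines, with hypothetical condition names:)

/* instead of ... */
if (outer_condition)
{
    if (inner_condition)
        do_something();
}

/* ... just write */
if (outer_condition && inner_condition)
    do_something();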

Regarding 0006 and 0007, I think the PartitionTreeNodeData structure
you've chosen is awfully convoluted and probably not that efficient.
For example, get_partition_for_tuple() contains this loop:

+   prev = parent;
+   node = parent->downlink;
+   while (node != NULL)
+   {
+   if (node->index >= cur_idx)
+   break;
+
+   prev = node;
+   node = node->next;
+   }

Well, it looks to me like that's an O(n) way to find the n'th
partition, which seems like a pretty bad idea in performance-critical
code, which this is.  I think this whole structure needs a pretty
heavy overhaul.  Here's a proposal:

1. Forget the idea of a tree.  Instead, let the total number of tables
in the partitioning hierarchy be N and let the number of those that
are partitioned be K.  Assign each partitioned table in the hierarchy
an index between 0 and K-1.  Make your top level data structure (in
lieu of PartitionTreeNodeData) be an array of K PartitionDispatch
objects, with the partitioning root in entry 0 and the rest in the
remaining entries.

2. Within each PartitionDispatch object, store (a) a pointer to a
PartitionDesc and (b) an array of integers of length equal to the
PartitionDesc's nparts value.  Each integer i, if non-negative, is the
final return value for get_partition_for_tuple.  If i == -1, tuple
routing fails.  If i < -1, we must next route using the subpartition
whose PartitionDesc is at index -(i+1).  Arrange for the array to be
in the same order as the PartitionDesc's OID list.

3. Now get_partition_for_tuple looks something like this:

K = 0
loop:
pd = PartitionDispatch[K]
idx = list/range_partition_for_tuple(pd->partdesc, ...)
if (idx >= -1)
return idx
K = -(idx + 1)

No recursion, minimal pointer chasing, no linked lists.  The whole
thing is basically trivial aside from the cost of
list/range_partition_for_tuple itself; optimizing that is a different
project.  I might have some details slightly off here, but hopefully
you can see what I'm going for: you want to keep the computation that
happens in get_partition_for_tuple() to an absolute minimum, and
instead set things up in advance so that getting the partition for a
tuple is FAST.  And you want the data structures that you are using in
that process to be very compact, hence arrays instead of linked lists.
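
To make sure the encoding is clear, here is a tiny, self-contained toy of it
(invented names, plain ints instead of PartitionDesc/TupleTableSlot; only
meant to illustrate the flat array plus the "negative means descend"
convention, not actual patch code):

#include <stdio.h>

typedef struct ToyDispatch
{
    int         nparts;
    const int  *bounds;     /* upper bounds of a simple range partitioning */
    const int  *indexes;    /* >= 0: leaf result; -1: no partition accepts
                             * the key; < -1: continue at dispatch[-(v+1)] */
} ToyDispatch;

static int
route_key(const ToyDispatch *dispatch, int key)
{
    int k = 0;

    for (;;)
    {
        const ToyDispatch *pd = &dispatch[k];
        int slot = -1;
        int v;
        int i;

        /* stand-in for the real partition-bound search at this level */
        for (i = 0; i < pd->nparts; i++)
        {
            if (key < pd->bounds[i])
            {
                slot = i;
                break;
            }
        }
        if (slot < 0)
            return -1;              /* no partition accepts this key */

        v = pd->indexes[slot];
        if (v >= -1)
            return v;               /* leaf partition index (or -1) */
        k = -(v + 1);               /* descend into the sub-partitioned child */
    }
}

int
main(void)
{
    /* level 0: leaf 0, then a sub-partitioned child (dispatch entry 1), then leaf 3 */
    static const int b0[] = {10, 20, 30};
    static const int i0[] = {0, -2, 3};
    /* level 1: the sub-partitioned child splits into leaves 1 and 2 */
    static const int b1[] = {15, 20};
    static const int i1[] = {1, 2};
    ToyDispatch dispatch[] = {{3, b0, i0}, {2, b1, i1}};

    printf("%d %d %d %d\n",
           route_key(dispatch, 5),      /* 0  */
           route_key(dispatch, 12),     /* 1  */
           route_key(dispatch, 18),     /* 2  */
           route_key(dispatch, 42));    /* -1 */
    return 0;
}

The point is just that routing costs a handful of array fetches per level,
with no linked-list walking.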

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] proposal: psql \setfileref

2016-11-17 Thread Pavel Stehule
Hi

2016-11-17 12:00 GMT+01:00 Pavel Stehule :

>
> Hi
>
> an independent implementation of parametrized queries can look like the
> attached patch:
>
> this code is simple, without any unexpected behavior.
>
> parameters are used when user doesn't require escaping and when
> PARAMETRIZED_QUERIES variable is on
>
> Regards
>

here is a Daniel's syntax patch

The independent implementation of using query parameters is a good idea, the
patch looks good, and there is agreement with Tom.

I implemented Daniel's proposal - it is only a few lines, but personally I
prefer the curly-bracket syntax over the `` syntax. When the separator is a
two-character sequence, the correct implementation will not be simple - the
bad combinations "` ``" and "`` `" should be handled better. I wrote down the
arguments for my design, but if those arguments are not important to anyone
else, then I can accept another design.

regards

Pavel



>
> Pavel
>
>
diff --git a/src/bin/psql/psqlscanslash.l b/src/bin/psql/psqlscanslash.l
index 86832a8..15ad610 100644
--- a/src/bin/psql/psqlscanslash.l
+++ b/src/bin/psql/psqlscanslash.l
@@ -19,7 +19,7 @@
 #include "postgres_fe.h"
 
 #include "psqlscanslash.h"
-
+#include "settings.h"
 #include "libpq-fe.h"
 }
 
@@ -53,7 +53,7 @@ static int	backtick_start_offset;
 #define LEXRES_OK			1	/* OK completion of backslash argument */
 
 
-static void evaluate_backtick(PsqlScanState state);
+static void evaluate_backtick(PsqlScanState state, bool to_hex);
 
 #define ECHO psqlscan_emit(cur_state, yytext, yyleng)
 
@@ -221,6 +221,13 @@ other			.
 	BEGIN(xslashbackquote);
 }
 
+"``"{
+	backtick_start_offset = output_buf->len;
+	*option_quote = '@';
+	unquoted_option_chars = 0;
+	BEGIN(xslashbackquote);
+}
+
 {dquote}		{
 	ECHO;
 	*option_quote = '"';
@@ -354,10 +361,18 @@ other			.
 "`"{
 	/* In NO_EVAL mode, don't evaluate the command */
 	if (option_type != OT_NO_EVAL)
-		evaluate_backtick(cur_state);
+		evaluate_backtick(cur_state, false);
 	BEGIN(xslasharg);
 }
 
+"``"{
+	/* In NO_EVAL mode, don't evaluate the command */
+	if (option_type != OT_NO_EVAL && *option_quote == '@')
+		evaluate_backtick(cur_state, true);
+	BEGIN(xslasharg);
+}
+
+
 {other}|\n		{ ECHO; }
 
 }
@@ -693,7 +708,7 @@ dequote_downcase_identifier(char *str, bool downcase, int encoding)
  * as a shell command and then replaced by the command's output.
  */
 static void
-evaluate_backtick(PsqlScanState state)
+evaluate_backtick(PsqlScanState state, bool to_hex)
 {
 	PQExpBuffer output_buf = state->output_buf;
 	char	   *cmd = output_buf->data + backtick_start_offset;
@@ -750,7 +765,20 @@ evaluate_backtick(PsqlScanState state)
 		if (cmd_output.len > 0 &&
 			cmd_output.data[cmd_output.len - 1] == '\n')
 			cmd_output.len--;
-		appendBinaryPQExpBuffer(output_buf, cmd_output.data, cmd_output.len);
+
+		if (to_hex)
+		{
+			unsigned char  *escaped_value;
+			size_t			escaped_size;
+
+			escaped_value = PQescapeByteaConn(pset.db,
+(const unsigned char *) cmd_output.data, cmd_output.len,
+		  &escaped_size);
+
+			appendBinaryPQExpBuffer(output_buf, (const char *) escaped_value, escaped_size);
+		}
+		else
+			appendBinaryPQExpBuffer(output_buf, cmd_output.data, cmd_output.len);
 	}
 
 	termPQExpBuffer(&cmd_output);

Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2016-11-17 Thread Claudio Freire
On Thu, Nov 17, 2016 at 2:51 PM, Claudio Freire  wrote:
> On Thu, Nov 17, 2016 at 2:34 PM, Masahiko Sawada  
> wrote:
>> I glanced at the patches but both patches don't obey the coding
>> style of PostgreSQL.
>> Please refer to [1].
>>
>> [1] 
>> http://wiki.postgresql.org/wiki/Developer_FAQ#What.27s_the_formatting_style_used_in_PostgreSQL_source_code.3F.
>
> I thought I had. I'll go through that list to check what I missed.

Attached is patch 0002 with pgindent applied over it

I don't think there's any other formatting issue, but feel free to
point a finger to it if I missed any
From b9782cef0d971de341115bd3474d192419674219 Mon Sep 17 00:00:00 2001
From: Claudio Freire 
Date: Mon, 12 Sep 2016 23:36:42 -0300
Subject: [PATCH 2/2] Vacuum: allow using more than 1GB work mem

Turn the dead_tuples array into a structure composed of several
exponentially bigger arrays, to enable usage of more than 1GB
of work mem during vacuum and thus reduce the number of full
index scans necessary to remove all dead tids when the memory is
available.
---
 src/backend/commands/vacuumlazy.c | 295 +-
 1 file changed, 230 insertions(+), 65 deletions(-)

diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 854bce3..dfb0612 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -91,6 +91,7 @@
  * tables.
  */
 #define LAZY_ALLOC_TUPLES		MaxHeapTuplesPerPage
+#define LAZY_MIN_TUPLES		Max(MaxHeapTuplesPerPage, (128<<20) / sizeof(ItemPointerData))
 
 /*
  * Before we consider skipping a page that's marked as clean in
@@ -98,6 +99,27 @@
  */
 #define SKIP_PAGES_THRESHOLD	((BlockNumber) 32)
 
+typedef struct DeadTuplesSegment
+{
+	int			num_dead_tuples;	/* # of entries in the segment */
+	int			max_dead_tuples;	/* # of entries allocated in the segment */
+	ItemPointerData last_dead_tuple;	/* Copy of the last dead tuple (unset
+		 * until the segment is fully
+		 * populated) */
+	unsigned short padding;
+	ItemPointer dead_tuples;	/* Array of dead tuples */
+}	DeadTuplesSegment;
+
+typedef struct DeadTuplesMultiArray
+{
+	int			num_entries;	/* current # of entries */
+	int			max_entries;	/* total # of slots that can be allocated in
+ * array */
+	int			num_segs;		/* number of dead tuple segments allocated */
+	int			last_seg;		/* last dead tuple segment with data (or 0) */
+	DeadTuplesSegment *dead_tuples;		/* array of num_segs segments */
+}	DeadTuplesMultiArray;
+
 typedef struct LVRelStats
 {
 	/* hasindex = true means two-pass strategy; false means one-pass */
@@ -117,14 +139,14 @@ typedef struct LVRelStats
 	BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */
 	/* List of TIDs of tuples we intend to delete */
 	/* NB: this list is ordered by TID address */
-	int			num_dead_tuples;	/* current # of entries */
-	int			max_dead_tuples;	/* # slots allocated in array */
-	ItemPointer dead_tuples;	/* array of ItemPointerData */
+	DeadTuplesMultiArray dead_tuples;
 	int			num_index_scans;
 	TransactionId latestRemovedXid;
 	bool		lock_waiter_detected;
 } LVRelStats;
 
+#define DeadTuplesCurrentSegment(lvrelstats) (&((lvrelstats)->dead_tuples.dead_tuples[ \
+	(lvrelstats)->dead_tuples.last_seg ]))
 
 /* A few variables that don't seem worth passing around as parameters */
 static int	elevel = -1;
@@ -149,7 +171,7 @@ static void lazy_cleanup_index(Relation indrel,
    IndexBulkDeleteResult *stats,
    LVRelStats *vacrelstats);
 static int lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
- int tupindex, LVRelStats *vacrelstats, Buffer *vmbuffer);
+ int tupindex, LVRelStats *vacrelstats, DeadTuplesSegment * seg, Buffer *vmbuffer);
 static bool should_attempt_truncation(LVRelStats *vacrelstats);
 static void lazy_truncate_heap(Relation onerel, LVRelStats *vacrelstats);
 static BlockNumber count_nondeletable_pages(Relation onerel,
@@ -157,8 +179,9 @@ static BlockNumber count_nondeletable_pages(Relation onerel,
 static void lazy_space_alloc(LVRelStats *vacrelstats, BlockNumber relblocks);
 static void lazy_record_dead_tuple(LVRelStats *vacrelstats,
 	   ItemPointer itemptr);
+static void lazy_clear_dead_tuples(LVRelStats *vacrelstats);
 static bool lazy_tid_reaped(ItemPointer itemptr, void *state);
-static int	vac_cmp_itemptr(const void *left, const void *right);
+static inline int vac_cmp_itemptr(const void *left, const void *right);
 static bool heap_page_is_all_visible(Relation rel, Buffer buf,
 	 TransactionId *visibility_cutoff_xid, bool *all_frozen);
 
@@ -498,7 +521,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 	/* Report that we're scanning the heap, advertising total # of blocks */
 	initprog_val[0] = PROGRESS_VACUUM_PHASE_SCAN_HEAP;
 	initprog_val[1] = nblocks;
-	initprog_val[2] = vacrelstats->max_dead_tuples;
+	initprog_val[2] = 

Re: [HACKERS] Fun fact about autovacuum and orphan temp tables

2016-11-17 Thread Michael Paquier
On Thu, Nov 17, 2016 at 7:37 AM, Robert Haas  wrote:
> The whole point of the review process is to get an opinion from
> somebody other than the original author on the patch in question.  If
> you write a patch and then mark your own patch Ready for Committer,
> you're defeating the entire purpose of this process.  I think that's
> what you did here, I think you have done it on other occasions in the
> not-too-distant past, and I wish you'd stop.  I understand perfectly
> well that there are times when a committer needs to involve themselves
> directly in a patch even when nobody else has reviewed it, because
> that just has to happen, and I try to keep an eye out for such
> scenarios, but it's frustrating to clear time to work on the RfC
> backlog and then discover patches that have just been marked that way
> without discussion or agreement.

OK, sorry about that, I switched it back to "Needs Review" FWIW. For
my defense, after a review of the first set of patches proposed, I
suggested one way to go which is the second patch I sent. Bruce and
yourself mentioned that nuking them more aggressively would be better
to have. Tom mentioned that doing so and having a GUC would be worth
it. In short this sounded like an agreement to me, which is why I
proposed a new patch to make the discussion move on. And the patch is
not that complicated. When looking at it I asked myself about
potential timing issues regarding fact of doing this cleanup more
aggressively but discarded them at the end.

> Now, on the substance of this issue, I think your proposed change to
> clean up temporary tables immediately rather than waiting until they
> become old enough to threaten wraparound is a clear improvement, and
> IMHO we should definitely do it.  However, I think your proposed
> documentation changes are pretty well inadequate.  If somebody
> actually does want to get at the temporary tables of a crashed
> backend, they will need to do a lot more than what you mention in that
> update.  Even if they set autovacuum=off, they will be vulnerable to
> having those tables removed by another backend with the same backend
> ID that reinitializes the temporary schema.  And even if that doesn't
> happen, they won't be able to select from those tables because they'll
> get an error about reading temporary tables of another session.  To
> fix that they'd have to move the temporary tables into some other
> schema AND change the filenames on disk.  And then on top of that, by
> the time anybody even found this in the documentation, it would be far
> too late to shut off autovacuum in the first place because it runs
> every minute (unless you have raised autovacuum_naptime, which you
> shouldn't).  I think we just shouldn't document this. The proposed
> behavior change is basically switching from an extremely conservative
> behavior with surprising consequences to what people would expect
> anyway, and anyone who needs to dig information out of another
> session's temporary tables is going to need to know a lot more about
> the server internals than we normally explain in the documentation.

Okay, let's remove the documentation then. What do you think about the
updated version attached?
(Even this patch enters into "Needs Review" state).
-- 
Michael


autovacuum-orphan-cleanup-v2.patch
Description: application/download

Re: [HACKERS] patch: function xmltable

2016-11-17 Thread Alvaro Herrera
I've been going over this patch.  I think it'd be better to restructure
the  before adding the docs for this new function; I already
split it out, so don't do anything about this.

Next, looking at struct TableExprBuilder I noticed that the comments are
already obsolete, as they talk about function params that do not exist
(missing_columns) and they fail to mention the ones that do exist.
Also, function member SetContent is not documented at all.  Overall,
these comments do not convey a lot -- apparently, whoever reads them is
already supposed to know how it works: "xyz sets a row generating
filter" doesn't tell me anything.  Since this is API documentation, it
needs to be much clearer.

ExecEvalTableExpr and ExecEvalTableExprProtected have no comments
whatsoever.  That needs to be fixed.

I wonder if it'd be a good idea to install TableExpr first, without
implementing XMLTABLE, so that it's clearer what is API and what is
implementation.

The number of new keywords in this patch is depressing.  I suppose
there's no way around that -- as I understand, this is caused by the SQL
standard's definition of the syntax for this feature.

Have to go now for a bit -- will continue looking afterwards.  Please
submit delta patches on top of your latest v12 to fix the comments I
mentioned.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol

2016-11-17 Thread Michael Paquier
On Thu, Nov 17, 2016 at 8:12 AM, Robert Haas  wrote:
> So, the problem isn't Darwin-specific.  I experimented with this on
> Linux and found Linux does the same thing with libpgcommon_srv.a that
> macOS does: a file in the archive that is totally unused is omitted
> from the postgres binary.  In Linux, however, that doesn't prevent
> pgcrypto from compiling anyway.  It does, however, prevent it from
> working.  Instead of failing at compile time with a complaint about
> missing symbols, it fails at load time.  I think that's because macOS
> has -bundle-loader and we use it; without that, I think we'd get the
> same behavior on macOS that we get on Windows.

Yes, right. I recall seeing the regression tests failing with pgcrypto
when doing that. Though I did not recall if this was specific to macos
or Linux when I looked again at this patch yesterday. When testing
again yesterday I was able to make the tests of pgcrypto to pass, but
perhaps my build was not in a clean state...

> 1. Rejigger things so that we don't build libpgcommon_srv.a in the
> first place, and instead add $(top_builddir)/src/common to
> src/backend/Makefile's value of SUBDIRS.  With appropriate adjustments
> to src/common/Makefile, this should allow us to include all of the
> object files on the linker command line individually instead of
> building an archive library that is then used only for the postgres
> binary itself anyway.  Then, things wouldn't get dropped.
>
> 2. Just postpone committing this patch until we're ready to use the
> new code in the backend someplace (or add a dummy reference to it
> someplace).

At the end this refactoring makes sense because it will be used in the
backend with the SCRAM engine, so we could just wait for 2 instead of
having some workarounds. This is dropping the ball for later and there
will be already a lot of work for the SCRAM core part, though I don't
think that the SHA2 refactoring will change much going forward.

Option 3 would be to do things the way the patch does it, aka just compiling
pgcrypto using the source files directly and put a comment to revert
that once the APIs are used in the backend. I can guess that you don't
like that.
-- 
Michael


Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2016-11-17 Thread Claudio Freire
On Thu, Nov 17, 2016 at 2:34 PM, Masahiko Sawada  wrote:
> I glanced at the patches but both patches don't obey the coding
> style of PostgreSQL.
> Please refer to [1].
>
> [1] 
> http://wiki.postgresql.org/wiki/Developer_FAQ#What.27s_the_formatting_style_used_in_PostgreSQL_source_code.3F.

I thought I had. I'll go through that list to check what I missed.

>> In the results archive, the .vacuum prefix is the patched version with
>> both patch 1 and 2, .git.ref is just patch 1 (without which the
>> truncate takes unholily long).
>
> Did you measure the performance benefit of 0001 patch by comparing
> HEAD with HEAD+0001 patch?

Not the whole test, but yes. Without the 0001 patch the backward scan
step during truncate goes between 3 and 5 times slower. I don't have
timings because the test never finished without the patch. It would
have finished, but it would have taken about a day.

>> Grepping the results a bit, picking an average run out of all runs on
>> each scale:
>>
>> Timings:
>>
>> Patched:
>>
>> s100: CPU: user: 3.21 s, system: 1.54 s, elapsed: 18.95 s.
>> s400: CPU: user: 14.03 s, system: 6.35 s, elapsed: 107.71 s.
>> s4000: CPU: user: 228.17 s, system: 108.33 s, elapsed: 3017.30 s.
>>
>> Unpatched:
>>
>> s100: CPU: user: 3.39 s, system: 1.64 s, elapsed: 18.67 s.
>> s400: CPU: user: 15.39 s, system: 7.03 s, elapsed: 114.91 s.
>> s4000: CPU: user: 282.21 s, system: 105.95 s, elapsed: 3017.28 s.
>>
>> Total I/O (in MB)
>>
>> Patched:
>>
>> s100: R:2.4 - W:5862
>> s400: R:1337.4 - W:29385.6
>> s4000: R:318631 - W:370154
>>
>> Unpatched:
>>
>> s100: R:1412.4 - W:7644.6
>> s400: R:3180.6 - W:36281.4
>> s4000: R:330683 - W:370391
>>
>>
>> So, in essence, CPU time didn't get adversely affected. If anything,
>> it got improved by about 20% on the biggest case (scale 4000).
>
> And this test case deletes all tuples in the relation and then measures the
> duration of vacuum.
> It would not have much effect in a practical use case.

Well, this patch set started because I had to do exactly that, and
realized just how inefficient vacuum was in that case.

But it doesn't mean it won't benefit more realistic use cases. Almost
any big database ends up hitting this 1GB limit because big tables
take very long to vacuum and accumulate a lot of bloat in-between
vacuums.

If you have a specific test case in mind I can try to run it.

>> While total runtime didn't change much, I believe this is only due to
>> the fact that the index is perfectly correlated (clustered?) since
>> it's a pristine index, so index scans either remove or skip full
>> pages, never leaving things half-way. A bloated index would probably
>> show a substantially different behavior, I'll try to get a script that
>> does it by running pgbench a while before the vacuum or something like
>> that.
>>
>> However, the total I/O metric already shows remarkable improvement.
>> This metric is measuring all the I/O including pgbench init, the
>> initial vacuum pgbench init does, the delete and the final vacuum. So
>> it's not just the I/O for the vacuum itself, but the whole run. We can
>> see the patched version reading a lot less (less scans over the
>> indexes will do that), and in some cases writing less too (again, less
>> index scans may be performing less redundant writes when cleaning
>> up/reclaiming index pages).
>
> What value of maintenance_work_mem did you use for this test?

4GB on both, patched and HEAD.

> Since
> DeadTuplesSegment struct still stores an array of ItemPointerData (6 bytes)
> representing dead tuples, I supposed that the frequency of index vacuums
> does not change. But according to the test result, an index vacuum is
> invoked once and removes 4 rows at a time. It means that the
> vacuum stored about 2289 MB of memory during the heap vacuum. On the other
> side, the result of the test without the 0002 patch shows that an index vacuum
> removes 178956737 rows at a time, which means 1GB of memory was used.

1GB is a hardcoded limit on HEAD for vacuum work mem. This patch
removes that limit and lets vacuum use all the memory the user gave to
vacuum.

In the test, in both cases, 4GB was used as maintenance_work_mem
value, but HEAD cannot use all the 4GB, and it will limit itself to
just 1GB.
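
For reference, if memory serves, the 1GB ceiling on HEAD falls out of the
dead-TID array being a single palloc: one palloc is capped at MaxAllocSize
(a shade under 1GB), and each TID costs 6 bytes, so the array tops out near
179 million entries no matter how big maintenance_work_mem is -- which lines
up with the ~179 million rows per index pass quoted upthread. A
back-of-the-envelope sketch (the constants are stand-ins mirroring the
server's, not quoted from the tree):

#include <stdio.h>

int
main(void)
{
    /* assumed values: PostgreSQL's single-palloc cap and sizeof(ItemPointerData) */
    const unsigned long max_alloc_size = 0x3fffffffUL;  /* ~1GB */
    const unsigned long tid_size = 6;

    printf("max dead TIDs per index-vacuum pass: %lu\n",
           max_alloc_size / tid_size);          /* roughly 179 million */
    return 0;
}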


Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2016-11-17 Thread Masahiko Sawada
On Thu, Oct 27, 2016 at 5:25 AM, Claudio Freire  wrote:
> On Thu, Sep 15, 2016 at 1:16 PM, Claudio Freire  
> wrote:
>> On Wed, Sep 14, 2016 at 12:24 PM, Claudio Freire  
>> wrote:
>>> On Wed, Sep 14, 2016 at 12:17 PM, Robert Haas  wrote:
>>>>
>>>> I am kind of doubtful about this whole line of investigation because
>>>> we're basically trying pretty hard to fix something that I'm not sure
>>>> is broken.  I do agree that, all other things being equal, the TID
>>>> lookups will probably be faster with a bitmap than with a binary
>>>> search, but maybe not if the table is large and the number of dead
>>>> TIDs is small, because cache efficiency is pretty important.  But even
>>>> if it's always faster, does TID lookup speed even really matter to
>>>> overall VACUUM performance? Claudio's early results suggest that it
>>>> might, but maybe that's just a question of some optimization that
>>>> hasn't been done yet.
>>>
>>> FYI, the reported impact was on CPU time, not runtime. There was no
>>> significant difference in runtime (real time), because my test is
>>> heavily I/O bound.
>>>
>>> I tested with a few small tables and there was no significant
>>> difference either, but small tables don't stress the array lookup
>>> anyway so that's expected.
>>>
>>> But on the assumption that some systems may be CPU bound during vacuum
>>> (particularly those able to do more than 300-400MB/s sequential I/O),
>>> in those cases the increased or decreased cost of lazy_tid_reaped will
>>> directly correlate to runtime. It's just none of my systems, which all
>>> run on amazon and is heavily bandwidth constrained (fastest I/O
>>> subsystem I can get my hands on does 200MB/s).
>>
>> Attached is the patch with the multiarray version.
>>
>> The tests are weird. Best case comparison over several runs, to remove
>> the impact of concurrent activity on this host (I couldn't remove all
>> background activity even when running the tests overnight, the distro
>> adds tons of crons and background cleanup tasks it would seem),
>> there's only very mild CPU impact. I'd say insignificant, as it's well
>> below the mean variance.
>
> I reran the tests on a really dedicated system, and with a script that
> captured a lot more details about the runs.
>
> The system isn't impressive, an i5 with a single consumer HD and 8GB
> RAM, but it did the job.
>
> These tests make more sense, so I bet it was the previous tests that
> were spoiled by concurrent activity on the host.
>
> Attached is the raw output of the test, the script used to create it,
> and just in case the patch set used. I believe it's the same as the
> last one I posted, just rebased.

I glanced at the patches, but both patches don't obey the coding
style of PostgreSQL.
Please refer to [1].

[1] 
http://wiki.postgresql.org/wiki/Developer_FAQ#What.27s_the_formatting_style_used_in_PostgreSQL_source_code.3F.

>
> In the results archive, the .vacuum prefix is the patched version with
> both patch 1 and 2, .git.ref is just patch 1 (without which the
> truncate takes unholily long).

Did you measure the performance benefit of 0001 patch by comparing
HEAD with HEAD+0001 patch?

> Grepping the results a bit, picking an average run out of all runs on
> each scale:
>
> Timings:
>
> Patched:
>
> s100: CPU: user: 3.21 s, system: 1.54 s, elapsed: 18.95 s.
> s400: CPU: user: 14.03 s, system: 6.35 s, elapsed: 107.71 s.
> s4000: CPU: user: 228.17 s, system: 108.33 s, elapsed: 3017.30 s.
>
> Unpatched:
>
> s100: CPU: user: 3.39 s, system: 1.64 s, elapsed: 18.67 s.
> s400: CPU: user: 15.39 s, system: 7.03 s, elapsed: 114.91 s.
> s4000: CPU: user: 282.21 s, system: 105.95 s, elapsed: 3017.28 s.
>
> Total I/O (in MB)
>
> Patched:
>
> s100: R:2.4 - W:5862
> s400: R:1337.4 - W:29385.6
> s4000: R:318631 - W:370154
>
> Unpatched:
>
> s100: R:1412.4 - W:7644.6
> s400: R:3180.6 - W:36281.4
> s4000: R:330683 - W:370391
>
>
> So, in essence, CPU time didn't get adversely affected. If anything,
> it got improved by about 20% on the biggest case (scale 4000).

And this test case deletes all tuples in the relation and then measures the
duration of vacuum.
It would not have much effect in a practical use case.

> While total runtime didn't change much, I believe this is only due to
> the fact that the index is perfectly correlated (clustered?) since
> it's a pristine index, so index scans either remove or skip full
> pages, never leaving things half-way. A bloated index would probably
> show a substantially different behavior, I'll try to get a script that
> does it by running pgbench a while before the vacuum or something like
> that.
>
> However, the total I/O metric already shows remarkable improvement.
> This metric is measuring all the I/O including pgbench init, the
> initial vacuum pgbench init does, the delete and the final vacuum. So
> it's not just the I/O for the vacuum itself, but the whole run. We can
> see 

Re: [HACKERS] Hash Indexes

2016-11-17 Thread Robert Haas
On Thu, Nov 17, 2016 at 12:05 PM, Amit Kapila  wrote:
> Are you expecting a comment change here?
>
>> +old_blkno = _hash_get_oldblock_from_newbucket(rel,
>> opaque->hasho_bucket);
>>
>> Couldn't you pass "bucket" here instead of "hasho->opaque_bucket"?  I
>> feel like I'm repeating this ad nauseum, but I really think it's bad
>> to rely on the special space instead of our own local variables!
>>
>
> Sure, we can pass bucket as well. However, if you see few lines below
> (while (BlockNumberIsValid(opaque->hasho_nextblkno))), we are already
> relying on special space to pass variables. In general, we are using
> special space to pass variables to functions in many other places in
> the code.  What exactly are you bothered about in accessing special
> space, if it is safe to do?

I don't want to rely on the special space to know which buffers we
have locked or pinned.  We obviously need the special space to find
the next and previous buffers in the block chain -- there's no other
way to know that.  However, we should be more careful about locks and
pins.  If the special space is corrupted in some way, we still
shouldn't get confused about which buffers we have locked or pinned.

>> I think this comment is saying that we'll release the pin on the
>> primary bucket page for now, and then reacquire it later if the user
>> reverses the scan direction.  But that doesn't sound very safe,
>> because the bucket could be split in the meantime and the order in
>> which tuples are returned could change.  I think we want that to
>> remain stable within a single query execution.
>
> Isn't that possible even without the patch?  Basically, after reaching
> end of forward scan and for doing backward *all* scan, we need to
> perform portal rewind which will in turn call hashrescan where we will
> drop the lock on bucket and then again when we try to move cursor
> forward we acquire lock in _hash_first(), so in between when we don't
> have the lock, the split could happen and next scan results could
> differ.

Well, the existing code doesn't drop the heavyweight lock at that
location, but your patch does drop the pin that serves the same
function, so I feel like there must be some difference.

>> Also, I think that so->hashso_skip_moved_tuples is badly designed.
>> There are two separate facts you need to know: (1) whether you are
>> scanning a bucket that was still being populated at the start of your
>> scan and (2) if yes, whether you are scanning the bucket being
>> populated or whether you are instead scanning the corresponding "old"
>> bucket.  You're trying to keep track of that using one Boolean, but
>> one Boolean only has two states and there are three possible states
>> here.
>
> So do you prefer to have two booleans to track those facts or have an
> uint8 with a tri-state value or something else?

I don't currently have a preference.
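
For illustration, the tri-state version could be as simple as an enum along
these lines (invented names, just a sketch -- a pair of booleans would be
equally workable):

typedef enum HashScanSplitState
{
    HASHSCAN_NOT_IN_SPLIT,          /* bucket was not being populated at scan start */
    HASHSCAN_IN_SPLIT_NEW_BUCKET,   /* split in progress, scanning the new bucket */
    HASHSCAN_IN_SPLIT_OLD_BUCKET    /* split in progress, scanning the old bucket */
} HashScanSplitState;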

>> Come to think of it, I'm a little worried about the locking in
>> _hash_squeezebucket().  It seems like we drop the lock on each "write"
>> bucket page before taking the lock on the next one.  So a concurrent
>> scan could get ahead of the cleanup process.  That would be bad,
>> wouldn't it?
>
> Yeah, that would be bad if it happens, but no concurrent scan can
> happen during squeeze phase because we take an exclusive lock on a
> bucket page and maintain it throughout the operation.

Well, that's completely unacceptable.  A major reason the current code
uses heavyweight locks is because you can't hold lightweight locks
across arbitrary amounts of work -- because, just to take one example,
a process holding or waiting for an LWLock isn't interruptible.  The
point of this redesign was to get rid of that, so that LWLocks are
only held for short periods.  I dislike the lock-chaining approach
(take the next lock before releasing the previous one) quite a bit and
really would like to find a way to get rid of that, but the idea of
holding a buffer lock across a complete traversal of an unbounded
number of overflow buckets is far worse.  We've got to come up with a
design that doesn't require that, or else completely redesign the
bucket-squeezing stuff.
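
For anyone not familiar with the term, "lock chaining" here means
hand-over-hand locking: grab the next page's lock before letting go of the
current one, so at most two locks are held at any instant, yet a concurrent
traversal can never overtake you. A toy illustration with plain pthreads
(not PostgreSQL buffer locks; names invented, head assumed non-NULL):

#include <pthread.h>
#include <stdio.h>

typedef struct Node
{
    pthread_mutex_t lock;
    int             value;
    struct Node    *next;
} Node;

static void
traverse_chained(Node *head)
{
    Node   *cur = head;

    pthread_mutex_lock(&cur->lock);
    while (cur != NULL)
    {
        Node   *next = cur->next;

        if (next != NULL)
            pthread_mutex_lock(&next->lock);    /* grab the next lock first... */

        printf("visiting %d\n", cur->value);    /* per-node work, lock held */

        pthread_mutex_unlock(&cur->lock);       /* ...then let go of this one */
        cur = next;
    }
}

int
main(void)
{
    Node    c = {PTHREAD_MUTEX_INITIALIZER, 3, NULL};
    Node    b = {PTHREAD_MUTEX_INITIALIZER, 2, &c};
    Node    a = {PTHREAD_MUTEX_INITIALIZER, 1, &b};

    traverse_chained(&a);       /* build with: cc -pthread chain.c */
    return 0;
}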

(Would it make any sense to change the order of the hash index patches
we've got outstanding?  For instance, if we did the page-at-a-time
stuff first, it would make life simpler for this patch in several
ways, possibly including this issue.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Hash Indexes

2016-11-17 Thread Amit Kapila
On Thu, Nov 17, 2016 at 3:08 AM, Robert Haas  wrote:
> On Sat, Nov 12, 2016 at 12:49 AM, Amit Kapila  wrote:
>> You are right and I have changed the code as per your suggestion.
>
> So...
>
> +/*
> + * We always maintain the pin on bucket page for whole scan 
> operation,
> + * so releasing the additional pin we have acquired here.
> + */
> +if ((*opaquep)->hasho_flag & LH_BUCKET_PAGE)
> +_hash_dropbuf(rel, *bufp);
>
> This relies on the page contents to know whether we took a pin; that
> seems like a bad plan.  We need to know intrinsically whether we took
> a pin.
>

Okay, I think we can do that as we have bucket buffer information
(hashso_bucket_buf) in HashScanOpaqueData.  We might need to pass this
information in _hash_readprev.

> + * If the bucket split is in progress, then we need to skip tuples that
> + * are moved from old bucket.  To ensure that vacuum doesn't clean any
> + * tuples from old or new buckets till this scan is in progress, maintain
> + * a pin on both of the buckets.  Here, we have to be cautious about
>
> It wouldn't be a problem if VACUUM removed tuples from the new bucket,
> because they'd have to be dead anyway.   It also wouldn't be a problem
> if it removed tuples from the old bucket that were actually dead.  The
> real issue isn't vacuum anyway, but the process of cleaning up after a
> split.  We need to hold the pin so that tuples being moved from the
> old bucket to the new bucket by the split don't get removed from the
> old bucket until our scan is done.
>

Are you expecting a comment change here?

> +old_blkno = _hash_get_oldblock_from_newbucket(rel,
> opaque->hasho_bucket);
>
> Couldn't you pass "bucket" here instead of "hasho->opaque_bucket"?  I
> feel like I'm repeating this ad nauseum, but I really think it's bad
> to rely on the special space instead of our own local variables!
>

Sure, we can pass bucket as well. However, if you see few lines below
(while (BlockNumberIsValid(opaque->hasho_nextblkno))), we are already
relying on special space to pass variables. In general, we are using
special space to pass variables to functions in many other places in
the code.  What exactly are you bothered about in accessing special
space, if it is safe to do?

> -/* we ran off the end of the bucket without finding a match */
> +/*
> + * We ran off the end of the bucket without finding a match.
> + * Release the pin on bucket buffers.  Normally, such pins are
> + * released at end of scan, however scrolling cursors can
> + * reacquire the bucket lock and pin in the same scan multiple
> + * times.
> + */
>  *bufP = so->hashso_curbuf = InvalidBuffer;
>  ItemPointerSetInvalid(current);
> +_hash_dropscanbuf(rel, so);
>
> I think this comment is saying that we'll release the pin on the
> primary bucket page for now, and then reacquire it later if the user
> reverses the scan direction.  But that doesn't sound very safe,
> because the bucket could be split in the meantime and the order in
> which tuples are returned could change.  I think we want that to
> remain stable within a single query execution.
>

Isn't that possible even without the patch?  Basically, after reaching
the end of a forward scan, doing a backward *all* scan requires a portal
rewind, which in turn calls hashrescan, where we drop the lock on the
bucket; then, when we try to move the cursor forward again, we acquire
the lock in _hash_first().  So in the window where we don't hold the
lock, a split could happen and the next scan's results could differ.

Also, in the documentation, it is mentioned that "The SQL standard
says that it is implementation-dependent whether cursors are sensitive
to concurrent updates of the underlying data by default. In
PostgreSQL, cursors are insensitive by default, and can be made
sensitive by specifying FOR UPDATE." which I think indicates that
results can't be guaranteed for forward and backward scans.

So, even if we try to come up with some solution for stable results in
some scenarios, I am not sure that can be guaranteed for all
scenarios.


> +/*
> + * setting hashso_skip_moved_tuples to false
> + * ensures that we don't check for tuples that 
> are
> + * moved by split in old bucket and it also
> + * ensures that we won't retry to scan the old
> + * bucket once the scan for same is finished.
> + */
> +so->hashso_skip_moved_tuples = false;
>
> I think you've got a big problem here.  Suppose the user starts the
> scan in the new bucket and runs it forward until they end up in the
> old bucket.  Then they turn 

Re: [HACKERS] Declarative partitioning - another take

2016-11-17 Thread Robert Haas
On Thu, Nov 17, 2016 at 6:27 AM, Amit Langote
 wrote:
> OK, I will share the performance results soon.

Thanks.

>> Also, in 0006:
>>
>> - I doubt that PartitionTreeNodeData's header comment will survive
>> contact with pgindent.
>
> Fixed by adding "/* " at the top of the comment.

OK.  Hopefully you also tested that.  In general, with a patch set
this large, the more you can do to make it pgindent-clean, the better.
Ideally none of your code would get changed by pgindent, or at least
not the new files.  But note that you will probably have to
update typedefs.list if you actually want to be able to run it without
having it mangle things that involve your new structs.

>> - The code in make_modifytable() to swap out the rte_array for a fake
>> one looks like an unacceptable kludge.  I don't know offhand what a
>> better design would look like, but what you've got is really ugly.
>
> Agree that it looks horrible.  The problem is that we don't add partition
> (child table) RTEs when planning an insert on the parent, and FDW
> partitions can't do without some planner handling - PlanForeignModify()
> expects a valid PlannerInfo for deparsing target lists (basically, to be
> able to use planner_rt_fetch()).
>
> Any design beside this kludge would perhaps mean that we will be adding
> valid partition RTEs at some earlier planning stage much like what
> expand_inherited_rtentry() does during non-insert queries.  Perhaps, we
> could have expand_inherited_rtentry() make an exception in partitioned
> tables' case and create root->append_rel_list members even in the insert
> case.  We could then go over the append_rel_list in make_modifytable() to
> get the valid RT index of foreign child tables to pass to
> PlanForeignModify().  Using append_rel_list for insert planning is perhaps
> equally ugly though. Thoughts?

If it's only needed for foreign tables, how about for v1 we just throw
an error and say errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot route inserted tuples to a foreign table") for now.  We
can come back and fix it later.  Doing more inheritance expansion
early is probably not a good idea because it likely sucks for
performance, and that's especially unfortunate if it's only needed for
foreign tables.  Coming up with some new FDW API or some modification
to the existing one is probably better, but I don't really want to get
hung up on that right now.

>> - I don't understand how it can be right for get_partition_for_tuple()
>> to be emitting an error that says "range partition key contains null".
>> A defective partition key should be detected at partition creation
>> time, not in the middle of tuple routing.
>
> That error is emitted when the partition key of the *input tuple* is found
> to contain NULLs.  Maybe we need to consider something like how btree
> indexes treat NULLs - order NULLs either before or after all the non-NULL
> values based on NULLS FIRST/LAST config.  Thoughts?

Well, I'd say my confusion suggests that the error message needs to be
clearer about what the problem is.  I think this is basically a case
of there being no partition for the given row, so maybe just arrange
to use that same message here ("no partition of relation \"%s\" found
for row").

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Avoiding pin scan during btree vacuum

2016-11-17 Thread Alvaro Herrera
Alvaro Herrera wrote:
> I am ready now to backpatch this to 9.4 and 9.5; here are the patches.
> They are pretty similar, but some adjustments were needed due to XLog
> format changes in 9.5.  (I kept most of Simon's original commit
> message.)

Finally done.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol

2016-11-17 Thread Robert Haas
On Wed, Nov 16, 2016 at 11:28 PM, Michael Paquier
 wrote:
> On Wed, Nov 16, 2016 at 8:04 PM, Michael Paquier
>  wrote:
>> In the current set of patches, the sha2 functions would not get used
>> until the main patch for SCRAM gets committed, so that's a couple of
>> steps and many months ahead.  And --as-needed/--no-as-needed are not
>> supported on macOS.  So I believe that the best route is just to
>> use this patch the way it does things, and once SCRAM gets in we
>> could switch the build to more appropriate linking.  At least that's
>> far less ugly than having fake objects in the backend code.  Of course
>> a comment in pgcrypto's Makefile would be appropriate.
>
> Or a comment with a "ifeq ($(PORTNAME), darwin)" containing the
> additional objects to make clear that this is proper to only OSX.
> Other ideas are welcome.

So, the problem isn't Darwin-specific.  I experimented with this on
Linux and found Linux does the same thing with libpgcommon_srv.a that
macOS does: a file in the archive that is totally unused is omitted
from the postgres binary.  In Linux, however, that doesn't prevent
pgcrypto from compiling anyway.  It does, however, prevent it from
working.  Instead of failing at compile time with a complaint about
missing symbols, it fails at load time.  I think that's because macOS
has -bundle-loader and we use it; without that, I think we'd get the
same behavior on macOS that we get on Windows.

The fundamental problem here is that the archive-member-dropping
behavior that we're getting here is not really what we want, and I
think that's going to happen on most or all architectures.  For GNU
ld, we could add -Wl,--whole-archive, and macOS has -all_load, but I think
that this is just a nest of portability problems waiting to happen.  I
think there are two things we can do here that are far simpler:

1. Rejigger things so that we don't build libpgcommon_srv.a in the
first place, and instead add $(top_builddir)/src/common to
src/backend/Makefile's value of SUBDIRS.  With appropriate adjustments
to src/common/Makefile, this should allow us to include all of the
object files on the linker command line individually instead of
building an archive library that is then used only for the postgres
binary itself anyway.  Then, things wouldn't get dropped.

2. Just postpone committing this patch until we're ready to use the
new code in the backend someplace (or add a dummy reference to it
someplace).
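
(To illustrate the "dummy reference" in option 2, something as small as the
following would be enough; pg_sha256_init is only a stand-in for whatever
symbol the new sha2 file actually exports:)

    /* keep_sha2.c -- illustrative only; the referenced symbol name is a guess */
    extern void pg_sha256_init(void *ctx);

    /* Never called; it exists only so the linker keeps the sha2 object
     * file from libpgcommon in the postgres binary. */
    void        (*pg_sha2_keepalive) (void *) = pg_sha256_init;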

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Use of pg_proc.probin is legal?

2016-11-17 Thread Kohei KaiGai
2016-11-16 7:46 GMT-08:00 Tom Lane :
> Kohei KaiGai  writes:
>> On the other hands, interpret_AS_clause() raises an ereport if SQL
>> function tries to use probin except
>> for C-language. Is it illegal for other languages to use probin field
>> to store something useful?
>
> Well, there's no convention about how to use it.
>
>> In my case, PL/CUDA language allows to define SQL function with a CUDA
>> code block.
>> It saves a raw CUDA source code on the pg_proc.prosrc and its
>> intermediate representation
>> on the pg_proc.probin; which is automatically constructed on the
>> validator callback of the language
>> handler.
>
> I have precisely zero sympathy for such a kluge.  The validator exists
> to validate, it is not supposed to modify the pg_proc row.
>
> We could imagine extending the PL API to allow storage of a compiled
> version in probin, but overloading the validator isn't the way to do
> that IMO.  I'd prefer to see a separate "compile" function for it.
> Existence of a compile function could be the trigger that instructs,
> eg, pg_dump not to include the probin value in the dump.
>
> (There once was a LANCOMPILER option in the CREATE LANGUAGE syntax,
> which I imagine was meant to do something like this, but it was never
> fully implemented and we got rid of it years ago.)
>
> The bigger question though is whether it's really worth the trouble.
> All existing PLs deal with this by caching compiled (to one degree
> or another) representations in process memory.  If you keep it in
> probin you can save some compilation time once per session, but on the
> other hand you're limited to representations that can fit in a flat blob.
>
After thinking it over with a fresh brain, it may not be a serious problem.
Right now, it registers a dependency between a PL/CUDA function and
its helper functions (*1), then puts the helper functions' OIDs into the
intermediate representation stored in pg_proc.probin, so that it keeps
working even if a function gets renamed.
However, in this situation we could simply raise an ereport if a helper
function is not found, instead of relying on the dependency mechanism.
In addition, the heavier compilation work is the step from the intermediate
form to the real binary form, rather than the source-to-intermediate step,
although I don't know how other script languages handle this.

Thanks,

(*1) PL/CUDA allows helper functions to be referenced in its
program source for certain purposes.  For example, the directive
below tells the PL/CUDA runtime to allocate a particular
amount of device memory according to the result (int8) of
my_plcuda_func_workbuf_sz(), which has the same argument set as
the PL/CUDA function.

$$
  :
#plcuda_workbuf_sz my_plcuda_func_workbuf_sz
  :
$$

-- 
KaiGai Kohei 




Re: [HACKERS] Patch to implement pg_current_logfile() function

2016-11-17 Thread Karl O. Pinc
Hi Gilles,

On Sun, 30 Oct 2016 02:04:59 -0500
"Karl O. Pinc"  wrote:

> Attached is a patch to be applied on top of your v10 patch
> which does basic fixup to logfile_writename().

I'm looking at the v13 patch and don't see a change I submitted
with a patch to v10.  You wrote:

    snprintf(tempfn, sizeof(tempfn), "%s",
             CURRENT_LOG_FILENAME);
    strcat(tempfn, ".tmp");

I patched to:

    snprintf(tempfn, sizeof(tempfn), "%s.tmp",
             CURRENT_LOG_FILENAME);

As long as you're doing a snprintf() there's no point
in "risking" a buffer overflow by a subsequent strcat().
(Not that you're likely to ever get a buffer overflow.)
And why make two calls instead of 1?  That's what's
in my head.
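
For what it's worth, a tiny standalone example of the one-call form (the
buffer size and the CURRENT_LOG_FILENAME value are made up for the demo):

    #include <stdio.h>

    #define CURRENT_LOG_FILENAME "current_logfiles"    /* made-up value */

    int
    main(void)
    {
        char        tempfn[64];

        /* one call; snprintf() truncates safely if the name ever grows */
        snprintf(tempfn, sizeof(tempfn), "%s.tmp", CURRENT_LOG_FILENAME);
        puts(tempfn);           /* prints current_logfiles.tmp */
        return 0;
    }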

Regards,

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein




Re: [HACKERS] Parallel execution and prepared statements

2016-11-17 Thread Robert Haas
On Thu, Nov 17, 2016 at 10:07 AM, Tobias Bussmann  wrote:
>> Yeah, we could do something like this, perhaps not in exactly this
>> way, but I'm not sure it's a good idea to just execute the parallel
>> plan without workers.
>
> sure, executing parallel plans w/o workers seems a bit of a hack. But:
> - we already do it this way in some other situations

True, but we also try to avoid it whenever possible, because it's
likely to lead to poor performance.

> - the alternative in this special situation would be to _force_ replanning 
> without the CURSOR_OPT_PARALLEL_OK. The decision for replanning is hidden 
> deep within plancache.c and while we could influence it with 
> CURSOR_OPT_CUSTOM_PLAN this wouldn't have an effect if the prepared statement 
> doesn't have any parameters. Additionally, influencing the decision and 
> generating a non-parallel plan would shift the avg cost calculation used to 
> choose custom or generic plans.

I think it would be a good idea to come up with a way for a query to
produce both a parallel and a non-parallel plan and pick between them
at execution time.  However, that's more work than I've been willing
to undertake.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Fun fact about autovacuum and orphan temp tables

2016-11-17 Thread Robert Haas
On Wed, Nov 16, 2016 at 11:14 PM, Michael Paquier
 wrote:
> Hm. Thinking about that again, having a GUC to control whether
> autovacuum drops orphaned temp tables is overkill (who is going to go
> into this level of tuning!?), and we had better do something more
> aggressive, as there have been cases of users complaining about dangling temp
> tables. I suspect the use case where people would like to have a look
> at orphaned temp tables after a backend crash is not that wide, at
> least a method would be to disable autovacuum after a crash if such a
> monitoring is necessary. Tom has already stated upthread that the
> patch to remove wildly locks is not acceptable, and he's clearly
> right.
>
> So the best move would be really to make the removal of orphaned temp
> tables more aggressive, and not bother about having a GUC to control
> that. The patch sent in
> https://www.postgresql.org/message-id/cab7npqsrywaz1i12mpkh06_roo3ifgcgr88_aex1oeg2r4o...@mail.gmail.com
> does so, so I am marking the CF entry as ready for committer for this
> patch to attract some committer's attention on the matter.

The whole point of the review process is to get an opinion from
somebody other than the original author on the patch in question.  If
you write a patch and then mark your own patch Ready for Committer,
you're defeating the entire purpose of this process.  I think that's
what you did here, I think you have done it on other occasions in the
not-too-distant past, and I wish you'd stop.  I understand perfectly
well that there are times when a committer needs to involve themselves
directly in a patch even when nobody else has reviewed it, because
that just has to happen, and I try to keep an eye out for such
scenarios, but it's frustrating to clear time to work on the RfC
backlog and then discover patches that have just been marked that way
without discussion or agreement.

Now, on the substance of this issue, I think your proposed change to
clean up temporary tables immediately rather than waiting until they
become old enough to threaten wraparound is a clear improvement, and
IMHO we should definitely do it.  However, I think your proposed
documentation changes are pretty well inadequate.  If somebody
actually does want to get at the temporary tables of a crashed
backend, they will need to do a lot more than what you mention in that
update.  Even if they set autovacuum=off, they will be vulnerable to
having those tables removed by another backend with the same backend
ID that reinitializes the temporary schema.  And even if that doesn't
happen, they won't be able to select from those tables because they'll
get an error about reading temporary tables of another session.  To
fix that they'd have to move the temporary tables into some other
schema AND change the filenames on disk.  And then on top of that, by
the time anybody even found this in the documentation, it would be far
too late to shut off autovacuum in the first place because it runs
every minute (unless you have raised autovacuum_naptime, which you
shouldn't).  I think we just shouldn't document this.  The proposed
behavior change is basically switching from an extremely conservative
behavior with surprising consequences to what people would expect
anyway, and anyone who needs to dig information out of another
session's temporary tables is going to need to know a lot more about
the server internals than we normally explain in the documentation.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Assignment of valid collation for SET operations on queries with UNKNOWN types.

2016-11-17 Thread Tom Lane
Rahila Syed  writes:
> CASE 2:
> postgres=# create view v as select 'abc' a;
> 2016-11-16 15:28:48 IST WARNING:  column "a" has type "unknown"
> 2016-11-16 15:28:48 IST DETAIL:  Proceeding with relation creation anyway.
> WARNING:  column "a" has type "unknown"
> DETAIL:  Proceeding with relation creation anyway.
> CREATE VIEW

We really ought to make that a hard error.  And ideally fix things so
that the type of the view column will be resolved as text, so that you
don't hit this condition in the first place; but there is no good that
comes out of allowing a view to be created like this.
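
A minimal sketch of that hard error, assuming a check over the view's
targetlist somewhere around DefineVirtualRelation() (the exact spot is a
guess, not a prescription):

    ListCell   *lc;

    foreach(lc, tlist)
    {
        TargetEntry *tle = (TargetEntry *) lfirst(lc);

        if (!tle->resjunk &&
            exprType((Node *) tle->expr) == UNKNOWNOID)
            ereport(ERROR,
                    (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
                     errmsg("column \"%s\" has type \"unknown\"",
                            tle->resname)));
    }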

> Attached WIP patch does that. Kindly let me know your opinion.

This is a seriously ugly kluge that's attacking the symptom not the
problem.  Or really, a symptom not the problem.  There are lots of
other symptoms, for instance

regression=# select * from v order by 1;
ERROR:  failed to find conversion function from unknown to text

regards, tom lane




Re: [HACKERS] Unlogged tables cleanup

2016-11-17 Thread Robert Haas
On Wed, Nov 16, 2016 at 11:55 PM, Michael Paquier
 wrote:
> On Wed, Nov 16, 2016 at 7:09 PM, Robert Haas  wrote:
>> On Wed, Nov 16, 2016 at 3:54 PM, Michael Paquier
>>  wrote:
>>> Indeed I missed this comment block. Please let me suggest the following 
>>> instead:
>>>  /*
>>>   * Set up an init fork for an unlogged table so that it can be correctly
>>> - * reinitialized on restart.  Since we're going to do an immediate sync, we
>>> - * only need to xlog this if archiving or streaming is enabled.  And the
>>> - * immediate sync is required, because otherwise there's no guarantee that
>>> - * this will hit the disk before the next checkpoint moves the redo 
>>> pointer.
>>> + * reinitialized on restart.  An immediate sync is required even if the
>>> + * page has been logged, because the write did not go through
>>> + * shared_buffers and therefore a concurrent checkpoint may have moved
>>> + * the redo pointer past our xlog record.
>>>   */
>>
>> Hmm.  Well, that deletes the comment that's no longer true, but it
>> doesn't replace it with any explanation of why we also need to WAL-log
>> it unconditionally, and I think that explanation is not entirely
>> trivial?
>
> OK, the original code does not give any special reason either
> regarding why doing so is safe for archiving or streaming :)

Yeah, but surely it's obvious that if you don't WAL log it, it's not
going to work for archiving or streaming.  It's a lot less obvious why
you have to WAL log it when you're not doing either of those things if
you've already written it and fsync'd it locally.  How is it going to
disappear if it's been fsync'd, one wonders?

> More seriously, if there could be more details regarding that, I would
> think that we could say something like "logging the init fork is
> mandatory in any case to ensure its on-disk presence when at recovery
> replay, even on non-default tablespaces whose base location are
> deleted and re-created from scratch if the WAL record in charge of
> creating this tablespace gets replayed". The problem shows up because
> of tablespaces being deleted at replay at the end... So perhaps this
> makes sense. What do you think?

Yes, that's about what I think we need to explain.  Actually, I'm
wondering if we ought to just switch this over to using the delayChkpt
mechanism instead of going through this complicated song-and-dance.
Incurring an immediate sync to avoid having to WAL-log this is a bit
tenuous, but having to WAL-log it AND do the immediate sync just seems
silly, and I'm actually a bit worried that even with your fix there
might still be a latent bug somewhere.  We couldn't switch mechanisms
cleanly in the 9.2 branch (cf.
f21bb9cfb5646e1793dcc9c0ea697bab99afa523) so perhaps we should
back-patch it in the form you have it in already, but it's probably
worth switching over at least in master.
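
For reference, the delayChkpt pattern I have in mind is roughly this (sketch
only; whether it can safely replace the immediate sync here is exactly what
would need checking):

    START_CRIT_SECTION();
    MyPgXact->delayChkpt = true;

    /* ... write the init-fork page, smgrwrite() it, XLogInsert() its record ... */

    MyPgXact->delayChkpt = false;
    END_CRIT_SECTION();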

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel execution and prepared statements

2016-11-17 Thread Tobias Bussmann
> Yeah, we could do something like this, perhaps not in exactly this
> way, but I'm not sure it's a good idea to just execute the parallel
> plan without workers.

sure, executing parallel plans w/o workers seems a bit of a hack. But:
- we already do it this way in some other situations
- the alternative in this special situation would be to _force_ replanning 
without the CURSOR_OPT_PARALLEL_OK. The decision for replanning is hidden deep 
within plancache.c and while we could influence it with CURSOR_OPT_CUSTOM_PLAN 
this wouldn't have an effect if the prepared statement doesn't have any 
parameters. Additionally, influencing the decision and generating a 
non-parallel plan would shift the avg cost calculation used to choose custom or 
generic plans.

Maybe someone can come up with a better idea for a solution. These three 
approaches are all I see so far.

Best regards,
Tobias Bussmann



Re: [HACKERS] A bug in UCS_to_most.pl

2016-11-17 Thread Kyotaro HORIGUCHI
At Tue, 15 Nov 2016 09:44:54 -0500, Robert Haas  wrote 
in 
> On Tue, Nov 8, 2016 at 3:56 AM, Kyotaro HORIGUCHI
>  wrote:
> > I was surprised to find that
> > src/backend/utils/mb/Unicode/Makefile got changed (3a47c70) to
> > have download feature of character conversion authority files. It
> > seems fine according to the previous discussion as the commnet of
> > that is saying.
> >
> > So, I did 'make maintainer-clean; make' on it but got the
> > following error.
> >
> >> Hash %filename missing the % in argument 1 of keys() at UCS_to_most.pl 
> >> line 51.'/usr/bin/perl' UCS_to_most.pl
> >
> > What is more surprising to find that it is there since
> > 2006. Maybe Heikki unintentionally fixed that in the patch in the
> > thread, or older perls might not have complained about it (my
> > perl is 5.16).
> >
> > Anyway, the attached small patch fixes it.
> 
> It's quite amazing to me that that ever worked for anyone.  Committed.

Thanks.

As for myself, I have probably never run UCS_to_most.pl.  Maybe no
one has run it since that patch.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center






Re: [HACKERS] [PATCH] Generic type subscription

2016-11-17 Thread Dmitry Dolgov
> On 15 November 2016 at 15:03, Aleksander Alekseev <
a.aleks...@postgrespro.ru> wrote:
> Hello.
>
> I took a look at the latest -v4 patch. I would like to note that this
> patch breaks backward compatibility. For instance, the sr_plan extension[1]
> stops compiling, with errors

Thank you for the feedback.

Well, if we're speaking about this particular extension, if I understood
correctly, it fetches all parse tree nodes from Postgres and generates code
using this information.  So to avoid compilation problems I believe you need
to run `make USE_PGXS=1 genparser` again (it worked for me; I don't see any
mentions of `ArrayRef`).

But speaking generally, I don't see how we can provide backward compatibility
for extensions that are strongly coupled with the implementation details of
the parse tree.  In terms of the interface it's mostly a matter of replacing
`ArrayRef` with `SubscriptingRef`, but I think it's better to do that in the
extension code.
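
As a purely speculative example, an affected extension could carry a small
shim like this while both node names exist in the wild (the version cutoff
and the T_SubscriptingRef tag name are placeholders, not the actual patch):

    /* compat shim sketch -- version number and tag name are placeholders */
    #if defined(PG_VERSION_NUM) && PG_VERSION_NUM >= 100000
    #define ArrayRef        SubscriptingRef
    #define T_ArrayRef      T_SubscriptingRef
    #endif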


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-11-17 Thread Mithun Cy
On Thu, Nov 17, 2016 at 8:27 AM, Robert Haas  wrote:
>but SELECT pg_is_in_recovery() and SHOW transaction_read_only
>exist in older versions so if we pick either of those methods then it
>will just work.

I am attaching the next version of the patch; it has the following fixes.
Tsunakawa's comments
1.  PGconn->target_server_type is now freed in freePGconn()
2.  Added PGTARGETSERVERTYPE.

*Additional comments from others*
3.  Moved from SELECT pg_is_in_recovery() to SHOW transaction_read_only; this
should now handle different kinds of replication, as we recognise the server
to which a writable connection can be made as the primary, exactly like the
JDBC driver does (a rough sketch follows below this list).  This is also
documented.
4.  Renamed wording from master to primary.
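
For reference, from a client's point of view the check boils down to
something like this (a sketch using plain libpq calls; the actual
implementation inside fe-connect.c differs, and the host-list/failover loop
is omitted):

    #include <stdbool.h>
    #include <string.h>
    #include <libpq-fe.h>

    static bool
    server_is_writable(PGconn *conn)
    {
        PGresult   *res = PQexec(conn, "SHOW transaction_read_only");
        bool        writable = false;

        if (PQresultStatus(res) == PGRES_TUPLES_OK &&
            strcmp(PQgetvalue(res, 0, 0), "off") == 0)
            writable = true;    /* read-write, so treat it as the primary */

        PQclear(res);
        return writable;
    }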

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


failover_to_new_master-v2.patch
Description: Binary data



Re: [HACKERS] pg_hba_file_settings view patch

2016-11-17 Thread Ashutosh Bapat
On Wed, Nov 16, 2016 at 4:40 PM, Ashutosh Bapat
 wrote:
> make check run with this patch shows server crashes. regression.out
> attached. I have run make check after a clean build, tried building it
> after running configure, but the problem is always reproducible. Do
> you see this problem?
>
> Also the patch has a white space error.
> git diff --check
> src/backend/utils/init/postinit.c:729: space before tab in indent.
> +   /*
>
I looked at the patch in some more details and here are some more comments
1. In catalog.sgml, the sentence "If the configuration file contains any errors
..." looks redundant, as the description of the "error" field already says so.
I removed it in the attached patch.  In that example, you might want to provide
the pg_hba.conf contents to help readers understand the view output.

2. I think the view will be useful if loading the file did not have the
desired effect, whether because of SIGHUP or a fresh start.  So maybe the
sentence "for diagnosing problems if a SIGHUP signal did not have the desired
effects." should be changed to something more generic, e.g. "... if loading the
file did not have ...".

3. Something is wrong with the indentation here; at least, it is not how
pgindent would indent the variable names.
+typedef struct LookupHbaLineCxt
+{
+    MemoryContext memcxt;
+    TupleDesc tupdesc;
+    Tuplestorestate *tuple_store;
+} LookupHbaLineCxt;

+static void lookup_hba_line_callback(void *context, int lineno, HbaLine *hba, const char *err_msg);
This overflows the 80-character limit.

in parse_hba_line()
-ereport(LOG,
+ereport(level,
 (errcode(ERRCODE_CONFIG_FILE_ERROR),
  errmsg("invalid connection type \"%s\"",
 token->string),
  errcontext("line %d of configuration file \"%s\"",
 line_num, HbaFileName)));
+*err_msg = pstrdup(_("invalid connection type"));

Is the difference between the error reported to the log and the one stored in
the view intentional?  That brings up another question.  Everywhere, in similar
code, the patch adds a line *err_msg = pstrdup() or psprintf() and copies the
arguments to errmsg().  Someone modifying the error message has to duplicate
the changes.  Since the code is just a few lines away, that may not be hard,
but it is still a maintenance burden.  Isn't there a way to compute the message
once and use it twice (a sketch follows the quoted hunk below)?
show_all_file_settings(), used for pg_file_settings, has a similar problem, so
maybe it's an accepted practice.  In many of these cases the invalid value can
be derived from the referenced field (which will be part of the view), so maybe
it's OK.  But that's not true for the difference below: gai_strerror()'s output
is not obvious from the referenced field.
-ereport(LOG,
+ereport(level,
 (errcode(ERRCODE_CONFIG_FILE_ERROR),
  errmsg("invalid IP address \"%s\": %s",
 str, gai_strerror(ret)),
  errcontext("line %d of configuration file \"%s\"",
 line_num, HbaFileName)));
 if (gai_result)
 pg_freeaddrinfo_all(hints.ai_family, gai_result);
+*err_msg = pstrdup(_("invalid IP address"));
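
Something like the following would compute the message once and use it for
both the log entry and the view (just a sketch; errmsg_internal() avoids
translating the string twice, and the memory-context details are glossed
over):

    *err_msg = psprintf(_("invalid IP address \"%s\": %s"),
                        str, gai_strerror(ret));
    ereport(level,
            (errcode(ERRCODE_CONFIG_FILE_ERROR),
             errmsg_internal("%s", *err_msg),
             errcontext("line %d of configuration file \"%s\"",
                        line_num, HbaFileName)));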

4.
+if (!rsi || !IsA(rsi, ReturnSetInfo) ||
+(rsi->allowedModes & SFRM_Materialize) == 0)
+ereport(ERROR,
+(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("set-valued function called in context that
cannot accept a set")));
show_all_file_settings(), a function similar to this one, splits the condition
above into two and throws different error message for each of them.
/* Check to see if caller supports us returning a tuplestore */
if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 errmsg("set-valued function called in context that
cannot accept a set")));
if (!(rsinfo->allowedModes & SFRM_Materialize))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 errmsg("materialize mode required, but it is not " \
"allowed in this context")));
Why is this difference?

5. May be you want to rename "type" attribute to "connection_type" to be
explicit.

6. The attribute names "keyword_database" and "keyword_user" do not seem
appropriate.  The values do not look like keywords as such; they are more like
synonyms or collections (the value "replication" is an exception).  Maybe you
want to rename those to "database_keyword" and "user_keyword", similar to the
naming convention of token_is_a_database_keyword().  I agree that the usage
cannot be described correctly in a single phrase, and the pg_hba.conf
documentation doesn't help much either.  Similarly for keyword_address.

7. Also, 

Re: [HACKERS] proposal: psql \setfileref

2016-11-17 Thread Pavel Stehule
Hi

An independent implementation of parametrized queries can look like the
attached patch:

This code is simple, without any unexpected behavior.

Parameters are used when the user doesn't require escaping and when the
PARAMETRIZED_QUERIES variable is on.

Regards

Pavel
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index a7789df..57dd8f0 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -119,9 +119,13 @@ setQFout(const char *fname)
  * If "escape" is true, return the value suitably quoted and escaped,
  * as an identifier or string literal depending on "as_ident".
  * (Failure in escaping should lead to returning NULL.)
+ *
+ * When "from_query" is true, then the variable can be passed as query parameter,
+ * when it is not used as identifier (as_ident:false), when escape is not required
+ * (escaping changes the content).
  */
 char *
-psql_get_variable(const char *varname, bool escape, bool as_ident)
+psql_get_variable(const char *varname, bool escape, bool as_ident, bool from_query)
 {
 	char	   *result;
 	const char *value;
@@ -130,6 +134,26 @@ psql_get_variable(const char *varname, bool escape, bool as_ident)
 	if (!value)
 		return NULL;
 
+	if (from_query && pset.parametrized_queries)
+	{
+		if (!escape && !as_ident)
+		{
+			char		printbuf[5];
+
+			if (pset.nparams > MAX_QUERY_PARAMS)
+			{
+psql_error("too much parameters (more than %d)\n",
+	  MAX_QUERY_PARAMS);
+return NULL;
+			}
+
+			pset.params[pset.nparams++] = value;
+			snprintf(printbuf, sizeof(printbuf) - 1, "$%d", pset.nparams);
+
+			return pstrdup(printbuf);
+		}
+	}
+
 	if (escape)
 	{
 		char	   *escaped_value;
@@ -1287,7 +1311,16 @@ SendQuery(const char *query)
 		if (pset.timing)
 			INSTR_TIME_SET_CURRENT(before);
 
-		results = PQexec(pset.db, query);
+		if (pset.nparams > 0)
+			results = PQexecParams(pset.db, query,
+	  pset.nparams,
+	  NULL,
+	  (const char * const *) pset.params,
+	  NULL,
+	  NULL,
+	  0);
+		else
+			results = PQexec(pset.db, query);
 
 		/* these operations are included in the timing result: */
 		ResetCancelConn();
@@ -1382,6 +1415,9 @@ SendQuery(const char *query)
 
 	ClearOrSaveResult(results);
 
+	/* the number of query parameters are not necessary now */
+	pset.nparams = 0;
+
 	/* Possible microtiming output */
 	if (pset.timing)
 		PrintTiming(elapsed_msec);
@@ -1488,7 +1524,16 @@ ExecQueryUsingCursor(const char *query, double *elapsed_msec)
 	appendPQExpBuffer(, "DECLARE _psql_cursor NO SCROLL CURSOR FOR\n%s",
 	  query);
 
-	results = PQexec(pset.db, buf.data);
+	if (pset.nparams > 0)
+		results = PQexecParams(pset.db, buf.data,
+  pset.nparams,
+  NULL,
+  (const char * const *) pset.params,
+  NULL,
+  NULL,
+  0);
+	else
+		results = PQexec(pset.db, buf.data);
 	OK = AcceptResult(results) &&
 		(PQresultStatus(results) == PGRES_COMMAND_OK);
 	ClearOrSaveResult(results);
diff --git a/src/bin/psql/common.h b/src/bin/psql/common.h
index bdcb58f..79f8c91 100644
--- a/src/bin/psql/common.h
+++ b/src/bin/psql/common.h
@@ -18,7 +18,7 @@
 extern bool openQueryOutputFile(const char *fname, FILE **fout, bool *is_pipe);
 extern bool setQFout(const char *fname);
 
-extern char *psql_get_variable(const char *varname, bool escape, bool as_ident);
+extern char *psql_get_variable(const char *varname, bool escape, bool as_ident, bool from_query);
 
 extern void psql_error(const char *fmt,...) pg_attribute_printf(1, 2);
 
diff --git a/src/bin/psql/help.c b/src/bin/psql/help.c
index a69c4dd..3c62146 100644
--- a/src/bin/psql/help.c
+++ b/src/bin/psql/help.c
@@ -325,7 +325,7 @@ helpVariables(unsigned short int pager)
 	 * Windows builds currently print one more line than non-Windows builds.
 	 * Using the larger number is fine.
 	 */
-	output = PageOutput(88, pager ? &(pset.popt.topt) : NULL);
+	output = PageOutput(90, pager ? &(pset.popt.topt) : NULL);
 
 	fprintf(output, _("List of specially treated variables\n\n"));
 
@@ -352,6 +352,8 @@ helpVariables(unsigned short int pager)
 	fprintf(output, _("  LASTOIDvalue of the last affected OID\n"));
 	fprintf(output, _("  ON_ERROR_ROLLBACK  if set, an error doesn't stop a transaction (uses implicit savepoints)\n"));
 	fprintf(output, _("  ON_ERROR_STOP  stop batch execution after error\n"));
+	fprintf(output, _("  PARAMETRIZED_QUERIES\n"
+	  " pass psql's variables as query parameters\n"));
 	fprintf(output, _("  PORT   server port of the current connection\n"));
 	fprintf(output, _("  PROMPT1specifies the standard psql prompt\n"));
 	fprintf(output, _("  PROMPT2specifies the prompt used when a statement continues from a previous line\n"));
diff --git a/src/bin/psql/mainloop.c b/src/bin/psql/mainloop.c
index 37dfa4d..cb6a2ee 100644
--- a/src/bin/psql/mainloop.c
+++ b/src/bin/psql/mainloop.c
@@ -403,6 +403,9 @@ MainLoop(FILE *source)
 		

Re: [HACKERS] WIP: About CMake v2

2016-11-17 Thread Yury Zhuravlev

Michael Paquier wrote:

src/include/port/win32.h:#define putenv(x) pgwin32_putenv(x)

and my MSVC 2015 build breaks here because pgwin32_putenv has the wrong
signature.  I hope that is no longer the case.


--
Yury Zhuravlev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

