date:20150810

Michael Paquier  writes:
> On Tue, Aug 11, 2015 at 2:42 PM, Tom Lane  wrote:
>> Somebody refresh my memory as to why we have this restriction (that is,
>> slave's max_connections >= master's max_connections) in the first place?
>> Seems like it should not be a necessary requirement, and working towards
>> getting rid of it would be far better than any other answer.

> If I recall correctly, that's because KnownAssignedXIDs and the lock
> table need to be large enough on the standby for the largest snapshot
> possible (procarray.c).

Hm.  Surely KnownAssignedXIDs could be resized at need.  As for the shared
lock table on the standby, that could be completely occupied by locks
taken by hot-standby backend processes, so I don't see why we're insisting
on anything particular as to its size.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Priority table or Cache table

2015-08-10 Thread Haribabu Kommi

On Mon, Aug 10, 2015 at 3:09 PM, Amit Kapila  wrote:
> On Thu, Aug 6, 2015 at 12:24 PM, Haribabu Kommi 
> wrote:
>
> What is the configuration for test (RAM of m/c, shared_buffers,
> scale_factor, etc.)?

Here are the details:

CPU - 16 core, RAM - 252 GB

shared_buffers - 1700MB, buffer_cache_ratio - 70
wal_buffers - 16MB, synchronous_commit - off
checkpoint_timeout - 15min, max_wal_size - 5GB.

pgbench scale factor - 75 (1GB)

Load test table size - 1GB

>>  Threads HeadPatchedDiff
>>  1  3123  3238  3.68%
>>  2  5997  6261  4.40%
>>  4 11102   11407  2.75%
>>
>> I am suspecting that, this may because of buffer locks that are
>> causing the problem.
>> where as in older approach of different buffer pools, each buffer pool
>> have it's own locks.
>> I will try to collect the profile output and analyze the same.
>>
>> Any better ideas?
>>
>
> I think you should try to find out during test, for how many many pages,
> it needs to perform clocksweep (add some new counter like
> numBufferBackendClocksweep in BufferStrategyControl to find out the
> same).  By theory your patch should reduce the number of times it needs
> to perform clock sweep.
>
> I think in this approach even if you make some buffers as non-replaceable
> (buffers for which BM_BUFFER_CACHE_PAGE is set), still clock sweep
> needs to access all the buffers.  I think we might want to find some way to
> reduce that if this idea helps.
>
> Another thing is that, this idea looks somewhat similar (although not same)
> to current Ring Buffer concept, where Buffers for particular types of scan
> uses buffers from Ring.  I think it is okay to prototype as you have done
> in patch and we can consider to do something on those lines if at all
> this patch's idea helps.

Thanks for the details. I will try the same.

Regards,
Hari Babu
Fujitsu Australia


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] max_connections and standby server

2015-08-10 Thread Michael Paquier

On Tue, Aug 11, 2015 at 2:57 PM, Michael Paquier
 wrote:
> On Tue, Aug 11, 2015 at 2:42 PM, Tom Lane  wrote:
>> Tatsuo Ishii  writes:
>>> I think this is because pg_control on the standby remembers that the
>>> previous primary server's max_connections = 1100 even if the standby
>>> server fails to start. Shouldn't we update pg_control file only when
>>> standby succeeds to start?
>>
>> Somebody refresh my memory as to why we have this restriction (that is,
>> slave's max_connections >= master's max_connections) in the first place?
>> Seems like it should not be a necessary requirement, and working towards
>> getting rid of it would be far better than any other answer.
>
> If I recall correctly, that's because KnownAssignedXIDs and the lock
> table need to be large enough on the standby for the largest snapshot
> possible (procarray.c).

... And the maximum number of locks possible on master (for the lock
table, wasn't it for the concurrent number of AccessExclusiveLocks,
btw?).
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] max_connections and standby server

2015-08-10 Thread Michael Paquier

On Tue, Aug 11, 2015 at 2:42 PM, Tom Lane  wrote:
> Tatsuo Ishii  writes:
>> I think this is because pg_control on the standby remembers that the
>> previous primary server's max_connections = 1100 even if the standby
>> server fails to start. Shouldn't we update pg_control file only when
>> standby succeeds to start?
>
> Somebody refresh my memory as to why we have this restriction (that is,
> slave's max_connections >= master's max_connections) in the first place?
> Seems like it should not be a necessary requirement, and working towards
> getting rid of it would be far better than any other answer.

If I recall correctly, that's because KnownAssignedXIDs and the lock
table need to be large enough on the standby for the largest snapshot
possible (procarray.c).
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-08-10 Thread Masahiko Sawada

On Tue, Aug 11, 2015 at 1:50 AM, Alvaro Herrera
 wrote:
> Masahiko Sawada wrote:
>
>> This topic may have been already discussed but, why don't we use just
>> total scanned pages and total pages?
>
> Because those numbers don't extrapolate nicely.  If the density of dead
> tuples is irregular across the table, such absolute numbers might be
> completely meaningless: you could scan 90% of the table without seeing
> any index scan, and then at the final 10% be hit by many index scans
> cleaning dead tuples.  Thus you would see progress go up to 90% very
> quickly and then take hours to have it go to 91%.  (Additionally, and a
> comparatively minor point: since you don't know how many index scans are
> going to happen, there's no way to know the total number of blocks
> scanned, unless you don't count index blocks at all, and then the
> numbers become a lie.)
> If you instead track number of heap pages separately from index pages,
> and indicate how many index scans have taken place, you have a chance of
> actually figuring out how many heap pages are left to scan and how many
> more index scans will occur.

Thank you for your explanation!
I understood about this.

> VACUUM has 3 phases now, but since phases 2 and 3 repeat, you can have an 
> unbounded number of phases. But that assumes that we don't count truncation 
> as a 4th phase of VACUUM...

In case of vacuum, I think we need to track the number of scanned heap
pages at least, and the information about index scan is the additional
information.
The another idea for displaying progress is to have two kind of
information: essential information and additional information.

Essential information has one numeric data, which is stored
essentially information regarding of its processing.
Additional information has two data: text and numeric. These data is
free-style data which is stored by each backend as it like.
And these three data are output at same time.

For example, In case of vacuum, essential information is the number of
total scanned heap page.

* When lazy_scan_heap starts, the two additional data are NULL.

* When lazy_vacuum_index starts, the backend set additional data like
followings.
  - "Index vacuuming" into text data which describes what we're doing
now actually.
  - "50" into numeric data which describes how many index pages we scanned.

* And when lazy_vacuum_index is done, backend sets additional data NULL again.

Regards,

--
Masahiko Sawada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] max_connections and standby server

2015-08-10 Thread Tatsuo Ishii

> Somebody refresh my memory as to why we have this restriction (that is,
> slave's max_connections >= master's max_connections) in the first place?
> Seems like it should not be a necessary requirement, and working towards
> getting rid of it would be far better than any other answer.

If you care about max_connections, you might want to care about below as well 
(from xlog.c)

RecoveryRequiresIntParameter("max_worker_processes",
 
max_worker_processes,
 
ControlFile->max_worker_processes);
RecoveryRequiresIntParameter("max_prepared_transactions",
 
max_prepared_xacts,
 
ControlFile->max_prepared_xacts);
RecoveryRequiresIntParameter("max_locks_per_transaction",

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] max_connections and standby server

Tatsuo Ishii  writes:
> I think this is because pg_control on the standby remembers that the
> previous primary server's max_connections = 1100 even if the standby
> server fails to start. Shouldn't we update pg_control file only when
> standby succeeds to start?

Somebody refresh my memory as to why we have this restriction (that is,
slave's max_connections >= master's max_connections) in the first place?
Seems like it should not be a necessary requirement, and working towards
getting rid of it would be far better than any other answer.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] linux sparc compile issue

2015-08-10 Thread Waldemar Brodkorb

Hi Tom,
Tom Lane wrote,

> Waldemar Brodkorb  writes:
> > while doing regular builds via buildroot autobuilders
> > a compile problem for sparc 32bit v8 was found.
> > It seems the defines for Linux are other than for Solaris.
> 
> > Following patch fixes it for buildroot:
> > The gcc predefines for Linux are __sparc_v8__/__sparc_v7__
> 
> I've applied your suggested patch for this, but I'm a bit curious what
> version of gcc you are using; our code's been like that for a very long
> time and nobody complained before.

Thanks.
The cross-compiler we use is gcc 4.9.3. But also the native gcc
on my Sun Voyager running Debian 4.0 have it:
platin:~# gcc -dM -E - http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] max_connections and standby server

2015-08-10 Thread Tatsuo Ishii

Today I encountered an interesting situation.

1) A streaming replication primary server and a standby server is
   running. At this point max_connections = 100 on both servers.

2) Shutdown both servers.

3) Change max_connections to 1100 on both servers and restart both
   servers.

4) The primary server happily started but the standby server won't
   because of lacking resource.

5) Shutdown both servers.

6) Restore max_connections to 100 on both servers and restart both
   servers.

7) The primary server happily started but the standby server won't
   because of the reason below.

32695 2015-08-11 13:46:22 JST FATAL:  hot standby is not possible because 
max_connections = 100 is a lower setting than on the master server (its value 
was 1100)
32695 2015-08-11 13:46:22 JST CONTEXT:  xlog redo parameter change: 
max_connections=1100 max_worker_processes=8 max_prepared_xacts=10 
max_locks_per_xact=64 wal_level=hot_standby wal_log_hints=off
32693 2015-08-11 13:46:22 JST LOG:  startup process (PID 32695) exited with 
exit code 1
32693 2015-08-11 13:46:22 JST LOG:  terminating any other active server 
processes

I think this is because pg_control on the standby remembers that the
previous primary server's max_connections = 1100 even if the standby
server fails to start. Shouldn't we update pg_control file only when
standby succeeds to start?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [COMMITTERS] pgsql: Fix pg_dump to dump shell types.




On 08/10/2015 07:29 PM, Tom Lane wrote:

I wrote:

Peter Eisentraut  writes:

This was probably just copied from how proacl and lanacl are handled,
which predate typacl by quite a bit.  Maybe there was a reason in those
days.

Hm ... I wonder whether those are well-thought-out either.

They're not.  Testing with ancient servers shows that we dump very silly
grant/revoke state for functions and languages as well, if the source
server is too old to have proacl or lanacl (ie, pre-7.3).  As with typacl,
the silliness is accidentally masked as long as the owner doesn't do
something like revoke the privileges granted to PUBLIC.

Things work far more sanely with the attached patch, to wit we just leave
all object privileges as default if dumping from a version too old to have
privileges on that type of object.  I think we should back-patch this into
all supported branches; it's considerably more likely that older branches
would be used to dump from ancient servers.





FYI this has fixed the problem that was encountered in cross-version 
upgrade testing.


cheers

andrew






--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] fix oversight converting buf_id to Buffer

2015-08-10 Thread Qingqing Zhou

On Mon, Aug 10, 2015 at 4:15 PM, Andres Freund  wrote:
>
> That's a very nice catch! Did you trigger the error or just found it
> when reading the code?
>

My fellow colleagues hit the issue during some stress: I am not clear
the exact repro but from the failed assertion, the cause is kinda
clear.

Regards,
Qingqing


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [patch] A \pivot command for psql

2015-08-10 Thread Joe Conway

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 08/10/2015 04:03 PM, David Fetter wrote:
>> I am wrong on that? I feel like you guys are all telling me that 
>> \pivot should happen on the server, but the point that it would
>> not be realistic to begin with is not considered.
> 
> I think that starting the conversation again, especially at this
> stage of the 9.6 cycle, is a very good thing, whatever its
> outcome.

Talk to Atri Sharma -- he recently indicated to me that he might be
working on built-in pivot in the near term future (if not already
started).

- -- 
Joe Conway
Crunchy Data
Enterprise PostgreSQL
http://crunchydata.com
-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.22 (GNU/Linux)

iQIcBAEBAgAGBQJVyUVSAAoJEDfy90M199hlhEgP/1mQpZ0JOR2kbaP5Iw6FY4q+
P1SLWbCWMhVDp97OAjbP0T34YmG5Rw0fDh4WLPbNBp6xyPTQQgIVFvQAu6kDSngk
X9kbQHKGE+pNqc9hMc7CyuSse8Pw8VeQGRDU5a+8E3fPdi9rbB2YTDt7SJIXlavc
zJ8kZlUj59Xw9Kdkpon8Jb/Nxn3JV4GWLTe4+nxRZoH/9POjslyM+rVGtrMlHA59
0NWWPnTsLfU1HNk+olf72ihZpmQM4KPog8PdWo0GyFqJhMwmYtE/WTJh4jNDygBe
NXTQE7/I5OA6fYQniq+7Wsyhc5raOH16lVSSo0QquuTtGG2mrYsvd82Zx7J0SDQp
NfVk4qgjJYMNBN9/hvPXZZ2+LReYqliloR2PqLqxgDn3DyGgSpSUs1JyDZRtoJEj
P4jEVGVWnUbjKuNldRJZi1DmMTKVSS2RSnoC0RJ7DzAfUyQ4oH4FmX8jX5rZpoXN
nJLtVPhIRpp0A2Lq849hXB3/LuNzjYmZ3VvGNUwffTD7d3XxQ/Zeb9ZtWZszfJUo
zjMAh0wFwDtdVBYdFStzlByGTSEIqeqdGCDSHhiZlNw/E0xV2YjNzdUP9N34aWml
i+KUgPMxTJ/GAineSXpjt+I6Y+cHA10JpQLOf6fVdrOp/R1Z2EqFfZl2GzbyXJ5n
lz4QjhOfx6YG1Tdi5qYW
=nyuc
-END PGP SIGNATURE-


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] checkpointer continuous flushing

2015-08-10 Thread Michael Paquier

On Tue, Aug 11, 2015 at 4:28 AM, Andres Freund wrote:
> On August 10, 2015 8:24:21 PM GMT+02:00, Fabien COELHO wrote:
>>> You can't allocate 4GB with palloc(), it has a builtin limit against
>>> allocating more than 1GB.
>>
>>Argh, too bad, I assumed very naively that palloc was malloc in
>>disguise.
>
> It is, but there's some layering (memory pools/contexts) on top. You can get 
> huge allocations with polloc_huge.

palloc_huge does not exist yet ;)
There is either repalloc_huge or palloc_extended now, though
implementing one would be trivial.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [COMMITTERS] pgsql: Fix pg_dump to dump shell types.

I wrote:
> Peter Eisentraut  writes:
>> This was probably just copied from how proacl and lanacl are handled,
>> which predate typacl by quite a bit.  Maybe there was a reason in those
>> days.

> Hm ... I wonder whether those are well-thought-out either.

They're not.  Testing with ancient servers shows that we dump very silly
grant/revoke state for functions and languages as well, if the source
server is too old to have proacl or lanacl (ie, pre-7.3).  As with typacl,
the silliness is accidentally masked as long as the owner doesn't do
something like revoke the privileges granted to PUBLIC.

Things work far more sanely with the attached patch, to wit we just leave
all object privileges as default if dumping from a version too old to have
privileges on that type of object.  I think we should back-patch this into
all supported branches; it's considerably more likely that older branches
would be used to dump from ancient servers.

regards, tom lane

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 51b6d3a..87dadbf 100644
*** a/src/bin/pg_dump/pg_dump.c
--- b/src/bin/pg_dump/pg_dump.c
*** getTypes(Archive *fout, int *numTypes)
*** 3513,3519 
  	else if (fout->remoteVersion >= 80300)
  	{
  		appendPQExpBuffer(query, "SELECT tableoid, oid, typname, "
! 		  "typnamespace, '{=U}' AS typacl, "
  		  "(%s typowner) AS rolname, "
  		  "typinput::oid AS typinput, "
  		  "typoutput::oid AS typoutput, typelem, typrelid, "
--- 3513,3519 
  	else if (fout->remoteVersion >= 80300)
  	{
  		appendPQExpBuffer(query, "SELECT tableoid, oid, typname, "
! 		  "typnamespace, NULL AS typacl, "
  		  "(%s typowner) AS rolname, "
  		  "typinput::oid AS typinput, "
  		  "typoutput::oid AS typoutput, typelem, typrelid, "
*** getTypes(Archive *fout, int *numTypes)
*** 3528,3534 
  	else if (fout->remoteVersion >= 70300)
  	{
  		appendPQExpBuffer(query, "SELECT tableoid, oid, typname, "
! 		  "typnamespace, '{=U}' AS typacl, "
  		  "(%s typowner) AS rolname, "
  		  "typinput::oid AS typinput, "
  		  "typoutput::oid AS typoutput, typelem, typrelid, "
--- 3528,3534 
  	else if (fout->remoteVersion >= 70300)
  	{
  		appendPQExpBuffer(query, "SELECT tableoid, oid, typname, "
! 		  "typnamespace, NULL AS typacl, "
  		  "(%s typowner) AS rolname, "
  		  "typinput::oid AS typinput, "
  		  "typoutput::oid AS typoutput, typelem, typrelid, "
*** getTypes(Archive *fout, int *numTypes)
*** 3542,3548 
  	else if (fout->remoteVersion >= 70100)
  	{
  		appendPQExpBuffer(query, "SELECT tableoid, oid, typname, "
! 		  "0::oid AS typnamespace, '{=U}' AS typacl, "
  		  "(%s typowner) AS rolname, "
  		  "typinput::oid AS typinput, "
  		  "typoutput::oid AS typoutput, typelem, typrelid, "
--- 3542,3548 
  	else if (fout->remoteVersion >= 70100)
  	{
  		appendPQExpBuffer(query, "SELECT tableoid, oid, typname, "
! 		  "0::oid AS typnamespace, NULL AS typacl, "
  		  "(%s typowner) AS rolname, "
  		  "typinput::oid AS typinput, "
  		  "typoutput::oid AS typoutput, typelem, typrelid, "
*** getTypes(Archive *fout, int *numTypes)
*** 3558,3564 
  		appendPQExpBuffer(query, "SELECT "
  		 "(SELECT oid FROM pg_class WHERE relname = 'pg_type') AS tableoid, "
  		  "oid, typname, "
! 		  "0::oid AS typnamespace, '{=U}' AS typacl, "
  		  "(%s typowner) AS rolname, "
  		  "typinput::oid AS typinput, "
  		  "typoutput::oid AS typoutput, typelem, typrelid, "
--- 3558,3564 
  		appendPQExpBuffer(query, "SELECT "
  		 "(SELECT oid FROM pg_class WHERE relname = 'pg_type') AS tableoid, "
  		  "oid, typname, "
! 		  "0::oid AS typnamespace, NULL AS typacl, "
  		  "(%s typowner) AS rolname, "
  		  "typinput::oid AS typinput, "
  		  "typoutput::oid AS typoutput, typelem, typrelid, "
*** getAggregates(Archive *fout, DumpOptions
*** 4249,4255 
    "CASE WHEN aggbasetype = 0 THEN 0 ELSE 1 END AS pronargs, "
  		  "aggbasetype AS proargtypes, "
  		  "(%s aggowner) AS rolname, "
! 		  "'{=X}' AS aggacl "
  		  "FROM pg_aggregate "
  		  "where oid > '%u'::oid",
  		  username_subquery,
--- 4249,4255 
    "CASE WHEN aggbasetype = 0 THEN 0 ELSE 1 END AS pronargs, "
  		  "aggbasetype AS proargtypes, "
  		  "(%s aggowner) AS rolname, "
! 		  "NULL AS aggacl "
  		  "FROM pg_aggregate "
  		  "where oid > '%u'::oid",
  		  username_subquery,
*** getAggregates(Archive *fout, DumpOptions
*** 4264,4270 
    "CASE WHEN aggbasetype = 0 THEN 0 ELSE 1 END AS pronargs, "
  		  "aggbasetype AS proargtypes, "
  		  "(%s aggowner) AS rolname, "
! 		  "'{=X}' AS aggacl "
  		  "FROM pg_aggregate "
  		  "where oid > '%u'::oid",
  		  username_subquery,
--- 4264,4270 
    "CASE WHEN a

Re: [HACKERS] fix oversight converting buf_id to Buffer

Hi,

That's a very nice catch! Did you trigger the error or just found it
when reading the code?

On 2015-08-10 12:12:01 -0700, Qingqing Zhou wrote:
> Attached patch fixes oversights converting buf_id to Buffer in
> PrintBufferDescs() and InvalidateBuffer(). Especially for the latter,
> the reason we haven't seen any reports of the issue might be that it
> needs certain concurrent conditions to be true.

PrintBufferDescs() is dead code, so bit is not surprising. I'm also not
surprised that the wrong buffer in InvalidateBuffer()'s check doesn't
trigger. You'd have to have a cursor open in the current backend that
currently pins the off-by-one buffer.

I'm too tired right now to look at this, but it generally looked sane.

Regards,

Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [patch] A \pivot command for psql

2015-08-10 Thread David Fetter

On Mon, Aug 10, 2015 at 07:10:41PM +0200, Daniel Verite wrote:
>   David Fetter wrote:
> 
> > I'm working up a proposal to add (UN)PIVOT support to the back-end.
> 
> I was under the impression that a server-side PIVOT *with dynamic
> columns* was just unworkable as an SQL query, because it couldn't be
> prepared if it existed.

That depends on what you mean by "dynamic columns."  The approach
taken in the tablefunc extension is to use functions which return
SETOF RECORD, which in turn need to be cast at runtime.

At least one other implementation takes two separate approaches, both
interesting at least from my point of view.

The first is to spell out all the columns in the query, which is not
"dynamic columns" in any reasonable sense, as "hand-craft one piece of
SQL whose purpose is to generate the actual pivot SQL" is not
a reasonable level of dynamic.

The second, more on point, is to specify a serialization for the rows
in the "dynamic columns" case.  Their syntax is "PIVOT XML", but I
would rather do something more like "PIVOT (SERIALIZATION XML)".

A third idea I have only roughed out feels a bit like cheating:
creating a function which returns two distinct rowtypes via REFCURSORs
or similar.  The first result, used to create a CAST, spells out the
row type of the second.

> I am wrong on that? I feel like you guys are all telling me that
> \pivot should happen on the server, but the point that it would not
> be realistic to begin with is not considered.

I think that starting the conversation again, especially at this stage
of the 9.6 cycle, is a very good thing, whatever its outcome.

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] ON CONFLICT DO UPDATE using EXCLUDED.column gives an error about mismatched types

2015-08-10 Thread Peter Geoghegan

On Wed, Aug 5, 2015 at 12:58 AM, Amit Langote
 wrote:
> I forgot mentioning one thing later yesterday. The way exclRelTlist is
> initialized, all the way in the beginning (transformOnConflictClause), is
> most probably to blame. It uses expandRelAttrs() for other valid reasons;
> but within it, expandRTE() is called with 'false' for include_dropped
> (columns). I applied the attached (perhaps ugly) fix, and it seemed to fix
> the issue. But, I guess you'll be able to come up with some better fix.

Thanks for working on this. I think you have this right, but I also
don't think we should discuss this fix outside of the context of a
wider discussion about the excluded targetlist.

What I'm going to do is roll this into my own pending patch to fix the
issue with wholerow vars, which is also down to a problem with the
excluded targetlist initially generated by calling expandRelAttrs().
Andres might want to take an alternative approach with that, so a
consolidation makes sense.

-- 
Peter Geoghegan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] tap tests remove working directories




On 08/10/2015 10:32 AM, Andrew Dunstan wrote:



On 08/10/2015 09:55 AM, Tom Lane wrote:

Andrew Dunstan  writes:

On 08/09/2015 08:58 PM, Michael Paquier wrote:

Sure. Attached is what I have in mind. Contrary to your version we
keep around temp paths should a run succeed after one that has failed
when running make check multiple times in a row. Perhaps it does not
matter much in practice as log files get removed at each new run but I
think that this allows a finer control in case of failure. Feel free
to have a look.

Actually, I don't think this is a very good idea. You could end up with
a whole series of opaquely named directories from a series of failing
runs. If you want to keep the directory after a failure, and want to do
another run, then rename the directory. That's what you have to do with
the main regression tests, too. My version also has the benefit of
changing exactly 3 lines in the source :-)

FWIW, I think we should keep the behavior of the TAP tests as close as
possible to the established behavior of the main regression tests.

It's certainly possible that there's room for improvement in that
benchmark behavior.  But let's first converge the test behaviors,
and then have a discussion about whether we want changes, and if
so change all the tests at the same time.



Yeah. To do that we should probably stop using File::Temp to make the 
directory, and just use a hardcoded name given to File::Path::mkpath. 
If the directory exists, we'd just remove it first.


That won't be a very big change - probably just a handful of lines.




Unfortunately, it's rather more complicated. These tests are extremely 
profligate with space and time, creating not less than 29 data 
directories in 19 temporary directories, including 14 temp directories 
in scripts alone, and not counting those in pg_rewind which puts two 
more data directories in a place which is different from all the others. 
And of course, in those directories (scripts and pg_ctl) that have 
multiple temp directories, we have no idea which one goes with which set 
of tests.


It's no wonder that these tests take an inordinate amount of time to run 
- on crake, where it takes 23 seconds for "make installcheck" to run, it 
takes 176 seconds for "make -C src/bin installcheck" to run. My bet is 
all those initdb calls account for a substantial amount of that time.


I think this needs a significant bit of reworking.

cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] linux sparc compile issue

On 2015-08-10 17:36:57 -0400, Tom Lane wrote:
> Waldemar Brodkorb  writes:
> > while doing regular builds via buildroot autobuilders
> > a compile problem for sparc 32bit v8 was found.
> > It seems the defines for Linux are other than for Solaris.
> 
> > Following patch fixes it for buildroot:
> > The gcc predefines for Linux are __sparc_v8__/__sparc_v7__
> 
> I've applied your suggested patch for this, but I'm a bit curious what
> version of gcc you are using; our code's been like that for a very long
> time and nobody complained before.

Hm. Could this be caused by using a new solaris studio compiler? It
apparently uses gcc as a frontend and thus might actually reach that bit
of code.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [COMMITTERS] pgsql: Fix pg_dump to dump shell types.

I wrote:
> But now that you mention it, isn't that completely broken?  What pg_dump
> actually prints given this made-up data is

> REVOKE ALL ON TYPE myshell FROM PUBLIC;
> REVOKE ALL ON TYPE myshell FROM postgres;
> GRANT ALL ON TYPE myshell TO PUBLIC;

> which seems like a completely insane interpretation.  There is no way
> that dumping a type from a pre-typacl database and restoring it into
> a newer one should end up with the type's owner having no privileges
> on it.  I'm astonished that we've not gotten bug reports about that.

... of course, the reason for no field reports is that the owner still
has privileges, as she inherits them from PUBLIC.  Doesn't make this
any less broken though.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] linux sparc compile issue

Waldemar Brodkorb  writes:
> while doing regular builds via buildroot autobuilders
> a compile problem for sparc 32bit v8 was found.
> It seems the defines for Linux are other than for Solaris.

> Following patch fixes it for buildroot:
> The gcc predefines for Linux are __sparc_v8__/__sparc_v7__

I've applied your suggested patch for this, but I'm a bit curious what
version of gcc you are using; our code's been like that for a very long
time and nobody complained before.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [sqlsmith] subplan variable reference / unassigned NestLoopParams

Andreas Seltenreich  writes:
> Tom Lane writes:
>> Andreas Seltenreich  writes:
>>> Tom Lane writes:
 Well, I certainly think all of these represent bugs:
 3 | ERROR:  plan should not reference subplan's variable
 2 | ERROR:  failed to assign all NestLoopParams to plan nodes

>>> These appear to be related.  The following query produces the former,
>>> but if you replace the very last reference of provider with the literal
>>> 'bar', it raises the latter error.

>> Fixed that, thanks for the test case!

> I haven't seen the former since your commit, but there recently were
> some instances of the latter.  The attached queries all throw the error
> against master at 8752bbb.

Turns out my patch only addressed some cases of the underlying issue.
I've gone back for a second swipe at it; thanks for the continued testing!

BTW, the commit hashes you've been mentioning don't seem to correspond to
anything in the master PG repo ... are you working with modified sources?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [COMMITTERS] pgsql: Fix pg_dump to dump shell types.

Peter Eisentraut  writes:
> On 8/9/15 6:23 PM, Tom Lane wrote:
>> It looks to me like the reason for this is that pg_dump forces the
>> "typacl" of a type to be '{=U}' when reading the schema data for a
>> pre-9.2 type, rather than reading it as NULL (ie default permissions)
>> which would result in not printing any grant/revoke commands for
>> the object.
>> 
>> I do not see a good reason for that; quite aside from this problem,
>> it means there is one more place that knows the default permissions
>> for a type than there needs to be.  Peter, what was the rationale?

> This was probably just copied from how proacl and lanacl are handled,
> which predate typacl by quite a bit.  Maybe there was a reason in those
> days.

Hm ... I wonder whether those are well-thought-out either.

> It might also have something to do with how owner privileges are
> handled.  An explicit '{=U}' doesn't create owner privileges, unlike a
> null value in that field.  Maybe this is necessary if you dump and
> restore between databases with different user names.

But now that you mention it, isn't that completely broken?  What pg_dump
actually prints given this made-up data is

REVOKE ALL ON TYPE myshell FROM PUBLIC;
REVOKE ALL ON TYPE myshell FROM postgres;
GRANT ALL ON TYPE myshell TO PUBLIC;

which seems like a completely insane interpretation.  There is no way
that dumping a type from a pre-typacl database and restoring it into
a newer one should end up with the type's owner having no privileges
on it.  I'm astonished that we've not gotten bug reports about that.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Summary of plans to avoid the annoyance of Freezing

2015-08-10 Thread Alvyhank

Having a freeze map would be wholly unnecessary if we don't ever need to
freeze whole tables again. Freezing would still be needed on individual
blocks where an old row has been updated or deleted; a freeze map would not
help there either.

So there is no conflict, but options 2) and 3) are completely redundant if
we go for 5). After investigation, I now think 5) is achievable in 9.6, but
if I am wrong for whatever reason, we have 2) as a backstop for more go to h
ttp://www.pillenpalast.com/   



-
Kamagra http://www.pillenpalast.com/
--
View this message in context: 
http://postgresql.nabble.com/Summary-of-plans-to-avoid-the-annoyance-of-Freezing-tp5861530p5861534.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] FSM versus GIN pending list bloat

2015-08-10 Thread Jeff Janes

On Tue, Aug 4, 2015 at 12:38 PM, Jeff Janes  wrote:

> On Tue, Aug 4, 2015 at 1:39 AM, Simon Riggs  wrote:
>
>> On 4 August 2015 at 06:03, Jeff Janes  wrote:
>>
>>
>>> The attached proof of concept patch greatly improves the bloat for both
>>> the insert and the update cases.  You need to turn on both features: adding
>>> the pages to fsm, and vacuuming the fsm, to get the benefit (so JJ_GIN=3).
>>> The first of those two things could probably be adopted for real, but the
>>> second probably is not acceptable.  What is the right way to do this?
>>> Could a variant of RecordFreeIndexPage bubble the free space up `the map
>>> immediately rather than waiting for a vacuum?  It would only have to move
>>> up until it found a page with freespace already recorded in it, which the
>>> vast majority of the time would mean observing up one level and then not
>>> writing to it, assuming the pending list pages remain well clustered.
>>>
>>
>> You make a good case for action here since insert only tables with GIN
>> indexes on text are a common use case for GIN.
>>
>> Why would vacuuming the FSM be unacceptable? With a
>> large gin_pending_list_limit it makes sense.
>>
>
> But with a smallish gin_pending_list_limit (like the default 4MB) this
> could be called a lot (multiple times a second during some spurts), and
> would read the entire fsm each time.
>
>
>>
>> If it is unacceptable, perhaps we can avoid calling it every time, or
>> simply have FreeSpaceMapVacuum() terminate more quickly on some kind of
>> 80/20 heuristic for this case.
>>
>
> Or maybe it could be passed a range of blocks which need vacuuming, so it
> concentrated on that range.
>
> But from the README file, it sounds like it is already supposed to be
> bubbling up.  I'll have to see just whats going on there when I get a
> chance.
>

Before making changes to the FSM code to make immediate summarization
possible, I decided to quantify the effect of vacuuming the entire fsm.
Going up to 5 GB of index size, the time taken to vacuum the entire FSM one
time for each GIN_NDELETE_AT_ONCE was undetectable.

Based on that, I made this patch which vacuums it one time per completed
ginInsertCleanup, which should be far less than once per
GIN_NDELETE_AT_ONCE.

I would be interested in hearing what people with very large GIN indexes
think of it.  It does seem like at some point the time needed must become
large, but from what I can tell that point is way beyond what someone is
likely to have for an index on an unpartitioned table.

I have a simple test case that inserts an array of 101 md5 digests into
each row.  With 10_000 of these rows inserted into an already indexed
table, I get 40MB for the table and 80MB for the index unpatched.  With the
patch, I get 7.3 MB for the index.

Cheers,

Jeff

gin_fast_freespace_v001.patch
Description: Binary data

gin_freespace2.pl
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Asynchronous execution on FDW

2015-08-10 Thread Stephen Frost

* Robert Haas (robertmh...@gmail.com) wrote:
> On Mon, Aug 10, 2015 at 3:23 AM, Heikki Linnakangas  wrote:
> > I've marked this as rejected in the commitfest, because others are
> > working on a more general solution with parallel workers. That's still
> > work-in-progress, and it's not certain if it's going to make it into
> > 9.6, but if it does it will largely render this obsolete. We can revisit
> > this patch later in the release cycle, if the parallel scan patch hasn't
> > solved the same use case by then.
> 
> I think the really important issue for this patch is the one discussed here:
> 
> http://www.postgresql.org/message-id/ca+tgmoaijk1svzw_gkfu+zssxcijkfelqu2aomvuphpsfw4...@mail.gmail.com

I agree that it'd be great to figure out the answer to #2, but I'm also
of the opinion that we can either let the user tell us through the use
of the GUCs proposed in the patch or simply not worry about the
potential for time wastage associated with starting them all at once, as
you suggested there.

> You raised an important issue there but never really expressed an
> opinion on the points I raised, here or on the other thread.  And
> neither did anyone else except the patch author who, perhaps
> unsurprisingly, thinks it's OK.  I wish we could get more discussion
> about that.

When I read the proposal, I had the same reaction that it didn't seem
like quite the right place and it further bothered me that it was
specific to FDWs.

Perhaps not surprisingly, as I authored it, but I'm still a fan of my
proposal #1 here:

http://www.postgresql.org/message-id/20131104032604.gb2...@tamriel.snowman.net

More generally, I completely agree with the position (I believe your's,
but I might be misremembering) that we want to have this async
capability independently and in addition to parallel scan.  I don't
believe one obviates the advantages of the other.

Thanks!

Stephen

signature.asc
Description: Digital signature

Re: [HACKERS] [COMMITTERS] pgsql: Fix pg_dump to dump shell types.

2015-08-10 Thread Peter Eisentraut

On 8/9/15 6:23 PM, Tom Lane wrote:
> It looks to me like the reason for this is that pg_dump forces the
> "typacl" of a type to be '{=U}' when reading the schema data for a
> pre-9.2 type, rather than reading it as NULL (ie default permissions)
> which would result in not printing any grant/revoke commands for
> the object.
> 
> I do not see a good reason for that; quite aside from this problem,
> it means there is one more place that knows the default permissions
> for a type than there needs to be.  Peter, what was the rationale?

This was probably just copied from how proacl and lanacl are handled,
which predate typacl by quite a bit.  Maybe there was a reason in those
days.

It might also have something to do with how owner privileges are
handled.  An explicit '{=U}' doesn't create owner privileges, unlike a
null value in that field.  Maybe this is necessary if you dump and
restore between databases with different user names.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: SCRAM authentication

2015-08-10 Thread Stephen Frost

* Robert Haas (robertmh...@gmail.com) wrote:
> On Sat, Aug 8, 2015 at 1:23 PM, Heikki Linnakangas  wrote:
> > On 08/08/2015 04:27 PM, Robert Haas wrote:
> >> I don't see that there's any good reason to allow the same password to
> >> be stored in the catalog encrypted more than one way,
> >
> > Sure there is. If you want to be able to authenticate using different
> > mechanism, you need the same password "encrypted" in different ways. SCRAM
> > uses verifier that's derived from the password in one way, MD5
> > authentication needs an MD5 hash, and yet other protocols have other
> > requirements.
> 
> Why do we need to be able to authenticate using more than one
> mechanism?  If you have some clients that can't support SCRAM yet, you
> might as well continue using MD5 across the board until that changes.
> You're not going to get much real security out of using MD5 for some
> authentication attempts and SCRAM for other ones, and the amount of
> infrastructure we're proposing to introduce to support that is pretty
> substantial.

I agree with Josh that requiring a hard switch simply won't be
acceptable to our user community and if decide "there can only be one
running at a time" then we'll never get people to move off md5 and
therefore we wouldn't ever make real progress here.  Apologies to Josh
if I've misconstrued anything in his excellent response.

> >> and I don't think there's any good reason to introduce the PASSWORD
> >> VERIFIER terminology.  I think we should store (1) your password,
> >> either encrypted or unencrypted; and (2) the method used to encrypt
> >> it.  And that's it.
> >
> >
> > Like Joe and Stephen, I actually find it highly confusing that we call the
> > MD5 hash an "encrypted password". The term "password verifier" is fairly
> > common in the specifications of authentication mechanisms. I think we should
> > adopt it.
> 
> OK, but it sure looked from Michael's syntax description like you
> wrote PASSWORD VERIFIER (md5 'the_actual_password').   Or at least
> that was my impression from reading it, maybe I got it wrong.  If you
> want to introduce ALTER USER ... PASSWORD VERIFIER as alternative
> syntax for what we now call ALTER USER ... ENCRYPTED PASSWORD, that
> works for me.  But a plaintext password shouldn't be called a password
> verifier under the terminology you are using here, IIUC.

Apologies for my incomplete up-thread definition of a password verifier,
it was a bit off-the-cuff while I was out of the country and referred to
the specific password verifier in SCRAM, which isn't a cleartext
or encrypted password or a simple hash of it, or a salted hash.

To flip it around as a positive definition instead of the negative "it's
not X, Y or Z" which I provided up-thread, a "password verifier" is a
general term for anything that can be used to verify that the client
knows the right password, so it technically could be the cleartext
version of the password, or a simple hash of the password, or an
encrypted version of the password, or pretty much anything else that
works for the protocol.  The reason it came about was because of this
very issue that there wasn't a general term for the value- it was
referred to as a salted hash, or encrypted, or just hashed, etc, and
those terms don't apply to everything that can be used today as a
password verifier.

As such, it's actually correct usage in this case as it's a general
term, rather than the specific and incorrect term "encrypted" which we
have today.

Thanks!

Stephen

signature.asc
Description: Digital signature

Re: [HACKERS] checkpointer continuous flushing

On August 10, 2015 8:24:21 PM GMT+02:00, Fabien COELHO  
wrote:
>
>Hello Andres,
>
>> You can't allocate 4GB with palloc(), it has a builtin limit against
>> allocating more than 1GB.
>
>Argh, too bad, I assumed very naively that palloc was malloc in
>disguise.

It is, but there's some layering (memory pools/contexts) on top. You can get 
huge allocations with polloc_huge.

>Then the file would be fsynced twice: if the fsync is done properly
>(data 
>have already been flushed to disk) then it would not cost much, and
>doing 
>it sometimes twice on some file would not be a big issue. The code
>could 
>also detect such event and log a warning, which would give a hint about
>
>how often it occurs in practice.

Right. At the cost of keeping track of all files...

 If the pivot element changes its identity won't the result be
>pretty much
 random?
>>>
>>> That would be a very unlikely event, given the short time spent in
>>> qsort.
>>
>> Meh, we don't want to rely on "likeliness" on such things.
>
>My main argument is that even if it occurs, and the qsort result is
>partly 
>wrong, it does not change correctness, it just mean that the actual
>writes 
>will be less in order than wished. If it occurs, one pivot separation 
>would be quite strange, but then others would be right, so the buffers 
>would be "partly sorted".

It doesn't matter for correctness today, correct. But it makes out impossible 
to rely on or too.

>Another issue I see is that even if buffers are locked within cmp, the 
>status may change between two cmp...

Sure. That's not what in suggesting. Earlier versions of the patch kept an 
array of buffer headers exactly because of that.

 I do not think that locking all 
>buffers for sorting them is an option. So on the whole, I think that 
>locking buffers for sorting is probably not possible with the simple
>(and 
>efficient) lightweight approach used in the patch.

Yes, the other version has a higher space overhead. I'm not convinced that's 
meaningful in comparison to shared buffets in space.
And rather doubtful it a loss performance wise in a loaded server. All the 
buffer headers are touched on other cores and doing the sort with indirection 
will greatly increase bus traffic.

>The good news, as I argued before, is that the order is only advisory
>to 
>help with performance, but the correctness is really that all
>checkpoint 
>buffers are written and fsync is called in the end, and does not depend
>on 
>the buffer order. That is how it currently works anyway

It's not particularly desirable to have a performance feature that works less 
well if the server is heavily and concurrently loaded. The likelihood of bogus 
sort results will increase with the churn rate in shared buffers.

Andres

--- 
Please excuse brevity and formatting - I am writing this on my mobile phone.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: SCRAM authentication

2015-08-10 Thread Josh Berkus

On 08/09/2015 07:19 PM, Michael Paquier wrote:
>> ... during my exchange with Michael, I was thinking about the bug
>> > potential of taking the password field and multiplexing it in some way,
>> > which is significant.  There is a definite risk of "making this too
>> > complicated" and we'll need to contrast that against ease-of-migration,
>> > because complicated mechanisms tend to be less secure due to user error.

> Sure. That's why I am all in for adding a compatibility GUC or similar
> that enforces the removal of old verifier types after marking those as
> deprecated for a couple of years as there's surely a significant risk
> to keep old passwords around or bad pg_hba entries. Still we need IMO
> a way for a user to save multiple verifiers generated from a client to
> manage carefully the password verifier aging, deprecations and support
> removal.

That still falls under the heading of "possibly too complicated" though.

As I see it, there's two potential migration paths:

1. Allow multiple verifiers for each login role (Heikki's plan).

2. Set verifier type per login role.

(2) has the advantage of not requiring a bunch of new scaffolding
(although it will require a little) and thus being less likely to
introduce new bugs.  It also doesn't require adding new catalog
structures which are really only needed for the migration period, and
after which will become a wart (i.e. having multiple verifiers per login
role).

In real migration terms, though, (2) has some major drawbacks in terms
of making migration much harder.

a) it won't work for Heroku and other 1-login-per-database hosting.

b) moving to multiple roles from single roles per app is a painful
process currently.

For (a), one could argue that these are good candidates for "all at
once" migrations, and that moving to SCRAM will depend on the host
supporting it.  Someone from Heroku could speak up here.

For (b), there are a lot of things we could do to make migrating to a
multiple-role infra much easier for users, which would continue to be
useful even after the migration to SCRAM is history:

* remove the role requirement for ALTER DEFAULT PRIVILEGES, and ...

* add ALTER ROLE ___ ALTER DEFAULT OWNER, a command which sets the
default owner of newly created objects by that login role to a different
role of which they are a member.  Alternatively, add a way to make a
default SET ROLE whenever a login role logs in.

These two changes, or changes like them that serve the same purpose,
would allow us to prescribe the following migration path for most users:

1. add a new login role which is a member of the old login role and uses
SCRAM;

2. set the defaults for that role so that its objects and permissions
belong to the parent role;

3. move all applications to using SCRAM and the new role;

4. disable logins on the old, parent role.

It's currently (2) which is painfully difficult, and could be made less
so via the two features recommended above.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] CREATE POLICY and RETURNING

2015-08-10 Thread Stephen Frost

Zhaomo,

* Zhaomo Yang (zmp...@gmail.com) wrote:
> This thread has a pretty thorough discussion of pros and cons of applying
> SELECT policy to other commands. Besides what have been mentioned, I think
> there is another potential pro: we can enable references to pseudorelations
> OLD and NEW in predicates. Now, for UPDATE, the references to the target
> table in USING clause are actually references to OLD and the references in
> WITH CHECK clause are references to NEW. Logically now USING and WITH CHECK
> are combined by AND, so we cannot have predicates like

For my part, I find that the simplicity of having USING only ever refer
to existing records and WITH CHECK only ever refer to records being
added to be good and I'm concerned that this approach would be
confusing.  If no NEW or OLD is used, what happens?  Or would you have
to always specify OLD/NEW for UPDATE, and then what about for the other
policies, and the FOR ALL policies?

>   foo(NEW) > 1 OR bar(OLD) > 1   (combine predicates referencing OLD
> and NEW by an operation other than AND)

Your statement that this can't be done with the existing policy approach
is incorrect, or I've misunderstood what you mean above.  This could be
accomplished with "USING (bar > 1)" and "WITH CHECK (foo > 1)", no?
Your sentence above that "USING and WITH CHECK are combined by AND"
isn't correct either- they're independent and are therefore really OR'd.
If they were AND'd then the new record would have to pass both USING and
WITH CHECK policies.

>   NEW.id <> OLD.id(reference both in the same expression)

Here you're correct that this isn't something the existing approach to
UPDATE policies can support.  I don't intend to simply punt on this, but
I will point out that this particular requirement can be handled in a
trigger, if desired.

Further, I'm not sure that I see how this would work in a case where you
have the SELECT policy (which clearly could only refer to OLD) applied
first, as you suggest?

> If we apply SELECT policy to other commands, we only need one predicate for
> INSERT/UPDATE/DELETE. That predicate can reference to OLD and NEW, just like
> predicates for triggers and rules. For UPDATE and DELETE, the predicate of
> SELECT will be applied first (when the target table is scanned) to ensure no
> information leakage and their own predicate will be applied later. This
> doesn't change much for INSERT and DELETE, but it gives users more
> flexibility when they set predicates for UPDATE.

OLD and NEW are only applicable for the UPDATE case and requiring those
to be used for the other policies is adding complexity for a pretty
narrow use-case, as is combining the SELECT USING policy with the USING
(or any) policy for the other commands.

Further, it clearly reduces the range of options for UPDATE as it means
that you can't do blind updates or deletes.  Perhaps that's sensible,
but we could do that by simply AND'ing the SELECT USING policy with the
other command USING policy, but Dean was against that idea up-thread, as
am I, because it adds complexity, and it would further mean that neither
of your suggested predicates above would be supported, as I explained
above.

Also, as far as I see, none of this really amounts to anything different
with regard to RETURNING than the previous proposal of simply applying
the SELECT USING policy to the records first, and the records returned
could still not be allowed by the SELECT policy as they would be the
results of the update, which could still pass the UPDATE policy for new
records.  Applying the SELECT USING policy against the RETURNING records
strikes me, more and more, as just complexity which will cause more
confusion than it helps anyone.  We definitely need to explain how the
USING clause works for the commands and how that impacts RETURNING, as
I've mentioned in the 9.5 open items list, and I'm planning to tackle
that this week.

Thanks!

Stephen

signature.asc
Description: Digital signature

[HACKERS] fix oversight converting buf_id to Buffer

2015-08-10 Thread Qingqing Zhou

Attached patch fixes oversights converting buf_id to Buffer in
PrintBufferDescs() and InvalidateBuffer(). Especially for the latter,
the reason we haven't seen any reports of the issue might be that it
needs certain concurrent conditions to be true.

Along the line, it also changes all direct maths against buf_id to use
BufferDescriptorGetBuffer() instead.

Regards,
Qingqing
diff --git a/src/backend/storage/buffer/bufmgr.c 
b/src/backend/storage/buffer/bufmgr.c
new file mode 100644
index e4b2558..2e9a7c7
*** a/src/backend/storage/buffer/bufmgr.c
--- b/src/backend/storage/buffer/bufmgr.c
*** retry:
*** 1273,1279 
UnlockBufHdr(buf);
LWLockRelease(oldPartitionLock);
/* safety check: should definitely not be our *own* pin */
!   if (GetPrivateRefCount(buf->buf_id) > 0)
elog(ERROR, "buffer is pinned in InvalidateBuffer");
WaitIO(buf);
goto retry;
--- 1273,1279 
UnlockBufHdr(buf);
LWLockRelease(oldPartitionLock);
/* safety check: should definitely not be our *own* pin */
!   if (GetPrivateRefCount(BufferDescriptorGetBuffer(buf)) > 0)
elog(ERROR, "buffer is pinned in InvalidateBuffer");
WaitIO(buf);
goto retry;
*** ReleaseAndReadBuffer(Buffer buffer,
*** 1426,1441 
  static bool
  PinBuffer(volatile BufferDesc *buf, BufferAccessStrategy strategy)
  {
!   int b = buf->buf_id;
boolresult;
PrivateRefCountEntry *ref;
  
!   ref = GetPrivateRefCountEntry(b + 1, true);
  
if (ref == NULL)
{
ReservePrivateRefCountEntry();
!   ref = NewPrivateRefCountEntry(b + 1);
  
LockBufHdr(buf);
buf->refcount++;
--- 1426,1441 
  static bool
  PinBuffer(volatile BufferDesc *buf, BufferAccessStrategy strategy)
  {
!   Buffer  b = BufferDescriptorGetBuffer(buf);
boolresult;
PrivateRefCountEntry *ref;
  
!   ref = GetPrivateRefCountEntry(b, true);
  
if (ref == NULL)
{
ReservePrivateRefCountEntry();
!   ref = NewPrivateRefCountEntry(b);
  
LockBufHdr(buf);
buf->refcount++;
*** PinBuffer(volatile BufferDesc *buf, Buff
*** 1460,1467 
  
ref->refcount++;
Assert(ref->refcount > 0);
!   ResourceOwnerRememberBuffer(CurrentResourceOwner,
!   
BufferDescriptorGetBuffer(buf));
return result;
  }
  
--- 1460,1466 
  
ref->refcount++;
Assert(ref->refcount > 0);
!   ResourceOwnerRememberBuffer(CurrentResourceOwner, b);
return result;
  }
  
*** PinBuffer(volatile BufferDesc *buf, Buff
*** 1489,1511 
  static void
  PinBuffer_Locked(volatile BufferDesc *buf)
  {
!   int b = buf->buf_id;
PrivateRefCountEntry *ref;
  
/*
 * As explained, We don't expect any preexisting pins. That allows us to
 * manipulate the PrivateRefCount after releasing the spinlock
 */
!   Assert(GetPrivateRefCountEntry(b + 1, false) == NULL);
  
buf->refcount++;
UnlockBufHdr(buf);
  
!   ref = NewPrivateRefCountEntry(b + 1);
ref->refcount++;
  
!   ResourceOwnerRememberBuffer(CurrentResourceOwner,
!   
BufferDescriptorGetBuffer(buf));
  }
  
  /*
--- 1488,1509 
  static void
  PinBuffer_Locked(volatile BufferDesc *buf)
  {
!   Buffer  b = BufferDescriptorGetBuffer(buf);
PrivateRefCountEntry *ref;
  
/*
 * As explained, We don't expect any preexisting pins. That allows us to
 * manipulate the PrivateRefCount after releasing the spinlock
 */
!   Assert(GetPrivateRefCountEntry(b, false) == NULL);
  
buf->refcount++;
UnlockBufHdr(buf);
  
!   ref = NewPrivateRefCountEntry(b);
ref->refcount++;
  
!   ResourceOwnerRememberBuffer(CurrentResourceOwner, b);
  }
  
  /*
*** static void
*** 1520,1533 
  UnpinBuffer(volatile BufferDesc *buf, bool fixOwner)
  {
PrivateRefCountEntry *ref;
  
/* not moving as we're likely deleting it soon anyway */
!   ref = GetPrivateRefCountEntry(buf->buf_id + 1, false);
Assert(ref != NULL);
  
if (fixOwner)
!   ResourceOwnerForgetBuffer(CurrentResourceOwner,
! 
BufferDescriptorGetBuffer(buf));
  
Assert(ref->refcount > 0);
ref->refcount--;
--- 1518,1531 
  UnpinBuffer(volatile BufferDesc *buf, bool fixOwner)
  {
PrivateRefCountEntry *ref;
+   Buffer  b = BufferDescriptorGetBuffer(b

Re: [HACKERS] checkpointer continuous flushing




Ok ok, I stop resisting... I'll have a look.


Here is a v7 a&b version which uses shared memory instead of palloc.

--
Fabien.diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e900dcc..1cec243 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2454,6 +2454,28 @@ include_dir 'conf.d'
   
  
 
+ 
+  checkpoint_sort (bool)
+  
+   checkpoint_sort configuration parameter
+  
+  
+  
+   
+Whether to sort buffers before writting them out to disk on checkpoint.
+For a HDD storage, this setting allows to group together
+neighboring pages written to disk, thus improving performance by
+reducing random write activity.
+This sorting should have limited performance effects on SSD backends
+as such storages have good random write performance, but it may
+help with wear-leveling so be worth keeping anyway.
+The default is on.
+This parameter can only be set in the postgresql.conf
+file or on the server command line.
+   
+  
+ 
+
  
   checkpoint_warning (integer)
   
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index e3941c9..f538698 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -546,6 +546,18 @@
   
 
   
+   When hard-disk drives (HDD) are used for terminal data storage
+allows to sort pages
+   so that neighboring pages on disk will be flushed together by
+   chekpoints, reducing the random write load and improving performance.
+   If solid-state drives (SSD) are used, sorting pages induces no benefit
+   as their random write I/O performance is good: this feature could then
+   be disabled by setting checkpoint_sort to off.
+   It is possible that sorting may help with SSD wear leveling, so it may
+   be kept on that account.
+  
+
+  
The number of WAL segment files in pg_xlog directory depends on
min_wal_size, max_wal_size and
the amount of WAL generated in previous checkpoint cycles. When old log
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 68e33eb..bee38ab 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7995,11 +7995,13 @@ LogCheckpointEnd(bool restartpoint)
 sync_secs,
 total_secs,
 longest_secs,
+sort_secs,
 average_secs;
 	int			write_usecs,
 sync_usecs,
 total_usecs,
 longest_usecs,
+sort_usecs,
 average_usecs;
 	uint64		average_sync_time;
 
@@ -8030,6 +8032,10 @@ LogCheckpointEnd(bool restartpoint)
 		CheckpointStats.ckpt_end_t,
 		&total_secs, &total_usecs);
 
+	TimestampDifference(CheckpointStats.ckpt_sort_t,
+		CheckpointStats.ckpt_sort_end_t,
+		&sort_secs, &sort_usecs);
+
 	/*
 	 * Timing values returned from CheckpointStats are in microseconds.
 	 * Convert to the second plus microsecond form that TimestampDifference
@@ -8048,8 +8054,8 @@ LogCheckpointEnd(bool restartpoint)
 
 	elog(LOG, "%s complete: wrote %d buffers (%.1f%%); "
 		 "%d transaction log file(s) added, %d removed, %d recycled; "
-		 "write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s; "
-		 "sync files=%d, longest=%ld.%03d s, average=%ld.%03d s; "
+		 "sort=%ld.%03d s, write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s;"
+		 " sync files=%d, longest=%ld.%03d s, average=%ld.%03d s; "
 		 "distance=%d kB, estimate=%d kB",
 		 restartpoint ? "restartpoint" : "checkpoint",
 		 CheckpointStats.ckpt_bufs_written,
@@ -8057,6 +8063,7 @@ LogCheckpointEnd(bool restartpoint)
 		 CheckpointStats.ckpt_segs_added,
 		 CheckpointStats.ckpt_segs_removed,
 		 CheckpointStats.ckpt_segs_recycled,
+		 sort_secs, sort_usecs / 1000,
 		 write_secs, write_usecs / 1000,
 		 sync_secs, sync_usecs / 1000,
 		 total_secs, total_usecs / 1000,
diff --git a/src/backend/storage/buffer/buf_init.c b/src/backend/storage/buffer/buf_init.c
index 3ae2848..ec2436f 100644
--- a/src/backend/storage/buffer/buf_init.c
+++ b/src/backend/storage/buffer/buf_init.c
@@ -65,7 +65,8 @@ void
 InitBufferPool(void)
 {
 	bool		foundBufs,
-foundDescs;
+foundDescs,
+foundCpid;
 
 	/* Align descriptors to a cacheline boundary. */
 	BufferDescriptors = (BufferDescPadded *) CACHELINEALIGN(
@@ -77,10 +78,14 @@ InitBufferPool(void)
 		ShmemInitStruct("Buffer Blocks",
 		NBuffers * (Size) BLCKSZ, &foundBufs);
 
-	if (foundDescs || foundBufs)
+	CheckpointBufferIds = (int *)
+		ShmemInitStruct("Checkpoint BufferIds",
+		NBuffers * sizeof(int), &foundCpid);
+
+	if (foundDescs || foundBufs || foundCpid)
 	{
-		/* both should be present or neither */
-		Assert(foundDescs && foundBufs);
+		/* all should be present or neither */
+		Assert(foundDescs && foundBufs && foundCpid);
 		/* note: this path is only taken in EXEC_BACKEND case */
 	}
 	else
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index e4b25587..ba5298d 100644
--- a/src/backend/storage/buffer/bufmgr.

[HACKERS] linux sparc compile issue

2015-08-10 Thread Waldemar Brodkorb

Hi,

while doing regular builds via buildroot autobuilders
a compile problem for sparc 32bit v8 was found.
It seems the defines for Linux are other than for Solaris.

Following patch fixes it for buildroot:

The gcc predefines for Linux are __sparc_v8__/__sparc_v7__

Signed-off-by: Waldemar Brodkorb 

diff -Nur postgresql-9.4.4.orig/src/include/storage/s_lock.h 
postgresql-9.4.4/src/include/storage/s_lock.h
--- postgresql-9.4.4.orig/src/include/storage/s_lock.h  2015-06-09 
21:29:38.0 +0200
+++ postgresql-9.4.4/src/include/storage/s_lock.h   2015-08-09 
19:57:06.0 +0200
@@ -420,12 +420,12 @@
 :  "=r"(_res), "+m"(*lock)
 :  "r"(lock)
 :  "memory");
-#if defined(__sparcv7)
+#if defined(__sparcv7) || defined(__sparc_v7__)
/*
 * No stbar or membar available, luckily no actually produced hardware
 * requires a barrier.
 */
-#elif defined(__sparcv8)
+#elif defined(__sparcv8) || defined(__sparc_v8__)
/* stbar is available (and required for both PSO, RMO), membar isn't */
__asm__ __volatile__ ("stbar \n":::"memory");
 #else
@@ -438,13 +438,13 @@
return (int) _res;
 }
 
-#if defined(__sparcv7)
+#if defined(__sparcv7) || defined(__sparc_v7__)
 /*
  * No stbar or membar available, luckily no actually produced hardware
  * requires a barrier.
  */
 #define S_UNLOCK(lock) (*((volatile slock_t *) (lock)) = 0)
-#elif defined(__sparcv8)
+#elif defined(__sparcv8) || defined(__sparc_v8__)
 /* stbar is available (and required for both PSO, RMO), membar isn't */
 #define S_UNLOCK(lock) \
 do \


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] checkpointer continuous flushing



Hello Andres,


You can't allocate 4GB with palloc(), it has a builtin limit against
allocating more than 1GB.


Argh, too bad, I assumed very naively that palloc was malloc in disguise.


[...]

Well, then everytime the checkpointer is restarted.


Hm...


The point is that it's done at postmaster startup, and we're pretty much
guaranteed that the memory will availabl.e.


Ok ok, I stop resisting... I'll have a look.

Would it also fix the 1 GB palloc limit on the same go? I guess so...



That reasoning makes it impossible to move the fsyncing of files into the
loop (whenever we move to a new file). That's not nice.


I do not see why.


Because it means that the sorting isn't necessarily correct. I.e. we
can't rely on it to determine whether a file has already been fsynced.


Ok, I understand your point.

Then the file would be fsynced twice: if the fsync is done properly (data 
have already been flushed to disk) then it would not cost much, and doing 
it sometimes twice on some file would not be a big issue. The code could 
also detect such event and log a warning, which would give a hint about 
how often it occurs in practice.



Hm. Is that actually the case for our qsort implementation?


I think that it is hard to write a qsort which would fail that. That would
mean that it would compare the same items twice, which would be inefficient.


What? The same two elements aren't frequently compared pairwise with 
each other, but of course an individual element is frequently compared 
with other elements.


Sure.

Consider what happens when the chosen pivot element changes its identity 
after already dividing half. The two partitions will not be divided in 
any meaning full way anymore. I don't see how this will results in a 
meaningful sort.


It would be partly meaningful, which is enough for performance, and does 
not matter for correctness: currently buffers are not sorted at all and it 
works, even if it does not work well.



If the pivot element changes its identity won't the result be pretty much
random?


That would be a very unlikely event, given the short time spent in
qsort.


Meh, we don't want to rely on "likeliness" on such things.


My main argument is that even if it occurs, and the qsort result is partly 
wrong, it does not change correctness, it just mean that the actual writes 
will be less in order than wished. If it occurs, one pivot separation 
would be quite strange, but then others would be right, so the buffers 
would be "partly sorted".


Another issue I see is that even if buffers are locked within cmp, the 
status may change between two cmp... I do not think that locking all 
buffers for sorting them is an option. So on the whole, I think that 
locking buffers for sorting is probably not possible with the simple (and 
efficient) lightweight approach used in the patch.


The good news, as I argued before, is that the order is only advisory to 
help with performance, but the correctness is really that all checkpoint 
buffers are written and fsync is called in the end, and does not depend on 
the buffer order. That is how it currently works anyway.


If you block on this then I'll put a heavy weight approach, but that would 
be a waste of memory in my opinion, hence my argumentation for the 
lightweight approach.


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Summary of plans to avoid the annoyance of Freezing

2015-08-10 Thread Josh Berkus

On 08/10/2015 10:31 AM, Simon Riggs wrote:
> Freezing is not a necessary pre-condition for either of those things, I
> am happy to say. There is confusion here because for ( 1 ) the shrink
> was performed after freezing, but when you have access to the epoch
> there is no need for exhaustive freezing - only in special cases, as
> noted. If we are lucky those special cases will mean a massive reduction
> in I/O. For ( 2 ) a normal VACUUM is sufficient and as Robert observes,
> maybe just HOT is enough.

Yeah, saw your explanation on this on the other thread.  Good point.

Question: does regular vacuum update the visibility map in the same way
vacuum freeze does?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index

2015-08-10 Thread Jeff Janes

On Thu, Oct 30, 2014 at 5:30 AM, Fujii Masao  wrote:

> On Thu, Oct 30, 2014 at 7:30 PM, Etsuro Fujita
>  wrote:
>
> > +   {
> > +   {"pending_list_cleanup_size", PGC_USERSET,
> > CLIENT_CONN_STATEMENT,
> > +   gettext_noop("Sets the maximum size of the
> pending
> > list for GIN index."),
> > +NULL,
> > +   GUC_UNIT_KB
> > +   },
> > +   &pending_list_cleanup_size,
> > +   4096, 0, MAX_KILOBYTES,
> > +   NULL, NULL, NULL
> > +   },
> >
> > ISTM it'd be better to use RESOURCES_MEM, not CLIENT_CONN_STATEMENT. No?
>
> Yes if the pending list always exists in the memory. But not, IIUC.
> Thought?
>
> > Also why not set min to 64, not to 0, in accoradance with that of
> work_mem?
>
> I'm OK to use 64. But I just chose 0 because I could not think of any
> reasonable
> reason why 64k is suitable as the minimum size of the pending list.
> IOW, I have no idea about whether it's reasonable to use the min value of
> work_mem as the min size of the pending list.
>

I know I am late to the party here, but would like to have the minimum be
0, not 64.  As long as by zero, it means it doesn't insert anything into
the pending list, rather than inserting and promptly cleaning it up.

The reason for this is that if I am trying to decide what
pending_list_cleanup_size I want to set for the index in the indexes
storage parameters, the way to find that out is try a bunch of different
ones through the guc while the index is still at the default of no
overriding storage parameter.  It would be nice to try the fastupdate=off
alternative (i.e. the same as pending_list_cleanup_size=0) without having
to change the index itself and change the syntax used in the testing.

Cheers,

Jeff

Re: [HACKERS] [PATCH] pg_upgrade fails when postgres/template1 isn't in default tablespace

2015-08-10 Thread Bruce Momjian

On Fri, Jun 19, 2015 at 07:10:40PM +0300, Marti Raudsepp wrote:
> Hi list

Sorry I am just getting this report.  Thanks to the people who "stalled"
for me.

> One of my databases failed to upgrade successfully and produced this error in
> the copying phase:
> 
> error while copying relation "pg_catalog.pg_largeobject" ("/srv/ssd/
> PG_9.3_201306121/1/12023" to "/PG_9.4_201409291/1/12130"): No such file or
> directory

As with all good bug reports, there are multiple bugs here.  ;-)  Notice
the second path has an invalid prefix:  "/PG_9.4_201409291/1/12130" ---
obviously something is seriously wrong.  The failure of pg_dumpall to
move 'postgres' and 'template1' databases to the new tablespace cannot
explain that, i.e. I could understand pg_upgrade looking for
pg_largeobject in the default data directory, or in the new one, but not
in a path that doesn't even exist.  pg_upgrade tracks old and new
tablespaces separately, so there is obviously something wrong.

And I found it, patch attached.  The pg_upgrade code was testing for the
tablespace of the _old_ object, then assigning the old and _new_
tablespaces based on that.  The first patch fixes that, and should be
backpatched to all supported pg_upgrade versions.

> Turns out this happens when either the "postgres" or "template1" databases 
> have
> been moved to a non-default tablespace. pg_dumpall does not dump attributes
> (such as tablespace) for these databases; pg_upgrade queries the new cluster
> about the tablespace for these relations and builds a broken destination path
> for the copy/link operation.
> 
> The least bad solution seems to be generating ALTER DATBASE SET TABLESPACE
> commands for these from pg_dumpall. Previously a --globals-only dump didn't
> generate psql \connect commands, but you can't run SET TABLESPACE from within
> the same database you're connected to. So to move "postgres", it needs to
> connect to "template1" and vice versa. That seems fine for the purpose of
> pg_upgrade which can assume a freshly created cluster with both databases
> intact.

Yes, seems like a good solution.

> I have implemented a proof of concept patch for this. Currently I'm only
> tackling the binary upgrade failure and not general pg_dumpall.
> 
> Alternatively, we could allow SET TABLESPACE in the current database, which
> seems less ugly to me. A code comment says "Obviously can't move the tables of
> my own database", but it's not obvious to me why. If I'm the only connected
> backend, it seems that any caches and open files could be invalidated. But I
> don't know how big of an undertaking that would be.

Your submitted patch, attached, also looks good, and should be
backpatched.  My only question is whether this should be for all runs of
pg_dumpall, not just in binary upgrade mode.  Comments?

Once we agree on this, I will apply these to all back branches and run
tests by moving template1 before the upgrade to make sure it works for
all PG versions.

> pg_upgrade) misses:
> * Nothing at all is dumped for the template0 database, although ACLs, settings
> and the tablespace can be changed by the user

Uh, we _assume_ no one is connecting to template0, but you are right
people can change things _about_ template0.  It would be nice to
preserve those.

> * template1 & postgres databases retain ACLs and settings, but not attributes
> like TABLESPACE or CONNECTION LIMIT. Other attributes like LC_COLLATE and
> LC_CTYPE can't be changed without recreating the DB, but those don't  matter
> for the pg_upgrade case anyway.
> 
> It seems those would be good material for another patch?

Agreed.  I am not sure how we have gotten this far with so few
complaints about this problem.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
new file mode 100644
index e158c9f..390fc62
*** a/src/bin/pg_upgrade/info.c
--- b/src/bin/pg_upgrade/info.c
*** create_rel_filename_map(const char *old_
*** 140,145 
--- 140,146 
  		const RelInfo *old_rel, const RelInfo *new_rel,
  		FileNameMap *map)
  {
+ 	/* Someday the old/new tablespaces might not match, so handle it. */
  	if (strlen(old_rel->tablespace) == 0)
  	{
  		/*
*** create_rel_filename_map(const char *old_
*** 147,162 
  		 * exist in the data directories.
  		 */
  		map->old_tablespace = old_data;
- 		map->new_tablespace = new_data;
  		map->old_tablespace_suffix = "/base";
- 		map->new_tablespace_suffix = "/base";
  	}
  	else
  	{
  		/* relation belongs to a tablespace, so use the tablespace location */
  		map->old_tablespace = old_rel->tablespace;
- 		map->new_tablespace = new_rel->tablespace;
  		map->old_tablespace_suffix = old_cluster.tablespace_suffix;
  		map->new_tablespace_suffix = new_cluster.tablespace_suffix;
  	}

--- 148,171 
  		 * exist in the data directories.
  		 */
  		map->old_table

Re: [HACKERS] Commitfest remaining "Needs Review" items




* extends pgbench expressions with functions


Robert signed up as reviewer for this a long time ago. Ping, do you still 
plan to review this?


I seem to recall that I nominated Robert when adding the patch, because
this is an extension of his expression patch. So the "sign up" is really
just a hint.


* checkpoint continuous flushing


Per discussion on that thread, let's just do the sorting part now. Needs some 
cleanup, but it's almost there.


Given the performance improvements (yes some tps, but also latency), I 
think that the full patch deserves consideration.


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Summary of plans to avoid the annoyance of Freezing

2015-08-10 Thread Simon Riggs

On 10 August 2015 at 18:02, Josh Berkus  wrote:

> There's a lesser version of this item which remains relevant unless we
> implement (5).  That is, currently the same autovacuum_vaccuum_delay
> (AVVD) applies to regular autovacuums as does to anti-wraparound
> autovacuums.  If the user has set AV with a high delay, this means that
> anti-wraparound AV may never complete.  For that reason, we ought to
> have a separate parameter for AVVD, which defaults to a lower number
> (like 5ms), or even to zero.
>
> Of course, if we implement (5), that's not necessary, since AV will
> never trigger an anti-wraparound freeze.
>

Good idea.

> > Having a freeze map would be wholly unnecessary if we don't ever need to
> > freeze whole tables again. Freezing would still be needed on individual
> > blocks where an old row has been updated or deleted; a freeze map would
> > not help there either.
> >
> > So there is no conflict, but options 2) and 3) are completely redundant
> > if we go for 5). After investigation, I now think 5) is achievable in
> > 9.6, but if I am wrong for whatever reason, we have 2) as a backstop.
>
> It's not redundant.  Users may still want to freeze for two reasons:
>
> 1. to shrink the clog and multixact logs
>
> 2. to support INDEX-ONLY SCAN
>

Freezing is not a necessary pre-condition for either of those things, I am
happy to say. There is confusion here because for ( 1 ) the shrink was
performed after freezing, but when you have access to the epoch there is no
need for exhaustive freezing - only in special cases, as noted. If we are
lucky those special cases will mean a massive reduction in I/O. For ( 2 ) a
normal VACUUM is sufficient and as Robert observes, maybe just HOT is
enough.

In the new world, the clog can be shrunk when everything has been hinted.
Given that is possible with just a normal VACUUM, I think the new
anti-freeze design (hey, we need a name, right?) will mean the clog
actually stays smaller in most cases than it does now.

> In both of those cases, having a freeze map would speed up the manual
> vacuum freeze considerably.  Otherwise, we're just punting on the
> problem, and making it worse for users who wait too long.
>

There would be no further need for the VACUUM FREEZE command. It would do
nothing desirable.

> Now, it might still be the case that the *overhead* of a freeze map is a
> bad tradeoff if we don't have to worry about forced wraparound.  But
> that's a different argument.
>

I myself was in favour of the freeze map solution for some time, but I'm
not anymore. Thank discussions at Pgcon slowly working their way into my
brain.

> BTW, has it occured to anyone that implementing XID epoch headers is
> going to mean messing with multixact logs again?  Just thought I'd open
> a papercut and pour some lemon juice on it.
>

 I doubt we have seen the last of that pain, but its not my fingers on the
chopping board, so squeeze all you want. ;-)

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] checkpointer continuous flushing

On 2015-08-10 19:07:12 +0200, Fabien COELHO wrote:
> I think that there is no issue with the current shared_buffers limit. I
> could allocate and use 4 GB on my laptop without problem. I added a cast to
> ensure that unsigned int are used for the size computation.

You can't allocate 4GB with palloc(), it has a builtin limit against
allocating more than 1GB.

> >>+ /* + * Lazy allocation: this function is called through the
> >>checkpointer, + * but also by initdb. Maybe the allocation could be
> >>moved to the callers. + */ + if (CheckpointBufferIds == NULL) +
> >>AllocateCheckpointBufferIds(); +
> >>
> >
> >I don't think it's a good idea to allocate this on every round.
> >That just means a lot of page table entries have to be built and torn down
> >regularly. It's not like checkpoints only run for 1% of the time or such.
> 
> Sure. It is not allocated on every round, it is allocated once on the first
> checkpoint, the variable tested is static. There is no free. Maybe
> the allocation could be moved to the callers, though.

Well, then everytime the checkpointer is restarted.

> >FWIW, I still think it's a much better idea to allocate the memory once
> >in shared buffers.
> 
> Hmmm. The memory does not need to be shared with other processes?

The point is that it's done at postmaster startup, and we're pretty much
guaranteed that the memory will availabl.e.

> >It's not like that makes us need more memory overall, and it'll be huge
> >page allocations if configured. I also think that sooner rather than later
> >we're going to need more than one process flushing buffers, and then it'll
> >need to be moved there.
> 
> That is an argument. I think that it could wait for the need to actually
> arise.

Huge pages are used today.

> >>+   /*
> >>+* Sort buffer ids to help find sequential writes.
> >>+*
> >>+* Note: buffers are not locked in anyway, but that does not matter,
> >>+* this sorting is really advisory, if some buffer changes status during
> >>+* this pass it will be filtered out later.  The only necessary property
> >>+* is that marked buffers do not move elsewhere.
> >>+*/
> >
> >That reasoning makes it impossible to move the fsyncing of files into the
> >loop (whenever we move to a new file). That's not nice.
> 
> I do not see why.

Because it means that the sorting isn't necessarily correct. I.e. we
can't rely on it to determine whether a file has already been fsynced.

> >> Also, qsort implementation
> >>+* should be resilient to occasional contradictions (cmp(a,b) != 
> >>-cmp(b,a))
> >>+* because of these possible concurrent changes.
> >
> >Hm. Is that actually the case for our qsort implementation?
> 
> I think that it is hard to write a qsort which would fail that. That would
> mean that it would compare the same items twice, which would be inefficient.

What? The same two elements aren't frequently compared pairwise with
each other, but of course an individual element is frequently compared
with other elements. Consider what happens when the chosen pivot element
changes its identity after already dividing half. The two partitions
will not be divided in any meaning full way anymore. I don't see how
this will results in a meaningful sort.

> >If the pivot element changes its identity won't the result be pretty much
> >random?
> 
> That would be a very unlikely event, given the short time spent in
> qsort.

Meh, we don't want to rely on "likeliness" on such things.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [patch] A \pivot command for psql

2015-08-10 Thread Daniel Verite

Tom Lane wrote:

> I'm not sure how pushing it out to psql makes that better.  There is
> no way to do further processing on something that psql has printed,
> so you've punted on solving that issue just as much if not more.

It's the same spirit as \x : the only thing it achieves is better
readability in certain cases, I don't expect more from it.
But I admit that \pivot would be much more of a "niche use case"
than \x

> Besides which, psql is not all that great for looking at very wide tables,
> so I'm not sure that this would be very useful for dynamic column sets
> anyway.

I use \x all the time with wide tables and \x and \pivot happen to play
out well together (In fact I was pleasantly surprised by this)

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] pg_dump and search_path

2015-08-10 Thread Steve Thames

I earliest reference I found to this issue is here
  and refers to the search_path being arbitrarily set in
the file created by pg_dump. This is apparently still the case in 9.4.

 

I found this issue because I use SERIAL/BIGSERIAL columns and when I created
schema-specific tables in a schema other than the first listed in
search_path the nextval() sequence references were schema-qualified.

 

When I created a backup file with pg_dump and then restored using psql, the
nextval() sequence references were no longer schema-qualified because the
backup file set my table schema as the first schema in search_path. I saw
the same result with pg_restore.

 

While the results of \d testschema.testtable shows the schema-qualified
sequence name in nextval():

 

\d testschema.testtable;
Table "testschema.testtable"
 Column |  Type  | Modifiers

++--
-
 id | integer| not null default
nextval('testschema.testtable_id_seq'::regclass)

 

The actual default read from pg_attrdef does not:

 

   SELECT a.attnum, n.nspname, c.relname, d.adsrc AS default_value 

 FROM pg_attribute AS a 

 JOIN pg_class AS c ON a.attrelid = c.oid 

 JOIN pg_namespace AS n ON c.relnamespace = n.oid 

LEFT JOIN pg_attrdef   AS d ON d.adrelid  = c.oid AND d.adnum = a.attnum


WHERE a.attnum > 0   

  AND n.nspname = 'testschema'  

  AND c.relname = 'testtable';

 

attnum |  nspname   |  relname  | default_value 
++---+---
  1 | testschema | testtable | nextval('testtable_id_seq'::regclass)
  2 | testschema | testtable | 

 

This insistency is described here
 . 

 

This is not a documented behavior-at least I couldn't find it and I searched
quite a bit. There was no indication to me that when I run pg_dump it will
do something more than I asked it to do and it took me a while to figure out
why. I solved the problem by setting the search_path as pg_dump does when
creating the database so now the restore does not create a different
database than I did.

 

Certainly it would seem a bug that \d and a direct read from pg_attrdef give
different results even though pg_dump determining on its own what the
search_path should be is no doubt an intended behavior. But it seems to me
this should be an option. I expected pg_dump to do what I asked it to do and
when it did something other than that it was quite a headache.

 

What's more, I like schema-qualified references. Schemas are an effective
database organization tool and I teach my people to use them and not depend
on the search path as doing so leads to sloppy and inconsistent thinking as
well as coding.

 

Please consider making the arbitrary determination of search_path by pg_dump
an optional behavior. Or better yet, just have it generate a backup that
accurately reflects the database it is backing up.

 

BTW, I am a huge fan of PostgreSQL.

Cheers!

Re: [HACKERS] [patch] A \pivot command for psql

2015-08-10 Thread Daniel Verite

David Fetter wrote:

> I'm working up a proposal to add (UN)PIVOT support to the back-end.

I was under the impression that a server-side PIVOT *with dynamic
columns* was just unworkable as an SQL query, because it couldn't
be prepared if it existed.

I am wrong on that? I feel like you guys are all telling me that \pivot
should happen on the server, but the point that it would not be
realistic to begin with is not considered.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] checkpointer continuous flushing



Hello Andres,

Thanks for your comments. Some answers and new patches included.


+ /*
+ * Array of buffer ids of all buffers to checkpoint.
+ */
+static int *CheckpointBufferIds = NULL;


Should be at the beginning of the file. There's a bunch more cases of that.


done.


+/* Compare checkpoint buffers
+ */
+static int bufcmp(const int * pa, const int * pb)
+{
+   BufferDesc
+   *a = GetBufferDescriptor(*pa),
+   *b = GetBufferDescriptor(*pb);


This definitely needs comments about ignoring the normal buffer header
locking.


Added.


Why are we ignoring the database directory? I doubt it'll make a huge
difference, but grouping metadata affecting operations by directory
helps.


I wanted to do the minimal comparisons to order buffers per file, so I 
skipped everything else. My idea of a checkpoint is a lot of data in a few 
files (at least compared to the data...), so I do not think that it is 
worth it. I may be proven wrong!



+static void
+AllocateCheckpointBufferIds(void)
+{
+   /* Safe worst case allocation, all buffers belong to the checkpoint...
+* that is pretty unlikely.
+*/
+   CheckpointBufferIds = (int *) palloc(sizeof(int) * NBuffers);
+}


(wrong comment style...)


Fixed.

Heikki, you were concerned about the size of the allocation of this, 
right? I don't think it's relevant - we used to allocate an array of 
that size for the backend's private buffer pin array until 9.5, so in 
theory we should be safe agains that. NBuffers is limited to INT_MAX/2 
in guc.ċ, which ought to be sufficient?


I think that there is no issue with the current shared_buffers limit. I 
could allocate and use 4 GB on my laptop without problem. I added a cast 
to ensure that unsigned int are used for the size computation.


+ /* + * Lazy allocation: this function is called through the 
checkpointer, + * but also by initdb. Maybe the allocation could be 
moved to the callers. + */ + if (CheckpointBufferIds == NULL) + 
AllocateCheckpointBufferIds(); +




I don't think it's a good idea to allocate this on every round.
That just means a lot of page table entries have to be built and torn 
down regularly. It's not like checkpoints only run for 1% of the time or 
such.


Sure. It is not allocated on every round, it is allocated once on the 
first checkpoint, the variable tested is static. There is no free. Maybe

the allocation could be moved to the callers, though.


FWIW, I still think it's a much better idea to allocate the memory once
in shared buffers.


Hmmm. The memory does not need to be shared with other processes?

It's not like that makes us need more memory overall, and it'll be huge 
page allocations if configured. I also think that sooner rather than 
later we're going to need more than one process flushing buffers, and 
then it'll need to be moved there.


That is an argument. I think that it could wait for the need to actually 
arise.



+   /*
+* Sort buffer ids to help find sequential writes.
+*
+* Note: buffers are not locked in anyway, but that does not matter,
+* this sorting is really advisory, if some buffer changes status during
+* this pass it will be filtered out later.  The only necessary property
+* is that marked buffers do not move elsewhere.
+*/


That reasoning makes it impossible to move the fsyncing of files into 
the loop (whenever we move to a new file). That's not nice.


I do not see why. Moving rsync ahead is definitely an idea that you 
already pointed out, I have given it some thoughts, and it would require 
a carefull implementation and some restructuring. For instance, you do not 
want to issue fsync right after having done writes, you want to wait a 
little bit so that the system had time to write the buffers to disk.



The formulation with "necessary property" doesn't seem very clear to me?


Removed.

How about: /* * Note: Buffers are not locked in any way during sorting, 
but that's ok: * A change in the buffer header is only relevant when it 
changes the * buffer's identity. If the identity has changed it'll have 
been * written out by BufferAlloc(), so there's no need for checkpointer 
to * write it out anymore. The buffer might also get written out by a * 
backend or bgwriter, but that's equally harmless. */


This new version included.


 Also, qsort implementation
+* should be resilient to occasional contradictions (cmp(a,b) != 
-cmp(b,a))
+* because of these possible concurrent changes.


Hm. Is that actually the case for our qsort implementation?


I think that it is hard to write a qsort which would fail that. That would 
mean that it would compare the same items twice, which would be 
inefficient.


If the pivot element changes its identity won't the result be pretty 
much random?


That would be a very unlikely event, given the short time spent in qsort. 
Anyway, this is not a problem, and is the beauty of the "advisory" sort

Re: [HACKERS] Summary of plans to avoid the annoyance of Freezing

2015-08-10 Thread Josh Berkus

Simon,

Thank you for this summary!  I was losing track, myself.

On 08/09/2015 11:03 PM, Simon Riggs wrote:
> Freezing is painful for VLDBs and high transaction rate systems. We have
> a number of proposals to improve things...

> 3. Speed up autovacuums when they are triggered to avoid wraparounds (Simon)
> Idea is to do a VACUUM scan which only freezes tuples. If we dirty a
> page from freezing we then also prune it, but don't attempt to scan
> indexes to remove the now-truncated dead tuples.
> This looks very straightforward, no technical issues. Might even be able
> to backpatch it.
> [patch investigated but not finished yet]

There's a lesser version of this item which remains relevant unless we
implement (5).  That is, currently the same autovacuum_vaccuum_delay
(AVVD) applies to regular autovacuums as does to anti-wraparound
autovacuums.  If the user has set AV with a high delay, this means that
anti-wraparound AV may never complete.  For that reason, we ought to
have a separate parameter for AVVD, which defaults to a lower number
(like 5ms), or even to zero.

Of course, if we implement (5), that's not necessary, since AV will
never trigger an anti-wraparound freeze.

> Having a freeze map would be wholly unnecessary if we don't ever need to
> freeze whole tables again. Freezing would still be needed on individual
> blocks where an old row has been updated or deleted; a freeze map would
> not help there either.
>
> So there is no conflict, but options 2) and 3) are completely redundant
> if we go for 5). After investigation, I now think 5) is achievable in
> 9.6, but if I am wrong for whatever reason, we have 2) as a backstop.

It's not redundant.  Users may still want to freeze for two reasons:

1. to shrink the clog and multixact logs

2. to support INDEX-ONLY SCAN

In both of those cases, having a freeze map would speed up the manual
vacuum freeze considerably.  Otherwise, we're just punting on the
problem, and making it worse for users who wait too long.

Now, it might still be the case that the *overhead* of a freeze map is a
bad tradeoff if we don't have to worry about forced wraparound.  But
that's a different argument.

BTW, has it occured to anyone that implementing XID epoch headers is
going to mean messing with multixact logs again?  Just thought I'd open
a papercut and pour some lemon juice on it.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-08-10 Thread Alvaro Herrera

Masahiko Sawada wrote:

> This topic may have been already discussed but, why don't we use just
> total scanned pages and total pages?

Because those numbers don't extrapolate nicely.  If the density of dead
tuples is irregular across the table, such absolute numbers might be
completely meaningless: you could scan 90% of the table without seeing
any index scan, and then at the final 10% be hit by many index scans
cleaning dead tuples.  Thus you would see progress go up to 90% very
quickly and then take hours to have it go to 91%.  (Additionally, and a
comparatively minor point: since you don't know how many index scans are
going to happen, there's no way to know the total number of blocks
scanned, unless you don't count index blocks at all, and then the
numbers become a lie.)

If you instead track number of heap pages separately from index pages,
and indicate how many index scans have taken place, you have a chance of
actually figuring out how many heap pages are left to scan and how many
more index scans will occur.

> The mechanism of VACUUM is complicated a bit today,

Understatement of the week ...

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Rework access method interface

Alvaro Herrera  writes:
> Tom Lane wrote:
>> I think that's likely for the best anyway; there are too many catalogs
>> that think a pg_am OID identifies an index AM.  Better to create other
>> catalogs for other types of AMs.

> That means we won't be able to reuse pg_class.relam as a pointer to the
> AM-of-the-other-kind either.

Hm.  So one way or the other we're going to end up violating relational
theory somewhere.  OK, I yield: let's say that pg_am has amname, amkind,
amhandler, and nothing else.  Then we will need SQL functions to expose
whatever information we think needs to be available to SQL code.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Precedence of standard comparison operators

2015-08-10 Thread Geoff Winkless

On 10 August 2015 at 16:31, Tom Lane  wrote:

> Pavel Stehule  writes:
> >> On Sun, Aug 09, 2015 at 08:06:11PM -0400, Tom Lane wrote:
> >>> So yeah, I do think that getting a syntax error if you don't use
> >>> parentheses is the preferable behavior here.
>
> > If we raise a syntax error, then there should be very informative
> message,
>
> Yeah, it would sure be nice to throw something better than "syntax error".
> But I've looked at bison's facilities for that, and they're pretty bad.
> I'm not sure there's much we can do short of moving to some other parser
> generator tool, which would be a Large Undertaking ... and I don't know
> of any arguably-better tool anyway.
>

I would say that anyone who's tricksy enough to be using boolean logic in
the way you describe would be expected to have enough nouse about them to
either a) know what the precedences are or b) know that their lack of
knowledge means they should sprinkle their code with
brackets
.

Returning a syntax error for something that isn't actually an error in
syntax is a poor showing.
Are we to start
expecting
syntax errors when people use comparison operators on
NULLable

columns

without a COALESCE

because the comparison might do something they don't expect
 if they haven't tested their code with NULL values
?

IMO using a = b < c as described is unnecessarily* horrid, *whichever* way
you mean it, and will only produce the sort of unreadability that generates
errors in the long run anyway (even if you understand it, chances are the
next poor sap reading your code won't) and deserves code that breaks,
especially if you're the sort of person who uses it without fully
understanding it.

(*Unnecessarily because there are clearer constructs - CASE springs to mind
- that do the same thing without the associated unreadability and/or
ambiguity)

Geoff

Re: [HACKERS] PL/pgSQL, RAISE and error context

2015-08-10 Thread Pavel Stehule

2015-08-10 9:11 GMT+02:00 Heikki Linnakangas :

> On 07/26/2015 08:34 AM, Pavel Stehule wrote:
>
>> Hi
>>
>> here is complete patch, that introduce context filtering on client side.
>> The core of this patch is trivial and small - almost all of size are
>> trivial changes in regress tests - removing useless context.
>>
>> Documentation, check-world
>>
>
> Looks good to me at first glance. I'll mark this as Ready for Committer.
>

Is it acceptable for all?

I have not a problem with this way.

Regards

Pavel


>
> - Heikki
>
>

Re: [HACKERS] WIP: Rework access method interface

2015-08-10 Thread Alvaro Herrera

Tom Lane wrote:
> Petr Jelinek  writes:
> > On 2015-08-10 17:47, Tom Lane wrote:
> >> I don't see any particularly good reason to remove amsupport and
> >> amstrategies from pg_am.  Those are closely tied to the other catalog
> >> infrastructure for indexes (pg_amproc, pg_amop) which I don't think are
> >> candidates for getting changed by this patch.
> 
> > Ok, in that case it seems unlikely that we'll be able to use pg_am for 
> > any other access methods besides indexes in the future.
> 
> I think that's likely for the best anyway; there are too many catalogs
> that think a pg_am OID identifies an index AM.  Better to create other
> catalogs for other types of AMs.

That means we won't be able to reuse pg_class.relam as a pointer to the
AM-of-the-other-kind either.  I don't think this is the best course of
action.  We have the sequence AM patch that already reuses
pg_class.relam to point to pg_seqam.oid, but you objected to that on
relational theory grounds, which seemed reasonable to me.  The other
option is to create yet another pg_class column with an OID of some
other AM catalog, but this seems a waste.

FWIW the column store patch we're working on also wants to have its own
AM-like catalog.  In our current code we have a separate catalog
pg_colstore_am, but are eagerly waiting for the discussion on this to
settle so that we can just use pg_am and pg_class.relam instead.  (The
patch itself is not public yet since it's nowhere near usable, and this
detail is a pretty minor issue, but I thought reasonable to bring it up
here.  We're also waiting on upper-planner "path-ification" since it
seems likely that some code will collide there, anyway.)

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Rework access method interface


On 2015-08-10 18:16, Alvaro Herrera wrote:

Tom Lane wrote:


There are a couple of other pg_am columns, such as amstorage and
amcanorderbyop, which similarly bear on what's legal to appear in
related catalogs such as pg_opclass.  I'd be sort of inclined to
leave those in the catalog as well.  I do not see that exposing
a SQL function is better than exposing a catalog column; either
way, that property is SQL-visible.


If we do that, it doesn't seem reasonable to use the same catalog for
other things such as sequence AM, right?  IMO it'd be better to keep the
catalog agnostic for exactly what each row is going to be an AM for.



Yeah I said the same, the question is if we should have pg_am agnostic 
or just assume that it's index AM and let other AM types create separate 
catalogs. Tom seems to prefer the latter, I don't see problem with that, 
except that I would really hate to add more am related columns to 
pg_class. I would not mind having relam pointing to different AM catalog 
for different relkinds but dunno if that's ok for others (it's not 
really normalized design).


We could also keep pg_am agnostic and add pg_index_am for additional 
info about index AMs, but that would make this patch more invasive.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] How to compare different datums within from a tuple?

2015-08-10 Thread Peter Moser


Hello,
I try to write my first patch. It is too early to tell more about it, 
but I am just fiddling around with some prototypes.


Can someone tell me, how I can compare two datum fields, when I do not 
know the data type in advance inside an executor function?


For example, "x less than y" where x and y are of various types that 
form intervals. I have found the method ExecTuplesMatch, but it is only 
for equality comparison, I think. Another one is ApplySortComparator... 
maybe that's the correct way to go?


Some code to make things clearer...

Datum x = heap_getattr(out->tts_tuple,
node->xpos,
out->tts_tupleDescriptor,
&isNull1);
Datum y = slot_getattr(curr, node->ypos, &isNull2);

if (compareDatumWithCorrectMethod(x,y) < 0)
{
 /* do something */
}

Thx,
Peter


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Rework access method interface


On 2015-08-10 18:08, Tom Lane wrote:

Alexander Korotkov  writes:

On Mon, Aug 10, 2015 at 6:47 PM, Tom Lane  wrote:

There are a couple of other pg_am columns, such as amstorage and
amcanorderbyop, which similarly bear on what's legal to appear in
related catalogs such as pg_opclass.  I'd be sort of inclined to
leave those in the catalog as well.  I do not see that exposing
a SQL function is better than exposing a catalog column; either
way, that property is SQL-visible.



That answers my question, thanks!


BTW, just to clarify: we can still have the desirable property that
CREATE INDEX ACCESS METHOD needs no parameters other than the AM handler
function name.  The AM code would still expose all of its properties
through the struct returned by the handler.  What is at issue here is
how information in that struct that needs to be available to SQL code
gets exposed.  We can do that either by exposing SQL functions to get
it, or by having CREATE INDEX ACCESS METHOD call the handler and then
copy some fields into the new pg_am row.  I'm suggesting that we should
do the latter for any fields that determine validity of pg_opclass,
pg_amop, etc entries associated with the AM type.  The AM could not
reasonably change its mind about such properties (within a major
release at least) so there is no harm in making a long-lived copy
in pg_am.  And we might as well not break SQL code unnecessarily
by removing fields that used to be there.



That's definitely the case for built-in AMs but 3rd party ones won't 
necessarily follow PG release schedule/versioning and I can imagine 
change of for example amcanorderbyop between AM releases as the AM 
matures. This could probably be solved by some kind of invalidation 
mechanism (even if it's some additional SQL).


However I am not sure if using catalog as some sort of cache for 
function output is a good idea in general. IMHO it would be better to 
just have those options as part of CREATE and ALTER DDL for INDEX ACCESS 
METHODS if we store them in pg_am.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] cache invalidation skip logic

2015-08-10 Thread Qingqing Zhou

On Sun, Aug 9, 2015 at 8:24 AM, Robert Haas  wrote:
>
> In step 1, AcceptInvalidationMessages() should process all pending
> invalidation messages.  So if step 2 did AcceptInvalidationMessages()
> again it would be a no-op, because no messages should remain at that
> point.
>

That's what I think at first. I would try to see if I can manually repro a case.

Thanks,
Qingqing


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Rework access method interface

2015-08-10 Thread Alvaro Herrera

Tom Lane wrote:

> There are a couple of other pg_am columns, such as amstorage and
> amcanorderbyop, which similarly bear on what's legal to appear in
> related catalogs such as pg_opclass.  I'd be sort of inclined to
> leave those in the catalog as well.  I do not see that exposing
> a SQL function is better than exposing a catalog column; either
> way, that property is SQL-visible.

If we do that, it doesn't seem reasonable to use the same catalog for
other things such as sequence AM, right?  IMO it'd be better to keep the
catalog agnostic for exactly what each row is going to be an AM for.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] GIN pageinspect functions

2015-08-10 Thread Jeff Janes

On Fri, Nov 21, 2014 at 2:04 AM, Heikki Linnakangas  wrote:

> On 11/20/2014 05:52 AM, Michael Paquier wrote:
>
>> On Wed, Nov 19, 2014 at 7:01 AM, Peter Geoghegan  wrote:
>>
>>> On Tue, Nov 4, 2014 at 7:26 AM, Amit Kapila 
>>> wrote:
>>> 1. Documentation seems to be missing, other API's exposed
>>> via pageinspect are documented at:
>>> http://www.postgresql.org/docs/devel/static/pageinspect.html
>>>
>> Done.
>>
>> 2.
>>> +CREATE FUNCTION gin_metapage(IN page bytea,
>>> +OUT pending_head bigint,
>>> +OUT pending_tail bigint,
>>> +OUT version int4)
>>> +AS 'MODULE_PATHNAME', 'gin_metapage'
>>> +LANGUAGE C STRICT;
>>> a. Isn't it better to name the function as gin_metap(..) similar to
>>> existing function bt_metap(..)?
>>>
>> I actually liked more gin_metapage_info, a name similar to the
>> newly-introduced brin indexes.
>>
>> b. Can this function have a similar signature as bt_metap() which means
>>> it should take input as relname?
>>>
>> That's mostly a matter of taste but I think we should definitely pass
>> a raw page to it as it is now. This has the advantage to add an extra
>> check if the page passed is really a meta page of not, something
>> useful for development.
>>
>> 3. Can gin_dataleafpage() API have similar name and signature as
>>> API bt_page_items() exposed for btree?
>>>
>> What about gin_leafpage_items then?
>>
>
> The signature of bt_page_items() isn't a good example to follow. It
> existed before the get_raw_page() function, and the other functions that
> are designed to work with that, was added. gin_leafpage_items() name seems
> fine to me.



When I call gin_leafpage_items on a {leaf} page, I get the ERROR:

ERROR:  input page is not a compressed GIN data leaf page
DETAIL:  Flags 0002, expected 0083

I'm don't know why it won't work on an uncompressed leaf page (or for that
matter, why my index pages are not compressed), but the docs should
probably note the restriction.

Cheers,

Jeff


pageinspect_gin_leaf.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Rework access method interface

Petr Jelinek  writes:
> On 2015-08-10 17:47, Tom Lane wrote:
>> I don't see any particularly good reason to remove amsupport and
>> amstrategies from pg_am.  Those are closely tied to the other catalog
>> infrastructure for indexes (pg_amproc, pg_amop) which I don't think are
>> candidates for getting changed by this patch.

> Ok, in that case it seems unlikely that we'll be able to use pg_am for 
> any other access methods besides indexes in the future.

I think that's likely for the best anyway; there are too many catalogs
that think a pg_am OID identifies an index AM.  Better to create other
catalogs for other types of AMs.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Rework access method interface

Alexander Korotkov  writes:
> On Mon, Aug 10, 2015 at 6:47 PM, Tom Lane  wrote:
>> There are a couple of other pg_am columns, such as amstorage and
>> amcanorderbyop, which similarly bear on what's legal to appear in
>> related catalogs such as pg_opclass.  I'd be sort of inclined to
>> leave those in the catalog as well.  I do not see that exposing
>> a SQL function is better than exposing a catalog column; either
>> way, that property is SQL-visible.

> That answers my question, thanks!

BTW, just to clarify: we can still have the desirable property that
CREATE INDEX ACCESS METHOD needs no parameters other than the AM handler
function name.  The AM code would still expose all of its properties
through the struct returned by the handler.  What is at issue here is
how information in that struct that needs to be available to SQL code
gets exposed.  We can do that either by exposing SQL functions to get
it, or by having CREATE INDEX ACCESS METHOD call the handler and then
copy some fields into the new pg_am row.  I'm suggesting that we should
do the latter for any fields that determine validity of pg_opclass,
pg_amop, etc entries associated with the AM type.  The AM could not
reasonably change its mind about such properties (within a major
release at least) so there is no harm in making a long-lived copy
in pg_am.  And we might as well not break SQL code unnecessarily
by removing fields that used to be there.

There may be some other AM properties that would be better to expose
through SQL functions because they could potentially change without
invalidating information stored in the other catalogs.  I'm not sure
whether there is any such data that needs to be accessible at SQL
level, though.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Rework access method interface


On 2015-08-10 17:47, Tom Lane wrote:

Petr Jelinek  writes:

On 2015-08-10 16:58, Alexander Korotkov wrote:

That should work, thanks! Also we can have SQL-visible functions to get
amsupport and amstrategies and use them in the regression tests.



SQL-visible functions would be preferable to storing it in pg_am as
keeping the params in pg_am would limit the extensibility of pg_am itself.


I don't see any particularly good reason to remove amsupport and
amstrategies from pg_am.  Those are closely tied to the other catalog
infrastructure for indexes (pg_amproc, pg_amop) which I don't think are
candidates for getting changed by this patch.



Ok, in that case it seems unlikely that we'll be able to use pg_am for 
any other access methods besides indexes in the future.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-08-10 Thread Masahiko Sawada

On Tue, Aug 11, 2015 at 12:20 AM, Simon Riggs  wrote:
> On 10 August 2015 at 15:59, Syed, Rahila  wrote:
>>
>> Hello,
>>
>> >When we're in Phase2 or 3, don't we need to report the number of total
>> > page scanned or percentage of how many table pages scanned, as well?
>> The total heap pages scanned need to be reported with phase 2 or 3.
>> Complete progress report need to have numbers from each phase when
>> applicable.
>>
>> > Phase 1. Report 2 integer counters: heap pages scanned and total heap
>> > pages,
>> > 1 float counter: percentage_complete and progress message.
>> > Phase 2. Report 2 integer counters: index pages scanned and total
>> > index pages(across all indexes) and progress message.
>> > Phase 3. 1 integer counter: heap pages vacuumed.
>>
>> Sorry for being unclear here. What I meant to say is, each phase of a
>> command will correspond to a slot in COMMAND_NUM_SLOTS. Each phase will be a
>> separate array element and
>> will comprise of n integers, n floats, string. So , in the view reporting
>> progress, VACUUM command can have 3 entries one for each phase.
>
>
> VACUUM has 3 phases now, but since phases 2 and 3 repeat, you can have an
> unbounded number of phases. But that assumes that we don't count truncation
> as a 4th phase of VACUUM...

Yeah.
This topic may have been already discussed but, why don't we use just
total scanned pages and total pages?

The mechanism of VACUUM is complicated a bit today, and other
maintenance command is as well.
It would be tough to trace these processing, and these might be
changed in the future.
But basically, we can trace total scanned pages of target relation
easily, and such information would be enough at many case.

Regards,

--
Masahiko Sawada


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Rework access method interface

On Mon, Aug 10, 2015 at 6:47 PM, Tom Lane  wrote:

> Petr Jelinek  writes:
> > On 2015-08-10 16:58, Alexander Korotkov wrote:
> >> That should work, thanks! Also we can have SQL-visible functions to get
> >> amsupport and amstrategies and use them in the regression tests.
>
> > SQL-visible functions would be preferable to storing it in pg_am as
> > keeping the params in pg_am would limit the extensibility of pg_am
> itself.
>
> I don't see any particularly good reason to remove amsupport and
> amstrategies from pg_am.  Those are closely tied to the other catalog
> infrastructure for indexes (pg_amproc, pg_amop) which I don't think are
> candidates for getting changed by this patch.
>
> There are a couple of other pg_am columns, such as amstorage and
> amcanorderbyop, which similarly bear on what's legal to appear in
> related catalogs such as pg_opclass.  I'd be sort of inclined to
> leave those in the catalog as well.  I do not see that exposing
> a SQL function is better than exposing a catalog column; either
> way, that property is SQL-visible.
>

That answers my question, thanks!

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [HACKERS] WIP: Rework access method interface

Petr Jelinek  writes:
> On 2015-08-10 16:58, Alexander Korotkov wrote:
>> That should work, thanks! Also we can have SQL-visible functions to get
>> amsupport and amstrategies and use them in the regression tests.

> SQL-visible functions would be preferable to storing it in pg_am as 
> keeping the params in pg_am would limit the extensibility of pg_am itself.

I don't see any particularly good reason to remove amsupport and
amstrategies from pg_am.  Those are closely tied to the other catalog
infrastructure for indexes (pg_amproc, pg_amop) which I don't think are
candidates for getting changed by this patch.

There are a couple of other pg_am columns, such as amstorage and
amcanorderbyop, which similarly bear on what's legal to appear in
related catalogs such as pg_opclass.  I'd be sort of inclined to
leave those in the catalog as well.  I do not see that exposing
a SQL function is better than exposing a catalog column; either
way, that property is SQL-visible.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Rework access method interface

On Mon, Aug 10, 2015 at 6:36 PM, Petr Jelinek  wrote:

> On 2015-08-10 16:58, Alexander Korotkov wrote:
>
>> That should work, thanks! Also we can have SQL-visible functions to get
>> amsupport and amstrategies and use them in the regression tests.
>>
>>
> SQL-visible functions would be preferable to storing it in pg_am as
> keeping the params in pg_am would limit the extensibility of pg_am itself.

I actually meant just two particular functions (not per AM) which both get
AM oid as argument. They should call amhandler and return amsupport and
amstrategies correspondingly.
These functions can be used in regression tests to check opclass and
opfamilies correctness.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [HACKERS] WIP: Rework access method interface

On 2015-08-10 16:58, Alexander Korotkov wrote:

On Mon, Aug 10, 2015 at 5:48 PM, Tom Lane mailto:t...@sss.pgh.pa.us>> wrote:

Alexander Korotkov mailto:a.korot...@postgrespro.ru>> writes:
> On Mon, Aug 10, 2015 at 1:12 PM, Petr Jelinek mailto:p...@2ndquadrant.com>> wrote:
>> I don't understand this, there is already AmRoutine in RelationData, why
>> the need for additional field for just amsupport?

> We need amsupport in load_relcache_init_file() which reads
> "pg_internal.init". I'm not sure this is correct place to call am_handler.
> It should work in the case of built-in AM. But if AM is defined in the
> extension then we wouldn't be able to do catalog lookup for am_handler on
> this stage of initialization.

This is an issue we'll have to face before there's much hope of having
index AMs as extensions: how would you locate any extension function
without catalog access?  Storing raw function pointers in
pg_internal.init
is not an answer in an ASLR world.

I think we can dodge the issue so far as pg_internal.init is
concerned by
decreeing that system catalogs can only have indexes with built-in AMs.
Calling a built-in function doesn't require catalog access, so there
should be no problem with re-calling the handler function by OID during
load_relcache_init_file().

That should work, thanks! Also we can have SQL-visible functions to get
amsupport and amstrategies and use them in the regression tests.

SQL-visible functions would be preferable to storing it in pg_am as 
keeping the params in pg_am would limit the extensibility of pg_am itself.

--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Precedence of standard comparison operators

Pavel Stehule  writes:
>> On Sun, Aug 09, 2015 at 08:06:11PM -0400, Tom Lane wrote:
>>> So yeah, I do think that getting a syntax error if you don't use
>>> parentheses is the preferable behavior here.

> If we raise a syntax error, then there should be very informative message,

Yeah, it would sure be nice to throw something better than "syntax error".
But I've looked at bison's facilities for that, and they're pretty bad.
I'm not sure there's much we can do short of moving to some other parser
generator tool, which would be a Large Undertaking ... and I don't know
of any arguably-better tool anyway.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-08-10 Thread Simon Riggs

On 10 August 2015 at 15:59, Syed, Rahila  wrote:

> Hello,
>
> >When we're in Phase2 or 3, don't we need to report the number of total
> page scanned or percentage of how many table pages scanned, as well?
> The total heap pages scanned need to be reported with phase 2 or 3.
> Complete progress report need to have numbers from each phase when
> applicable.
>
> > Phase 1. Report 2 integer counters: heap pages scanned and total heap
> > pages,
> > 1 float counter: percentage_complete and progress message.
> > Phase 2. Report 2 integer counters: index pages scanned and total
> > index pages(across all indexes) and progress message.
> > Phase 3. 1 integer counter: heap pages vacuumed.
>
> Sorry for being unclear here. What I meant to say is, each phase of a
> command will correspond to a slot in COMMAND_NUM_SLOTS. Each phase will be
> a separate array element and
> will comprise of n integers, n floats, string. So , in the view reporting
> progress, VACUUM command can have 3 entries one for each phase.
>

VACUUM has 3 phases now, but since phases 2 and 3 repeat, you can have an
unbounded number of phases. But that assumes that we don't count truncation
as a 4th phase of VACUUM...

SELECT statements also have a variable number of phases, hash, materialize,
sorts all act as blocking nodes where you cannot progress to next phase
until it is complete and you don't know for certain how much data will come
in later phases.

I think the best you'll do is an array of pairs of values [(current blocks,
total blocks), ... ]

Knowing how many phases there are is a tough problem. I think the only way
forwards is to admit that we will publish our best initial estimate of
total workload size and then later we may realise it was wrong and publish
a better number (do until complete). It's not wonderful, but la vida es
loca.

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] checkpointer continuous flushing

Hi,

On 2015-08-08 20:49:03 +0300, Heikki Linnakangas wrote:
> I ripped out the "flushing" part, keeping only the sorting. I refactored the
> logic in BufferSync() a bit. There's now a separate function,
> nextCheckpointBuffer(), that returns the next buffer ID from the sorted
> list. The tablespace-parallelization behaviour in encapsulated there,
> keeping the code in BufferSync() much simpler. See attached. Needs some
> minor cleanup and commenting still before committing, and I haven't done any
> testing besides a simple "make check".

Thought it'd be useful to review the current version as well. Some of
what I'm commenting on you'll probably already have though of under the
label of "minor cleanup".

>  /*
> + * Array of buffer ids of all buffers to checkpoint.
> + */
> +static int *CheckpointBufferIds = NULL;
> +
> +/* Compare checkpoint buffers
> + */

Should be at the beginning of the file. There's a bunch more cases of that.

> +/* Compare checkpoint buffers
> + */
> +static int bufcmp(const int * pa, const int * pb)
> +{
> + BufferDesc
> + *a = GetBufferDescriptor(*pa),
> + *b = GetBufferDescriptor(*pb);
> +
> + /* tag: rnode, forkNum (different files), blockNum
> +  * rnode: { spcNode (ignore: not really needed),
> +  *   dbNode (ignore: this is a directory), relNode }
> +  * spcNode: table space oid, not that there are at least two
> +  * (pg_global and pg_default).
> +  */
> + /* compare relation */
> + if (a->tag.rnode.spcNode < b->tag.rnode.spcNode)
> + return -1;
> + else if (a->tag.rnode.spcNode > b->tag.rnode.spcNode)
> + return 1;
> + if (a->tag.rnode.relNode < b->tag.rnode.relNode)
> + return -1;
> + else if (a->tag.rnode.relNode > b->tag.rnode.relNode)
> + return 1;
> + /* same relation, compare fork */
> + else if (a->tag.forkNum < b->tag.forkNum)
> + return -1;
> + else if (a->tag.forkNum > b->tag.forkNum)
> + return 1;
> + /* same relation/fork, so same segmented "file", compare block number
> +  * which are mapped on different segments depending on the number.
> +  */
> + else if (a->tag.blockNum < b->tag.blockNum)
> + return -1;
> + else /* should not be the same block anyway... */
> + return 1;
> +}

This definitely needs comments about ignoring the normal buffer header
locking.

Why are we ignoring the database directory? I doubt it'll make a huge
difference, but grouping metadata affecting operations by directory
helps.

> +
> +static void
> +AllocateCheckpointBufferIds(void)
> +{
> + /* Safe worst case allocation, all buffers belong to the checkpoint...
> +  * that is pretty unlikely.
> +  */
> + CheckpointBufferIds = (int *) palloc(sizeof(int) * NBuffers);
> +}

(wrong comment style...)

Heikki, you were concerned about the size of the allocation of this,
right? I don't think it's relevant - we used to allocate an array of
that size for the backend's private buffer pin array until 9.5, so in
theory we should be safe agains that. NBuffers is limited to INT_MAX/2
in guc.ċ, which ought to be sufficient?

> + /*
> +  * Lazy allocation: this function is called through the checkpointer,
> +  * but also by initdb. Maybe the allocation could be moved to the 
> callers.
> +  */
> + if (CheckpointBufferIds == NULL)
> + AllocateCheckpointBufferIds();
> +
> 

I don't think it's a good idea to allocate this on every round. That
just means a lot of page table entries have to be built and torn down
regularly. It's not like checkpoints only run for 1% of the time or
such.

FWIW, I still think it's a much better idea to allocate the memory once
in shared buffers. It's not like that makes us need more memory overall,
and it'll be huge page allocations if configured. I also think that
sooner rather than later we're going to need more than one process
flushing buffers, and then it'll need to be moved there.

> + /*
> +  * Sort buffer ids to help find sequential writes.
> +  *
> +  * Note: buffers are not locked in anyway, but that does not matter,
> +  * this sorting is really advisory, if some buffer changes status during
> +  * this pass it will be filtered out later.  The only necessary property
> +  * is that marked buffers do not move elsewhere.
> +  */

That reasoning makes it impossible to move the fsyncing of files into
the loop (whenever we move to a new file). That's not nice. The
formulation with "necessary property" doesn't seem very clear to me?

How about:
/*
 * Note: Buffers are not locked in any way during sorting, but that's ok:
 * A change in the buffer header is only relevant when it changes the
 * buffer's identity. If the identity has changed it'll have been
 * written out by BufferAlloc(), so there's no need for checkpointer to
 * write it out anymore. The buffer might also get written out by a
 * b

Re: [HACKERS] Asynchronous execution on FDW

2015-08-10 Thread Robert Haas

On Mon, Aug 10, 2015 at 3:23 AM, Heikki Linnakangas  wrote:
> I've marked this as rejected in the commitfest, because others are
> working on a more general solution with parallel workers. That's still
> work-in-progress, and it's not certain if it's going to make it into
> 9.6, but if it does it will largely render this obsolete. We can revisit
> this patch later in the release cycle, if the parallel scan patch hasn't
> solved the same use case by then.

I think the really important issue for this patch is the one discussed here:

http://www.postgresql.org/message-id/ca+tgmoaijk1svzw_gkfu+zssxcijkfelqu2aomvuphpsfw4...@mail.gmail.com

You raised an important issue there but never really expressed an
opinion on the points I raised, here or on the other thread.  And
neither did anyone else except the patch author who, perhaps
unsurprisingly, thinks it's OK.  I wish we could get more discussion
about that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-08-10 Thread Syed, Rahila

Hello,

>When we're in Phase2 or 3, don't we need to report the number of total page 
>scanned or percentage of how many table pages scanned, as well?
The total heap pages scanned need to be reported with phase 2 or 3. Complete 
progress report need to have numbers from each phase when applicable. 

> Phase 1. Report 2 integer counters: heap pages scanned and total heap 
> pages,
> 1 float counter: percentage_complete and progress message.
> Phase 2. Report 2 integer counters: index pages scanned and total 
> index pages(across all indexes) and progress message.
> Phase 3. 1 integer counter: heap pages vacuumed.

Sorry for being unclear here. What I meant to say is, each phase of a command 
will correspond to a slot in COMMAND_NUM_SLOTS. Each phase will be a separate 
array element and 
will comprise of n integers, n floats, string. So , in the view reporting 
progress, VACUUM command can have 3 entries one for each phase. 

Thank you,
Rahila Syed



__
Disclaimer: This email and any attachments are sent in strictest confidence
for the sole use of the addressee and may contain legally privileged,
confidential, and proprietary data. If you are not the intended recipient,
please advise the sender by replying promptly to this email and then delete
and destroy this email and any attachments without any further use, copying
or forwarding.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Rework access method interface

On Mon, Aug 10, 2015 at 5:48 PM, Tom Lane  wrote:

> Alexander Korotkov  writes:
> > On Mon, Aug 10, 2015 at 1:12 PM, Petr Jelinek 
> wrote:
> >> I don't understand this, there is already AmRoutine in RelationData, why
> >> the need for additional field for just amsupport?
>
> > We need amsupport in load_relcache_init_file() which reads
> > "pg_internal.init". I'm not sure this is correct place to call
> am_handler.
> > It should work in the case of built-in AM. But if AM is defined in the
> > extension then we wouldn't be able to do catalog lookup for am_handler on
> > this stage of initialization.
>
> This is an issue we'll have to face before there's much hope of having
> index AMs as extensions: how would you locate any extension function
> without catalog access?  Storing raw function pointers in pg_internal.init
> is not an answer in an ASLR world.
>
> I think we can dodge the issue so far as pg_internal.init is concerned by
> decreeing that system catalogs can only have indexes with built-in AMs.
> Calling a built-in function doesn't require catalog access, so there
> should be no problem with re-calling the handler function by OID during
> load_relcache_init_file().
>

That should work, thanks! Also we can have SQL-visible functions to get
amsupport and amstrategies and use them in the regression tests.


> We could also have problems with WAL replay, though I think the consensus
> there is that extension AMs have to use generic WAL records that don't
> require any index-specific replay code.
>

Yes, I've already showed one version of generic WAL records. The main
objecting against them was it's hard insure that composed WAL-record is
correct.
Now I'm working on new version which would be much easier and safe to use.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [HACKERS] Moving SS_finalize_plan processing to the end of planning

David Rowley  writes:
> On 10 August 2015 at 07:50, Tom Lane  wrote:
>> I've started to work on path-ification of the upper planner (finally),

> I was digging around the grouping_planner() last week with the intentions
> of making some changes there to allow GROUP BY before join, but in the end
> decided to stay well clear of this area until this pathification is done.
> So far I've managed to keep my changes away from the upper planner and I've
> added "GroupingPath" types, which from what I can predict of what you'll be
> making changes to, I think you'll also need to have grouping_planner()
> return a few variations of "GroupingPath" to allow the paths list to be
> passed up to subquery_planner() and on up to functions
> like recurse_set_operations() so that they have the option of choosing
> GroupAggregate / MergeAppend to implement UNION.

> If I'm right on this, then maybe there's a few things you can copy and
> paste from the patch I posted here:
> http://www.postgresql.org/message-id/CAKJS1f-sEcm=gtfs-dqjsocsz-vlhrp_hsrntjjq-s7egn8...@mail.gmail.com
>  specifically around create_grouping_path()?

Yeah, I saw your patch, but have not yet had time to think about what
parts of it I could borrow.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Rework access method interface

Alexander Korotkov  writes:
> On Mon, Aug 10, 2015 at 1:12 PM, Petr Jelinek  wrote:
>> I don't understand this, there is already AmRoutine in RelationData, why
>> the need for additional field for just amsupport?

> We need amsupport in load_relcache_init_file() which reads
> "pg_internal.init". I'm not sure this is correct place to call am_handler.
> It should work in the case of built-in AM. But if AM is defined in the
> extension then we wouldn't be able to do catalog lookup for am_handler on
> this stage of initialization.

This is an issue we'll have to face before there's much hope of having
index AMs as extensions: how would you locate any extension function
without catalog access?  Storing raw function pointers in pg_internal.init
is not an answer in an ASLR world.

I think we can dodge the issue so far as pg_internal.init is concerned by
decreeing that system catalogs can only have indexes with built-in AMs.
Calling a built-in function doesn't require catalog access, so there
should be no problem with re-calling the handler function by OID during
load_relcache_init_file().

We could also have problems with WAL replay, though I think the consensus
there is that extension AMs have to use generic WAL records that don't
require any index-specific replay code.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] tap tests remove working directories




On 08/10/2015 09:55 AM, Tom Lane wrote:

Andrew Dunstan  writes:

On 08/09/2015 08:58 PM, Michael Paquier wrote:

Sure. Attached is what I have in mind. Contrary to your version we
keep around temp paths should a run succeed after one that has failed
when running make check multiple times in a row. Perhaps it does not
matter much in practice as log files get removed at each new run but I
think that this allows a finer control in case of failure. Feel free
to have a look.

Actually, I don't think this is a very good idea. You could end up with
a whole series of opaquely named directories from a series of failing
runs. If you want to keep the directory after a failure, and want to do
another run, then rename the directory. That's what you have to do with
the main regression tests, too. My version also has the benefit of
changing exactly 3 lines in the source :-)

FWIW, I think we should keep the behavior of the TAP tests as close as
possible to the established behavior of the main regression tests.

It's certainly possible that there's room for improvement in that
benchmark behavior.  But let's first converge the test behaviors,
and then have a discussion about whether we want changes, and if
so change all the tests at the same time.



Yeah. To do that we should probably stop using File::Temp to make the 
directory, and just use a hardcoded name given to File::Path::mkpath. If 
the directory exists, we'd just remove it first.


That won't be a very big change - probably just a handful of lines.

cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] replication slot restart_lsn initialization

On 2015-07-07 09:42:54 -0700, Gurjeet Singh wrote:
> --- a/src/backend/replication/slot.c
> +++ b/src/backend/replication/slot.c
> @@ -40,10 +40,10 @@
>  #include 
>  
>  #include "access/transam.h"
> +#include "access/xlog_internal.h"
>  #include "common/string.h"
>  #include "miscadmin.h"
>  #include "replication/slot.h"
> -#include "storage/fd.h"
>  #include "storage/proc.h"
>  #include "storage/procarray.h"

Why did you remove fd.h? The file definitely uses infrastructure from
there. We're not terribly consistent about that but I'd rather not rely
on it being included via xlog_internal.h -> xlog.h.

>  /*
> + * Grab and save an LSN value to prevent WAL recycling past that point.
> + */
> +void
> +ReplicationSlotRegisterRestartLSN()
> +{

Didn't like that description and function name very much. What does
'grabbing' mean here? Should probably mention that it works on the
currently active slot and modifies it.

It's now ReplicationSlotReserveWal()

> + ReplicationSlot *slot = MyReplicationSlot;
> +
> + Assert(slot != NULL);
> + Assert(slot->data.restart_lsn == InvalidXLogRecPtr);
> +
> + /*
> +  * The replication slot mechanism is used to prevent removal of required
> +  * WAL. As there is no interlock between this and checkpoints required, 
> WAL
> +  * segment could be removed before ReplicationSlotsComputeRequiredLSN() 
> has
> +  * been called to prevent that. In the very unlikely case that this 
> happens
> +  * we'll just retry.
> +  */

You removed some punctuation in that sentence converting a sentence in
bad english into one without the original meaning ;). See the attached
for a new version.

> + while (true)
> + {
> + XLogSegNo   segno;
> +
> + /*
> +  * Let's start with enough information if we can, so log a 
> standby
> +  * snapshot and start logical decoding at exactly that position.
> +  */
> + if (!RecoveryInProgress())
> + {
> + XLogRecPtr  flushptr;
> +
> + /* start at current insert position */
> + slot->data.restart_lsn = GetXLogInsertRecPtr();
> +
> + /*
> +  * Log an xid snapshot for logical replication. This 
> snapshot is not
> +  * needed for physical replication, as it relies on the 
> snapshot
> +  * created by checkpoint when the base backup starts.
> +  */
> + if (slot->data.database != InvalidOid)
> + {
> + /* make sure we have enough information to 
> start */
> + flushptr = LogStandbySnapshot();
> +
> + /* and make sure it's fsynced to disk */
> + XLogFlush(flushptr);
> + }
> + }
> + else
> + slot->data.restart_lsn = GetRedoRecPtr();
> +
> + /* prevent WAL removal as fast as possible */
> + ReplicationSlotsComputeRequiredLSN();
> +
> + /*
> +  * If all required WAL is still there, great, otherwise retry. 
> The
> +  * slot should prevent further removal of WAL, unless there's a
> +  * concurrent ReplicationSlotsComputeRequiredLSN() after we've 
> written
> +  * the new restart_lsn above, so normally we should never need 
> to loop
> +  * more than twice.
> +  */
> + XLByteToSeg(slot->data.restart_lsn, segno);
> + if (XLogGetLastRemovedSegno() < segno)
> + break;
> + }
> +}

The way you added the check for logical vs. physical slots in there
looks wrong to me. For a physical slot created !InRecovy we'll possibly
return a xlog position from the future (it's the insert position *and*
not flushed to disk), which then cannot be received.

> +/*
>   * Flush all replication slots to disk.
>   *
>   * This needn't actually be part of a checkpoint, but it's a convenient
> @@ -876,7 +942,7 @@ StartupReplicationSlots(void)
>  }
>  
>  /* 
> - * Manipulation of ondisk state of replication slots
> + * Manipulation of on-disk state of replication slots
>   *
>   * NB: none of the routines below should take any notice whether a slot is 
> the
>   * current one or not, that's all handled a layer above.
> diff --git a/src/backend/replication/slotfuncs.c 
> b/src/backend/replication/slotfuncs.c
> index 9a2793f..01b376a 100644
> --- a/src/backend/replication/slotfuncs.c
> +++ b/src/backend/replication/slotfuncs.c
> @@ -40,6 +40,7 @@ Datum
>  pg_create_physical_replication_slot(PG_FUNCTION_ARGS)
>  {
>   Namename = PG_GETARG_NAME(0);
> + boolimmediately_reserve = PG_GETARG_BOOL(1);
>   Datum   values[2];
>   boolnulls[2];
>   TupleDesc   tupdesc;
> @@ -58,10 +59

Re: [HACKERS] tap tests remove working directories

Andrew Dunstan  writes:
> On 08/09/2015 08:58 PM, Michael Paquier wrote:
>> Sure. Attached is what I have in mind. Contrary to your version we
>> keep around temp paths should a run succeed after one that has failed
>> when running make check multiple times in a row. Perhaps it does not
>> matter much in practice as log files get removed at each new run but I
>> think that this allows a finer control in case of failure. Feel free
>> to have a look.

> Actually, I don't think this is a very good idea. You could end up with 
> a whole series of opaquely named directories from a series of failing 
> runs. If you want to keep the directory after a failure, and want to do 
> another run, then rename the directory. That's what you have to do with 
> the main regression tests, too. My version also has the benefit of 
> changing exactly 3 lines in the source :-)

FWIW, I think we should keep the behavior of the TAP tests as close as
possible to the established behavior of the main regression tests.

It's certainly possible that there's room for improvement in that
benchmark behavior.  But let's first converge the test behaviors,
and then have a discussion about whether we want changes, and if
so change all the tests at the same time.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] tap tests remove working directories




On 08/09/2015 08:58 PM, Michael Paquier wrote:

On Sun, Aug 9, 2015 at 11:19 PM, Andrew Dunstan  wrote:


On 08/09/2015 08:41 AM, Michael Paquier wrote:

On Sun, Aug 9, 2015 at 1:40 AM, Andrew Dunstan 
wrote:

On 08/08/2015 09:31 AM, Robert Haas wrote:

On Fri, Aug 7, 2015 at 7:17 PM, Andrew Dunstan 
wrote:

That certainly isn't what happens, and given the way this is done in
TestLib.pm, using the CLEANUP parameter of File::Temp's tempdir()
function,
it's not clear how we could do that easily.



Set cleanup to false and manually remove the directory later in the
code, so that stuff runs only if we haven't died sooner?




Yeah, maybe. I'm thinking of trying to do it more globally, like in
src/Makefile.global.in. That way we wouldn't have to add new code to
every
test file.

If we rely on END to clean up the temporary data folder, there is no
need to impact each test file, just the test functions called in
TestLib.pm that could switch a flag to not perform any cleanup at the
end of the run should an unexpected result be found. Now I am as well
curious to see what you have in mind with manipulating
Makefile.global.

See attached. If you have a better idea please post your patch.

Sure. Attached is what I have in mind. Contrary to your version we
keep around temp paths should a run succeed after one that has failed
when running make check multiple times in a row. Perhaps it does not
matter much in practice as log files get removed at each new run but I
think that this allows a finer control in case of failure. Feel free
to have a look.



Actually, I don't think this is a very good idea. You could end up with 
a whole series of opaquely named directories from a series of failing 
runs. If you want to keep the directory after a failure, and want to do 
another run, then rename the directory. That's what you have to do with 
the main regression tests, too. My version also has the benefit of 
changing exactly 3 lines in the source :-)


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Test code is worth the space

2015-08-10 Thread Simon Riggs

On 10 August 2015 at 13:55, Robert Haas  wrote:

> On Mon, Aug 10, 2015 at 2:02 AM, Simon Riggs 
> wrote:
> > On another review I suggested we add a function to core to allow it to be
> > used in regression tests. A long debate ensued, deciding that we must be
> > consistent and put diagnostic functions in contrib. My understanding is
> that
> > we are not allowed to write regression tests that use contrib modules,
> yet
> > the consistent place to put diagnostic functions is contrib - therefore,
> we
> > are never allowed to write tests utilizing diagnostic functions. We are
> > allowed to put modules for testing in another directory, but these are
> not
> > distributed to users so cannot be used for production diagnosis. Systemic
> > fail, advice needed.
>
> There are actually two ways to do this.
>
> One model is the dummy_seclabel module.  The build system arranges for
> that to be available when running the core regression tests, so they
> can use it.  And the dummy_seclabel test does.  There are actually a
> couple of other loadable modules that are used by the core regression
> tests in this kind of way (refint and autoinc, from contrib/spi).
>
> The other model is what I'll call the test_decoding model.  Since the
> core code can't be tested without a module, we have a module.  But
> then the tests are in the module's directory
> (contrib/test_decoding/{sql,expected}) not the core regression tests.
>
> In general, I have a mild preference for the second model.  It seems
> more scalable, and keeps the core tests quick to run, which is
> appropriate for more obscure tests that are unlikely to break very
> often.  But the first model can also be done, as show by the fact that
> we have in fact done it several times.
>

Neither of those uses a diagnostic function that also has value in
production environments - for logical decoding we added something to core
specifically to allow this pg_recvlogical.

Anyway, resolving this isn't important anymore because I wish to pursue a
different mechanism for freezing, but its possible I hit the same issue.

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-08-10 Thread Masahiko Sawada

On Mon, Aug 10, 2015 at 1:36 PM, Rahila Syed  wrote:
> Hello,
>
>>Say, 6 bigint counters, 6 float8
>>counters, and 3 strings up to 80 characters each.  So we have a
>>fixed-size chunk of shared memory per backend, and each backend that
>>wants to expose progress information can fill in those fields however
>>it likes, and we expose the results.
>>This would be sorta like the way pg_statistic works: the same columns
>>can be used for different purposes depending on what estimator will be
>>used to access them.
>
> After thinking more on this suggestion, I came up with following generic
> structure which can be used to store progress of any command per backend in
> shared memory.
>
> Struct PgBackendProgress
> {
> int32 *counter[COMMAND_NUM_SLOTS];
> float8 *counter_float[COMMAND_NUM_SLOTS];
>
> char *progress_message[COMMAND_NUM_SLOTS];
> }
>
> COMMAND_NUM_SLOTS will define maximum number of slots(phases)  for any
> command.
> Progress of command will be measured using progress of each phase in
> command.
> For some command the number of phases can be singular and rest of the slots
> will be NULL.
>
> Each phase will report n integer counters, n float counters and a progress
> message.
> For some phases , any of the above fields can be NULL.
>
> For VACUUM , there can 3 phases as discussed in the earlier mails.
>
> Phase 1. Report 2 integer counters: heap pages scanned and total heap pages,
> 1 float counter: percentage_complete and progress message.
> Phase 2. Report 2 integer counters: index pages scanned and total index
> pages(across all indexes) and progress message.
> Phase 3. 1 integer counter: heap pages vacuumed.
>
> This structure can be accessed by statistics collector to display progress
> via new view.

I have one question about this.

When we're in Phase2 or 3, don't we need to report the number of total
page scanned or percentage of how many table pages scanned, as well?
As Robert said, Phase2(means lazy_vacuum_index here) and 3(means
lazy_vacuum_heap here) could be called whenever lazy_scan_heap fills
up the maintenance_work_mem. And phase 3 could be called at the end of
scanning single page if table doesn't have index.
So if vacuum progress checker reports the only current phase
information, we would not be able to know where we are in now.

Regards,

--
Masahiko Sawada


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Test code is worth the space

2015-08-10 Thread Robert Haas

On Mon, Aug 10, 2015 at 2:02 AM, Simon Riggs  wrote:
> Almost every patch I review has either zero or insufficient tests.
>
> If we care about robustness, then we must discuss tests. Here are my two
> recent experiences:
>
> I agree we could do with x10 as many tests, but that doesn't mean all tests
> make sense, so there needs to be some limiting principles to avoid adding
> pointless test cases. I recently objected to adding a test case to Isolation
> tester for the latest ALTER TABLE patch because in that case there is no
> concurrent behavior to test. There is already a regression test that tests
> lock levels for those statements, so in my view it is more sensible to
> modify the existing test than to add a whole new isolation test that does
> nothing more than demonstrate the lock levels work as described, which we
> already knew.
>
> On another review I suggested we add a function to core to allow it to be
> used in regression tests. A long debate ensued, deciding that we must be
> consistent and put diagnostic functions in contrib. My understanding is that
> we are not allowed to write regression tests that use contrib modules, yet
> the consistent place to put diagnostic functions is contrib - therefore, we
> are never allowed to write tests utilizing diagnostic functions. We are
> allowed to put modules for testing in another directory, but these are not
> distributed to users so cannot be used for production diagnosis. Systemic
> fail, advice needed.

There are actually two ways to do this.

One model is the dummy_seclabel module.  The build system arranges for
that to be available when running the core regression tests, so they
can use it.  And the dummy_seclabel test does.  There are actually a
couple of other loadable modules that are used by the core regression
tests in this kind of way (refint and autoinc, from contrib/spi).

The other model is what I'll call the test_decoding model.  Since the
core code can't be tested without a module, we have a module.  But
then the tests are in the module's directory
(contrib/test_decoding/{sql,expected}) not the core regression tests.

In general, I have a mild preference for the second model.  It seems
more scalable, and keeps the core tests quick to run, which is
appropriate for more obscure tests that are unlikely to break very
often.  But the first model can also be done, as show by the fact that
we have in fact done it several times.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] statement_timeout affects query results fetching?

2015-08-10 Thread Robert Haas

On Mon, Aug 10, 2015 at 5:25 AM, Shay Rojansky  wrote:
> Thanks for the explanation Robert, that makes total sense. However, it seems
> like the utility of PG's statement_timeout is much more limited than I
> thought.
>
> In case you're interested, I dug a little further and it seems that
> Microsoft's client for SQL Server implements the following timeout (source):
>
> cumulative time-out (for all network packets that are read during the
> invocation of a method) for all network reads during command execution or
> processing of the results. A time-out can still occur after the first row is
> returned, and does not include user processing time, only network read time.
>
> Since it doesn't seem possible to have a clean query-processing-only timeout
> at the backend, we may be better off doing something similar to the above
> and enforce timeouts on the client only. Any further thoughts on this would
> be appreciated.

An alternative you may want to consider is using the Execute message
with a non-zero row count and reading all of the returned rows as they
come back, buffering them in memory.  When those have all been
consumed, issue another Execute message and get some more rows.

AFAICS, the biggest problem with this is that there's no good way to
bound the number of rows returned by size rather than by number, which
has been complained about before by somebody else in a situation
similar to yours.  Another problem is that I believe it will cause
cursor_tuple_fraction to kick in, which may change query plans.  But
it does have the advantage that the query will be suspended from the
server's point of view, which I *think* will toll statement_timeout.

You might also consider exposing some knobs to the user, so that they
can set the number of rows fetched in one go, and let that be all the
rows or only some of them.

We really need a better way of doing this, but I think this is the way
other drivers are handling it now.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Rework access method interface

On Mon, Aug 10, 2015 at 1:12 PM, Petr Jelinek  wrote:

> On 2015-08-09 23:56, Alexander Korotkov wrote:
>
>> Hacker,
>>
>> some time before I proposed patches implementing CREATE ACCESS METHOD.
>>
>> http://www.postgresql.org/message-id/capphfdsxwzmojm6dx+tjnpyk27kt4o7ri6x_4oswcbyu1rm...@mail.gmail.com
>> As I get from comments to my patches and also from Tom's comment about
>> AM interface in tablesampling thread – AM interface needs reworking. And
>> AFAICS AM interface rework is essential to have CREATE ACCESS METHOD
>> command.
>> http://www.postgresql.org/message-id/5438.1436740...@sss.pgh.pa.us
>>
>
> Cool, I was planning to take a stab at this myself when I have time, so I
> am glad you posted this.
>
> This is why I'd like to show a WIP patch implementing AM interface
>> rework. Patch is quite dirty yet, but I think it illustrated the idea
>> quite clear. AM now have single parameter – handler function. This
>> handler returns pointer to AmRoutine which have the same data as pg_am
>> had before. Thus, API is organized very like FDW.
>>
>>
> I wonder if it would make sense to call it IndexAmRoutine instead in case
> we add other AMs (like the sequence am, or maybe column store if that's
> done as AM) in the future.
>

Good point!


> The patch should probably add the am_handler type which should be return
> type of the am handler function (see fdw_handler and tsm_handler types).
>

Sounds reasonable!


> I also think that the CHECK_PROCEDUREs should be in the place of the
> original GET_REL_PROCEDUREs  (just after relation check). If the interface
> must exist we may as well check for it at the beginning and not after we
> did some other work which is useless without the interface.


Ok, good point too.

However, this patch appears to take more work than I expected. It have
>> to do many changes spread among many files.
>>
>
> Yeah you need to change the definition and I/O handling of every AM
> function, but that's to be expected.
>

Yes, this is why I decided to get some feedback on this stage of work.


> Also, it appears not so easy
>> to hide amsupport into AmRoutine, because it's needed for relcache. As a
>> temporary solution it's duplicated in RelationData.
>>
>
> I don't understand this, there is already AmRoutine in RelationData, why
> the need for additional field for just amsupport?
>

We need amsupport in load_relcache_init_file() which reads
"pg_internal.init". I'm not sure this is correct place to call am_handler.
It should work in the case of built-in AM. But if AM is defined in the
extension then we wouldn't be able to do catalog lookup for am_handler on
this stage of initialization.

Another point is that amsupport and amstrategies are used for regression
tests of opclasses and opfamilies. Thus, we probably can keep them in pg_am.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [HACKERS] Summary of plans to avoid the annoyance of Freezing

On 2015-08-10 11:25:37 +0300, Heikki Linnakangas wrote:
> On 08/10/2015 11:17 AM, Andres Freund wrote:
> >On 2015-08-10 07:26:29 +0100, Simon Riggs wrote:
> >>So there is no conflict, but options 2) and 3) are completely redundant if
> >>we go for 5). After investigation, I now think 5) is achievable in 9.6, but
> >>if I am wrong for whatever reason, we have 2) as a backstop.
> >
> >I don't think that's true. You can't ever delete the clog without
> >freezing. There's no need for anti-wraparound scans anymore, but you
> >still need to freeze once.
> 
> What's your definition of freezing? As long as you remove all dead tuples,
> you can just leave the rest in place with their original XID+epoch in place,
> and assume that everything old enough is committed.

Hm. Right. -ENCOFFEE (I really ran out of beans). Sorry for that.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] expose confirmed_flush for replication slots

2015-08-10 Thread Marko Tiikkaja


On 8/10/15 1:29 PM, Andres Freund wrote:

On 2015-07-08 15:01:15 +0300, Marko Tiikkaja wrote:

+   if (confirmed_flush_lsn != InvalidTransactionId)
+   values[i++] = LSNGetDatum(confirmed_flush_lsn);
+   else
+   nulls[i++] = true;
+


Hm. That comparison is using the wrong datatype, but it appears you only
copied my earlier mistake... Fixed back to 9.4 and in your patch.

Other notes:

* you missed to touch test_decoding's regression test output files.
* None of the docs were touched. catalogs.sgml definitely needs docs
   about the new columns, and I see no reason to leave the examples
   elsewhere stale.


Yeah.  I should've grepped around a bit more.  Sorry about that.


Fixed those and committed it. Thanks for the patch!


Thank you very much!


.m


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Don'st start streaming after creating a slot in pg_receivexlog

On 2015-07-30 17:14:16 +0900, Michael Paquier wrote:
>So, perhaps the attached is more convincing then? It just changes
>--create-slot to leave immediately after creation to address the
>complain of this thread.  -- Michael

Pushed that. Thanks!

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] expose confirmed_flush for replication slots

On 2015-07-08 15:01:15 +0300, Marko Tiikkaja wrote:
> + if (confirmed_flush_lsn != InvalidTransactionId)
> + values[i++] = LSNGetDatum(confirmed_flush_lsn);
> + else
> + nulls[i++] = true;
> +

Hm. That comparison is using the wrong datatype, but it appears you only
copied my earlier mistake... Fixed back to 9.4 and in your patch.

Other notes:

* you missed to touch test_decoding's regression test output files.
* None of the docs were touched. catalogs.sgml definitely needs docs
  about the new columns, and I see no reason to leave the examples
  elsewhere stale.

Fixed those and committed it. Thanks for the patch!

- Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Using contrib modules in check (Re: pgsql: Fix BRIN to use SnapshotAny during summarization)

2015-08-10 Thread Christoph Berg

Re: Tom Lane 2015-08-07 <928.1438900...@sss.pgh.pa.us>
> Alvaro Herrera  writes:
> > Fix BRIN to use SnapshotAny during summarization
> 
> This patch added an isolation test that fails unless contrib/pageinspect
> has been built and installed.  I do not find that acceptable.  It causes
> "make check-world" to fail ... and no, installing the extension during
> make check-world isn't going to make me happier.
> 
> I don't really think we need this isolation test at all, but if we do,
> please fix it to not rely on any extensions.  Perhaps looking at
> pg_relation_size or some such would do?  Or you could just issue
> a query that should use the index, and see if it finds the rows it
> ought to.

Hi,

this breaks the Debian package builds as well because we run
check-world as a build step.

Any chance for a fix/workaround so the nightly master/head builds will
succeed again?

Christoph
-- 
c...@df7cb.de | http://www.df7cb.de/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Commitfest remaining "Needs Review" items

2015-08-10 Thread Etsuro Fujita


On 2015/08/10 17:10, Ashutosh Bapat wrote:

On Mon, Aug 10, 2015 at 1:04 PM, Heikki Linnakangas mailto:hlinn...@iki.fi>> wrote:
* Join pushdown support for foreign tables



Ashutosh, KaiGai, Etsuro: You signed up as reviewers in spring, but
there hasn't been any activity during this commitfest. What's the
status of this patch? Do you plan to review this now?



There are three patches involved here. I have reviewed one of them. I am
planning to look at them within few days.


Sorry for not having taken any action.

I think we should address the issue dicussed in [1] first because I 
think that would affect the design of join pushdown API in the core. 
I'll submit a patch for that within a few days.  I'd like to review this 
in detail after addressing that, if it's OK.


Best regards,
Etsuro Fujita

[1] http://www.postgresql.org/message-id/558a18b3.9050...@lab.ntt.co.jp


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Rework access method interface

On 2015-08-09 23:56, Alexander Korotkov wrote:

Hacker,

some time before I proposed patches implementing CREATE ACCESS METHOD.
http://www.postgresql.org/message-id/capphfdsxwzmojm6dx+tjnpyk27kt4o7ri6x_4oswcbyu1rm...@mail.gmail.com
As I get from comments to my patches and also from Tom's comment about
AM interface in tablesampling thread – AM interface needs reworking. And
AFAICS AM interface rework is essential to have CREATE ACCESS METHOD
command.
http://www.postgresql.org/message-id/5438.1436740...@sss.pgh.pa.us

Cool, I was planning to take a stab at this myself when I have time, so
I am glad you posted this.

This is why I'd like to show a WIP patch implementing AM interface
rework. Patch is quite dirty yet, but I think it illustrated the idea
quite clear. AM now have single parameter – handler function. This
handler returns pointer to AmRoutine which have the same data as pg_am
had before. Thus, API is organized very like FDW.

I wonder if it would make sense to call it IndexAmRoutine instead in
case we add other AMs (like the sequence am, or maybe column store if
that's done as AM) in the future.

The patch should probably add the am_handler type which should be return
type of the am handler function (see fdw_handler and tsm_handler types).

I also think that the CHECK_PROCEDUREs should be in the place of the
original GET_REL_PROCEDUREs (just after relation check). If the
interface must exist we may as well check for it at the beginning and
not after we did some other work which is useless without the interface.

However, this patch appears to take more work than I expected. It have
to do many changes spread among many files.

Yeah you need to change the definition and I/O handling of every AM
function, but that's to be expected.

Also, it appears not so easy
to hide amsupport into AmRoutine, because it's needed for relcache. As a
temporary solution it's duplicated in RelationData.

I don't understand this, there is already AmRoutine in RelationData, why
the need for additional field for just amsupport?

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Moving SS_finalize_plan processing to the end of planning

2015-08-10 Thread David Rowley

On 10 August 2015 at 07:50, Tom Lane  wrote:

> I've started to work on path-ification of the upper planner (finally),
> and since that's going to be a large patch in any case, I've been looking
> for pieces that could be bitten off and done separately.  One such piece
> is the fact that SS_finalize_plan (computation of extParams/allParams)
> currently gets run at the end of subquery_planner; as long as that's true,
> we cannot have subquery_planner returning paths rather than concrete
> plans.  The attached patch moves that processing to occur just before
> set_plan_references is run.
>

> Basically what this patch does is to divide what had been done in
> SS_finalize_plan into three steps:
>
> * SS_identify_outer_params determines which outer-query-level Params will
> be available for the current query level to use, and saves that aside in
> a new PlannerInfo field root->outer_params.  This processing turns out
> to be the only reason that SS_finalize_plan had to be called in
> subquery_planner: we have to capture this data before exiting
> subquery_planner because the upper levels' plan_params lists may change
> as they plan other subplans.
>
> * SS_attach_initplans does the work of attaching initplans to the parent
> plan node and adjusting the parent's cost estimate accordingly.
>
> * SS_finalize_plan now *only* does extParam/allParam calculation.
>
> I had to change things around a bit in planagg.c (which was already a
> hack, and the law of conservation of cruft applies).  Otherwise it's
> pretty straightforward and removes some existing hacks.  One notable
> point is that there's no longer an assumption within SS_finalize_plan
> that initPlans can only appear in the top plan node.
>
> Any objections?
>

Great! I'm very interested in this work, specifically around the
grouping_planner() changes too.

I've looked over the patch and from what I understand it seems like a good
solid step in the right direction.

I was digging around the grouping_planner() last week with the intentions
of making some changes there to allow GROUP BY before join, but in the end
decided to stay well clear of this area until this pathification is done.
So far I've managed to keep my changes away from the upper planner and I've
added "GroupingPath" types, which from what I can predict of what you'll be
making changes to, I think you'll also need to have grouping_planner()
return a few variations of "GroupingPath" to allow the paths list to be
passed up to subquery_planner() and on up to functions
like recurse_set_operations() so that they have the option of choosing
GroupAggregate / MergeAppend to implement UNION.

If I'm right on this, then maybe there's a few things you can copy and
paste from the patch I posted here:
http://www.postgresql.org/message-id/CAKJS1f-sEcm=gtfs-dqjsocsz-vlhrp_hsrntjjq-s7egn8...@mail.gmail.com
 specifically around create_grouping_path()?

Kind Regards

David Rowley

--
 David Rowley   http://www.2ndQuadrant.com/

 PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] statement_timeout affects query results fetching?

2015-08-10 Thread Shay Rojansky

Thanks for the explanation Robert, that makes total sense. However, it
seems like the utility of PG's statement_timeout is much more limited than
I thought.

In case you're interested, I dug a little further and it seems that
Microsoft's client for SQL Server implements the following timeout (source

):

cumulative time-out (for all network packets that are read during the
invocation of a method) for all network reads during command execution or
processing of the results. A time-out can still occur after the first row
is returned, and does not include user processing time, only network read
time.

Since it doesn't seem possible to have a clean query-processing-only
timeout at the backend, we may be better off doing something similar to the
above and enforce timeouts on the client only. Any further thoughts on this
would be appreciated.

On Sun, Aug 9, 2015 at 5:21 PM, Robert Haas  wrote:

> On Sat, Aug 8, 2015 at 11:30 AM, Shay Rojansky  wrote:
> > the entire row in memory (imagine rows with megabyte-sized columns). This
> > makes sense to me; Tom, doesn't the libpq behavior you describe of
> absorbing
> > the result set as fast as possible mean that a lot of memory is wasted on
> > the client side?
>
> It sure does.
>
> > I can definitely appreciate the complexity of changing this behavior. I
> > think that some usage cases (such as mine) would benefit from a timeout
> on
> > the time until the first row is sent, this would allow to put an upper
> cap
> > on stuff like query complexity, for example.
>
> Unfortunately, it would not do any such thing.  It's possible for the
> first row to be returned really really fast and then for an arbitrary
> amount of time to pass and computation to happen before all the rows
> are returned.  A plan can have a startup cost of zero and a run cost
> of a billion (or a trillion).  This kind of scenario isn't even
> particularly uncommon.  You just need a plan that looks like this:
>
> Nested Loop
> -> Nested Loop
>   -> Nested Loop
> -> Seq Scan
> -> Index Scan
>   -> Index Scan
> -> Index Scan
>
> You can just keep pushing more nested loop/index scans on there and
> the first row will still pop out quite fast.  But if the seq-scanned
> table is large, generating the whole result set can take a long, long
> time.
>
> Even worse, you could have a case like this:
>
> SELECT somefunc(a) FROM foo;
>
> Now suppose somefunc is normally very quick, but if a = 42 then it
> does pg_sleep() in a loop until the world ends.   You're going to have
> however many rows of foo have a != 42 pop out near-instantaneously,
> and then it will go into the tank and not come out until the meaning
> of life, the universe, and everything is finally revealed.
>
> That second case is a little extreme, and a little artificial, but the
> point is this: just as you don't want the client to have to buffer the
> results in memory, the server doesn't either.  It's not the case that
> the server computes the rows and sends them to you.  Each one is sent
> to you as it is computed, and in the normal case, at the time the
> first row is sent, only a small percentage of the total work of the
> query has been performed.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>

Re: [HACKERS] CREATE POLICY and RETURNING

2015-08-10 Thread Zhaomo Yang

In case you missed the link to the previous discussion at the bottom,
http://www.postgresql.org/message-id/CAHGQGwEqWD=ynqe+zojbpoxywt3xlk52-v_q9s+xofckjd5...@mail.gmail.com

Re: [HACKERS] CREATE POLICY and RETURNING

2015-08-10 Thread Zhaomo Yang

Hi,

This thread has a pretty thorough discussion of pros and cons of applying
SELECT policy to other commands. Besides what have been mentioned, I think
there is another potential pro: we can enable references to pseudorelations
OLD and NEW in predicates. Now, for UPDATE, the references to the target
table in USING clause are actually references to OLD and the references in
WITH CHECK clause are references to NEW. Logically now USING and WITH CHECK
are combined by AND, so we cannot have predicates like

  foo(NEW) > 1 OR bar(OLD) > 1   (combine predicates referencing OLD
and NEW by an operation other than AND)
  NEW.id <> OLD.id(reference both in the same expression)

If we apply SELECT policy to other commands, we only need one predicate for
INSERT/UPDATE/DELETE. That predicate can reference to OLD and NEW, just like
predicates for triggers and rules. For UPDATE and DELETE, the predicate of
SELECT will be applied first (when the target table is scanned) to ensure no
information leakage and their own predicate will be applied later. This
doesn't change much for INSERT and DELETE, but it gives users more
flexibility when they set predicates for UPDATE.

Thanks,
Zhaomo



--
View this message in context: 
http://postgresql.nabble.com/CREATE-POLICY-and-RETURNING-tp5823192p5861550.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] checkpointer continuous flushing

On 2015-08-08 20:49:03 +0300, Heikki Linnakangas wrote:
> * I think we should drop the "flush" part of this for now. It's not as
> clearly beneficial as the sorting part, and adds a great deal more code
> complexity. And it's orthogonal to the sorting patch, so we can deal with it
> separately.

I don't agree. For one I've seen it cause rather big latency
improvements, and we're horrible at that. But more importantly I think
the requirements of the flush logic influences how exactly the sorting
is done. Splitting them will just make it harder to do the flushing in a
not too big patch.

> * Is it really necessary to parallelize the I/O among tablespaces? I can see
> the point, but I wonder if it makes any difference in practice.

Today it's somewhat common to have databases that are bottlenecked on
write IO and all those writes being done by the checkpointer. If we
suddenly do the writes to individual tablespaces separately and
sequentially we'll be bottlenecked on the peak IO of a single
tablespace.

> * Is there ever any harm in sorting the buffers? The GUC is useful for
> benchmarking, but could we leave it out of the final patch?

Agreed.

> * Do we need to worry about exceeding the 1 GB allocation limit in
> AllocateCheckpointBufferIds? It's enough got 2 TB of shared_buffers. That's
> a lot, but it's not totally crazy these days that someone might do that. At
> the very least, we need to lower the maximum of shared_buffers so that you
> can't hit that limit.

We can just use the _huge variant?

Greetings,

Andres Freund

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Summary of plans to avoid the annoyance of Freezing

2015-08-10 Thread Heikki Linnakangas


On 08/10/2015 11:17 AM, Andres Freund wrote:

On 2015-08-10 07:26:29 +0100, Simon Riggs wrote:

On 10 August 2015 at 07:14, Peter Geoghegan  wrote:


On Sun, Aug 9, 2015 at 11:03 PM, Simon Riggs 
wrote:

If 5) fails to bring a workable solution by the Jan 2016 CF then we

commit

2) instead.


Is there actually a conflict there? I didn't think so.



I didn't explain myself fully, thank you for asking.

Having a freeze map would be wholly unnecessary if we don't ever need to
freeze whole tables again. Freezing would still be needed on individual
blocks where an old row has been updated or deleted; a freeze map would not
help there either.

So there is no conflict, but options 2) and 3) are completely redundant if
we go for 5). After investigation, I now think 5) is achievable in 9.6, but
if I am wrong for whatever reason, we have 2) as a backstop.


I don't think that's true. You can't ever delete the clog without
freezing. There's no need for anti-wraparound scans anymore, but you
still need to freeze once.


What's your definition of freezing? As long as you remove all dead 
tuples, you can just leave the rest in place with their original 
XID+epoch in place, and assume that everything old enough is committed.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Summary of plans to avoid the annoyance of Freezing