Re: [HACKERS] Archive recovery won't be completed on some situation.

2014-04-14 Thread Kyotaro HORIGUCHI
Hello, thank you for the discussion.

At Tue, 1 Apr 2014 11:41:20 -0400, Robert Haas wrote
>> I don't find that very radical at all.  The backup_label file is
>> *supposed* to be removed on the master if it crashes during the
>> backup; and it should never be removed from the backup itself.  At
>> least that's how I understand it.  Unfortunately, people too often

The code indeed seems to assume that, and I couldn't think of any
measure to avoid that dead end once the recovery sequence reads a
backup label accidentally left behind. I came up with removing the
backup label during immediate shutdown on previous versions, as
9.4 does.

CancelBackup() does only a stat-unlink-rename sequence, so I think
it doesn't obstruct the immediate shutdown sequence. And it doesn't
change any visible behavior or interface except in this case. What
do you think about this? Isn't it also applicable to older versions?

postmaster.c@9.3.3:2339
pmdie(SIGNAL_ARGS)
{
...
    switch (postgres_signal_arg)
    {
        ...
        case SIGQUIT:
            ...
            SignalUnconnectedWorkers(SIGQUIT);
+
+           /*
+            * Terminate exclusive backup mode. This is done in
+            * PostmasterStateMachine() for other shutdown modes.
+            */
+           if (ReachedNormalRunning)
+               CancelBackup();
            ExitPostmaster(0);
            break;
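
For reference, here is a simplified sketch of what CancelBackup() does
(modeled on the 9.3-era xlog.c; error reporting is trimmed), which is why
I think it is harmless on this path:

#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

#define BACKUP_LABEL_FILE "backup_label"
#define BACKUP_LABEL_OLD  "backup_label.old"

/* Sketch only: the real function uses ereport() and more careful error
 * handling, but the filesystem work is just stat/unlink/rename. */
static void
CancelBackup(void)
{
    struct stat stat_buf;

    /* if the backup_label file is not there, nothing to do */
    if (stat(BACKUP_LABEL_FILE, &stat_buf) < 0)
        return;

    /* remove leftover file from a previously canceled backup, if any */
    unlink(BACKUP_LABEL_OLD);

    /* rename the current label out of the way */
    if (rename(BACKUP_LABEL_FILE, BACKUP_LABEL_OLD) != 0)
        fprintf(stderr, "online backup mode was not canceled\n");
}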


Aside from this, I'll post the new option for pg_resetxlog for
the next CF.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] Autonomous Transaction (WIP)

2014-04-14 Thread Rajeev rastogi
On 14 April 2014 20:10, Simon Riggs wrote:

>>Autonomous Transaction Storage:
>>As with the main transaction, the structure PGXACT is used to store main
>>transactions, which are created in shared memory of size:
>>   (Number of processes) * sizeof(struct PGXACT)
>>Similarly, a new structure will be defined to store autonomous transactions:
>>   struct PGAutonomousXACT
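
To put the quoted sizing rule in code form (a sketch only; the fields and
the MAX_AUTOX_PER_BACKEND cap are illustrative guesses, not from the patch):

#include <stddef.h>

typedef unsigned int TransactionId;

/* Sketch of a per-slot state; a real version would mirror PGXACT. */
typedef struct PGAutonomousXACT
{
    TransactionId xid;      /* autonomous transaction's XID */
    TransactionId xmin;     /* snapshot xmin when it started */
} PGAutonomousXACT;

#define MAX_AUTOX_PER_BACKEND 4   /* made-up cap, for illustration */

/* (Number of processes) * sizeof(struct PGAutonomousXACT), as quoted above */
static size_t
AutonomousXactShmemSize(int max_backends)
{
    return (size_t) max_backends * MAX_AUTOX_PER_BACKEND
           * sizeof(PGAutonomousXACT);
}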

Oh... I had already added this patch to the 2014-June CommitFest, thinking
that everyone is busy with work to wrap up 9.4.

> I already proposed exactly this design two years ago and it was rejected at 
> the PgCon hackers meeting.
> I have a better design worked out now and will likely be working on it for 9.5

Can we work together to take this feature to its final goal?
Maybe you can go through my complete patch and see which parts of the patch
and related design can be reused along with your new design.
Also, if possible, you could share your design (even a rough one is OK), and
I will see if I can contribute to it in some way.

Thanks and Regards,
Kumar Rajeev Rastogi



Re: [HACKERS] Minor improvements in alter_table.sgml

2014-04-14 Thread Etsuro Fujita

(2014/04/14 23:53), Robert Haas wrote:
> On Fri, Apr 11, 2014 at 5:00 AM, Etsuro Fujita
>  wrote:
>> Attached is an updated version of the patch.
>
> I applied the first two hunks of this, which seem like clear
> oversights; and also the bit fixing the constraint_name language.
>
> I think the other changes deserve to be considered separately, and in
> particular I'm still not sure it's a good idea to document both OF
> type_name and type_name.

OK, Thanks!

Best regards,
Etsuro Fujita




Re: [HACKERS] integrate pg_upgrade analyze_new_cluster.sh into vacuumdb

2014-04-14 Thread Peter Eisentraut
Committed, with your suggestions.




Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-14 Thread Jim Nasby

On 4/14/14, 7:43 PM, Stephen Frost wrote:
> * Jim Nasby (j...@nasby.net) wrote:
>> I think it's important to mention that OS implementations (at least all I
>> know of) have multiple page pools, each of which has its own clock. IIRC
>> one of the arguments for us supporting a count>1 was we could get the
>> benefits of multiple page pools without the overhead. In reality I believe
>> that argument is false, because the clocks for each page pool in an OS
>> *run at different rates* based on system demands.
>
> They're also maintained in *parallel*, no?  That's something that I've
> been talking over with a few folks at various conferences- that we
> should consider breaking up shared buffers and then have new backend
> processes which work through each pool independently and in parallel.

I suspect that varies based on the OS, but it certainly happens in a separate
process from user processes. The expectation is that there should always be
pages on the free list so requests for memory can be served quickly.

http://www.freebsd.org/doc/en/articles/vm-design/freeing-pages.html contains a
good overview of what FreeBSD does. See
http://www.freebsd.org/doc/en/articles/vm-design/allen-briggs-qa.html#idp62990256
as well.

>> I don't know if multiple buffer pools would be good or bad for Postgres,
>> but I do think it's important to remember this difference any time we look
>> at what OSes do.
>
> It's my suspicion that the one-big-pool is exactly why we see many cases
> where PG performs worse when the pool is more than a few gigs.  Of
> course, this is all speculation and proper testing needs to be done..

I think there are some critical take-aways from FreeBSD that apply here (in no
particular order):

1: The system is driven by memory pressure. No pressure means no processing.
2: It sounds like the active list is LFU, not LRU. The cache list is LRU.
3: *The use counter is maintained by a clock.* Because the clock only runs so
often, there is no run-away incrementing like we see in Postgres (see the
sketch after this list).
4: Once a page is determined not to be active, it goes onto a separate list
depending on whether it's clean or dirty.
5: Dirty pages are only written to maintain a certain clean/dirty ratio and,
again, only when there's actual memory pressure.
6: The system maintains a list of free pages to serve memory requests quickly.
In fact, lower-level functions (i.e.
http://www.leidinger.net/FreeBSD/dox/vm/html/d4/d65/vm__phys_8c_source.html#l00862)
simply return NULL if they can't find pages on the free list.
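
To make point 3 concrete, here is a toy version of the Postgres-style sweep
(the real code is StrategyGetBuffer() in freelist.c; this sketch leaves out
locking and the free list). Every buffer hit bumps usage_count immediately,
capped at BM_MAX_USAGE_COUNT (5 in current sources), and the count only
comes back down when the sweep hand passes - there is no independently
ticking clock as in FreeBSD:

#define NBUFFERS  1024
#define MAX_USAGE 5     /* BM_MAX_USAGE_COUNT in PostgreSQL */

static int usage_count[NBUFFERS];
static int hand;        /* the clock hand */

/* Called on every buffer hit: increments with no clock to govern it. */
static void
buffer_hit(int buf)
{
    if (usage_count[buf] < MAX_USAGE)
        usage_count[buf]++;
}

/* Called when we need a victim: decrement-and-advance until a
 * zero-count buffer is found. */
static int
sweep_for_victim(void)
{
    for (;;)
    {
        if (usage_count[hand] == 0)
        {
            int victim = hand;

            hand = (hand + 1) % NBUFFERS;
            return victim;  /* caller resets usage_count to 1 on reuse */
        }
        usage_count[hand]--;
        hand = (hand + 1) % NBUFFERS;
    }
}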
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net




Re: [HACKERS] Dynamic Shared Memory stuff

2014-04-14 Thread Amit Kapila
On Mon, Apr 14, 2014 at 10:03 PM, Robert Haas  wrote:
> On Sat, Apr 12, 2014 at 1:32 AM, Amit Kapila  wrote:
>> I have checked that other places in the code also check the handle to
>> decide if the API has failed.  Refer to function PGSharedMemoryIsInUse().
>> So I think the fix to call GetLastError() after checking the handle is okay.
>> The attached patch fixes this issue.  After the patch, the server shows the
>> below log, which is exactly what is expected from test_shm_mq.
>
> In PostgreSQL code, hmap == NULL, rather than !hmap, is the preferred
> way to test for a NULL pointer.  I notice that the !hmap style is used
> throughout this code, so I guess cleaning that up is a matter for a
> separate commit.

I think in that case we might want to clean up some other similar usage
of !hmap (PGSharedMemoryCreate).

> For the create case, I'm wondering if we should put the block that
> tests for !hmap *before* the _dosmaperr() and check for EEXIST.  What
> is your opinion?

Either way is okay, but I think the way you are suggesting is better, as it
will make the code consistent with the other place (PGSharedMemoryCreate()).
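
For the archives, the checking order we are talking about looks roughly
like this (a simplified sketch; the real code reports errors through
ereport() and maps the Win32 error with _dosmaperr()):

#include <windows.h>
#include <stdio.h>

static HANDLE
create_mapping(const char *name, DWORD size)
{
    HANDLE hmap = CreateFileMapping(INVALID_HANDLE_VALUE, /* pagefile-backed */
                                    NULL,           /* default security */
                                    PAGE_READWRITE,
                                    0, size,        /* assumes size < 4GB */
                                    name);

    if (hmap == NULL)       /* preferred style: hmap == NULL, not !hmap */
    {
        /* GetLastError() is only meaningful after checking the handle */
        fprintf(stderr, "could not create mapping: error code %lu\n",
                GetLastError());
        return NULL;
    }

    /* the EEXIST case: handle is valid but the object already existed */
    if (GetLastError() == ERROR_ALREADY_EXISTS)
    {
        CloseHandle(hmap);
        return NULL;
    }

    return hmap;
}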

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] PostgreSQL in Windows console and Ctrl-C

2014-04-14 Thread Amit Kapila
On Tue, Apr 15, 2014 at 4:21 AM, Bruce Momjian  wrote:
> On Mon, Apr 14, 2014 at 09:34:14AM +0530, Amit Kapila wrote:
>> The problem can be solved this way, but the only question here is whether
>> it is acceptable for users to have a new console window for the server.
>>
>> Can others also please share their opinion on whether this fix (starting
>> the server in a new console) seems acceptable, or whether we should try
>> passing some information from pg_ctl and then ignore CTRL+C && CTRL+BREAK
>> for it?
>
> I wanted to point out that I think this patch is only appropriate for
> head, not backpatching.  Also, on Unix we have to handle signals that
> come from the kill command.  Can you send CTRL+C from other
> applications on Windows?

I think there are ways to achieve it, but they might not be very
straightforward. If we start the server in a new console window through
pg_ctl, then we *don't* need to change any signal handler to catch and do
something specific for CTRL+C/CTRL+BREAK.
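
For clarity, the two approaches on the table, sketched (the Win32 calls are
real, but this is illustrative, not the proposed patch):

#include <windows.h>

/* Option 1: pg_ctl starts the server in its own console, so console
 * events never reach it (dwCreationFlags includes CREATE_NEW_CONSOLE
 * in the CreateProcess() call).
 *
 * Option 2: the postmaster installs a handler that swallows the events: */
static BOOL WINAPI
ConsoleCtrlHandler(DWORD ctrlType)
{
    if (ctrlType == CTRL_C_EVENT || ctrlType == CTRL_BREAK_EVENT)
        return TRUE;    /* report handled; do not terminate the process */
    return FALSE;       /* anything else: fall through to default handling */
}

static void
install_handler(void)
{
    SetConsoleCtrlHandler(ConsoleCtrlHandler, TRUE);
}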

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] PostgreSQL in Windows console and Ctrl-C

2014-04-14 Thread Amit Kapila
On Mon, Apr 14, 2014 at 11:46 AM, Christian Ullrich
 wrote:
> * From: Amit Kapila
>> Do you mean to say use some existing environment variable?
>> Introducing an environment variable to solve this issue or infact using
>> some existing environ variable doesn't seem to be the best way to pass
>> such information.
>
> I meant creating a new one, yes. If, say, PGSQL_BACKGROUND_JOB was set,
> the postmaster etc. would ignore the events.

Do you plan to reset it, and if so, when?
I think there is a chance that after this environment variable is set, some
other instance of the server (postmaster) could read it and miss a signal
which it should otherwise have processed.

Irrespective of the above problem, I think using an environment variable
might not be a good way to solve this.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] [COMMITTERS] pgsql: Add TAP tests for client programs

2014-04-14 Thread Andrew Dunstan


On 04/14/2014 10:30 PM, Andrew Dunstan wrote:
> On 04/14/2014 10:17 PM, Tom Lane wrote:
>> Peter Eisentraut  writes:
>>> Add TAP tests for client programs
>> I assume the buildfarm would need to be taught about this?
>
> Yes. It probably won't be a huge change, but it will need a bit of code.

And it won't work on most or all Windows machines, since they almost
certainly won't have IPC::Run installed.

That's *very* annoying.

cheers

andrew




[HACKERS] Fwd: [BUGS] Debug strategy for musl Postgres?

2014-04-14 Thread John Mudd
On Mon, Apr 14, 2014 at 2:06 PM, Stefan Kaltenbrunner <
ste...@kaltenbrunner.cc> wrote:

> On 04/13/2014 10:19 PM, John Mudd wrote:
> >
> > On Sun, Apr 13, 2014 at 12:04 PM, Euler Taveira wrote:
> >
> > On 13-04-2014 00:40, John Mudd wrote:
> > > I built Postgres 9.3.4 from source on top of the musl C library,
> > > http://www.musl-libc.org/
> > > I also built zlib, bzip2, ncurses, openssl, readline and Python
> > using musl
> > > as a foundation for Postgres.
> > >
> > This is not a bug. This kind of discussion belongs to -hackers.
> >
> > While reading this email, I gave musl a try. I'm using Debian jessie,
> > which contains musl 1.0.0. I compiled the source (git master) using
> > CC="musl-gcc" and disabled zlib and readline. It passed all regression
> > tests. I also tried a pgbench run, which went like a charm. (After
> > installing the binaries I had to set the library path for musl in
> > /etc/ld-musl-x86_64.d.)
> >
> > > I'm using musl to increase the portability of the Postgres binary.
> > > I build on Ubuntu 13.10 but it will run on older Linux boxes.
> > >
> > Could you give details about your architecture?
> >
> >
> > Built on 3.8.0-35-generic #50-Ubuntu SMP Tue Dec 3 01:25:33 UTC 2013
> > i686 i686 i686 GNU/Linux
> > Runs fine there.
> >
> > Moved postgres install directory to  2.4.21-4.EL #1 Fri Oct 3 18:13:58
> > EDT 2003 i686 i686 i386 GNU/Linux
> > Not working fully there.
> > Note: It says 2.4 kernel but I've been told that's misleading. The
> > kernel has upgrades that make it effectively 2.6.
>
> This looks like a RHEL3 version number, and while that kernel was a kind
> of creepy thing with a lot of patches (including backports from the 2.6
> era), it is definitely not a 2.6 kernel (also note that 2.6.0 was released
> in December of 2003, while RHEL 3 was released in October that year).
> Judging from the version number, this also seems to be based on the very
> first RHEL3 kernel, missing all follow-up bugfixes from the RHEL3 lifetime.
>
> So I would not be at all surprised if a modern and young C library
> running on a >10-year-old kernel that never looked like the upstream
> kernel misbehaved with a complex userspace app like postgresql.

Update:
I contacted the musl developers, received a one-line patch to the
gettimeofday() fallback code, rebuilt the musl libc, copied the lib to my
old Linux box, and Postgres is running well now.

===
 All 136 tests passed.
===

It's interesting that when I build Postgres on this same old Linux, it
fails to run:

== removing existing temp installation==
== creating temporary installation==
== initializing database system   ==
== starting postmaster==

pg_regress: postmaster did not respond within 60 seconds

Building with musl on a modern Linux produces a Postgres that works on the
old Linux, but building Postgres on the old Linux with its native libc gives
me a broken Postgres. That's why I'm interested in musl libc.




Re: [HACKERS] [COMMITTERS] pgsql: Add TAP tests for client programs

2014-04-14 Thread Andrew Dunstan


On 04/14/2014 10:17 PM, Tom Lane wrote:
> Peter Eisentraut  writes:
>> Add TAP tests for client programs
> I assume the buildfarm would need to be taught about this?

Yes. It probably won't be a huge change, but it will need a bit of code.

cheers

andrew




Re: [HACKERS] JSONB in-place updates?

2014-04-14 Thread Michael Paquier
On Tue, Apr 15, 2014 at 5:16 AM, Josh Berkus  wrote:
> On 04/14/2014 09:27 AM, kelas wrote:
>> Are there any plans to add "in-place at-depth" update operator for JSONB
>> type, e.g.:
>>
>> UPDATE test SET attrs->'answer' = 42 where attrs->'answer' = 41
>>
>
> Plans, yes.  But not until 9.5, or maybe as an extension.
A custom function of the form find_and_modify(jsonb_data, key_name,
new_value), combined with the uniqueness of jsonb keys, would be a good
workaround for now.
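
Something along these lines, as a sketch (the function name comes from the
suggestion above; this simple version only replaces a key that already
exists, and returns NULL for an empty object):

CREATE OR REPLACE FUNCTION find_and_modify(jsonb_data jsonb,
                                           key_name   text,
                                           new_value  jsonb)
RETURNS jsonb LANGUAGE sql IMMUTABLE AS $$
    -- Rebuild the object, swapping in new_value for the matching key.
    SELECT ('{' ||
            string_agg(to_json(key)::text || ':' ||
                       CASE WHEN key = key_name
                            THEN new_value::text
                            ELSE value::text
                       END, ',') ||
            '}')::jsonb
    FROM jsonb_each(jsonb_data)
$$;

-- e.g. UPDATE test SET attrs = find_and_modify(attrs, 'answer', '42')
--      WHERE attrs->>'answer' = '41';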
Regards,
-- 
Michael




Re: [HACKERS] assertion failure 9.3.4

2014-04-14 Thread Andrew Dunstan


On 04/14/2014 10:02 PM, Alvaro Herrera wrote:
> Andrew Dunstan wrote:
>> and here is the stack trace:
>>
>> #0  0x00361ba36285 in __GI_raise (sig=6) at
>> ../nptl/sysdeps/unix/sysv/linux/raise.c:64
>> #1  0x00361ba37b9b in __GI_abort () at abort.c:91
>> #2  0x0075c157 in ExceptionalCondition
>> (conditionName=<optimized out>, errorType=<optimized out>,
>> fileName=<optimized out>, lineNumber=<optimized out>)
>>  at /home/andrew/pgl/pg_9_3/src/backend/utils/error/assert.c:54
>> #3  0x0048c2af in MultiXactIdGetUpdateXid (xmax=<optimized out>,
>> t_infomask=<optimized out>) at
>> /home/andrew/pgl/pg_9_3/src/backend/access/heap/heapam.c:5873
>> #4  0x0078ad50 in HeapTupleSatisfiesMVCC
>> (tuple=0x7feb3be8a790, snapshot=0x1025d70, buffer=2172) at
>> /home/andrew/pgl/pg_9_3/src/backend/utils/time/tqual.c:1221
>
> Clearly this is a bug related to multixacts and the related tqual.c
> changes.  Will look.

OK. If you need me to test patches I can do so.

cheers

andrew




Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-04-14 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > I've uploaded the latest patch, rebased against master, with my changes
> > to here: http://snowman.net/~sfrost/rls_ringerc_sf.patch.gz as I don't
> > believe it'd clear the mailing list (it's 29k).
> 
> Please actually post it, for the archives' sake.  29k is far below the
> list limit.  (Which I don't know exactly what it is ... but certainly
> in the hundreds of KB.)

Huh, thought it was more like 25k.  Well, here goes then...

> > I'll take a look at changing the cache key to include user ID and
> > ripping out the plan invalidation logic from the current patch tomorrow
> > but I seriously doubt I'll be able to get all of that done in the next
> > day or two.
> 
> TBH I think we are up against the deadline.  April 15 was the agreed-to
> drop dead date for pushing new features into 9.4.

Yeah. :/  It may be for the best anyway; this should be able to go in early
in the 9.5 cycle and get more testing and refinement.  It still stinks,
though, as I feel this patch didn't get the attention it should have
due to a simple misunderstanding, but we do need to stop at some point
to get a release together.

Thanks,

Stephen


rls_ringerc_sf.patch.gz
Description: Binary data




Re: [HACKERS] [COMMITTERS] pgsql: Add TAP tests for client programs

2014-04-14 Thread Tom Lane
Peter Eisentraut  writes:
> Add TAP tests for client programs

I assume the buildfarm would need to be taught about this?

regards, tom lane




Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-04-14 Thread Tom Lane
Stephen Frost  writes:
> I've uploaded the latest patch, rebased against master, with my changes
> to here: http://snowman.net/~sfrost/rls_ringerc_sf.patch.gz as I don't
> believe it'd clear the mailing list (it's 29k).

Please actually post it, for the archives' sake.  29k is far below the
list limit.  (Which I don't know exactly what it is ... but certainly
in the hundreds of KB.)

> I'll take a look at changing the cache key to include user ID and
> ripping out the plan invalidation logic from the current patch tomorrow
> but I seriously doubt I'll be able to get all of that done in the next
> day or two.

TBH I think we are up against the deadline.  April 15 was the agreed-to
drop dead date for pushing new features into 9.4.

regards, tom lane




Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-14 Thread Bruce Momjian
On Mon, Apr 14, 2014 at 05:45:56PM -0700, Peter Geoghegan wrote:
> On Mon, Apr 14, 2014 at 5:30 PM, Bruce Momjian  wrote:
> > I am glad you are looking at this.  You are right that it requires a
> > huge amount of testing, but clearly our code needs improvement in this
> > area.
> 
> Thanks.
> 
> Does anyone recall the original justification for the recommendation
> that shared_buffers never exceed 8GiB? I'd like to revisit the test
> case, if such a thing exists.

I have understood it to be that the overhead of managing over 1 million
buffers is too large if you aren't accessing more than 8GB of data in a
five-minute period.  If you are accessing that much, it might be possible
to see a win over 8GB.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-04-14 Thread Stephen Frost
Craig, Tom, all,

I've been through the RLS code over the past couple of days which I
pulled from Craig's repo and have a bunch of minor updates.  In general,
the patch seems pretty reasonable- except for the issues discussed
below.  Quite a bit of this patch is tied up in plan invalidation and
tracking if the security quals depend on the current user, all of which
seems pretty grotty and the wrong way around to me.

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Craig Ringer  writes:
> > It's only that the plan depends on the user ID. There's no point keeping
> > the plan around if the user no longer exists.
> 
> [ shrug... ]  Leaving such a plan cached would be harmless, though.

Agreed.

> Furthermore, the user ID we'd be talking about is either the owner
> of the current session, or the owner of some view or security-definer
> function that the plan is already dependent on, so it's fairly hard
> to credit that the plan would survive long enough for the issue to
> arise.

I don't entirely follow which 'issue' is being referred to here, but we
need to consider that 'set role' changes should also cause a new plan.

> Even if there is a scenario where invalidating by user ID is actually
> useful, I think adding infrastructure to cause invalidation in such a case
> is optimizing for the wrong thing.  You're adding cycles to every query to
> benefit a case that is going to be quite infrequent in practice.

Yeah, I have a hard time seeing that there's an issue w/ keeping the
cached plans around even if the session never goes back to being under
the user ID for which those older plans were built.  Also, wouldn't a
'RESET ALL' clear any of them anyway?

> > Yes, that would be good, but is IMO more of a separate optimization. I'm
> > currently using KaiGai's code to invalidate and re-plan when a user ID
> > change is detected.
> 
> I'm unlikely to accept a patch that does that; wouldn't it be catastrophic
> for performance in the presence of security-definer functions?  You can't
> just trash the whole plan cache when a user ID switch occurs.

Yeah, this doesn't seem like the right approach.  Adding the user ID to
the cache key definitely strikes me as the right way to fix this.

I've uploaded the latest patch, rebased against master, with my changes
to here: http://snowman.net/~sfrost/rls_ringerc_sf.patch.gz as I don't
believe it'd clear the mailing list (it's 29k).

I'll take a look at changing the cache key to include user ID and
ripping out the plan invalidation logic from the current patch tomorrow
but I seriously doubt I'll be able to get all of that done in the next
day or two.  If anyone else is able to help out, it'd certainly be
appreciated; I really think that's the main hurdle to address at this
point with this patch- without the plan invalidation complexity, the
patch is really just building out the catalog, the SQL-level
statements for managing it, and the bit of code required to add the
conditional to statements involving RLS-enabled tables.

Thanks,

Stephen




Re: [HACKERS] assertion failure 9.3.4

2014-04-14 Thread Alvaro Herrera
Andrew Dunstan wrote:

> and here is the stack trace:
>
> #0  0x00361ba36285 in __GI_raise (sig=6) at
> ../nptl/sysdeps/unix/sysv/linux/raise.c:64
> #1  0x00361ba37b9b in __GI_abort () at abort.c:91
> #2  0x0075c157 in ExceptionalCondition
> (conditionName=<optimized out>, errorType=<optimized out>,
> fileName=<optimized out>, lineNumber=<optimized out>)
>  at /home/andrew/pgl/pg_9_3/src/backend/utils/error/assert.c:54
> #3  0x0048c2af in MultiXactIdGetUpdateXid (xmax=<optimized out>,
> t_infomask=<optimized out>) at
> /home/andrew/pgl/pg_9_3/src/backend/access/heap/heapam.c:5873
> #4  0x0078ad50 in HeapTupleSatisfiesMVCC
> (tuple=0x7feb3be8a790, snapshot=0x1025d70, buffer=2172) at
> /home/andrew/pgl/pg_9_3/src/backend/utils/time/tqual.c:1221

Clearly this is a bug related to multixacts and the related tqual.c
changes.  Will look.


-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] tests for client programs

2014-04-14 Thread Peter Eisentraut
On 4/4/14, 10:44 AM, Andres Freund wrote:
> I personally would very much like to get this patch commited. It doesn't
> have much risk in destabilizing stuff, rather the contrary.
> 
> Peter, what's you opinion about the current state?

I opine it's committed. ;-)




Re: [HACKERS] PostgreSQL hang on FreeBSD,with CFLAGS='-O2 -pthread' workaround

2014-04-14 Thread Tom Lane
Jov  writes:
> I've found some problems when using PG on FreeBSD. On FreeBSD, if an
> extension which uses the pthread lib is installed (for example plv8,
> pljava, imcs, etc.), the PG backend will hang when a query touches these
> extensions.

> There is a workaround, which is to configure PostgreSQL with
> CFLAGS='-O2 -pthread' and compile PG from source, then install the
> extension. But this workaround is not officially documented and not easy
> to use (you must compile PG from source). It may make extension developers
> or users waste much time solving it, and make people complain that PG or
> FreeBSD is not stable.

> Is it a PG problem, and can we fix it?

Sounds more like a bug with FreeBSD's libc and/or libpthread.

In general, we do not and will not support any third-party code that
causes multiple threads to become instantiated inside a PG backend.
So long as these libraries aren't actually forking new threads, you'd
think the presence of libpthread in them would be a non-issue.  It
sounds like the mere presence of libpthread may be changing behavior
somewhere, which I would argue to be a bug in either libpthread or
whatever is changing behavior.

As for building PG itself with -pthread: not bloody likely.  That
would send *entirely* the wrong message about whether threading
is permitted.

regards, tom lane




Re: [HACKERS] assertion failure 9.3.4

2014-04-14 Thread Andrew Dunstan


On 04/14/2014 09:28 PM, Andrew Dunstan wrote:


> With a client's code I have just managed to produce the following
> assertion failure on 9.3.4:
>
>    2014-04-15 01:02:46 GMT [19854] 76299: LOG:  execute :
>    select * from "asp_ins_event_task_log"( job_id:=1, event_id:=3164,
>    task_name:='EventUtcComputeTask', task_status_code:='VALID'
>    , task_start_utc:='04/15/2014 01:02:44.563',
>    task_end_utc:='04/15/2014 01:02:44.563')
>    TRAP: FailedAssertion("!(update_xact == ((TransactionId) 0))", File:
>    "/home/andrew/pgl/pg_9_3/src/backend/access/heap/heapam.c", Line: 5873)
>    2014-04-15 01:02:46 GMT [11959] 11: LOG:  server process (PID 19854)
>    was terminated by signal 6: Aborted
>    2014-04-15 01:02:46 GMT [11959] 12: DETAIL:  Failed process was
>    running: select * from "asp_ins_event_task_log"( job_id:=1,
>    event_id:=3164, task_name:='EventUtcComputeTask', task_status_code
>    :='VALID', task_start_utc:='04/15/2014 01:02:44.563',
>    task_end_utc:='04/15/2014 01:02:44.563')
>    2014-04-15 01:02:46 GMT [11959] 13: LOG:  terminating any other
>    active server processes
>
> When running without assertions, the client reports experiencing
> tables with duplicate primary keys among other things. It's apparently
> quite reproducible.
>
> I'm digging into this, but it's a nasty bug and any useful thoughts
> would be appreciated.

and here is the stack trace:

   #0  0x00361ba36285 in __GI_raise (sig=6) at
   ../nptl/sysdeps/unix/sysv/linux/raise.c:64
   #1  0x00361ba37b9b in __GI_abort () at abort.c:91
   #2  0x0075c157 in ExceptionalCondition
   (conditionName=<optimized out>, errorType=<optimized out>,
   fileName=<optimized out>, lineNumber=<optimized out>)
at /home/andrew/pgl/pg_9_3/src/backend/utils/error/assert.c:54
   #3  0x0048c2af in MultiXactIdGetUpdateXid (xmax=<optimized
   out>, t_infomask=<optimized out>) at
   /home/andrew/pgl/pg_9_3/src/backend/access/heap/heapam.c:5873
   #4  0x0078ad50 in HeapTupleSatisfiesMVCC
   (tuple=0x7feb3be8a790, snapshot=0x1025d70, buffer=2172) at
   /home/andrew/pgl/pg_9_3/src/backend/utils/time/tqual.c:1221
   #5  0x0048aad2 in heapgetpage (scan=0x11522f0, page=6) at
   /home/andrew/pgl/pg_9_3/src/backend/access/heap/heapam.c:397
   #6  0x0048aee9 in heapgettup_pagemode (scan=0x11522f0,
   dir=<optimized out>, nkeys=0, key=0x0) at
   /home/andrew/pgl/pg_9_3/src/backend/access/heap/heapam.c:939
   #7  0x0048d646 in heap_getnext (scan=0x11522f0,
   direction=<optimized out>) at
   /home/andrew/pgl/pg_9_3/src/backend/access/heap/heapam.c:1459
   #8  0x005becab in SeqNext (node=<optimized out>) at
   /home/andrew/pgl/pg_9_3/src/backend/executor/nodeSeqscan.c:66
   #9  0x005ac66e in ExecScanFetch (recheckMtd=0x5bec70
   <SeqRecheck>, accessMtd=0x5bec80 <SeqNext>, node=0x1151488) at
   /home/andrew/pgl/pg_9_3/src/backend/executor/execScan.c:82
   #10 ExecScan (node=0x1151488, accessMtd=0x5bec80 <SeqNext>,
   recheckMtd=0x5bec70 <SeqRecheck>) at
   /home/andrew/pgl/pg_9_3/src/backend/executor/execScan.c:167
   #11 0x005a5338 in ExecProcNode (node=0x1151488) at
   /home/andrew/pgl/pg_9_3/src/backend/executor/execProcnode.c:400
   #12 0x005b9bcf in ExecLockRows (node=0x1151278) at
   /home/andrew/pgl/pg_9_3/src/backend/executor/nodeLockRows.c:56
   #13 0x005a51d0 in ExecProcNode (node=0x1151278) at
   /home/andrew/pgl/pg_9_3/src/backend/executor/execProcnode.c:496
   #14 0x005a250a in ExecutePlan (dest=0xb952e0,
   direction=<optimized out>, numberTuples=1, sendTuples=1 '\001',
   operation=CMD_SELECT, planstate=0x1151278, estate=0x1151038)
at /home/andrew/pgl/pg_9_3/src/backend/executor/execMain.c:1473
   #15 standard_ExecutorRun (queryDesc=0x114f230,
   direction=<optimized out>, count=1) at
   /home/andrew/pgl/pg_9_3/src/backend/executor/execMain.c:307
   #16 0x7feb5cf2f27d in explain_ExecutorRun (queryDesc=0x114f230,
   direction=ForwardScanDirection, count=1) at
   /home/andrew/pgl/pg_9_3/contrib/auto_explain/auto_explain.c:233
   #17 0x7feb5cd2a235 in pgss_ExecutorRun (queryDesc=0x114f230,
   direction=ForwardScanDirection, count=1) at
   /home/andrew/pgl/pg_9_3/contrib/pg_stat_statements/pg_stat_statements.c:717
   #18 0x005c8b97 in _SPI_pquery (tcount=1, fire_triggers=0
   '\000', queryDesc=<optimized out>) at
   /home/andrew/pgl/pg_9_3/src/backend/executor/spi.c:2369
   #19 _SPI_execute_plan (plan=0x10afac8, paramLI=0x114f028,
   snapshot=0x0, crosscheck_snapshot=0x0, read_only=0 '\000',
   fire_triggers=0 '\000', tcount=1)
at /home/andrew/pgl/pg_9_3/src/backend/executor/spi.c:2157
   #20 0x005c8fd1 in SPI_execute_snapshot (plan=0x10afac8,
   Values=0x7fff147d8500, Nulls=0x7fff147d8700 " ", snapshot=0x0,
   crosscheck_snapshot=0x0, read_only=0 '\000',
fire_triggers=0 '\000', tcount=1) at
   /home/andrew/pgl/pg_9_3/src/backend/executor/spi.c:488
   #21 0x00722635 in ri_PerformCheck (riinfo=0xf98168,
   qkey=0x7fff147d8a20, qplan=0x10afac8, fk_rel=0x7feb39048048,
   pk_rel=0x7feb38ff9b40, old_tuple=0x0, new_tuple=0x7fff147d9010,
detectNewRows=0 '\000', expect_OK=5) at
   /home/andrew/pgl/pg_9_3/src/backend/utils/adt/ri_triggers.

Re: [HACKERS] [GENERAL] CLOB & BLOB limitations in PostgreSQL

2014-04-14 Thread Tom Lane
Bruce Momjian  writes:
> Uh, I had not thought of this before but I think we need oids for toast
> storage, which would explain this wiki text:

>   https://wiki.postgresql.org/wiki/BinaryFilesInDB

>   Storing binary data using bytea or text data types 

>   Minus

>   bytea and text data type both use TOAST
>   limited to 1G per entry
>   --> 4 Billion entries per table 

> Is that correct?

No.  It'd be 4 billion toasted-out-of-line entries per table (actually,
you'd start hitting performance issues well below that, but 4G would be
the hard limit).  Small values, up to probably a KB or so, don't count
against the limit.
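
A rough way to see how far along a given table is, counting its distinct
out-of-line values (the table name 'mytab' and the toast relation's OID are
just for illustration; the count can be slow on a large toast table):

-- find the table's toast relation
SELECT reltoastrelid::regclass FROM pg_class
WHERE oid = 'mytab'::regclass;

-- then count distinct out-of-line values in it, e.g.
-- (pg_toast_16384 is whatever the first query reported):
SELECT count(DISTINCT chunk_id) FROM pg_toast.pg_toast_16384;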

regards, tom lane




Re: [HACKERS] [GENERAL] CLOB & BLOB limitations in PostgreSQL

2014-04-14 Thread David G Johnston
Bruce Momjian wrote:
> On Mon, Apr 14, 2014 at 02:01:19PM +0200, Ivan Voras wrote:
>> On 11/04/2014 16:45, Jack.O'sulli...@tessella.com wrote:
>> 
>> > With point two, does this mean that any table with a bytea datatype is
>> > limited to 4 billion rows (which would seem in conflict with the
>> > "unlimited rows" shown by http://www.postgresql.org/about)? If we had
>> > rows where the bytea was a "null" entry would they contribute towards
>> > this total or is it 4 billion non-null entries?
>> 
>> This seems strange. A core developer should confirm this but it doesn't
>> make much sense - "bytea" fields are stored the same as "text" fields
>> (including varchar etc), i.e. the "varlena" internal representation, so
>> having the limit you are talking about would mean that any non-trivial
>> table with long-ish text fields would be limited to 2^32 entries...
> 
> [ moved to hackers ]
> 
> Uh, I had not thought of this before but I think we need oids for toast
> storage, which would explain this wiki text:
> 
>   https://wiki.postgresql.org/wiki/BinaryFilesInDB
> 
>   Storing binary data using bytea or text data types 
>   
>   Minus
>   
>   bytea and text data type both use TOAST
>   limited to 1G per entry
>   --> 4 Billion entries per table 
> 
> 
> Is that correct?

Reading only http://www.postgresql.org/docs/9.3/static/storage-toast.html
...

Since only actual out-of-line values require a chunk_id (an OID), the number
of main-table rows itself has no maximum.  However, the minimum row count at
which the limit could bite would appear to be "2^32 / {# of toast-able
columns}" - each table has only one "pg_class.reltoastrelid", so all
toast-able columns on that table pull from the same OID pool.

David J.








[HACKERS] assertion failure 9.3.4

2014-04-14 Thread Andrew Dunstan


With a client's code I have just managed to produce the following 
assertion failure on 9.3.4:


   2014-04-15 01:02:46 GMT [19854] 76299: LOG:  execute :
   select * from "asp_ins_event_task_log"( job_id:=1, event_id:=3164,
   task_name:='EventUtcComputeTask', task_status_code:='VALID'
   , task_start_utc:='04/15/2014 01:02:44.563',
   task_end_utc:='04/15/2014 01:02:44.563')
   TRAP: FailedAssertion("!(update_xact == ((TransactionId) 0))", File:
   "/home/andrew/pgl/pg_9_3/src/backend/access/heap/heapam.c", Line: 5873)
   2014-04-15 01:02:46 GMT [11959] 11: LOG:  server process (PID 19854)
   was terminated by signal 6: Aborted
   2014-04-15 01:02:46 GMT [11959] 12: DETAIL:  Failed process was
   running: select * from "asp_ins_event_task_log"( job_id:=1,
   event_id:=3164, task_name:='EventUtcComputeTask', task_status_code
   :='VALID', task_start_utc:='04/15/2014 01:02:44.563',
   task_end_utc:='04/15/2014 01:02:44.563')
   2014-04-15 01:02:46 GMT [11959] 13: LOG:  terminating any other
   active server processes


When running without assertions, the client reports experiencing tables 
with duplicate primary keys among other things. It's apparently quite 
reproducible.


I'm digging into this, but it's a nasty bug and any useful thoughts 
would be appreciated.


cheers

andrew




Re: [HACKERS] Excessive WAL generation and related performance issue

2014-04-14 Thread Joe Conway

On 04/14/2014 05:40 PM, Stephen Frost wrote:
> This sounds like a great example of the unlogged table -> logged
> table use-case and makes me wonder if we could provide an
> optimization similar to the existing CREATE TABLE + COPY under
> wal_level = minimal case, where we wouldn't WAL log anything for
> CREATE TABLE + COPY even when wal_level is above minimal, until
> COMMIT, at which point we'll blast the whole thing out in one
> shot.

I was definitely thinking that it would be nice if we could alter this
table to make it unlogged, load it, and then alter it again to make it
logged. Obviously easier said than done though ;-)
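
For the record, the hoped-for sequence would be something like this (the
last step did not exist at the time of writing - it arrived later, in 9.5;
table and path names are illustrative):

CREATE UNLOGGED TABLE big_load (LIKE target_table INCLUDING ALL);
COPY big_load FROM '/tmp/data.csv' WITH (FORMAT csv);
ALTER TABLE big_load SET LOGGED;  -- WAL-logs the whole table once, here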

> Another option that you might consider is ordering your input, if 
> possible, to improve the chances that the same page is changed
> multiple times inside a given checkpoint, hopefully reducing the
> number of pages changed.

Nice idea but doesn't work well when you have two indexes with
different first column correlations.

Joe

-- 
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support




[HACKERS] PostgreSQL hang on FreeBSD,with CFLAGS='-O2 -pthread' workaround

2014-04-14 Thread Jov
hi~ pg hackers,
I've found some problems when using PG on FreeBSD. On FreeBSD, if an
extension which uses the pthread lib is installed (for example plv8,
pljava, imcs, etc.), the PG backend will hang when a query touches these
extensions.

There is a workaround, which is to configure PostgreSQL with
CFLAGS='-O2 -pthread' and compile PG from source, then install the
extension. But this workaround is not officially documented and not easy
to use (you must compile PG from source). It may make extension developers
or users waste much time solving it, and make people complain that PG or
FreeBSD is not stable.

Is it a PG problem, and can we fix it?

I will also report this to the FreeBSD mailing list.

ref:
http://www.postgresql.org/message-id/534785d2.6050...@matrix.gatewaynet.com

https://github.com/knizhnik/imcs/issues/25

http://code.google.com/p/plv8js/issues/detail?id=34

Jov
blog: http://amutu.com/blog


Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

2014-04-14 Thread Kouhei Kaigai
> Simon Riggs  writes:
> > [ assorted comments about custom-scan patch, but particularly ]
> 
> > * The prune hook makes me feel very uneasy. It seems weirdly specific
> > implementation detail, made stranger by the otherwise lack of data
> > maintenance API calls. Calling that for every dirty page sounds like
> > an issue and my patch rejection indicator is flashing red around that.
> 
> Yeah.  After a fast review of the custom-scan and cache-scan patches, it
> seems to me that my original fears are largely confirmed: the custom scan
> patch is not going to be sufficient to allow development of any truly new
> plan type.  Yeah, you can plug in some new execution node types, but actually
> doing anything interesting is going to require patching other parts of the
> system.  Are we going to say to all comers, "sure, we'll put a hook call
> anywhere you like, just ask"?  I can't see this as being the way to go.
> 
Here are two different points to be discussed: one is generic to the
custom-plan API, and the other is specific to my cache-only scan
implementation.

Because the existing plan/exec nodes are all built-in and some functional
pieces are consolidated into particular source files (like createplan.c,
setrefs.c), it causes no problem that commonly called functions are declared
as static functions.
The custom-plan API changes this assumption; in other words, it allows some
portion of the jobs of createplan.c or setrefs.c to be done externally, so
it needs the commonly used functions to be external.
Because I went through trial and error during development, I could not list
all the functions to be made public at once. However, that is not a
fundamental problem; it should be solved during the discussion on
pgsql-hackers.

Regarding the portion specific to the cache-only scan: the same thing could
happen if we wanted to create an extension that tracks vacuuming,
independent of custom-scan.
Usually, an extension utilizes multiple hooks and interfaces to implement
the feature it wants. In the case of cache-only scan, unfortunately, PG
lacks a way to track heap vacuuming even though that is needed to invalidate
the cached data. It is an issue unrelated to the custom-scan API. We would
see the same problem if I tried to create an extension to count the number
of records being vacuumed.

> Another way of describing the problem is that it's not clear where the API
> boundaries are for potential users of a custom-scan feature.  (Simon said
> several things that are closely related to this point.)  One thing I don't
> like at all about the patch is its willingness to turn anything whatsoever
> into a publicly exported function, which basically says that the design
> attitude is there *are* no boundaries.  But that's not going to lead to
> anything maintainable.  We're certainly not going to want to guarantee that
> these suddenly-exported functions will all now have stable APIs
> forevermore.
> 
I'd like to have *several* existing static functions as (almost) stable
APIs, but not all. Indeed, my patch may look like it randomly picks up
static functions to redefine as external functions; however, that does not
mean custom-plan eventually requires all of these functions to be external.
According to my investigation, there are two types of functions to be
exposed:
- A function that walks on a plan/exec node tree recursively
  (e.g., create_plan_recurse)
- A function that adjusts internal state of the core backend
  (e.g., fix_expr_common)

At least, these functions are not the majority. I don't think this should
be a strong blocker for the new feature.
(I may have oversights, of course; please point them out.)

> Overall I concur with Simon's conclusion that this might be of interest
> for R&D purposes, but it's hard to see anyone wanting to support a production
> feature built on this.  It would be only marginally less painful than
> supporting a patch that just adds the equivalent code to the backend in
> the traditional way.
> 
As we adjusted the FDW APIs through their first several releases, any kind
of interface generally takes time to stabilize. Even though it *initially*
sticks to R&D purposes (I don't deny that), it shall be brushed up to
production stage. I think a feature for R&D purposes is a good starting
point.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei 




Re: [HACKERS] [GENERAL] CLOB & BLOB limitations in PostgreSQL

2014-04-14 Thread Bruce Momjian
On Mon, Apr 14, 2014 at 02:01:19PM +0200, Ivan Voras wrote:
> On 11/04/2014 16:45, Jack.O'sulli...@tessella.com wrote:
> 
> > With point two, does this mean that any table with a bytea datatype is
> > limited to 4 billion rows (which would seem in conflict with the
> > "unlimited rows" shown by http://www.postgresql.org/about)? If we had
> > rows where the bytea was a "null" entry would they contribute towards
> > this total or is it 4 billion non-null entries?
> 
> This seems strange. A core developer should confirm this but it doesn't
> make much sense - "bytea" fields are stored the same as "text" fields
> (including varchar etc), i.e. the "varlena" internal representation, so
> having the limit you are talking about would mean that any non-trivial
> table with long-ish text fields would be limited to 2^32 entries...

[ moved to hackers ]

Uh, I had not thought of this before but I think we need oids for toast
storage, which would explain this wiki text:

https://wiki.postgresql.org/wiki/BinaryFilesInDB

Storing binary data using bytea or text data types 

Minus

bytea and text data type both use TOAST
limited to 1G per entry
--> 4 Billion entries per table 


Is that correct?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-14 Thread Peter Geoghegan
On Mon, Apr 14, 2014 at 5:30 PM, Bruce Momjian  wrote:
> I am glad you are looking at this.  You are right that it requires a
> huge amount of testing, but clearly our code needs improvement in this
> area.

Thanks.

Does anyone recall the original justification for the recommendation
that shared_buffers never exceed 8GiB? I'd like to revisit the test
case, if such a thing exists.

-- 
Peter Geoghegan




Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-14 Thread Stephen Frost
* Jim Nasby (j...@nasby.net) wrote:
> I think it's important to mention that OS implementations (at least all I 
know of) have multiple page pools, each of which has its own clock. IIRC one 
> of the arguments for us supporting a count>1 was we could get the benefits of 
> multiple page pools without the overhead. In reality I believe that argument 
> is false, because the clocks for each page pool in an OS *run at different 
> rates* based on system demands.

They're also maintained in *parallel*, no?  That's something that I've
been talking over with a few folks at various conferences- that we
should consider breaking up shared buffers and then have new backend
processes which work through each pool independently and in parallel.

> I don't know if multiple buffer pools would be good or bad for Postgres, but 
> I do think it's important to remember this difference any time we look at 
> what OSes do.

It's my suspicion that the one-big-pool is exactly why we see many cases
where PG performs worse when the pool is more than a few gigs.  Of
course, this is all speculation and proper testing needs to be done..

Thanks,

Stephen




Re: [HACKERS] Excessive WAL generation and related performance issue

2014-04-14 Thread Stephen Frost
* Joe Conway (m...@joeconway.com) wrote:
> That's the thing. I'm sure there is tuning and other things to improve
> this particular case, but creating over 20 times as much WAL as real
> data seems like pathological behavior to me.

Setting things up such that you are updating a single value on each page
in an index during each checkpoint, which then requires basically
rewriting the entire index as full page writes at checkpoint time, is
definitely pathological behavior- but sadly, not behavior we are likely
to be able to fix..

This sounds like a great example of the unlogged table -> logged table
use-case and makes me wonder if we could provide an optimization similar
to the existing CREATE TABLE + COPY under wal_level = minimal case,
where we wouldn't WAL log anything for CREATE TABLE + COPY even when
wal_level is above minimal, until COMMIT, at which point we'll blast the
whole thing out in one shot.

Another option that you might consider is ordering your input, if
possible, to improve the chances that the same page is changed multiple
times inside a given checkpoint, hopefully reducing the number of pages
changed.

Thanks,

Stephen




Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-14 Thread Bruce Momjian
On Mon, Apr 14, 2014 at 10:11:53AM -0700, Peter Geoghegan wrote:
> Has anyone thought about this in the last few years? I know that Tom
> examined the LRU-K paper back in 2000 [5], but was discouraged by some
> kind of contention or CPU overhead (although he did say he intended to
> revisit the question). Obviously a lot has changed in the past 14
> years, particularly with regard to CPU characteristics.

I am glad you are looking at this.  You are right that it requires a
huge amount of testing, but clearly our code needs improvement in this
area.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

2014-04-14 Thread Kouhei Kaigai
> On 24 March 2014 10:25, Kouhei Kaigai  wrote:
> 
> > Brief summary of the current approach that has been revised from my
> > original submission through the discussion on pgsql-hackers:
> >
> > The plannode was renamed to CustomPlan, instead of CustomScan, because
> > it dropped all the hardcoded portions that assume the custom node shall
> > perform as an alternative scan or join method, since those prevented
> > this custom node from potentially performing as other stuff, like sort
> > or append.
> > According to the suggestion by Tom, I put a structure that contains
> > several function pointers on the new CustomPlan node, and extension
> > will allocate an object that extends CustomPlan node.
> > It looks like polymorphism in object oriented programming language.
> > The core backend knows abstracted set of methods defined in the tables
> > of function pointers, and extension can implement its own logic on the
> > callback, using private state on the extended object.
> 
> I just wanted to add some review comments here. I also apologise for not
> reviewing this earlier; I had misunderstood the maturity of the patch and
> had assumed it was a request for comments/WIP.
> 
Thanks for your interest and many comments.
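
For readers joining the thread, the design summarized in the quote above
boils down to a shape like this (a sketch only; the callback names here are
illustrative, not the ones in the submitted patch):

typedef struct CustomPlan CustomPlan;

typedef struct CustomPlanMethods
{
    const char *CustomName;
    void  (*InitCustomPlan)(CustomPlan *node);  /* set up private state */
    void *(*ExecCustomPlan)(CustomPlan *node);  /* produce the next tuple */
    void  (*EndCustomPlan)(CustomPlan *node);   /* release resources */
} CustomPlanMethods;

struct CustomPlan
{
    /* ...common Plan fields come first in the real node... */
    const CustomPlanMethods *methods;   /* the function-pointer table */
    void *private_state;                /* extension-owned state */
};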

> Overall, I very much support the concept of providing for alternate scans.
> I like the placement of calls in the optimizer and we'll be able to do much
> with that. Other comments in order that I consider them important.
> 
> * There is no declarative structure for this at all. I was expecting to
> see a way to declare that a specific table might have an alternate scan
> path, but we just call the plugin always and it has to separately make a
> cache lookup to see if anything extra is needed. The Index AM allows us
> to perform scans, yet indexes are very clearly declared and easily and
> clearly identifiable. We need the same thing for alternate plans.
> 
> * There is no mechanism at all for maintaining other data structures.
> Are we supposed to use the Index AM? Triggers? Or? The lack of clarity there
> is disturbing, though I could be simply missing something big and obvious.
> 
> * There is no catalog support. Complex APIs in Postgres typically have a
> structure like pg_am which allows the features to be clearly identified.
> I'd be worried about our ability to keep track of so many calls in such
> pivotal places without that. I want to be able to know what a plugin is
> doing, especially when it will likely come in binary form. I don't see an
> easy way to have plugins partially override each other or work together.
> What happens when I want to use Mr.X's clever new join plugin at the same
> time as Mr.Y's GPU accelerator?
> 
It was an implementation choice. I just followed the usual PG hook manner,
which expects the loaded extension to save the original function pointer
and to make a secondary call to that function on its own invocation. Thus,
it needs to walk the chain of extensions if multiple custom providers are
loaded.

Even though I initially chose this design, it is an option to have catalog
support to track registered custom-scan providers and their metadata: which
function generates paths, which flags are turned on, what kinds of relations
are supported, etc. The optimizer could then probably skip extensions that
obviously don't support the target relation.


> * How do we control security? What stops the Custom Scan API from overriding
> privileges? Shouldn't the alternate data structures be recognised as
> objects so we can grant privileges? Or do we simply say if an alternate
> data structure is linked to a heap then has implied privileges. It would
> be a shame to implement better security in one patch and then ignore it
> in another (from the same author).
> 
In general, we have no mechanism to prevent C-binary extensions from
overriding the privilege mechanism. An extension can override (existing)
hooks and modify the requiredPerms bits of a RangeTblEntry; that would
eventually cause a privilege bypass. But this is neutral to the custom-scan
API itself. Even though we implement an alternative scan logic on the API,
the core backend still checks the required privileges in ExecCheckRTPerms,
called at the head of the executor (unless the author of the extension does
something strange).


> All of the above I can let pass in this release, but in the longer term
> we need to look for more structure around these ideas so we can manage and
> control what happens. The way this is now is quite raw - suitable for R&D
> but not for longer term production usage by a wider audience, IMHO. I
> wouldn't like to make commitments about the longevity of this API either;
> if we accept it, it should have a big "may change radically" sign hanging
> on it. Having said that, I am interested in progress here and I accept that
> things will look like this at this stage of the ideas process, so these
> are not things to cause delay.
> 
> Some things I would like to see change on in this release are...
> 
> * It's not clear to me when we load/create the alte

Re: [HACKERS] Excessive WAL generation and related performance issue

2014-04-14 Thread Joe Conway

On 04/14/2014 04:25 PM, Andres Freund wrote:
> On 2014-04-14 16:22:48 -0700, Joe Conway wrote:
>> That'll help performance, but lets say I generally keep WAL files
>> for PITR and don't turn that off before starting -- shouldn't I
>> be very surprised to need over 3TB of archive storage when
>> loading a 50GB table with a couple of indexes?
> 
> The point is that more frequent checkpoints will increase the WAL
> volume *significantly* because more full page writes will have to
> be generated.

OK, I'll see how much it can be brought down through checkpoint tuning
and report back.
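
Roughly the settings being suggested, for anyone following along (values
are illustrative; note these are 9.3-era GUCs - checkpoint_segments later
gave way to max_wal_size):

checkpoint_timeout  = 1h    # up from 10min, per Andres
checkpoint_segments = 300   # high enough that the timeout, not the
                            # segment count, triggers checkpoints
log_checkpoints     = on    # verify the actual inter-checkpoint spacing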

Thanks!

Joe


-- 
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support




Re: [HACKERS] Excessive WAL generation and related performance issue

2014-04-14 Thread Andres Freund
On 2014-04-14 16:22:48 -0700, Joe Conway wrote:
> 
> On 04/14/2014 04:17 PM, Tom Lane wrote:
> > Andres Freund  writes:
> >> On 2014-04-14 14:33:03 -0700, Joe Conway wrote:
> >>> checkpoint_segments = 96 checkpoint_timeout = 10min
> > 
> >> I bet you'll see noticeably - while still not great - better
> >> performance by setting checkpoint_timeout to an hour (with a
> >> corresponding increase in checkpoint_segments). Have you checked
> >> how often checkpoints are actually created? I'd bet it's far more
> >> frequent than every 10min with that _segments setting and such a
> >> load.
> > 
> > My thoughts exactly.  It's not hard to blow through WAL at
> > multiple megabytes per second with modern machines.  I'd turn on
> > checkpoint logging and then do whatever you need to do to get the
> > actual intercheckpoint time up to 10-15 minutes at least.
> 
> That'll help performance, but let's say I generally keep WAL files for
> PITR and don't turn that off before starting -- shouldn't I be very
> surprised to need over 3TB of archive storage when loading a 50GB
> table with a couple of indexes?

The point is that more frequent checkpoints will increase the WAL volume
*significantly* because more full page writes will have to be generated.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Excessive WAL generation and related performance issue

2014-04-14 Thread Joe Conway

On 04/14/2014 04:17 PM, Tom Lane wrote:
> Andres Freund  writes:
>> On 2014-04-14 14:33:03 -0700, Joe Conway wrote:
>>> checkpoint_segments = 96 checkpoint_timeout = 10min
> 
>> I bet you'll see noticeably - while still not great - better
>> performance by setting checkpoint_timeout to an hour (with a
>> corresponding increase in checkpoint_segments). Have you checked
>> how often checkpoints are actually created? I'd bet it's far more
>> frequent than every 10min with that _segments setting and such a
>> load.
> 
> My thoughts exactly.  It's not hard to blow through WAL at
> multiple megabytes per second with modern machines.  I'd turn on
> checkpoint logging and then do whatever you need to do to get the
> actual intercheckpoint time up to 10-15 minutes at least.

That'll help performance, but let's say I generally keep WAL files for
PITR and don't turn that off before starting -- shouldn't I be very
surprised to need over 3TB of archive storage when loading a 50GB
table with a couple of indexes?

Joe


-- 
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support




Re: [HACKERS] Excessive WAL generation and related performance issue

2014-04-14 Thread Tom Lane
Andres Freund  writes:
> On 2014-04-14 14:33:03 -0700, Joe Conway wrote:
>> checkpoint_segments = 96
>> checkpoint_timeout = 10min

> I bet you'll see noticeably - while still not great - better performance
> by setting checkpoint_timeout to an hour (with a corresponding increase
> in checkpoint_segments).
> Have you checked how often checkpoints are actually created? I'd bet
> it's far more frequent than every 10min with that _segments setting and
> such a load.

My thoughts exactly.  It's not hard to blow through WAL at multiple
megabytes per second with modern machines.  I'd turn on checkpoint logging
and then do whatever you need to do to get the actual intercheckpoint time
up to 10-15 minutes at least.

regards, tom lane




Re: [HACKERS] Including replication slot data in base backups

2014-04-14 Thread Michael Paquier
On Tue, Apr 15, 2014 at 1:55 AM, Robert Haas  wrote:
> On Mon, Apr 14, 2014 at 9:26 AM, Fujii Masao  wrote:
>> This makes me think that it's safer to just remove replication slot files
>> at the beginning of the recovery when both backup_label and recovery.conf
>> exist.
>
> Well, we could do that, but that would preempt anyone who *does* want
> to keep those replication slots around.  I'm not sure that's a good
> idea, either.
Same here, I still see cases where people might want to keep the
replication slot information around, particularly if they take an
atomic snapshot of PGDATA, which is something that some of our users
do. Enforcing removal of such files when recovery begins would only
bring less flexibility to the way recovery can be done.
-- 
Michael




Re: [HACKERS] Excessive WAL generation and related performance issue

2014-04-14 Thread Jim Nasby

On 4/14/14, 5:51 PM, Joe Conway wrote:
> On 04/14/2014 03:17 PM, Jim Nasby wrote:
>> On 4/14/14, 4:50 PM, Andres Freund wrote:
>>> On 2014-04-14 14:33:03 -0700, Joe Conway wrote:
>>>> I realize there are many things that can be done to improve my
>>>> specific scenario, e.g. drop indexes before loading, change
>>>> various configs, etc. My purpose for this post is to ask if it
>>>> is really expected to get over 20 times as much WAL as heap
>>>> data?
>>>
>>> I'd bet a large percentage of this will be full page images of
>>> the index. The values you index are essentially distributed over
>>> the whole index, so you'll modify the same index values
>>> repeatedly. But often enough it won't be in the same checkpoint
>>> and thus will create full page images.
>>
>> My thought exactly...
>>
>> ISTM that we should be able to push all the index inserts to the
>> end of the transaction. That should greatly reduce the amount of
>> full page writes. That would also open the door for doing all the
>> index inserts in parallel.
>
> That's the thing. I'm sure there is tuning and other things to improve
> this particular case, but creating over 20 times as much WAL as real
> data seems like pathological behavior to me.

Can you take a look at what's actually going into WAL when the wheels
fall off? I think it should be pretty easy to test the theory that it's
a ton of full page writes of index leaf pages...
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net




Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-14 Thread Jim Nasby

On 4/14/14, 12:11 PM, Peter Geoghegan wrote:
> I have some theories about the PostgreSQL buffer manager/clock sweep.
> To motivate the reader to get through the material presented here, I
> present up-front a benchmark of a proof-of-concept patch of mine:
>
> http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/3-sec-delay/
>
> Test Set 4 represents the patch's performance here.
>
> This shows some considerable improvements for a tpc-b workload, with
> 15 minute runs, where the buffer manager struggles with moderately
> intense cache pressure. shared_buffers is 8GiB, with 32GiB of system
> memory in total. The scale factor is 5,000 here, so that puts the
> primary index of the accounts table at a size that makes it impossible
> to cache entirely within shared_buffers, by a margin of a couple of
> GiBs. pgbench_accounts_pkey is ~10GB, and pgbench_accounts is ~63GB.
> Obviously the heap is much larger, since for that table heap tuples
> are several times the size of index tuples (the ratio here is
> probably well below the mean, if I can be permitted to make a vast
> generalization).
>
> PostgreSQL implements a clock sweep algorithm, which gets us something
> approaching an LRU for the buffer manager in trade-off for less
> contention on core structures. Buffers have a usage_count/"popularity"
> that currently saturates at 5 (BM_MAX_USAGE_COUNT). The classic CLOCK
> algorithm only has one bit for what approximates our "usage_count" (so
> it's either 0 or 1). I think that at its core CLOCK is an algorithm
> that has some very desirable properties that I am sure must be
> preserved. Actually, I think it's more accurate to say we use a
> variant of clock pro, a refinement of the original CLOCK.

I think it's important to mention that OS implementations (at least all
I know of) have multiple page pools, each of which has its own clock.
IIRC one of the arguments for us supporting a count > 1 was that we
could get the benefits of multiple page pools without the overhead. In
reality I believe that argument is false, because the clocks for each
page pool in an OS *run at different rates* based on system demands.

I don't know if multiple buffer pools would be good or bad for
Postgres, but I do think it's important to remember this difference any
time we look at what OSes do.

> If you look at the test sets that this patch covers (with all the
> tricks applied), there are pretty good figures throughout. You can
> kind of see the pain towards the end, but there are no dramatic falls
> in responsiveness for minutes at a time. There are latency spikes, but
> they're *far* shorter, and much better hidden. Without looking at
> individual multiple minute spikes, at the macro level (all client
> counts for all runs) average latency is about half of what is seen on
> master.

My guess would be that those latency spikes are caused by a need to run
the clock for an extended period. IIRC there's code floating around
that makes it possible to measure that.

I suspect it would be very interesting to see what happens if your
patch is combined with the work that (Greg?) did to reduce the odds of
individual backends needing to run the clock. (I know part of that work
looked at proactively keeping pages on the free list, but I think there
was more to it than that.)
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net




Re: [HACKERS] Excessive WAL generation and related performance issue

2014-04-14 Thread Joe Conway

On 04/14/2014 03:17 PM, Jim Nasby wrote:
> On 4/14/14, 4:50 PM, Andres Freund wrote:
>> On 2014-04-14 14:33:03 -0700, Joe Conway wrote:
>>> I realize there are many things that can be done to improve my 
>>> specific scenario, e.g. drop indexes before loading, change
>>> various configs, etc. My purpose for this post is to ask if it
>>> is really expected to get over 20 times as much WAL as heap
>>> data?
>> 
>> I'd bet a large percentage of this will be full page images of
>> the index. The values you index are essentially distributed over
>> the whole index, so you'll modify the same index values
>> repeatedly. But often enough it won't be in the same checkpoint
>> and thus will create full page images.
> 
> My thought exactly...
> 
> ISTM that we should be able to push all the index inserts to the
> end of the transaction. That should greatly reduce the amount of
> full page writes. That would also open the door for doing all the
> index inserts in parallel.

That's the thing. I'm sure there is tuning and other things to improve
this particular case, but creating over 20 times as much WAL as real
data seems like pathological behavior to me.

Joe

- -- 
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support




Re: [HACKERS] PostgreSQL in Windows console and Ctrl-C

2014-04-14 Thread Bruce Momjian
On Mon, Apr 14, 2014 at 09:34:14AM +0530, Amit Kapila wrote:
> The problem can be solved this way, but the only question here is whether
> it is acceptable for users to have a new console window for server.
> 
> Can others also please share their opinion if this fix (start server in new
> console) seems acceptable or shall we try by passing some information
> from pg_ctl and then ignore CTRL+C && CTRL+BREAK for it?

I wanted to point out that I think this patch is only appropriate for
head, not backpatching.  Also, on Unix we have to handle signals that
come from the kill command.  Can you send CTRL+C from other
applications on Windows?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Signaling of waiting for a cleanup lock?

2014-04-14 Thread Jim Nasby

On 4/14/14, 12:44 PM, Tom Lane wrote:
> Andres Freund  writes:
>> On 2014-04-14 13:06:21 -0400, Tom Lane wrote:
>>> In particular I'm not sold on the use-case
>>> for being able to tell that a process is waiting without being able to
>>> tell what it's waiting for.  I can figure that much out already.
>>
>> You can? How? It could also be io or something else that's problematic.
>
> If the process is not consuming any CPU time at all, it's waiting on
> something.  (Now admittedly, that might be hard to tell remotely ---
> but Simon seems to be assuming you have access to "ps" output.)

Right... and then I always find myself wondering what it's actually
waiting on. IO? lwlock? Something else?
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net




Re: [HACKERS] Signaling of waiting for a cleanup lock?

2014-04-14 Thread Jim Nasby

On 4/14/14, 12:06 PM, Tom Lane wrote:
> One concrete reason not to do the proposed trivial hack is that the lock
> readout views are asynchronous.  Right now, if someone sees a process that
> claims to be waiting but they don't see any entry in pg_locks, they know
> they saw inconsistent state.  If we add a valid state where waiting can be
> true without a pg_locks entry, they won't know what to think.  I don't
> want to go there.

FWIW, I really wish we had a way to eliminate that inconsistency. It
makes already-difficult-to-debug problems even harder to deal with.
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net




Re: [HACKERS] Excessive WAL generation and related performance issue

2014-04-14 Thread Jim Nasby

On 4/14/14, 4:50 PM, Andres Freund wrote:
> Hi,
>
> On 2014-04-14 14:33:03 -0700, Joe Conway wrote:
>> checkpoint_segments = 96
>> checkpoint_timeout = 10min
>>
>> I realize there are many things that can be done to improve my
>> specific scenario, e.g. drop indexes before loading, change various
>> configs, etc. My purpose for this post is to ask if it is really
>> expected to get over 20 times as much WAL as heap data?
>
> I'd bet a large percentage of this will be full page images of the
> index. The values you index are essentially distributed over the whole
> index, so you'll modify the same index values repeatedly. But often
> enough it won't be in the same checkpoint and thus will create full
> page images.

My thought exactly...

ISTM that we should be able to push all the index inserts to the end of
the transaction. That should greatly reduce the amount of full page
writes. That would also open the door for doing all the index inserts
in parallel.
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net




Re: [HACKERS] Excessive WAL generation and related performance issue

2014-04-14 Thread Andres Freund
Hi,

On 2014-04-14 14:33:03 -0700, Joe Conway wrote:
> checkpoint_segments = 96
> checkpoint_timeout = 10min

> I realize there are many things that can be done to improve my
> specific scenario, e.g. drop indexes before loading, change various
> configs, etc. My purpose for this post is to ask if it is really
> expected to get over 20 times as much WAL as heap data?

I'd bet a large percentage of this will be full page images of the
index. The values you index are essentially distributed over the whole
index, so you'll modify the same index values repeatedly. But often
enough it won't be in the same checkpoint and thus will create full page
images.
I bet you'll see noticeably - while still not great - better performance
by setting checkpoint_timeout to an hour (with a corresponding increase
in checkpoint_segments).
Have you checked how often checkpoints are actually created? I'd bet
it's far more frequent than every 10min with that _segments setting and
such a load.
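
A quick way to check that (a sketch, assuming 9.2, where pg_stat_bgwriter
has carried these counters for a while): sample the view at the start and
end of the load and compare the deltas against the elapsed time. Setting
log_checkpoints = on gives the same information in the server log.

-- checkpoints_req counts checkpoints forced by filling
-- checkpoint_segments; checkpoints_timed those triggered by
-- checkpoint_timeout
SELECT checkpoints_timed, checkpoints_req, stats_reset
FROM pg_stat_bgwriter;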

Greetings,

Andres Freund




[HACKERS] Excessive WAL generation and related performance issue

2014-04-14 Thread Joe Conway

I have run into a situation where bulk loading a table with fairly
narrow rows and two indexes causes WAL to be generated at about 20:1
or higher ratio to the actual heap data (table plus indexes).

There are 560 million loaded rows which ultimately produce a table of
about 50 GB and indexes consuming about 75 GB for a grand total of 125
GB table+index heap size. This generates over 3 TB of WAL. The loading
progresses fairly fast at first and then drops off sharply so that the
load takes 4 plus days on production hardware. Near the end of the
load the heap grows very slowly -- measured in tens of MBs per hour.

I have reproduced the case on a test machine like so:

create table foo
(
  pid varchar(11) not null,
  sprid varchar(5) not null,
  cid varchar(11) not null,
  csdt date not null,
  pcind char(1) not null,
  cedt date,
  csnum int2 not null default 767,
  rcd char(2) not null default '99'::bpchar,
  rrind char(1) not null default 'K'::bpchar,
  its timestamp not null default now(),
  luts timestamp not null default now(),
  primary key (pid,sprid,cid,csdt,rcd)
);
create index fooidx on foo(cid,pid);

-- creates 77GB file for test load
copy
(
  select
   (id % 2604380)::text,
   (id % 30)::text,
   (id % 7906790)::text,
   now() - ((id % 8777)::text || ' days')::interval,
   (id % 3)::text,
   now() - ((id % 6351)::text || ' days')::interval,
   (id % 5)::int2,
   (id % 27)::text,
   (id % 2)::text,
   now() - ((id % 16922)::text || ' seconds')::interval,
   now() - ((id % 27124)::text || ' seconds')::interval
  from generate_series(100,100 + 559245675) as g(id)
) to '/home/postgres/foo.dump';


grep -E "^($|#|\s.*#?)" -v $PGDATA/postgresql.conf
max_connections = 1200
superuser_reserved_connections = 30
shared_buffers = 12288MB
work_mem = 8MB
maintenance_work_mem = 128MB
wal_level = hot_standby
checkpoint_segments = 96
checkpoint_timeout = 10min
checkpoint_completion_target = 0.7
archive_mode = on
archive_command = 'cp -i %p /home/postgres/archive/%f'

See: http://www.joeconway.com/presentations/test01.png

Also worth noting is that once this thing gets going, using "perf top"
I could see XLogInsert consuming 20-40% of all cycles, and the vast
majority of that was going into the CRC calculation.


I realize there are many things that can be done to improve my
specific scenario, e.g. drop indexes before loading, change various
configs, etc. My purpose for this post is to ask if it is really
expected to get over 20 times as much WAL as heap data?
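
One way to quantify that ratio (a sketch, assuming 9.2, where both
functions exist): sample the WAL insert position before and after the
load and take the difference.

-- run once before and once after the load; subtract the two results
SELECT pg_xlog_location_diff(pg_current_xlog_location(), '0/0') AS wal_bytes;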

Oh, and this is all on postgres 9.2.latest. I have not yet tried to
reproduce on git master but intend to do so.

Thoughts?

Thanks,

Joe


-- 
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support




Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

2014-04-14 Thread Tom Lane
Simon Riggs  writes:
> [ assorted comments about custom-scan patch, but particularly ]

> * The prune hook makes me feel very uneasy. It seems weirdly specific
> implementation detail, made stranger by the otherwise lack of data
> maintenance API calls. Calling that for every dirty page sounds like
> an issue and my patch rejection indicator is flashing red around that.

Yeah.  After a fast review of the custom-scan and cache-scan patches, it
seems to me that my original fears are largely confirmed: the custom scan
patch is not going to be sufficient to allow development of any truly new
plan type.  Yeah, you can plug in some new execution node types, but
actually doing anything interesting is going to require patching other
parts of the system.  Are we going to say to all comers, "sure, we'll put
a hook call anywhere you like, just ask"?  I can't see this as being the
way to go.

Another way of describing the problem is that it's not clear where the API
boundaries are for potential users of a custom-scan feature.  (Simon said
several things that are closely related to this point.)  One thing I don't
like at all about the patch is its willingness to turn anything whatsoever
into a publicly exported function, which basically says that the design
attitude is there *are* no boundaries.  But that's not going to lead to
anything maintainable.  We're certainly not going to want to guarantee
that these suddenly-exported functions will all now have stable APIs
forevermore.

Overall I concur with Simon's conclusion that this might be of interest
for R&D purposes, but it's hard to see anyone wanting to support a
production feature built on this.  It would be only marginally less
painful than supporting a patch that just adds the equivalent code
to the backend in the traditional way.

So I'm feeling that this was kind of a dead end.  It was worth doing
the legwork to see if this sort of approach could be useful, but the
answer seems like "no".

regards, tom lane




Re: [HACKERS] JSONB in-place updates?

2014-04-14 Thread Josh Berkus
On 04/14/2014 09:27 AM, kelas wrote:
> Are there any plans to add "in-place at-depth" update operator for JSONB
> type, e.g.:
> 
> UPDATE test SET attrs->'answer' = 42 where attrs->'answer' = 41
> 

Plans, yes.  But not until 9.5, or maybe as an extension.
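
For reference, the interface that eventually shipped in 9.5 is a
function rather than assignment syntax; a rough equivalent of the
statement above would look like this (a sketch using jsonb_set as later
documented):

UPDATE test
SET attrs = jsonb_set(attrs, '{answer}', '42'::jsonb)
WHERE attrs->>'answer' = '41';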

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




[HACKERS] JSONB in-place updates?

2014-04-14 Thread kelas
Are there any plans to add "in-place at-depth" update operator for JSONB
type, e.g.:

UPDATE test SET attrs->'answer' = 42 where attrs->'answer' = 41


Re: [HACKERS] Create function prototype as part of PG_FUNCTION_INFO_V1

2014-04-14 Thread Tom Lane
Peter Eisentraut  writes:
> What is the difference (on affected platforms) between

> Datum funcname(PG_FUNCTION_ARGS);

> and writing (effectively)

> PGDLLEXPORT Datum funcname(PG_FUNCTION_ARGS);
> Datum funcname(PG_FUNCTION_ARGS);

> or for that matter

> Datum funcname(PG_FUNCTION_ARGS);
> PGDLLEXPORT Datum funcname(PG_FUNCTION_ARGS);

According to
http://www.postgresql.org/message-id/52d25aa2.50...@2ndquadrant.com

the latter fails outright.  Which is problematic because that'd be the
more common case (particularly if you put manually-written externs in a
header file).  So while we could do this, and it'd probably be an
improvement in the very long run, it'd definitely cause pain in the short
run.  Not sure if it's worth it.

I still wish we could get rid of this problem by fixing the Windows build
recipes so that the PGDLLEXPORT marking wasn't needed.  We proved to
ourselves recently that getting rid of PGDLLIMPORT on global variables
wouldn't work, but I'm not sure that the function end of it was really
investigated.

regards, tom lane




Re: [HACKERS] Patch to add support of "IF NOT EXISTS" to others "CREATE" statements

2014-04-14 Thread Fabrízio de Royes Mello
On Tue, Apr 1, 2014 at 2:46 PM, Robert Haas  wrote:
>
> >> Where this is a bit more interesting is in the case of sequences, where
> >> resetting the sequence to zero may cause further inserts into an
> >> existing table to fail.
> >
> > Yeah.  Sequences do have contained data, which makes COR harder to define
> > --- that's part of the reason why we have CINE not COR for tables, and
> > maybe we have to do the same for sequences.  The point being exactly
> > that if you use CINE, you're implicitly accepting that you don't know
> > the ensuing state fully.
>
> Yeah.  I think CINE is more sensible than COR for sequences, for
> precisely the reason that they do have contained data (even if it's
> basically only one value).
>

The attached patch contains CINE for sequences.

I just stripped this code out of the patch that was rejected before.
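
With the patch applied, usage looks like this (the sequence name is just
an example; the NOTICE text comes from the ereport added in sequence.c
below):

CREATE SEQUENCE IF NOT EXISTS serial_seq;   -- creates the sequence
CREATE SEQUENCE IF NOT EXISTS serial_seq;   -- second call is a no-op:
NOTICE:  relation "serial_seq" already exists, skipping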

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog sobre TI: http://fabriziomello.blogspot.com
>> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
diff --git a/doc/src/sgml/ref/create_sequence.sgml b/doc/src/sgml/ref/create_sequence.sgml
index 70b9f3d..de85b18 100644
--- a/doc/src/sgml/ref/create_sequence.sgml
+++ b/doc/src/sgml/ref/create_sequence.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
 
  
 
-CREATE [ TEMPORARY | TEMP ] SEQUENCE name [ INCREMENT [ BY ] increment ]
+CREATE [ TEMPORARY | TEMP ] [ IF NOT EXISTS ] SEQUENCE name [ INCREMENT [ BY ] increment ]
 [ MINVALUE minvalue | NO MINVALUE ] [ MAXVALUE maxvalue | NO MAXVALUE ]
 [ START [ WITH ] start ] [ CACHE cache ] [ [ NO ] CYCLE ]
 [ OWNED BY { table_name.column_name | NONE } ]
@@ -90,6 +90,16 @@ SELECT * FROM name;

 

+IF NOT EXISTS
+
+ 
+  Do nothing (except issuing a notice) if a sequence with the same name
+  already exists.
+ 
+
+   
+
+   
 name
 
  
diff --git a/src/backend/commands/sequence.c b/src/backend/commands/sequence.c
index ed696be..54be1b8 100644
--- a/src/backend/commands/sequence.c
+++ b/src/backend/commands/sequence.c
@@ -122,6 +122,17 @@ DefineSequence(CreateSeqStmt *seq)
 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
  errmsg("unlogged sequences are not supported")));
 
+	/* Check for IF NOT EXISTS clause */
+	RangeVarGetAndCheckCreationNamespace(seq->sequence, NoLock, &seqoid);
+	if (seq->if_not_exists && OidIsValid(seqoid))
+	{
+		ereport(NOTICE,
+(errcode(ERRCODE_DUPLICATE_TABLE),
+ errmsg("relation \"%s\" already exists, skipping",
+		seq->sequence->relname)));
+		return seqoid;
+	}
+
 	/* Check and set all option values */
 	init_params(seq->options, true, &new, &owned_by);
 
@@ -210,7 +221,7 @@ DefineSequence(CreateSeqStmt *seq)
 	stmt->options = NIL;
 	stmt->oncommit = ONCOMMIT_NOOP;
 	stmt->tablespacename = NULL;
-	stmt->if_not_exists = false;
+	stmt->if_not_exists = seq->if_not_exists;
 
 	seqoid = DefineRelation(stmt, RELKIND_SEQUENCE, seq->ownerId);
 	Assert(seqoid != InvalidOid);
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 98ad910..fecf4b7 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -3317,6 +3317,7 @@ _copyCreateSeqStmt(const CreateSeqStmt *from)
 	COPY_NODE_FIELD(sequence);
 	COPY_NODE_FIELD(options);
 	COPY_SCALAR_FIELD(ownerId);
+	COPY_SCALAR_FIELD(if_not_exists);
 
 	return newnode;
 }
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 9901d23..21663fb 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1565,6 +1565,7 @@ _equalCreateSeqStmt(const CreateSeqStmt *a, const CreateSeqStmt *b)
 	COMPARE_NODE_FIELD(sequence);
 	COMPARE_NODE_FIELD(options);
 	COMPARE_SCALAR_FIELD(ownerId);
+	COMPARE_SCALAR_FIELD(if_not_exists);
 
 	return true;
 }
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 7b9895d..27c24ea 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -3391,6 +3391,17 @@ CreateSeqStmt:
 	n->sequence = $4;
 	n->options = $5;
 	n->ownerId = InvalidOid;
+	n->if_not_exists = false;
+	$$ = (Node *)n;
+}
+			| CREATE OptTemp SEQUENCE IF_P NOT EXISTS qualified_name OptSeqOptList
+{
+	CreateSeqStmt *n = makeNode(CreateSeqStmt);
+	$7->relpersistence = $2;
+	n->sequence = $7;
+	n->options = $8;
+	n->ownerId = InvalidOid;
+	n->if_not_exists = true;
 	$$ = (Node *)n;
 }
 		;
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 18d4991..3facff7 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -1941,6 +1941,7 @@ typedef struct CreateSeqStmt
 	RangeVar   *sequence;		/* the sequence to create */
 	List	   *options;
 	Oid			ownerId;		/* ID of owner, or InvalidOid for default */
+	bool		if_not_exists;	/* skip error if a Sequence already exists */
 } CreateSeqStmt;
 
 typedef struct AlterSeqStmt
diff --

Re: [HACKERS] Create function prototype as part of PG_FUNCTION_INFO_V1

2014-04-14 Thread Peter Eisentraut
On 4/4/14, 10:07 AM, Andres Freund wrote:
> If
> somebody previously tried to do the correct thing and attached
> PGDLLEXPORT to their own *function* prototype, it would cause problems
> now.

What is the difference (on affected platforms) between

Datum funcname(PG_FUNCTION_ARGS);

and writing (effectively)

PGDLLEXPORT Datum funcname(PG_FUNCTION_ARGS);
Datum funcname(PG_FUNCTION_ARGS);

or for that matter

Datum funcname(PG_FUNCTION_ARGS);
PGDLLEXPORT Datum funcname(PG_FUNCTION_ARGS);


If there isn't a difference, then my patch is fine.  Otherwise, it might
be good to document the issues for extension authors.





Re: Race condition between PREPARE TRANSACTION and COMMIT PREPARED (was Re: [HACKERS] Problem with txid_snapshot_in/out() functionality)

2014-04-14 Thread Heikki Linnakangas

On 04/14/2014 07:51 PM, Tom Lane wrote:
> I'd prefer to leave the prepare sequence alone and instead find a way
> to reject COMMIT PREPARED until after the source transaction is safely
> clear of the race conditions.  The upthread idea of looking at vxid
> instead of xid might help, except that I see we clear both of them
> in ProcArrayClearTransaction.  We'd need some state in PGPROC that
> isn't cleared till later than that.

Hmm. What if one of the post-cleanup actions fails? We can't bail out
of the prepare sequence until we have transferred the locks to the new
PGPROC; otherwise the locks are lost. In essence, there should be a
critical section from the EndPrepare call until all the critical
cleanup actions like PostPrepare_Locks have been done, and I don't
think we want that. We might be able to guarantee that the built-in
post-cleanup operations are safe enough for that, but there's also
CallXactCallbacks in there.

Given the lack of reports of that happening, though, perhaps that's not
an issue.


- Heikki




Re: [HACKERS] Signaling of waiting for a cleanup lock?

2014-04-14 Thread Robert Haas
On Mon, Apr 14, 2014 at 1:26 PM, Andres Freund  wrote:
> On 2014-04-14 13:06:21 -0400, Tom Lane wrote:
>> Andres Freund  writes:
>> > On 2014-04-14 12:21:09 -0400, Robert Haas wrote:
>> >> AFAICS, the big advantage of something like this is that we'd get
>> >> proper deadlock detection, and that's not a trivial point.
>>
>> > Hm. Is this actually something we need? I am not aware of deadlock prone
>> > scenarios involving buffer pins during normal processing (HS is another
>> > matter).
>>
>> Ordinary buffer pinning isn't supposed to be subject to deadlocks (that's
>> why it's reasonable to use LWLocks for it), but it's less clear that
>> cleanup locks couldn't be subject to deadlocks.
>
> We only acquire cleanup locks in a blocking fashion from vacuum - and
> vacuum has a pretty clearly defined locking behaviour. Additionally both
> in vacuum and in opportunistic pruning there's only a very small and
> defined amount of work done once the buffer is successfully pinned.

Nevertheless, I'm pretty sure undetected deadlocks are possible; we've
discussed it before.  The TODO list contains a pointer to:

http://www.postgresql.org/message-id/21534.1200956...@sss.pgh.pa.us

I think the scenario is something like: vacuum is waiting for a buffer
pin that pertains to an open query in some other session, which then
tries to take a lock that conflicts with one already held by vacuum.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Signaling of waiting for a cleanup lock?

2014-04-14 Thread Tom Lane
Andres Freund  writes:
> On 2014-04-14 13:06:21 -0400, Tom Lane wrote:
>> In particular I'm not sold on the use-case
>> for being able to tell that a process is waiting without being able to
>> tell what it's waiting for.  I can figure that much out already.

> You can? How? It could also be io or something else that's problematic.

If the process is not consuming any CPU time at all, it's waiting on
something.  (Now admittedly, that might be hard to tell remotely ---
but Simon seems to be assuming you have access to "ps" output.)

>> One concrete reason not to do the proposed trivial hack is that the lock
>> readout views are asynchronous.  Right now, if someone sees a process that
>> claims to be waiting but they don't see any entry in pg_locks, they know
>> they saw inconsistent state.  If we add a valid state where waiting can be
>> true without a pg_locks entry, they won't know what to think.  I don't
>> want to go there.

> What's your opinion of the waiting = true combined with waiting_for =
> 'cleanup lock' or something similar?

It's better than the original proposal, but it still doesn't tell you
which other processes are blocking the waiter, which makes it not
terribly useful IMO.  Any actual gain in utility here would come from
being able to find that out.

regards, tom lane




Re: Race condition between PREPARE TRANSACTION and COMMIT PREPARED (was Re: [HACKERS] Problem with txid_snapshot_in/out() functionality)

2014-04-14 Thread Andres Freund
On 2014-04-14 13:47:35 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > I wonder if the most natural way to express this wouldn't be to have a
> > heavyweight lock for every 2pc xact
> > 'slot'. ResourceOwnerRelease(RESOURCE_RELEASE_LOCKS) should be scheduled
> > correctly to make error handling for this work.
> 
> That seems like not a bad idea.  Could we also use the same lock to
> prevent concurrent attempts to commit/rollback the same already-prepared
> transaction?  I forget what we're doing to forestall such cases right now.

GlobalTransaction->locking_xid is currently used. If it points to a live
transaction of another backend, "prepared transaction with identifier
\"%s\" is busy" will be thrown.
ISTM that if we were using a lock for every slot, that logic could be
thrown away.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: Race condition between PREPARE TRANSACTION and COMMIT PREPARED (was Re: [HACKERS] Problem with txid_snapshot_in/out() functionality)

2014-04-14 Thread Tom Lane
Andres Freund  writes:
> I wonder if the most natural way to express this wouldn't be to have a
> heavyweight lock for every 2pc xact
> 'slot'. ResourceOwnerRelease(RESOURCE_RELEASE_LOCKS) should be scheduled
> correctly to make error handling for this work.

That seems like not a bad idea.  Could we also use the same lock to
prevent concurrent attempts to commit/rollback the same already-prepared
transaction?  I forget what we're doing to forestall such cases right now.

regards, tom lane




Re: Race condition between PREPARE TRANSACTION and COMMIT PREPARED (was Re: [HACKERS] Problem with txid_snapshot_in/out() functionality)

2014-04-14 Thread Andres Freund
On 2014-04-14 12:51:02 -0400, Tom Lane wrote:
> The whole thing feels like we are solving the wrong problem, anyway.
> IIUC, the complaint arises because we are allowing COMMIT PREPARED
> to occur before the source transaction has reported successful prepare
> to its client.  Surely that does not need to be a legal case?  No
> correctly-operating 2PC xact manager would do that.

I agree here. This seems somewhat risky, just to support a case that
shouldn't happen in reality - as somewhat evidenced by the fact that
there don't seem to be field reports around this.

> The upthread idea of looking at vxid
> instead of xid might help, except that I see we clear both of them
> in ProcArrayClearTransaction.  We'd need some state in PGPROC that
> isn't cleared till later than that.

I wonder if the most natural way to express this wouldn't be to have a
heavyweight lock for every 2pc xact
'slot'. ResourceOwnerRelease(RESOURCE_RELEASE_LOCKS) should be scheduled
correctly to make error handling for this work.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Signaling of waiting for a cleanup lock?

2014-04-14 Thread Andres Freund
On 2014-04-14 13:06:21 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2014-04-14 12:21:09 -0400, Robert Haas wrote:
> >> AFAICS, the big advantage of something like this is that we'd get
> >> proper deadlock detection, and that's not a trivial point.
> 
> > Hm. Is this actually something we need? I am not aware of deadlock prone
> > scenarios involving buffer pins during normal processing (HS is another
> > matter).
> 
> Ordinary buffer pinning isn't supposed to be subject to deadlocks (that's
> why it's reasonable to use LWLocks for it), but it's less clear that
> cleanup locks couldn't be subject to deadlocks.

We only acquire cleanup locks in a blocking fashion from vacuum - and
vacuum has a pretty clearly defined locking behaviour. Additionally both
in vacuum and in opportunistic pruning there's only a very small and
defined amount of work done once the buffer is successfully pinned.

> In particular I'm not sold on the use-case
> for being able to tell that a process is waiting without being able to
> tell what it's waiting for.  I can figure that much out already.

You can? How? It could also be io or something else that's problematic.

> One concrete reason not to do the proposed trivial hack is that the lock
> readout views are asynchronous.  Right now, if someone sees a process that
> claims to be waiting but they don't see any entry in pg_locks, they know
> they saw inconsistent state.  If we add a valid state where waiting can be
> true without a pg_locks entry, they won't know what to think.  I don't
> want to go there.

What's your opinion of the waiting = true combined with waiting_for =
'cleanup lock' or something similar?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Signaling of waiting for a cleanup lock?

2014-04-14 Thread Tom Lane
Andres Freund  writes:
> On 2014-04-14 12:21:09 -0400, Robert Haas wrote:
>> AFAICS, the big advantage of something like this is that we'd get
>> proper deadlock detection, and that's not a trivial point.

> Hm. Is this actually something we need? I am not aware of deadlock prone
> scenarios involving buffer pins during normal processing (HS is another
> matter).

Ordinary buffer pinning isn't supposed to be subject to deadlocks (that's
why it's reasonable to use LWLocks for it), but it's less clear that
cleanup locks couldn't be subject to deadlocks.

The real issue here is that LWLocks support neither deadlock detection nor
state reporting, and that's more or less exactly why they're lightweight,
so we do not want to go in the direction of adding such features to them.
The cleanup-lock mechanism is sort of an ugly intermediate thing between
LWLocks and regular heavyweight locks.  That was okay, sort of, as long
as it was simple ... but once we start loading additional feature
requirements onto it, it gets (even) less attractive to have a
single-purpose mechanism.  In particular I'm not sold on the use-case
for being able to tell that a process is waiting without being able to
tell what it's waiting for.  I can figure that much out already.

One concrete reason not to do the proposed trivial hack is that the lock
readout views are asynchronous.  Right now, if someone sees a process that
claims to be waiting but they don't see any entry in pg_locks, they know
they saw inconsistent state.  If we add a valid state where waiting can be
true without a pg_locks entry, they won't know what to think.  I don't
want to go there.
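
To make that concrete, this is the sort of diagnostic join people run
today (a sketch; it can only ever show heavyweight-lock waits, which is
exactly why a pin waiter reporting waiting = true with no pg_locks row
would be confusing):

SELECT a.pid, a.waiting, l.locktype, l.relation::regclass, l.page, l.granted
FROM pg_stat_activity a
LEFT JOIN pg_locks l ON l.pid = a.pid AND NOT l.granted
WHERE a.waiting;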

regards, tom lane




Re: [HACKERS] Signaling of waiting for a cleanup lock?

2014-04-14 Thread Alvaro Herrera
Tom Lane wrote:

> In an ideal world, when we needed to wait for a cleanup lock, we'd cause
> the lock manager to set up pre-granted sharable page locks for all the
> processes currently holding buffer pins, and then wait for an exclusive
> page lock.  The current hack of signaling when you're the last one off the
> page would be replaced by releasing your lock (if it exists) when you drop
> your own pin.  I'm not sure it's really worth the trouble to try to do
> this, but it would solve the visibility problem; and it might allow us to
> be a bit smarter about the priority of a cleanup lock request versus
> incoming regular pin requests.

AFAIU this would represent a behavioral change: right now, vacuum waits
until everybody is gone, and new pinners might arrive while vacuum is
waiting.  With this scheme, new pinners would have to wait behind
vacuum.  Maybe this change alone is enough to avoid vacuum blocking for
long periods waiting for cleanup lock.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Including replication slot data in base backups

2014-04-14 Thread Robert Haas
On Mon, Apr 14, 2014 at 9:26 AM, Fujii Masao  wrote:
> This makes me think that it's safer to just remove replication slot files
> at the beginning of the recovery when both backup_label and recovery.conf
> exist.

Well, we could do that, but that would preempt anyone who *does* want
to keep those replication slots around.  I'm not sure that's a good
idea, either.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Signaling of waiting for a cleanup lock?

2014-04-14 Thread Andres Freund
On 2014-04-14 12:21:09 -0400, Robert Haas wrote:
> AFAICS, the big advantage of something like this is that we'd get
> proper deadlock detection, and that's not a trivial point.

Hm. Is this actually something we need? I am not aware of deadlock prone
scenarios involving buffer pins during normal processing (HS is another
matter).

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: Race condition between PREPARE TRANSACTION and COMMIT PREPARED (was Re: [HACKERS] Problem with txid_snapshot_in/out() functionality)

2014-04-14 Thread Tom Lane
Heikki Linnakangas  writes:
>> I think we'll need to transfer the locks earlier, before the
>> transaction is marked as fully prepared. I'll take a closer look at this
>> tomorrow.

> Here's a patch to do that. It's very straightforward, I just moved the 
> calls to transfer locks earlier, before ProcArrayClearTransaction. 
> PostPrepare_MultiXact had a similar race - it also transfers state from 
> the old PGPROC entry to the new, and needs to be done before allowing 
> another backend to remove the new PGPROC entry. I changed the names of 
> the functions to distinguish them from the other PostPrepare_* functions 
> that now happen at a different time.

Why didn't you also move up PostPrepare_PredicateLocks?  Seems like its
access to MySerializableXact is also racy.

> The patch is simple, but it's a bit scary to change the order of things 
> like this.

Yeah.  There are a lot of assumptions in there about the order of resource
release, in particular that it is safe to do certain things because we're
still holding locks.

I poked around a bit and noticed one theoretical problem sequence: if the
prepared xact drops some relation that we're still holding buffer pins on.
This shouldn't really happen (why are we still pinning some rel we think
we dropped?) but if it did, the commit would do DropRelFileNodeBuffers
which would end up busy-looping until we drop our pins (see
InvalidateBuffer, which thinks this must be an I/O wait situation).
So it would work, more or less, but it seems pretty fragile.  I'm afraid
there are more assumptions like this one.

The whole thing feels like we are solving the wrong problem, anyway.
IIUC, the complaint arises because we are allowing COMMIT PREPARED
to occur before the source transaction has reported successful prepare
to its client.  Surely that does not need to be a legal case?  No
correctly-operating 2PC xact manager would do that.
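
To spell out the sequence being ruled illegal (a sketch with a
placeholder table and gid; the race only exists if session B's command
can arrive before session A's PREPARE has returned to its client):

-- session A
BEGIN;
UPDATE t SET n = n + 1;  -- placeholder statement
PREPARE TRANSACTION 'gid1';

-- session B, sent before session A's PREPARE is acknowledged
COMMIT PREPARED 'gid1';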

I'd prefer to leave the prepare sequence alone and instead find a way
to reject COMMIT PREPARED until after the source transaction is safely
clear of the race conditions.  The upthread idea of looking at vxid
instead of xid might help, except that I see we clear both of them
in ProcArrayClearTransaction.  We'd need some state in PGPROC that
isn't cleared till later than that.

regards, tom lane




Re: [HACKERS] Minor improvements in create_foreign_table.sgml

2014-04-14 Thread Robert Haas
On Mon, Apr 14, 2014 at 5:03 AM, Etsuro Fujita wrote:
> Attached is a small patch to improve create_foreign_table.sgml.

OK, committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-04-14 Thread Robert Haas
On Sun, Apr 13, 2014 at 2:23 PM, Tom Lane  wrote:
> * I also left out the table documenting which aggregates have this
> optimization.  That's not the kind of thing we ordinarily document,
> and it seems inevitable to me that such a table would be noteworthy
> mostly for wrong/incomplete/obsolete information in the future.

I tend to think that not documenting such things is an error.  Sure,
the documentation could become obsolete, but I don't see why it's
particularly more likely with this table than anywhere else, and if it
does happen, and people care, they'll submit patches to fix it.  More
to the point, when we don't document things like this, it doesn't
cause the knowledge not to be important to end-users; it just means
that the knowledge lives on wiki pages, support fora, and the minds of
people who are "in the know" rather than being easily and generally
accessible.  I'd rather have documentation on this topic that was,
say, 80% correct than have none at all.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Idea for aggregates

2014-04-14 Thread Simon Riggs
On 5 April 2014 04:18, Tom Lane  wrote:
> Greg Stark  writes:
>> Well in many cases stype will just be internal for many of them. That
>> doesn't mean they're the same.
>
>> Hm, I suppose it might if they have the same sfunc.
>
>> This is actually where I started but we concluded that we needed some
>> declaration that the aggregates were actually related and would interpret
>> the state the same way and not just that it happened to use the same
>> storage format.
>
> Well, in practice you'd need to also compare the input datatype (consider
> polymorphic aggregates) and initcond.  But the sfunc isn't told which
> finalfunc will be applied, so any aggregates that share the same sfunc and
> have the other conditions the same *must* have the identical transition
> data behavior.  I don't see any reason to invent new syntax, and there
> are good reasons not to if we don't have to.

Definitely happy not to have additional syntax. So we can just
dynamically group the aggregates together.
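
A sketch of the kind of pair that would be grouped (all names here are
hypothetical): both aggregates share the sfunc, input type, stype and
initcond, so by the argument above their transition state is
interchangeable and only the finalfunc differs.

CREATE FUNCTION acc_sfunc(state float8[], x float8) RETURNS float8[]
  LANGUAGE sql IMMUTABLE STRICT
  AS $$ SELECT ARRAY[state[1] + 1, state[2] + x] $$;

CREATE FUNCTION my_avg_final(state float8[]) RETURNS float8
  LANGUAGE sql IMMUTABLE STRICT
  AS $$ SELECT CASE WHEN state[1] = 0 THEN NULL
                    ELSE state[2] / state[1] END $$;

CREATE FUNCTION my_sum_final(state float8[]) RETURNS float8
  LANGUAGE sql IMMUTABLE STRICT
  AS $$ SELECT state[2] $$;

CREATE AGGREGATE my_avg(float8) (sfunc = acc_sfunc, stype = float8[],
                                 initcond = '{0,0}', finalfunc = my_avg_final);
CREATE AGGREGATE my_sum(float8) (sfunc = acc_sfunc, stype = float8[],
                                 initcond = '{0,0}', finalfunc = my_sum_final);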

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Dynamic Shared Memory stuff

2014-04-14 Thread Robert Haas
On Sat, Apr 12, 2014 at 1:32 AM, Amit Kapila  wrote:
> On Wed, Apr 9, 2014 at 9:20 PM, Robert Haas  wrote:
>> On Wed, Apr 9, 2014 at 7:41 AM, Amit Kapila  wrote:
>>> I am just not sure whether it is okay to rearrange the code and call
>>> GetLastError() only if returned handle is Invalid (NULL) or try to look
>>> for more info.
>>
>> Well, I don't know either.  You wrote the Windows portion of this
>> code, so you'll have to decide what's best.  If the best practice in
>> this area is to only call GetLastError() if the handle isn't valid,
>> then that's probably what we should do, too.  But I can't speak to
>> what the best practice is.
>
> I have checked that other places in the code also check the handle to
> decide if the API has failed.  Refer to function PGSharedMemoryIsInUse().
> So I think fix to call GetLastError() after checking handle is okay.
> Attached patch fixes this issue.  After patch, the server shows below
> log which is exactly what is expected from test_shm_mq

In PostgreSQL code, hmap == NULL, rather than !hmap, is the preferred
way to test for a NULL pointer.  I notice that the !hmap style is used
throughout this code, so I guess cleaning that up is a matter for a
separate commit.

For the create case, I'm wondering if we should put the block that
tests for !hmap *before* the _dosmaperr() and check for EEXIST.  What
is your opinion?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Signaling of waiting for a cleanup lock?

2014-04-14 Thread Andres Freund
On 2014-04-14 12:02:22 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2014-04-14 11:30:02 -0400, Tom Lane wrote:
> >> I wonder whether we should not try to fix this by making the process wait
> >> on a heavyweight lock, if it has to wait.  That would also get us out of
> >> the rather grotty business of using a special-purpose signal to wake it
> >> up.  However, there's still a visibility problem, in that there'd be no
> >> way to tell which other processes are blocking it (which is the thing
> >> you *really* want to know).
> 
> > That'd be neat, but I am not really sure how? Which lock would we be
> > waiting on?
> 
> Well, we already have locktags for relation pages, so that aspect isn't
> that hard.  The bigger problem is what are we waiting *for*, that is,
> what is it that blocks the lock request from being granted?

> In an ideal world, when we needed to wait for a cleanup lock, we'd cause
> the lock manager to set up pre-granted sharable page locks for all the
> processes currently holding buffer pins, and then wait for an exclusive
> page lock.

Well, wouldn't that imply that every pin (well, unless local) goes through
the lock manager in some way? Because otherwise we'll end up with
basically the same kludges as today, painted a different color. I
can't believe that'll easily work out performance-wise.

I think it might be worthwhile to do this if we can figure out how to do
it performantly, but I won't hold my breath for it. And I think we need
to improve the visibility of cleanup locks (and possibly lwlocks) on a much
shorter timescale.

> I'm not sure it's really worth the trouble to try to do
> this, but it would solve the visibility problem; and it might allow us to
> be a bit smarter about the priority of a cleanup lock request versus
> incoming regular pin requests.

I don't know how, but some smarter prioritization here would be really
helpful. It pretty much sucks that a relation essentially can't be
vacuumed once there are tuples needing freezing.
I wonder if we could make the freezing part work without a cleanup lock...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Signaling of waiting for a cleanup lock?

2014-04-14 Thread Robert Haas
On Mon, Apr 14, 2014 at 12:02 PM, Tom Lane  wrote:
> Andres Freund  writes:
>> On 2014-04-14 11:30:02 -0400, Tom Lane wrote:
>>> I wonder whether we should not try to fix this by making the process wait
>>> on a heavyweight lock, if it has to wait.  That would also get us out of
>>> the rather grotty business of using a special-purpose signal to wake it
>>> up.  However, there's still a visibility problem, in that there'd be no
>>> way to tell which other processes are blocking it (which is the thing
>>> you *really* want to know).
>
>> That'd be neat, but I am not really sure how? Which lock would we be
>> waiting on?
>
> Well, we already have locktags for relation pages, so that aspect isn't
> that hard.  The bigger problem is what are we waiting *for*, that is,
> what is it that blocks the lock request from being granted?
>
> In an ideal world, when we needed to wait for a cleanup lock, we'd cause
> the lock manager to set up pre-granted sharable page locks for all the
> processes currently holding buffer pins, and then wait for an exclusive
> page lock.  The current hack of signaling when you're the last one off the
> page would be replaced by releasing your lock (if it exists) when you drop
> your own pin.  I'm not sure it's really worth the trouble to try to do
> this, but it would solve the visibility problem; and it might allow us to
> be a bit smarter about the priority of a cleanup lock request versus
> incoming regular pin requests.

AFAICS, the big advantage of something like this is that we'd get
proper deadlock detection, and that's not a trivial point.  But other
than that I don't like it much.  The fast-path locking stuff was
basically this kind of thing in reverse: instead of trying to
implement something with a separate locking implementation into the
regular lock manager, we were trying to take something that was
handled by the regular lock manager in a completely generic way and
give it an optimized path that performs much better on the cases that
actually arise in practice.  And, it was worth it, because we got a
huge performance boost, but it was also messy and, from your recent
reports, apparently still not fully debugged.

Now, I don't really have a specific proposal in mind that I think
would be better than shoehorning more stuff into the lock manager, so
it would be hard for me to oppose that if someone worked out all of
the problems and proposed a patch.  But I have uneasy feelings about
it.  It's hard to see how we can have a reasonable deadlock detector
without some kind of very generic locking mechanism, and that's what
the lock manager is, but we also know from experience that it's really
slow in some circumstances and frustratingly easy to run out of shared
memory, so I can't help but wonder if we really ought to be giving the
whole way we do locking and deadlock detection a broader rethink
somehow.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Signaling of waiting for a cleanup lock?

2014-04-14 Thread Tom Lane
Andres Freund  writes:
> On 2014-04-14 11:30:02 -0400, Tom Lane wrote:
>> I wonder whether we should not try to fix this by making the process wait
>> on a heavyweight lock, if it has to wait.  That would also get us out of
>> the rather grotty business of using a special-purpose signal to wake it
>> up.  However, there's still a visibility problem, in that there'd be no
>> way to tell which other processes are blocking it (which is the thing
>> you *really* want to know).

> That'd be neat, but I am not really sure how? Which lock would we be
> waiting on?

Well, we already have locktags for relation pages, so that aspect isn't
that hard.  The bigger problem is what are we waiting *for*, that is,
what is it that blocks the lock request from being granted?

In an ideal world, when we needed to wait for a cleanup lock, we'd cause
the lock manager to set up pre-granted sharable page locks for all the
processes currently holding buffer pins, and then wait for an exclusive
page lock.  The current hack of signaling when you're the last one off the
page would be replaced by releasing your lock (if it exists) when you drop
your own pin.  I'm not sure it's really worth the trouble to try to do
this, but it would solve the visibility problem; and it might allow us to
be a bit smarter about the priority of a cleanup lock request versus
incoming regular pin requests.
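
In sketch form, that wait could use the existing page locktags (the
pre-granting step, which is the hard part, isn't shown):

/*
 * Sketch only: wait for a page cleanup lock via the heavyweight lock
 * machinery.  Pre-granting share locks to current pin holders is omitted.
 */
static void
WaitForCleanupLock(Relation rel, Buffer buffer)
{
    LOCKTAG     tag;

    SET_LOCKTAG_PAGE(tag,
                     rel->rd_lockInfo.lockRelId.dbId,
                     rel->rd_lockInfo.lockRelId.relId,
                     BufferGetBlockNumber(buffer));

    /* Blocks until every pre-granted share lock has been released. */
    (void) LockAcquire(&tag, ExclusiveLock, false, false);
}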

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] server vs foreign server inconsistency

2014-04-14 Thread Tom Lane
Jaime Casanova  writes:
> A few days ago I was wondering why we use CREATE/DROP SERVER, but then
> when we want to GRANT/REVOKE we need to use FOREIGN SERVER.

Because the SQL standard says so.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Signaling of waiting for a cleanup lock?

2014-04-14 Thread Andres Freund
On 2014-04-14 11:30:02 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2014-04-14 15:45:45 +0100, Simon Riggs wrote:
> >> On 13 April 2014 16:44, Andres Freund  wrote:
> >>> What I am not sure about is how... It's trivial to set
> >>> pg_stat_activity.waiting = true, but without a corresponding description
> >>> what the backend is waiting for it's not exactly obvious what's
> >>> happening. I think that's better than nothing, but maybe somebody has a
> >>> glorious better idea.
> 
> >> pg_stat_activity.waiting = true
> 
> > Yes. That's what I suggested above. The patch for it is trivial, but:
> > Currently - I think - everything that sets waiting = true, also has
> > contents in pg_locks. Not sure if it will confuse users if that's not
> > the case anymore.
> 
> I think it will.  This is a case where a quick and dirty hack is nothing
> but quick and dirty.

Well, it's still better than the current situation of waiting without
signalling it in any externally visible way.

I think Robert's suggestion of an additional
waiting_on=lock,lwlock,bufferpin might be a realistic way forward.

> I wonder whether we should not try to fix this by making the process wait
> on a heavyweight lock, if it has to wait.  That would also get us out of
> the rather grotty business of using a special-purpose signal to wake it
> up.  However, there's still a visibility problem, in that there'd be no
> way to tell which other processes are blocking it (which is the thing
> you *really* want to know).

That'd be neat, but I am not really sure how? Which lock would we be
waiting on? I don't really see heavyweight locks scaling up to buffer
pins?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] server vs foreign server inconsistency

2014-04-14 Thread Jaime Casanova
Hi,

A few days ago I was wondering why we use CREATE/DROP SERVER, but then
when we want to GRANT/REVOKE we need to use FOREIGN SERVER.
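
For instance (the FDW and role names here are only examples):

CREATE SERVER myserver FOREIGN DATA WRAPPER postgres_fdw;
GRANT USAGE ON FOREIGN SERVER myserver TO someuser;
REVOKE USAGE ON FOREIGN SERVER myserver FROM someuser;
DROP SERVER myserver;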

of course options are:

1) modify both to accept both forms
2) modify one of them to accept both forms and use that form in all our
examples in the docs
3) do nothing at all

IMHO option 3 is not acceptable because the inconsistency is annoying

opinions?

-- 
Jaime Casanova www.2ndQuadrant.com
Professional PostgreSQL: 24x7 support and training
Phone: +593 4 5107566 Cell: +593 987171157


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Signaling of waiting for a cleanup lock?

2014-04-14 Thread Tom Lane
Andres Freund  writes:
> On 2014-04-14 15:45:45 +0100, Simon Riggs wrote:
>> On 13 April 2014 16:44, Andres Freund  wrote:
>>> What I am not sure about is how... It's trivial to set
>>> pg_stat_activity.waiting = true, but without a corresponding description
>>> what the backend is waiting for it's not exactly obvious what's
>>> happening. I think that's better than nothing, but maybe somebody has a
>>> glorious better idea.

>> pg_stat_activity.waiting = true

> Yes. That's what I suggested above. The patch for it is trivial, but:
> Currently - I think - everything that sets waiting = true, also has
> contents in pg_locks. Not sure if it will confuse users if that's not
> the case anymore.

I think it will.  This is a case where a quick and dirty hack is nothing
but quick and dirty.

I wonder whether we should not try to fix this by making the process wait
on a heavyweight lock, if it has to wait.  That would also get us out of
the rather grotty business of using a special-purpose signal to wake it
up.  However, there's still a visibility problem, in that there'd be no
way to tell which other processes are blocking it (which is the thing
you *really* want to know).

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Signaling of waiting for a cleanup lock?

2014-04-14 Thread Robert Haas
On Mon, Apr 14, 2014 at 10:50 AM, Andres Freund  wrote:
> On 2014-04-14 15:45:45 +0100, Simon Riggs wrote:
>> On 13 April 2014 16:44, Andres Freund  wrote:
>> > On 2014-04-12 17:40:34 -0400, Robert Haas wrote:
>> >> On Fri, Apr 11, 2014 at 10:28 AM, Andres Freund  
>> >> wrote:
>> >> > VACUUM sometimes waits synchronously for a cleanup lock on a heap
>> >> > page. Sometimes for a long time. Without reporting it externally.
>> >> > Rather confusing ;).
>> >> >
>> >> > Since we only take cleanup locks around vacuum, how about we report at
>> >> > least in pgstat that we're waiting? At the moment, there's really no way
>> >> > to know if that's what's happening.
>> >>
>> >> That seems like a pretty good idea to me.
>> >
>> > What I am not sure about is how... It's trivial to set
>> > pg_stat_activity.waiting = true, but without a corresponding description
>> > what the backend is waiting for it's not exactly obvious what's
>> > happening. I think that's better than nothing, but maybe somebody has a
>> > glorious better idea.
>>
>> pg_stat_activity.waiting = true
>
> Yes. That's what I suggested above. The patch for it is trivial, but:
> Currently - I think - everything that sets waiting = true, also has
> contents in pg_locks. Not sure if it will confuse users if that's not
> the case anymore.

In my personal opinion, it would be OK to change that, provided that
we have some real good documentation for it.

Longer-term, I'm wondering if we shouldn't have something like
pg_stat_activity.wait_type instead of pg_stat_activity.waiting.  It
could be NULL when not waiting, or otherwise "lock", "lwlock", "buffer
cleanup", etc.

>> Easy to set the ps message also
>
> That actually makes it considerably more expensive since we'd need to
> save the old string somewhere. I am not sure it will be relevant, but
> it's not as easy a sell as just setting a single boolean.

Yeah, I'm not too sanguine about squeezing that part into 9.4.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] four minor proposals for 9.5

2014-04-14 Thread Tom Lane
Pavel Stehule  writes:
> 2014-04-14 14:57 GMT+02:00 Robert Haas :
>> I agree.  I don't think the idea of pushing this into the
>> log_line_prefix stuff as a one-off is a very good one.  Sure, we could
>> wedge it in there, but we've got an existing precedent that everything
>> that you can get with log_line_prefix also shows up in the CSV output
>> file.  And it's easy to imagine LOTS more counters that somebody might
>> want to have.

I think this argument is sufficient to kill this line of proposal
entirely.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Minor improvements in alter_table.sgml

2014-04-14 Thread Robert Haas
On Fri, Apr 11, 2014 at 5:00 AM, Etsuro Fujita
 wrote:
> Attached is an updated version of the patch.

I applied the first two hunks of this, which seem like clear
oversights; and also the bit fixing the constraint_name language.

I think the other changes deserve to be considered separately, and in
particular I'm still not sure it's a good idea to document both OF
type_name and type_name.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Signaling of waiting for a cleanup lock?

2014-04-14 Thread Andres Freund
On 2014-04-14 15:45:45 +0100, Simon Riggs wrote:
> On 13 April 2014 16:44, Andres Freund  wrote:
> > On 2014-04-12 17:40:34 -0400, Robert Haas wrote:
> >> On Fri, Apr 11, 2014 at 10:28 AM, Andres Freund  
> >> wrote:
> >> > VACUUM sometimes waits synchronously for a cleanup lock on a heap
> >> > page. Sometimes for a long time. Without reporting it externally.
> >> > Rather confusing ;).
> >> >
> >> > Since we only take cleanup locks around vacuum, how about we report at
> >> > least in pgstat that we're waiting? At the moment, there's really no way
> >> > to know if that's what's happening.
> >>
> >> That seems like a pretty good idea to me.
> >
> > What I am not sure about is how... It's trivial to set
> > pg_stat_activity.waiting = true, but without a corresponding description
> > what the backend is waiting for it's not exactly obvious what's
> > happening. I think that's better than nothing, but maybe somebody has a
> > glorious better idea.
> 
> pg_stat_activity.waiting = true

Yes. That's what I suggested above. The patch for it is trivial, but:
Currently - I think - everything that sets waiting = true, also has
contents in pg_locks. Not sure if it will confuse users if that's not
the case anymore.

> Easy to set the ps message also

That actually makes it considerably more expensive since we'd need to
save the old string somewhere. I am not sure it will be relevant, but
it's not as easy a sell as just setting a single boolean.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Signaling of waiting for a cleanup lock?

2014-04-14 Thread Simon Riggs
On 13 April 2014 16:44, Andres Freund  wrote:
> On 2014-04-12 17:40:34 -0400, Robert Haas wrote:
>> On Fri, Apr 11, 2014 at 10:28 AM, Andres Freund  
>> wrote:
>> > VACUUM sometimes waits synchronously for a cleanup lock on a heap
>> > page. Sometimes for a long time. Without reporting it externally.
>> > Rather confusing ;).
>> >
>> > Since we only take cleanup locks around vacuum, how about we report at
>> > least in pgstat that we're waiting? At the moment, there's really no way
>> > to know if that's what's happening.
>>
>> That seems like a pretty good idea to me.
>
> What I am not sure about is how... It's trivial to set
> pg_stat_activity.waiting = true, but without a corresponding description
> what the backend is waiting for it's not exactly obvious what's
> happening. I think that's better than nothing, but maybe somebody has a
> glorious better idea.

pg_stat_activity.waiting = true

can be done in 9.4 easily enough. Any objections to doing this?

Easy to set the ps message also

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index

2014-04-14 Thread Tom Lane
Fujii Masao  writes:
> On Tue, Apr 1, 2014 at 1:41 AM, Robert Haas  wrote:
>> Should we try to install some hack around fastupdate for 9.4?  I fear
>> the divergence between reasonable values of work_mem and reasonable
>> sizes for that list is only going to continue to get bigger.  I'm sure
>> there's somebody out there who has work_mem = 16GB, and stuff like
>> 263865a48973767ce8ed7b7788059a38a24a9f37 is only going to increase the
>> appeal of large values.

> Controlling the threshold for the size of the pending list only by a GUC
> doesn't seem reasonable. Users may want to increase the threshold only for
> a GIN index which is updated heavily, and decrease it otherwise. So I think
> it's better to add a new storage parameter for GIN indexes to control the
> threshold, or both a storage parameter and a GUC.

Yeah, -1 for a GUC.  A GIN-index-specific storage parameter seems more
appropriate.  Or we could just hard-wire some maximum limit.  Is it
really likely that users would trouble to set such a parameter if it
existed?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autonomous Transaction (WIP)

2014-04-14 Thread Simon Riggs
On 7 April 2014 05:06, Rajeev rastogi  wrote:


> *Autonomous Transaction Storage:*
>
> As for main transaction, structure PGXACT is used to store main
> transactions, which are created in shared memory of size:
>
> (Number of process)*sizeof(struct PGXACT)
>
> Similarly a new structure will be defined to store autonomous transaction:
>
> *Struct PGAutonomousXACT*
>
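
For reference, the sizing arithmetic being proposed, as a minimal sketch
(PGAutonomousXACT is from the quoted design; MAX_AUTOX_PER_BACKEND is an
invented placeholder for the nesting depth allowed per backend):

Size
AutonomousXactShmemSize(void)
{
    /* one slot per backend per possible autonomous transaction */
    return mul_size(mul_size(MaxBackends, MAX_AUTOX_PER_BACKEND),
                    sizeof(PGAutonomousXACT));
}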

I already proposed exactly this design two years ago and it was rejected at
the PgCon hackers meeting.

I have a better design worked out now and will likely be working on it for
9.5

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index

2014-04-14 Thread Fujii Masao
On Tue, Apr 1, 2014 at 1:41 AM, Robert Haas  wrote:
> On Thu, Mar 20, 2014 at 1:12 PM, Jesper Krogh  wrote:
>> On 15/03/14 20:27, Heikki Linnakangas wrote:
>>> That said, I didn't expect the difference to be quite that big when you're
>>> appending to the end of the table. When the new entries go to the end of the
>>> posting lists, you only need to recompress and WAL-log the last posting
>>> list, which is max 256 bytes long. But I guess that's still a lot more WAL
>>> than in the old format.
>>>
>>> That could be optimized, but I figured we can live with it, thanks to the
>>> fastupdate feature. Fastupdate allows amortizing that cost over several
>>> insertions. But of course, you explicitly disabled that...
>>
>> In a concurrent update environment, fastupdate as it is in 9.2 is not really
>> useful. It may be that you can bulk up insertion, but you have no control
>> over who ends up paying the debt. Doubling the amount of wal from
>> gin-indexing would be pretty tough for us, in 9.2 we generate roughly 1TB
>> wal / day, keeping it
>> for some weeks to be able to do PITR. The wal are mainly due to gin-index
>> updates as new data is added and needs to be searchable by users. We do run
>> gzip that cuts it down to 25-30% before keeping the for too long, but
>> doubling this is going to be a migration challenge.
>>
>> If fast-update could be made to work in an environment where we both have
>> users searching the index and manually updating it and 4+ backend processes
>> updating the index concurrently then it would be a good benefit to gain.
>>
>> the gin index currently contains 70+ million records with and average
>> tsvector of 124 terms.
>
> Should we try to install some hack around fastupdate for 9.4?  I fear
> the divergence between reasonable values of work_mem and reasonable
> sizes for that list is only going to continue to get bigger.  I'm sure
> there's somebody out there who has work_mem = 16GB, and stuff like
> 263865a48973767ce8ed7b7788059a38a24a9f37 is only going to increase the
> appeal of large values.

Controlling the threshold for the size of the pending list only by a GUC
doesn't seem reasonable. Users may want to increase the threshold only for
a GIN index which is updated heavily, and decrease it otherwise. So I think
it's better to add a new storage parameter for GIN indexes to control the
threshold, or both a storage parameter and a GUC.
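
For example, something like this invented syntax (the parameter name is
made up for illustration):

CREATE INDEX docs_tsv_idx ON docs USING gin (tsv)
  WITH (pending_list_limit = 4096);              -- per-index threshold, in kB
ALTER INDEX docs_tsv_idx SET (pending_list_limit = 1024);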

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: add psql tab completion for event triggers

2014-04-14 Thread Michael Paquier
On Mon, Apr 14, 2014 at 9:46 PM, Robert Haas  wrote:
> On Wed, Apr 9, 2014 at 8:58 PM, Ian Barwick  wrote:
>> Apologies again, that was ill-thought out. Revised patch attached with only
>> the additions related to event triggers, and the small fix for ALTER TRIGGER
>> mentioned above which ensures "RENAME TO" is applied only when "ALTER
>> TRIGGER  ON " was input; currently there is no check for a
>> preceding "ALTER", resulting in the spurious "RENAME TO" when completing
>> "CREATE EVENT TRIGGER".
>
> OK, committed.
>
> (I know this was submitted rather late, but I think we've often
> allowed tab-completion fixups at similar times in past releases, since
> they are quite mechanical.  If anyone feels that I shouldn't have
> committed this, please advise.)
+1 for this change even if it came late in the game. The patch is
straightforward, and this is always useful to have.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] four minor proposals for 9.5

2014-04-14 Thread Pavel Stehule
2014-04-14 14:57 GMT+02:00 Robert Haas :

> On Tue, Apr 8, 2014 at 12:34 PM, Gregory Smith 
> wrote:
> > On 4/6/14 2:46 PM, Pavel Stehule wrote:
> >> Proposed options are interesting for "enterprise" using, when you have a
> >> some more smart tools for log entry processing, and when you need a
> complex
> >> view about performance of billions queries - when cancel time and lock
> time
> >> is important piece in mosaic of server' fitness.
> >
> > I once sent a design proposal over for something I called "Performance
> > Events" that included this.  It will be difficult to get everything
> people
> > want to track into log_line_prefix macro form.  You're right that this
> > particular one can probably be pushed into there, but you're adding four
> > macros just for this feature. And that's only a fraction of what people
> > expect from database per-query performance metrics.
>
> I agree.  I don't think the idea of pushing this into the
> log_line_prefix stuff as a one-off is a very good one.  Sure, we could
> wedge it in there, but we've got an existing precedent that everything
> that you can get with log_line_prefix also shows up in the CSV output
> file.  And it's easy to imagine LOTS more counters that somebody might
> want to have.  Time spent planning, time spent executing, time spent
> waiting for disk I/O, time spent returning results to client, and I'm
> sure people will think of many others.  I think this will balloon out
> of control if we don't have a more systematic design for this sort of
> thing.
>


I am thinking about this feature - it has more than one dimension:

1. In the context of relational databases, query duration and query lock
time are relatively basic metrics, and so should be in log_line_prefix.

2. I can imagine that someone would want to monitor a lot of metrics at the
query level - and we should provide some method for doing that;
log_line_prefix may or may not be the right solution.

Any log_line_prefix-like solution can be useful. Do you have any ideas
about the future direction of PostgreSQL logging?

Requests:

a) possibility to choose a format: CSV, JSON, XML
b) possibility to choose a set of metrics: ...
c) possibility to choose a target: syslog, file, ..
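
A hypothetical configuration sketch of a-c (log_destination = 'syslog'
exists today; the other settings are invented):

log_format = 'json'                        # a) CSV, JSON, XML, ...
log_query_metrics = 'duration, lock_time'  # b) which metrics to emit
log_destination = 'syslog'                 # c) where the entries go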

Pavel


--
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: [HACKERS] Including replication slot data in base backups

2014-04-14 Thread Fujii Masao
On Thu, Apr 10, 2014 at 12:36 AM, Robert Haas  wrote:
> On Tue, Apr 8, 2014 at 3:08 AM, Michael Paquier
>  wrote:
>> On Tue, Apr 8, 2014 at 1:18 AM, Robert Haas  wrote:
>>> Not sure if this is exactly the right way to do it, but I agree that
>>> something along those lines is a good idea.  I also think, maybe even
>>> importantly, that we should probably document that people using
>>> file-copy based hot backups should strongly consider removing the
>>> replication slots by hand before using the backup.
>> Good point. Something here would be adapted in this case:
>> http://www.postgresql.org/docs/devel/static/backup-file.html
>> I am attaching an updated patch as well.
>
> What you've got here doesn't look like the right section to update -
> the section you've updated is on taking a cold backup.  The section I
> think you want to update is "Making a Base Backup Using the Low Level
> API", and specifically this part:
>
> You can, however, omit from the backup dump the files within the
> cluster's pg_xlog/ subdirectory. This slight adjustment is worthwhile
> because it reduces the risk of mistakes when restoring. This is easy
> to arrange if pg_xlog/ is a symbolic link pointing to someplace
> outside the cluster directory, which is a common setup anyway for
> performance reasons. You might also want to exclude postmaster.pid and
> postmaster.opts, which record information about the running
> postmaster, not about the postmaster which will eventually use this
> backup. (These files can confuse pg_ctl.)
>
> What I'd propose is adding a second paragraph like this:
>
> It is often a good idea to also omit from the backup dump the files
> within the cluster's pg_replslot/ directory, so that replication slots
> that exist on the master do not become part of the backup.  Otherwise,
> the subsequent use of the backup to create a standby may result in
> indefinite retention of WAL files on the standby, and possibly bloat
> on the master if hot standby feedback is enabled, because the clients
> that are using those replication slots will still be connecting to and
> updating the slots on the master, not the standby.  Even if the backup
> is only intended for use in creating a new master, copying the
> replication slots isn't expected to be particularly useful, since the
> contents of those slots will likely be badly out of date by the time
> the new master comes on line.

This makes me think that it's safer to just remove replication slot files
at the beginning of the recovery when both backup_label and recovery.conf
exist.
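
Until something like that exists, omitting the directory during a
low-level file-system backup might look like this (illustrative paths and
host only):

# exclude WAL and replication slots from the copied data directory
rsync -a --exclude=pg_xlog --exclude=pg_replslot \
    /var/lib/pgsql/data/ standby:/var/lib/pgsql/data/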

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] four minor proposals for 9.5

2014-04-14 Thread Robert Haas
On Tue, Apr 8, 2014 at 12:34 PM, Gregory Smith  wrote:
> On 4/6/14 2:46 PM, Pavel Stehule wrote:
>> Proposed options are interesting for "enterprise" using, when you have a
>> some more smart tools for log entry processing, and when you need a complex
>> view about performance of billions queries - when cancel time and lock time
>> is important piece in mosaic of server' fitness.
>
> I once sent a design proposal over for something I called "Performance
> Events" that included this.  It will be difficult to get everything people
> want to track into log_line_prefix macro form.  You're right that this
> particular one can probably be pushed into there, but you're adding four
> macros just for this feature. And that's only a fraction of what people
> expect from database per-query performance metrics.

I agree.  I don't think the idea of pushing this into the
log_line_prefix stuff as a one-off is a very good one.  Sure, we could
wedge it in there, but we've got an existing precedent that everything
that you can get with log_line_prefix also shows up in the CSV output
file.  And it's easy to imagine LOTS more counters that somebody might
want to have.  Time spent planning, time spent executing, time spent
waiting for disk I/O, time spent returning results to client, and I'm
sure people will think of many others.  I think this will balloon out
of control if we don't have a more systematic design for this sort of
thing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: add psql tab completion for event triggers

2014-04-14 Thread Robert Haas
On Wed, Apr 9, 2014 at 8:58 PM, Ian Barwick  wrote:
> Apologies again, that was ill-thought out. Revised patch attached with only
> the additions related to event triggers, and the small fix for ALTER TRIGGER
> mentioned above which ensures "RENAME TO" is applied only when "ALTER
> TRIGGER  ON " was input; currently there is no check for a
> preceding "ALTER", resulting in the spurious "RENAME TO" when completing
> "CREATE EVENT TRIGGER".

OK, committed.

(I know this was submitted rather late, but I think we've often
allowed tab-completion fixups at similar times in past releases, since
they are quite mechanical.  If anyone feels that I shouldn't have
committed this, please advise.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Window function optimisation, allow pushdowns of items matching PARTITION BY clauses

2014-04-14 Thread David Rowley
On 14 April 2014 03:31, Tom Lane  wrote:

> David Rowley  writes:
> > On this thread
> > http://www.postgresql.org/message-id/52c6f712.6040...@student.kit.edu there
> > was some discussion around allowing push downs of quals that happen to be
> > in every window clause of the sub query. I've quickly put together a
> patch
> > which does this (see attached)
>
> I think you should have check_output_expressions deal with this, instead.
> Compare the existing test there for non-DISTINCT output columns.
>
>
I've moved the code there and it seems like a much better place for it.
Thanks for the tip.


> > Oh and I know that my function
> var_exists_in_all_query_partition_by_clauses
> > has no business in allpaths.c, I'll move it out as soon as I find a
> better
> > home for it.
>
> I might be wrong, but I think you could just embed that search loop in
> check_output_expressions, and it wouldn't be too ugly.
>
>
I've changed the helper function to take the TargetEntry now, the same
as targetIsInSortList does for the hasDistinctOn test, and I renamed the
helper function to targetExistsInAllQueryPartitionByClauses. It seems much
better, but there's probably a bit of room for improving on the names some
more.

I've included the updated patch with some regression tests.
The final test I added checks that an unused window which would, if used,
cause the pushdown not to take place still stops the pushdown from
happening... I know you both talked about this case in the other thread,
but I've done nothing for it as Tom mentioned that this should likely be
fixed elsewhere, if it's even worth worrying about at all. I'm not quite
sure if I needed to bother including this test, but I did anyway; it
can be removed if it is deemed unneeded.

Per Thomas' comments, I added a couple of tests that ensure that the order
of the columns in the partition by clause doesn't change the behaviour.
Looking at the actual implementation makes these tests seem a bit
unneeded, really, but I thought that the tests should not assume
anything about the implementation, so from that point of view the extra
tests seem like a good idea.

For now I can't think of much else to do for the patch, but I'd really
appreciate it, Thomas, if you could run it through its paces if you can find
any time for it, or maybe just give my regression tests a once-over to see
if you can think of any other cases that should be covered.

Regards

David Rowley


wfunc_pushdown_partitionby_v0.2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: Window function optimisation, allow pushdowns of items matching PARTITION BY clauses

2014-04-14 Thread David Rowley
On 14 April 2014 02:50, Thomas Mayer  wrote:

> Hello David,
>
> thanks for your work. The results look promising.
>

Thanks


>
> What I'm missing is a test case with multiple fields in the partition by
> clauses:
>
>
I've modified the patch and added some regression tests that I think cover
all of your cases, but please let me know if I've missed any. The patch
will follow very soon.


> -- should push down, because partid is part of all PARTITION BY clauses
> explain analyze select partid,n,m from (
>   select partid,
>   count(*) over (partition by partid) n,
>   count(*) over (partition by partid, partid+0) m
>   from winagg
> ) winagg
> where partid=1;
>
> current production 9.3.4 is returning
>
>
>                                                            QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------
>  Subquery Scan on winagg  (cost=350955.11..420955.11 rows=20 width=20) (actual time=2564.360..3802.413 rows=20 loops=1)
>    Filter: (winagg.partid = 1)
>    Rows Removed by Filter: 180
>    ->  WindowAgg  (cost=350955.11..395955.11 rows=200 width=4) (actual time=2564.332..3657.051 rows=200 loops=1)
>          ->  Sort  (cost=350955.11..355955.11 rows=200 width=4) (actual time=2564.320..2802.444 rows=200 loops=1)
>                Sort Key: winagg_1.partid, ((winagg_1.partid + 0))
>                Sort Method: external sort  Disk: 50840kB
>                ->  WindowAgg  (cost=0.43..86948.43 rows=200 width=4) (actual time=0.084..1335.081 rows=200 loops=1)
>                      ->  Index Only Scan using winagg_partid_idx on winagg winagg_1  (cost=0.43..51948.43 rows=200 width=4) (actual time=0.051..378.232 rows=200 loops=1)
>                            Heap Fetches: 0
>
> "Index Only Scan" currently returns all rows (without pushdown) on current
> production 9.3.4. What happens with the patch you provided?
>
>
I get a push down as expected.

                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Subquery Scan on winagg  (cost=82.71..83.31 rows=20 width=20) (actual time=0.168..0.179 rows=20 loops=1)
   ->  WindowAgg  (cost=82.71..83.11 rows=20 width=4) (actual time=0.166..0.174 rows=20 loops=1)
         ->  Sort  (cost=82.71..82.76 rows=20 width=4) (actual time=0.151..0.154 rows=20 loops=1)
               Sort Key: ((winagg_1.partid + 0))
               Sort Method: quicksort  Memory: 17kB
               ->  WindowAgg  (cost=4.58..82.28 rows=20 width=4) (actual time=0.127..0.135 rows=20 loops=1)
                     ->  Bitmap Heap Scan on winagg winagg_1  (cost=4.58..81.98 rows=20 width=4) (actual time=0.058..0.104 rows=20 loops=1)
                           Recheck Cond: (partid = 1)
                           Heap Blocks: exact=20
                           ->  Bitmap Index Scan on winagg_partid_idx  (cost=0.00..4.58 rows=20 width=0) (actual time=0.037..0.037 rows=20 loops=1)
                                 Index Cond: (partid = 1)
 Planning time: 0.235 ms
 Total runtime: 0.280 ms



> -- Already Part of your tests:
> -- should NOT push down, because partid is NOT part of all PARTITION BY
> clauses
> explain analyze select partid,n,m from (
>   select partid,
>   count(*) over (partition by partid) n,
>   count(*) over (partition by partid+0) m
>   from winagg
> ) winagg
> where partid=1;
>
> Reordering the fields should also be tested:
> -- should push down, because partid is part of all PARTITION BY clauses
> -- here: partid at the end
> explain analyze select partid,n,m from (
>   select partid,
>   count(*) over (partition by partid) n,
>   count(*) over (partition by partid+0, partid) m
>   from winagg
> ) winagg
> where partid=1;
>
>
Covered in regression and works as expected.


> -- should push down, because partid is part of all PARTITION BY clauses
> -- here: partid in the middle
> explain analyze select partid,n,m from (
>   select partid,
>   count(*) over (partition by partid) n,
>   count(*) over (partition by partid+0, partid, partid+1) m
>   from winagg
> ) winagg
> where partid=1;
>
>
I covered this in the regression tests too.

Regards

David Rowley


Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

2014-04-14 Thread Simon Riggs
On 24 March 2014 10:25, Kouhei Kaigai  wrote:

> Brief summary of the current approach that has been revised from my
> original submission through the discussion on pgsql-hackers:
>
> The plannode was renamed to CustomPlan, instead of CustomScan, because
> it dropped all the hardcoded portions that assume the custom node shall
> perform as an alternative scan or join method, since that prevented the
> custom node from performing as other things, like sort or append,
> potentially.
> According to the suggestion by Tom, I put a structure that contains
> several function pointers on the new CustomPlan node, and an extension
> will allocate an object that extends the CustomPlan node.
> It looks like polymorphism in an object-oriented programming language.
> The core backend knows the abstracted set of methods defined in the
> tables of function pointers, and an extension can implement its own logic
> in the callbacks, using private state on the extended object.
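
(As a sketch of the shape being described -- invented names, not the
actual patch's definitions:)

struct CustomPlanState;

typedef struct CustomPlanMethods
{
    const char *name;
    void        (*BeginCustomPlan) (struct CustomPlanState *cps, int eflags);
    TupleTableSlot *(*ExecCustomPlan) (struct CustomPlanState *cps);
    void        (*EndCustomPlan) (struct CustomPlanState *cps);
} CustomPlanMethods;

/* An extension "subclasses" the node by embedding it first. */
typedef struct MyCustomPlanState
{
    CustomPlanState cps;        /* base node; cps.methods points at the
                                 * table above */
    int         my_private_counter; /* extension-private state */
} MyCustomPlanState;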

I just wanted to add some review comments here. I also apologise for
not reviewing this earlier; I had misunderstood the maturity of the
patch and had assumed it was a request for comments/WIP.

Overall, I very much support the concept of providing for alternate
scans. I like the placement of calls in the optimizer and we'll be
able to do much with that. Other comments in order that I consider
them important.

* There is no declarative structure for this at all. I was expecting
to see a way to declare that a specific table might have an alternate
scan path, but we just call the plugin always and it has to separately
make a cache lookup to see if anything extra is needed. The Index AM
allows us to perform scans, yet indexes are very clearly declared and
easily and clearly identifiable. We need the same thing for alternate
plans.

* There is no mechanism at all for maintaining other data structures.
Are we supposed to use the Index AM? Triggers? Or? The lack of clarity
there is disturbing, though I could be simply missing something big
and obvious.

* There is no catalog support. Complex APIs in Postgres typically have
a structure like pg_am which allows the features to be clearly
identified. I'd be worried about our ability to keep track of so many
calls in such pivotal places without that. I want to be able to know
what a plugin is doing, especially when it will likely come in binary
form. I don't see an easy way to have plugins partially override each
other or work together. What happens when I want to use Mr.X's clever
new join plugin at the same time as Mr.Y's GPU accelerator?

* How do we control security? What stops the Custom Scan API from
overriding privileges? Shouldn't the alternate data structures be
recognised as objects so we can grant privileges? Or do we simply say
if an alternate data structure is linked to a heap then has implied
privileges. It would be a shame to implement better security in one
patch and then ignore it in another (from the same author).

All of the above I can let pass in this release, but in the longer
term we need to look for more structure around these ideas so we can
manage and control what happens. The way this is now is quite raw -
suitable for R&D but not for longer term production usage by a wider
audience, IMHO. I wouldn't like to make commitments about the
longevity of this API either; if we accept it, it should have a big
"may change radically" sign hanging on it. Having said that, I am
interested in progress here and I accept that things will look like
this at this stage of the ideas process, so these are not things to
cause delay.

Some things I would like to see change on in this release are...

* It's not clear to me when we load/create the alternate data
structures. That can't happen at _init time. I was expecting this to
look like an infrastructure for unlogged indexes, but it doesn't look
like that either.

* The name Custom makes me nervous. It sounds too generic, as if the
design or objectives for this are a little unclear. AlternateScan
sounds like a better name since it's clearer that we are scanning an
alternate data structure rather than the main heap.

* The prune hook makes me feel very uneasy. It seems like a weirdly
specific implementation detail, made stranger by the otherwise total lack
of data maintenance API calls. Calling that for every dirty page sounds
like an issue, and my patch-rejection indicator is flashing red around that.



Two additional use cases I will be looking to explore will be

* Code to make Mat Views recognised as alternate scan targets
* Code to allow queries access to sampled data rather than the fully
detailed data, if the result would be within acceptable tolerance for the
user

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Problem with txid_snapshot_in/out() functionality

2014-04-14 Thread Andres Freund
On 2014-04-14 12:15:30 +0300, Heikki Linnakangas wrote:
> Hmm. There's a field in GlobalTransactionData called locking_xid, which is
> used to mark the XID of the transaction that's currently operating on the
> prepared transaction. At prepare, that ensures that the transaction cannot
> be committed or rolled back by another backend until the original backend
> has cleared its PGPROC entry. At COMMIT/ROLLBACK PREPARED, it ensures that
> only one backend can commit/rollback the transaction.
> 
> I wonder why we don't use a VirtualTransactionId there.

I wondered about that previously as well. My bet is that it's because the
2PC support arrived before the virtualxact stuff...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Problem with txid_snapshot_in/out() functionality

2014-04-14 Thread Heikki Linnakangas

On 04/12/2014 05:03 PM, Andres Freund wrote:

On 2014-04-12 09:47:24 -0400, Tom Lane wrote:

Heikki Linnakangas  writes:

On 04/12/2014 12:07 AM, Jan Wieck wrote:

the Slony team has been getting occasional reports of a problem with the
txid_snapshot data type.
The symptom is that a txid_snapshot on output lists the same txid
multiple times in the xip list part of the external representation.



It's two-phase commit. When preparing a transaction, the state of the
transaction is first transferred to a dummy PGXACT entry, and then the
PGXACT entry of the backend is cleared. There is a transient state when
both PGXACT entries have the same xid.


Hm, yeah, but why is that intermediate state visible to anyone else?
Don't we have exclusive lock on the PGPROC array while we're doing this?


It's done outside the remit of ProcArray lock :(. And documented to lead
to duplicate xids in PGXACT.
EndPrepare():
/*
 * Mark the prepared transaction as valid.  As soon as xact.c marks
 * MyPgXact as not running our XID (which it will do immediately after
 * this function returns), others can commit/rollback the xact.
 *
 * NB: a side effect of this is to make a dummy ProcArray entry for the
 * prepared XID.  This must happen before we clear the XID from MyPgXact,
 * else there is a window where the XID is not running according to
 * TransactionIdIsInProgress, and onlookers would be entitled to assume
 * the xact crashed.  Instead we have a window where the same XID appears
 * twice in ProcArray, which is OK.
 */
MarkAsPrepared(gxact);

It doesn't sound too hard to essentially move PrepareTransaction()'s
ProcArrayClearTransaction() into MarkAsPrepared() and rejigger the
locking to remove the intermediate state. But I think it'll lead to
bigger changes than we'd be comfortable backpatching.


Hmm. There's a field in GlobalTransactionData called locking_xid, which 
is used to mark the XID of the transaction that's currently operating on 
the prepared transaction. At prepare, that ensures that the transaction 
cannot be committed or rolled back by another backend until the original 
backend has cleared its PGPROC entry. At COMMIT/ROLLBACK PREPARED, it 
ensures that only one backend can commit/rollback the transaction.


I wonder why we don't use a VirtualTransactionId there. AFAICS there is 
no reason for COMMIT/ROLLBACK PREPARED to be assigned an XID of its own. 
And if we used a VirtualTransactionId there, prepare could clear the xid 
field of the PGPROC entry earlier.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Minor improvements in create_foreign_table.sgml

2014-04-14 Thread Etsuro Fujita
Attached is a small patch to improve create_foreign_table.sgml.

Thanks,

Best regards,
Etsuro Fujita
diff --git a/doc/src/sgml/ref/create_foreign_table.sgml 
b/doc/src/sgml/ref/create_foreign_table.sgml
index 06a7087..4a8cf38 100644
--- a/doc/src/sgml/ref/create_foreign_table.sgml
+++ b/doc/src/sgml/ref/create_foreign_table.sgml
@@ -49,7 +49,7 @@ CREATE FOREIGN TABLE [ IF NOT EXISTS ] table_name
schema.  Otherwise it is created in the current schema.
The name of the foreign table must be
distinct from the name of any other foreign table, table, sequence, index,
-   or view in the same schema.
+   view, or materialized view in the same schema.
   
 
   

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Adding unsigned 256 bit integers

2014-04-14 Thread Olivier Lalonde
Thanks for helping me out everyone. I ended up simply using the numeric
type (I didn't realize it could support such large numbers) and writing the
hex-to-numeric conversion functions in my application code.
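
For the archives, here is roughly what the conversion looks like done
server-side instead (a sketch; it does not validate non-hex input):

CREATE OR REPLACE FUNCTION hex_to_numeric(t text) RETURNS numeric AS $$
DECLARE
  result numeric := 0;
  i int;
BEGIN
  FOR i IN 1 .. length(t) LOOP
    -- Each hex digit: shift left four bits, then add its value (0..15).
    result := result * 16
              + position(lower(substr(t, i, 1)) IN '0123456789abcdef') - 1;
  END LOOP;
  RETURN result;
END;
$$ LANGUAGE plpgsql IMMUTABLE STRICT;

-- 2^256 - 1, i.e. 64 hex digits of 'f':
SELECT hex_to_numeric(repeat('ff', 32));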


On 11 April 2014 12:27, Leon Smith  wrote:

> pgmp is also worth mentioning here,   and it's likely to be more efficient
> than the numeric type or something you hack up yourself:
>
> http://pgmp.projects.pgfoundry.org/
>
> Best,
> Leon
>
>
> On Thu, Apr 10, 2014 at 10:11 AM, k...@rice.edu  wrote:
>
>> On Thu, Apr 10, 2014 at 09:13:47PM +0800, Olivier Lalonde wrote:
>> > I was wondering if there would be any way to do the following in
>> PostgreSQL:
>> >
>> > UPDATE cryptotable SET work = work + 'some big hexadecimal number'
>> >
>> > where work is an unsigned 256 bit integer. Right now my column is a
>> > character varying(64) column (hexadecimal representation of the number)
>> but
>> > I would be happy to switch to another data type if it lets me do the
>> > operation above.
>> >
>> > If it's not possible with vanilla PostgreSQL, are there extensions that
>> > could help me?
>> >
>> > --
>> > - Oli
>> >
>> > Olivier Lalonde
>> > http://www.syskall.com <-- connect with me!
>> >
>>
>> Hi Olivier,
>>
>> Here are some sample pl/pgsql helper functions that I have written for
>> other purposes. They use integers but can be adapted to use numeric.
>>
>> Regards,
>> Ken
>> ---
>> CREATE OR REPLACE FUNCTION hex2dec(t text) RETURNS integer AS $$
>> DECLARE
>>   r RECORD;
>> BEGIN
>>   FOR r IN EXECUTE 'SELECT x'''||t||'''::integer AS hex' LOOP
>> RETURN r.hex;
>>   END LOOP;
>> END
>> $$ LANGUAGE plpgsql IMMUTABLE STRICT;
>> ---
>>
>> ---
>> CREATE OR REPLACE FUNCTION bytea2int (
>>   in_string BYTEA
>> ) RETURNS INTEGER AS $$
>>
>> DECLARE
>>
>>   b1 INTEGER := 0;
>>   b2 INTEGER := 0;
>>   b3 INTEGER := 0;
>>   b4 INTEGER := 0;
>>   out_int INTEGER := 0;
>>
>> BEGIN
>>
>>   CASE OCTET_LENGTH(in_string)
>> WHEN 1 THEN
>>   b4 := get_byte(in_string, 0);
>> WHEN 2 THEN
>>   b3 := get_byte(in_string, 0);
>>   b4 := get_byte(in_string, 1);
>> WHEN 3 THEN
>>   b2 := get_byte(in_string, 0);
>>   b3 := get_byte(in_string, 1);
>>   b4 := get_byte(in_string, 2);
>> WHEN 4 THEN
>>   b1 := get_byte(in_string, 0);
>>   b2 := get_byte(in_string, 1);
>>   b3 := get_byte(in_string, 2);
>>   b4 := get_byte(in_string, 3);
>>   END CASE;
>>
>>   out_int := (b1 << 24) + (b2 << 16) + (b3 << 8) + b4;
>>
>>   RETURN(out_int);
>> END;
>> $$ LANGUAGE plpgsql IMMUTABLE;
>> ---
>>
>>
>> --
>> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-hackers
>>
>
>


-- 
- Oli

Olivier Lalonde
http://www.syskall.com <-- connect with me!

Freelance web and Node.js engineer
Skype: o-lalonde


Re: [HACKERS] Problem with txid_snapshot_in/out() functionality

2014-04-14 Thread Marko Kreen
On Sun, Apr 13, 2014 at 05:46:20PM -0400, Jan Wieck wrote:
> On 04/13/14 14:22, Jan Wieck wrote:
> >On 04/13/14 08:27, Marko Kreen wrote:
> >>I think you need to do SET_VARSIZE also here.  Alternative is to
> >>move SET_VARSIZE after sort_snapshot().
> >>
> >>And it seems the drop-double-txid logic should be added also to
> >>txid_snapshot_recv().  It seems weird to have it behave differently
> >>from txid_snapshot_in().
> >>
> >
> >Thanks,
> >
> >yes on both issues. Will create another patch.
> 
> New patch attached.
> 
> New github commit is 
> https://github.com/wieck/postgres/commit/b8fd0d2eb78791e5171e34aecd233fd06218890d

Looks OK to me.

-- 
marko



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Race condition between PREPARE TRANSACTION and COMMIT PREPARED (was Re: [HACKERS] Problem with txid_snapshot_in/out() functionality)

2014-04-14 Thread Heikki Linnakangas

On 04/13/2014 11:39 PM, Heikki Linnakangas wrote:

However, I just noticed that there's a race condition between PREPARE
TRANSACTION and COMMIT/ROLLBACK PREPARED. PostPrepare_Locks runs after
the prepared transaction is already marked as fully prepared. That means
that by the time we get to PostPrepare_Locks, another backend might
already have finished and removed the prepared transaction. That leads
to a PANIC (put a breakpoint just before PostPrepare_Locks):

postgres=# commit prepared 'foo';
PANIC:  failed to re-find shared proclock object
PANIC:  failed to re-find shared proclock object
The connection to the server was lost. Attempting reset: Failed.

FinishPrepareTransaction reads the list of locks from the two-phase
state file, but PANICs when it doesn't find the corresponding locks in
the lock manager (because PostPrepare_Locks hasn't transfered them to
the dummy PGPROC yet).

I think we'll need to transfer of the locks earlier, before the
transaction is marked as fully prepared. I'll take a closer look at this
tomorrow.


Here's a patch to do that. It's very straightforward: I just moved the
calls to transfer locks earlier, before ProcArrayClearTransaction.
PostPrepare_MultiXact had a similar race - it also transfers state from
the old PGPROC entry to the new one, and needs to be done before allowing
another backend to remove the new PGPROC entry. I changed the names of
the functions to distinguish them from the other PostPrepare_* functions
that now happen at a different time.


The patch is simple, but it's a bit scary to change the order of things 
like this. Looking at all the calls that now happen after transferring 
the locks, I believe this is OK. The change also applies to the 
callbacks called by the RegisterXactCallback mechanism, which means that 
in theory there might be a 3rd party extension out there that's 
affected. All the callbacks in contrib and plpgsql are OK, and it's
questionable to do anything complicated that would depend on
heavy-weight locks being held in those callbacks, so I think this is OK.
Warrants a note in the release notes, though.


- Heikki

diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index d4ad678..b505c62 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1580,11 +1580,12 @@ AtPrepare_MultiXact(void)
 }
 
 /*
- * PostPrepare_MultiXact
- *		Clean up after successful PREPARE TRANSACTION
+ * TransferPrepare_MultiXact
+ *		Called after successful PREPARE TRANSACTION, before releasing our
+ *		PGPROC entry.
  */
 void
-PostPrepare_MultiXact(TransactionId xid)
+TransferPrepare_MultiXact(TransactionId xid)
 {
 	MultiXactId myOldestMember;
 
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index b20d973..c60edf1 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2224,6 +2224,16 @@ PrepareTransaction(void)
 	XactLastRecEnd = 0;
 
 	/*
+	 * Transfer any heavy-weight locks we're holding to the dummy ProcArray.
+	 *
+	 * NB: this has to be done before clearing our own ProcArray entry.
+	 * This is different from normal commit, where locks are released after
+	 * clearing the ProcArray entry!
+	 */
+	TransferPrepare_MultiXact(xid);	  /* also transfer our multixact state */
+	TransferPrepare_Locks(xid);
+
+	/*
 	 * Let others know about no transaction in progress by me.	This has to be
 	 * done *after* the prepared transaction has been marked valid, else
 	 * someone may think it is unlocked and recyclable.
@@ -2234,6 +2244,10 @@ PrepareTransaction(void)
 	 * This is all post-transaction cleanup.  Note that if an error is raised
 	 * here, it's too late to abort the transaction.  This should be just
 	 * noncritical resource releasing.	See notes in CommitTransaction.
+	 *
+	 * NB: we already transfered the locks to the prepared ProcArray entry,
+	 * so even the cleanup before ResourceOwnerRelease(RESOURCE_RELEASE_LOCKS)
+	 * cannot rely on any heavy-weight locks being held!
 	 */
 
 	CallXactCallbacks(XACT_EVENT_PREPARE);
@@ -2256,11 +2270,12 @@ PrepareTransaction(void)
 
 	PostPrepare_smgr();
 
-	PostPrepare_MultiXact(xid);
-
-	PostPrepare_Locks(xid);
 	PostPrepare_PredicateLocks(xid);
 
+	/*
+	 * we're not actually holding any locks anymore, but clean up any other
+	 * resources that might need to be cleaned up at this stage.
+	 */
 	ResourceOwnerRelease(TopTransactionResourceOwner,
 		 RESOURCE_RELEASE_LOCKS,
 		 true, true);
diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 6825063..779f0cb 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -3069,8 +3069,8 @@ AtPrepare_Locks(void)
 }
 
 /*
- * PostPrepare_Locks
- *		Clean up after successful PREPARE
+ * TransferPrepare_Locks
+ *		Transfer locks to prepared transaction after successful PREPARE
  *
  * Here, we want to transfer ownership of our locks to a dummy PGPROC