Re: [HACKERS] pgcrypto missing header file inclusions

2013-12-30 Thread Marko Kreen
On Sat, Dec 28, 2013 at 10:36:19PM -0500, Peter Eisentraut wrote:
 While playing around with the pginclude tools, I noticed that pgcrypto
 header files are failing to include some header files whose symbols they
 use.  This change would fix it:
 
 diff --git a/contrib/pgcrypto/pgp.h b/contrib/pgcrypto/pgp.h
 index 3022abf..f856e07 100644
 --- a/contrib/pgcrypto/pgp.h
 +++ b/contrib/pgcrypto/pgp.h
 @@ -29,6 +29,9 @@
   * contrib/pgcrypto/pgp.h
   */
  
 +#include "mbuf.h"
 +#include "px.h"
 +
  enum PGP_S2K_TYPE
  {
 PGP_S2K_SIMPLE = 0,
 
 Does that look reasonable?

It's a style question - currently pgp.h expects other headers to be
included in .c files. Including them in pgp.h might be better style,
but then please sync the .c files with it too.

-- 
marko



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Polymorphic function calls

2013-12-30 Thread knizhnik

On 12/30/2013 01:22 AM, Sergey Konoplev wrote:

On Sun, Dec 29, 2013 at 8:44 AM, knizhnik knizh...@garret.ru wrote:

create function volume(r base_table) returns integer as $$ begin return
r.x*r.y; end; $$ language plpgsql strict stable;
create function volume(r derived_table) returns integer as $$ begin return
r.x*r.y*r.z; end; $$ language plpgsql strict stable;

[...]


The execution plans of the first and second queries are very similar.
The difference is that the type of r_1 in the first query is base_table.
It is obvious that the query should return a fixed set of columns, so

 select * from base_table;

cannot return the z column.
But passing the derived_table type instead of the base_table type to the volume()
function for a record belonging to derived_table seems possible and not to
contradict anything, doesn't it?

Correct.

Postgres chooses a function based on the passed signature. When you
specify base_table it will choose volume(base_table) and for
base_table it will be volume(derived_table) as well.

I think you mean "and for derived_table it will be volume(derived_table) as well".
Certainly I understand the reasons for such behavior, and that it would be 
difficult to introduce some other non-contradictory model...


But polymorphism is one of the basic features of the OO approach. Definitely 
PostgreSQL is not an OODBMS. But it does support inheritance.
And I wonder if it is possible/desirable to make an exception to the 
function invocation rules for derived tables?


Actually a query on a table with inherited tables is implemented as a join 
of several independent table traversals.
But for all derived tables we restrict the context to the scope of the base 
table. If it were possible to keep the real record type when it is passed to 
some function, we could support polymorphic calls...


Will it violate some principles/rules? I am not sure...



FYI, there is a common practice to follow the DRY principle with
inheritance and polymorphic functions in Postgres. On your example it
might be shown like this:

create function volume(r base_table) returns integer as $$ begin
return r.x*r.y;
end; $$ language plpgsql strict stable;

create function volume(r derived_table) returns integer as $$ begin
return volume(r::base_table) *r.z;
end; $$ language plpgsql strict stable;



I do not completely understand you here.
A function of a derived class can refer to the overridden function when 
its result depends on the result of the overridden function.
But that is not always true; for example, you can derive class Circle from 
Ellipse, but the formula for circle volume does not need to use the 
implementation of volume for ellipse.


In any case, my question was not how to implement a polymorphic volume 
function.
I asked whether it is possible to change the PostgreSQL function lookup 
rules to support something like virtual calls.







Re: [HACKERS] [COMMITTERS] pgsql: Upgrade to Autoconf 2.69

2013-12-30 Thread Andres Freund
On 2013-12-29 02:48:21 -0500, Tom Lane wrote:
 4. The server tries to start, and fails because it can't find a WAL file
 containing the last checkpoint record.  This is pretty unsurprising given
 the facts above.  The reason you don't see any "no such file" report is
 that XLogFileRead() will report any BasicOpenFile() failure *other than*
 ENOENT.  And nothing else makes up for that.
 
 Re point 4: the logic, if you can call it that, in xlog.c and xlogreader.c
 is making my head spin.  There are about four levels of overcomplicated
 and undercommented code before you ever get down to XLogFileRead, so I
 have no idea which level to blame for the lack of error reporting in this
 specific case.  But there are pretty clearly some cases in which ignoring
 ENOENT in XLogFileRead isn't such a good idea, and XLogFileRead isn't
 being told when to do that or not.

Yes, that code is pretty horrid. To Heikki's and my defense, I don't
think the xlogreader.c split had much to do with it tho. I think the
path erroring out essentially is

ReadRecord()->XLogReadRecord()*->ReadPageInternal()*->XLogPageRead()
->WaitForWALToBecomeAvailable()->XLogFileReadAnyTLI()->XLogFileRead()

The *ed functions are new, but it's really code that was in ReadRecord()
before. So I don't think too much has changed since 9.0ish, although the
timeline switch didn't make it simpler.

As far as I can tell, XLogFileRead() actually is told when it's OK to
ignore an error - the notfoundOK parameter. It's just that we're always
passing true for it when we're not streaming...
I think it might be sufficient to make passing that flag additionally
conditional on fetching_ckpt, that's already passed to
WaitForWALToBecomeAvailable(), so we'd just need to add it to
XLogFileReadAnyTLI().

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-12-30 Thread Heikki Linnakangas

On 12/30/2013 05:57 AM, Peter Geoghegan wrote:

Now, when you actually posted the prototype, I realized that it was
the same basic design that I'd cited in my very first e-mail on the
IGNORE patch (the one that didn't have row locking at all) - nobody
else wanted to do heap insertion first for promise tuples. I read that
2007 thread [1] a long time ago, but that recognition only came when
you posted your first prototype, or perhaps shortly before when you
started participating on list.


Ah, I didn't remember that thread. Yeah, apparently I proposed the exact 
same design back then. Simon complained about the dead tuples being left 
behind, but I don't think that's a big issue with the design we've been 
playing around with now; you only end up with dead tuples when two backends 
try to insert the same value concurrently, which shouldn't happen very 
often. Other than that, there wasn't any discussion on whether that's a 
good approach or not.



In short, I think that my approach may be better because it doesn't
conflate row locking with value locking (therefore making it easier
to reason about, maintaining modularity),


You keep saying that, but I don't understand what you mean. With your 
approach, an already-inserted heap tuple acts like a value lock, just 
like in my approach. You have the page-level locks on b-tree pages in 
addition to that, but the heap-tuple based mechanism is there too.



I'm not sure that this is essential to your design, and I'm not sure
what your thoughts are on this, but Andres has defended the idea of
promise tuples that lock old values indefinitely pending the outcome
of another xact where we'll probably want to update, and I think this
is a bad idea. Andres recently seemed less convinced of this than in
the past [2], but I'd like to hear your thoughts. It's very pertinent,
because I think releasing locks needs to be cheap, and rendering old
promise tuples invisible is not cheap.


Well, killing an old promise tuple is not cheap, but it shouldn't happen 
often. In both approaches, what probably matters more is the overhead of 
the extra heavy-weight locking. But this is all speculation, until we 
see some benchmarks.



I said that I have limited enthusiasm for
expanding the modularity violation that exists within the btree AM.
Based on what Andres has said in the recent past on this thread about
the current btree code, that in my opinion, bt_check_unique() doing
[locking heap and btree buffers concurrently] is a bug that needs
fixing [3], can you really blame me? What this patch does not need is
another controversy. It seems pretty reasonable and sane that we'd
implement this by generalizing from the existing mechanism.


_bt_check_unique() is a modularity violation, agreed. Beauty is in the 
eye of the beholder, I guess, but I don't see either patch making that 
any better or worse.



Now, enough with the venting. Back to drawing board, to figure out how best
to fix the deadlock issue with the insert_on_dup-kill-on-conflict-2.patch.
Please help me with that.


I will help you. I'll look at it tomorrow.


Thanks!


[1] 
http://www.postgresql.org/message-id/1172858409.3760.1618.ca...@silverbirch.site

[2] 
http://www.postgresql.org/message-id/20131227075453.gb17...@alap2.anarazel.de

[3] 
http://www.postgresql.org/message-id/20130914221524.gf4...@awork2.anarazel.de


- Heikki




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-12-30 Thread Andres Freund
On 2013-12-29 19:57:31 -0800, Peter Geoghegan wrote:
 On Sun, Dec 29, 2013 at 8:18 AM, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:
  My position is not based on a gut feeling. It is based on carefully
  considering the interactions of the constituent parts, plus the
  experience of actually building a working prototype.
 
 
  I also carefully considered all that stuff, and reached a different
  conclusion. Plus I also actually built a working prototype (for some
  approximation of working - it's still a prototype).
 
 Well, clearly you're in agreement with me about unprincipled
 deadlocking. That's what I was referring to here.

Maybe you should describe what you mean by "unprincipled". Sure, the
current patch deadlocks, but I don't see anything fundamental,
unresolvable there. So I don't understand what the word "unprincipled"
means in that sentence...

 Andres recently seemed less convinced of this than in
 the past [2], but I'd like to hear your thoughts.

Not really, I just don't have the time/energy to fight for it (aka write
a POC) at the moment.
I still think any form of promise tuple, be it index- or heap-based,
is a much better, more general approach than yours. That doesn't
preclude other approaches from being workable though.

 I didn't say that locking index pages is obviously better, and I
 certainly didn't say anything about what you've done being
 fundamentally flawed. I said that I have limited enthusiasm for
 expanding the modularity violation that exists within the btree AM.
 Based on what Andres has said in the recent past on this thread about
 the current btree code, that in my opinion, bt_check_unique() doing
 [locking heap and btree buffers concurrently] is a bug that needs
 fixing [3], can you really blame me?

Uh. But that was said in the context of *your* approach being
flawed. Because it - at that time, I didn't look at the newest version
yet - extended the concept of holding btree page locks across external
operations to far more code, even including user-defined code! And
you argued that that isn't a problem using _bt_check_unique() as an
argument.

I don't really see why your patch is less of a modularity violation than
Heikki's POC. It's just a different direction.

  PS. In btreelock_insert_on_dup_v5.2013_12_28.patch, the language used in the
  additional text in README is quite difficult to read. Too many difficult
  sentences and constructs for a non-native English speaker like me. I had to
  look up "concomitantly" in a dictionary and I'm still not sure I understand
  that sentence :-).

+1 on the simpler language front as a fellow non-native speaker.

Personally, the biggest thing I think you could do in favor of your
position, is trying to be a bit more succinct in the mailing list
discussions. I certainly fail at times at that as well, but I really try
to work on it...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Polymorphic function calls

2013-12-30 Thread Sergey Konoplev
On Mon, Dec 30, 2013 at 2:03 AM, knizhnik knizh...@garret.ru wrote:
 On 12/30/2013 01:22 AM, Sergey Konoplev wrote:
 On Sun, Dec 29, 2013 at 8:44 AM, knizhnik knizh...@garret.ru wrote:
 But passing derived_table type instead of base_table type to volume()
 function for record belonging to derived_table seems to be possible and
 not
 contradicting something, isn't it?

 Correct.

 Postgres chooses a function based on the passed signature. When you
 specify base_table it will choose volume(base_table) and for
 base_table it will be volume(derived_table) as well.

 I think you mean and for derived_table it will be base_table as well.

Sure. Sorry for the typo.

 Certainly I understand the reasons of such behavior and that it will be
 difficult to introduce some other non-contradictory model...

 But polymorphism is one of the basic features of OO approach. Definitely
 PostgreSQL is not OODBMS. But it supports  inheritance.
 And I wonder if it is possible/desirable to make an exception for function
 invocation rules for derived tables?

 Actually a query on table with inherited tables is implemented as join of
 several independent table traversals.

I would say it is more like a union.

 But for all derived tables we restrict context to the scope of base table.
 If it is possible to leave real record type when it is passed to some
 function, we can support polymorphic calls...

 Will it violate some principles/rules? I am not sure...

Well, it is not implemented in Postgres.

 FYI, there is a common practice to follow the DRY principle with
 inheritance and polymorphic functions in Postgres. On your example it
 might be shown like this:

 create function volume(r base_table) returns integer as $$ begin
 return r.x*r.y;
 end; $$ language plpgsql strict stable;

 create function volume(r derived_table) returns integer as $$ begin
 return volume(r::base_table) *r.z;
 end; $$ language plpgsql strict stable;

 I do not completely understand you here.
 Function of a derived class can refer to the overridden function when its
 result depends on the result of the overridden function.
 But it is not always true; for example, you can derive class Circle from
 Ellipse, but the formula for circle volume does not need to use the
 implementation of volume for ellipse.

Sure, it is not a universal thing.

 In any case, my question was not how to implement polymorphic volume
 function.
 I asked whether it is possible to change PostgreSQL function lookup rules to
 support something like virtual calls.

You can use this hack to do it.

create or replace function volume(r base_table, t text) returns integer as $$
declare v integer;
begin execute format('select volume(r) from %I r where r.x = %L', t,
r.x) into v;
return v;
end; $$ language plpgsql;

select volume(r, tableoid::regclass::text) from base_table r;
 volume
--------
      2
     60
(2 rows)
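For reference, here is one possible setup that reproduces the two-row output above; the table definitions and inserted values are assumptions on my part, not taken from the thread:

```sql
-- hypothetical tables matching the discussion: derived_table
-- inherits from base_table and adds a z column
create table base_table (x integer, y integer);
create table derived_table (z integer) inherits (base_table);

insert into base_table (x, y) values (1, 2);           -- volume = 1*2   = 2
insert into derived_table (x, y, z) values (3, 4, 5);  -- volume = 3*4*5 = 60
```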

-- 
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray...@gmail.com




[HACKERS] Patch: Show process IDs of processes holding a lock; show relation and tuple infos of a lock to acquire

2013-12-30 Thread Christian Kruse
Hi there,

I created two patches improving the log messages generated by
log_lock_waits. The first patch shows the process IDs of the processes
holding a lock we are trying to acquire (show_pids_in_lock_log.patch);
sample output (log_lock_waits=on required):

session 1: BEGIN; LOCK TABLE foo IN SHARE MODE;
session 2: BEGIN; LOCK TABLE foo IN SHARE MODE;
session 3: BEGIN; LOCK TABLE foo IN EXCLUSIVE MODE;

Output w/o patch:

LOG:  process 13777 still waiting for ExclusiveLock on relation 16385 of 
database 16384 after 1000.030 ms

Output with patch:

LOG:  process 13777 still waiting for ExclusiveLock on relation 16385 of 
database 16384 after 1000.030 ms
CONTEXT:  processes owning lock: 13775, 13776

The second patch (show_table_name_and_tuple_in_lock_log.patch)
includes relation info (table name and OID) as well as some tuple
information (if available). Sample output (log_lock_waits=on required):

session 1:
CREATE TABLE foo (val integer);
INSERT INTO foo (val) VALUES (1);
BEGIN;
UPDATE foo SET val = 3;

session 2:
BEGIN;
UPDATE foo SET val = 2;

Output w/o patch:

LOG:  process 24774 acquired ShareLock on transaction 696 after 11688.720 ms

Output with patch:

LOG:  process 24774 acquired ShareLock on transaction 696 after 11688.720 ms
CONTEXT:  relation name: foo (OID 16385)
tuple (ctid (0,1)): (1)


Regarding this patch I am not really sure where to put the
functions. Currently they are located in backend/storage/lmgr/lmgr.c
because XactLockTableWait() is located there, too. What do you think?


I also created two test specs for easy creation of the log output;
however, I was not able to provide an expected file since the process
IDs vary from test run to test run.

Regards,

-- 
 Christian Kruse   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 122afb2..fdfacdf 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -1195,13 +1195,19 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 		 */
 		if (log_lock_waits && deadlock_state != DS_NOT_YET_CHECKED)
 		{
-			StringInfoData buf;
+			StringInfoData buf,
+						buf1;
 			const char *modename;
 			long		secs;
 			int			usecs;
 			long		msecs;
+			SHM_QUEUE  *procLocks;
+			PROCLOCK   *proclock;
+			bool		first = true;
+			int			lockHoldersNum = 0;
 
 			initStringInfo(&buf);
+			initStringInfo(&buf1);
 			DescribeLockTag(&buf, &locallock->tag.lock);
 			modename = GetLockmodeName(locallock->tag.lock.locktag_lockmethodid,
 									   lockmode);
@@ -1211,10 +1217,42 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 			msecs = secs * 1000 + usecs / 1000;
 			usecs = usecs % 1000;
 
+			LWLockAcquire(partitionLock, LW_SHARED);
+
+			procLocks = &(lock->procLocks);
+			proclock = (PROCLOCK *) SHMQueueNext(procLocks, procLocks,
+												 offsetof(PROCLOCK, lockLink));
+
+			while (proclock)
+			{
+				if (MyProc->pid != proclock->tag.myProc->pid)
+				{
+					if (first)
+					{
+						appendStringInfo(&buf1, "%d",
+										 proclock->tag.myProc->pid);
+						first = false;
+					}
+					else
+						appendStringInfo(&buf1, ", %d",
+										 proclock->tag.myProc->pid);
+
+					lockHoldersNum++;
+				}
+
+				proclock = (PROCLOCK *) SHMQueueNext(procLocks, &proclock->lockLink,
+													 offsetof(PROCLOCK, lockLink));
+			}
+
+			LWLockRelease(partitionLock);
+
 			if (deadlock_state == DS_SOFT_DEADLOCK)
 				ereport(LOG,
 						(errmsg("process %d avoided deadlock for %s on %s by rearranging queue order after %ld.%03d ms",
-								MyProcPid, modename, buf.data, msecs, usecs)));
+								MyProcPid, modename, buf.data, msecs, usecs),
+						 (errcontext(ngettext("process owning lock: %s",
+											  "processes owning lock: %s",
+											  lockHoldersNum), buf1.data))));
 			else if (deadlock_state == DS_HARD_DEADLOCK)
 			{
 				/*
@@ -1226,13 +1264,19 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 				 */
 				ereport(LOG,
 						(errmsg("process %d detected deadlock while waiting for %s on %s after %ld.%03d ms",
-								MyProcPid, modename, buf.data, msecs, usecs)));
+								MyProcPid, modename, buf.data, msecs, usecs),
+						 (errcontext(ngettext("process owning lock: %s",
+											  "processes owning lock: %s",
+											  lockHoldersNum), buf1.data))));
 			}
 
 			if (myWaitStatus == STATUS_WAITING)
 				ereport(LOG,
 						(errmsg("process %d still waiting for %s on %s after %ld.%03d ms",
-								MyProcPid, modename, buf.data, msecs, usecs)));
+								MyProcPid, modename, buf.data, msecs, usecs),
+						 (errcontext(ngettext("process owning lock: %s",
+											  "processes owning lock: %s",
+											  lockHoldersNum), buf1.data))));
 			else if (myWaitStatus == STATUS_OK)
 				ereport(LOG,
 					(errmsg("process %d acquired %s on %s after %ld.%03d ms",
@@ -1252,7 +1296,10 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 				if (deadlock_state != 

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-12-30 Thread Peter Geoghegan
On Mon, Dec 30, 2013 at 8:19 AM, Andres Freund and...@2ndquadrant.com wrote:
 Maybe you should describe what you mean with unprincipled. Sure, the
 current patch deadlocks, but I don't see anything fundamental,
 unresolvable there. So I don't understand what the word unprincipled
 means in that sentence..

Maybe it is resolvable, and maybe it's worth resolving - I never said
that it wasn't, I just said that I doubt the latter. By "unprincipled
deadlocking", I mean deadlocking that cannot be reasonably prevented by
a user. Currently, I think that never deadlocking is a reasonable
aspiration for all applications. It's never really necessary. When it
occurs, we can advise users to do simple analysis and application
refactoring to prevent it. With unprincipled deadlocking, we can give
no such advice. The only advice we can give is to stop doing so much
upserting, which is a big departure from how things are today. AFAICT,
no one disagrees with my view that this is bad, and probably
unacceptable.

 Uh. But that was said in the context of *your* approach being
 flawed. Because it - at that time, I didn't look at the newest version
 yet - extended the concept of holding btree page locks across external
 operation to far much more code, even including user defined code!. And
 you argued that that isn't a problem using _bt_check_unique() as an
 argument.

That's a distortion of my position at the time. I acknowledged from
the start that all buffer locking was problematic (e.g. [1]), and was
exploring alternative locking approaches and the merit of the design.
This is obviously the kind of project that needs to be worked at
through iterative prototyping. While arguing that deadlocking would
not occur, I lost sight of the bigger picture. But even if that wasn't
true, I don't know why you feel the need to go on and on about buffer
locking like this months later. Are you trying to be provocative? Can
you *please* stop?

Everyone knows that the btree heap access is a modularity violation.
Even the AM docs say that the heap access is without a doubt ugly
and non-modular. So my original point remains, which is that
expanding that is obviously going to be controversial, and probably
legitimately so. I thought that your earlier remarks on
_bt_check_unique() were a good example of this sentiment, but I hardly
need such an example.

 I don't really see why your patch is less of a modularity violation than
 Heikki's POC. It's just a different direction.

My approach does not regress modularity because it doesn't do anything
extra with the heap at all, and only btree insertion routines are
affected. Locking is totally localized to the btree insertion routines
- one .c file. At no other point does anything else have to care, and
it's obvious that this situation won't change in the future when we
decide to do something else cute with infomask bits or whatever.
That's a *huge* distinction.

[1] 
http://www.postgresql.org/message-id/cam3swzr2x4hjg7rjn0k4+hfdgucyx2prep0y3a7nccejeow...@mail.gmail.com
-- 
Peter Geoghegan




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-12-30 Thread Andres Freund
On 2013-12-30 12:29:22 -0800, Peter Geoghegan wrote:
 But even if that wasn't
 true, I don't know why you feel the need to go on and on about buffer
 locking like this months later. Are you trying to be provocative? Can
 you *please* stop?

ERR? Peter? *You* quoted a statement of mine that only made sense in
its original context. And I *did* say that the point about buffer
locking applied to the *past* version of the patch.


Andres

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-12-30 Thread Adrian Klaver

On 12/30/2013 12:45 PM, Andres Freund wrote:

On 2013-12-30 12:29:22 -0800, Peter Geoghegan wrote:

But even if that wasn't
true, I don't know why you feel the need to go on and on about buffer
locking like this months later. Are you trying to be provocative? Can
you *please* stop?


ERR? Peter? *You* quoted a statement of mine that only made sense in
its original context. And I *did* say that the point about buffer
locking applied to the *past* version of the patch.


Alright, this seems to have gone from confusion about the proposal to 
confusion about the confusion. Might I suggest a cooling-off period, and 
a return to the discussion, possibly in a Wiki page where the 
points/counterpoints could be laid out more efficiently?





Andres




--
Adrian Klaver
adrian.kla...@gmail.com




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-12-30 Thread Peter Geoghegan
On Mon, Dec 30, 2013 at 7:22 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Ah, I didn't remember that thread. Yeah, apparently I proposed the exact
 same design back then. Simon complained about the dead tuples being left
 behind, but I don't think that's a big issue with the design we've been
 playing around now; you only end up with dead tuples when two backends try
 to insert the same value concurrently, which shouldn't happen very often.

Right, because you check first, which also has a cost, paid in CPU
cycles and memory bandwidth, and buffer lock contention. As opposed to
a cost almost entirely localized to inserters into a single leaf page
per unique index for only an instant. You're checking *all* unique
indexes.

You call check_exclusion_or_unique_constraint() once per unique index
(or EC), and specify to wait on the xact, at least until a conflict is
found. So if you're waiting on an xact, your conclusion that earlier
unique indexes had no conflicts could soon become totally obsolete. So
for non-idiomatic usage, say like the usage Andres in particular cares
about for MM conflict resolution, I worry about the implications of
that. I'm not asserting that it's a problem, but it does seem like
something that's quite hard to reason about. Maybe Andres can comment.

 In short, I think that my approach may be better because it doesn't
 conflate row locking with value locking (therefore making it easier
 to reason about, maintaining modularity),

 You keep saying that, but I don't understand what you mean. With your
 approach, an already-inserted heap tuple acts like a value lock, just like
 in my approach. You have the page-level locks on b-tree pages in addition to
 that, but the heap-tuple based mechanism is there too.

Yes, but that historic behavior isn't really value locking at all.
That's very much like row locking, because there is a row, not the
uncertain intent to try to insert a row. Provided your transaction
commits and the client's transaction doesn't delete the row, the row
is definitely there. For upsert, conflicts may well be the norm, not
the exception.

Value locking is the exclusive lock on the buffer held during
_bt_check_unique(). I'm trying to safely extend that mechanism, to
reach consensus among unique indexes, which to me seems the most
logical and conservative approach. For idiomatic usage, it's only
sensible for there to be a conflict on one unique index, known ahead
of time. If you don't know where the conflict will be, then typically
your DML statement is unpredictable, just like the MySQL feature.
Otherwise, for MM conflict resolution, I think it makes sense to pick
those conflicts off, one at a time, dealing with exactly one row per
conflict. I mean, even with your approach, you're still not dealing
with later conflicts in later unique indexes, right? The fact that you
prevent conflicts on previously non-conflicting unique indexes only,
and, I suppose, not later ones too, seems quite arbitrary.

 Well, killing an old promise tuple is not cheap, but it shouldn't happen
 often. In both approaches, what probably matters more is the overhead of the
 extra heavy-weight locking. But this is all speculation, until we see some
 benchmarks.

Fair enough. We'll know more when we have fixed the exclusion
constraint supporting patch, which will allow us to make a fair
comparison. I'm working on it. Although I'd suggest that having dead
duplicates in indexes where that's avoidable is a cost that isn't
necessarily that easily characterized. I especially don't like that
you're currently doing the UNIQUE_CHECK_PARTIAL deferred unique
constraint thing of always inserting, continuing on for *all* unique
indexes regardless of finding a duplicate. Whatever overhead my
approach may imply around lock contention, clearly the cost to index
scans is zero.

The other thing is that if you're holding old value locks (i.e.
previously inserted btree tuples, from earlier unique indexes) pending
resolving a value conflict, you're holding those value locks
indefinitely pending the completion of the other guy's xact, just in
case there ends up being no conflict, which in general is unlikely. So
in extreme cases, that could be the difference between waiting all day
(on someone sitting on a lock that they very probably have no use
for), and not waiting at all.

 _bt_check_unique() is a modularity violation, agreed. Beauty is in the eye
 of the beholder, I guess, but I don't see either patch making that any
 better or worse.

Clearly the way in which you envisage releasing locks to prevent
unprincipled deadlocking implies that btree has to know more about the
heap, and maybe even that the heap has to know something about btree,
or at least about amcanunique AMs (including possible future
amcanunique AMs that may or may not be well suited to implementing
this the same way).

-- 
Peter Geoghegan



Re: [HACKERS] Proposal: variant of regclass

2013-12-30 Thread Tatsuo Ishii
 It seems fine to me if the new function ignores the specific error of
 relation does not exist while continuing to throw other errors.
 
 Thanks. Here is the revised conceptual patch. I'm going to post a
 concrete patch and register it to 2014-01 Commit Fest.

Before proceeding with the work, I would like to make sure that the following
is the complete list of new functions. Inside parentheses are the
corresponding original functions.

toregproc (regproc)
toregoper (regoper)
toregclass (regclass)
toregtype (regtype)

In addition to the above, we have regprocedure, regoperator, regconfig,
pg_ts_config and regdictionary. However, I don't see many use
cases/users for new functions corresponding to them. Opinions?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp




Re: [HACKERS] Proposal: variant of regclass

2013-12-30 Thread Vik Fearing
On 12/31/2013 02:38 AM, Tatsuo Ishii wrote:
 Before proceeding with the work, I would like to make sure that the following
 is the complete list of new functions (the corresponding original functions
 are in parentheses).

 toregproc (regproc)
 toregoper (regoper)
 toregclass (regclass)
 toregtype (regtype)

Pardon the bikeshedding, but those are hard to read for me.  I would
prefer to go with the to_timestamp() model and add an underscore to
those names.

-- 
Vik





Re: [HACKERS] truncating pg_multixact/members

2013-12-30 Thread Alvaro Herrera
Alvaro Herrera wrote:

 1. slru.c doesn't consider file names longer than 4 hexadecimal chars.

 For 9.3, I propose we skip this and tweak the code to consider files
 whose names are 4 or 5 chars in length, to remain compatible with
 existing installations whose pg_multixact/members directories contain a
 mixture of 4-char and 5-char file names.

Attached is a patch for this.

 2. pg_multixact/members truncation requires more intelligence to avoid
 removing files that are still needed.  Right now we use modulo-2^32
 arithmetic, but this doesn't work because the useful range can span
 longer than what we can keep within that range.

 #2c At start of truncation, save end-of-range in MultiXactState.  This
 state is updated by GetNewMultiXactId as new files are created.  That
 way, before each new file is created, the truncation routine knows to
 skip it.

Attached is a patch implementing this.

I also attach a patch implementing a burn multixact utility, initially
coded by Andres Freund, tweaked by me.  I used it to run a bunch of
wraparound cycles and everything seems to behave as expected.  (I don't
recommend applying this patch; I'm posting merely because it's a very
useful debugging tool.)

One problem I see is the length of time before multis are frozen: they live
far too long, causing the SLRU files to eat way too much disk space.
I ran burnmulti in a loop, creating multis of 3 members each, with a min
freeze age of 50 million, and this leads to ~770 files in
pg_multixact/offsets and ~2900 files in pg_multixact/members.  Each file
is 32 pages long, i.e. 256kB apiece.  Probably enough to be bothersome.

I think for computing the freezing point for multis, we should slash
min_freeze_age by 10 or something like that.  Or just set a hardcoded
one million.


 3. New pg_multixact/members generation requires more intelligence to
 avoid stomping on files from the previous wraparound cycle.  Right now
 there is no defense against this at all.

I still have no idea how to attack this.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
From b264bfe61b315a5f65d72b9550592cc9f73bf0a8 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera alvhe...@alvh.no-ip.org
Date: Tue, 31 Dec 2013 00:42:11 -0300
Subject: [PATCH 1/3] pg_burn_multixact utility

Andres Freund, minor tweaks by me
---
 contrib/pageinspect/heapfuncs.c  |   42 ++
 contrib/pageinspect/pageinspect--1.1.sql |5 
 src/backend/access/heap/heapam.c |2 +-
 src/backend/access/transam/multixact.c   |   12 +
 src/include/access/multixact.h   |3 ++-
 5 files changed, 57 insertions(+), 7 deletions(-)

diff --git a/contrib/pageinspect/heapfuncs.c b/contrib/pageinspect/heapfuncs.c
index 6d8f6f1..93a3317 100644
--- a/contrib/pageinspect/heapfuncs.c
+++ b/contrib/pageinspect/heapfuncs.c
@@ -29,6 +29,8 @@
 #include "funcapi.h"
 #include "utils/builtins.h"
 #include "miscadmin.h"
+#include "access/multixact.h"
+#include "access/transam.h"
 
 Datum		heap_page_items(PG_FUNCTION_ARGS);
 
@@ -224,3 +226,43 @@ heap_page_items(PG_FUNCTION_ARGS)
 	else
 		SRF_RETURN_DONE(fctx);
 }
+
+extern Datum
+pg_burn_multixact(PG_FUNCTION_ARGS);
+PG_FUNCTION_INFO_V1(pg_burn_multixact);
+
+Datum
+pg_burn_multixact(PG_FUNCTION_ARGS)
+{
+	int		rep = PG_GETARG_INT32(0);
+	int		size = PG_GETARG_INT32(1);
+	MultiXactMember *members;
+	MultiXactId ret;
+	TransactionId id = ReadNewTransactionId() - size;
+	int		i;
+
+	if (rep < 1)
+		elog(ERROR, "need to burn, burn, burn");
+
+	members = palloc(size * sizeof(MultiXactMember));
+	for (i = 0; i < size; i++)
+	{
+		members[i].xid = id++;
+		members[i].status = MultiXactStatusForShare;
+
+		if (!TransactionIdIsNormal(members[i].xid))
+		{
+			id = FirstNormalTransactionId;
+			members[i].xid = id++;
+		}
+	}
+
+	MultiXactIdSetOldestMember();
+
+	for (i = 0; i < rep; i++)
+	{
+		ret = MultiXactIdCreateFromMembers(size, members, true);
+	}
+
+	PG_RETURN_INT64((int64) ret);
+}
diff --git a/contrib/pageinspect/pageinspect--1.1.sql b/contrib/pageinspect/pageinspect--1.1.sql
index 22a47d5..b895246 100644
--- a/contrib/pageinspect/pageinspect--1.1.sql
+++ b/contrib/pageinspect/pageinspect--1.1.sql
@@ -105,3 +105,8 @@ CREATE FUNCTION fsm_page_contents(IN page bytea)
 RETURNS text
 AS 'MODULE_PATHNAME', 'fsm_page_contents'
 LANGUAGE C STRICT;
+
+CREATE FUNCTION pg_burn_multixact(num int4, size int4)
+RETURNS int4
+AS 'MODULE_PATHNAME', 'pg_burn_multixact'
+LANGUAGE C STRICT;
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 90e9e6f..1b8469f 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -5544,7 +5544,7 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
 		 * Create a new multixact with the surviving members of the previous
 		 * one, to set as new Xmax in the tuple.
 		 */
-		xid = MultiXactIdCreateFromMembers(nnewmembers, newmembers);
+		xid = 

Re: [HACKERS] [PATCH] Regression tests in windows ignore white space

2013-12-30 Thread Amit Kapila
On Mon, Dec 30, 2013 at 11:08 AM, David Rowley dgrowle...@gmail.com wrote:
 On Mon, Dec 30, 2013 at 6:02 PM, Amit Kapila amit.kapil...@gmail.com
 wrote:
 For the Windows build, I am using whatever the latest Git provides rather than
 downloading individual components, which might not be good, but I find it
 convenient. The latest Git (1.8.4) download on Windows still provides diff
 2.7, which is the reason I am on the older version. However, I agree that it
 is better to use the latest version.


 It looks like the diff version I'm using is from msys, and msys is what is in
 my search path rather than the git\bin path. To be honest I didn't realise
 that git for windows came with bison and flex (at least I see bison.exe and
 flex.exe in the git\bin path... I don't remember putting them there).

   Yes, git for Windows comes with bison, flex, etc. required for the PG build.

 I don't seem to even have git\bin in my %path% environment variable at all,
 so all those tools are being picked up from the msys path. I'd need to
 remind myself about the msys install process, but since we're already saying
 in the docs that msys should be installed,
   I think it is one of the ways of setting up on Windows; not every user
uses msys.

 then would the fix for this not
 just be as simple as my patch plus a note in the docs to say to ensure
 msys\bin occurs before git\bin in the %path% environment var, minimum
 supported diff version is 2.8.

  To be honest, I am also not fully aware of the project policy in such
  a matter, where we have to mention the minimum version of a dependent
  executable (diff). I request some senior members to please let us know
  whether we can make PG dependent on diff version >= 2.8 (at least
  for Windows), as there are some options which are not available
  in older versions.

  Another option could be that if the version of diff available is >= 2.8,
  then use '--strip-trailing-cr', else use the existing option '-w', but I am
  not sure if it will serve the purpose completely.

 Did you install msys? if so does it have a later version of diff?
   No.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Proposal: variant of regclass

2013-12-30 Thread Tatsuo Ishii
 On 12/31/2013 02:38 AM, Tatsuo Ishii wrote:
 Before proceeding with the work, I would like to make sure that the following
 is the complete list of new functions (the corresponding original functions
 are in parentheses).

 toregproc (regproc)
 toregoper (regoper)
 toregclass (regclass)
 toregtype (regtype)
 
 Pardon the bikeshedding, but those are hard to read for me.  I would
 prefer to go with the to_timestamp() model and add an underscore to
 those names.

I have no problem with adding the to_ prefix. Any objections?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp




[HACKERS] fix_PGSTAT_NUM_TABENTRIES_macro patch

2013-12-30 Thread Mark Dilger
In src/include/pgstat.h, the PGSTAT_NUM_TABENTRIES macro
attempts to subtract off the size of the PgStat_MsgTabstat
struct up to the m_entry[] field.  This macro was correct up
until the fields m_block_read_time and m_block_write_time
were added to that struct, as the macro was not changed to
include their size.

This patch fixes the macro.

As an aside, the PGSTAT_MSG_PAYLOAD define from which
these field sizes are being subtracted is a bit of a WAG, if
I am reading the code correctly, in which case whether the
two additional fields are subtracted from that WAG is perhaps
not critical.  But the code is certainly easier to understand if
the macro matches the definition of the struct.


Mark Dilger

fix_PGSTAT_NUM_TABENTRIES_macro
Description: Binary data



Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-12-30 Thread Peter Geoghegan
On Sun, Dec 29, 2013 at 9:09 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 While mulling this over further, I had an idea about this: suppose we
 marked the tuple in some fashion that indicates that it's a promise
 tuple.  I imagine an infomask bit, although the concept makes me wince
 a bit since we don't exactly have bit space coming out of our ears
 there.  Leaving that aside for the moment, whenever somebody looks at
 the tuple with a mind to calling XactLockTableWait(), they can see
 that it's a promise tuple and decide to wait on some other heavyweight
 lock instead.  The simplest thing might be for us to acquire a
 heavyweight lock on the promise tuple before making index entries for
 it, and then have callers wait on that instead always instead of
 transitioning from the tuple lock to the xact lock.

 Yeah, that seems like it should work. You might not even need an infomask
 bit for that; just take the other heavyweight lock always before calling
 XactLockTableWait(), whether it's a promise tuple or not. If it's not,
 acquiring the extra lock is a waste of time but if you're going to sleep
 anyway, the overhead of one extra lock acquisition hardly matters.

Are you suggesting that I lock the tuple only (say, through a special
LockPromiseTuple() call), or lock the tuple *and* call
XactLockTableWait() afterwards? You and Robert don't seem to agree on
which. From here on I assume Robert's idea (only
get the special promise lock where appropriate), because that makes
more sense to me.

I've taken a look at this idea, but got frustrated. You're definitely
going to need an infomask bit for this. Otherwise, how do you
differentiate between a pending promise tuple and a fulfilled
promise tuple (or a tuple that never had anything to do with promises
in the first place)? You'll want to wake up as soon as it becomes
clear that the former is not going to become the latter on the one
hand. On the other hand, you really will want to wait until xact end
on the pending promise tuple when it becomes a fulfilled promise, or
on an already-fulfilled promise tuple, or a plain old tuple. It's
either locking the promise tuple, or locking the xid; never both,
because the combination makes no sense in any case (unless you're
talking about the case where you lock the promise tuple and then later
*somehow* decide that you need to lock the xid as the upserter
releases promise tuple locks directly within ExecInsert() upon
successful insertion).

The fact that your LockPromiseTuple() call didn't find someone else
with the lock does not mean no one ever promised the tuple (assuming
no infomask bit has the relevant info).

Obviously you can't just have upserters hold on to the promise tuple
locks until xact end if the promiser's insertion succeeds, for the
same reason we don't with regular in-memory tuple locks: they're
totally unbounded. So not only are you going to need an infomask
promise bit, you're going to need to go and unset the bit in the event
of a *successful* insertion, so that waiters know to wait on your xact
now when you finally UnlockPromiseTuple() within ExecInsert() to
finish off successful insertion. *And*, all XactLockTableWait()
promise waiters need to go back and check that just-in-case.

This problem illustrates what I mean about conflating row locking with
value locking.

 I think the interlocking with buffer locks and heavyweight locks to
 make that work could be complex.

 Hmm. Can you elaborate?

What I meant is that you should be wary of what you go on to describe below.

 The inserter has to acquire the heavyweight lock before releasing the buffer
 lock, because otherwise another inserter (or deleter or updater) might see
 the tuple, acquire the heavyweight lock, and fall to sleep on
 XactLockTableWait(), before the inserter has grabbed the heavyweight lock.
 If that race condition happens, you have the original problem again, ie. the
 updater unnecessarily waits for the inserting transaction to finish, even
 though it already killed the tuple it inserted.

Right. Can you suggest a workaround to the above problems?
-- 
Peter Geoghegan

