Re: [HACKERS] remembering locks and dynahash.c

2012-05-31 Thread Tom Lane
Jeff Janes jeff.ja...@gmail.com writes:
 I am working on making resource owner remember a limited number of
 locks, so it can reassign them more efficiently.

Check, per previous discussion.

 Currently I'm having resowner.c remember the LOCALLOCKTAG, because I
 thought that that was the only handle I could use.  But now I'm
 wondering if I should remember the LOCALLOCK * instead.

I was envisioning LOCALLOCK *.  dynahash will never move an existing
entry, and if the entry can be deleted while the resowner still holds a
pointer to it then you've got a logic bug.
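
For illustration, the bookkeeping might look roughly like this (a
sketch only, with a hypothetical cap and field names, not actual
resowner.c code):

/* Fields added to ResourceOwnerData (hypothetical sketch) */
#define MAX_RESOWNER_LOCKS 15			/* arbitrary small cap */

int			nlocks;						/* number of remembered locks */
LOCALLOCK  *locks[MAX_RESOWNER_LOCKS];	/* bare pointers are safe:
										 * dynahash never moves entries */

/*
 * On LockAcquire: remember the LOCALLOCK * if there is room, else mark
 * the owner as overflowed.  On release/reassign: if not overflowed,
 * touch only the remembered entries instead of sequentially scanning
 * the whole local lock table.
 */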

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Uppercase tab completion keywords in psql?

2012-05-31 Thread Devrim GÜNDÜZ
Hi,

On Wed, 2012-05-30 at 22:03 -0400, Bruce Momjian wrote:
 A control variable was added in this commit:
 
 commit db84ba65ab5c0ad0b34d68ab5a687bc5f4ca3ba6
 Author: Peter Eisentraut pete...@gmx.net

Thanks Bruce, apparently I missed it.

Regards,
-- 
Devrim GÜNDÜZ
Principal Systems Engineer @ EnterpriseDB: http://www.enterprisedb.com
PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer
Community: devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.gunduz.org  Twitter: http://twitter.com/devrimgunduz


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] [RFC] Interface of Row Level Security

2012-05-31 Thread Yeb Havinga

On 2012-05-30 21:26, Kohei KaiGai wrote:

If we had an ideal optimizer, I'd still like the optimizer to
wipe out redundant clauses transparently, rather than RLSBYPASS
permissions, because that only controls all-or-nothing behavior.
For example, if tuples are categorized as unclassified, classified or
secret, and the RLS policy is configured as:
   ((current_user IN ('alice', 'bob') AND X IN ('unclassified',
'classified')) OR (X IN ('unclassified'))),
superuser can see all the tuples, and alice and bob can see
up to classified tuples.
Is it really hard to wipe out the redundant condition at the planner stage?
If current_user is obviously 'kaigai', it seems to me the left side of
this clause can be wiped out at the planner stage.


The query's RLS condition would be simpler if the RLS policy function that
returns the WHERE clause took the user as an argument, so that its result
does not contain user conditionals.


IF current_user IN ('alice', 'bob')
THEN
  RETURN X IN ('unclassified', 'classified')
ELSE
  RETURN X IN ('unclassified')
END IF;


regards,
Yeb



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FailedAssertion(!(PrivateRefCount[i] == 0), File: bufmgr.c, Line: 1741

2012-05-31 Thread Heikki Linnakangas

On 31.05.2012 08:06, Erik Rijkers wrote:

On Thu, May 31, 2012 03:30, Robert Haas wrote:

On Wed, May 30, 2012 at 6:00 PM, Erik Rijkers e...@xs4all.nl wrote:

directory
2012-05-30 23:40:57.909 CEST 3909 CONTEXT:  writing block 5152 of relation base/21268/26569
	xlog redo multi-insert (init): rel 1663/21268/26581; blk 3852; 35 tuples
TRAP: FailedAssertion(!(PrivateRefCount[i] == 0), File: bufmgr.c, Line: 1741)
2012-05-30 23:40:58.006 CEST 5331 FATAL:  could not open file base/21268/26569: No such file or directory
2012-05-30 23:40:58.006 CEST 5331 CONTEXT:  writing block 5153 of relation base/21268/26569
2012-05-30 23:40:59.661 CEST 3908 LOG:  startup process (PID 3909) was terminated by signal 6: Aborted
2012-05-30 23:40:59.661 CEST 3908 LOG:  terminating any other active server processes


Hmm.  I set up what I believe to be the same test case you were using,
and it's not crashing for me.  In fact, with the latest code, I
haven't been able to produce any error at all.  When I reverted my
last commit, I managed to get this:

ERROR:  could not open relation with OID 18229
STATEMENT:  select current_setting('port') port, count(*) from t

...but just once, and with no other error messages.  So I'm either
missing a step somewhere, or something about your system just happens
to tickle this moreso than on mine.


In my test, I run the bash code (the bits that I posted earlier) in a while 
loop so that the table
is CREATEd, COPYied into, and DROPped every few seconds -- perhaps that wasn't 
clear.  That loop
is necessary; a few iterations are often successful.

I can test it today on a few other systems to see if it is reproducible.


I could no longer reproduce it after Robert's fix; the test case has
been running for about an hour now. Please triple-check that you have it
applied on the standby :-).


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Interface of Row Level Security

2012-05-31 Thread Kohei KaiGai
2012/5/31 Yeb Havinga yebhavi...@gmail.com:
 On 2012-05-30 21:26, Kohei KaiGai wrote:

 If we had an ideal optimizer, I'd still like the optimizer to
 wipe out redundant clauses transparently, rather than RLSBYPASS
 permissions, because that only controls all-or-nothing behavior.
 For example, if tuples are categorized as unclassified, classified or
 secret, and the RLS policy is configured as:
   ((current_user IN ('alice', 'bob') AND X IN ('unclassified',
 'classified')) OR (X IN ('unclassified'))),
 superuser can see all the tuples, and alice and bob can see
 up to classified tuples.
 Is it really hard to wipe out the redundant condition at the planner stage?
 If current_user is obviously 'kaigai', it seems to me the left side of
 this clause can be wiped out at the planner stage.


 The query's RLS condition would be simpler if the RLS policy function that
 returns the WHERE clause took the user as an argument, so that its result does
 not contain user conditionals.

 IF current_user IN ('alice', 'bob')
 THEN
  RETURN X IN ('unclassified', 'classified')
 ELSE
  RETURN X IN ('unclassified')
 END IF;

Yes, that is the happy case. But the point I'm concerned about is that the
conditions used to branch between cases are not limited to the current user-id.
The RLS policy is appended at the planner stage, so a prepared statement
needs to be invalidated whenever its prerequisites change.
For example, someone may assign a function that returns an RLS policy
depending on whether the current hour is an even or odd number.
It implies that we cannot predict all the cases in which to invalidate a
prepared statement with an RLS policy, because of too much flexibility.

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-05-31 Thread Tatsuo Ishii
 We recently fixed a couple of O(N^2) loops in pg_dump, but those covered
 extremely specific cases that might or might not have anything to do
 with what you're seeing.  The complainant was extremely helpful about
 tracking down the problems:
 http://archives.postgresql.org/pgsql-general/2012-03/msg00957.php
 http://archives.postgresql.org/pgsql-committers/2012-03/msg00225.php
 http://archives.postgresql.org/pgsql-committers/2012-03/msg00230.php
 
 I'm wondering if these fixes (or today's commit) cover the case of
 a database with ~100 thousand tables and indexes. One of my customers
 has had trouble with pg_dump on such a database; it takes over 10
 hours.
 
 So I did a quick test with old PostgreSQL 9.0.2 and current (as of
 commit 2755abf386e6572bad15cb6a032e504ad32308cc). In a freshly initdb-ed
 database I created 100,000 tables, each with two integer
 attributes, one of them a primary key. Creating the tables was
 reasonably fast, as expected (18-20 minutes). This created a 1.4GB
 database cluster.
 
 pg_dump dbname > /dev/null took 188 minutes on 9.0.2, which is as
 painfully long as the customer complained. Now what about current?  Well, it
 took 125 minutes. ps showed that most of the time was spent in the backend.
 
 Below is the script to create tables.
 
 cnt=100000
 while [ $cnt -gt 0 ]
 do
 psql -e -p 5432 -c "create table t$cnt(i int primary key, j int);" test
 cnt=`expr $cnt - 1`
 done
 
 p.s. You need to increase max_locks_per_transaction before running
 pg_dump (I raised it to 640 in my case).

Just for the record, I reran the test with my single-LOCK patch, and
the total runtime of pg_dump is now 113 minutes:
188 minutes (9.0) -> 125 minutes (git master) -> 113 minutes (with my patch).

So far, I'm glad to see 40% time savings at this point.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GiST buffering build, bug in levelStep calculation

2012-05-31 Thread Heikki Linnakangas

On 29.05.2012 23:46, Alexander Korotkov wrote:

On Wed, May 30, 2012 at 12:25 AM, Tom Lane t...@sss.pgh.pa.us wrote:


Alexander Korotkov aekorot...@gmail.com writes:

On Tue, May 29, 2012 at 11:42 PM, Tom Lane t...@sss.pgh.pa.us wrote:

While I'm looking at this, is the first test involving
effective_cache_size bulletproof either?  In particular, is
avgIndexTuplesPerPage clamped to be strictly greater than 1?



It's based on collected statistics of already-inserted tuple sizes. Since
tuple sizes are measured after possible toasting, I don't see a way
for avgIndexTuplesPerPage to be less than 1.


Yeah, but if it could be *equal* to one, you've got a zero-divide there.



avgIndexTuplesPerPage is calculated as:

avgIndexTuplesPerPage = pageFreeSpace / itupAvgSize;

I think the size of each index tuple must be at least a few times smaller
than pageFreeSpace for us to be able to create any index.


Hmm, in theory, it seems possible that every leaf-level index tuple
would completely fill an index page. Not sure how useful such an index
would be, though. On internal pages, at least, you have to fit at least
two tuples on a page or you can't build a tree.
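
A simple guard along the lines being discussed might look like this (a
sketch only, reusing the variable names from the quoted code; not the
committed fix):

avgIndexTuplesPerPage = pageFreeSpace / itupAvgSize;

/*
 * An internal page must hold at least two downlink tuples or the tree
 * has no fan-out at all.  Clamping also protects the levelStep
 * calculation, which divides by log(avgIndexTuplesPerPage), from the
 * log(1) = 0 case Tom is worried about.
 */
if (avgIndexTuplesPerPage < 2.0)
	avgIndexTuplesPerPage = 2.0;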


I note that the calculations assume that leaf tuples and internal tuples 
have similar sizes. We calculate the average leaf tuple size, and use 
that to calculate the fan-out of internal pages. On some GiST opclasses, 
the values stored on internal pages might be quite different from the 
leaf tuples. I don't think we need to worry about that in practice, 
these calculations are not very accurate anyway, but perhaps a comment 
would be in order.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)

2012-05-31 Thread Simon Riggs
On 30 May 2012 17:19, Peter Geoghegan pe...@2ndquadrant.com wrote:
 On 30 May 2012 15:25, Simon Riggs si...@2ndquadrant.com wrote:
 1. It seems wrong to do it in xact_redo_commit_internal().  It won't
 matter if commit_siblings > 0 since there won't be any other backends
 with transaction IDs anyway, but if commit_siblings == 0 then we'll
 sleep for no possible benefit.

 Agreed

I've looked at this more closely now and I can see that the call to
XLogFlush() that is made from xact_redo_commit_internal() doesn't ever
actually flush WAL, so whether we delay or not is completely
irrelevant.

So un-agreed. No change required to patch there.


 2. Doing it in FlushBuffer() seems slightly iffy since we might be
 sitting on a buffer lock.  But maybe it's a win anyway, or just not
 worth worrying about.

 Agreed.

 As I've pointed out, we cannot meaningfully skip the wait for
 auxiliary processes alone (except perhaps by having commit_siblings
 set sufficiently high).

 The remaining cases aren't worth worrying about, apart from
 SlruPhysicalWritePage() which happens during visibility checks and
 needs to happen as quickly as possible also.

 I'm not so sure. It says in that function:

                /*
                 * We must determine the largest async-commit LSN for the page.
                 * This is a bit tedious, but since this entire function is a
                 * slow path anyway, it seems better to do this here than to
                 * maintain a per-page LSN variable (which'd need an extra
                 * comparison in the transaction-commit path).
                 */

 I would say the additional contention from waiting outweighs the
 benefit of the wait in those 3 places, so skipping the wait is wise.

 MinimumActiveBackends() reports the count of backends (other than
 myself) that are in active transactions, so unnecessary waits will
 occur only when we have active transactions >= CommitSiblings, not
 connections >= CommitSiblings as was previously the case.

 What if we were to skip the wait during recovery only, by specially
 setting CommitDelay to 0 in the start-up process? Would that satisfy
 everyone's concerns about unhelpful delays? I'm not sure how this
 might interact with hot standby.

Hmm, that was a good idea, but as of my comments above, that isn't
required or useful.
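
For reference, the wait under discussion is the one now performed
inside XLogFlush(); paraphrased from the 9.2-era patch, it is
essentially:

/* Sleep before flushing, to let other backends join the group commit */
if (CommitDelay > 0 && enableFsync &&
    MinimumActiveBackends(CommitSiblings))
    pg_usleep(CommitDelay);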

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GiST buffering build, bug in levelStep calculation

2012-05-31 Thread Alexander Korotkov
On Thu, May 31, 2012 at 1:36 PM, Heikki Linnakangas 
heikki.linnakan...@enterprisedb.com wrote:

 I note that the calculations assume that leaf tuples and internal tuples
 have similar sizes. We calculate the average leaf tuple size, and use that
 to calculate the fan-out of internal pages. On some GiST opclasses, the
 values stored on internal pages might be quite different from the leaf
 tuples. I don't think we need to worry about that in practice, these
 calculations are not very accurate anyway, but perhaps a comment would be
 in order.


Perhaps we could collect per-level statistics of average tuple size?

--
With best regards,
Alexander Korotkov.


[HACKERS] heap_form_tuple crashing

2012-05-31 Thread Atri Sharma
Hi all,

I am trying to call heap_form_tuple to create a tuple from a datum.

My call to heap_form_tuple looks like:


val1=0;
tupledesc1=BlessTupleDesc(node->ss.ss_currentRelation->rd_att);
tuple=heap_form_tuple(tupledesc1,p1,val1);


p1 is a pointer to a Datum instance which is created from a char array.

When I run the code, the system crashes.

Please let me know what the problem might be and what I should do
to rectify it.

Atri
-- 
Regards,

Atri
l'apprenant

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: avoid heavyweight locking on hash metapage

2012-05-31 Thread Dickson S. Guedes
2012/5/30 Robert Haas robertmh...@gmail.com:
 I tested the effect of this by setting up a series of 5-minute
 read-only pgbench runs at scale factor 300 with 8GB of shared buffers
 on the IBM POWER7 machine.

I know it doesn't matter, but out of curiosity: what OS did you use?

best regards,
-- 
Dickson S. Guedes
mail/xmpp: gue...@guedesoft.net - skype: guediz
http://guedesoft.net - http://www.postgresql.org.br

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: avoid heavyweight locking on hash metapage

2012-05-31 Thread Robert Haas
On Thu, May 31, 2012 at 7:07 AM, Dickson S. Guedes lis...@guedesoft.net wrote:
 2012/5/30 Robert Haas robertmh...@gmail.com:
 I tested the effect of this by setting up a series of 5-minute
 read-only pgbench runs at scale factor 300 with 8GB of shared buffers
 on the IBM POWER7 machine.

 I know it doesn't matter, but out of curiosity: what OS did you use?

Fedora 16, Linux 3.2.6-3.fc16.ppc64

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FailedAssertion(!(PrivateRefCount[i] == 0), File: bufmgr.c, Line: 1741

2012-05-31 Thread Robert Haas
On Thu, May 31, 2012 at 1:06 AM, Erik Rijkers e...@xs4all.nl wrote:
 In my test, I run the bash code (the bits that I posted earlier) in a while 
 loop so that the table
 is CREATEd, COPYied into, and DROPped every few seconds -- perhaps that 
 wasn't clear.  That loop
 is necessary; a few iterations are often successful.

Yes... I let it run all night on my laptop with no errors.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)

2012-05-31 Thread Peter Geoghegan
On 31 May 2012 11:19, Simon Riggs si...@2ndquadrant.com wrote:
 I've looked at this more closely now and I can see that the call to
 XLogFlush() that is made from xact_redo_commit_internal() doesn't ever
 actually flush WAL, so whether we delay or not is completely
 irrelevant.

 So un-agreed. No change required to patch there.

So, does that clear up the question of it being acceptable to add a
delay to every existing XLogFlush() call site? I think so.

Aside from the outstanding question of what to rename
commit_delay/commit_siblings to, and how we might want to reframe
those settings in the docs, I think that's everything.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] heap_form_tuple crashing

2012-05-31 Thread Heikki Linnakangas

On 31.05.2012 13:42, Atri Sharma wrote:

I am trying to call heap_form_tuple to create a tuple from a datum.

My call to heap_form_tuple looks like:


val1=0;
tupledesc1=BlessTupleDesc(node->ss.ss_currentRelation->rd_att);
tuple=heap_form_tuple(tupledesc1,p1,val1);


p1 is a pointer to a Datum instance which is created from a char array.

When I run the code, the system crashes.

Please let me know what the problem might be and what I should do
to rectify it.


Hard to say without seeing the full code...

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] heap_form_tuple crashing

2012-05-31 Thread Atri Sharma
On Thu, May 31, 2012 at 5:02 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 31.05.2012 13:42, Atri Sharma wrote:

 I am trying to call heap_form_tuple to create a tuple from a datum.

 My call to heap_form_tuple looks like:


 val1=0;
 tupledesc1=BlessTupleDesc(node->ss.ss_currentRelation->rd_att);
 tuple=heap_form_tuple(tupledesc1,p1,val1);


 p1 is a pointer to a Datum instance which is created from a char array.

 When I run the code, the system crashes.

 Please let me know what the problem might be and what I should do
 to rectify it.


 Hard to say without seeing the full code...

 --
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

Hi Heikki,

Thanks for the reply.

Another thing I wanted to ask: would you recommend building
tuples from strings directly, or converting them to Datums first and
then building the tuples from the Datum instances?

Atri
-- 
Regards,

Atri
l'apprenant

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] heap_form_tuple crashing

2012-05-31 Thread Heikki Linnakangas

On 31.05.2012 14:42, Atri Sharma wrote:

Another thing I wanted to ask: would you recommend building
tuples from strings directly, or converting them to Datums first and
then building the tuples from the Datum instances?


It depends. If you have all the values in strings already, then 
BuildTupleFromCStrings() is probably the easiest. But if you have some 
attributes in Datum format already, then it's probably easier and faster 
to use heap_form_tuple().
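
For instance, a minimal sketch of the C-string route (a hypothetical
helper and column layout, not code from this thread):

#include "postgres.h"
#include "funcapi.h"

/*
 * Sketch: form a tuple from two C strings, letting each column's
 * datatype input function do the parsing.  Assumes tupdesc describes
 * an (int4, text) row type.
 */
static HeapTuple
tuple_from_strings(TupleDesc tupdesc, char *s1, char *s2)
{
    AttInMetadata *attinmeta = TupleDescGetAttInMetadata(tupdesc);
    char	   *values[2];

    values[0] = s1;				/* e.g. "42" */
    values[1] = s2;				/* e.g. "hello" */

    return BuildTupleFromCStrings(attinmeta, values);
}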


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] heap_form_tuple crashing

2012-05-31 Thread Atri Sharma
On Thu, May 31, 2012 at 5:14 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 31.05.2012 14:42, Atri Sharma wrote:

 Another thing I wanted to ask: would you recommend building
 tuples from strings directly, or converting them to Datums first and
 then building the tuples from the Datum instances?


 It depends. If you have all the values in strings already, then
 BuildTupleFromCStrings() is probably the easiest. But if you have some
 attributes in Datum format already, then it's probably easier and faster to
 use heap_form_tuple().


 --
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

Performance-wise, which one would be better? (I am envisioning a large
set of strings to be converted to tuples.)


-- 
Regards,

Atri
l'apprenant

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FailedAssertion(!(PrivateRefCount[i] == 0), File: bufmgr.c, Line: 1741

2012-05-31 Thread Simon Riggs
On 31 May 2012 12:14, Robert Haas robertmh...@gmail.com wrote:
 On Thu, May 31, 2012 at 1:06 AM, Erik Rijkers e...@xs4all.nl wrote:
 In my test, I run the bash code (the bits that I posted earlier) in a while 
 loop so that the table
 is CREATEd, COPYied into, and DROPped every few seconds -- perhaps that 
 wasn't clear.  That loop
 is necessary; a few iterations are often successful.

 Yes... I let it run all night on my laptop with no errors.

It looked like the correct fix to me as well, FWIW.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FailedAssertion(!(PrivateRefCount[i] == 0), File: bufmgr.c, Line: 1741

2012-05-31 Thread Erik Rijkers
On Thu, May 31, 2012 13:14, Robert Haas wrote:
 On Thu, May 31, 2012 at 1:06 AM, Erik Rijkers e...@xs4all.nl wrote:
 In my test, I run the bash code (the bits that I posted earlier) in a while 
 loop so that the
 table
 is CREATEd, COPYied into, and DROPped every few seconds -- perhaps that 
 wasn't clear.  That loop
 is necessary; a few iterations are often successful.

 Yes... I let it run all night on my laptop with no errors.

My apologies to both of you.  It seems the problem was indeed solved with that
commit from Robert.  I managed to forget that I, uh... temporarily, commented
out the git-pull from my build script...

Thanks,

Erik Rijkers



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] slow dropping of tables, DropRelFileNodeBuffers, tas

2012-05-31 Thread Simon Riggs
On 30 May 2012 12:10, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:

 Hmm, we do this in smgrDoPendingDeletes:

 for (i = 0; i <= MAX_FORKNUM; i++)
 {
        smgrdounlink(srel, i, false);
 }

 So we drop the buffers for each relation fork separately, which means that
 we scan the buffer pool four times. Relation forks in 8.4 introduced that
 issue, and 9.1 made it worse by adding another fork for unlogged tables.
 With some refactoring, we could scan the buffer pool just once. That would
 help a lot.

That struck me as a safe and easy optimisation. This was a problem I'd
been trying to optimise for 9.2, so I've written a patch that appears
simple and clean enough to be applied directly.
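
For illustration, the single-pass version might look roughly like this
(a sketch following the existing DropRelFileNodeBuffers() idiom,
matching on the relfilenode alone so that all forks are caught in one
scan; this is not the attached patch):

/* Invalidate buffers of every fork of rnode in a single pass */
for (i = 0; i < NBuffers; i++)
{
    volatile BufferDesc *bufHdr = &BufferDescriptors[i];

    LockBufHdr(bufHdr);
    if (RelFileNodeEquals(bufHdr->tag.rnode, rnode))
        InvalidateBuffer(bufHdr);	/* releases the spinlock */
    else
        UnlockBufHdr(bufHdr);
}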

 Also, I wonder if DropRelFileNodeBuffers() could scan the pool without
 grabbing the spinlocks on every buffer? It could do an unlocked test first,
 and only grab the spinlock on buffers that need to be dropped.

Sounds less good and we'd need reasonable proof it actually did
anything useful without being dangerous.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


dropallforks.v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FailedAssertion(!(PrivateRefCount[i] == 0), File: bufmgr.c, Line: 1741

2012-05-31 Thread Robert Haas
On Thu, May 31, 2012 at 7:49 AM, Erik Rijkers e...@xs4all.nl wrote:
 On Thu, May 31, 2012 13:14, Robert Haas wrote:
 On Thu, May 31, 2012 at 1:06 AM, Erik Rijkers e...@xs4all.nl wrote:
 In my test, I run the bash code (the bits that I posted earlier) in a while 
 loop so that the
 table
 is CREATEd, COPYied into, and DROPped every few seconds -- perhaps that 
 wasn't clear.  That loop
 is necessary; a few iterations are often successful.

 Yes... I let it run all night on my laptop with no errors.

 My apologies to both of you.  It seems the problem was indeed solved with
 that commit from Robert.  I managed to forget that I, uh... temporarily,
 commented out the git-pull from my build script...

No problem.

The one thing that still seems a little odd to me is that this caused
a pin count to get orphaned.  It seems reasonable that ignoring the
AccessExclusiveLock could result in not-found errors trying to open a
missing relation, and even fsync requests on a missing relation.  But
I don't see why that would cause the backend-local pin counts to get
messed up, which makes me wonder if there really is another bug here
somewhere.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)

2012-05-31 Thread Robert Haas
On Thu, May 31, 2012 at 6:19 AM, Simon Riggs si...@2ndquadrant.com wrote:
 I've looked at this more closely now and I can see that the call to
 XLogFlush() that is made from xact_redo_commit_internal() doesn't ever
 actually flush WAL, so whether we delay or not is completely
 irrelevant.

 So un-agreed. No change required to patch there.

I think Peter's suggestion of forcibly setting the delay to 0 in the
startup process is a good one, though.  It's one line of code, and if
it isn't strictly necessary today, it still seems like good
future-proofing.

I am not very happy about the idea of renaming commit_* to
group_commit_*.  It's basically a cosmetic renaming, and breaking
existing configuration files for cosmetic purposes does not seem
warranted to me, especially when the old and new names are so close.
I certainly don't think we can do that in 9.2, now that beta1 has
already shipped.  Modifying the default contents of postgresql.conf
after we've shipped beta has been a historical no-no for reasons that
escape me at the moment, but IIRC they're not stupid reasons.

Frankly, I think this whole thing should be pushed to 9.3.  The
commit_delay and commit_siblings knobs suck, but they've sucked for a
long time, and it won't kill anybody to wait another release cycle to
fix them.  We have plenty of more important things queued up for 9.3
already, and I don't believe there's any compelling reason to think
that this particular thing needs preferential treatment.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Issues with MinGW W64

2012-05-31 Thread Johann 'Myrkraverk' Oskarsson
Tom Lane t...@sss.pgh.pa.us writes:

 Robert Haas robertmh...@gmail.com writes:
 On Tue, May 29, 2012 at 9:04 AM, Johann 'Myrkraverk' Oskarsson
 joh...@2ndquadrant.com wrote:
 The header file crtdefs.h in MinGW typedefs errcode which conflicts
 with Postgres' elog.h.

 Eep.  Maybe this is not directly relevant, but I guess my first
 question is: why is MinGW doing that?

 I concur with Robert here: your first step should be to push back on
 the MinGW developers about this nonstandard intrusion on application
 namespace.  We've been using errcode() as a function name since 2003,
 and it's never been a problem before on any platform, including
 previous versions of MinGW.

I have contacted the MinGW W64 team on this.

 If they won't change it, then we could consider some other hack, but
 that should really be the first attempt.

 MinGW W64's sys/stat.h #defines stat to be _stati64 and there is
 subsequently a compilation error in port.h:
 
 error: conflicting types for 'pgwin32_safestat'
 note: expected 'struct _stati64 *' but argument is of type 'struct stat *'

 In this case, I really think we ought to change all backend calls
 that hit stat() to use something like pgstat() instead.

 I disagree with this conclusion.  That'd be an unnecessarily
 nonstandard notation, which all existing and future developers would
 have to learn.  I'd rather work around this in port.h if at all
 possible.  I'm not quite sure why the existing code fails, though ---
 is there a conflict between #define stat and #define stat(a,b)?

I wouldn't know; the compiler here is GCC 4.6.3 (any 4.5+ will do, I
think), so all the usual GCC macro magic should be working.

Is this something to discuss with the MinGW W64 team?

 There are series of redefined macros from the MinGW W64 CRT.
 ...
 And possibly some more.  Do we need these redefines?

 We probably need them somewhere, or they wouldn't have been added.
 But maybe we don't need them on the exact platform you're using.

 Can we deal with this by just wrapping each #define in #ifndef?

I'll take a look and make sure the #defines end up with the same
values.  If so I'll attach a patch for this.
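
The guard in question is just the usual idiom, e.g. (illustrative macro
only, not necessarily one of the conflicting CRT names):

/* Define only if the platform's CRT has not already done so */
#ifndef S_IRGRP
#define S_IRGRP 0
#endif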


-- 
   Johann Oskarsson   http://www.2ndquadrant.com/
   PostgreSQL Development, 24x7 Support, Training and Services
   Blog: http://my.opera.com/myrkraverk/blog/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)

2012-05-31 Thread Simon Riggs
On 31 May 2012 13:16, Robert Haas robertmh...@gmail.com wrote:
 On Thu, May 31, 2012 at 6:19 AM, Simon Riggs si...@2ndquadrant.com wrote:
 I've looked at this more closely now and I can see that the call to
 XLogFlush() that is made from xact_redo_commit_internal() doesn't ever
 actually flush WAL, so whether we delay or not is completely
 irrelevant.

 So un-agreed. No change required to patch there.

 I think Peter's suggestion of forcibly setting the delay to 0 in the
 startup process is a good one, though.  It's one line of code, and if
 it isn't strictly necessary today, it still seems like good
 future-proofing.

Adding a line that does nothing is not a good idea. The Startup
process flushes very, very few WAL messages, so the setting is
irrelevant.

 I am not very happy about the idea of renaming commit_* to
 group_commit_*.  It's basically a cosmetic renaming, and breaking
 existing configuration files for cosmetic purposes does not seem
 warranted to me, especially when the old and new names are so close.
 I certainly don't think we can do that in 9.2, now that beta1 has
 already shipped.  Modifying the default contents of postgresql.conf
 after we've shipped beta has been a historical no-no for reasons that
 escape me at the moment, but IIRC they're not stupid reasons.

 Frankly, I think this whole thing should be pushed to 9.3.  The
 commit_delay and commit_siblings knobs suck, but they've sucked for a
 long time, and it won't kill anybody to wait another release cycle to
 fix them.  We have plenty of more important things queued up for 9.3
 already, and I don't believe there's any compelling reason to think
 that this particular thing needs preferential treatment.

No problem with pushing a variable rename through to 9.3. To be
honest, I don't care whether we rename them or not.

What matters is that we have a patch that provides a massive
performance gain in write performance in just a few lines of code, and
that should be committed to 9.2.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)

2012-05-31 Thread Peter Geoghegan
On 31 May 2012 13:16, Robert Haas robertmh...@gmail.com wrote:
 On Thu, May 31, 2012 at 6:19 AM, Simon Riggs si...@2ndquadrant.com wrote:
 Frankly, I think this whole thing should be pushed to 9.3.  The
 commit_delay and commit_siblings knobs suck, but they've sucked for a
 long time, and it won't kill anybody to wait another release cycle to
 fix them.  We have plenty of more important things queued up for 9.3
 already, and I don't believe there's any compelling reason to think
 that this particular thing needs preferential treatment.

Why do you think that? Those knobs are now quite ineffective, though
we never even considered that when the group commit delay patch was
committed.  The entire body of research and commentary that exists on
commit_delay has been invalidated for 9.2. If that isn't something
that needs to be addressed before release, I don't know what is. The
fact that the patch can sometimes double transaction throughput for an
absolutely trivial change, moving 2 lines of code, is also a good
reason to not bump this for another year.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] heap_form_tuple crashing

2012-05-31 Thread Tom Lane
Atri Sharma atri.j...@gmail.com writes:
 My call to heap_form_tuple looks like:
 val1=0;
 tupledesc1=BlessTupleDesc(node->ss.ss_currentRelation->rd_att);
 tuple=heap_form_tuple(tupledesc1,p1,val1);

 p1 is a pointer to a Datum instance which is created from a char array.

Does that actually match the tupdesc you're using?  Are you sure you
created the Datum correctly (ie, did you call the appropriate datatype
input routine)?

BTW, the BlessTupleDesc call here seems to be pure cargo-cult
programming.  It should not be necessary to bless a relation's tupdesc
(because that should be a named type already); and even if it were,
heap_form_tuple doesn't care.
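
For comparison, a working call usually looks something like this (a
sketch assuming a hypothetical two-column (int4, text) tupdesc, not the
poster's code; note that each Datum comes from a proper conversion
routine and that isnull has one entry per attribute):

Datum		values[2];
bool		isnull[2] = {false, false};
HeapTuple	tuple;

values[0] = Int32GetDatum(42);
values[1] = CStringGetTextDatum("hello");	/* not a bare char * */

tuple = heap_form_tuple(tupdesc, values, isnull);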

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-05-31 Thread Robert Klemme
On Thu, May 31, 2012 at 10:45 AM, Tatsuo Ishii is...@postgresql.org wrote:
 Just for the record, I reran the test with my single-LOCK patch, and
 the total runtime of pg_dump is now 113 minutes:
 188 minutes (9.0) -> 125 minutes (git master) -> 113 minutes (with my patch).

 So far, I'm glad to see 40% time savings at this point.

I see only 9.6% savings (100 * (113/125 - 1)).  What am I missing?

Cheers

robert

-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FailedAssertion(!(PrivateRefCount[i] == 0), File: bufmgr.c, Line: 1741

2012-05-31 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 The one thing that still seems a little odd to me is that this caused
 a pin count to get orphaned.  It seems reasonable that ignoring the
 AccessExclusiveLock could result in not-found errors trying to open a
 missing relation, and even fsync requests on a missing relation.  But
 I don't see why that would cause the backend-local pin counts to get
 messed up, which makes me wonder if there really is another bug here
 somewhere.

According to Heikki's log, the Assert was in the startup process itself,
and it happened after an error:

 2012-05-26 10:44:28.587 CEST 10270 FATAL:  could not open file base/21268/32994: No such file or directory
 2012-05-26 10:44:28.588 CEST 10270 CONTEXT:  writing block 2508 of relation base/21268/32994
 	xlog redo multi-insert (init): rel 1663/21268/33006; blk 3117; 58 tuples
 TRAP: FailedAssertion(!(PrivateRefCount[i] == 0), File: bufmgr.c, Line: 1741)
 2012-05-26 10:44:31.131 CEST 10269 LOG:  startup process (PID 10270) was terminated by signal 6: Aborted

I don't think that code is meant to recover from errors anyway, so
the fact that it fails with a pin count held isn't exactly surprising.
But it might be worth looking at exactly which on_proc_exit callbacks
are installed in the startup process and what assumptions they make.

As for where the error came from in the first place, it's easy to
imagine somebody who's not got the word about the AccessExclusiveLock
reading pages of the table into buffers that have already been scanned
by the DROP.  So you'd end up with orphaned buffers belonging to a
vanished table.  If somebody managed to dirty them by setting hint bits
(we do allow that in HS mode no?) then later you'd have various processes
trying to write the buffer before recycling it, which seems to fit the
reported error.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)

2012-05-31 Thread Robert Haas
On Thu, May 31, 2012 at 8:38 AM, Peter Geoghegan pe...@2ndquadrant.com wrote:
 On 31 May 2012 13:16, Robert Haas robertmh...@gmail.com wrote:
 On Thu, May 31, 2012 at 6:19 AM, Simon Riggs si...@2ndquadrant.com wrote:
 Frankly, I think this whole thing should be pushed to 9.3.  The
 commit_delay and commit_siblings knobs suck, but they've sucked for a
 long time, and it won't kill anybody to wait another release cycle to
 fix them.  We have plenty of more important things queued up for 9.3
 already, and I don't believe there's any compelling reason to think
 that this particular thing needs preferential treatment.

 Why do you think that? Those knobs are now quite ineffective, though
 we never even considered that when the group commit delay patch was
 committed.  The entire body of research and commentary that exists on
 commit_delay has been invalidated for 9.2. If that isn't something
 that needs to be addressed before release, I don't know what is. The
 fact that the patch can sometimes double transaction throughput for an
 absolutely trivial change, moving 2 lines of code, is also a good
 reason to not bump this for another year.

Fixing regressions before release is essential; improving performance
is not - especially when the improvement relates to a little-used
feature that you were proposing to get rid of two weeks ago.  It can't
simultaneously be so unimportant that we should remove it altogether
and so important that it's must-fix-before-release, and if one test
can completely overturn your view of which category this falls into,
that seems like a reason for taking some more time to think it over
and, perhaps, run more tests.  We don't have a lot of latitude to
maneuver at this point - anything we do now is going to go straight
out into the wild.  Caution is appropriate.

However, rather than arguing about it, let's see if anyone else has an opinion.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)

2012-05-31 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 On 31 May 2012 13:16, Robert Haas robertmh...@gmail.com wrote:
 Frankly, I think this whole thing should be pushed to 9.3.

 What matters is that we have a patch that provides a massive
 performance gain in write performance in just a few lines of code, and
 that should be committed to 9.2.

I agree with Robert on this.  This patch hasn't had *nearly* enough
testing to justify cramming it into 9.2 at this point.  AFAIK the
claim of massive performance gain is based on a single test case run
by a single person, which doesn't even give me any confidence that it
doesn't break anything, much less that it's a win across the board.

If we want to finish the beta cycle in a reasonable time period and get
back to actual development, we have to refrain from adding more
possibly-destabilizing development work to 9.2.  And that is what
this is.

Add it to the upcoming CF, please.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-05-31 Thread Tatsuo Ishii
 On Thu, May 31, 2012 at 10:45 AM, Tatsuo Ishii is...@postgresql.org wrote:
 Just for the record, I reran the test with my single-LOCK patch, and
 the total runtime of pg_dump is now 113 minutes:
 188 minutes (9.0) -> 125 minutes (git master) -> 113 minutes (with my patch).

 So far, I'm glad to see 40% time savings at this point.
 
 I see only 9.6% savings (100 * (113/125 - 1)).  What am I missing?

What I meant was (100 * (113/188 - 1)), i.e. the saving relative to 9.0.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FailedAssertion(!(PrivateRefCount[i] == 0), File: bufmgr.c, Line: 1741

2012-05-31 Thread Robert Haas
On Thu, May 31, 2012 at 9:51 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 The one thing that still seems a little odd to me is that this caused
 a pin count to get orphaned.  It seems reasonable that ignoring the
 AccessExclusiveLock could result in not-found errors trying to open a
 missing relation, and even fsync requests on a missing relation.  But
 I don't see why that would cause the backend-local pin counts to get
 messed up, which makes me wonder if there really is another bug here
 somewhere.

 According to Heikki's log, the Assert was in the startup process itself,
 and it happened after an error:

 2012-05-26 10:44:28.587 CEST 10270 FATAL:  could not open file base/21268/32994: No such file or directory
 2012-05-26 10:44:28.588 CEST 10270 CONTEXT:  writing block 2508 of relation base/21268/32994
 	xlog redo multi-insert (init): rel 1663/21268/33006; blk 3117; 58 tuples
 TRAP: FailedAssertion(!(PrivateRefCount[i] == 0), File: bufmgr.c, Line: 1741)
 2012-05-26 10:44:31.131 CEST 10269 LOG:  startup process (PID 10270) was terminated by signal 6: Aborted

 I don't think that code is meant to recover from errors anyway, so
 the fact that it fails with a pin count held isn't exactly surprising.
 But it might be worth looking at exactly which on_proc_exit callbacks
 are installed in the startup process and what assumptions they make.

Which code isn't meant to recover from errors?

 As for where the error came from in the first place, it's easy to
 imagine somebody who's not got the word about the AccessExclusiveLock
 reading pages of the table into buffers that have already been scanned
 by the DROP.  So you'd end up with orphaned buffers belonging to a
 vanished table.  If somebody managed to dirty them by setting hint bits
 (we do allow that in HS mode no?) then later you'd have various processes
 trying to write the buffer before recycling it, which seems to fit the
 reported error.

Right, I understand the other errors.  It's just the pin count that I
am a bit confused about.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-05-31 Thread Robert Klemme
On Thu, May 31, 2012 at 4:07 PM, Tatsuo Ishii is...@postgresql.org wrote:
 On Thu, May 31, 2012 at 10:45 AM, Tatsuo Ishii is...@postgresql.org wrote:
 Just for the record, I reran the test with my single-LOCK patch, and
 the total runtime of pg_dump is now 113 minutes:
 188 minutes (9.0) -> 125 minutes (git master) -> 113 minutes (with my patch).

 So far, I'm glad to see 40% time savings at this point.

 I see only 9.6% savings (100 * (113/125 - 1)).  What am I missing?

 What I meant was (100 * (113/188 - 1)).

OK, my fault was to assume you wanted to measure only your part, while
apparently you meant overall savings.  But Tom had asked for separate
measurements, if I understood him correctly.  Also, that measurement of
your change would go after the O(N^2) fix.  It could actually turn out
to be much more than 9% because the overall time would be reduced even
more dramatically.  So it might actually be good for your fix to wait a
bit. ;-)

Kind regards

robert

-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

2012-05-31 Thread Robert Haas
On Thu, May 31, 2012 at 7:31 AM, Sergey Koposov kopo...@ast.cam.ac.uk wrote:
 On Wed, 30 May 2012, Robert Haas wrote:

 I'd really like to find out exactly where all those s_lock calls are
 coming from.  Is there any way you can get oprofile to output a
 partial stack backtrace?  If you have perf it's very easy, just 'perf
 record -g -a <command to launch your test case>' and then 'perf report
 -g'.


 I repeated my test with 8 threads (without tasksetting) and with
 shared_buffers=48g (because that seemed to trigger particularly long
 times, ~80 seconds), and I attach the perf report.

Thanks.  How did you generate this perf report?  It's cool, because I
haven't figured out how to make perf generate a report that is easily
email-able, and it seems you have.

The only trouble is that there's no call stack information here for
s_lock or PinBuffer, which is what I really want.  It seems to have
spit out call stack information only for the kernel functions, and not
for user functions.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-05-31 Thread Claudio Freire
On Thu, May 31, 2012 at 11:17 AM, Robert Klemme
shortcut...@googlemail.com wrote:

 OK, my fault was to assume you wanted to measure only your part, while
 apparently you meant overall savings.  But Tom had asked for separate
 measurements if I understood him correctly.  Also, that measurement of
 your change would go after the O(N^2) fix.  It could actually turn out
 to be much more than 9% because the overall time would be reduced even
 more dramatic.  So it might actually be good for your fix to wait a
 bit. ;-)

It's not clear whether Tom is already working on that O(N^2) fix in locking.

I'm asking because it doesn't seem like a complicated patch;
contributors may want to get working on it if not ;-)

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-05-31 Thread Tom Lane
Claudio Freire klaussfre...@gmail.com writes:
 It's not clear whether Tom is already working on that O(N^2) fix in locking.

I'm not; Jeff Janes is.  But you shouldn't be holding your breath
anyway, since it's 9.3 material at this point.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Issues with MinGW W64

2012-05-31 Thread Robert Haas
On Thu, May 31, 2012 at 8:15 AM, Johann 'Myrkraverk' Oskarsson
joh...@2ndquadrant.com wrote:
 MinGW W64's sys/stat.h #defines stat to be _stati64 and there is
 subsequently a compilation error in port.h:

 error: conflicting types for 'pgwin32_safestat'
 note: expected 'struct _stati64 *' but argument is of type 'struct stat *'

 In this case, I really think we ought to change all backend calls
 that hit stat() to use something like pgstat() instead.

 I disagree with this conclusion.  That'd be an unnecessarily
 nonstandard notation, which all existing and future developers would
 have to learn.  I'd rather work around this in port.h if at all
 possible.  I'm not quite sure why the existing code fails, though ---
 is there a conflict between #define stat and #define stat(a,b)?

 I wouldn't know; the compiler here is GCC 4.6.3 (any 4.5+ will do, I
 think), so all the usual GCC macro magic should be working.

 Is this something to discuss with the MinGW W64 team?

My viewpoint on this (which is different than Tom's) is that we're
probably not entitled to assume anything about what the system header
files do with respect to stat.  On some systems, they might just have
a function prototype, while others might define stat or stat() as a
macro.   It seems to me that our source code is hoping for a function
definition rather than a macro definition and falling over when that's
not how it is.  I don't see that as very reasonable, unless we have
some basis for believing that the OS isn't entitled to define stat as
a macro rather than a function, and maybe not even then.  We have
plenty of other places where we use our own wrapper function in lieu
of OS facilities for various reasons (e.g. BasicOpenFile) and I don't
think adding one more is a big deal.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-05-31 Thread Robert Haas
On Thu, May 31, 2012 at 10:31 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Claudio Freire klaussfre...@gmail.com writes:
 It's not clear whether Tom is already working on that O(N^2) fix in locking.

 I'm not; Jeff Janes is.  But you shouldn't be holding your breath
 anyway, since it's 9.3 material at this point.

I agree we can't back-patch that change, but then I think we ought to
consider back-patching some variant of Tatsuo's patch.  Maybe it's not
reasonable to thunk an arbitrary number of relation names in there on
one line, but how about 1000 relations per LOCK statement or so?  I
guess we'd need to see how much that erodes the benefit, but we've
certainly done back-branch rearrangements in pg_dump in the past to
fix various kinds of issues, and this is pretty non-invasive.
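
A sketch of what that batching might look like in pg_dump's getTables()
(hypothetical code following the existing PQExpBuffer idiom; not
Tatsuo's actual patch):

/* Lock tables in batches of 1000 per LOCK TABLE statement */
for (i = 0; i < ntables; i += 1000)
{
    int			j;

    resetPQExpBuffer(query);
    appendPQExpBufferStr(query, "LOCK TABLE ");
    for (j = i; j < ntables && j < i + 1000; j++)
    {
        if (j > i)
            appendPQExpBufferStr(query, ", ");
        appendPQExpBufferStr(query,
                             fmtQualifiedId(fout->remoteVersion,
                                            tblinfo[j].dobj.namespace->dobj.name,
                                            tblinfo[j].dobj.name));
    }
    appendPQExpBufferStr(query, " IN ACCESS SHARE MODE");
    ExecuteSqlStatement(fout, query->data);
}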

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Interface of Row Level Security

2012-05-31 Thread Robert Haas
On Wed, May 30, 2012 at 3:26 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
 My preference is the RLSBYPASS permission rather than the approach
 with functions that return the policy clause at run-time, because the
 latter needs to invalidate prepared statements at unpredictable times.
 With the function approach, the RLS policy is generated at the planner
 stage, and we cannot make any assumptions about the criteria behind the
 RLS policy. A function might generate the RLS policy according to the
 current user id. Yes, that case is straightforward: the prepared statement
 should be invalidated whenever the current user-id is switched.
 However, someone may define a function that generates the RLS policy
 depending on the value of client_min_messages, for example.
 Do we need to invalidate prepared statements whenever a GUC gets
 updated? I think that is overkill. We cannot predict all the criteria
 users may want to use to control the RLS policy from such functions.
 So the RLSBYPASS permission is a simpler way to limit the number of
 situations in which prepared statements must be invalidated.

That's a good point.

 If we had an ideal optimizer, I'd still like the optimizer to
 wipe out redundant clauses transparently, rather than RLSBYPASS
 permissions, because that only controls all-or-nothing behavior.
 For example, if tuples are categorized as unclassified, classified or
 secret, and the RLS policy is configured as:
  ((current_user IN ('alice', 'bob') AND X IN ('unclassified',
 'classified')) OR (X IN ('unclassified'))),
 superuser can see all the tuples, and alice and bob can see
 up to classified tuples.
 Is it really hard to wipe out the redundant condition at the planner stage?
 If current_user is obviously 'kaigai', it seems to me the left side of
 this clause can be wiped out at the planner stage.
 Do I consider the issue too simple?

Yes.  :-)

There are two problems.  First, if using the extended query protocol
(e.g. pgbench -M prepared) you can prepare a statement just once and
then execute it multiple times.  In this case, stable-functions cannot
be constant-folded at plan time, because they are only guaranteed to
remain constant for a *single* execution of the query, not for all
executions of the query.  So any optimization in this area would have
to be limited to cases where the simple query protocol is used.  I
think that might still be worth doing, but it's a significant
limitation, to be sure.  Second, at present, there is no guarantee
that the snapshot used for planning the query is the same as the
snapshot used for executing the query, though commit
d573e239f03506920938bf0be56c868d9c3416da made that happen in some
common cases.  If we were to do constant-folding of stable functions
using the planner snapshot, it would represent a behavior change from
previous releases.  I am not clear whether that has any real-world
consequences that we should be worried about.  It seems to me that the
path of least resistance might be to refactor the portal stuff so that
we can provide a uniform guarantee that, when using the simple query
protocol, the planner and executor snapshots will be the same ... but
I might be wrong.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-05-31 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Thu, May 31, 2012 at 10:31 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 I'm not; Jeff Janes is.  But you shouldn't be holding your breath
 anyway, since it's 9.3 material at this point.

 I agree we can't back-patch that change, but then I think we ought to
 consider back-patching some variant of Tatsuo's patch.  Maybe it's not
 reasonable to thunk an arbitrary number of relation names in there on
 one line, but how about 1000 relations per LOCK statement or so?  I
 guess we'd need to see how much that erodes the benefit, but we've
 certainly done back-branch rearrangements in pg_dump in the past to
 fix various kinds of issues, and this is pretty non-invasive.

I am not convinced either that this patch will still be useful after
Jeff's fix goes in, or that it provides any meaningful savings when
you consider a complete pg_dump run.  Yeah, it will make the lock
acquisition phase faster, but that's not a big part of the runtime
except in very limited scenarios (--schema-only, perhaps).

The performance patches we applied to pg_dump over the past couple weeks
were meant to relieve pain in situations where the big server-side
lossage wasn't the dominant factor in runtime (ie, partial dumps).
But this one is targeting exactly that area, which is why it looks like
a band-aid and not a fix to me.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)

2012-05-31 Thread Simon Riggs
On 31 May 2012 15:00, Tom Lane t...@sss.pgh.pa.us wrote:
 Simon Riggs si...@2ndquadrant.com writes:
 On 31 May 2012 13:16, Robert Haas robertmh...@gmail.com wrote:
 Frankly, I think this whole thing should be pushed to 9.3.

 What matters is that we have a patch that provides a massive gain
 in write performance in just a few lines of code, and
 that should be committed to 9.2.

 I agree with Robert on this.  This patch hasn't had *nearly* enough
 testing to justify cramming it into 9.2 at this point.  AFAIK the
 claim of massive performance gain is based on a single test case run
 by a single person, which doesn't even give me any confidence that it
 doesn't break anything, much less that it's a win across the board.

I agree with you. You would be mistaken if you thought that I think
Peter's laptop was sufficient proof for anyone to commit something and
I've already said exactly that to him.

My description of massive performance gain is appropriate based on
the measurements so far.

 If we want to finish the beta cycle in a reasonable time period and get
 back to actual development, we have to refrain from adding more
 possibly-destabilizing development work to 9.2.  And that is what
 this is.

In what way is it possibly destabilising? I see nothing in the patch
to merit that claim, so presumably you haven't read the patch yet?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-05-31 Thread Bruce Momjian
On Thu, May 31, 2012 at 10:50:51AM -0400, Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
  On Thu, May 31, 2012 at 10:31 AM, Tom Lane t...@sss.pgh.pa.us wrote:
  I'm not; Jeff Janes is.  But you shouldn't be holding your breath
  anyway, since it's 9.3 material at this point.
 
  I agree we can't back-patch that change, but then I think we ought to
  consider back-patching some variant of Tatsuo's patch.  Maybe it's not
  reasonable to thunk an arbitrary number of relation names in there on
  one line, but how about 1000 relations per LOCK statement or so?  I
  guess we'd need to see how much that erodes the benefit, but we've
  certainly done back-branch rearrangements in pg_dump in the past to
  fix various kinds of issues, and this is pretty non-invasive.
 
 I am not convinced either that this patch will still be useful after
 Jeff's fix goes in, or that it provides any meaningful savings when
 you consider a complete pg_dump run.  Yeah, it will make the lock
 acquisition phase faster, but that's not a big part of the runtime
 except in very limited scenarios (--schema-only, perhaps).

FYI, that is the pg_upgrade use-case, and pg_dump/restore time is
reportedly taking the majority of time in many cases.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-05-31 Thread Claudio Freire
On Thu, May 31, 2012 at 11:50 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 The performance patches we applied to pg_dump over the past couple weeks
 were meant to relieve pain in situations where the big server-side
 lossage wasn't the dominant factor in runtime (ie, partial dumps).
 But this one is targeting exactly that area, which is why it looks like
 a band-aid and not a fix to me.

No, Tatsuo's patch attacks a phase dominated by latency in some
setups. That it's also becoming slow because of the locking
cost is irrelevant; with locking sped up, the patch should only
improve that phase even further. Imagine the current timeline:

* = locking
. = waiting

*.*.**.**.***.***...*.

Tatsuo's patch converts it to:

*.**

The locking fix would turn the timeline into:

*.*.*.*.*.*.*

Tatsuo's patch would turn that into:

***

And, as noted before, pg_dump --schema-only is a key bottleneck in pg_upgrade.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-05-31 Thread Robert Haas
On Thu, May 31, 2012 at 10:50 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Thu, May 31, 2012 at 10:31 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 I'm not; Jeff Janes is.  But you shouldn't be holding your breath
 anyway, since it's 9.3 material at this point.

 I agree we can't back-patch that change, but then I think we ought to
 consider back-patching some variant of Tatsuo's patch.  Maybe it's not
 reasonable to thunk an arbitrary number of relation names in there on
 one line, but how about 1000 relations per LOCK statement or so?  I
 guess we'd need to see how much that erodes the benefit, but we've
 certainly done back-branch rearrangements in pg_dump in the past to
 fix various kinds of issues, and this is pretty non-invasive.

 I am not convinced either that this patch will still be useful after
 Jeff's fix goes in, ...

But people on older branches are not going to GET Jeff's fix.

 or that it provides any meaningful savings when
 you consider a complete pg_dump run.  Yeah, it will make the lock
 acquisition phase faster, but that's not a big part of the runtime
 except in very limited scenarios (--schema-only, perhaps).

That is not a borderline scenario, as others have also pointed out.
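
For concreteness, a minimal sketch of the kind of batching being
discussed (hypothetical; this is not Tatsuo's actual patch, and the
function and variable names are made up):

#include "libpq-fe.h"
#include "pqexpbuffer.h"

#define LOCKS_PER_STATEMENT 1000

static void
lock_tables_in_batches(PGconn *conn, char **qualified_names, int nrels)
{
    PQExpBuffer q = createPQExpBuffer();
    int         i,
                j;

    for (i = 0; i < nrels; i += LOCKS_PER_STATEMENT)
    {
        int         end = (i + LOCKS_PER_STATEMENT < nrels) ?
                          i + LOCKS_PER_STATEMENT : nrels;
        PGresult   *res;

        /* Build one LOCK statement covering up to 1000 relations. */
        resetPQExpBuffer(q);
        appendPQExpBufferStr(q, "LOCK TABLE ");
        for (j = i; j < end; j++)
            appendPQExpBuffer(q, "%s%s", (j > i) ? ", " : "",
                              qualified_names[j]);
        appendPQExpBufferStr(q, " IN ACCESS SHARE MODE");

        res = PQexec(conn, q->data);
        /* real code would check PQresultStatus(res) here */
        PQclear(res);
    }
    destroyPQExpBuffer(q);
}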

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-05-31 Thread Bruce Momjian
On Thu, May 31, 2012 at 11:04:12AM -0400, Robert Haas wrote:
 On Thu, May 31, 2012 at 10:50 AM, Tom Lane t...@sss.pgh.pa.us wrote:
  Robert Haas robertmh...@gmail.com writes:
  On Thu, May 31, 2012 at 10:31 AM, Tom Lane t...@sss.pgh.pa.us wrote:
  I'm not; Jeff Janes is.  But you shouldn't be holding your breath
  anyway, since it's 9.3 material at this point.
 
  I agree we can't back-patch that change, but then I think we ought to
  consider back-patching some variant of Tatsuo's patch.  Maybe it's not
  reasonable to thunk an arbitrary number of relation names in there on
  one line, but how about 1000 relations per LOCK statement or so?  I
  guess we'd need to see how much that erodes the benefit, but we've
  certainly done back-branch rearrangements in pg_dump in the past to
  fix various kinds of issues, and this is pretty non-invasive.
 
  I am not convinced either that this patch will still be useful after
  Jeff's fix goes in, ...
 
 But people on older branches are not going to GET Jeff's fix.

FYI, if it got into Postgres 9.2, everyone upgrading to Postgres 9.2
would benefit because pg_upgrade uses the new cluster's pg_dumpall.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)

2012-05-31 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 On 31 May 2012 15:00, Tom Lane t...@sss.pgh.pa.us wrote:
 If we want to finish the beta cycle in a reasonable time period and get
 back to actual development, we have to refrain from adding more
 possibly-destabilizing development work to 9.2.  And that is what
 this is.

 In what way is it possibly destabilising?

I'm prepared to believe that it only affects performance, but it could
be destabilizing to that.  It needs proper review and testing, and the
next CF is the right environment for that to happen.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)

2012-05-31 Thread Peter Geoghegan
On 31 May 2012 14:58, Robert Haas robertmh...@gmail.com wrote:
 Fixing regressions before release is essential; improving performance
 is not - especially when the improvement relates to a little-used
 feature that you were proposing to get rid of two weeks ago.

Yes, the fact that I wanted to get rid of commit_delay is well
established - I called for its deprecation in a dedicated thread, and
during my talk at pgCon. Bruce's confusion as to how that interacted
with what I've been calling new group commit was actually what
crystallised my position here: it is trying, for the most part, to do
the same thing as new group commit, but in an entirely orthogonal way.
Bruce's confusion actually reflected the confusion of the code. So I'm,
in a sense, removing the overlap between what commit_delay used to do
but shouldn't try to do anymore (make commits coincide, giving
good benchmark results) and what new group commit now does, while
preserving commit_delay's ability to trade off latency for throughput.

I didn't have an answer to the question of how we might continue to
offer a throughput/latency trade-off to users before, but knew that
with 9.2, commit_delay was totally ineffective anyway. The realisation
that it could be made effective by working with rather than against
new group commit changed my mind.

 It can't simultaneously be so unimportant that we should remove it altogether
 and so important that it's must-fix-before-release, and if one test
 can completely overturn your view of which category this falls into,
 that seems like a reason for taking some more time to think it over
 and, perhaps, run more tests.  We don't have a lot of latitude to
 maneuver at this point - anything we do now is going to go straight
 out into the wild.  Caution is appropriate.

The patch can be justified as a way of removing the tension between
new group commit and commit_delay. Removing commit_delay would also do
this, but then there'd be no way to make the aforementioned trade-off
that we previously offered. I suspect that if it restored the peaks
and valleys of commit_delay's changes to throughput in 9.1, over and
above a new group commit baseline, this would be more readily
accepted. I hope the patch isn't being punished for being effective.
Yes, it does offer a large boost to performance, but that happens to
be incidental, unlikely though that sounds.

You've called this a clever idea. I actually don't agree. I was fairly
surprised that no one noticed this earlier. It is rather obviously the
case that a delay that hopes to maximise the batching of commits at
the expense of latency should occur only in a single leader backend
that will proceed with the flush for the batch, and not within each
and every backend as it commits.
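
Roughly, the shape of the change is this (a sketch of my reading of the
idea, not the actual diff): the sleep happens once, in the backend that is
about to flush WAL for the whole group, and only if enough other backends
are active.

    /* In the WAL-flush path, executed only by the group commit leader */
    if (CommitDelay > 0 && enableFsync &&
        MinimumActiveBackends(CommitSiblings))
        pg_usleep(CommitDelay);
    /* ... leader then flushes WAL on behalf of all queued followers ... */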

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-05-31 Thread Tom Lane
Claudio Freire klaussfre...@gmail.com writes:
 On Thu, May 31, 2012 at 11:50 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 The performance patches we applied to pg_dump over the past couple weeks
 were meant to relieve pain in situations where the big server-side
 lossage wasn't the dominant factor in runtime (ie, partial dumps).
 But this one is targeting exactly that area, which is why it looks like
 a band-aid and not a fix to me.

 No, Tatsuo's patch attacks a phase dominated by latency in some
 setups.

No, it does not.  The reason it's a win is that it avoids the O(N^2)
behavior in the server.  Whether the bandwidth savings is worth worrying
about cannot be proven one way or the other as long as that elephant
is in the room.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-05-31 Thread Claudio Freire
On Thu, May 31, 2012 at 12:25 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 No, Tatsuo's patch attacks a phase dominated by latency in some
 setups.

 No, it does not.  The reason it's a win is that it avoids the O(N^2)
 behavior in the server.  Whether the bandwidth savings is worth worrying
 about cannot be proven one way or the other as long as that elephant
 is in the room.

                        regards, tom lane

I understand that, but if the locking is fixed and made to be O(N)
(and hence each table locking O(1)), then latency suddenly becomes the
dominating factor.

I'm thinking, though, that pg_upgrade runs locally, unlike many pg_dump
backups, so in that case latency would be negligible and Tatsuo's
patch inconsequential.

I'm also wondering whether the ResourceOwner patch you've proposed
would get negated by Tatsuo's patch, because suddenly a portal
(IIRC) has a lot more locks than the ResourceOwner could accommodate,
forcing a reversal to O(N²) behavior. In that case, that patch would
in fact be detrimental... huh... way to go 180

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)

2012-05-31 Thread Peter Geoghegan
On 31 May 2012 16:26, Peter Geoghegan pe...@2ndquadrant.com wrote:
 On 31 May 2012 16:23, Tom Lane t...@sss.pgh.pa.us wrote:
 Simon Riggs si...@2ndquadrant.com writes:
 In what way is it possibly destabilising?

 I'm prepared to believe that it only affects performance, but it could
 be destabilizing to that.  It needs proper review and testing, and the
 next CF is the right environment for that to happen.

 It couldn't possibly be as destabilising to performance as
 commit_delay was in 9.1.

Furthermore, it couldn't possibly affect performance in any way unless
commit_delay is set. I've just moved the delay site so that its only
executed by the group commit leader. The leader would execute the code
anyway, but now the followers don't.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] hash index concurrency

2012-05-31 Thread David Fetter
On Wed, May 30, 2012 at 12:21:33AM -0400, Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
  On Tue, May 29, 2012 at 11:21 PM, Jeff Janes jeff.ja...@gmail.com wrote:
  2) Only support bitmap scans and not ordinary tid scans (the way gin
  indexes already do).
 
  -1 on losing amgettuple.  I regret that we lost that for GIN and I
  shall regret it more if we lose it anywhere else.
 
 Not sure that's all that big a deal for hash.  IIRC the only reasons to
 want it are for index-only scans (not possible anyway with hash) and
 exclusion constraints (for which you might as well use a btree, or plain
 index-supported uniqueness if hash had that).

It does, via EXCLUDE constraints, so it could do so directly with what,
as far as I've been able to tell, would be a relatively small amount of
coding.

dfetter@dfetter:5492=# CREATE TABLE foo(i TEXT, EXCLUDE USING HASH(i WITH =));
NOTICE:  CREATE TABLE / EXCLUDE will create implicit index foo_i_excl for 
table foo
CREATE TABLE
dfetter@dfetter:5492=# insert into foo VALUES (1),(1);
ERROR:  conflicting key value violates exclusion constraint foo_i_excl
DETAIL:  Key (i)=(1) conflicts with existing key (i)=(1).

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)

2012-05-31 Thread Peter Geoghegan
On 31 May 2012 16:23, Tom Lane t...@sss.pgh.pa.us wrote:
 Simon Riggs si...@2ndquadrant.com writes:
 In what way is it possibly destabilising?

 I'm prepared to believe that it only affects performance, but it could
 be destabilizing to that.  It needs proper review and testing, and the
 next CF is the right environment for that to happen.

It couldn't possibly be as destabilising to performance as
commit_delay was in 9.1.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Draft release notes complete

2012-05-31 Thread Bruce Momjian
On Tue, May 15, 2012 at 12:57:37PM -0400, Magnus Hagander wrote:
 On Fri, May 11, 2012 at 11:44 AM, Andrew Dunstan and...@dunslane.net wrote:
 
 
  On 05/11/2012 05:32 AM, Magnus Hagander wrote:
 
 
  But in the interest of actually being productive - what *is* the
  usecase for needing a 5 minute turnaround time? I don't buy the check
  what a patch looks like, because that should be done *before* the
  commit, not after - so it's best verified by a local docs build anyway
  (which will also be faster).
 
  I'm sure we can put something in with a pretty quick turnaround again
  without too much strain on the system, but it does, as I mentioned
  before, require decoupling it from the buildfarm which means it's not
  just tweaking a config file.
 
 
  If it's of any use to you I have made some adjustments to the buildfarm code
  which would let you do *just* the docs build (and dist make if you want). It
  would still pull from git, and only do anything if there's a (relevant)
  change. So using that to set up a machine that would run every few minutes
  might work. Of course, building the docs can itself be fairly compute
  intensive, so you still might not want to run every few minutes if that's a
  limiting factor.
 
 that would definitely be useful. Compute intensive is not really a
 problem, we can easily shape the box on that (and I think we already
 do).
 
 Do you have some details of what to do and how to do it to use that,
 so Stefan can set it up for us ? ;)

Where are we on building the development docs more frequently?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Draft release notes complete

2012-05-31 Thread Magnus Hagander
On Thu, May 31, 2012 at 5:55 PM, Bruce Momjian br...@momjian.us wrote:
 On Tue, May 15, 2012 at 12:57:37PM -0400, Magnus Hagander wrote:
 On Fri, May 11, 2012 at 11:44 AM, Andrew Dunstan and...@dunslane.net wrote:
 
 
  On 05/11/2012 05:32 AM, Magnus Hagander wrote:
 
 
  But in the interest of actually being productive - what *is* the
  usecase for needing a 5 minute turnaround time? I don't buy the check
  what a patch looks like, because that should be done *before* the
  commit, not after - so it's best verified by a local docs build anyway
  (which will also be faster).
 
  I'm sure we can put something in with a pretty quick turnaround again
  without too much strain on the system, but it does, as I mentioned
  before, require decoupling it from the buildfarm which means it's not
  just tweaking a config file.
 
 
  If it's of any use to you I have made some adjustments to the buildfarm 
  code
  which would let you do *just* the docs build (and dist make if you want). 
  It
  would still pull from git, and only do anything if there's a (relevant)
  change. So using that to set up a machine that would run every few minutes
  might work. Of course, building the docs can itself be fairly compute
  intensive, so you still might not want to run every few minutes if that's a
  limiting factor.

 that would definitely be useful. Compute intensive is not really a
 problem, we can easily shape the box on that (and I think we already
 do).

 Do you have some details of what to do and how to do it to use that,
 so Stefan can set it up for us ? ;)

 Where are we on building the development docs more frequently?

Still waiting for details on how it works to set that up on the
buildfarm client.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

2012-05-31 Thread Robert Haas
On Thu, May 31, 2012 at 11:23 AM, Sergey Koposov kopo...@ast.cam.ac.uk wrote:
 On Thu, 31 May 2012, Robert Haas wrote:

 Thanks.  How did you generate this perf report?  It's cool, because I
 haven't figured out how to make perf generate a report that is easily
 email-able, and it seems you have.

 I did pretty much what you have said, e.g.
 attached it to running process by
 perf record -g -p PID
 and then
 perf report -g > output

Ah, interesting.  I never tried sending the output to a file.

 And postgresql was compiled with cflags=-g

 The only trouble is that there's no call stack information here for
 s_lock or PinBuffer, which is what I really want.  It seems to have
 spit out call stack information only for the kernel functions, and not
 for user functions.

 Yes, I forgot to clean the old binaries when I recompiled with cflags=-g.
 So now it is fixed. I attach the updated perf report (i.e. the first 1
 lines of it, to reduce the file size).

Oh, ho.  So from this we can see that the problem is that we're
getting huge amounts of spinlock contention when pinning and unpinning
index pages.

It would be nice to have a self-contained reproducible test case for
this, so that we could experiment with it on other systems.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

2012-05-31 Thread Merlin Moncure
On Thu, May 31, 2012 at 10:23 AM, Sergey Koposov kopo...@ast.cam.ac.uk wrote:
 On Thu, 31 May 2012, Robert Haas wrote:


 Thanks.  How did you generate this perf report?  It's cool, because I
 haven't figured out how to make perf generate a report that is easily
 email-able, and it seems you have.


 I did pretty much what you have said, e.g.
 attached it to running process by
 perf record -g -p PID
 and then
 perf report -g > output

 And postgresql was compiled with cflags=-g


 The only trouble is that there's no call stack information here for
 s_lock or PinBuffer, which is what I really want.  It seems to have
 spit out call stack information only for the kernel functions, and not
 for user functions.


 Yes, I forgot to clean the old binaries when I recompiled with cflags=-g.
 So now it is fixed. I attach the updated perf report (i.e. the first 1
 lines of it, to reduce the file size).

That's basically what we needed.  The sequential scans driving the
index scans are all simultaneously pointing at the same couple of
pages in the index.  They are constantly pinning and unpinning -- the
database is schizophrenically going back and forth between the key
pages in the index being and not being allowed to be candidates for
buffer eviction.  Raising shared buffers doesn't help because it's
the marking of buffers as available for eviction, not the
eviction process itself, that is the problem.  IOS doesn't help
because it's index relation buffers, not the heap buffers, we are
binding up on (although if the btree were fixed it's entirely possible
the problem could head right back to the heap for non-IOS scans).
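
For reference, the hot path being fought over is tiny; paraphrasing the
9.2-era PinBuffer in bufmgr.c (details elided, not a complete excerpt),
every pin and unpin serializes on the buffer header spinlock:

    LockBufHdr(buf);                    /* SpinLockAcquire on buf header */
    buf->refcount++;
    if (buf->usage_count < BM_MAX_USAGE_COUNT)
        buf->usage_count++;
    UnlockBufHdr(buf);                  /* SpinLockRelease */

With many backends doing this against the same few index pages in a
tight loop, the spinlock itself becomes the bottleneck.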

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] hot standby PSQL 9.1 Windows 2008 Servers

2012-05-31 Thread Kevin Grittner
chinnaobi  wrote:
 
 You mean when the primary which is going to switch its role to
 standby might not have sent all the WAL records to the standby and
 If it is switched to standby it has more WAL records than the
 standby which is now serves as primary. Is it ??
 
What happens when there is a network fault between the primary and
the standby, but not between the primary and some of the clients
updating it?  Similarly, if this is asynchronous replication, what if
there have been commits on the primary which were still in the
network buffer when the primary crashed?
 
Clean automated failover is not a trivial task.  If you are writing
your own, it would be best to follow the steps recommended in the
documentation rather closely.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

2012-05-31 Thread Sergey Koposov

On Thu, 31 May 2012, Robert Haas wrote:


Oh, ho.  So from this we can see that the problem is that we're
getting huge amounts of spinlock contention when pinning and unpinning
index pages.

It would be nice to have a self-contained reproducible test case for
this, so that we could experiment with it on other systems.


I have created it a few days ago:
http://archives.postgresql.org/pgsql-hackers/2012-05/msg01143.php

It is still valid. And I'm using exactly it to test. The only thing to 
change is to create a two-col index and drop another index.

The scripts are precisely the ones I'm using now.

The problem is that in order to see a really big slowdown (10 times slower 
than a single thread) I've had to raise the buffers to 48g but it was 
slow for smaller shared buffer settings as well.


But I'm not sure how sensitive the test is to the hardware.

Cheers,
S

*
Sergey E. Koposov,  PhD, Research Associate
Institute of Astronomy, University of Cambridge 
Madingley road, CB3 0HA, Cambridge, UK Tel: +44-1223-337-551

Web: http://www.ast.cam.ac.uk/~koposov/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Figuring out shared buffer pressure

2012-05-31 Thread Bruce Momjian
On Wed, May 30, 2012 at 05:55:07PM -0400, Bruce Momjian wrote:
   Seems buffers_alloc is the number of calls to StrategyGetBuffer(), which
   tells how many time we have requested a buffer.  Not sure how that helps
   measure buffer pressure.
  
  Once the linked list is empty, every request for a buffer to read a
  new page into must result in the eviction of the previous occupant
  from this conceptual freelist buffer (except perhaps for some race
  conditions).  Isn't that what you wanted?  Except that the
  buffers_alloc does not get incremented when the StrategyGetBuffer is
  satisfied by a ring strategy rather than the default strategy.
 
 Well, the ideal case is that I could find out how often data that is
 near to being discarded is actually needed, hence the reclaimed field
 that is often important for kernel memory pressure reporting on older
 operating systems.  I will post an email soon about my theory of why
 buffer pressure is an important thing to report to users.

OK, realizing there is no simple way to measure shared buffer pressure,
let me explain why I want to.

Right now we simplistically recommend 25% of RAM for shared_buffers, with
a maximum of 8GB (512MB on Windows).  This helps to be sure that there
are sufficient kernel buffers for high-write operations, and perhaps a
kernel cache larger than shared buffers.

However, this doesn't help people configure shared buffers larger (e.g.
35%) if their working set is larger.  Right now, I don't see how a user
would know this is happening.  On the flip side, they might have a
smaller working set than 25% and be paying the overhead of managing 1
million shared buffers.  Again, there is no way to know if that is the
case.

For example, we have reports that larger shared buffers is sometimes
better, sometimes not, but there is no feedback we give the user to
explain why this is happening.  My guess is that if their working set is
larger than 25% of RAM, they benefit; if not, the buffer management
overhead makes things slower.

I feel we need to allow users to get clearer information on how active
their shared buffer cache is, perhaps allowing them to shrink/grow it as
appropriate.  Asking them to blindly try different shared buffer sizes
seems suboptimal.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] hot standby PSQL 9.1 Windows 2008 Servers

2012-05-31 Thread chinnaobi
Dear Kevin,

Thank you for your reply. Yeah, I am writing an application using PowerShell;
it's true it is not trivial, especially for a guy like me who has no idea
about databases.

You raised all the cases which I am muddling with, But currently I am
testing this setup:

The standby server is always configured from a base backup restored from a
storage server, and then streaming replication (asynchronous) is started.
Base backups are taken frequently.

I am sure there is some data loss during switching. I am still researching
how to do it cleanly. Please suggest any good papers on this if you have some.

Reddy.



--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Re-hot-standby-PSQL-9-1-Windows-2008-Servers-tp5710824p5710830.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] hot standby PSQL 9.1 Windows 2008 Servers

2012-05-31 Thread chinnaobi
Sorry, I forgot to mention: in my setup the primary and standby servers
receive the same traffic, so there is no issue with a network fault between
the primary and the standby but not between the primary and some of the
clients updating it.

--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Re-hot-standby-PSQL-9-1-Windows-2008-Servers-tp5710824p5710832.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] heap_form_tuple crashing

2012-05-31 Thread Atri Sharma
On Thu, May 31, 2012 at 7:01 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Atri Sharma atri.j...@gmail.com writes:
 My call to heap_form_tuple looks like:
 val1=0;
 tupledesc1=BlessTupleDesc(node->ss.ss_currentRelation->rd_att);
 tuple=heap_form_tuple(tupledesc1,p1,val1);

 p1 is a pointer to a Datum instance which is created from a char array.

 Does that actually match the tupdesc you're using?  Are you sure you
 created the Datum correctly (ie, did you call the appropriate datatype
 input routine)?

 BTW, the BlessTupleDesc call here seems to be pure cargo-cult
 programming.  It should not be necessary to bless a relation's tupdesc
 (because that should be a named type already); and even if it were,
 heap_form_tuple doesn't care.

                        regards, tom lane

Hi Tom,

Thanks for the advice and help.Your diagnosis is correct,I used the
correct datatype input routine and it worked like a charm.
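
For the archives, a minimal sketch of the corrected pattern (assuming a
single text column; the variable names here are hypothetical):

    Datum       values[1];
    bool        isnull[1] = {false};
    HeapTuple   tuple;

    /* Run the datatype's input conversion instead of pointing the
     * Datum at a raw char array. */
    values[0] = CStringGetTextDatum(my_cstring);
    tuple = heap_form_tuple(tupdesc, values, isnull);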

I will remove the calls to BlessTupleDesc then?

Atri

-- 
Regards,

Atri
l'apprenant

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

2012-05-31 Thread Merlin Moncure
On Thu, May 31, 2012 at 11:54 AM, Sergey Koposov kopo...@ast.cam.ac.uk wrote:
 On Thu, 31 May 2012, Robert Haas wrote:

 Oh, ho.  So from this we can see that the problem is that we're
 getting huge amounts of spinlock contention when pinning and unpinning
 index pages.

 It would be nice to have a self-contained reproducible test case for
 this, so that we could experiment with it on other systems.


 I have created it a few days ago:
 http://archives.postgresql.org/pgsql-hackers/2012-05/msg01143.php

 It is still valid. And I'm using exactly it to test. The only thing to
 change is to create a two-col index and drop another index.
 The scripts are precisely the ones I'm using now.

 The problem is that in order to see a really big slowdown (10 times slower
 than a single thread) I've had to raise the buffers to 48g but it was slow
 for smaller shared buffer settings as well.

 But I'm not sure how sensitive the test is to the hardware.

It's not: high contention on spinlocks is going to suck no matter what
hardware you have.   I think the problem is pretty obvious now: any
case where multiple backends are scanning the same sequence of buffers
in a very tight loop is going to display this behavior.  It doesn't
come up that often: it takes a pretty unusual sequence of events to
get a bunch of backends hitting the same buffer like that.

Hm, I wonder if you could alleviate the symptoms by making the
Pin/UnpinBuffer smarter so that frequently pinned buffers could stay
pinned longer -- kinda as if your private ref count was hacked to be
higher in that case.   It would be a complex fix for a narrow issue
though.

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Create collation incorrect error code

2012-05-31 Thread Magnus Hagander
Strangeness:

template1=# create collation nb_NO.utf8 (locale=nb_NO.utf8);
ERROR:  could not create locale nb_no.utf8: Success

Clearly it's not successful...

On some runs, I get:

template1=# create collation nb_NO.utf8 (locale=nb_NO.utf8);
ERROR:  could not create locale nb_no.utf8: No such file or directory
DETAIL:  The operating system could not find any locale data for the
locale name nb_no.utf8.

Which is correct.

The first one is clearly wrong. Seems there are some cases where errno
isn't set properly? But from what I can tell
(http://pubs.opengroup.org/onlinepubs/9699919799/functions/newlocale.html)
errno should be set.
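
A defensive pattern on the backend side might look like this (a sketch,
not the actual pg_locale.c code; the ENOENT fallback is my assumption):

    locale_t    loc;

    errno = 0;                  /* keep a stale errno from leaking through */
    loc = newlocale(LC_ALL_MASK, localename, (locale_t) 0);
    if (loc == (locale_t) 0 && errno == 0)
        errno = ENOENT;         /* some platforms apparently fail to set it */

That way a failed newlocale() can never be reported as "Success".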


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] slow dropping of tables, DropRelFileNodeBuffers, tas

2012-05-31 Thread Sergey Koposov

On Thu, 31 May 2012, Simon Riggs wrote:



That struck me as a safe and easy optimisation. This was a problem I'd
been trying to optimise for 9.2, so I've written a patch that appears
simple and clean enough to be applied directly.


Thanks! The patch indeed improved the timings.
The dropping of 100 tables in a single commit took ~50 seconds before the
patch and now takes ~5 sec (it would be nice to reduce it further
though, because the dropping of 1 tables still takes ~10 min).


Cheers,
S

*
Sergey E. Koposov, PhD, Research Associate
Institute of Astronomy, University of Cambridge
Madingley road, CB3 0HA, Cambridge, UK
Tel: +44-1223-337-551 Web: http://www.ast.cam.ac.uk/~koposov/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Issues with MinGW W64

2012-05-31 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Thu, May 31, 2012 at 8:15 AM, Johann 'Myrkraverk' Oskarsson
 joh...@2ndquadrant.com wrote:
 Is this something to discuss with the MinGW W64 team?

 My viewpoint on this (which is different than Tom's) is that we're
 probably not entitled to assume anything about what the system header
 files do with respect to stat.  On some systems, they might just have
 a function prototype, while others might define stat or stat() as a
 macro.   It seems to me that our source code is hoping for a function
 definition rather than a macro definition and falling over when that's
 not how it is.  I don't see that as very reasonable, unless we have
 some basis for believing that the OS isn't entitled to define stat as
 a macro rather than a function, and maybe not even then.

I quote from the POSIX:2008 specification for sys/stat.h:

The following shall be declared as functions and may also be
defined as macros. Function prototypes shall be provided. 
...
int    stat(const char *restrict, struct stat *restrict);

I do not believe that the standard intends the word shall to have any
wiggle room.  I would also read this to mean that if the header defines
stat as a macro, that macro ought to be an alternative way of invoking
the function.  Now we are messing up by failing to #undef the macro
before redefining it, but if we do that and it still doesn't work, the
header is not conformant to POSIX.

You can read this yourself at
http://pubs.opengroup.org/onlinepubs/9699919799/
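
For illustration, the missing guard might look something like this in
PostgreSQL's win32 port header (a sketch of the idea, not the committed
fix):

    #ifdef stat
    #undef stat             /* drop any macro the system header supplied */
    #endif
    extern int pgwin32_safestat(const char *path, struct stat *buf);
    #define stat(a,b) pgwin32_safestat(a,b)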

 We have
 plenty of other places where we use our own wrapper function in lieu
 of OS facilities for various reasons (e.g. BasicOpenFile) and I don't
 think adding one more is a big deal.

Well, I think it is.  Where we use a wrapper function, it's because it
has somewhat different semantics from the underlying standard function.
I do not think it's good for readability to define pgfoo() when that's
only meant to be exactly foo().  I'm especially not keen on doing that
just because one variant of MinGW has managed to break their conformance
with POSIX.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Create collation incorrect error code

2012-05-31 Thread Tom Lane
Magnus Hagander mag...@hagander.net writes:
 template1=# create collation nb_NO.utf8 (locale=nb_NO.utf8);
 ERROR:  could not create locale nb_no.utf8: Success

What platform?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Create collation incorrect error code

2012-05-31 Thread Magnus Hagander
On Thu, May 31, 2012 at 8:16 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Magnus Hagander mag...@hagander.net writes:
 template1=# create collation nb_NO.utf8 (locale=nb_NO.utf8);
 ERROR:  could not create locale nb_no.utf8: Success

 What platform?

D'uh, sorry.

Ubuntu 12.04.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Figuring out shared buffer pressure

2012-05-31 Thread Cédric Villemain
On Thursday, 31 May 2012 at 19:11:07, Bruce Momjian wrote:
 On Wed, May 30, 2012 at 05:55:07PM -0400, Bruce Momjian wrote:
Seems buffers_alloc is the number of calls to StrategyGetBuffer(),
which tells how many time we have requested a buffer.  Not sure how
that helps measure buffer pressure.
   
   Once the linked list is empty, every request for a buffer to read a
   new page into must result in the eviction of the previous occupant
   from this conceptual freelist buffer (except perhaps for some race
   conditions).  Isn't that what you wanted?  Except that the
   buffers_alloc does not get incremented when the StrategyGetBuffer is
   satisfied by a ring strategy rather than the default strategy.
  
  Well, the ideal case is that I could find out how often data that is
  near to being discarded is actually needed, hence the reclaimed field
  that is often important for kernel memory pressure reporting on older
  operating systems.  I will post an email soon about my theory of why
  buffer pressure is an important thing to report to users.
 
 OK, realizing there is no simple way to measure shared buffer pressure,
 let me explain why I want to.
 
 Right now we simplistically recommend 25% of RAM for shared_buffers, with
 a maximum of 8GB (512MB on Windows).  This helps to be sure that there
 are sufficient kernel buffers for high-write operations, and perhaps a
 kernel cache larger than shared buffers.
 
 However, this doesn't help people configure shared buffers larger (e.g.
 35%) if their working set is larger.  Right now, I don't see how a user
 would know this is happening.  On the flip side, they might have a
 smaller working set than 25% and be paying the overhead of managing 1
 million shared buffers.  Again, there is no way to know if that is the
 case.
 
 For example, we have reports that larger shared buffers is sometimes
 better, sometimes not, but there is no feedback we give the user to
 explain why this is happening.  My guess is that if their working set is
 larger than 25% of RAM, they benefit; if not, the buffer management
 overhead makes things slower.
 
 I feel we need to allow users to get clearer information on how active
 their shared buffer cache is, perhaps allowing them to shrink/grow it as
 appropriate.  Asking them to blindly try different shared buffer sizes
 seems suboptimal.

There is also a recent thread about how the usage count is used.
Jeff suggested maybe increasing by 2 and decreasing by 1. This is really
close to another idea I had but haven't tested yet: increment/decrement by
1 or 2, making even numbers ascending and odd numbers descending. That way,
just by looking at the usage count you can tell what's going on with your
buffers (more odd than even, or the reverse? building the cache, wasting
it, or ...), as in the sketch below. It should also allow finer control of
eviction.
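
A sketch of my reading of that even/odd idea (untested, purely
hypothetical; the function names are made up):

    /* On access: result is always even ("ascending"). */
    static inline int
    usage_bump(int count)
    {
        return (count % 2 == 0) ? count + 2 : count + 1;
    }

    /* On clock sweep: result is odd while positive, floored at 0. */
    static inline int
    usage_decay(int count)
    {
        if (count <= 1)
            return 0;
        return (count % 2 == 0) ? count - 1 : count - 2;
    }

The parity then records the direction the counter last moved, which is
the property the proposal wants to expose.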

Increasing/decreasing shared_buffers while PostgreSQL is up is a nice idea.
-- 
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation


signature.asc
Description: This is a digitally signed message part.


Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

2012-05-31 Thread Robert Haas
On Thu, May 31, 2012 at 2:03 PM, Merlin Moncure mmonc...@gmail.com wrote:
 On Thu, May 31, 2012 at 11:54 AM, Sergey Koposov kopo...@ast.cam.ac.uk 
 wrote:
 On Thu, 31 May 2012, Robert Haas wrote:

 Oh, ho.  So from this we can see that the problem is that we're
 getting huge amounts of spinlock contention when pinning and unpinning
 index pages.

 It would be nice to have a self-contained reproducible test case for
 this, so that we could experiment with it on other systems.


 I have created it a few days ago:
 http://archives.postgresql.org/pgsql-hackers/2012-05/msg01143.php

 It is still valid. And I'm using exactly it to test. The only thing to
 change is to create a two-col index and drop another index.
 The scripts are precisely the ones I'm using now.

 The problem is that in order to see a really big slowdown (10 times slower
 than a single thread) I've had to raise the buffers to 48g but it was slow
 for smaller shared buffer settings as well.

 But I'm not sure how sensitive the test is to the hardware.

 It's not: high contention on spinlocks is going to suck no matter what
 hardware you have.   I think the problem is pretty obvious now: any
 case where multiple backends are scanning the same sequence of buffers
 in a very tight loop is going to display this behavior.  It doesn't
 come up that often: it takes a pretty unusual sequence of events to
 get a bunch of backends hitting the same buffer like that.

 Hm, I wonder if you could alleviate the symptoms by making the
 Pin/UnpinBuffer smarter so that frequently pinned buffers could stay
 pinned longer -- kinda as if your private ref count was hacked to be
 higher in that case.   It would be a complex fix for a narrow issue
 though.

This test case is unusual because it hits a whole series of buffers
very hard.  However, there are other cases where this happens on a
single buffer that is just very, very hot, like the root block of a
btree index, where the pin/unpin overhead hurts us.  I've been
thinking about this problem for a while, but it hasn't made it up to
the top of my priority list, because workloads where pin/unpin is the
dominant cost are still relatively uncommon.  I expect them to get
more common as we fix other problems.

Anyhow, I do have some vague thoughts on how to fix this.  Buffer pins
are a lot like weak relation locks, in that they are a type of lock
that is taken frequently, but rarely conflicts.  And the fast-path
locking in 9.2 provides a demonstration of how to handle this kind of
problem efficiently: making the weak, rarely-conflicting locks
cheaper, at the cost of some additional expense when a conflicting
lock (in this case, a buffer cleanup lock) is taken.  In particular,
each backend has its own area to record weak relation locks, and a
strong relation lock must scan all of those areas and migrate any
locks found there to the main lock table.  I don't think it would be
feasible to adopt exactly this solution for buffer pins, because page
eviction and buffer cleanup locks, while not exactly common, are
common enough that we can't require a scan of N per-backend areas
every time one of those operations occurs.

But, maybe we could have a system of this type that only applies to
the very hottest buffers.  Suppose we introduce two new buffer flags,
BUF_NAILED and BUF_NAIL_REMOVAL.  When we detect excessive contention
on the buffer header spinlock, we set BUF_NAILED.  Once we do that,
the buffer can't be evicted until that flag is removed, and backends
are permitted to record pins in a per-backend area protected by a
per-backend spinlock or lwlock, rather than in the buffer header.
When we want to un-nail the buffer, we set BUF_NAIL_REMOVAL.  At that
point, it's no longer permissible to record new pins in the
per-backend areas, but old ones may still exist.  So then we scan all
the per-backend areas and transfer the pins to the buffer header, or
else just wait until no more exist; then, we clear both BUF_NAILED and
BUF_NAIL_REMOVAL.

So the pin algorithm looks like this:

read buffer header flags (unlocked)
if ((flags & (BUF_NAILED|BUF_NAIL_REMOVAL)) != BUF_NAILED)
{
take buffer header spinlock
record pin in buffer header
release buffer header spinlock;
}
else
{
take per-backend lwlock
record pin in per-backend area
release per-backend lwlock
read buffer header flags (unlocked)
if ((flags & (BUF_NAILED|BUF_NAIL_REMOVAL)) != BUF_NAILED)
{
take per-backend lwlock
forget pin in per-backend area
release per-backend lwlock
take buffer header spinlock
record pin in buffer header
release buffer header spinlock
}
}

Due to memory ordering effects, we might see the buffer as nailed when
in fact nail removal has already begun (or even, completed).  We can
prevent that if (1) the nail removal code sets the nail removal flag
before checking the per-backend areas and (2) the pin code checks the
nail removal flag AFTER checking the per-backend 

Re: [HACKERS] Issues with MinGW W64

2012-05-31 Thread Robert Haas
On Thu, May 31, 2012 at 2:13 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Thu, May 31, 2012 at 8:15 AM, Johann 'Myrkraverk' Oskarsson
 joh...@2ndquadrant.com wrote:
 Is this something to discuss with the MinGW W64 team?

 My viewpoint on this (which is different than Tom's) is that we're
 probably not entitled to assume anything about what the system header
 files do with respect to stat.  On some systems, they might just have
 a function prototype, while others might define stat or stat() as a
 macro.   It seems to me that our source code is hoping for a function
 definition rather than a macro definition and falling over when that's
 not how it is.  I don't see that as very reasonable, unless we have
 some basis for believing that the OS isn't entitled to define stat as
 a macro rather than a function, and maybe not even then.

 I quote from the POSIX:2008 specification for sys/stat.h:

        The following shall be declared as functions and may also be
        defined as macros. Function prototypes shall be provided.
        ...
        int    stat(const char *restrict, struct stat *restrict);

 I do not believe that the standard intends the word shall to have any
 wiggle room.  I would also read this to mean that if the header defines
 stat as a macro, that macro ought to be an alternative way of invoking
 the function.

There's nothing in the passage you quote that says the macro
definition can't do anything other than invoke the eponymous function,
but...

 Now we are messing up by failing to #undef the macro
 before redefining it, but if we do that and it still doesn't work, the
 header is not conformant to POSIX.

...this is probably still true.

 We have
 plenty of other places where we use our own wrapper function in lieu
 of OS facilities for various reasons (e.g. BasicOpenFile) and I don't
 think adding one more is a big deal.

 Well, I think it is.  Where we use a wrapper function, it's because it
 has somewhat different semantics from the underlying standard function.
 I do not think it's good for readability to define pgfoo() when that's
 only meant to be exactly foo().  I'm especially not keen on doing that
 just because one variant of MinGW has managed to break their conformance
 with POSIX.

In this case, I feel like we've kind of already stepped in it, because
we've defined stat() to be pgwin32_safestat(), which is, in fact, not
exactly stat().  So right now ISTM that a naive backend hacker might
think that stat() means the stat provided by the OS, but, on
Windows, it doesn't.  I wouldn't exactly describe myself as keen on
adding wrapper functions to things like stat(), but until non-POSIX
operating systems go the way of the dodo bird, I'm not sure there's
any way around it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] \conninfo and SSL

2012-05-31 Thread Bruce Momjian
On startup, psql shows the SSL information:

$ psql 'sslmode=require host=localhost'
psql (9.2beta1)
SSL connection (cipher: DHE-RSA-AES256-SHA, bits: 256)

However, \conninfo does not mention SSL:

postgres= \conninfo
You are connected to database postgres as user postgres on 
host localhost at port 5432.

Should \conninfo mention SSL?  Fortunately \c shows SSL information:

postgres= \c
SSL connection (cipher: DHE-RSA-AES256-SHA, bits: 256)
You are now connected to database postgres as user postgres.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] \conninfo and SSL

2012-05-31 Thread Magnus Hagander
On Thu, May 31, 2012 at 9:04 PM, Bruce Momjian br...@momjian.us wrote:
 On startup, psql shows the SSL information:

        $ psql 'sslmode=require host=localhost'
        psql (9.2beta1)
        SSL connection (cipher: DHE-RSA-AES256-SHA, bits: 256)

 However, \conninfo does not mention SSL:

        postgres= \conninfo
        You are connected to database postgres as user postgres on
        host localhost at port 5432.

 Should \conninfo mention SSL?  Fortunately \c shows SSL information:

        postgres= \c
        SSL connection (cipher: DHE-RSA-AES256-SHA, bits: 256)
        You are now connected to database postgres as user postgres.

Well, \c creates a new connection, so that's not really an information
command. It might not connect that trivially, depending on what
authentication method you use.

Including ssl info in \conninfo would be useful, I think.
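
For what it's worth, \conninfo could presumably reuse the same OpenSSL
calls that psql's startup banner already uses; a sketch (assumes a build
with USE_SSL and libpq's PQgetssl()):

    SSL        *ssl = PQgetssl(pset.db);

    if (ssl)
    {
        int         bits = -1;

        SSL_get_cipher_bits(ssl, &bits);
        printf(_("SSL connection (cipher: %s, bits: %d)\n"),
               SSL_get_cipher(ssl), bits);
    }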

//Magnus

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extending relations more efficiently

2012-05-31 Thread Jeff Davis
On Tue, 2012-05-01 at 10:08 -0400, Robert Haas wrote:
 We've previously discussed the possible desirability of extending
 relations in larger increments, rather than one block at a time, for
 performance reasons.  I attempted to determine how much performance we
 could possibly buy this way, and found that, as far as I can see, the
 answer is, basically, none.

Another point here is that with checksums, we will want to make sure
that zero pages can be treated as corrupt. That will probably involve
using the WAL for extension operations, and we'll want to mitigate that
cost somehow. Extending in larger chunks would probably be necessary.

There are some challenges there, but I think it's worth pursuing.
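
As a purely illustrative sketch of "extend in larger increments" (this is
not a patch, and the WAL-logged variant discussed above would need more
than this), one naive shape is to push a chunk of zeroed blocks out in one
loop rather than extending block-by-block on demand:

    #define EXTEND_CHUNK_BLOCKS 16

    static void
    extend_in_chunk(SMgrRelation reln, BlockNumber firstblk)
    {
        static char zbuf[BLCKSZ];   /* zero-filled by static storage */
        int         i;

        for (i = 0; i < EXTEND_CHUNK_BLOCKS; i++)
            smgrextend(reln, MAIN_FORKNUM, firstblk + i, zbuf, false);
    }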

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

2012-05-31 Thread Merlin Moncure
On Thu, May 31, 2012 at 1:50 PM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, May 31, 2012 at 2:03 PM, Merlin Moncure mmonc...@gmail.com wrote:
 On Thu, May 31, 2012 at 11:54 AM, Sergey Koposov kopo...@ast.cam.ac.uk 
 wrote:
 On Thu, 31 May 2012, Robert Haas wrote:

 Oh, ho.  So from this we can see that the problem is that we're
 getting huge amounts of spinlock contention when pinning and unpinning
 index pages.

 It would be nice to have a self-contained reproducible test case for
 this, so that we could experiment with it on other systems.


 I have created it a few days ago:
 http://archives.postgresql.org/pgsql-hackers/2012-05/msg01143.php

 It is still valid. And I'm using exactly it to test. The only thing to
 change is to create a two-col index and drop another index.
 The scripts are precisely the ones I'm using now.

 The problem is that in order to see a really big slowdown (10 times slower
 than a single thread) I've had to raise the buffers to 48g but it was slow
 for smaller shared buffer settings as well.

 But I'm not sure how sensitive the test is to the hardware.

 It's not: high contention on spinlocks is going to suck no matter what
 hardware you have.   I think the problem is pretty obvious now: any
 case where multiple backends are scanning the same sequence of buffers
 in a very tight loop is going to display this behavior.  It doesn't
 come up that often: it takes a pretty unusual sequence of events to
 get a bunch of backends hitting the same buffer like that.

Hm, I wonder if you could alleviate the symptoms by making the
 Pin/UnpinBuffer smarter so that frequently pinned buffers could stay
 pinned longer -- kinda as if your private ref count was hacked to be
 higher in that case.   It would be a complex fix for a narrow issue
 though.

 This test case is unusual because it hits a whole series of buffers
 very hard.  However, there are other cases where this happens on a
 single buffer that is just very, very hot, like the root block of a
 btree index, where the pin/unpin overhead hurts us.  I've been
 thinking about this problem for a while, but it hasn't made it up to
 the top of my priority list, because workloads where pin/unpin is the
 dominant cost are still relatively uncommon.  I expect them to get
 more common as we fix other problems.

 Anyhow, I do have some vague thoughts on how to fix this.  Buffer pins
 are a lot like weak relation locks, in that they are a type of lock
 that is taken frequently, but rarely conflicts.  And the fast-path
 locking in 9.2 provides a demonstration of how to handle this kind of
 problem efficiently: making the weak, rarely-conflicting locks
 cheaper, at the cost of some additional expense when a conflicting
 lock (in this case, a buffer cleanup lock) is taken.  In particular,
 each backend has its own area to record weak relation locks, and a
 strong relation lock must scan all of those areas and migrate any
 locks found there to the main lock table.  I don't think it would be
 feasible to adopt exactly this solution for buffer pins, because page
 eviction and buffer cleanup locks, while not exactly common, are
 common enough that we can't require a scan of N per-backend areas
 every time one of those operations occurs.

 But, maybe we could have a system of this type that only applies to
 the very hottest buffers.  Suppose we introduce two new buffer flags,
 BUF_NAILED and BUF_NAIL_REMOVAL.  When we detect excessive contention
 on the buffer header spinlock, we set BUF_NAILED.  Once we do that,
 the buffer can't be evicted until that flag is removed, and backends
 are permitted to record pins in a per-backend area protected by a
 per-backend spinlock or lwlock, rather than in the buffer header.
 When we want to un-nail the buffer, we set BUF_NAIL_REMOVAL.

Hm, a couple of questions: how do you determine if/when to un-nail a
buffer, and who makes that decision (bgwriter?)  Is there a limit to
how many buffers you are allowed to nail?  It seems like a much
stronger idea, but one downside I see vs. the 'pin for longer' idea I
was kicking around is how to deal with stale nailed buffers and keep
them from growing uncontrollably, so that you don't have to either
stop nailing or forcibly evict them.

merlin



Re: [HACKERS] Re: [COMMITTERS] pgsql: Send new protocol keepalive messages to standby servers.

2012-05-31 Thread Robert Haas
On Wed, May 30, 2012 at 12:17 PM, Fujii Masao masao.fu...@gmail.com wrote:
 OTOH, I wonder whether we really need to send keepalive messages
 periodically to calculate network latency. ISTM we don't, unless network
 latency varies from situation to situation so frequently that we'd like to
 monitor it in almost real time.

I didn't look at this patch too carefully when it was committed.
Looking at it more carefully now, it looks to me like this patch does
two different things.  One is to add a function called
GetReplicationApplyDelay(), which returns the number of milliseconds
since replay was fully caught up.  So if you were last caught up 5
minutes ago and you have replayed 4 minutes and 50 seconds worth of
WAL during that time, this function will return 5 minutes, not 10
seconds.  That is not what I would call apply delay, which I would
define as how far behind you are NOW, not how long it's been since you
weren't behind at all.

The second thing it does is add a function called
GetReplicationTransferLatency().  The return value of this function is
the difference between the slave's clock at the time the last master
keepalive was processed and the master's clock at the time that
keepalive was generated.  I think that in practice, unless network
time synchronization is in use, this is mostly going to be computing
the clock skew between the master and the slave. If time
synchronization is in use, then as you say it'll be a very jittery
measure of master-slave network latency, which can be monitored
perfectly well from outside PG.

Now, measuring time skew is potentially a useful thing to do, if we
believe that this will actually give us an accurate measurement of
what the time skew is, because there are a whole series of things that
people want to do which involve subtracting a slave timestamp from a
master timestamp.  Tom has persistently rebuffed all such proposals on
the grounds that there might be time skew, so in theory we could make
those things possible by having a way to measure time skew, which this
does.  Here's what we do: given a slave timestamp, add the estimated
time skew to find an equivalent master timestamp, and then subtract.
Using a method of this type would allow us to compute a *real* apply
delay.  Woohoo!  Unfortunately, if time synchronization IS in use,
then the system clocks are probably already synchronized three to six
orders of magnitude more precisely than what this method can measure,
so the effect of using GetReplicationTransferLatency() to adjust slave
timestamps will be to massively reduce the accuracy of such
calculations.  However, I've thus far been unable to convince anyone
that this is a bad idea, so maybe this is where we're gonna end up.
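
Spelling out the arithmetic (my notation, not anything in the patch):

    \text{skew}_{\text{est}} \approx t^{\text{slave}}_{\text{recv}} - t^{\text{master}}_{\text{send}}
        \quad \text{(the one-way network latency is folded in)}

    t^{\text{master}}_{\text{equiv}} = t^{\text{slave}} + \text{skew}_{\text{est}}

    \text{apply delay} = t^{\text{master}}_{\text{now}} - t^{\text{master}}_{\text{equiv}}(\text{last applied record})

The first line is the problem: the estimate's error is bounded below by the
network latency, which NTP-synchronized clocks already beat by orders of
magnitude.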

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [RFC] Interface of Row Level Security

2012-05-31 Thread Kohei KaiGai
2012/5/31 Robert Haas robertmh...@gmail.com:
 If we would have an ideal optimizer, I'd still like the optimizer to
 wipe out redundant clauses transparently, rather than RLSBYPASS
 permissions, because it just controls all-or-nothing stuff.
 For example, if tuples are categorized to unclassified, classified or
 secret, and RLS policy is configured as:
  ((current_user IN ('alice', 'bob') AND X IN ('unclassified',
 'classified')) OR (X IN ('unclassified'))),
 superuser can see all the tuples, and alice and bob can see
 up to classified tuples.
 Is it really hard to wipe out redundant condition at planner stage?
 If current_user is obviously 'kaigai', it seems to me the left-side of
 this clause can be wiped out at the planner stage.
 Do I consider the issue too simple?

 Yes.  :-)

 There are two problems.  First, if using the extended query protocol
 (e.g. pgbench -M prepared) you can prepare a statement just once and
 then execute it multiple times.  In this case, stable-functions cannot
 be constant-folded at plan time, because they are only guaranteed to
 remain constant for a *single* execution of the query, not for all
 executions of the query.  So any optimization in this area would have
 to be limited to cases where the simple query protocol is used.  I
 think that might still be worth doing, but it's a significant
 limitation, to be sure.  Second, at present, there is no guarantee
 that the snapshot used for planning the query is the same as the
 snapshot used for executing the query, though commit
 d573e239f03506920938bf0be56c868d9c3416da made that happen in some
 common cases.  If we were to do constant-folding of stable functions
 using the planner snapshot, it would represent a behavior change from
 previous releases.  I am not clear whether that has any real-world
 consequences that we should be worried about.  It seems to me that the
 path of least resistance might be to refactor the portal stuff so that
 we can provide a uniform guarantee that, when using the simple query
 protocol, the planner and executor snapshots will be the same ... but
 I might be wrong.

It may be an option to separate the cases into two: one where the given
query is executed immediately after optimization and never reused,
and all others.
Even in the second situation, we may get a better query execution
plan if we reconstruct the plan just before the executor runs, on the
assumption that immutable / stable functions can be replaced
by constant values prior to execution.
In other words, this idea runs query optimization again on the EXECUTE
statement, against its usual nature, to replace immutable / stable functions
with constant values and generate a wiser execution plan.
At the least, it may make sense to have a flag on the prepared statement to
indicate whether this re-construction might yield a better plan.

Then, if so, we will be able to push the logic corresponding to
RLSBYPASS into query optimization, and it will work transparently
for users.

Isn't this feasible to implement?

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp



Re: [HACKERS] [RFC] Interface of Row Level Security

2012-05-31 Thread Kohei KaiGai
2012/5/31 Kohei KaiGai kai...@kaigai.gr.jp:
 2012/5/31 Robert Haas robertmh...@gmail.com:
 If we would have an ideal optimizer, I'd still like the optimizer to
 wipe out redundant clauses transparently, rather than RLSBYPASS
 permissions, because it just controls all-or-nothing stuff.
 For example, if tuples are categorized to unclassified, classified or
 secret, and RLS policy is configured as:
  ((current_user IN ('alice', 'bob') AND X IN ('unclassified',
 'classified')) OR (X IN ('unclassified'))),
 superuser can see all the tuples, and alice and bob can see
 up to classified tuples.
 Is it really hard to wipe out redundant condition at planner stage?
 If current_user is obviously 'kaigai', it seems to me the left-side of
 this clause can be wiped out at the planner stage.
 Do I consider the issue too simple?

 Yes.  :-)

 There are two problems.  First, if using the extended query protocol
 (e.g. pgbench -M prepared) you can prepare a statement just once and
 then execute it multiple times.  In this case, stable-functions cannot
 be constant-folded at plan time, because they are only guaranteed to
 remain constant for a *single* execution of the query, not for all
 executions of the query.  So any optimization in this area would have
 to be limited to cases where the simple query protocol is used.  I
 think that might still be worth doing, but it's a significant
 limitation, to be sure.  Second, at present, there is no guarantee
 that the snapshot used for planning the query is the same as the
 snapshot used for executing the query, though commit
 d573e239f03506920938bf0be56c868d9c3416da made that happen in some
 common cases.  If we were to do constant-folding of stable functions
 using the planner snapshot, it would represent a behavior change from
 previous releases.  I am not clear whether that has any real-world
 consequences that we should be worried about.  It seems to me that the
 path of least resistance might be to refactor the portal stuff so that
 we can provide a uniform guarantee that, when using the simple query
 protocol, the planner and executor snapshots will be the same ... but
 I might be wrong.

 It may be an option to separate the cases into two: one where the given
 query is executed immediately after optimization and never reused,
 and all others.
 Even in the second situation, we may get a better query execution
 plan if we reconstruct the plan just before the executor runs, on the
 assumption that immutable / stable functions can be replaced
 by constant values prior to execution.
 In other words, this idea runs query optimization again on the EXECUTE
 statement, against its usual nature, to replace immutable / stable functions
 with constant values and generate a wiser execution plan.
 At the least, it may make sense to have a flag on the prepared statement to
 indicate whether this re-construction might yield a better plan.

 Then, if so, we will be able to push the logic corresponding to
 RLSBYPASS into query optimization, and it will work transparently
 for users.

 Isn't this feasible to implement?

If we could replace a particular term that consists only of constant values
and stable / immutable functions with a parameter reference,
we could handle that term as if it were a constant value, while the actual
calculation is delayed to the executor stage.

For example, according to this idea,
  PREPARE p1(int) AS SELECT * FROM tbl WHERE
  current_user IN ('alice','bob') AND X > $1;
shall be internally rewritten to
  PREPARE p1(int) AS SELECT * FROM tbl WHERE
  $2 AND X > $1;

then $2 is implicitly calculated just before execution of this prepared
statement, using the same snapshot as the executor. It seems to me this is
a feasible idea with a less invasive implementation in the existing planner.

Does it make sense to describe the exceptional condition using a regular
clause, instead of a special permission?
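
As a self-contained toy of that two-phase evaluation (invented names,
nothing from the actual planner or executor):

    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>

    /* stand-in for the hoisted stable-only clause,
     * current_user IN ('alice', 'bob'), evaluated once per EXECUTE */
    static bool eval_hoisted_clause(const char *current_user)
    {
        return strcmp(current_user, "alice") == 0 ||
               strcmp(current_user, "bob") == 0;
    }

    /* stand-in for running the rewritten plan: WHERE $2 AND X > $1 */
    static void execute_prepared(int x_bound, const char *current_user)
    {
        /* the implicit $2, computed with the executor's snapshot */
        bool param2 = eval_hoisted_clause(current_user);

        if (!param2)
        {
            printf("%s: whole qual folds to false, no rows\n", current_user);
            return;
        }
        printf("%s: scan with residual qual X > %d\n", current_user, x_bound);
    }

    int main(void)
    {
        execute_prepared(10, "alice");  /* $2 = true at execute time */
        execute_prepared(10, "carol");  /* $2 = false at execute time */
        return 0;
    }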

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp



Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

2012-05-31 Thread Robert Haas
On Thu, May 31, 2012 at 3:25 PM, Merlin Moncure mmonc...@gmail.com wrote:
 Hm, a couple of questions: how do you determine if/when to un-nail a
 buffer, and who makes that decision (bgwriter?)

Well, I think some experimentation might be required, but my first
thought is to tie it into buffer eviction.  If we record a pin in some
side array in lieu of acquiring the buffer header spinlock, then we're
not bumping the usage count.  So perhaps what we might do is - when we
first nail the buffer, we set its usage count to something much
higher than what's normally allowed, like say 50.  When it gets down
to 0 and somebody tries to evict it, we scan the per-backend arrays
and count the number of pins we find there, and set the usage count to
5 * that number.

The details could be completely wrong here.  The point is that I think
every so often we can just reconsider whether the buffer is still
worthy of being nailed.  Exactly how often we should reconsider it
seems like an empirical question.
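
As a toy sketch of that accounting, with invented structures and constants
(nothing here is actual bufmgr code):

    #include <stdbool.h>
    #include <stdio.h>

    #define MAX_BACKENDS      8
    #define PINS_PER_BACKEND 16

    typedef struct { int buf_id; int count; } LocalPin;

    /* per-backend fast-path pin slots; in a real implementation each
     * array would be protected by a per-backend spinlock or lwlock */
    static LocalPin local_pins[MAX_BACKENDS][PINS_PER_BACKEND];

    /* sum the fast-path pins on one nailed buffer across all backends */
    static int count_fastpath_pins(int buf_id)
    {
        int total = 0;

        for (int b = 0; b < MAX_BACKENDS; b++)
            for (int s = 0; s < PINS_PER_BACKEND; s++)
                if (local_pins[b][s].buf_id == buf_id)
                    total += local_pins[b][s].count;
        return total;
    }

    /* when eviction finds a nailed buffer with usage count zero: either
     * recharge the usage count from the live pin count, or start un-nailing */
    static bool reconsider_nail(int buf_id, int *usage_count, bool *nailed)
    {
        int pins = count_fastpath_pins(buf_id);

        if (pins > 0)
        {
            *usage_count = 5 * pins;    /* still hot: keep it nailed */
            return false;               /* not evictable */
        }
        *nailed = false;                /* begin nail removal */
        return true;                    /* candidate for eviction */
    }

    int main(void)
    {
        int usage = 0;
        bool nailed = true;

        local_pins[2][0] = (LocalPin){ .buf_id = 7, .count = 3 };
        printf("evictable: %d, new usage: %d\n",
               (int) reconsider_nail(7, &usage, &nailed), usage);
        return 0;
    }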

 Is there a limit to
 how many buffers you are allowed to nail?

Probably not.  The system had better be designed so that the threshold
for nailing a buffer is pretty high.  It should be exceedingly
unlikely for more than a small percentage of buffers to meet the
criteria; if it happens, then there's probably a good reason.

 It seems like a much
 stronger idea, but one downside I see vs. the 'pin for longer' idea I
 was kicking around is how to deal with stale nailed buffers and keep
 them from growing uncontrollably, so that you don't have to either
 stop nailing or forcibly evict them.

I think we're really talking about variants on the same theme.  My
idea is basically 'pin forever', which is taking 'longer' to its
logical extreme, but even if it were something short of forever I
think you'd still need some way to flush the stale pins.  Otherwise,
vacuum can get stuck, etc.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Figuring out shared buffer pressure

2012-05-31 Thread Daniel Farina
On Thu, May 31, 2012 at 10:11 AM, Bruce Momjian br...@momjian.us wrote:
 However, this doesn't help people configure shared buffers larger (e.g.
 35%) if their working set is larger.  Right now, I don't see how a user
 would know this is happening.  On the flip side, they might have a
 smaller working set than 25% and be spending the overhead of managing 1
 million shared buffers.  Again, there is no way to know if that is the
 case.

Another important use case: downgrades.  They do happen, and right now
are amazingly risky and made with limited information.  Clearly a
complete picture is impossible because of reliance on kernel buffer
management, but knowing the PG buffer pool's occupancy and flux seems
like it'd be so much better than knowing nothing, and it is likely
that some conservative intuition could be learned for performing
relatively safe downgrades.

-- 
fdr



Re: [HACKERS] pgbench--new transaction type

2012-05-31 Thread Jeff Janes
On Sun, Jun 19, 2011 at 3:30 PM, Greg Smith g...@2ndquadrant.com wrote:
 I applied Jeff's patch but changed this to address concerns about the
 program getting stuck running for too long in the function:

 #define plpgsql_loops   512

 This would be better named as plpgsql_batch_size or something similar
 instead; the current name suggests it's how many loops to run, which is
 confusing.

 My main performance concern here was whether this change really mattered so
 much once a larger number of clients were involved.  Some of the other
 things you can do to optimize single-client performance aren't as useful
 with lots of them.  Here's how the improvements in this mode worked for me
 on a server with 4 Hyper-Threaded cores (i870); shared_buffers=256MB,
 scale=100:

 1 client:
 -S: 11533
 -S -M prepared: 19498
 -P: 49547

 12 clients, 4 workers:
 -S:  56052
 -S -M prepared: 82043
 -P: 159443

 96 clients, 8 workers:
 -S: 49940
 -S -M prepared: 76099
 -P: 137942

 I think this is a really nice new workload to demonstrate.  One of the
 things we tell people is that code works much faster when moved server-side,
 but how much faster isn't always easy to show.  Having this mode available
 lets them see how dramatic that can be quite easily.  I know I'd like to be
 able to run performance tests for clients of new hardware using PostgreSQL
 and tell them something like this: 'With simple clients executing a
 statement at a time, this server reaches 56K SELECTs/second.  But using
 server-side functions to execute them in larger batches it can do 159K.'

 The value this provides as an alternate source of benchmark load
 generation, with a very different profile for how it exercises the server,
 is good too.

 Things to fix in the patch before it would be a commit candidate:

 -Adjust the loop size/name, per above
 -Reformat some of the longer lines to try and respect the implied right
 margin in the code formatting
 -Don't include the 'plpgsql function created.' line unless in debugging mode.
 -Add the docs.  Focus on how this measures how fast the database can execute
 SELECT statements using server-side code.  An explanation that the
 transaction block size is 512 is important to share.  It also needs a
 warning that time based runs (-T) may have to wait for a block to finish
 and go beyond its normally expected end time.
 -The word 'via' in the transaction type output description is probably not
 the best choice.  Changing to 'SELECT only using PL/pgSQL' would translate
 better, and follow the standard casing for the name of that language.

Sorry it has taken me a year to get back to this patch.  I have wanted
to use it, and to ask other people to run it and report their results,
several times recently, so I would like to get it into core.

I've attached a new patch which addresses several of your concerns,
and adds the documentation.  The description is much longer than the
descriptions of other nearby options, which mostly just give a simple
statement of what they do rather than a description of why that is
useful.  I don't know if that means I'm starting a good trend, or a
bad one, or I'm just putting the exposition in the wrong place.

In addition to showing the benefits of coding things on the server
side when that is applicable, it also allows hackers to stress parts
of the server code that are not easy to stress otherwise.
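
To make the round-trip arithmetic concrete, here is a hedged libpq sketch of
the two client behaviors being compared; pgbench_query() is a stand-in name
for the server-side function installed by -P, not necessarily its real name
or signature:

    #include <stdio.h>
    #include <libpq-fe.h>

    int main(void)
    {
        PGconn *conn = PQconnectdb("");     /* default connection parameters */

        if (PQstatus(conn) != CONNECTION_OK)
            return 1;

        /* -S style: one network round trip per SELECT */
        for (int i = 0; i < 512; i++)
        {
            PGresult *res = PQexec(conn,
                "SELECT abalance FROM pgbench_accounts WHERE aid = 1");
            PQclear(res);
        }

        /* -P style: one round trip executes the whole batch of SELECTs
         * inside a server-side PL/pgSQL loop */
        PGresult *res = PQexec(conn, "SELECT pgbench_query(100)");
        PQclear(res);

        PQfinish(conn);
        return 0;
    }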

Cheers,

Jeff


pgbench_loop_v2.patch
Description: Binary data



Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

2012-05-31 Thread Jeff Janes
On Sun, May 27, 2012 at 11:45 AM, Sergey Koposov kopo...@ast.cam.ac.uk wrote:
 Hi,

 I did another test using the same data and the same code, which I've
 provided before, and the performance of the single thread seems to be
 degrading quadratically with the number of threads.

 Here are the results:
 Nthreads Time_to_execute_one_thread
 1 8.1
 2 7.8
 3 8.1
 4 9.0
 5 10.2
 6 11.4
 7 13.3
 8 16.1
 9 19.0
 10 21.4
 11 23.8
 12 27.3
 13 30.2
 14 32.0
 15 34.1
 16 37.5

This looks more skew hyperbolic than quadratic.  In the higher region,
it doesn't seem to be increasing at an increasing rate, but rather
increasing at a constant rate of about 2.5 sec per additional backend,
i.e. converging to an asymptote with a slope of 2.5.  A pedantic
point, perhaps, but if you are trying to infer the nature of the
bottleneck from the shape of the curve, they lead to rather different
sets of hypotheses.
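
In symbols (my notation; the marginal cost is read straight off the table):

    \text{quadratic bottleneck:}\quad T(n) \sim c\,n^{2}

    \text{observed:}\quad T(n) \rightarrow a + b\,n,
    \qquad b = \frac{T(16) - T(12)}{16 - 12} = \frac{37.5 - 27.3}{4} \approx 2.5\ \text{s/backend}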

Cheers,

Jeff



Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

2012-05-31 Thread Jeff Janes
On Thu, May 31, 2012 at 9:17 AM, Robert Haas robertmh...@gmail.com wrote:

 Oh, ho.  So from this we can see that the problem is that we're
 getting huge amounts of spinlock contention when pinning and unpinning
 index pages.

 It would be nice to have a self-contained reproducible test case for
 this, so that we could experiment with it on other systems.

I just posted a patch under the subject 'pgbench--new transaction type'
that introduces a pgbench -P option.

I think that that would do a good job of simulating unique-key
look-ups on the inner side of a nested loop (which is basically what
we have here) and so creating contention on index pages.  Right now I
don't have anything with more than 2 CPUs and 2 is not high enough to
get much contention so I can't post any meaningful numbers.  (pgbench
-P might also be of interest in hash index investigation)

Cheers,

Jeff



Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

2012-05-31 Thread Jeff Janes
On Thu, May 31, 2012 at 11:50 AM, Robert Haas robertmh...@gmail.com wrote:

 This test case is unusual because it hits a whole series of buffers
 very hard.  However, there are other cases where this happens on a
 single buffer that is just very, very hot, like the root block of a
 btree index, where the pin/unpin overhead hurts us.

I think that a very, very hot page is also the problem here, not a whole
sequence of hot pages.  Most of his buffer content share lwlocks are on
just two buffers, and most of his blocked buffer mapping lwlocks
are on just two partitions.  So I am guessing that almost all of his
spin-lock contention from Pin and Unpin is also coming from those
same two buffers.  Why there are two buffers when there is only one
index root block involved, I don't know.

Cheers,

Jeff



Re: [HACKERS] slow dropping of tables, DropRelFileNodeBuffers, tas

2012-05-31 Thread Jeff Janes
On Thu, May 31, 2012 at 11:09 AM, Sergey Koposov kopo...@ast.cam.ac.uk wrote:
 On Thu, 31 May 2012, Simon Riggs wrote:


 That struck me as a safe and easy optimisation. This was a problem I'd
 been trying to optimise for 9.2, so I've written a patch that appears
 simple and clean enough to be applied directly.


 Thanks! The patch indeed improved the timings. The dropping of 100 tables in
 a single commit before the patch took ~ 50 seconds; now it takes ~ 5 sec (it
 would be nice to reduce it further though, because the dropping of 10000
 tables still takes ~10 min).

I'm surprised it helped that much.  I thought the most it could
theoretically help would be a factor of 4.

I tried the initially unlocked test, and for me it cut the time by a
factor of 3.  But I only have 1GB of shared_buffers at the max; I would
expect it to help more at larger sizes, because there is a constant
overhead not related to scanning the shared buffers which gets diluted
out the larger shared_buffers is.

I added to that a drop-all very similar to what Simon posted and got
another factor of 3.
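
The shape of that batching win, as a self-contained toy (invented types, not
the real bufmgr structures): one pass over the buffer pool with a binary
search against the sorted list of dropped relations, instead of one full
scan per relation.

    #include <stdio.h>
    #include <stdlib.h>

    typedef unsigned int Oid;

    typedef struct { Oid relnode; int valid; } ToyBuffer;

    static int oid_cmp(const void *a, const void *b)
    {
        Oid oa = *(const Oid *) a, ob = *(const Oid *) b;

        return (oa > ob) - (oa < ob);
    }

    /* Per-relation scans cost O(NBuffers * Nrels); this single pass costs
     * O(NBuffers * log Nrels) plus one sort of the dropped-relation list. */
    static void drop_rels_batch(ToyBuffer *bufs, int nbufs, Oid *rels, int nrels)
    {
        qsort(rels, nrels, sizeof(Oid), oid_cmp);
        for (int i = 0; i < nbufs; i++)
            if (bsearch(&bufs[i].relnode, rels, nrels, sizeof(Oid), oid_cmp))
                bufs[i].valid = 0;
    }

    int main(void)
    {
        ToyBuffer pool[4] = {{10, 1}, {11, 1}, {12, 1}, {10, 1}};
        Oid dropped[2] = {12, 10};

        drop_rels_batch(pool, 4, dropped, 2);
        for (int i = 0; i < 4; i++)
            printf("buffer %d: relnode %u valid=%d\n",
                   i, pool[i].relnode, pool[i].valid);
        return 0;
    }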

But, if you can do this during a maintenance window, then just
restarting with a much smaller shared_buffers should give you a much
larger speed up than either or both of these.  If I can extrapolate up
to 10G from my current curve, setting it to 8MB instead would give a
speed up of nearly 400 fold.

Cheers,

Jeff



Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

2012-05-31 Thread Jeff Janes
On Wed, May 30, 2012 at 6:10 PM, Sergey Koposov kopo...@ast.cam.ac.uk wrote:
 On Wed, 30 May 2012, Jeff Janes wrote:

 But anyway, is idt_match a fairly static table?  If so, I'd partition
 that into 16 tables, and then have each one of your tasks join against
 a different one of those tables.  That should relieve the contention
 on the index root block, and might have some other benefits as well.


 No, idt_match is getting filled by multi-threaded copy() and then joined
 with 4 other big tables like idt_phot. The result is then split into
 partitions.

That does make things more complicated.  But could you partition
it at that level and then do the joins partition-wise?

I don't have much experience at data partitioning (well, I do, but the
experience is with partitioning in Perl with terabytes of flat files,
not in PG :) ) but I think that once you have your partitioning keys
you want to apply them the same way up and down the data set.

Cheers,

Jeff

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Send new protocol keepalive messages to standby servers.

2012-05-31 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Now, measuring time skew is potentially a useful thing to do, if we
 believe that this will actually give us an accurate measurement of
 what the time skew is, because there are a whole series of things that
 people want to do which involve subtracting a slave timestamp from a
 master timestamp.  Tom has persistently rebuffed all such proposals on
 the grounds that there might be time skew, so in theory we could make
 those things possible by having a way to measure time skew, which this
 does.  Here's what we do: given a slave timestamp, add the estimated
 time skew to find an equivalent master timestamp, and then subtract.
 Using a method of this type would allow us to compute a *real* apply
 delay.  Woohoo!  Unfortunately, if time synchronization IS in use,
 then the system clocks are probably already synchronized three to six
 orders of magnitude more precisely than what this method can measure,
 so the effect of using GetReplicationTransferLatency() to adjust slave
 timestamps will be to massively reduce the accuracy of such
 calculations.  However, I've thus far been unable to convince anyone
 that this is a bad idea, so maybe this is where we're gonna end up.

Hmm ... first question is do we actually care whether the clocks are
synced to the millisecond level, ie what is it you'd do differently
if you know that the master and slave clocks are synced more closely
than you can measure at the protocol level.

But if there is a reason to care, perhaps we could have a setting that
says 'we're using NTP, so trust the clocks to be synced'?  What I object
to is assuming that without any evidence, or being unable to operate
correctly in an environment where it's not true.

regards, tom lane



Re: [HACKERS] slow dropping of tables, DropRelFileNodeBuffers, tas

2012-05-31 Thread Simon Riggs
On 31 May 2012 19:09, Sergey Koposov kopo...@ast.cam.ac.uk wrote:
 On Thu, 31 May 2012, Simon Riggs wrote:


 That struck me as a safe and easy optimisation. This was a problem I'd
 been trying to optimise for 9.2, so I've written a patch that appears
 simple and clean enough to be applied directly.


 Thanks! The patch indeed improved the timings. The dropping of 100 tables in
 a single commit before the patch took ~ 50 seconds; now it takes ~ 5 sec

Thanks for the timing.

(it
 would be nice to reduce it further though, because the dropping of 10000
 tables still takes ~10 min).

Why do you have 10,000 tables and why is it important to drop them so quickly?

If it's that important, why not run the drop in parallel sessions?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
