Re: [HACKERS] Switching to Homebrew as recommended Mac install?

2012-04-04 Thread Dave Page
On Tue, Apr 3, 2012 at 11:12 PM, Greg Stark st...@mit.edu wrote:
 On Wed, Apr 4, 2012 at 1:19 AM, Dave Page dp...@pgadmin.org wrote:
 then, we're talking about making parts of the filesystem
 world-writeable so it doesn't even matter if the user is running as an
 admin for a trojan or some other nasty to attack the system.

 The argument is that a trojan or other nasty doesn't *need* to be
 admin to attack the system since it can just attack the user's account
 since that's where all the interesting data is anyways.

Isn't that what I said?

 But again, this is all beside the point. It's a judgement for Apple
 and Microsoft and individual administrators to make. We can't really
 start reconfiguring people's systems to use a different security model
 than they expect just because they've installed a database software --
 even if we think our security model makes more sense than the one
 they're used to.

Exactly - which is why I was objecting to recommending a distribution
of PostgreSQL that came in a packaging system that we were told
changed /usr/local to be world writeable to avoid the use/annoyance of
the standard security measures on the platform.

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parallel pg_dump

2012-04-04 Thread Joachim Wieland
On Tue, Apr 3, 2012 at 9:26 AM, Andrew Dunstan and...@dunslane.net wrote:
 First, either the creation of the destination directory needs to be delayed
 until all the sanity checks have passed and we're sure we're actually going
 to write something there, or it needs to be removed if we error exit before
 anything gets written there.

pg_dump also creates empty files, which is the analogous case here.
Just try to dump a nonexistent database, for example (this also shows
that delaying won't help...).

 Maybe pg_dump -F d should be prepared to accept an empty directory as well
 as a non-existent directory, just as initdb can.

That sounds like a good compromise. I'll implement that.
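
Roughly what I have in mind is a check along these lines (just a sketch;
the helper name is made up, the real code would presumably live in
pg_backup_directory.c):

#include <dirent.h>
#include <errno.h>
#include <string.h>

/*
 * Return 1 if path is usable as a dump target: either it does not
 * exist yet (we'll create it), or it is a directory containing no
 * entries other than "." and "..".
 */
static int
dir_is_usable(const char *path)
{
	DIR		   *dir;
	struct dirent *ent;
	int			empty = 1;

	dir = opendir(path);
	if (dir == NULL)
		return errno == ENOENT; /* nonexistent is fine, we'll mkdir it */

	while ((ent = readdir(dir)) != NULL)
	{
		if (strcmp(ent->d_name, ".") != 0 &&
			strcmp(ent->d_name, "..") != 0)
		{
			empty = 0;			/* found a real entry */
			break;
		}
	}
	closedir(dir);
	return empty;
}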


 Second, all the PrintStatus traces are annoying and need to be removed, or
 perhaps better only output in debugging mode (using ahlog() instead of just
 printf())

Sure, PrintStatus is just there for now to see what's going on. My
plan was to remove it entirely in the final patch.


Joachim

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] bugfix for cursor arguments in named notation

2012-04-04 Thread Yeb Havinga
Using a cursor argument name equal to another plpgsql variable results 
in the error:

cursor "..." has no argument named "..."

The attached patch fixes that.

Instead of solving the issue like is done in the patch, another way 
would be to expose internal_yylex() so that could be used instead of 
yylex() by read_cursor_args when reading the argument name, and would 
always return the argument name in yylval.str.


--
Yeb Havinga
http://www.mgrid.net/
Mastering Medical Data

From 47c451cbf188ac2aff9784bff73bc7fb7b846d26 Mon Sep 17 00:00:00 2001
From: Willem & Yeb w...@mgrid.net
Date: Wed, 4 Apr 2012 11:30:41 +0200
Subject: [PATCH] Fix "cursor has no argument named" error.

When a cursor argument name coincides with another plpgsql variable name,
yylex() returns it not as a str but as a wdatum.
---
 src/pl/plpgsql/src/gram.y |6 --
 src/test/regress/expected/plpgsql.out |   18 ++
 src/test/regress/sql/plpgsql.sql  |   15 +++
 3 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/src/pl/plpgsql/src/gram.y b/src/pl/plpgsql/src/gram.y
index 5a555af..2d52278 100644
--- a/src/pl/plpgsql/src/gram.y
+++ b/src/pl/plpgsql/src/gram.y
@@ -3447,8 +3447,10 @@ read_cursor_args(PLpgSQL_var *cursor, int until, const char *expected)
 			char   *argname;
 
 			/* Read the argument name, and find its position */
-			yylex();
-			argname = yylval.str;
+			if (yylex() == T_DATUM)
+				argname = yylval.wdatum.ident;
+			else
+				argname = yylval.str;
 
 			for (argpos = 0; argpos < row->nfields; argpos++)
 			{
diff --git a/src/test/regress/expected/plpgsql.out b/src/test/regress/expected/plpgsql.out
index 5455ade..56cfa57 100644
--- a/src/test/regress/expected/plpgsql.out
+++ b/src/test/regress/expected/plpgsql.out
@@ -2420,6 +2420,24 @@ select namedparmcursor_test8();
  0
 (1 row)
 
+-- cursor parameter named the same as other plpgsql variables
+create or replace function namedparmcursor_test9(p1 int) returns int4 as $$
+declare
+  c1 cursor (p1 int, p2 int) for
+    select count(*) from tenk1 where thousand = p1 and tenthous = p2;
+  n int4;
+  p2 int4 := 1006;
+begin
+  open c1 (p1 := p1,  p2 := p2);
+  fetch c1 into n;
+  return n;
+end $$ language plpgsql;
+select namedparmcursor_test9(6);
+ namedparmcursor_test9 
+-----------------------
+ 1
+(1 row)
+
 --
 -- tests for raise processing
 --
diff --git a/src/test/regress/sql/plpgsql.sql b/src/test/regress/sql/plpgsql.sql
index f577dc3..6b9795d 100644
--- a/src/test/regress/sql/plpgsql.sql
+++ b/src/test/regress/sql/plpgsql.sql
@@ -2053,6 +2053,21 @@ begin
 end $$ language plpgsql;
 select namedparmcursor_test8();
 
+-- cursor parameter named the same as other plpgsql variables
+create or replace function namedparmcursor_test9(p1 int) returns int4 as $$
+declare
+  c1 cursor (p1 int, p2 int) for
+    select count(*) from tenk1 where thousand = p1 and tenthous = p2;
+  n int4;
+  p2 int4 := 1006;
+begin
+  open c1 (p1 := p1,  p2 := p2);
+  fetch c1 into n;
+  return n;
+end $$ language plpgsql;
+
+select namedparmcursor_test9(6);
+
 --
 -- tests for raise processing
 --
-- 
1.7.1


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch for parallel pg_dump

2012-04-04 Thread Joachim Wieland
On Tue, Apr 3, 2012 at 11:04 AM, Robert Haas robertmh...@gmail.com wrote:
 OK, but it seems like a pretty fragile assumption that none of the
 workers will ever manage to emit any other error messages.  We don't
 rely on this kind of assumption in the backend, which is a lot
 better-structured and less spaghetti-like than the pg_dump code.

Yeah, but even if they don't use exit_horribly(), the user would still
see the output; stdout/stderr aren't closed, and everyone can still
write to them.

As a test, I printed out some messages upon seeing a specific dump id
in a worker:

if (strcmp(command, "DUMP 3540") == 0)
{
    write_msg(NULL, "Info 1\n");
    printf("Info 2\n");
    exit_horribly(modulename, "that's why\n");
}


$ ./pg_dump -j 7 ...
pg_dump: Info 1
Info 2
pg_dump: [parallel archiver] that's why


if (strcmp(command, "DUMP 3540") == 0)
{
    write_msg(NULL, "Info 1\n");
    printf("Info 2\n");
    fprintf(stderr, "exiting on my own\n");
    exit(1);
}


$ ./pg_dump -j 7 ...
pg_dump: Info 1
Info 2
exiting on my own
pg_dump: [parallel archiver] A worker process died unexpectedly

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] patch: improve SLRU replacement algorithm

2012-04-04 Thread Robert Haas
On Mon, Apr 2, 2012 at 12:33 PM, Robert Haas robertmh...@gmail.com wrote:
 This particular example shows the above chunk of code taking 13s to
 execute.  Within 3s, every other backend piles up behind that, leading
 to the database getting no work at all done for a good ten seconds.

 My guess is that what's happening here is that one backend needs to
 read a page into CLOG, so it calls SlruSelectLRUPage to evict the
 oldest SLRU page, which is dirty.  For some reason, that I/O takes a
 long time.  Then, one by one, other backends come along and also need
 to read various SLRU pages, but the oldest SLRU page hasn't changed,
 so SlruSelectLRUPage keeps returning the exact same page that it
 returned before, and everybody queues up waiting for that I/O, even
 though there might be other buffers available that aren't even dirty.

 I am thinking that SlruSelectLRUPage() should probably do
 SlruRecentlyUsed() on the selected buffer before calling
 SlruInternalWritePage, so that the next backend that comes along
 looking for a buffer doesn't pick the same one.  Possibly we should go
 further and try to avoid replacing dirty buffers in the first place,
 but sometimes there may be no choice, so doing SlruRecentlyUsed() is
 still a good idea.

 I'll do some testing to try to confirm whether this theory is correct
 and whether the above fix helps.

Having performed this investigation, I've discovered a couple of
interesting things.  First, SlruRecentlyUsed() is an ineffective way
of keeping a page from getting reused, because it's called extremely
frequently, and on these high-velocity tests it takes almost no time
at all for the most recently used buffer to become the least recently
used buffer.  Therefore, SlruRecentlyUsed() doesn't prevent the lock
pile-up.  In the unpatched code, once a long buffer I/O starts,
everybody immediately goes into the tank until the I/O finishes.  If
you patch the code so that the page is marked recently-used before
beginning the I/O, everybody's next few CLOG requests hit some other
buffer but eventually the long-I/O-in-progress buffer again becomes
least recently used and the next CLOG eviction causes a second backend
to begin waiting for that buffer.  Lather, rinse, repeat, until
literally every backend is once again waiting on that buffer I/O.  You
still get the same problem; it just takes slightly longer to develop.

On reflection, it seems to me that the right fix here is to make
SlruSelectLRUPage() avoid selecting a page on which an I/O is
already in progress.  In general, those I/Os are all writes.  We don't
end up waiting for reads because all the old CLOG pages we might want
to read are still in the OS cache.  So reads complete quickly, on this
test.  Writes take a long time, because there we have to actually get
the data down to disk, and the disk is busy.  But there's no reason
for a backend doing a replacement to wait for either a read or a write
that is in progress: once the read or write completes, we're going to
loop around and repeat the buffer selection process, and most likely
pick a buffer completely unrelated to the one whose I/O we waited for.
 We might as well just skip the wait and select that other buffer
immediately.  The attached patch implements that.
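
In outline, the new selection logic looks like this (a standalone sketch
of the idea only, not the patch text; select_victim is an invented name,
and the status values mirror the ones in slru.h):

typedef enum
{
	SLRU_PAGE_EMPTY,
	SLRU_PAGE_READ_IN_PROGRESS,
	SLRU_PAGE_VALID,
	SLRU_PAGE_WRITE_IN_PROGRESS
} SlruPageStatus;

/*
 * Pick a victim buffer: an empty one if available, else the least
 * recently used buffer with no I/O in flight.  Returns -1 if every
 * buffer is busy, in which case the caller must wait and retry
 * (the pre-patch behavior).  The real function also handles locking
 * and prefers clean pages.
 */
static int
select_victim(const SlruPageStatus *status, const int *lru_count,
			  int num_slots, int cur_lru_count)
{
	int			best_slot = -1;
	int			best_age = -1;

	for (int slotno = 0; slotno < num_slots; slotno++)
	{
		int			age = cur_lru_count - lru_count[slotno];

		if (status[slotno] == SLRU_PAGE_EMPTY)
			return slotno;		/* free buffer, take it immediately */

		if (status[slotno] == SLRU_PAGE_READ_IN_PROGRESS ||
			status[slotno] == SLRU_PAGE_WRITE_IN_PROGRESS)
			continue;			/* don't wait on somebody else's I/O */

		if (age > best_age)
		{
			best_age = age;		/* stalest non-busy buffer so far */
			best_slot = slotno;
		}
	}
	return best_slot;
}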

Applying this patch does in fact eliminate the stalls.  Here are the
top ten places where blocking happens without the patch - these are
counts of times we waited more than 100ms for an LWLock during a
30-minute, 32-client pgbench run:

 54 slru.c:311 blocked by slru.c:405
 99 xlog.c:2241 blocked by xlog.c:2090
172 heapam.c:2758 blocked by heapam.c:2758
635 indexam.c:521 blocked by heapam.c:2758
663 xlog.c:2090 blocked by xlog.c:2241
666 varsup.c:65 blocked by varsup.c:65
682 heapam.c:2758 blocked by indexam.c:521
803 xlog.c:1502 blocked by xlog.c:2241
   3002 slru.c:311 blocked by slru.c:529
  23978 xlog.c:909 blocked by xlog.c:909

And with the patch:

 72 hio.c:336 blocked by heapam.c:2758
109 xlog.c:2241 blocked by xlog.c:2090
129 slru.c:311 blocked by slru.c:405
210 heapam.c:2758 blocked by heapam.c:2758
425 heapam.c:2758 blocked by indexam.c:521
710 indexam.c:521 blocked by heapam.c:2758
766 xlog.c:2090 blocked by xlog.c:2241
915 xlog.c:1502 blocked by xlog.c:2241
   1684 varsup.c:65 blocked by varsup.c:65
  27950 xlog.c:909 blocked by xlog.c:909

As you can see, "slru.c:311 blocked by slru.c:529" disappears.  It's not
just no longer in the top ten - it's actually completely gone.
Unfortunately, we get more stalls elsewhere as a result, but that's
only to be expected - contention moves around as you fix things.  The
remaining blocking within slru.c is attributable to the line that says
"129 slru.c:311 blocked by slru.c:405".  I haven't fully verified
this, but I believe that blocking happens there when somebody needs to
read a page that's already being read - the second guy quite naturally
waits for the first guy's I/O to finish.  Those waits 

Re: [HACKERS] parallel pg_dump

2012-04-04 Thread Andrew Dunstan



On 04/04/2012 05:03 AM, Joachim Wieland wrote:

Second, all the PrintStatus traces are annoying and need to be removed, or
perhaps better only output in debugging mode (using ahlog() instead of just
printf())

Sure, PrintStatus is just there for now to see what's going on. My
plan was to remove it entirely in the final patch.





We need that final patch NOW, I think. There is very little time for 
this before it will be too late for 9.2.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] performance-test farm

2012-04-04 Thread Tomas Vondra
On 4.4.2012 05:35, Greg Smith wrote:
 On 03/05/2012 05:20 PM, Tomas Vondra wrote:
 What is the current state of this effort? Is there someone else working
 on that? If not, I propose this (for starters):

* add a new page Performance results to the menu, with a list of
  members that uploaded the performance-results

* for each member, there will be a list of tests along with a running
  average for each test, last test and indicator if it improved, got
  worse or is the same

* for each member/test, a history of runs will be displayed, along
  with a simple graph
 
 
 I am quite certain no one else is working on this.
 
 The results are going to bounce around over time.  "Last test" and
 simple computations based on it are not going to be useful.  A graph and
 a way to drill down into the list of test results is what I had in mind.
 
 Eventually we'll want to be able to flag bad trends for observation,
 without having to look at the graph.  That's really optional for now,
 but here's how you could do that.  If you compare a short moving average
 to a longer one, you can find out when a general trend line has been
 crossed upwards or downwards, even with some deviation to individual
 samples.  There's a stock trading technique using this property called
 the moving average crossover; a good example is shown at
 http://eresearch.fidelity.com/backtesting/viewstrategy?category=Trend%20Following&wealthScriptType=MovingAverageCrossover

Yes, exactly. I've written 'last test' but I actually meant something
like this, i.e. detecting the change of the trend over time. The moving
average crossover looks interesting, although there are other ways to
achieve a similar goal (e.g. correlating with a pattern - a step
function, for example).
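
Just to illustrate, a crossover check over a series of per-run tps
numbers can be tiny (a sketch; the window sizes and the numbers are
made up):

#include <stdio.h>

/* Simple moving average of the window ending at index end (inclusive). */
static double
sma(const double *v, int end, int window)
{
	double		sum = 0.0;

	for (int i = end - window + 1; i <= end; i++)
		sum += v[i];
	return sum / window;
}

int
main(void)
{
	/* hypothetical tps results, oldest first; the dip is the "regression" */
	double		tps[] = {1400, 1410, 1395, 1405, 1398, 1402,
						 1340, 1335, 1330, 1328, 1332, 1327};
	int			n = sizeof(tps) / sizeof(tps[0]);
	int			shortw = 3,
				longw = 6;

	for (int i = longw; i < n; i++)
	{
		/* short average crossing below the long one flags a downtrend */
		if (sma(tps, i - 1, shortw) >= sma(tps, i - 1, longw) &&
			sma(tps, i, shortw) < sma(tps, i, longw))
			printf("possible regression at run %d (short %.1f < long %.1f)\n",
				   i, sma(tps, i, shortw), sma(tps, i, longw));
	}
	return 0;
}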

 It's possible to keep a running weighted moving average without actually
 remembering all of the history.  The background writer works that way. I
 don't think that will be helpful here though, because you need a chunk
 of the history to draw a graph of it.

Keeping the history is not a big deal IMHO. And it gives us the freedom
to run a bit more complex analysis anytime later.

 I'm not quite sure how to define which members will run the performance
 tests - I see two options:

* for each member, add a flag run performance tests so that we can
  choose which members are supposed to be safe

OR

* run the tests on all members (if enabled in build-farm.conf) and
  then decide which results are relevant based on data describing the
  environment (collected when running the tests)
 
 I was thinking of only running this on nodes that have gone out of their
 way to enable this, so something more like the first option you gave
 here.  Some buildfarm animals might cause a problem for their owners
 should they suddenly start doing anything new that gobbles up a lot more
 resources.  It's important that any defaults--including what happens if
 you add this feature to the code but don't change the config file--do
 not run any performance tests.

Yes, good points. Default should be 'do not run performance test' then.

* it can handle one member running the tests with different settings
  (various shared_buffer/work_mem sizes, num of clients etc.) and
  various hw configurations (for example magpie contains a regular
  SATA drive as well as an SSD - would be nice to run two sets of
  tests, one for the spinner, one for the SSD)

* this can handle 'pushing' a list of commits to test (instead of
  just testing the HEAD) so that we can ask the members to run the
  tests for particular commits in the past (I consider this to be
  very handy feature)
 
 I would highly recommend against scope creep in these directions.  The
 goal here is not to test hardware or configuration changes.  You've been
 doing a lot of that recently, and this chunk of software is not going to
 be a good way to automate such tests.
 
 The initial goal of the performance farm is to find unexpected
 regressions in the performance of the database code, running some simple
 tests.  It should handle the opposite too, proving improvements work out
 as expected on multiple systems.  The buildfarm structure is suitable
 for that job.

Testing hardware configuration changes was not the goal of the proposed
behavior. The goal was to test multiple (sane) PostgreSQL configs. There
are issues that might demonstrate themselves only under certain
conditions (e.g. very small/large shared buffers, spinners/SSDs etc.).

Those are exactly the 'unexpected regressions' you've mentioned.

 If you want to simulate a more complicated test, one where things like
 work_mem matter, the first step there is to pick a completely different
 benchmark workload.  You're not going to do it with simple pgbench calls.

Yes, but I do expect to prepare custom pgbench scripts in the future to
test such things. So I want to design the code so that this is possible
(either 

[HACKERS] Question regarding SSL code in backend and frontend

2012-04-04 Thread Tatsuo Ishii
Hi,

While looking into SSL code in secure_read() of be-secure.c and
pqsecure_read() of fe-secure.c, I noticed a subtle difference between
them.

In secure_read:
--
            case SSL_ERROR_WANT_READ:
            case SSL_ERROR_WANT_WRITE:
                if (port->noblock)
                {
                    errno = EWOULDBLOCK;
                    n = -1;
                    break;
                }
#ifdef WIN32
                pgwin32_waitforsinglesocket(SSL_get_fd(port->ssl),
                                            (err == SSL_ERROR_WANT_READ) ?
                                            FD_READ | FD_CLOSE :
                                            FD_WRITE | FD_CLOSE,
                                            INFINITE);
#endif
                goto rloop;
--

while in pqsecure_read:
--
            case SSL_ERROR_WANT_READ:
                n = 0;
                break;
            case SSL_ERROR_WANT_WRITE:

                /*
                 * Returning 0 here would cause caller to wait for read-ready,
                 * which is not correct since what SSL wants is wait for
                 * write-ready.  The former could get us stuck in an infinite
                 * wait, so don't risk it; busy-loop instead.
                 */
                goto rloop;
--

Those code fragments judge the return value from SSL_read().
secure_read() retries when SSL_ERROR_WANT_READ *and*
SSL_ERROR_WANT_WRITE are returned. However, pqsecure_read() does not
retry on SSL_ERROR_WANT_READ. It seems they are not consistent. Comments?
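
For reference, the textbook pattern on a nonblocking socket is to wait
for whichever readiness condition SSL reports it needs before retrying
(a generic sketch, not a patch proposal; ssl_read_retry is a made-up name):

#include <sys/select.h>
#include <openssl/ssl.h>

/*
 * Generic retry loop for SSL_read() on a nonblocking socket: wait for
 * the condition OpenSSL asked for, then retry.  Error handling beyond
 * the two retry cases is omitted.
 */
static int
ssl_read_retry(SSL *ssl, int sock, void *buf, int len)
{
	fd_set		fds;
	int			n,
				err;

	for (;;)
	{
		n = SSL_read(ssl, buf, len);
		if (n > 0)
			return n;

		err = SSL_get_error(ssl, n);
		if (err != SSL_ERROR_WANT_READ && err != SSL_ERROR_WANT_WRITE)
			return -1;			/* closed connection or real error */

		FD_ZERO(&fds);
		FD_SET(sock, &fds);
		if (err == SSL_ERROR_WANT_READ)
			select(sock + 1, &fds, NULL, NULL, NULL);
		else
			select(sock + 1, NULL, &fds, NULL, NULL);
	}
}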
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] HTTP Frontend? (and a brief thought on materialized views)

2012-04-04 Thread Dobes Vandermeer
On Wed, Apr 4, 2012 at 6:26 AM, Josh Berkus j...@agliodbs.com wrote:


  While I was doing this I always thought this would have been a better
  approach for my previous project, an accounting application.  If I could
  just have stored entities like invoice & customer as a single document
  that is inserted, updated, etc. atomically it would be a lot simpler and
  faster than having to break things out into columns and rows spread over
  various tables.

 Actually, an accounting application is the *worst* candidate for
 document-oriented storage.


I guess I didn't go into enough detail.  As it's a small business
bookkeeping system the records are added after the fact.  Constraint
checking isn't a priority; we would allow someone to enter sales before
purchases and things like that which means the constraint checking has to
be more about flagging issues (we didn't get around to that yet, either).
 It wasn't an ERP and didn't support inventory so there's no worry about
searching for inventory counts in particular locations.  The idea is to
input source documents like invoices & receipts and generate reports for
stakeholders.

I think there is something to be gained by having a first-class concept of
a document in the database.  It might save some trouble managing
parent/child relations, versioning, things like that.

I hand-crafted some materialized views back then, too, since the conversion
from a document (like an invoice) to the actual impact of that on account
ledgers and balances was non-trivial and evolving as the feature set
expanded, so it wasn't something you wanted to try and build into your
reporting queries.

Yes, having documents *in addition* to relational data gives you the
 best of both worlds.  You can use relational structures to store data
 which is well-defined and business-critical, and document structures to
 store data which is undefined and not critical.


Well that's exactly what I was trying to get at in the first place :-).
 I'd love to see this kind of functionality in PostgreSQL and I think
materialized views are a pretty powerful way to do that when you are
automatically pulling fields out of the document to make the relational
tables.


  So I kind of think the document database kind of bridges the gap between
 an
  OODBMS and the RDBMS because the document is like a little cluster of

 OODBMS != DocumentDB


Yes, I know.  I was just saying that a document DB is a bit more OO because
the document itself is stored as an object graph rather than just tuples.
 Thus it fits in between RDBMS and OODBMS in a way.  It makes sense in my
head somehow, no need to agree with me on this one.

Regards,

Dobes


Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server

2012-04-04 Thread Albe Laurenz
Shigeru HANADA wrote:
 During a foreign scan, type input functions are used to convert
 the text representation of values.  If a foreign table is
 misconfigured, you can get error messages from these functions, like:

 ERROR:  invalid input syntax for type double precision: "etwas"
 or
 ERROR:  value too long for type character varying(3)

 It might be nice for finding problems if the message were
 something like:

 ERROR:  cannot convert data in foreign scan of "tablename", column
 "col" in row 42
 DETAIL:  ERROR:  value too long for type character varying(3)
 
 Agreed.  How about showing context information with errcontext() in
 addition to main error message?  Of course, identifiers are quoted if
 necessary.  This way doesn't need additional PG_TRY block, so overhead
 would be relatively cheap.

 postgres=# SELECT * FROM ft1 WHERE c1 = 1;  -- ERROR
 ERROR:  invalid input syntax for integer: "1970-01-02 17:00:00+09"
 CONTEXT:  column "c4" of foreign table "ft1"
 
 Showing the index of the row seems overkill, because the most common
 cause of this kind of error is wrong configuration, as you say, and
 users would be able to address the issue without knowing which record
 caused the error.

Agreed.  I think that is a better approach than what I suggested.

 As stated previously, I don't think that using local stats on
 foreign tables is a win.  The other patches work fine for me, and
 I'd be happy if that could go into 9.2.
 
 I have the opposite opinion on this issue because we need to do some
 filtering on the local side.  We can leave cost/rows estimation to the
 remote side for WHERE expressions which are pushed down, but we need the
 selectivity of extra filtering done on the local side.  For such purpose,
 having local stats of foreign data seems reasonable and useful.
 
 Of course, it has the downside that we need to execute an explicit
 ANALYZE for foreign tables, which would cause a full sequential scan on
 remote tables, in addition to the ANALYZE for remote tables done on the
 remote side as usual maintenance work.

This approach is much better and does not suffer from the
limitations the original analyze patch had.

I think that the price of a remote table scan is something
we should be willing to pay for good local statistics.
And there is always the option not to analyze the foreign
table if you are not willing to pay that price.

Maybe the FDW API could be extended so that foreign data wrappers
can provide a random sample to avoid a full table scan.

 Attached patch contains changes below:
 
 pgsql_fdw_v19.patch
   - show context of data conversion error
   - move codes for fetch_count FDW option to option.c
 (refactoring)
 pgsql_fdw_pushdown_v12.patch
   - make deparseExpr function static (refactoring)
 
 I also attached pgsql_fdw_analyze for only testing the effect of local
 statistics.  It contains both backend's ANALYZE command support and
 pgsql_fdw's ANALYZE support.

I think the idea is promising.

I'll mark the patch as ready for committer.

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed dblink using alternate libpq tuple storage

2012-04-04 Thread Tom Lane
Marko Kreen mark...@gmail.com writes:
 On Tue, Apr 03, 2012 at 05:32:25PM -0400, Tom Lane wrote:
 Well, there are really four levels to the API design:
 * Plain old PQexec.
 * Break down PQexec into PQsendQuery and PQgetResult.
 * Avoid waiting in PQgetResult by testing PQisBusy.
 * Avoid waiting in PQsendQuery (ie, avoid the risk of blocking
   on socket writes) by using PQisnonblocking.

 That's actually a nice overview.  I think our basic disagreement comes
 from how we map the early-exit into those modes.
 I want to think of the early-exit row-processing as 5th and 6th modes:

 * Row-by-row processing on sync connection (PQsendQuery() + ???)
 * Row-by-row processing on async connection (PQsendQuery() + ???)

 But instead you want work with almost no changes on existing modes.

Well, the trouble with the proposed PQgetRow/PQrecvRow is that they only
work safely at the second API level.  They're completely unsafe to use
with PQisBusy, and I think that is a show-stopper.  In your own terms,
the 6th mode doesn't work.

More generally, it's not very safe to change the row processor while a
query is in progress.  PQskipResult can get away with doing so, but only
because the entire point of that function is to lose data, and we don't
much care whether some rows already got handled differently.  For every
other use-case, you have to set up the row processor in advance and
leave it in place, which is a guideline that PQgetRow/PQrecvRow violate.

So I think the only way to use row-by-row processing is to permanently
install a row processor that normally returns zero.  It's possible that
we could provide a predefined row processor that acts that way and
invite people to install it.  However, I think it's premature to suppose
that we know all the details of how somebody might want to use this.
In particular the notion of cloning the PGresult for each row seems
expensive and not obviously more useful than direct access to the
network buffer.  So I'd rather leave it as-is and see if any common
usage patterns arise, then add support for those patterns.

 In particular, I flat out will not accept a design in which that option
 doesn't work unless the current call came via PQisBusy, much less some
 entirely new call like PQhasRowOrResult.  It's unusably fragile (ie,
 timing sensitive) if that has to be true.

 Agreed for PQisBusy, but why is PQhasRowOrResult() fragile?

Because it breaks if you use PQisBusy *anywhere* in the application.
That's not just a bug hazard but a loss of functionality.  I think it's
important to have a pure "is data available" state test function that
doesn't cause data to be consumed from the connection, and there's no
way to have that if there are API functions that change the row
processor setting mid-query.  (Another way to say this is that PQisBusy
ought to be idempotent from the standpoint of the application --- we
know that it does perform work inside libpq, but it doesn't change the
state of the connection so far as the app can tell, and so it doesn't
matter if you call it zero, one, or N times between other calls.)
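
To make the level-3 pattern concrete, here is the canonical loop using
only the existing API (a sketch; a real application would select() on
PQsocket(conn) rather than spinning):

#include <stdio.h>
#include <libpq-fe.h>

static void
run_query_async(PGconn *conn, const char *sql)
{
	PGresult   *res;

	if (!PQsendQuery(conn, sql))
	{
		fprintf(stderr, "%s", PQerrorMessage(conn));
		return;
	}

	for (;;)
	{
		if (!PQconsumeInput(conn))	/* pull in available data, never blocks */
		{
			fprintf(stderr, "%s", PQerrorMessage(conn));
			return;
		}
		if (PQisBusy(conn))
			continue;			/* idempotent: safe to call any number of times */

		res = PQgetResult(conn);	/* guaranteed not to block now */
		if (res == NULL)
			break;				/* NULL means the query is complete */
		PQclear(res);
	}
}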

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] BUG #6572: The example of SPI_execute is bogus

2012-04-04 Thread Tom Lane
umi.tan...@gmail.com writes:
 http://www.postgresql.org/docs/9.1/static/spi-spi-execute.html

 ===
 SPI_execute("INSERT INTO foo SELECT * FROM bar", false, 5);
 will allow at most 5 rows to be inserted into the table.
 ===

 This seems not true unless I'm missing something.

Hmm ... that did work as described, until we broke it :-(.  This is an
oversight in the 9.0 changes that added separate ModifyTable nodes to
plan trees.  ModifyTable doesn't return after each updated row, unless
there's a RETURNING clause; which means that the current_tuple_count
check logic in ExecutePlan() no longer stops execution as intended.
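
For reference, the check in question is the row-count test in
ExecutePlan()'s main loop, roughly (paraphrased, not the verbatim
source):

	/* paraphrase of the ExecutePlan() loop in execMain.c */
	for (;;)
	{
		TupleTableSlot *slot = ExecProcNode(planstate);

		if (TupIsNull(slot))
			break;				/* plan exhausted */

		(*dest->receiveSlot) (slot, dest);	/* hand tuple to destination */

		current_tuple_count++;
		if (numberTuples && numberTuples == current_tuple_count)
			break;				/* tcount limit -- never reached when
								 * ModifyTable eats all the rows itself */
	}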

Given the lack of complaints since 9.0, maybe we should not fix this
but just redefine the new behavior as being correct?  But it seems
mighty inconsistent that the tuple limit would apply if you have
RETURNING but not when you don't.  In any case, the ramifications
are wider than one example in the SPI docs.

Thoughts?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] bugfix for cursor arguments in named notation

2012-04-04 Thread Tom Lane
Yeb Havinga yebhavi...@gmail.com writes:
 Using a cursor argument name equal to another plpgsql variable results 
 in the error:
  cursor "..." has no argument named "..."

 The attached patch fixes that.

 Instead of solving the issue like is done in the patch, another way 
 would be to expose internal_yylex() so that could be used instead of 
 yylex() by read_cursor_args when reading the argument name, and would 
 always return the argument name in yylval.str.

I think a better way would be to temporarily set
plpgsql_IdentifierLookup to IDENTIFIER_LOOKUP_DECLARE, so as to suppress
variable name lookup in the first place.  The patch as you have it seems
to me to make too many assumptions about what name lookup might produce.
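
Concretely, something like this in read_cursor_args() (a sketch of the
idea, untested):

	/* suppress variable-name lookup while reading the argument name */
	IdentifierLookup save_lookup = plpgsql_IdentifierLookup;

	plpgsql_IdentifierLookup = IDENTIFIER_LOOKUP_DECLARE;
	yylex();
	argname = yylval.str;		/* now always comes back as plain text */
	plpgsql_IdentifierLookup = save_lookup;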

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Question regarding SSL code in backend and frontend

2012-04-04 Thread Tom Lane
Tatsuo Ishii is...@postgresql.org writes:
 Those code fragment judges the return value from
 SSL_read(). secure_read() does retrying when SSL_ERROR_WANT_READ *and*
 SSL_ERROR_WANT_WRITE returned. However, pqsecure_read() does not retry
 when SSL_ERROR_WANT_READ. It seems they are not consistent. Comments?

There's no particular reason why they should be consistent, I think.
The assumptions for nonblocking operation are different.

I rather wonder whether the #ifdef WIN32 bit in the backend isn't dead
code though.  If the port isn't in nonblock mode, we shouldn't really
get here at all, should we?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Question regarding SSL code in backend and frontend

2012-04-04 Thread Magnus Hagander
On Wed, Apr 4, 2012 at 17:57, Tom Lane t...@sss.pgh.pa.us wrote:
 Tatsuo Ishii is...@postgresql.org writes:
 Those code fragment judges the return value from
 SSL_read(). secure_read() does retrying when SSL_ERROR_WANT_READ *and*
 SSL_ERROR_WANT_WRITE returned. However, pqsecure_read() does not retry
 when SSL_ERROR_WANT_READ. It seems they are not consistent. Comments?

 There's no particular reason why they should be consistent, I think.
 The assumptions for nonblocking operation are different.

 I rather wonder whether the #ifdef WIN32 bit in the backend isn't dead
 code though.  If the port isn't in nonblock mode, we shouldn't really
 get here at all, should we?

Not in a position to look at the code right now, but first guess - we
*always* have the underlying socket in nonblocking mode on win32, so
we can deliver signals properly. It becomes blocking only through our
APIs, but the SSL stuff bypasses that in some places - such as this
one calling win32_waitforsinglesocket...

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] invalid search_path complaints

2012-04-04 Thread Tom Lane
Scott Mead sco...@openscg.com writes:
 Personally, I feel that if unix will let you be stupid:
 $ export PATH=/usr/bin:/this/invalid/crazy/path
 $ echo $PATH
 /usr/bin:/this/invalid/crazy/path
 PG should trust that I'll get where I'm going eventually :)

Well, that's an interesting analogy.  Are you arguing that we should
always accept any syntactically-valid search_path setting, no matter
whether the mentioned schemas exist?  It wouldn't be hard to do that.
The fun stuff comes in when you try to say "I want a warning in these
contexts but not those", because (a) the behavior you think you want
turns out to be pretty squishy, and (b) it's not always clear from the
implementation level what the context is.
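
(For concreteness: "syntactically valid" here just means the value
splits cleanly as a comma-separated identifier list.  A check that stops
there would be roughly the following sketch, using the existing
SplitIdentifierString helper and doing no schema lookups; the function
name is invented.)

static bool
search_path_is_valid_syntax(const char *value)
{
	char	   *rawname = pstrdup(value);	/* need a modifiable copy */
	List	   *namelist;
	bool		ok;

	ok = SplitIdentifierString(rawname, ',', &namelist);

	pfree(rawname);
	list_free(namelist);
	return ok;
}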

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] invalid search_path complaints

2012-04-04 Thread Scott Mead
On Wed, Apr 4, 2012 at 12:02 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Scott Mead sco...@openscg.com writes:
  Personally, I feel that if unix will let you be stupid:
  $ export PATH=/usr/bin:/this/invalid/crazy/path
  $ echo $PATH
  /usr/bin:/this/invalid/crazy/path
  PG should trust that I'll get where I'm going eventually :)

 Well, that's an interesting analogy.  Are you arguing that we should
 always accept any syntactically-valid search_path setting, no matter
 whether the mentioned schemas exist?  It wouldn't be hard to do that.


   I think we should always accept a syntactically valid search_path.


 The fun stuff comes in when you try to say "I want a warning in these
 contexts but not those", because (a) the behavior you think you want
 turns out to be pretty squishy, and (b) it's not always clear from the
 implementation level what the context is.


ISTM that just issuing a warning whenever you set the search_path (no
matter which context) feels valid (and better than the above *nix
behavior).  I would personally be opposed to seeing it on login however.

--Scott




regards, tom lane



Re: [HACKERS] log chunking broken with large queries under load

2012-04-04 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 On 04/02/2012 01:03 PM, Tom Lane wrote:
 When I said list, I meant a List *.  No fixed size.

 Ok, like this?

I think this could use a bit of editorialization (I don't think the
"stripe" terminology is still applicable, in particular), but the
general idea seems OK.

Does anyone feel that it's a bad idea that list entries are never
reclaimed?  In the worst case a transient load peak could result in
a long list that permanently adds search overhead.  Not sure if it's
worth the extra complexity to delete a list cell that's no longer
needed, rather than leaving it present and empty.

 Do we consider this a bug fix, to be backpatched?

Yes, definitely.

I think I'd like to have a go at coding it the other way (with
release of list entries), just to see if that comes out cleaner
or uglier than this way.  If you don't mind I'll pick this up
and commit whichever way turns out better.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Question regarding SSL code in backend and frontend

2012-04-04 Thread Tom Lane
Magnus Hagander mag...@hagander.net writes:
 On Wed, Apr 4, 2012 at 17:57, Tom Lane t...@sss.pgh.pa.us wrote:
 I rather wonder whether the #ifdef WIN32 bit in the backend isn't dead
 code though.  If the port isn't in nonblock mode, we shouldn't really
 get here at all, should we?

 Not in a position to look at the code right now, but first guess - we
 *always* have the underlying socket in nonblocking mode on win32, so
 we can deliver signals properly.

Ah, I think you're right.  So actually, the retry looping is expected
to be never invoked in the Unix case.  If it did happen, it'd be a busy
wait loop, which would probably be a bad thing ... but it shouldn't
happen, and it's not clear it's worth adding any code to consider the
possibility more carefully.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] invalid search_path complaints

2012-04-04 Thread Tom Lane
Scott Mead sco...@openscg.com writes:
 On Wed, Apr 4, 2012 at 12:02 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Well, that's an interesting analogy.  Are you arguing that we should
 always accept any syntactically-valid search_path setting, no matter
 whether the mentioned schemas exist?  It wouldn't be hard to do that.

I think we should always accept a syntactically valid search_path.

I could live with that.

 The fun stuff comes in when you try to say "I want a warning in these
 contexts but not those", because (a) the behavior you think you want
 turns out to be pretty squishy, and (b) it's not always clear from the
 implementation level what the context is.

 ISTM that just issuing a warning whenever you set the search_path (no
 matter which context) feels valid (and better than the above *nix
 behavior).  I would personally be opposed to seeing it on login however.

You're getting squishy on me ...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: improve SLRU replacement algorithm

2012-04-04 Thread Robert Haas
On Wed, Apr 4, 2012 at 8:00 AM, Robert Haas robertmh...@gmail.com wrote:
 There's some apparent regression on the single-client test, but I'm
 inclined to think that's a testing artifact of some kind and also
 probably not very important.  It would be worth paying a small price
 in throughput to avoid many-second entire-database stalls, but on this
 test throughput actually goes up in all cases other than a single
 client; and it's hard to take the single client case seriously as a
 regression anyway because if there's only one thing running, the only
 effect of this patch is to slightly increase the amount of CPU effort
 that we spend before replacing the same buffer we would have
 replaced anyway.  There's no way that's enough to cut 3% off
 performance; I think the explanation must be something like, say,
 autovacuum runs a bit faster because it doesn't hang as much, but then
 that triggers a checkpoint sooner; or something's shuffled itself
 around across cache lines in a way that works out a little worse; or
 maybe it's just that the patched code was tested second.

I reran the single client tests and this time got:

m01 tps = 1357.485132 (including connections establishing)
m01 tps = 1425.967027 (including connections establishing)
m01 tps = 1381.468519 (including connections establishing)
s01 tps = 1411.590074 (including connections establishing)
s01 tps = 1374.938182 (including connections establishing)
s01 tps = 1402.680618 (including connections establishing)

...which seems like ample evidence that there's no real regression
here, if anyone was still worried.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] invalid search_path complaints

2012-04-04 Thread Robert Haas
On Wed, Apr 4, 2012 at 12:22 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Scott Mead sco...@openscg.com writes:
 On Wed, Apr 4, 2012 at 12:02 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Well, that's an interesting analogy.  Are you arguing that we should
 always accept any syntactically-valid search_path setting, no matter
 whether the mentioned schemas exist?  It wouldn't be hard to do that.

    I think we should always accept a syntactically valid search_path.

 I could live with that.

 The fun stuff comes in when you try to say "I want a warning in these
 contexts but not those", because (a) the behavior you think you want
 turns out to be pretty squishy, and (b) it's not always clear from the
 implementation level what the context is.

 ISTM that just issuing a warning whenever you set the search_path (no
 matter which context) feels valid (and better than the above *nix
 behavior).  I would personally be opposed to seeing it on login however.

 You're getting squishy on me ...

My feeling on this is that it's OK to warn if the search_path is set
to something that's not valid, and it might also be OK to not warn.
Right now we emit a NOTICE and I don't feel terribly upset about that;
even if I did, I don't know that it's really worth breaking backward
compatibility for.

The WARNING on login is more troubling to me, because it's
misdirected.  The warning is the result either of a setting that was
never valid in the first place, or of a setting that became invalid
when a schema was renamed or dropped.  The people responsible for the
breakage are not necessarily the same people being warned; the people
being warned may not even have power to fix the problem.

I think that part of the issue here is that it feels to you, as a
developer, that the per-user and per-database settings are applied on
top of the default from postgresql.conf.  But the user doesn't think
of it that way, I think.  To them, they expect the per-user or
per-database setting to be there from the beginning, even though
that might not really be possible from an implementation perspective.
So they don't think of it being applied at startup time, and the
warning seems like a random intrusion (aside from possibly being log
spam).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] poll: CHECK TRIGGER?

2012-04-04 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 I don't think I'm getting my point across by explaining, so here's a 
 modified version of the patch that does what I was trying to say.

Minor side point: some of the diff noise in this patch comes from
s/copy_plpgsql_datum/plpgsql_copy_plpgsql_datum/, which seems entirely
useless.  The name already contains plpgsql, and even if it didn't,
there is no particular reason for plpgsql to worry about polluting
global symbol namespace.  Nothing else resolves against its symbols
anyway, at least not on any platform we claim to support.  I would
therefore also argue against the other renamings like
s/exec_move_row/plpgsql_exec_move_row/.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Fwd: [HACKERS] HTTP Frontend? (and a brief thought on materialized views)

2012-04-04 Thread Christopher Browne
On Wed, Apr 4, 2012 at 9:53 AM, Dobes Vandermeer dob...@gmail.com wrote:
 I think there is something to be gained by having a first-class concept of a
 document in the database.  It might save some trouble managing
 parent/child relations, versioning, things like that.

Methinks this needs a *lot* more specific description of what you mean
by "document".

The thought that is occurring to me in this context is that the
document is simply an image (.png, .jpeg, .pdf) of a paper document
which might get associated with some of the 'business transactions' in
the database.

Thus, I'd be happy to be able to capture images of invoices, receipts,
and such, and associate them with the highly structured data for the
accounting transactions that they are associated with.

I'm not sure that this is the same thing that you are thinking of.  I
suspect that you might be thinking of a document as being a loosely
structured set of data.  Though with the similarity that such
documents would get associated with the highly structured accounting
transaction data that they relate to.

It's not a ludicrously bad idea to have a series of supplementary data
tables that can get tied to transactions...

create table supplementary_bitmap (
 h_id serial primary key,
 created_on timestamptz not null default now(),
 metadata text not null,
 bitmap bytea
);
create table supplementary_xml (
 x_id serial primary key,
 created_on timestamptz not null default now(),
 metadata text not null,
 data xml
);
create table supplementary_hstore (
 hs_id serial primary key,
 created_on timestamptz not null default now(),
 metadata text not null,
 data hstore
);

And add some optional references to these to some of your tables.

That doesn't notably lend itself to doing a lot of work with the
relationships between bits of supplementary data.

There's not much that I *can* do if I'm attaching images of pictures I
took of invoices with my phone; there's not much about that that's
amenable to further automatic analysis.  It's still pretty useful, if
someone wants some proof that there was an invoice; I can produce a
copy that's tied to the transaction.  That's rather less than OODBMS
:-).
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] poll: CHECK TRIGGER?

2012-04-04 Thread Heikki Linnakangas

On 04.04.2012 19:32, Tom Lane wrote:

Heikki Linnakangas heikki.linnakan...@enterprisedb.com  writes:

I don't think I'm getting my point across by explaining, so here's a
modified version of the patch that does what I was trying to say.


Minor side point: some of the diff noise in this patch comes from
s/copy_plpgsql_datum/plpgsql_copy_plpgsql_datum/, which seems entirely
useless.  The name already contains plpgsql, and even if it didn't,
there is no particular reason for plpgsql to worry about polluting
global symbol namespace.  Nothing else resolves against its symbols
anyway, at least not on any platform we claim to support.  I would
therefore also argue against the other renamings like
s/exec_move_row/plpgsql_exec_move_row/.


Agreed. Looking closer, I'm not sure we even need to expose 
exec_move_row() to pl_check.c. It's only used to initialize row-type 
function arguments to NULL. But variables that are not explicitly 
initialized are NULL anyway, and the checker shouldn't use the values 
stored in variables for anything, so I believe that initialization in 
function_check() can be replaced with something much simpler or removed 
altogether.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] poll: CHECK TRIGGER?

2012-04-04 Thread Pavel Stehule
2012/4/4 Heikki Linnakangas heikki.linnakan...@enterprisedb.com:
 On 04.04.2012 19:32, Tom Lane wrote:

 Heikki Linnakangas heikki.linnakan...@enterprisedb.com  writes:

 I don't think I'm getting my point across by explaining, so here's a
 modified version of the patch that does what I was trying to say.


 Minor side point: some of the diff noise in this patch comes from
 s/copy_plpgsql_datum/plpgsql_copy_plpgsql_datum/, which seems entirely
 useless.  The name already contains plpgsql, and even if it didn't,
 there is no particular reason for plpgsql to worry about polluting
 global symbol namespace.  Nothing else resolves against its symbols
 anyway, at least not on any platform we claim to support.  I would
 therefore also argue against the other renamings like
 s/exec_move_row/plpgsql_exec_move_row/.


 Agreed. Looking closer, I'm not sure we even need to expose exec_move_row()
 to pl_check.c. It's only used to initialize row-type function arguments to
 NULL. But variables that are not explicitly initialized are NULL anyway, and
 the checker shouldn't use the values stored in variables for anything, so I
 believe that initialization in function_check() can be replaced with
 something much simpler or removed altogether.

+1

Pavel



 --
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] invalid search_path complaints

2012-04-04 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Wed, Apr 4, 2012 at 12:22 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 You're getting squishy on me ...

 My feeling on this is that it's OK to warn if the search_path is set
 to something that's not valid, and it might also be OK to not warn.
 Right now we emit a NOTICE and I don't feel terribly upset about that;
 even if I did, I don't know that it's really worth breaking backward
 compatibility for.

 The WARNING on login is more troubling to me, because it's
 misdirected.  The warning is the result either of a setting that was
 never valid in the first place, or of a setting that became invalid
 when a schema was renamed or dropped.  The people responsible for the
 breakage are not necessarily the same people being warned; the people
 being warned may not even have power to fix the problem.

Well, we don't have any ability to nag the people responsible,
assuming that those really are different people.  The real question to
me is whether we should produce no warning whatsoever despite the fact
that the setting is failing to operate as intended.  That's not
particularly cool either IMO.  I answered a question in pgsql-novice
just a couple hours ago that I think demonstrates very well the problems
with failing to issue any message about something not doing what it
could be expected to:
http://archives.postgresql.org/pgsql-novice/2012-04/msg8.php

Now, Scott's comment seems to me to offer a principled way out of this:
if we define the intended semantics of search_path as being similar
to the traditional understanding of Unix PATH, then it's not an error
or even unexpected to have references to nonexistent schemas in there.
But as soon as you say "I want warnings in some cases", I think we have
a mess that nobody is ever going to be happy with, because there will
never be a clear and correct definition of which cases should get
warnings.

In any case, I think we might be converging on an agreement that the
setting should be *applied* if syntactically correct, whether or not
we are agreed about producing a NOTICE or WARNING for unknown schemas.
If I have not lost track, that is what happened before 9.1 but is not
what is happening now, and we should change it to fix that compatibility
break, independently of the question about messaging.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] log chunking broken with large queries under load

2012-04-04 Thread Andrew Dunstan



On 04/04/2012 12:13 PM, Tom Lane wrote:

Andrew Dunstan and...@dunslane.net  writes:

On 04/02/2012 01:03 PM, Tom Lane wrote:

When I said list, I meant a List *.  No fixed size.

Ok, like this?

I think this could use a bit of editorialization (I don't think the
stripe terminology is still applicable, in particular), but the
general idea seems OK.

Does anyone feel that it's a bad idea that list entries are never
reclaimed?  In the worst case a transient load peak could result in
a long list that permanently adds search overhead.  Not sure if it's
worth the extra complexity to delete a list cell that's no longer
needed, rather than leaving it present and empty.


Me either. The logic could possibly be something simple when we free a
node, like "while the list tail is an available node, prune the tail". But
as you say, it might not be worth it.
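
In outline, over a generic singly-linked list (hypothetical types, just
to show the shape of it):

#include <stdlib.h>

typedef struct SaveBuf
{
	int			pid;			/* 0 means this entry is available */
	struct SaveBuf *next;
} SaveBuf;

/* drop the trailing run of available nodes from the list */
static void
prune_tail(SaveBuf **head)
{
	SaveBuf   **last_used = head;	/* link following the last busy node */
	SaveBuf   **p;

	for (p = head; *p != NULL; p = &(*p)->next)
	{
		if ((*p)->pid != 0)
			last_used = &(*p)->next;
	}

	while (*last_used != NULL)
	{
		SaveBuf    *doomed = *last_used;

		*last_used = doomed->next;
		free(doomed);
	}
}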





Do we consider this a bug fix, to be backpatched?

Yes, definitely.

I think I'd like to have a go at coding it the other way (with
release of list entries), just to see if that comes out cleaner
or uglier than this way.  If you don't mind I'll pick this up
and commit whichever way turns out better.





Go for it.

cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: improve SLRU replacement algorithm

2012-04-04 Thread Greg Stark
On Wed, Apr 4, 2012 at 1:00 PM, Robert Haas robertmh...@gmail.com wrote:
 , everybody's next few CLOG requests hit some other
 buffer but eventually the long-I/O-in-progress buffer again becomes
 least recently used and the next CLOG eviction causes a second backend
 to begin waiting for that buffer.

This still sounds like evidence that the slru is just too small for
this transaction rate. Otherwise there would be some other buffer that
would be accessed similarly infrequently.

Your fix sounds right to me but I would hope it should be fixing
something that would only happen rarely, not every time there's a
write. It sounds like the slru is thrashing quite a bit more than the
code anticipates.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] log chunking broken with large queries under load

2012-04-04 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 On 04/04/2012 12:13 PM, Tom Lane wrote:
 Does anyone feel that it's a bad idea that list entries are never
 reclaimed?  In the worst case a transient load peak could result in
 a long list that permanently adds search overhead.  Not sure if it's
 worth the extra complexity to delete a list cell that's no longer
 needed, rather than leaving it present and empty.

 Me either. The logic could possibly be something simple when we free a
 node, like "while the list tail is an available node, prune the tail". But
 as you say, it might not be worth it.

The idea I had in mind was to compensate for adding list-removal logic
by getting rid of the concept of an unused entry.  If the removal is
conditional then you can't do that and you end up with the complications
of both methods.  Anyway I've not tried to code it yet.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: improve SLRU replacement algorithm

2012-04-04 Thread Greg Stark
On Wed, Apr 4, 2012 at 1:00 PM, Robert Haas robertmh...@gmail.com wrote:
 3. I noticed that the blocking described by "slru.c:311 blocked by
 slru.c:405" seemed to be clumpy - I would get a bunch of messages
 about that all at once.  This makes me wonder if the SLRU machinery is
 occasionally making a real bad decision about what page to evict, and
 then everybody piles up waiting for that page to be read back in.
 That is sheer hand-waving at this point and might be complete bologna,
 but I'm hoping to find time to investigate further.

Hm, actually, isn't this something your patch would cause? Is it
possible the clumpy ones are the 129 minus 54 additional blocking on this
lock in the patched code? Did it do that in the unpatched code? And
did it do it with fewer than 16 clients?

Because there are only 16 slru pages and 64 clients, occasionally 16
of the clients will all be reading a page in and someone will try to
flush the very hottest page from the slru. Or I suppose it would happen
sooner as soon as someone gets pushed up into the working set and hits
a hot enough page.

I didn't actually read the patch. I assume you covered the case where
all the pages are in I/O and so there are no eligible pages to flush?

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: improve SLRU replacement algorithm

2012-04-04 Thread Alvaro Herrera

Excerpts from Greg Stark's message of Wed Apr 04 14:11:29 -0300 2012:
 On Wed, Apr 4, 2012 at 1:00 PM, Robert Haas robertmh...@gmail.com wrote:
  , everybody's next few CLOG requests hit some other
  buffer but eventually the long-I/O-in-progress buffer again becomes
  least recently used and the next CLOG eviction causes a second backend
  to begin waiting for that buffer.
 
 This still sounds like evidence that the slru is just too small for
 this transaction rate.

What this statement means to me is that the number of slru buffers
should be configurable, not compile-time fixed.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed dblink using alternate libpq tuple storage

2012-04-04 Thread Kyotaro HORIGUCHI
Hello, this is the new version of the dblink patch.

- Calling dblink_is_busy prevents the row processor from being used.

- Some PGresult leaks fixed.

- Rebased to current head.

 A hack on top of that hack would be to collect the data into a
 tuplestore that contains all text columns, and then convert to the
 correct rowtype during dblink_get_result, but that seems rather ugly
 and not terribly high-performance.

 What I'm currently thinking we should do is just use the old method
 for async queries, and only optimize the synchronous case.

OK, I agree with you except for the performance issue. I'll give up on
using the row processor for async queries when dblink_is_busy is called.

 I thought for awhile that this might represent a generic deficiency
 in the whole concept of a row processor, but probably it's mostly
 down to dblink's rather bizarre API.  It would be unusual I think for
 people to want a row processor that couldn't know what to do until
 after the entire query result is received.

I hope so. Thank you.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


dblink_rowproc_20120405.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] man pages for contrib programs

2012-04-04 Thread Peter Eisentraut
... would be really nice to have.  Especially pgbench and pg_upgrade for
me, but it would be useful to have man pages for everything.

Unfortunately, we can't just replace the sect1's in Appendix F [0]
with refentry's, because the content model of DocBook doesn't allow
that.  (You can't have a mixed sequence of sect1 and refentry, only one
or the other.)

[0] http://www.postgresql.org/docs/devel/static/contrib.html

Which leads to a somewhat related point.  The current content listing in
Appendix F mixes extensions (backend modules) with client and server
programs.  Who can guess which is which here:

...
pg_archivecleanup
pgbench
pg_buffercache
pgcrypto
pg_freespacemap
pgrowlocks
pg_standby
pg_stat_statements
...

I think it would be useful to split this up into three sections:

F.1. Extensions
F.2. Client Applications
F.3. Server Applications

where the first looks like now and the other two contain the refentry
pages.

(This could also serve as a hint to packagers to split their -contrib
packages, so that say installing pgbench doesn't pull in a boatload of
other stuff.)

We could also consider making two separate appendixes.  Maybe that would
result in a better table of contents.

Comments?



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] log chunking broken with large queries under load

2012-04-04 Thread Tom Lane
I wrote:
 The idea I had in mind was to compensate for adding list-removal logic
 by getting rid of the concept of an unused entry.  If the removal is
 conditional then you can't do that and you end up with the complications
 of both methods.  Anyway I've not tried to code it yet.

I concluded this would probably be a loser performance-wise, because it
would add a couple of palloc/pfree cycles to the processing of each
multi-chunk message, whether there was any contention or not.  So I
committed the patch with just some cosmetic cleanups.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed dblink using alternate libpq tuple storage

2012-04-04 Thread Tom Lane
Kyotaro HORIGUCHI horiguchi.kyot...@oss.ntt.co.jp writes:
 What I'm currently thinking we should do is just use the old method
 for async queries, and only optimize the synchronous case.

 OK, I agree with you except for the performance issue. I'll give up on
 using the row processor for async queries when dblink_is_busy is called.

Yeah, that seems like a reasonable idea.


Given the lack of consensus around the suspension API, maybe the best
way to get the underlying libpq patch to a committable state is to take
it out --- that is, remove the return zero option for row processors.
Since we don't have a test case for it in dblink, it's hard to escape
the feeling that we may be expending a lot of effort for something that
nobody really wants, and/or misdesigning it for lack of a concrete use
case.  Is anybody going to be really unhappy if that part of the patch
gets left behind?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] lock_timeout and common SIGALRM framework

2012-04-04 Thread Boszormenyi Zoltan

On 2012-04-04 17:12, Boszormenyi Zoltan wrote:

On 2012-04-04 16:22, Boszormenyi Zoltan wrote:

On 2012-04-04 15:17, Boszormenyi Zoltan wrote:

Hi,

On 2012-04-04 12:30, Boszormenyi Zoltan wrote:

Hi,

attached is a patch to implement a framework to simplify and
correctly nest multiplexing more than two timeout sources
into the same SIGALRM signal handler.

The framework is using a new internal API for timeouts:

bool enable_timeout(TimeoutName tn, int delayms);
bool disable_timeout(TimeoutName tn, bool keep_indicator);
bool disable_all_timeouts(bool keep_indicators);

A timeout source has these features to allow different initialization,
cleanup and check functions and rescheduling:

typedef void (*timeout_init)(TimestampTz, TimestampTz);
typedef void (*timeout_destroy)(bool);
typedef bool (*timeout_check)(void);
typedef TimestampTz (*timeout_start)(void);

typedef struct {
	TimeoutName		index;
	bool			resched_next;
	timeout_init	timeout_init;
	timeout_destroy	timeout_destroy;
	timeout_check	timeout_check;
	timeout_start	timeout_start;
	TimestampTz		fin_time;
} timeout_params;

This makes it possible to differentiate between the standby and
statement timeouts, regular deadlock and standby deadlock using
the same signal handler function.

And finally, this makes it possible to implement the lock_timeout
feature that we at Cybertec implemented more than 2 years ago.

The patch also adds two new tests into prepared_xacts.sql to trigger
the lock_timeout instead of statement_timeout.

Documentation and extensive comments are included.
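To illustrate the intended use, a lock-wait path might arm and disarm the
timers like this (a minimal sketch under my reading of the API above; the
LOCK_TIMEOUT member of TimeoutName and the LockTimeout GUC variable are
assumptions of the sketch, while DeadlockTimeout is the existing GUC):

/* Sketch: arm the deadlock and lock timers before sleeping on a lock. */
enable_timeout(DEADLOCK_TIMEOUT, DeadlockTimeout);
if (LockTimeout > 0)
	enable_timeout(LOCK_TIMEOUT, LockTimeout);

/* ... sleep waiting for the lock ... */

/* Disarm both; false means the "timed out" indicators are reset too. */
disable_timeout(LOCK_TIMEOUT, false);
disable_timeout(DEADLOCK_TIMEOUT, false);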


Second version. Every timeout-related functions are now in a separate
source file, src/backend/storage/timeout.c with accessor functions.
There are no related global variables anymore, only the GUCs.


3rd and (for now) final version.


I lied. This is the final one. I fixed a typo in the documentation
and replaced timeout_start_time (previously static to proc.c)
with get_timeout_start(DEADLOCK_TIMEOUT). This one should
have happened in the second version.


Tidied comments, the time checks in
Check*() functions and function order in timeout.c.





Best regards,
Zoltán Böszörményi


One comment for testers: all the timeout GUC values are given in
milliseconds; the kernel interface (setitimer) and TimestampTz use
microseconds.

The transaction that takes the lock is inherently a read/write one, and by
the time the code reaches ProcSleep(), at least a few tens of microseconds
have already passed since the start of the statement, even on 3GHz+ CPUs.

Not to mention that computers nowadays have high precision timers
and OSs running on them utilize those. So, the time returned by
GetCurrentStatementStartTimestamp() will certainly be different from
GetCurrentTimestamp(). This means that the timeouts' fin_time will also
be different.

Long story short, using the same value for statement_timeout and
lock_timeout (or deadlock_timeout for that matter) means that
statement_timeout will trigger first. The story is different only on
a combination of a fast CPU and an OS with greater-than-millisecond
timer resolution.

Best regards,
Zoltán Böszörményi

--
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
 http://www.postgresql.at/



Re: [HACKERS] man pages for contrib programs

2012-04-04 Thread Alvaro Herrera

Excerpts from Peter Eisentraut's message of Wed Apr 04 15:53:20 -0300 2012:
 ... would be really nice to have.  Especially pgbench and pg_upgrade for
 me, but it would be useful to have man pages for everything.
 
 Unfortunately, we can't just replace the sect1's in Appendix F [0]
 with refentry's, because the content model of DocBook doesn't allow
 that.  (You can't have a mixed sequence of sect1 and refentry, only one
 or the other.)

Hm, would it work to have something like
<sect1> &pgbench; </sect1> <refentry> &pgbench; </refentry>
so that we get both?  Probably with some conditional to avoid duplicate
output in html/pdf.  (Why isn't this a problem for the SPI pages or
dblink?)

 I think it would be useful to split this up into three sections:
 
 F.1. Extensions
 F.2. Client Applications
 F.3. Server Applications
 
 where the first looks like now and the other two contain the refentry
 pages.

+1, but is there something that would not fit in either category?  Not
sure if we have an SGML page for init-scripts for instance.

If you're going to monkey around in this general area, please also look at
the README.  It should probably just go away.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] lock_timeout and common SIGALRM framework

2012-04-04 Thread Alvaro Herrera

I think this patch is doing two things: first touching infrastructure
stuff and then adding lock_timeout on top of that.  Would it work to
split the patch in two pieces?

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parallel pg_dump

2012-04-04 Thread Alvaro Herrera

Excerpts from Joachim Wieland's message of Wed Apr 04 15:43:53 -0300 2012:
 On Wed, Apr 4, 2012 at 8:27 AM, Andrew Dunstan and...@dunslane.net wrote:
  Sure, PrintStatus is just there for now to see what's going on. My
  plan was to remove it entirely in the final patch.
 
  We need that final patch NOW, I think. There is very little time for this
  before it will be too late for 9.2.
 
 Here are updated patches:
 
 - An empty directory for the directory archive format is okay now.
 - Removed PrintStatus().

In general I'm not so sure that removing debugging printouts is the best
thing to do.  They might be helpful if in the future we continue to
rework this code.  How about a #define that turns them into empty
statements instead, for example?  I didn't read carefully to see if the
PrintStatus() calls are reasonable to keep, though.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] poll: CHECK TRIGGER?

2012-04-04 Thread Pavel Stehule
2012/4/4 Heikki Linnakangas heikki.linnakan...@enterprisedb.com:
 On 30.03.2012 12:36, Pavel Stehule wrote:

 2012/3/28 Heikki Linnakangas heikki.linnakan...@enterprisedb.com:

 In prepare_expr(), you use a subtransaction to catch any ERRORs that
 happen

 during parsing the expression. That's a good idea, and I think many of
 the
 check_* functions could be greatly simplified by adopting a similar
 approach. Just ereport() any errors you find, and catch them at the
 appropriate level, appending the error to the output string. Your current
 approach of returning true/false depending on whether there was any
 errors
 seems tedious.


  It cannot be implemented in an AST interpreter without removing some
  requested functionality - fatal_errors.


 I don't think I'm getting my point across by explaining, so here's a
 modified version of the patch that does what I was trying to say. The
 general pattern of the checker functions has been changed. Instead of
 returning a boolean indicating error or no error, with checks for
 fatal_errors scattered around them, the internal checker functions now
 return nothing. Any errors are reported with ereport(), and there is a
 PG_TRY/CATCH block in a couple of carefully chosen places: in check_stmt(),
 so that if you get an error while checking a statement, you continue
 checking on the next statement, and in check_assignment() which is now used
 by check_expr() and a few other helper functions to basically check all
 expressions and SQL statements.

 IMHO this makes the code much more readable, now that the control logic of
 when to return and when to continue is largely gone. A lot of other cleanup
 still needs to be done, I just wanted to demonstrate this ereport+try/catch
 idea with this patch.
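In outline, the pattern Heikki describes looks like this (a sketch only, not
the actual patch; CheckState, check_stmt_guts(), and append_error() are
placeholder names, while the subtransaction and error-data calls are the
stock backend idiom for catching and continuing after an error):

static void
check_stmt(CheckState *cstate, PLpgSQL_stmt *stmt)
{
	MemoryContext oldcontext = CurrentMemoryContext;
	ResourceOwner oldowner = CurrentResourceOwner;

	BeginInternalSubTransaction(NULL);
	MemoryContextSwitchTo(oldcontext);
	PG_TRY();
	{
		/* Check one statement; any problem is simply ereport(ERROR)ed. */
		check_stmt_guts(cstate, stmt);

		ReleaseCurrentSubTransaction();
		MemoryContextSwitchTo(oldcontext);
		CurrentResourceOwner = oldowner;
	}
	PG_CATCH();
	{
		ErrorData  *edata;

		/* Save the error, clean up, and continue with the next statement. */
		MemoryContextSwitchTo(oldcontext);
		edata = CopyErrorData();
		FlushErrorState();

		RollbackAndReleaseCurrentSubTransaction();
		MemoryContextSwitchTo(oldcontext);
		CurrentResourceOwner = oldowner;

		append_error(cstate, edata);	/* placeholder: append to output */
		FreeErrorData(edata);
	}
	PG_END_TRY();
}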

I checked your idea and it should work.

What other cleanup (besides what was mentioned in previous mails) do you have in mind?

Regards

Pavel



 --
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] man pages for contrib programs

2012-04-04 Thread Peter Eisentraut
On Wed, 2012-04-04 at 16:29 -0300, Alvaro Herrera wrote:
  Unfortunately, we can't just replace the sect1's in Appendix F [0]
  with refentry's, because the content model of DocBook doesn't allow
  that.  (You can't have a mixed sequence of sect1 and refentry, only one
  or the other.)
 
 Hm, would it work to have something like 
 <sect1> &pgbench; </sect1> <refentry> &pgbench; </refentry>
 so that we get both?  Probably with some conditional to avoid duplicate
 output in html/pdf.

I don't think I follow what you are trying to do there.

 (Why isn't this a problem for the SPI pages or dblink?)

They don't mix sects and refentries at the same level.

  I think it would be useful to split this up into three sections:
  
  F.1. Extensions
  F.2. Client Applications
  F.3. Server Applications
  
  where the first looks like now and the other two contain the refentry
  pages.
 
 +1, but is there something that would not fit in either category?  Not
 sure if we have a SGML page for init-scripts for instance.

No, everything we have documented fits in those categories.

 If you're going to monkey around in this general, please also look at
 the README.  It should probably just go away.

Indeed.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: improve SLRU replacement algorithm

2012-04-04 Thread Robert Haas
On Wed, Apr 4, 2012 at 1:11 PM, Greg Stark st...@mit.edu wrote:
 On Wed, Apr 4, 2012 at 1:00 PM, Robert Haas robertmh...@gmail.com wrote:
 , everybody's next few CLOG requests hit some other
 buffer but eventually the long-I/O-in-progress buffer again becomes
 least recently used and the next CLOG eviction causes a second backend
 to begin waiting for that buffer.

 This still sounds like evidence that the slru is just too small for
 this transaction rate. Otherwise there would be some other buffer that
 would be accessed similarly infrequently.

 Your fix sounds right to me but I would hope it should be fixing
 something that would only happen rarely, not every time theres a
 write. It sounds like the slru is thrashing quite a bit more than the
 code anticipates.

Yes, the SLRU is thrashing heavily.  In this configuration, there are
32 CLOG buffers.  I just added an elog() every time we replace a
buffer.  Here's a sample of how often that's firing, by second, on
this test (pgbench with 32 clients):

   4191 19:54:21
   4540 19:54:22
   4295 19:54:23
   3931 19:54:24
   4294 19:54:25
    478 19:54:26
     72 19:54:27
    818 19:54:28
    876 19:54:29
   1498 19:54:30
   3526 19:54:31
   1874 19:54:32
    551 19:54:35
   3746 19:54:36
   3846 19:54:37
   3803 19:54:38
   3593 19:54:39
   3016 19:54:40
   3233 19:54:41
   3190 19:54:42
   3291 19:54:43
   5068 19:54:44
   3877 19:54:45
      2 19:54:46
   1678 19:54:47
   1005 19:54:48
    947 19:54:49
   1007 19:54:50
    921 19:54:51
    931 19:54:52
    147 19:54:53
   1103 19:54:54
    898 19:54:55
    674 19:54:56
    274 19:54:57
   1081 19:54:58
   1874 19:54:59
   1067 19:55:00
    328 19:55:01
   1507 19:55:02
   3735 19:55:03
    138 19:55:04
      1 19:55:05
   2667 19:55:09
   5373 19:55:10
   5175 19:55:11
   5062 19:55:12

So, yes, we're thrashing CLOG pretty hard.  But, I think that's a
mostly separate issue.  Reworking the SLRU code so that it can
efficiently handle a larger number of buffers is probably a good thing
to do, but unless we're planning to make the CLOG SLRU so large that
we NEVER have multiple people trying to replace buffers at the same
time, this fix is still necessary and appropriate.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index 3049e01..6f92679 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -390,6 +390,9 @@ SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
 			return slotno;
 		}
 
+		elog(LOG, "SLRU %d reading page %d", shared->ControlLock,
+			 pageno);
+
 		/* We found no match; assert we selected a freeable slot */
 		Assert(shared->page_status[slotno] == SLRU_PAGE_EMPTY ||
 			   (shared->page_status[slotno] == SLRU_PAGE_VALID &&

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] postgres long options without value

2012-04-04 Thread Peter Eisentraut
Every so often I find myself trying to write

postgres -D ... --ssl

or

postgres -D ... --debug-print-plan

which fails, because you need to write --ssl=on or
--debug-print-plan=true etc.

Have others had the same experience?  Would it be worth supporting the
case without value to default to on/true?
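For what it's worth, one way to get that behavior in a getopt_long()-based
parser is to declare such options with optional_argument and substitute "on"
when no value was given. A standalone sketch (this is not postgres's actual
option-parsing code, which handles --name=value itself):

#include <getopt.h>
#include <stdio.h>

int
main(int argc, char *argv[])
{
	static struct option long_options[] = {
		{"ssl", optional_argument, NULL, 1},
		{"debug-print-plan", optional_argument, NULL, 2},
		{NULL, 0, NULL, 0}
	};
	int			c;

	while ((c = getopt_long(argc, argv, "", long_options, NULL)) != -1)
	{
		/* optarg is NULL when the user wrote just "--ssl" */
		const char *value = optarg ? optarg : "on";

		switch (c)
		{
			case 1:
				printf("ssl = %s\n", value);
				break;
			case 2:
				printf("debug_print_plan = %s\n", value);
				break;
		}
	}
	return 0;
}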



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] man pages for contrib programs

2012-04-04 Thread Thom Brown
On 4 April 2012 19:53, Peter Eisentraut pete...@gmx.net wrote:
 ... would be really nice to have.  Especially pgbench and pg_upgrade for
 me, but it would be useful to have man pages for everything.

 Unfortunately, we can't just replace the sect1's in Appendix F [0]
 with refentry's, because the content model of DocBook doesn't allow
 that.  (You can't have a mixed sequence of sect1 and refentry, only one
 or the other.)

 [0] http://www.postgresql.org/docs/devel/static/contrib.html

 Which leads to a somewhat related point.  The current content listing in
 Appendix F mixes extensions (backend modules) with client and server
 programs.  Who can guess which is which here:

 ...
 pg_archivecleanup
 pgbench
 pg_buffercache
 pgcrypto
 pg_freespacemap
 pgrowlocks
 pg_standby
 pg_stat_statements
 ...

 I think it would be useful to split this up into three sections:

 F.1. Extensions
 F.2. Client Applications
 F.3. Server Applications

This is something I raised previously, but it didn't really attract
much comment: http://archives.postgresql.org/pgsql-hackers/2011-10/msg00781.php

+1 to anything that separates these out.  Cramming them into one list
like we currently have is confusing.

-- 
Thom

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Switching to Homebrew as recommended Mac install?

2012-04-04 Thread Jay Levitt

Dave Page wrote:

Exactly - which is why I was objecting to recommending a distribution
of PostgreSQL that came in a packaging system that we were told
changed /usr/local to be world writeable to avoid the use/annoyance of
the standard security measures on the platform.


Well... that's not exactly what happened:

I originally wrote:

POSSIBLE OBJECTIONS/PREREQUISITES

1. homebrew installs everything under /usr/local and makes that user-writeable.


So. :)

Jay

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: bytea_agg

2012-04-04 Thread Peter Eisentraut
On Fri, 2011-12-23 at 19:51 +0200, Peter Eisentraut wrote:
 On Wed, 2011-12-21 at 11:04 +0100, Pavel Stehule wrote:
  this patch adds a bytea_agg aggregation.
  
  It allow fast bytea concatetation.
 
 Why not call it string_agg?  All the function names are the same between
 text and bytea (e.g., ||, substr, position, length).  It would be nice
 not to introduce arbitrary differences.

Here is a patch to do the renaming.  As it stands, it fails the
opr_sanity regression test, because that complains that there are now
two aggregate functions string_agg with different numbers of arguments.
It seems to me that that test should really only complain if the common
argument types of the two aggregates are the same, correct?

diff --git i/doc/src/sgml/func.sgml w/doc/src/sgml/func.sgml
index 34fea16..393cfcd 100644
--- i/doc/src/sgml/func.sgml
+++ w/doc/src/sgml/func.sgml
@@ -10963,10 +10963,10 @@ SELECT NULLIF(value, '(none)') ...
      <row>
       <entry>
        <indexterm>
-        <primary>bytea_agg</primary>
+        <primary>string_agg</primary>
        </indexterm>
        <function>
-         bytea_agg(<replaceable class="parameter">expression</replaceable>)
+         string_agg(<replaceable class="parameter">expression</replaceable>)
        </function>
       </entry>
       <entry>
diff --git i/src/backend/utils/adt/varlena.c w/src/backend/utils/adt/varlena.c
index 65e9af8..c6b351e 100644
--- i/src/backend/utils/adt/varlena.c
+++ w/src/backend/utils/adt/varlena.c
@@ -397,7 +397,7 @@ byteasend(PG_FUNCTION_ARGS)
 }
 
 Datum
-bytea_agg_transfn(PG_FUNCTION_ARGS)
+bytea_string_agg_transfn(PG_FUNCTION_ARGS)
 {
 	StringInfo	state;
 
@@ -415,14 +415,14 @@ bytea_agg_transfn(PG_FUNCTION_ARGS)
 	}
 
 	/*
-	 * The transition type for bytea_agg() is declared to be internal,
+	 * The transition type for string_agg() is declared to be internal,
 	 * which is a pass-by-value type the same size as a pointer.
 	 */
 	PG_RETURN_POINTER(state);
 }
 
 Datum
-bytea_agg_finalfn(PG_FUNCTION_ARGS)
+bytea_string_agg_finalfn(PG_FUNCTION_ARGS)
 {
 	StringInfo	state;
 
diff --git i/src/include/catalog/pg_aggregate.h w/src/include/catalog/pg_aggregate.h
index adda07c..461772c 100644
--- i/src/include/catalog/pg_aggregate.h
+++ w/src/include/catalog/pg_aggregate.h
@@ -229,7 +229,7 @@ DATA(insert ( 2335	array_agg_transfn	array_agg_finalfn		0	2281	_null_ ));
 DATA(insert ( 3538	string_agg_transfn	string_agg_finalfn		0	2281	_null_ ));
 
 /* bytea */
-DATA(insert ( 3545	bytea_agg_transfn	bytea_agg_finalfn		0	2281	_null_ ));
+DATA(insert ( 3545	bytea_string_agg_transfn	bytea_string_agg_finalfn		0	2281	_null_ ));
 
 /*
  * prototypes for functions in pg_aggregate.c
diff --git i/src/include/catalog/pg_proc.h w/src/include/catalog/pg_proc.h
index 49b0754..e1962fe 100644
--- i/src/include/catalog/pg_proc.h
+++ w/src/include/catalog/pg_proc.h
@@ -2433,11 +2433,11 @@ DATA(insert OID = 3536 (  string_agg_finalfn		PGNSP PGUID 12 1 0 0 0 f f f f f f
 DESCR(aggregate final function);
DATA(insert OID = 3538 (  string_agg			PGNSP PGUID 12 1 0 0 0 t f f f f f i 2 0 25 25 25 _null_ _null_ _null_ _null_ aggregate_dummy _null_ _null_ _null_ ));
 DESCR(concatenate aggregate input into a string);
-DATA(insert OID = 3543 (  bytea_agg_transfn		PGNSP PGUID 12 1 0 0 0 f f f f f f i 2 0 2281 2281 17 _null_ _null_ _null_ _null_ bytea_agg_transfn _null_ _null_ _null_ ));
+DATA(insert OID = 3543 (  bytea_string_agg_transfn	PGNSP PGUID 12 1 0 0 0 f f f f f f i 2 0 2281 2281 17 _null_ _null_ _null_ _null_ bytea_string_agg_transfn _null_ _null_ _null_ ));
 DESCR(aggregate transition function);
-DATA(insert OID = 3544 (  bytea_agg_finalfn		PGNSP PGUID 12 1 0 0 0 f f f f f f i 1 0 17 2281 _null_ _null_ _null_ _null_ bytea_agg_finalfn _null_ _null_ _null_ ));
+DATA(insert OID = 3544 (  bytea_string_agg_finalfn	PGNSP PGUID 12 1 0 0 0 f f f f f f i 1 0 17 2281 _null_ _null_ _null_ _null_ bytea_string_agg_finalfn _null_ _null_ _null_ ));
 DESCR(aggregate final function);
-DATA(insert OID = 3545 (  bytea_agg			PGNSP PGUID 12 1 0 0 0 t f f f f f i 1 0 17 17 _null_ _null_ _null_ _null_ aggregate_dummy _null_ _null_ _null_ ));
+DATA(insert OID = 3545 (  string_agg			PGNSP PGUID 12 1 0 0 0 t f f f f f i 1 0 17 17 _null_ _null_ _null_ _null_ aggregate_dummy _null_ _null_ _null_ ));
 DESCR(concatenate aggregate input into a bytea);
 
 /* To ASCII conversion */
diff --git i/src/include/utils/builtins.h w/src/include/utils/builtins.h
index 9fda7ad..201b23e 100644
--- i/src/include/utils/builtins.h
+++ w/src/include/utils/builtins.h
@@ -771,8 +771,8 @@ extern Datum unknownsend(PG_FUNCTION_ARGS);
 
 extern Datum pg_column_size(PG_FUNCTION_ARGS);
 
-extern Datum bytea_agg_transfn(PG_FUNCTION_ARGS);
-extern Datum bytea_agg_finalfn(PG_FUNCTION_ARGS);
+extern Datum bytea_string_agg_transfn(PG_FUNCTION_ARGS);
+extern Datum bytea_string_agg_finalfn(PG_FUNCTION_ARGS);
 extern Datum string_agg_transfn(PG_FUNCTION_ARGS);
 extern Datum string_agg_finalfn(PG_FUNCTION_ARGS);
 
diff --git 

Re: [HACKERS] patch: improve SLRU replacement algorithm

2012-04-04 Thread Simon Riggs
On Wed, Apr 4, 2012 at 1:00 PM, Robert Haas robertmh...@gmail.com wrote:

 I'll do some testing to try to confirm whether this theory is correct
 and whether the above fix helps.

 Very interesting work.


 Having performed this investigation, I've discovered a couple of
 interesting things.  First, SlruRecentlyUsed() is an ineffective way
 of keeping a page from getting reused, because it's called extremely
 frequently, and on these high-velocity tests it takes almost no time
 at all for the most recently used buffer to become the least recently
 used buffer.

Measurement?

 Therefore, SlruRecentlyUsed() doesn't prevent the lock
 pile-up.  In the unpatched code, once a long buffer I/O starts,
 everybody immediately goes into the tank until the I/O finishes.  If
 you patch the code so that the page is marked recently-used before
 beginning the I/O, everybody's next few CLOG requests hit some other
 buffer but eventually the long-I/O-in-progress buffer again becomes
 least recently used and the next CLOG eviction causes a second backend
 to begin waiting for that buffer.  Lather, rinse, repeat, until
 literally every backend is once again waiting on that buffer I/O.  You
 still get the same problem; it just takes slightly longer to develop.

Sounds believable, I just want to make sure we have measured things.

 On reflection, it seems to me that the right fix here is to make
 SlruSelectLRUPage() to avoid selecting a page on which an I/O is
 already in progress.  In general, those I/Os are all writes.  We don't
 end up waiting for reads because all the old CLOG pages we might want
 to read are still in the OS cache.  So reads complete quickly, on this
 test.

I believe that, but if all buffers are I/O busy we should avoid
waiting on a write I/O if possible.

  Writes take a long time, because there we have to actually get
 the data down to disk, and the disk is busy.  But there's no reason
 for a backend doing a replacement to wait for either a read or a write
 that is in progress: once the read or write completes, we're going to
 loop around and repeat the buffer selection process, and most likely
 pick a buffer completely unrelated to the one whose I/O we waited for.
  We might as well just skip the wait and select that other buffer
 immediately.  The attached patch implements that.
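In outline, the selection-loop change being described might look like this
(my sketch of the idea, not the attached patch; it leans on slru.c's
existing page_status flags and LRU counters, and best_slot/best_age are
locals of the surrounding function):

/* Sketch: when picking a victim in SlruSelectLRUPage(), never choose a
 * slot whose I/O is still in progress -- the backend doing that I/O will
 * finish it, and we can pick a different victim right now. */
best_slot = -1;
best_age = -1;
for (slotno = 0; slotno < shared->num_slots; slotno++)
{
	int			age;

	if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
		return slotno;			/* a free slot: use it immediately */

	if (shared->page_status[slotno] == SLRU_PAGE_READ_IN_PROGRESS ||
		shared->page_status[slotno] == SLRU_PAGE_WRITE_IN_PROGRESS)
		continue;				/* skip: don't wait on someone else's I/O */

	age = shared->cur_lru_count - shared->page_lru_count[slotno];
	if (age > best_age)
	{
		best_slot = slotno;		/* least recently used so far */
		best_age = age;
	}
}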

That seems much smarter. I'm thinking this should be back-patched
because it appears to be fairly major, so I'm asking for some more
certainty that everything you say here is valid. No doubt much of it
is valid, but that's not enough.

 Applying this patch does in fact eliminate the stalls.

I'd like to see that measured from a user perspective. It would be
good to see the response time distribution for run with and without
the patch.

 2. I think we might want to revisit Simon's idea of background-writing
 SLRU pages.

Agreed. No longer anywhere near as important. I'll take a little
credit for identifying the right bottleneck, since you weren't a
believer before.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: improve SLRU replacement algorithm

2012-04-04 Thread Simon Riggs
On Wed, Apr 4, 2012 at 6:25 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:

 Excerpts from Greg Stark's message of Wed Apr 04 14:11:29 -0300 2012:
 On Wed, Apr 4, 2012 at 1:00 PM, Robert Haas robertmh...@gmail.com wrote:
  , everybody's next few CLOG requests hit some other
  buffer but eventually the long-I/O-in-progress buffer again becomes
  least recently used and the next CLOG eviction causes a second backend
  to begin waiting for that buffer.

 This still sounds like evidence that the slru is just too small for
 this transaction rate.

 What this statement means to me is that the number of slru buffers
 should be configurable, not compile-time fixed.

I think the compile-time fixed size allows the lookups to be
loop-unrolled and executed in parallel.

Using a parameter makes the lookups slower. Worth testing. Life changes.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: improve SLRU replacement algorithm

2012-04-04 Thread Simon Riggs
On Wed, Apr 4, 2012 at 9:05 PM, Robert Haas robertmh...@gmail.com wrote:

 Yes, the SLRU is thrashing heavily.  In this configuration, there are
 32 CLOG buffers.  I just added an elog() every time we replace a
 buffer.  Here's a sample of how often that's firing, by second, on
 this test (pgbench with 32 clients):

Interesting. You've spoken at length about how this hardly ever happens and
so this can't have any performance effect. That was the reason for
kicking out my patch addressing clog history, wasn't it?

Why is this pgbench run accessing so much unhinted data that is > 1
million transactions old? Do you believe those numbers? Looks weird.

Perhaps we should retest the clog history patch?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres long options without value

2012-04-04 Thread Euler Taveira
On 04-04-2012 17:07, Peter Eisentraut wrote:
 postgres -D ... --debug-print-plan
 
 which fails, because you need to write --ssl=on or
 --debug-print-plan=true etc.
 
 Have others had the same experience?  Would it be worth supporting the
 case without value to default to on/true?
 
Please don't do it. You can be fooled when we change a parameter's default
value across major versions (especially if you have it in a script that is
used with different versions).


-- 
   Euler Taveira de Oliveira - Timbira   http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] log chunking broken with large queries under load

2012-04-04 Thread Andrew Dunstan



On 04/04/2012 03:09 PM, Tom Lane wrote:

I wrote:

The idea I had in mind was to compensate for adding list-removal logic
by getting rid of the concept of an unused entry.  If the removal is
conditional then you can't do that and you end up with the complications
of both methods.  Anyway I've not tried to code it yet.

I concluded this would probably be a loser performance-wise, because it
would add a couple of palloc/pfree cycles to the processing of each
multi-chunk message, whether there was any contention or not.  So I
committed the patch with just some cosmetic cleanups.




OK, thanks for doing this.

cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed dblink using alternate libpq tuple storage

2012-04-04 Thread Marko Kreen
On Wed, Apr 4, 2012 at 10:17 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Given the lack of consensus around the suspension API, maybe the best
 way to get the underlying libpq patch to a committable state is to take
 it out --- that is, remove the return zero option for row processors.
 Since we don't have a test case for it in dblink, it's hard to escape
 the feeling that we may be expending a lot of effort for something that
 nobody really wants, and/or misdesigning it for lack of a concrete use
 case.  Is anybody going to be really unhappy if that part of the patch
 gets left behind?

Agreed.

-- 
marko

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed dblink using alternate libpq tuple storage

2012-04-04 Thread Kyotaro HORIGUCHI
I'm afraid the latest dblink patch does not re-initialize
materialize_needed for the next query.
I will confirm that and send another one if needed in a few hours.

# I need to catch the train I usually get on..

 Hello, this is the new version of the dblink patch.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: improve SLRU replacement algorithm

2012-04-04 Thread Greg Stark
On Wed, Apr 4, 2012 at 9:34 PM, Simon Riggs si...@2ndquadrant.com wrote:
 Why is this pgbench run accessing so much unhinted data that is > 1
 million transactions old? Do you believe those numbers? Looks weird.

I think this is in the nature of the workload pgbench does. Because
the updates are uniformly distributed, not concentrated 90% in 10% of
the buffers like most real-world systems (and I believe pgbench only
does index lookups), the second time a tuple is looked at is going to
average N/2 transactions later, where N is the number of tuples. Given
a scale factor of 300 (30 million pgbench_accounts rows), that's 15
million transactions.

More aggressively hinting other tuples on the page that we have no
other business looking at might help, though that would require extra
finesse to avoid causing extra clog reads. Presumably you would only
want to hint other tuples whose xids were in clog pages that were
actually in memory currently.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: improve SLRU replacement algorithm

2012-04-04 Thread Greg Stark
On Wed, Apr 4, 2012 at 9:05 PM, Robert Haas robertmh...@gmail.com wrote:
 Here's a sample of how often that's firing, by second, on
 this test (pgbench with 32 clients):

   4191 19:54:21
   4540 19:54:22

Hm, so if that's evenly spread out, that's 1/4 ms between slru flushes,
and if each flush takes 5-10 ms, that's going to be 20-40 flushes
going on concurrently.

I'm curious to see a distribution of how many flushes are already
concurrently happening whenever a flush is initiated. This should be
possible to get by counting the number of pages that were skipped in
your patch as it went through the slru.

Also, oops, sorry. I mixed up the 32 clog buffers with the 16 files
that slru.c remembers during a flush to fsync later. I still don't
understand why it doesn't just allocate enough space to remember to
fsync the worst case, which is one file per clog buffer; that would
only be twice as many.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed dblink using alternate libpq tuple storage

2012-04-04 Thread Tom Lane
Marko Kreen mark...@gmail.com writes:
 On Wed, Apr 4, 2012 at 10:17 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Given the lack of consensus around the suspension API, maybe the best
 way to get the underlying libpq patch to a committable state is to take
 it out --- that is, remove the return zero option for row processors.

 Agreed.

Done that way.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed dblink using alternate libpq tuple storage

2012-04-04 Thread Tom Lane
Kyotaro HORIGUCHI horiguchi.kyot...@oss.ntt.co.jp writes:
 I'm afraid the latest dblink patch does not re-initialize
 materialize_needed for the next query.
 I will confirm that and send another one if needed in a few hours.

I've committed a revised version of the previous patch.  I'm not sure
that the case of dblink_is_busy not being used is interesting enough
to justify contorting the logic, and I'm worried about introducing
corner case bugs for that.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: bytea_agg

2012-04-04 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 On Fri, 2011-12-23 at 19:51 +0200, Peter Eisentraut wrote:
 On Wed, 2011-12-21 at 11:04 +0100, Pavel Stehule wrote:
 this patch adds a bytea_agg aggregation.

 Why not call it string_agg?

 Here is a patch to do the renaming.  As it stands, it fails the
 opr_sanity regression test, because that complains that there are now
 two aggregate functions string_agg with different numbers of arguments.
 It seems to me that that test should really only complain if the common
 argument types of the two aggregates are the same, correct?

Uh, no.  That test is there for good and sufficient reasons, as per its
comment:

-- Check that there are not aggregates with the same name and different
-- numbers of arguments.  While not technically wrong, we have a project policy
-- to avoid this because it opens the door for confusion in connection with
-- ORDER BY: novices frequently put the ORDER BY in the wrong place.
-- See the fate of the single-argument form of string_agg() for history.

The renaming you propose would only be acceptable to those who have
forgotten that history.  I haven't.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: improve SLRU replacement algorithm

2012-04-04 Thread Tom Lane
Greg Stark st...@mit.edu writes:
 On Wed, Apr 4, 2012 at 9:34 PM, Simon Riggs si...@2ndquadrant.com wrote:
 Why is this pgbench run accessing so much unhinted data that is > 1
 million transactions old? Do you believe those numbers? Looks weird.

 I think this is in the nature of the workload pgbench does. Because
 the updates are uniformly distributed, not concentrated 90% in 10% of
 the buffers like most real-world systems (and I believe pgbench only
 does index lookups), the second time a tuple is looked at is going to
 average N/2 transactions later, where N is the number of tuples.

That's a good point, and it makes me wonder whether pgbench is the right
test case to be micro-optimizing around.  It would be a good idea to at
least compare the numbers for something with more locality of reference.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: improve SLRU replacement algorithm

2012-04-04 Thread Josh Berkus
On 4/4/12 4:02 PM, Tom Lane wrote:
 Greg Stark st...@mit.edu writes:
 On Wed, Apr 4, 2012 at 9:34 PM, Simon Riggs si...@2ndquadrant.com wrote:
 Why is this pgbench run accessing so much unhinted data that is > 1
 million transactions old? Do you believe those numbers? Looks weird.
 
 I think this is in the nature of the workload pgbench does. Because
 the updates are uniformly distributed, not concentrated 90% in 10% of
 the buffers like most real-world systems (and I believe pgbench only
 does index lookups), the second time a tuple is looked at is going to
 average N/2 transactions later, where N is the number of tuples.
 
 That's a good point, and it makes me wonder whether pgbench is the right
 test case to be micro-optimizing around.  It would be a good idea to at
 least compare the numbers for something with more locality of reference.

Jignesh, would DVDstore help for this?


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Faster compression, again

2012-04-04 Thread Daniel Farina
On Tue, Apr 3, 2012 at 7:29 AM, Huchev hugochevr...@gmail.com wrote:
 For a C implementation, it could be interesting to consider the LZ4 algorithm, since
 it is written natively in this language. In contrast, Snappy has been ported
 to C by Andy from the original C++ Google code, which also translates into
 less extensive usage and tests.

From what I can tell, the C implementation of snappy has more tests
than this LZ4 implementation, including a fuzz tester.  It's a
maintained part of Linux as well, and used for btrfs --- this is why
it was ported.  The high compression version of LZ4 is apparently
LGPL.  And, finally, there is the issue of patents: snappy has several
multi-billion dollar companies that can be held liable (originator
Google, as well as anyone connected to Linux), and to the best of my
knowledge, nobody has been held to extortion yet.

Consider me unconvinced as to this line of argument.

-- 
fdr

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: bytea_agg

2012-04-04 Thread Greg Stark
On Wed, Apr 4, 2012 at 11:59 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 The renaming you propose would only be acceptable to those who have
 forgotten that history.  I haven't.

I had. I looked it up
http://archives.postgresql.org/pgsql-bugs/2010-08/msg00044.php

That was quite a thread.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: improve SLRU replacement algorithm

2012-04-04 Thread Robert Haas
On Wed, Apr 4, 2012 at 4:23 PM, Simon Riggs si...@2ndquadrant.com wrote:
 Measurement?

 Sounds believable, I just want to make sure we have measured things.

Yes, I measured things.  I didn't post the results because they're
almost identical to the previous set of results which I already
posted.  That is, I wrote the patch; I ran it through the
instrumentation framework; the same long waits with the same set of
file/line combinations were still present.  Then I wrote the patch
that is attached to the OP, and also tested that, and those long waits
went away completely.

 I believe that, but if all buffers are I/O busy we should avoid
 waiting on a write I/O if possible.

I thought about that, but I don't see that there's any point in
further complicating the algorithm.  The current patch eliminates ALL
the long waits present in this code path, which means that the
situation where every CLOG buffer is I/O-busy at the same time either
never happens, or never causes any significant stalls.  I think it's a
bad idea to make this any more complicated than is necessary to do the
right thing in real-world cases.

 That seems much smarter. I'm thinking this should be back-patched
 because it appears to be fairly major, so I'm asking for some more
 certainty that everything you say here is valid. No doubt much of it
 is valid, but that's not enough.

Yeah, I was thinking about that.  What we're doing right now seems
pretty stupid, so maybe it's worth considering a back-patch.  OTOH,
I'm usually loath to tinker with performance in stable releases.
I'll defer to the opinions of others on this point.

 Applying this patch does in fact eliminate the stalls.

 I'd like to see that measured from a user perspective. It would be
 good to see the response time distribution for run with and without
 the patch.

My feeling is that you're not going to see very much difference in a
latency-by-second graph, because XLogInsert is responsible for lots
and lots of huge stalls also.  That's going to mask the impact of
fixing this problem.  However, it's not much work to run the test, so
I'll do that.

 2. I think we might want to revisit Simon's idea of background-writing
 SLRU pages.

 Agreed. No longer anywhere near as important. I'll take a little
 credit for identifying the right bottleneck, since you weren't a
 believer before.

I don't think I ever said it was a bad idea; I just couldn't measure
any benefit.  I think now we know why, or at least have a clue; and
maybe some ideas about how to measure it better.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: improve SLRU replacement algorithm

2012-04-04 Thread Robert Haas
On Wed, Apr 4, 2012 at 4:34 PM, Simon Riggs si...@2ndquadrant.com wrote:
 Interesting. You've spoken at length about how this hardly ever happens and
 so this can't have any performance effect. That was the reason for
 kicking out my patch addressing clog history, wasn't it?

Uh, no, the reason for kicking out your clog history patch was that it
caused throughput to drop by a factor of 3 on a pgbench test at scale
factor 300.  I assume you've got a bug there somewhere, or maybe
there's some other effect that hasn't been quantified.

 Why is this pgbench run accessing so much unhinted data that is > 1
 million transactions old? Do you believe those numbers? Looks weird.

Seems pretty normal to me, for the reasons Greg Stark states.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: improve SLRU replacement algorithm

2012-04-04 Thread Robert Haas
On Wed, Apr 4, 2012 at 7:02 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Greg Stark st...@mit.edu writes:
 On Wed, Apr 4, 2012 at 9:34 PM, Simon Riggs si...@2ndquadrant.com wrote:
 Why is this pgbench run accessing so much unhinted data that is > 1
 million transactions old? Do you believe those numbers? Looks weird.

 I think this is in the nature of the workload pgbench does. Because
 the updates are uniformly distributed, not concentrated 90% in 10% of
 the buffers like most real-world systems (and I believe pgbench only
 does index lookups), the second time a tuple is looked at is going to
 average N/2 transactions later, where N is the number of tuples.

 That's a good point, and it makes me wonder whether pgbench is the right
 test case to be micro-optimizing around.  It would be a good idea to at
 least compare the numbers for something with more locality of reference.

I agree that there are other benchmarks that are worth optimizing for,
but this particular change is more in the nature of a bug fix.  The
current code is waiting for an I/O on buffer A when there's no real
need, and afterwards we're going to proceed to NOT select buffer A anyway
(or at least, select it with no more probability than any other buffer).

I don't think we're micro-optimizing, either.  I don't consider
avoiding a 10-second cessation of all database activity to be a
micro-optimization even on a somewhat artificial benchmark.

One other thing to think about is that pgbench at scale factor 300 is
not exactly a large working set.  You could easily imagine a
real-world data set that is more the size of scale factor 3000, and
10% of it is hot, and you'd have pretty much the same problem.  The
indexes would be a little deeper and so on, but I see no reason why
you wouldn't be able to reproduce this effect with the right test
set-up.  I am sure there will come a point when we've learned as much
as we can from pgbench and must graduate to more complex benchmarks to
have any hope of finding problems worth fixing, but we are surely
still quite a long ways off from that happy day.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed dblink using alternate libpq tuple storage

2012-04-04 Thread Kyotaro HORIGUCHI
  I'm afraid the latest dblink patch does not re-initialize
  materialize_needed for the next query.

I've found no need to worry about the re-initializing issue.

 I've committed a revised version of the previous patch.

Thank you for that. 

 I'm not sure that the case of dblink_is_busy not being used is
 interesting enough to justify contorting the logic, and I'm
 worried about introducing corner case bugs for that.

I'm wary of ending up in an indeterminate state from mixing sync and
async queries, or from an API call sequence for an async query that
falls outside my expectations (which are rather narrow).

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

== My e-mail address has been changed since Apr. 1, 2012.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] invalid search_path complaints

2012-04-04 Thread Robert Haas
On Wed, Apr 4, 2012 at 12:47 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Now, Scott's comment seems to me to offer a principled way out of this:
 if we define the intended semantics of search_path as being similar
 to the traditional understanding of Unix PATH, then it's not an error
 or even unexpected to have references to nonexistent schemas in there.
 But as soon as you say I want warnings in some cases, I think we have
 a mess that nobody is ever going to be happy with, because there will
 never be a clear and correct definition of which cases should get
 warnings.

I'm not sure I'm ready to sign on the dotted line with respect to
every aspect of your argument here, but that definition seems OK to
me.  In practice it's likely that a lot of the NOTICEs we emit now
will only be seen when restoring dumps, and certainly in that case
it's just junk anyway.  So I think this would be fine.

 In any case, I think we might be converging on an agreement that the
 setting should be *applied* if syntactically correct, whether or not
 we are agreed about producing a NOTICE or WARNING for unknown schemas.
 If I have not lost track, that is what happened before 9.1 but is not
 what is happening now, and we should change it to fix that compatibility
 break, independently of the question about messaging.

Sounds that way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pg_upgrade improvements

2012-04-04 Thread Harold Giménez
Hi all,

I've written a pg_upgrade wrapper for upgrading our users (heroku) to
postgres 9.1. In the process I encountered a specific issue that could
easily be improved. We've had this process work consistently for many users
both internal and external, with the exception of just a few for whom the
process failed and required manual handholding.

Before it performs the upgrade, the pg_upgrade program starts the old
cluster, does various checks, and then attempts to stop it. On occasion
stopping the cluster fails - I've posted command output on a gist [1].
Manually running the pg_upgrade shortly afterwards succeeds. We believe
stopping the cluster times out because there are other connections to the
cluster that are established in that small window. There could be incoming
connections for a number of reasons: either the user or the user's
applications are reestablishing connections, or something like collectd on
the localhost attempts to connect during that small window.

Possible workarounds on the current version:

* Add an iptables rule to temporarily reject connections from the outside.
This is not viable because in a multitenant environment a process may write
an iptables rule, and meanwhile another process may permanently save rules,
including the temporary one. We can defend against that, but it does add a
lot of complexity.
* Rewrite pg_hba.conf temporarily while the pg_upgrade script runs to
disallow any other connections.

A possible solution is for pg_upgrade to use the --force flag when stopping
the cluster, to kick connections out. There is no reason to be polite in this
case. Another idea that was kicked around with my colleagues was to start the
cluster in single-user mode, or to only allow unix socket connections
somewhere in /tmp. Anything that rejects other connections would be helpful.
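For illustration, a lockdown pg_hba.conf of the sort mentioned above might
contain nothing but the following (hypothetical; the exact user and auth
method depend on the installation):

# temporary pg_hba.conf, swapped in only for the duration of pg_upgrade
local   all   postgres   peer
# with no "host" lines at all, every TCP connection attempt is refused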

It would also be nice if the invocation of pg_ctl didn't pipe its output to
/dev/null. I'm sure it would contain information that would directly point
at the root cause and could have saved some debugging and hand-waving time.

Finally, just a note that while we haven't performed a huge number of
upgrades yet, we have upgraded a few production systems and for the most
part it has worked great.

Regards,

-Harold

[1] https://gist.github.com/112c97378c490d8f70fc


Re: [HACKERS] man pages for contrib programs

2012-04-04 Thread Robert Haas
On Wed, Apr 4, 2012 at 4:10 PM, Thom Brown t...@linux.com wrote:
 +1 to anything that separates these out.  Cramming them into one list
 like we currently have is confusing.

+1 as well.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade improvements

2012-04-04 Thread Stephen Frost
Harold,

* Harold Giménez (harold.gime...@gmail.com) wrote:
 Possible workarounds on the current version:

This has actually been discussed before and unfortunately there aren't
any trivial solutions.

 * Rewrite pg_hba.conf temporarily while the pg_upgrade script runs to
 disallow any other connections.

This is probably my favorite, from a 'practical' and 'effective'
standpoint.  I wonder if there'd be a way to specify the pg_hba.conf
to use on the command-line or in some other way, to avoid having to
actually modify the existing one (which would have to be 'unmodified'
after, of course..).

The other options would work, of course.  Perhaps my second favorite
option (second because I think it'd be more challenging and invasive) is
making the PG daemon listen only on a unix socket (which is not where
the default unix socket is).

The single-user option *sounds* viable, but, iirc, it actually isn't due
to the limitations on what can be done in that mode.

 It would also be nice if the invocation of pg_ctl didn't pipe its output to
 /dev/null. I'm sure it would contain information that would directly point
 at the root cause and could've saved some debugging and hand waving time.

Agreed.

 Finally, just a note that while we haven't performed a huge number of
 upgrades yet, we have upgraded a few production systems and for the most
 part it has worked great.

Great!

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] pg_upgrade improvements

2012-04-04 Thread Tom Lane
Stephen Frost sfr...@snowman.net writes:
 The single-user option *sounds* viable, but, iirc, it actually isn't due
 to the limitations on what can be done in that mode.

Yeah.  IMO the right long-term fix is to be able to run pg_dump and psql
talking to a standalone backend, but nobody's gotten round to making
that possible.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers