Re: [HACKERS] Streaming base backups

2011-01-13 Thread Fujii Masao
On Fri, Jan 14, 2011 at 4:13 AM, Magnus Hagander  wrote:
>> While walsender is sending a base backup, WalSndWakeup should
>> not send the signal to that walsender?
>
> True, it's not necessary. How bad does it actually hurt things though?
> Given that the walsender running the backup isn't actually waiting on
> the latch, it doesn't actually send a signal, does it?

Yeah, you are right. Once WalSndWakeup sends the signal to a walsender,
latch->is_set is set, and WalSndWakeup then does nothing further to that
walsender until latch->is_set is reset. Since ResetLatch is not called while
a walsender is sending a base backup, it should be harmless.
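For reference, this hinges on the fast path at the top of SetLatch(); a
simplified sketch (not the exact source, which also handles self-signaling)
looks like:

/* Simplified sketch of SetLatch(); the is_set fast path is the point here */
void
SetLatch(volatile Latch *latch)
{
    if (latch->is_set)          /* already signaled: nothing more happens */
        return;
    latch->is_set = true;
    if (latch->owner_pid != 0)
        kill(latch->owner_pid, SIGUSR1);    /* wake the owner, if any */
}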

>> At the end of the backup by walsender, it forces a switch to a new
>> WAL file and waits until the last WAL file has been archived. So we
>> should change postmaster so that it doesn't cause the archiver to
>> end before walsender ends when shutdown is requested?
>
> Um. I have to admit I'm not entirely following what you mean enough to
> confirm it, but it *sounds* correct :-)
>
> What scenario exactly is the problematic one?

1. Smart shutdown is requested while walsender is sending a backup.
2. Shutdown causes the archiver to end.
 (Though shutdown sends SIGUSR2 to ask walsender to exit, a walsender
  running a backup currently doesn't respond to it.)
3. At the end of the backup, walsender calls do_pg_stop_backup, which
 forces a switch to a new WAL file and waits until the last WAL file has
 been archived.
 *BUT*, since the archiver is already dead, walsender waits forever.
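(For anyone following along: the wait in question is the loop at the end
of do_pg_stop_backup(), roughly like the sketch below; simplified, not the
exact code.)

/* Simplified sketch of the archive wait in do_pg_stop_backup() */
while (XLogArchiveIsBusy(lastxlogfilename))
{
    CHECK_FOR_INTERRUPTS();
    pg_usleep(1000000L);    /* 1s; with the archiver already dead,
                             * the condition never becomes false */
}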

>> Also, when shutdown is requested, the walsender which is
>> streaming WAL should not end before another walsender which
>> is sending a backup ends, to stream the backup-end WAL?
>
> Not sure I see the reason for that. If we're shutting down in the
> middle of the base backup, we don't have any support for continuing
> that one after we're back up - you have to start over.

Currently, shutdown is designed to make walsender end only after sending
all the WAL records. That's why I thought that.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] SQL/MED - file_fdw

2011-01-13 Thread Itagaki Takahiro
On Fri, Jan 14, 2011 at 14:20, Shigeru HANADA  wrote:
> After copying statistics of pgbench_xxx tables into csv_xxx tables,
> the planner generates the same plans as for local tables, but costs of
> ForeignScan nodes are a little lower than those of SeqScan nodes.
> Forced Nested Loop uses a Materialize node as expected.

Interesting. It means we need per-column statistics for foreign
tables in addition to cost values.

> ISTM that a new interface called from ANALYZE would help to
> update statistics of foreign tables.  If we could leave the sampling
> algorithm to FDWs, acquire_sample_rows() might fit that purpose.

We will discuss how to collect statistics from foreign tables
in the next development cycle. I think we have two choices here:

 #1. Retrieve sample rows from remote foreign tables and
 store stats in the local pg_statistic.
 #2. Use remote statistics for each foreign table directly.

acquire_sample_rows() would be a method for #1. For #2, we already
provide hooks to generate virtual statistics, namely
get_relation_stats_hook() and its relatives; we could treat statistics
for foreign tables in a similar way through such a hook.

file_fdw would prefer #1 because there is no external storage holding
statistics for CSV files, but pgsql_fdw might prefer #2 because
the remote server already has stats for the underlying table.
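To make #2 concrete, the hook already has a suitable shape. A minimal
sketch follows; the hook signature (from utils/selfuncs.h) is the real
one, but the body and the use of get_rel_relkind() are only illustrative:

static bool
pgsql_fdw_stats_hook(PlannerInfo *root, RangeTblEntry *rte,
                     AttrNumber attnum, VariableStatData *vardata)
{
    /* only intercept foreign tables; everything else falls through */
    if (get_rel_relkind(rte->relid) != RELKIND_FOREIGN_TABLE)
        return false;

    /*
     * Here one would fill vardata->statsTuple from statistics fetched
     * off the remote server.  Returning false means "use pg_statistic".
     */
    return false;
}

/* installed from the FDW's _PG_init(): */
get_relation_stats_hook = pgsql_fdw_stats_hook;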

-- 
Itagaki Takahiro



Re: [HACKERS] SQL/MED - file_fdw

2011-01-13 Thread Shigeru HANADA
On Fri, 14 Jan 2011 13:03:27 +0900
Itagaki Takahiro  wrote:

> Good catch. I merged your fix into the attached patch.

Thanks, I'll rebase my patches.

> BTW, why didn't the planner choose a materialized plan for the inner loop?
> FDW scans are typically slower than heap scans or TupleTableSlot scans,
> so it seems reasonable to me to add a Materialize node at the top of the
> inner Foreign Scan, especially when we don't use indexes for the scan
> keys or join keys.

Maybe because foreign tables lack statistics, and file_fdw's cost
estimate isn't smart enough.

After copying statistics of pgbench_xxx tables into csv_xxx tables,
the planner generates the same plans as for local tables, but costs of
ForeignScan nodes are a little lower than those of SeqScan nodes.

==
postgres=# explain analyze select * from csv_accounts a, csv_branches b where a.bid = b.bid;
                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=0.33..45467.32 rows=100 width=197) (actual time=0.234..8044.077 rows=100 loops=1)
   Hash Cond: (a.bid = b.bid)
   ->  Foreign Scan on csv_accounts a  (cost=0.00..31717.00 rows=100 width=97) (actual time=0.107..4147.074 rows=100 loops=1)
   ->  Hash  (cost=0.20..0.20 rows=10 width=100) (actual time=0.085..0.085 rows=10 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 1kB
         ->  Foreign Scan on csv_branches b  (cost=0.00..0.20 rows=10 width=100) (actual time=0.027..0.056 rows=10 loops=1)
 Total runtime: 9690.686 ms
(7 rows)

postgres=# explain analyze select * from pgbench_accounts a, pgbench_branches b where a.bid = b.bid;
                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=1.23..40145.22 rows=100 width=197) (actual time=0.146..5693.883 rows=100 loops=1)
   Hash Cond: (a.bid = b.bid)
   ->  Seq Scan on pgbench_accounts a  (cost=0.00..26394.00 rows=100 width=97) (actual time=0.073..1884.018 rows=100 loops=1)
   ->  Hash  (cost=1.10..1.10 rows=10 width=100) (actual time=0.048..0.048 rows=10 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 1kB
         ->  Seq Scan on pgbench_branches b  (cost=0.00..1.10 rows=10 width=100) (actual time=0.003..0.021 rows=10 loops=1)
 Total runtime: 7333.713 ms
(7 rows)
==

Forced Nested Loop uses Materialize node as expected.

==
postgres=# set enable_hashjoin = false;
SET
postgres=# explain select * from csv_accounts a, csv_branches b where a.bid = b.bid;
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..181717.23 rows=100 width=197)
   Join Filter: (a.bid = b.bid)
   ->  Foreign Scan on csv_accounts a  (cost=0.00..31717.00 rows=100 width=97)
   ->  Materialize  (cost=0.00..0.25 rows=10 width=100)
         ->  Foreign Scan on csv_branches b  (cost=0.00..0.20 rows=10 width=100)
(5 rows)

postgres=# explain select * from pgbench_accounts a, pgbench_branches b where a.bid = b.bid;
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..176395.12 rows=100 width=197)
   Join Filter: (a.bid = b.bid)
   ->  Seq Scan on pgbench_accounts a  (cost=0.00..26394.00 rows=100 width=97)
   ->  Materialize  (cost=0.00..1.15 rows=10 width=100)
         ->  Seq Scan on pgbench_branches b  (cost=0.00..1.10 rows=10 width=100)
(5 rows)
==

ISTM that a new interface called from ANALYZE would help to
update statistics of foreign tables.  If we could leave the sampling
algorithm to FDWs, acquire_sample_rows() might fit that purpose.

If an FDW doesn't provide an analyze handler, postgres might be able to
execute "SELECT * FROM foreign_table LIMIT sample_num" internally to
get sample rows.
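For clarity, such a handler could mirror the shape of
acquire_sample_rows() in commands/analyze.c; the typedef below is purely
hypothetical, sketched only to illustrate the proposed interface:

/* Hypothetical FDW ANALYZE callback, mirroring acquire_sample_rows();
 * none of this exists yet. */
typedef int (*FdwAcquireSampleRows) (Relation onerel,
                                     HeapTuple *rows,        /* out: sample */
                                     int targrows,           /* rows wanted */
                                     double *totalrows,      /* out */
                                     double *totaldeadrows); /* out */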

Regards,
--
Shigeru Hanada





Re: [HACKERS] [PATCH] Return command tag 'REPLACE X' for CREATE OR REPLACE statements.

2011-01-13 Thread KaiGai Kohei
I picked up this patch to review.

It seems fine to me: simple enough, and it works as explained at the
implementation level, apart from the question of whether the feature
itself is reasonable. (Tom was not 100% in agreement with this feature
1.5 months ago.)

I'm not certain whether the current regression tests should be
updated or not. pg_regress launches psql with the -q option,
so the completionTag is always ignored.
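For what it's worth, the mechanism presumably reduces to something like
this at the bottom (a hedged sketch, not the patch's actual code;
COMPLETION_TAG_BUFSIZE is the real buffer size used for command tags):

/* sketch: pick the tag based on whether an existing object was replaced;
 * didUpdate is the flag the patch threads down from the lower levels */
if (completionTag)
    snprintf(completionTag, COMPLETION_TAG_BUFSIZE, "%s",
             didUpdate ? "REPLACE VIEW" : "CREATE VIEW");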

Thanks,

(2010/11/29 0:14), Marti Raudsepp wrote:
> Hi list,
> 
> Often enough when developing PostgreSQL views and functions, I have
> pasted the CREATE OR REPLACE commands into the wrong window/shell and
> ran them there without realizing that I'm creating a function in the
> wrong database, instead of replacing. Currently psql does not provide
> any feedback of which action really occured.
> 
> Only after writing this patch I realized that I could instead raise a
> NOTICE, like current IF EXISTS/IF NOT EXISTS clauses do. Is that a
> better way to solve this?
> 
> This patch returns command tag "CREATE X" or "REPLACE X" for
> LANGUAGE/VIEW/RULE/FUNCTION. This is done by passing completionTag
> from ProcessUtility down to more functions, and adding a 'bool *didUpdate'
> argument to some lower-level functions. I'm not sure if passing back
> the status in a bool* is considered good style, but this way all the
> functions look consistent.
> 
> Regards,
> Marti
> 
-- 
KaiGai Kohei 



Re: [HACKERS] Bug in pg_dump

2011-01-13 Thread Tom Lane
Alvaro Herrera  writes:
> Excerpts from Joel Jacobson's message of Thu Jan 13 06:31:06 -0300 2011:
>> The example from Tom Lane below results in a database which is not
>> possible to correctly dump using pg_dump.

> I wouldn't care too much about that particular case -- you can't query
> any of the views either.

Yeah, the particular case is useless, but IIRC it's possible to
construct non-useless cases where there's a circular dependency
involving a view and something else (probably a function, but I'm too
lazy to try to make an example right now).  pg_dump's hack to break
the circularity by separating the view from its rule can save the day
in such cases.

regards, tom lane



Re: [HACKERS] limiting hint bit I/O

2011-01-13 Thread Robert Haas
On Thu, Jan 13, 2011 at 10:43 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> I whipped up the attached patch tonight.
>
> This appears to remove the BM_JUST_DIRTIED logic.  Please explain why
> that's not completely broken.  Even if it isn't completely broken,
> it would seem better to do something like that as a separate patch.

Well, the only point of BM_JUST_DIRTIED is to detect whether BM_DIRTY
has been set while a buffer write is in progress.  With this patch,
only BM_HINT_BITS can be set while the buffer write is in progress;
BM_DIRTY cannot.  Perhaps one could make the argument that this would
be a good cleanup anyway: in the unpatched code, BM_DIRTY can only be
set while a buffer I/O is in progress if it is set due to a hint-bit
update, and then we don't really care if the update gets lost.
Although that seems a bit confusing...
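For readers following along, the existing dance is roughly the following
(a simplified sketch of FlushBuffer()/TerminateBufferIO(), not the exact
source; locking of the buffer header is omitted):

/* in FlushBuffer(), before writing the page out: */
buf->flags &= ~BM_JUST_DIRTIED;

/* ... the write runs; a concurrent MarkBufferDirty() may set the
 * flag again while the I/O is in progress ... */

/* in TerminateBufferIO(), after the write completes: */
if (clear_dirty && !(buf->flags & BM_JUST_DIRTIED))
    buf->flags &= ~BM_DIRTY;    /* safe only if nobody re-dirtied it */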

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Database file copy

2011-01-13 Thread Alvaro Herrera
Excerpts from Bruce Momjian's message of Thu Jan 13 00:05:53 -0300 2011:
> Srini Raghavan wrote:
> > Thank you very much for reviewing, appreciate the feedback.  As pointed
> > out by you, it is always best to test it out with the latest version,
> > so, I tested the same approach with postgres 9.0.2 on Windows just now,
> > and it works!
> >
> > I forgot to mention earlier that in addition to setting
> > vacuum_freeze_table_age to 0, vacuum_freeze_min_age must also be set
> > to 0 to reset xmin with the FrozenXid.
> 
> I wonder if you should be using VACUUM FREEZE instead of having to set
> variables.  

The documentation says you shouldn't:

FREEZE
Selects aggressive "freezing" of tuples. Specifying FREEZE is equivalent to
performing VACUUM with the vacuum_freeze_min_age parameter set to zero. The
FREEZE option is deprecated and will be removed in a future release; set the
parameter instead.
http://www.postgresql.org/docs/9.0/static/sql-vacuum.html

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Bug in pg_dump

2011-01-13 Thread Alvaro Herrera
Excerpts from Joel Jacobson's message of Thu Jan 13 06:31:06 -0300 2011:
> The example from Tom Lane below results in a database which is not
> possible to correctly dump using pg_dump.

I wouldn't care too much about that particular case -- you can't query
any of the views either.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] limiting hint bit I/O

2011-01-13 Thread Tom Lane
Robert Haas  writes:
> I whipped up the attached patch tonight.

This appears to remove the BM_JUST_DIRTIED logic.  Please explain why
that's not completely broken.  Even if it isn't completely broken,
it would seem better to do something like that as a separate patch.

regards, tom lane



Re: [HACKERS] Fixing GIN for empty/null/full-scan cases

2011-01-13 Thread Tom Lane
Bruce Momjian  writes:
> What does pg_upgrade need to do about this for 9.1?

Nothing.  An existing GIN index can still do all the same queries it
could do before.

regards, tom lane



[HACKERS] limiting hint bit I/O

2011-01-13 Thread Robert Haas
I whipped up the attached patch tonight.  It's pretty quick and dirty,
so it's possible I've missed something, but the intent is to suppress
the writing of hint bits by backends allocating buffers, and by
checkpoints, and to write them only from the background writer's
cleaning scan.  It therefore should (and does) avoid the problem that
the first scan of a relation after a bulk load is much slower than
subsequent scans.  I used this test case:

create table s as select g,
random()::text||random()::text||random()::text||random()::text from
generate_series(1,100) g;

I didn't do any special configuration, so this was large enough to not
fit in shared_buffers, but small enough to fit in the OS cache.  Then
I did this repeatedly:

select sum(1) from s;

Without the patch, the first run took 1602 ms, and subsequent runs
took 207-216 ms.

With the patch, the first run took 270 ms, and subsequent runs
declined very, very slowly.  I got bored after getting down into the
240 ms range and ran VACUUM FREEZE, after which times dropped to about
197 ms.  (This also happens without the patch - VACUUM FREEZE seems to
speed things up a bit more than just setting all the hint bits.)

I find these results pretty depressing.  Obviously, the ~6x speedup on
the first run is great, but even after many runs, subsequent runs were
still 10-15% slower.  Certainly, for some people this patch might
be an improvement, but on the whole I can't see applying it, unless
someone can spot something I've done wrong that casts a different
light on the situation.  I am a little bit at a loss to explain how
I'm getting these results when others posted results that appeared to
show hint bits making very little difference.

Any insights appreciated.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 1f89e52..9a43e7f 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -602,9 +602,10 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 		/*
 		 * If the buffer was dirty, try to write it out.  There is a race
 		 * condition here, in that someone might dirty it after we released it
-		 * above, or even while we are writing it out (since our share-lock
-		 * won't prevent hint-bit updates).  We will recheck the dirty bit
-		 * after re-locking the buffer header.
+		 * above.  (Once we acquire a share-lock on the buffer contents, it's
+		 * no longer possible for it to get marked dirty, though hint bit
+		 * updates could still occur.) We will recheck the dirty bit after
+		 * re-locking the buffer header.
 		 */
 		if (oldFlags & BM_DIRTY)
 		{
@@ -774,10 +775,11 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 		LockBufHdr(buf);
 
 		/*
-		 * Somebody could have pinned or re-dirtied the buffer while we were
-		 * doing the I/O and making the new hashtable entry.  If so, we can't
-		 * recycle this buffer; we must undo everything we've done and start
-		 * over with a new victim buffer.
+		 * Somebody could have re-dirtied the buffer before we acquired the
+		 * shared content lock, or pinned it at any time up until we
+		 * re-locked the buffer header.  If so, we can't recycle this
+		 * buffer; we must undo everything we've done and start over with a
+		 * new victim buffer.
 		 */
 		oldFlags = buf->flags;
 		if (buf->refcount == 1 && !(oldFlags & BM_DIRTY))
@@ -801,7 +803,7 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 	 * 1 so that the buffer can survive one clock-sweep pass.)
 	 */
 	buf->tag = newTag;
-	buf->flags &= ~(BM_VALID | BM_DIRTY | BM_JUST_DIRTIED | BM_CHECKPOINT_NEEDED | BM_IO_ERROR | BM_PERMANENT);
+	buf->flags &= ~(BM_VALID | BM_DIRTY | BM_HINT_BITS | BM_CHECKPOINT_NEEDED | BM_IO_ERROR | BM_PERMANENT);
 	if (relpersistence == RELPERSISTENCE_PERMANENT)
 		buf->flags |= BM_TAG_VALID | BM_PERMANENT;
 	else
@@ -976,7 +978,7 @@ MarkBufferDirty(Buffer buffer)
 	if (!(bufHdr->flags & BM_DIRTY) && VacuumCostActive)
 		VacuumCostBalance += VacuumCostPageDirty;
 
-	bufHdr->flags |= (BM_DIRTY | BM_JUST_DIRTIED);
+	bufHdr->flags |= BM_DIRTY;
 
 	UnlockBufHdr(bufHdr);
 }
@@ -1612,7 +1614,16 @@ SyncOneBuffer(int buf_id, bool skip_recently_used)
 		return result;
 	}
 
-	if (!(bufHdr->flags & BM_VALID) || !(bufHdr->flags & BM_DIRTY))
+	/*
+	 * We can get here either because we're checkpointing, or from the
+	 * background writer cleaning scan.  In the latter case, we want to
+	 * write out buffers with hint bit changes so that such changes eventually
+	 * make their way to disk.  In the former case we normally won't get called
+	 * if only hint bit changes have occurred, but if it somehow happens we'll
+	 * just write the page anyway.
+	 */
+	if (!(bufHdr->flags & BM_VALID)
+		|| !(bufHdr->flags & (BM_DIRTY | BM_HINT_BITS)))
 	{
 		/* It's clean, so nothing to do */
 		UnlockBufHdr(bufHdr);
@@ -

Re: [HACKERS] Fixing GIN for empty/null/full-scan cases

2011-01-13 Thread Bruce Momjian
Robert Haas wrote:
> On Tue, Jan 4, 2011 at 4:49 PM, Tom Lane  wrote:
> > Robert Haas  writes:
> >> On Tue, Jan 4, 2011 at 4:09 PM, Tom Lane  wrote:
> >>> * Existing GIN indexes are upwards compatible so far as on-disk storage
> >>> goes, but they will of course be missing entries for empty, null, or
> >>> null-containing items.  Users who want to do searches that should find
> >>> such items will need to reindex after updating to 9.1.
> >
> >> This is the only part of this proposal that bothers me a little bit.
> >> It would be nice if the system could determine whether a GIN index is
> >> "upgraded from 9.0 or earlier and thus doesn't contain these entries"
> >> - and avoid trying to use the index for these sorts of queries in
> >> cases where it might return wrong answers.
> >
> > I don't think it's really worth the trouble.  The GIN code has been
> > broken for these types of queries since day one, and yet we've had only
> > maybe half a dozen complaints about it.  Moreover there's no practical
> > way to "avoid trying to use the index", since in many cases the fact
> > that a query requires a full-index scan isn't determinable at plan time.
> >
> > The best we could really do is throw an error at indexscan start, and
> > that doesn't seem all that helpful.  But it probably wouldn't take much
> > code either, if you're satisfied with that answer.  (I'm envisioning
> > adding a version ID to the GIN metapage and then checking that before
> > proceeding with a full-index scan.)
> 
> I'd be satisfied with that answer.  It at least makes it a lot more
> clear when you've got a problem.  If this were a more common scenario,
> I'd probably advocate for a better solution, but the one you propose
> seems adequate given the frequency of the problem as you describe it.

What does pg_upgrade need to do about this for 9.1?  Just tell people
they might get a GIN error someday?
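(For reference, the check Tom envisions above could look something like
the sketch below; the field and constant names are assumptions, not
actual code.)

GinMetaPageData *metadata = GinPageGetMeta(BufferGetPage(metabuffer));

if (metadata->ginVersion < GIN_CURRENT_VERSION)     /* assumed names */
    ereport(ERROR,
            (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
             errmsg("old GIN indexes do not support whole-index scans"),
             errhint("Reindex the index to upgrade it.")));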

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] kill -KILL: What happens?

2011-01-13 Thread Robert Haas
On Thu, Jan 13, 2011 at 8:28 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Thu, Jan 13, 2011 at 8:10 PM, Tom Lane  wrote:
>>> Florian Pflug  writes:
>>>> So maybe there should be a GUC for this?
>
>>> No need (and rather inflexible anyway).  If you don't want an orphaned
>>> backend to continue, you send it SIGTERM.
>
>> It is not easy to make this work in such a way that you can ensure a
>> clean, automatic restart of PostgreSQL after a postmaster death.
>> Which is what at least some people want.
>
> True.  It strikes me also that the postmaster does provide some services
> other than accepting new connections:
>
> * ensuring that everybody gets killed if a backend crashes
>
> * respawning autovac launcher and other processes that might exit
> harmlessly
>
> * is there still any cross-backend signaling that goes through the
> postmaster?  We got rid of the sinval case, but I don't recall if
> there's others.
>
> While you could probably live without these in the scenario of "let my
> honking big query finish before restarting", you would not want to do
> without them in unattended operation.

Yep.  I'm pretty doubtful that you're going to want them even in that
case, but you're surely not going to want them in unattended
operation.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] kill -KILL: What happens?

2011-01-13 Thread Tom Lane
Robert Haas  writes:
> On Thu, Jan 13, 2011 at 8:10 PM, Tom Lane  wrote:
>> Florian Pflug  writes:
>>> So maybe there should be a GUC for this?

>> No need (and rather inflexible anyway).  If you don't want an orphaned
>> backend to continue, you send it SIGTERM.

> It is not easy to make this work in such a way that you can ensure a
> clean, automatic restart of PostgreSQL after a postmaster death.
> Which is what at least some people want.

True.  It strikes me also that the postmaster does provide some services
other than accepting new connections:

* ensuring that everybody gets killed if a backend crashes

* respawning autovac launcher and other processes that might exit
harmlessly

* is there still any cross-backend signaling that goes through the
postmaster?  We got rid of the sinval case, but I don't recall if
there's others.

While you could probably live without these in the scenario of "let my
honking big query finish before restarting", you would not want to do
without them in unattended operation.

regards, tom lane



Re: [HACKERS] kill -KILL: What happens?

2011-01-13 Thread Robert Haas
On Thu, Jan 13, 2011 at 8:10 PM, Tom Lane  wrote:
> Florian Pflug  writes:
>> I don't believe there's one right answer to that.
>
> Right.  Force-kill presumes there is only one right answer.
>
>> Assume postgres is driving a website, and the postmaster crashes shortly
>> after a pg_dump run started. You probably won't want your website to be
>> offline while pg_dump is finishing its backup.
>
>> If, on the other hand, your data warehousing database is running a
>> multi-hour query, you might prefer that query to finish, even at the price
>> of not being able to accept new connections.
>
>> So maybe there should be a GUC for this?
>
> No need (and rather inflexible anyway).  If you don't want an orphaned
> backend to continue, you send it SIGTERM.

It is not easy to make this work in such a way that you can ensure a
clean, automatic restart of PostgreSQL after a postmaster death.
Which is what at least some people want.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] kill -KILL: What happens?

2011-01-13 Thread Robert Haas
On Thu, Jan 13, 2011 at 7:32 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Thu, Jan 13, 2011 at 3:37 PM, Tom Lane  wrote:
>>> Killing active sessions when it's not absolutely necessary is not an
>>> asset.
>
>> That's a highly arguable point and I certainly don't agree with it.
>
> Your examples appear to rely on the assumption that background processes
> exit instantly when the postmaster dies.  Which they should not.

But they do.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] kill -KILL: What happens?

2011-01-13 Thread Tom Lane
Florian Pflug  writes:
> I don't believe there's one right answer to that.

Right.  Force-kill presumes there is only one right answer.

> Assume postgres is driving a website, and the postmaster crashes shortly
> after a pg_dump run started. You probably won't want your website to be
> offline while pg_dump is finishing its backup.

> If, on the other hand, your data warehousing database is running a
> multi-hour query, you might prefer that query to finish, even at the price
> of not being able to accept new connections.

> So maybe there should be a GUC for this?

No need (and rather inflexible anyway).  If you don't want an orphaned
backend to continue, you send it SIGTERM.
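(For context: SIGTERM reaches the backend through its die() handler,
which in simplified sketch form does no more than the following; the
real handler also saves errno and handles immediate-interrupt cases.)

static void
die(SIGNAL_ARGS)
{
    InterruptPending = true;
    ProcDiePending = true;
    /* the backend actually exits at its next CHECK_FOR_INTERRUPTS() */
}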

regards, tom lane



Re: [HACKERS] kill -KILL: What happens?

2011-01-13 Thread Florian Pflug
On Jan14, 2011, at 01:32 , Tom Lane wrote:
> Robert Haas  writes:
>> On Thu, Jan 13, 2011 at 3:37 PM, Tom Lane  wrote:
>>> Killing active sessions when it's not absolutely necessary is not an
>>> asset.
> 
>> That's a highly arguable point and I certainly don't agree with it.
> 
> Your examples appear to rely on the assumption that background processes
> exit instantly when the postmaster dies.  Which they should not.

Even if they stay around, no new connections will be possible once the
postmaster is gone. So this really comes down to what somebody perceives
to be a bigger problem - new connections failing or existing connections
being terminated.

I don't believe there's one right answer to that.

Assume postgres is driving a website, and the postmaster crashes shortly
after a pg_dump run started. You probably won't want your website to be
offline while pg_dump is finishing its backup.

If, on the other hand, your data warehousing database is running a
multi-hour query, you might prefer that query to finish, even at the price
of not being able to accept new connections.

So maybe there should be a GUC for this?

best regards,
Florian Pflug




Re: [HACKERS] kill -KILL: What happens?

2011-01-13 Thread Tom Lane
Robert Haas  writes:
> On Thu, Jan 13, 2011 at 3:37 PM, Tom Lane  wrote:
>> Killing active sessions when it's not absolutely necessary is not an
>> asset.

> That's a highly arguable point and I certainly don't agree with it.

Your examples appear to rely on the assumption that background processes
exit instantly when the postmaster dies.  Which they should not.

regards, tom lane



Re: [HACKERS] Error code for "terminating connection due to conflict with recovery"

2011-01-13 Thread Tatsuo Ishii
> Tatsuo Ishii  writes:
>>> Please add this to the currently open CommitFest:
>>> https://commitfest.postgresql.org/action/commitfest_view/open
> 
>> Done. Comments are welcome. Unless there's objection, I will commit it
>> this weekend.
> 
> If you're expecting anyone to actually *review* it during the CF,
> that's a bit premature.

No problem waiting longer. I will wait until the end of January for now.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp



Re: [HACKERS] auto-sizing wal_buffers

2011-01-13 Thread Tom Lane
Robert Haas  writes:
> On Thu, Jan 13, 2011 at 5:29 PM, Magnus Hagander  wrote:
>> +1, I like the idea. Would it still be there to override if necessary?

> Depends what people want to do.  We could make the default "0kB", and
> define that to mean "auto-tune", or we could remove the parameter
> altogether.  I think I was envisioning the latter, but if people are
> hesitant to do that we could do the former instead.

I think we need to keep the override capability until the autotune
algorithm has proven itself in the field for a couple of years.

I agree with Josh that a negative value should be used to select the
autotune method.

regards, tom lane



Re: [HACKERS] arrays as pl/perl input arguments [PATCH]

2011-01-13 Thread David E. Wheeler
On Jan 13, 2011, at 4:15 PM, Stephen J. Butler wrote:

> Suppose one of these compatibility objects is passed into legacy code
> as $_[0]. The problem is that 'ref $_[0]' will return 'Pg::ArrayArg'
> instead of what it used to, '' (empty string). Other than that, I
> think it performs as people would expect.

Well, frankly, since up to this patch you *never* got an ARRAY reference 
argument, who would be calling `ref` on it anyway?

> You could even change 'as_s' to generate the string on the fly as
> requested instead of generating both representations at instantiation.

Yep.

Best,

David




Re: [HACKERS] SSI patch version 8

2011-01-13 Thread Kevin Grittner
Anssi Kääriäinen wrote:
 
> I think I found a problem. This is using SSI v8. The table
> definition:
> 
> create table test_t (id integer, val1 text, val2 integer);
 
> create index test_idx on test_t(id) where val2 = 1;
 
> insert into test_t (select generate_series(0, 1), 'a', 2);
> insert into test_t (select generate_series(0, 10), 'a', 1);
 
> T1:
> hot2=> begin transaction isolation level serializable;
 
> hot2=> select * from test_t where val2 = 1;
 
> hot2=> update test_t set val2 = 2 where val2 = 1 and id = 10;
 
> T2:
> hot2=> begin transaction isolation level serializable;
 
> hot2=> select * from test_t where val2 = 1;
 
> hot2=> update test_t set val2 = 2 where val2 = 1 and id = 9;
 
> hot2=> commit;
 
> T1:
> hot2=> commit;
 
I hope you have no objection to having the code you wrote included
in the test suite which is part of the patch.  Well, if you do, I'll
pull it back out and invent something similar...  ;-)
 
http://git.postgresql.org/gitweb?p=users/kgrittn/postgres.git;a=commitdiff;h=2502cccbdd5e5d44be469549b91fe49c0554ec3e
 
-Kevin



Re: [HACKERS] arrays as pl/perl input arguments [PATCH]

2011-01-13 Thread Stephen J. Butler
On Thu, Jan 13, 2011 at 2:06 AM, Martijn van Oosterhout
 wrote:
> I played with this a little and it is fairly easy to make a variable
> such that $a is the string representation and $a[0] the first value of
> the array. The problem is that you can't pass such a variable into a
> subroutine.

I played with this too:


#!/usr/bin/perl -w

use strict;

package Pg::ArrayArg;

# Overload both stringification and array dereferencing, so the same
# object can act as the legacy string form and as an array reference.
use overload
    '""'  => \&as_s,
    '@{}' => \&as_a;

sub new {
    my $proto = shift;
    my $class = ref $proto || $proto;

    bless {
        string => shift,
        array  => shift
    }, $class;
}

sub as_s {
    shift->{'string'};
}

sub as_a {
    shift->{'array'};
}

package main;

my $aa = Pg::ArrayArg->new( '{1,2}', [ 1, 2 ] );

printf "ref = %s\n", ref $aa;
print "string = $aa\n";
printf "string = %s\n", $aa;
printf "array index = (%s, %s)\n", $aa->[0], $aa->[1];
printf "array_ref = %s\n", scalar @$aa;

print "regexp test = ";
if ($aa =~ /^{(.*)}$/) {
    print "looks like array\n";
    printf "join of split = %s\n", join ';', split /,/, $1;
} else {
    print "doesn't look like array\n";
}

Suppose one of these compatibility objects is passed into legacy code
as $_[0]. The problem is that 'ref $_[0]' will return 'Pg::ArrayArg'
instead of what it used to, '' (empty string). Other than that, I
think it performs as people would expect.

You could even change 'as_s' to generate the string on the fly as
requested instead of generating both representations at instantiation.

Just my $0.02.



Re: [HACKERS] Error code for "terminating connection due to conflict with recovery"

2011-01-13 Thread Tom Lane
Tatsuo Ishii  writes:
>> Please add this to the currently open CommitFest:
>> https://commitfest.postgresql.org/action/commitfest_view/open

> Done. Comments are welcome. Unless there's objection, I will commit it
> this weekend.

If you're expecting anyone to actually *review* it during the CF,
that's a bit premature.

regards, tom lane



Re: [HACKERS] Error code for "terminating connection due to conflict with recovery"

2011-01-13 Thread Tatsuo Ishii
> On Thu, Jan 13, 2011 at 2:13 AM, Tatsuo Ishii  wrote:
>> Ok. Here is the patch for this. I use 40P02, instead of 40004.
> 
> Please add this to the currently open CommitFest:
> 
> https://commitfest.postgresql.org/action/commitfest_view/open

Done. Comments are welcome. Unless there's objection, I will commit it
this weekend.
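For reviewers: the core of the patch is just a new SQLSTATE macro plus
its use at the termination site, roughly like this (a sketch, not the
exact patch; the macro name here is provisional, only the 40P02 value is
from the mail above):

#define ERRCODE_RECOVERY_CONFLICT_TERMINATION \
        MAKE_SQLSTATE('4','0','P','0','2')

ereport(FATAL,
        (errcode(ERRCODE_RECOVERY_CONFLICT_TERMINATION),
         errmsg("terminating connection due to conflict with recovery")));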
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp



Re: [HACKERS] kill -KILL: What happens?

2011-01-13 Thread David Fetter
On Thu, Jan 13, 2011 at 03:29:13PM -0800, Jeff Davis wrote:
> On Thu, 2011-01-13 at 11:14 -0800, David Fetter wrote:
> > I get that we can't prevent all pilot error, but I was hoping we
> > could bullet-proof this a little more, especially in light of a
> > certain extremely popular server OS's OOM killer's default
> > behavior.
> 
> That's a good point.  I'm not sure how much action can reasonably be
> taken, however.

We may find out from Florian's experiments :)

> > Yes, I get that that behavior is crazy, and stupid, and that
> > people should shut it off, but it *is* our problem if we let the
> > postmaster start (or continue) when it's set that way.
> 
> As an aside, linux has actually changed the heuristic:
> 
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=a63d83f427fbce97a6cea0db2e64b0eb8435cd10

Great!  In a decade or so, no more servers will be running with an
earlier kernel ;)

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] kill -KILL: What happens?

2011-01-13 Thread Jeff Davis
On Thu, 2011-01-13 at 11:14 -0800, David Fetter wrote:
> I get that we can't prevent all pilot error, but I was hoping we could
> bullet-proof this a little more, especially in light of a certain
> extremely popular server OS's OOM killer's default behavior.

That's a good point. I'm not sure how much action can reasonably be
taken, however.

> Yes, I get that that behavior is crazy, and stupid, and that people
> should shut it off, but it *is* our problem if we let the postmaster
> start (or continue) when it's set that way.

As an aside, linux has actually changed the heuristic:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=a63d83f427fbce97a6cea0db2e64b0eb8435cd10

Regards,
Jeff Davis




Re: [HACKERS] auto-sizing wal_buffers

2011-01-13 Thread Kevin Grittner
Robert Haas  wrote:
 
> Would anyone like to argue vigorously for or against the above
> proposal?
 
Greg's numbers look reasonable to me, and there's nobody I'd trust
more to come up with reasonable numbers for this.  One less tunable
is a good thing, especially since this is designed to scale from
someone slapping it on his laptop for a first quick try all the way
up to industrial-strength production environments.  I guess a manual
override doesn't bother me too much, but I am a bit dubious of its
value, and there is value in keeping the GUC count down.
 
-Kevin



Re: [HACKERS] auto-sizing wal_buffers

2011-01-13 Thread Josh Berkus
Robert,

>> Unfortunately, we might still need a manual parameter for override
>> because of the interaction between wal_buffers and
>> synchronous_commit=off, since it sets the max size of the unflushed data
>> buffer.  Discuss?
> 
> Do we have any evidence there's actually a problem in that case, or
> that a larger value of wal_buffers solves it?  I mean, the background
> writer is going to start a background flush as quickly as it can...

I don't think anyone has done any testing.  However, the setting is
there and some users might be convinced that they need it.

>> And the "auto" setting should be -1, not 0kB.  We use -1 for "use
>> default" for several other GUCs.
> 
> No can do.  Gotta have things in the same units.

That's certainly not true with, for example, log_temp_files.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



[HACKERS] Do magic using pg_depend

2011-01-13 Thread Joel Jacobson
https://github.com/gluefinance/pov/blob/master/doc/example_database.sql

Please feel free to put any of this on the PostgreSQL wiki like
suggested by David Fetter.

This is an example of some functionality provided and used by pov
(PostgreSQL Object version control system).
Most, if not all, of this stuff has already been implemented in pg_dump,
but since pov is a SQL-based system, it was necessary to implement the
same functionality using only sql/plpgsql/plperl.

Author: Joel Jacobson, Glue Finance AB, Sweden, 
Datestamp: 2011-01-13 23:42 Europe/Stockholm
License: MIT (http://www.opensource.org/licenses/mit-license.php)

We will learn how to do a lot of PostgreSQL magic using only the
nice system table "pg_depend".
Today we will:
a) create nice directed graphs of all object dependencies,
b) sort all objects topologically, into an order in which they can be created,
c) show create/drop commands for most of the objects.

Let the show begin!

Installation:

$ git clone g...@github.com:gluefinance/pov.git
$ cd pov
$ sh install_example_database.sh

a) Generate a directed graph in DOT format.
COPY (SELECT diagraph FROM pov.pg_depend_dot) TO '/tmp/example_database.dot';

Then use dot (http://www.graphviz.org/) to generate graphs in SVG,
PNG, or any other format.
dot -oexample_database.png -Tpng example_database.dot
dot -oexample_database.svg -Tsvg example_database.dot

Or view it in the SQL prompt:

test=# select * from pov.pg_depend_dot;
  diagraph
-
 digraph pg_depend {
 "function plpgsql_call_handler() 1255.11599.0" -> "language
plpgsql 2612.11602.0" [color=black label=n]
 "function plpgsql_inline_handler(internal) 1255.11600.0" ->
"language plpgsql 2612.11602.0" [color=black label=n]
 "function plpgsql_validator(oid) 1255.11601.0" -> "language
plpgsql 2612.11602.0" [color=black label=n]
 "function plperl_call_handler() 1255.23562.0" -> "language plperl
2612.23565.0" [color=black label=n]
 "function plperl_inline_handler(internal) 1255.23563.0" ->
"language plperl 2612.23565.0" [color=black label=n]
 "function plperl_validator(oid) 1255.23564.0" -> "language plperl
2612.23565.0" [color=black label=n]
 "function f1(integer) 1255.23656.0" -> "view v4 1259.23688.0"
[color=black label=n]
 "function f1(integer) 1255.23656.0" -> "constraint t3_id_check on
table t3 2606.23673.0" [color=black label=n]
 "table t1 1259.23651.0" -> "table t1 column id 1259.23651.1"
[color=yellow label=an]
 "table t1 column id 1259.23651.1" -> "view v1 1259.23676.0"
[color=black label=n]
 "table t1 column id 1259.23651.1" -> "constraint t1_pkey on table
t1 2606.23655.0" [color=blue label=a]
 "table t1 column id 1259.23651.1" -> "constraint t2_id_fkey on
table t2 2606.23664.0" [color=black label=n]
 "sequence s1 1259.23657.0" -> "default for table t3 column id
2604.23672.0" [color=black label=n]
 "table t2 1259.23659.0" -> "table t2 column id 1259.23659.1"
[color=yellow label=an]
 "table t2 column id 1259.23659.1" -> "view v2 1259.23680.0"
[color=black label=n]
 "table t2 column id 1259.23659.1" -> "constraint t2_pkey on table
t2 2606.23663.0" [color=blue label=a]
 "table t2 column id 1259.23659.1" -> "constraint t2_id_fkey on
table t2 2606.23664.0" [color=blue label=a]
 "table t3 1259.23669.0" -> "table t3 column id 1259.23669.1"
[color=yellow label=an]
 "table t3 column id 1259.23669.1" -> "default for table t3 column
id 2604.23672.0" [color=blue label=a]
 "table t3 column id 1259.23669.1" -> "constraint t3_id_check on
table t3 2606.23673.0" [color=yellow label=na]
 "table t3 column id 1259.23669.1" -> "constraint t3_pkey on table
t3 2606.23675.0" [color=blue label=a]
 "view v1 1259.23676.0" -> "view v1 column id 1259.23676.1"
[color=black label=n]
 "view v1 column id 1259.23676.1" -> "view v3 1259.23684.0"
[color=black label=n]
 "view v2 1259.23680.0" -> "view v2 column id 1259.23680.1"
[color=black label=n]
 "view v2 column id 1259.23680.1" -> "view v3 1259.23684.0"
[color=black label=n]
 "view v3 1259.23684.0" -> "view v3 column id1 1259.23684.1"
[color=black label=n]
 "view v3 1259.23684.0" -> "view v3 column id2 1259.23684.2"
[color=black label=n]
 "view v3 column id1 1259.23684.1" -> "view v4 1259.23688.0"
[color=black label=n]
 "view v3 column id2 1259.23684.2" -> "view v4 1259.23688.0"
[color=black label=n]
 "constraint t1_pkey on table t1 2606.23655.0" -> "constraint
t2_id_fkey on table t2 2606.23664.0" [color=black label=n]
 "schema public 2615.2200.0" -> "function f1(integer)
1255.23656.0" [color=black label=n]
 "schema public 2615.2200.0" -> "table t1 1259.23651.0"
[color=black label=n]
 "schema public 2615.2200.0" -> "sequence s1 1259.23657.0"
[color=black label=n]
 "schema public 2615.2200.0" -> "table t2 1259.23659.0"

Re: [HACKERS] auto-sizing wal_buffers

2011-01-13 Thread Robert Haas
On Thu, Jan 13, 2011 at 6:02 PM, Josh Berkus  wrote:
>
>> Depends what people want to do.  We could make the default "0kB", and
>> define that to mean "auto-tune", or we could remove the parameter
>> altogether.  I think I was envisioning the latter, but if people are
>> hesitant to do that we could do the former instead.
>
> Unfortunately, we might still need a manual parameter for override
> because of the interaction between wal_buffers and
> synchronous_commit=off, since it sets the max size of the unflushed data
> buffer.  Discuss?

Do we have any evidence there's actually a problem in that case, or
that a larger value of wal_buffers solves it?  I mean, the background
writer is going to start a background flush as quickly as it can...

> And the "auto" setting should be -1, not 0kB.  We use -1 for "use
> default" for several other GUCs.

No can do.  Gotta have things in the same units.

> Other than that, I think Greg's numbers are fine, and strongly support
> having one less thing to tune.

OK.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] auto-sizing wal_buffers

2011-01-13 Thread Josh Berkus

> Depends what people want to do.  We could make the default "0kB", and
> define that to mean "auto-tune", or we could remove the parameter
> altogether.  I think I was envisioning the latter, but if people are
> hesitant to do that we could do the former instead.

Unfortunately, we might still need a manual parameter for override
because of the interaction between wal_buffers and
synchronous_commit=off, since it sets the max size of the unflushed data
buffer.  Discuss?

And the "auto" setting should be -1, not 0kB.  We use -1 for "use
default" for several other GUCs.

Other than that, I think Greg's numbers are fine, and strongly support
having one less thing to tune.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] kill -KILL: What happens?

2011-01-13 Thread Florian Pflug
On Jan13, 2011, at 21:42 , Tom Lane wrote:
> Aidan Van Dyk  writes:
>> If postmaster has a few fds to spare, what about having it open a pipe
>> to every child it spawns.  It never has to read/write to it, but
>> postmaster closing will signal the client's fd.  The client just has
>> to pop the fd into whatever nrmal poll/select event handlign it uses
>> to notice when the "parent's pipe" is closed.
> 
> Hmm.  Or more generally: there's one FIFO.  The postmaster holds both
> sides open.  Backends hold the write side open.  (They can close the
> read side, but that would just be to free up a FD.)  Background children
> close the write side.  Now a background process can use EOF on the read
> side of the FIFO to tell it that postmaster and all backends have
> exited.  You still don't get a signal, but at least the condition you're
> testing for is the one we actually want and not an approximation.

I was thinking along a similar line, and put together a small test case to
prove that this actually works. The attached test program simulates the
interactions of a parent process (think postmaster), some utility processes
(think walwriter, bgwriter, ...) and some backends. It uses two pairs of
fd created with pipe(), called LifeSignParent and LifeSignParentBackends.

The writing end of the former is held open only in the parent process,
while the writing end of the latter is held open in the parent process and
all regular backend processes. Backend processes use select() to monitor
the reading end of the LifeSignParent fd pair. Since nothing is ever written
to the writing end, the fd becomes readable only when the parent exits,
because that is how select() signals EOF. Once that happens the backend
exits. The utility processes do the same, but monitor the reading end of
LifeSignParentBackends, and thus exit only after the parent and all regular
backends have died.

Since the lifesign checking uses select(), any place that already uses 
select can easily check for vanishing life signs. CHECK_FOR_INTERRUPTS could
simply check the life sign once every few seconds.
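A minimal sketch of such a check, assuming the reading end of the fd pair
is at hand (function and variable names are mine, and error handling is
omitted):

#include <stdbool.h>
#include <sys/select.h>

static bool
parent_is_alive(int lifesign_fd)
{
    fd_set          rfds;
    struct timeval  tv = { 0, 0 };  /* zero timeout: poll, don't block */

    FD_ZERO(&rfds);
    FD_SET(lifesign_fd, &rfds);

    /*
     * Nothing is ever written to the pipe, so the descriptor becomes
     * readable only at EOF, i.e. once all writing ends are closed.
     */
    return select(lifesign_fd + 1, &rfds, NULL, NULL, &tv) == 0;
}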

If we want an absolutely reliable signal instead of checking in
CHECK_FOR_INTERRUPTS, every backend would need to launch a monitor subprocess
which monitors the life sign, and exits once it vanishes. The backend would
then get a SIGCHLD once the postmaster dies. Seems like overkill, though.

The whole thing won't work on Windows, since even if it's got a pipe() or
socketpair() call, with EXEC_BACKEND there's no way of transferring these
fds to the child processes. AFAIK, however, Windows has other means with 
which such life signs can be implemented. For example, I seem to remember
that WaitForMultipleObjects() can be used to wait for process-related events.
But windows really isn't my area of expertise...

I have tested this on the latest Ubuntu LTS release (10.04.1) as well as
Mac OS X 10.6.6, and it seems to work correctly on both systems. I'd be
happy to hear from anyone who has access to other systems on whether this
works or not. The expected output is

Launched utility 5095
Launched backend 5097
Launched utility 5096
Launched backend 5099
Launched backend 5098
Utility 5095 detected live parent or backend
Backend 5097 detected live parent
Utility 5096 detected live parent or backend
Backend 5099 detected live parent
Backend 5098 detected live parent
Parent exiting
Backend 5097 exiting after parent died
Backend 5098 exiting after parent died
Backend 5099 exiting after parent died
Utility 5096 exiting after parent and backends died
Utility 5095 exiting after parent and backends died

Everything after "Parent exiting" might be interleaved with a shell prompt,
of course.

best regards,
Florian Pflug


liveness.c
Description: Binary data



Re: [HACKERS] auto-sizing wal_buffers

2011-01-13 Thread Robert Haas
On Thu, Jan 13, 2011 at 5:29 PM, Magnus Hagander  wrote:
> +1, I like the idea. Would it still be there to override if necessary?

Depends what people want to do.  We could make the default "0kB", and
define that to mean "auto-tune", or we could remove the parameter
altogether.  I think I was envisioning the latter, but if people are
hesitant to do that we could do the former instead.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] auto-sizing wal_buffers

2011-01-13 Thread Magnus Hagander
On Thu, Jan 13, 2011 at 23:19, Robert Haas  wrote:
> On Thu, Jan 6, 2011 at 11:37 PM, Greg Smith  wrote:
>> If it defaulted to 3% of shared_buffers, min 64K & max 16MB for the auto
>> setting, it would for the most part become an autotuned parameter.  That
>> would make it 0.75 to 1MB at the standard anemic Linux default kernel
>> parameters.  Maybe more than some would like, but dropping shared_buffers
>> from 24MB to 23MB to keep this from being ridiculously undersized is
>> probably a win.  That percentage would reach 16MB by the time shared_buffers
>> was increased to 533MB, which also seems about right to me.  On a really bad
>> setup (brief pause to flip off Apple) with only 4MB to work with total,
>> you'd end up with wal_buffers between 64 and 128K, so very close to the
>> status quo.
>>
>> Code that up, and we could probably even remove the parameter as a tunable
>> altogether.  Very few would see a downside relative to any sensible
>> configuration under the current situation, and many people would notice
>> better automagic performance with one less parameter to tweak.  Given the
>> recent investigations about the serious downsides of tiny wal_buffers values
>> on new Linux kernels when using open_datasync, a touch more aggression about
>> this setting seems particularly appropriate to consider now.  That's been
>> swapped out as the default, but it's still possible people will switch to
>> it.
>
> Would anyone like to argue vigorously for or against the above proposal?
>
> I'll start: I think this is a good idea.  I don't have a strong
> opinion on whether the exact details Greg proposes above are
> precisely optimal, but I think they're in the right ballpark.
> Furthermore, we already have other things that are tuned in somewhat
> similar ways (e.g. the size of the fsync request queue defaults to the
> number of shared buffers) so there's precedent for it.  It's one less
> parameter that you have to set to make things just work.

+1, I like the idea. Would it still be there to override if necessary?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



[HACKERS] auto-sizing wal_buffers

2011-01-13 Thread Robert Haas
On Thu, Jan 6, 2011 at 11:37 PM, Greg Smith  wrote:
> If it defaulted to 3% of shared_buffers, min 64K & max 16MB for the auto
> setting, it would for the most part become an autotuned parameter.  That
> would make it 0.75 to 1MB at the standard anemic Linux default kernel
> parameters.  Maybe more than some would like, but dropping shared_buffers
> from 24MB to 23MB to keep this from being ridiculously undersized is
> probably a win.  That percentage would reach 16MB by the time shared_buffers
> was increased to 533MB, which also seems about right to me.  On a really bad
> setup (brief pause to flip off Apple) with only 4MB to work with total,
> you'd end up with wal_buffers between 64 and 128K, so very close to the
> status quo.
>
> Code that up, and we could probably even remove the parameter as a tunable
> altogether.  Very few would see a downside relative to any sensible
> configuration under the current situation, and many people would notice
> better automagic performance with one less parameter to tweak.  Given the
> recent investigations about the serious downsides of tiny wal_buffers values
> on new Linux kernels when using open_datasync, a touch more aggression about
> this setting seems particularly appropriate to consider now.  That's been
> swapped out as the default, but it's still possible people will switch to
> it.

Would anyone like to argue vigorously for or against the above proposal?

I'll start: I think this is a good idea.  I don't have a strong
opinion on whether the exact details Greg proposes above are
precisely optimal, but I think they're in the right ballpark.
Furthermore, we already have other things that are tuned in somewhat
similar ways (e.g. the size of the fsync request queue defaults to the
number of shared buffers) so there's precedent for it.  It's one less
parameter that you have to set to make things just work.
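To put numbers on it, Greg's rule could be computed as simply as the
sketch below (under the assumptions in his mail; the function and
variable names are hypothetical):

/* ~3% of shared_buffers, clamped to [64kB, 16MB]; 1/32 is ~3.1% */
static long
auto_wal_buffers_bytes(long shared_buffers_bytes)
{
    long    wal_bytes = shared_buffers_bytes / 32;

    if (wal_bytes < 64L * 1024)
        wal_bytes = 64L * 1024;
    if (wal_bytes > 16L * 1024 * 1024)
        wal_bytes = 16L * 1024 * 1024;
    return wal_bytes;
}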

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] kill -KILL: What happens?

2011-01-13 Thread David Fetter
On Thu, Jan 13, 2011 at 02:21:44PM -0500, Tom Lane wrote:
> David Fetter  writes:
> > I get that we can't prevent all pilot error, but I was hoping we
> > could bullet-proof this a little more, especially in light of a
> > certain extremely popular server OS's OOM killer's default
> > behavior.
> 
> > Yes, I get that that behavior is crazy, and stupid, and that
> > people should shut it off, but it *is* our problem if we let the
> > postmaster start (or continue) when it's set that way.
> 
> Packagers who are paying attention have fixed that ;-)

Are we privileging packaged over unpackaged?  Some distros over others?  ;)

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] reviewers needed!

2011-01-13 Thread Josh Berkus

> So far I have 6 people who have volunteered to be round-robin
> reviewers, and 7 people who are listed as reviewers on the CF site
> already.  That leaves 45 patches without a reviewer, plus whatever
> comes in in the next day or so.  This is not going to work unless a
> lot more people pitch in.

I'll be joining as an RRR on the 24th.  I'm too booked before then.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] reviewers needed!

2011-01-13 Thread Robert Haas
On Tue, Jan 11, 2011 at 9:17 PM, Robert Haas  wrote:
> [ abject plea for reviewers ]

So far I have 6 people who have volunteered to be round-robin
reviewers, and 7 people who are listed as reviewers on the CF site
already.  That leaves 45 patches without a reviewer, plus whatever
comes in in the next day or so.  This is not going to work unless a
lot more people pitch in.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-13 Thread Heikki Linnakangas

On 13.01.2011 22:57, Josh Berkus wrote:

On 1/13/11 12:11 PM, Robert Haas wrote:

That's going to depend on the situation.  If the database fits in
memory, then it's just going to work.  If it fits on disk, it's less
obvious whether it'll be good or bad, but an arbitrary limitation here
doesn't serve us well.


FWIW, if we had this feature right now in 9.0 we (PGX) would be using
it.  We run into the case of DB in memory, multiple slaves fairly often
these days.


Anyway, here's an updated patch with all the known issues fixed.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 5b6a230..400e12e 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -60,8 +60,6 @@
 
 
 /* File path names (all relative to $PGDATA) */
-#define BACKUP_LABEL_FILE		"backup_label"
-#define BACKUP_LABEL_OLD		"backup_label.old"
 #define RECOVERY_COMMAND_FILE	"recovery.conf"
 #define RECOVERY_COMMAND_DONE	"recovery.done"
 
@@ -338,7 +336,8 @@ typedef struct XLogCtlInsert
 	XLogPageHeader currpage;	/* points to header of block in cache */
 	char	   *currpos;		/* current insertion point in cache */
 	XLogRecPtr	RedoRecPtr;		/* current redo point for insertions */
-	bool		forcePageWrites;	/* forcing full-page writes for PITR? */
+	int			forcePageWrites;	/* forcing full-page writes for PITR? */
+	bool		exclusiveBackup;	/* a backup was started with pg_start_backup() */
 } XLogCtlInsert;
 
 /*
@@ -8313,16 +8312,38 @@ pg_start_backup(PG_FUNCTION_ARGS)
 
 	backupidstr = text_to_cstring(backupid);
 
-	startpoint = do_pg_start_backup(backupidstr, fast);
+	startpoint = do_pg_start_backup(backupidstr, fast, NULL);
 
 	snprintf(startxlogstr, sizeof(startxlogstr), "%X/%X",
 			 startpoint.xlogid, startpoint.xrecoff);
 	PG_RETURN_TEXT_P(cstring_to_text(startxlogstr));
 }
 
+/*
+ * do_pg_start_backup is the workhorse of the user-visible pg_start_backup()
+ * function. It creates the necessary starting checkpoint and constructs the
+ * backup label file.
+ * 
+ * There are two kinds of backups: exclusive and non-exclusive. An exclusive
+ * backup is started with pg_start_backup(), and there can be only one active
+ * at a time. The backup label file of an exclusive backup is written to
+ * $PGDATA/backup_label, and it is removed by pg_stop_backup().
+ *
+ * A non-exclusive backup is used for the streaming base backups (see
+ * src/backend/replication/basebackup.c). The difference from exclusive backups
+ * is that the backup label file is not written to disk. Instead, its would-be
+ * contents are returned in *labelfile, and the caller is responsible for
+ * including it in the backup archive as 'backup_label'. There can be many
+ * non-exclusive backups active at the same time, and they don't conflict
+ * with exclusive backups either.
+ *
+ * Every successfully started non-exclusive backup must be stopped by calling
+ * do_pg_stop_backup() or do_pg_abort_backup().
+ */
 XLogRecPtr
-do_pg_start_backup(const char *backupidstr, bool fast)
+do_pg_start_backup(const char *backupidstr, bool fast, char **labelfile)
 {
+	bool		exclusive = (labelfile == NULL);
 	XLogRecPtr	checkpointloc;
 	XLogRecPtr	startpoint;
 	pg_time_t	stamp_time;
@@ -8332,6 +8353,7 @@ do_pg_start_backup(const char *backupidstr, bool fast)
 	uint32		_logSeg;
 	struct stat stat_buf;
 	FILE	   *fp;
+	StringInfoData labelfbuf;
 
 	if (!superuser() && !is_authenticated_user_replication_role())
 		ereport(ERROR,
@@ -8350,6 +8372,12 @@ do_pg_start_backup(const char *backupidstr, bool fast)
 			  errmsg("WAL level not sufficient for making an online backup"),
  errhint("wal_level must be set to \"archive\" or \"hot_standby\" at server start.")));
 
+	if (strlen(backupidstr) > MAXPGPATH)
+		ereport(ERROR,
+(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("backup label too long (max %d bytes)",
+		MAXPGPATH)));
+
 	/*
 	 * Mark backup active in shared memory.  We must do full-page WAL writes
 	 * during an on-line backup even if not doing so at other times, because
@@ -8368,15 +8396,19 @@ do_pg_start_backup(const char *backupidstr, bool fast)
 	 * ensure adequate interlocking against XLogInsert().
 	 */
 	LWLockAcquire(WALInsertLock, LW_EXCLUSIVE);
-	if (XLogCtl->Insert.forcePageWrites)
+	if (exclusive)
 	{
-		LWLockRelease(WALInsertLock);
-		ereport(ERROR,
-(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
- errmsg("a backup is already in progress"),
- errhint("Run pg_stop_backup() and try again.")));
+		if (XLogCtl->Insert.exclusiveBackup)
+		{
+			LWLockRelease(WALInsertLock);
+			ereport(ERROR,
+	(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+	 errmsg("a backup is already in progress"),
+	 errhint("Run pg_stop_backup() and try again.")));
+		}
+		XLogCtl->Insert.exclusiveBackup = true;
 	}
-	XLogCtl->Insert.forcePageWrites = true;
+	XLogCtl->Insert.forcePageWrites++;
 	LWLockRelease(WALInsertLock);
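
For readers following the diff: a hypothetical sketch of the matching
release side under this counter scheme (what do_pg_stop_backup or
do_pg_abort_backup would need to do; this fragment is not taken from
the patch itself):

	/*
	 * Each successfully started backup must decrement forcePageWrites
	 * exactly once; only an exclusive backup clears exclusiveBackup.
	 */
	LWLockAcquire(WALInsertLock, LW_EXCLUSIVE);
	if (exclusive)
		XLogCtl->Insert.exclusiveBackup = false;
	Assert(XLogCtl->Insert.forcePageWrites > 0);
	XLogCtl->Insert.forcePageWrites--;
	LWLockRelease(WALInsertLock);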
 

Re: [HACKERS] kill -KILL: What happens?

2011-01-13 Thread Robert Haas
On Thu, Jan 13, 2011 at 3:37 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> I strongly believe you're in the minority on that one, for the same
>> reasons that I don't think most people would agree with your notion of
>> what should be the default shutdown mode.  A database that can't
>> accept new connections is a liability, not an asset.
>
> Killing active sessions when it's not absolutely necessary is not an
> asset.

That's a highly arguable point and I certainly don't agree with it.  A
database with no postmaster and no background processes can't possibly
be expected to function in any sort of halfway reasonable way.  In
particular:

1. No checkpoints will occur, so the time required for recovery will
grow longer without bound.
2. All walsenders will exit, so no transactions will be replicated to standbys.
3. Transactions committed asynchronously won't be flushed to disk, and
are lost entirely unless enough other WAL activity occurs before the
last backend dies to force a WAL write.
4. Autovacuum won't run until the system is properly restarted, and to
make matters worse there's no statistics collector, so the information
that might trigger a later run will be lost also.
5. At some point, you'll run out of clean buffers, after which
performance will start to suck as backends have to do their own
writes.
6. At some probably later point, the fsync request queue will fill up,
after which performance will go into the toilet.  On 9.1devel, this
takes less than a minute of moderate activity on my MacOS X machine.

All in all, running for any significant period of time in this state
is likely a recipe for disaster, even if for some inexplicable reason
you don't care about the fact that the system won't accept any new
connections.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] kill -KILL: What happens?

2011-01-13 Thread David Fetter
On Thu, Jan 13, 2011 at 09:18:06PM +0100, Florian Pflug wrote:
> On Jan13, 2011, at 21:01 , Aidan Van Dyk wrote:
> > On Thu, Jan 13, 2011 at 2:53 PM, Robert Haas  wrote:
> >> I'm not convinced.  I was thinking that we could simply treat it
> >> like SIGQUIT, if it's available.  I doubt there's a real use case
> >> for continuing to run queries after the postmaster and all the
> >> background processes are dead.  Expedited death seems like much
> >> better behavior.  Even checking PostmasterIsAlive() once per
> >> query would be reasonable, except that it'd add a system call to
> >> check for a condition that almost never holds, which I'm not
> >> eager to do.
> > 
> > If postmaster has a few fds to spare, what about having it open a
> > pipe to every child it spawns.  It never has to read/write to it,
> > but postmaster closing will signal the client's fd.  The client
> > just has to pop the fd into whatever normal poll/select event
> > handling it uses to notice when the "parent's pipe" is closed.
> 
> I just started to experiment with that idea, and wrote a small test
> program to check if that'd work. I'll post the results when I'm
> done.

Great! :)

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-13 Thread Josh Berkus
On 1/13/11 12:11 PM, Robert Haas wrote:
> That's going to depend on the situation.  If the database fits in
> memory, then it's just going to work.  If it fits on disk, it's less
> obvious whether it'll be good or bad, but an arbitrary limitation here
> doesn't serve us well.

FWIW, if we had this feature right now in 9.0 we (PGX) would be using
it.  We run into the case of DB in memory, multiple slaves fairly often
these days.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] kill -KILL: What happens?

2011-01-13 Thread Tom Lane
Aidan Van Dyk  writes:
> If postmaster has a few fds to spare, what about having it open a pipe
> to every child it spawns.  It never has to read/write to it, but
> postmaster closing will signal the client's fd.  The client just has
> to pop the fd into whatever normal poll/select event handling it uses
> to notice when the "parent's pipe" is closed.

Hmm.  Or more generally: there's one FIFO.  The postmaster holds both
sides open.  Backends hold the write side open.  (They can close the
read side, but that would just be to free up a FD.)  Background children
close the write side.  Now a background process can use EOF on the read
side of the FIFO to tell it that postmaster and all backends have
exited.  You still don't get a signal, but at least the condition you're
testing for is the one we actually want and not an approximation.
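
In sketch form, the read-side test could look like this (assuming
fifo_rd is the read end the background process kept open; poll()
semantics as on Linux, names illustrative):

#include <poll.h>
#include <unistd.h>

static int
postmaster_and_backends_gone(int fifo_rd)
{
    struct pollfd pfd;
    char        c;

    pfd.fd = fifo_rd;
    pfd.events = POLLIN;

    if (poll(&pfd, 1, 0) <= 0)
        return 0;                   /* no event: some writer still lives */
    if (pfd.revents & (POLLHUP | POLLERR))
        return 1;                   /* every write side has been closed */
    /* Nothing is ever written, so POLLIN here can only mean EOF. */
    return read(fifo_rd, &c, 1) == 0;
}

The fd could equally well be added to the process's normal poll/select
set instead of being polled separately.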

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] kill -KILL: What happens?

2011-01-13 Thread Magnus Hagander
On Thu, Jan 13, 2011 at 21:37, Tom Lane  wrote:
> Robert Haas  writes:
>> I strongly believe you're in the minority on that one, for the same
>> reasons that I don't think most people would agree with your notion of
>> what should be the default shutdown mode.  A database that can't
>> accept new connections is a liability, not an asset.
>
> Killing active sessions when it's not absolutely necessary is not an
> asset.

It certainly can be. Consider any connection pooling scenario, which
would represent the vast majority of larger deployments today - if you
don't kill the sessions, they will never go away.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] kill -KILL: What happens?

2011-01-13 Thread Tom Lane
Robert Haas  writes:
> I strongly believe you're in the minority on that one, for the same
> reasons that I don't think most people would agree with your notion of
> what should be the default shutdown mode.  A database that can't
> accept new connections is a liability, not an asset.

Killing active sessions when it's not absolutely necessary is not an
asset.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] kill -KILL: What happens?

2011-01-13 Thread Kevin Grittner
Robert Haas  wrote:
 
> A database that can't accept new connections is a liability, not
> an asset.
 
+1
 
I have so far been unable to imagine a use case for the production
databases I use where I would prefer to see backends continue after
postmaster failure.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] kill -KILL: What happens?

2011-01-13 Thread Florian Pflug
On Jan13, 2011, at 21:01 , Aidan Van Dyk wrote:
> On Thu, Jan 13, 2011 at 2:53 PM, Robert Haas  wrote:
>> I'm not convinced.  I was thinking that we could simply treat it like
>> SIGQUIT, if it's available.  I doubt there's a real use case for
>> continuing to run queries after the postmaster and all the background
>> processes are dead.  Expedited death seems like much better behavior.
>> Even checking PostmasterIsAlive() once per query would be reasonable,
>> except that it'd add a system call to check for a condition that
>> almost never holds, which I'm not eager to do.
> 
> If postmaster has a few fds to spare, what about having it open a pipe
> to every child it spawns.  It never has to read/write to it, but
> postmaster closing will signal the client's fd.  The client just has
> to pop the fd into whatever normal poll/select event handling it uses
> to notice when the "parent's pipe" is closed.

I just started to experiment with that idea, and wrote a small test
program to check if that'd work. I'll post the results when I'm done.

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-13 Thread Robert Haas
On Thu, Jan 13, 2011 at 2:19 PM, Ross J. Reedstrom  wrote:
> On Tue, Jan 11, 2011 at 11:06:18AM -0800, Josh Berkus wrote:
>>
>> > It makes it very convenient to set up standbys, without having to worry
>> > that you'll conflict e.g with a nightly backup. I don't imagine people
>> > will use streaming base backups for very large databases anyway.
>>
>> Also, imagine that you're provisioning a 10-node replication cluster on
>> EC2.  This would make that worlds easier.
>
> Hmm, perhaps. My concern is that a naive attempt to do that is going to
> have 10 base-backups happening at the same time, completely slamming the
> master, and none of them completing in a reasonable time. Is this
> possible, or will simultaneity buy you hot caches so that backups
> #2 -> #10 all run faster?

That's going to depend on the situation.  If the database fits in
memory, then it's just going to work.  If it fits on disk, it's less
obvious whether it'll be good or bad, but an arbitrary limitation here
doesn't serve us well.

P.S. Your reply-to header is busted.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] kill -KILL: What happens?

2011-01-13 Thread Robert Haas
On Thu, Jan 13, 2011 at 3:01 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Thu, Jan 13, 2011 at 2:45 PM, Tom Lane  wrote:
>>> I wonder whether we could have some sort of latch-like counter that
>>> would count the number of active backends and deliver signals when the
>>> count went to zero.  However, if the goal is to defend against random
>>> applications of SIGKILL, there's probably no way to make this reliable
>>> in userspace.
>
>> I don't think you can get there 100%.  We could, however, make a rule
>> that when a background process fails a PostmasterIsAlive() check, it
>> sends SIGQUIT to everyone it can find in the ProcArray, which would at
>> least ensure a timely exit in most real-world cases.
>
> You're going in the wrong direction there: we're trying to have the
> system remain sane when the postmaster crashes, not see how quickly
> it can screw up every remaining session.

I strongly believe you're in the minority on that one, for the same
reasons that I don't think most people would agree with your notion of
what should be the default shutdown mode.  A database that can't
accept new connections is a liability, not an asset.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] kill -KILL: What happens?

2011-01-13 Thread Aidan Van Dyk
On Thu, Jan 13, 2011 at 2:53 PM, Robert Haas  wrote:
> I'm not convinced.  I was thinking that we could simply treat it like
> SIGQUIT, if it's available.  I doubt there's a real use case for
> continuing to run queries after the postmaster and all the background
> processes are dead.  Expedited death seems like much better behavior.
> Even checking PostmasterIsAlive() once per query would be reasonable,
> except that it'd add a system call to check for a condition that
> almost never holds, which I'm not eager to do.

If postmaster has a few fds to spare, what about having it open a pipe
to every child it spawns.  It never has to read/write to it, but
postmaster closing will signal the client's fd.  The client just has
to pop the fd into whatever normal poll/select event handling it uses
to notice when the "parent's pipe" is closed.

A FIFO would allow postmaster to not need as many file handles, and
clients reading the fifo would notice when the writer (postmaster)
closes it.

a.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] kill -KILL: What happens?

2011-01-13 Thread Tom Lane
Robert Haas  writes:
> On Thu, Jan 13, 2011 at 2:45 PM, Tom Lane  wrote:
>> I wonder whether we could have some sort of latch-like counter that
>> would count the number of active backends and deliver signals when the
>> count went to zero.  However, if the goal is to defend against random
>> applications of SIGKILL, there's probably no way to make this reliable
>> in userspace.

> I don't think you can get there 100%.  We could, however, make a rule
> that when a background process fails a PostmasterIsAlive() check, it
> sends SIGQUIT to everyone it can find in the ProcArray, which would at
> least ensure a timely exit in most real-world cases.

You're going in the wrong direction there: we're trying to have the
system remain sane when the postmaster crashes, not see how quickly
it can screw up every remaining session.

BTW, in Unix-land we could maybe rely on SysV semaphores' SEM_UNDO
feature to keep a trustworthy count of how many live processes there
are.  But I don't know whether there's anything comparable for Windows.
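
A minimal sketch of the Unix side of that idea (assuming a semaphore
created elsewhere; error handling omitted): each process bumps the
count with SEM_UNDO, so the kernel decrements it again however the
process exits, including via SIGKILL.

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Called once at process start: register as a live process. */
static void
register_liveness(int semid)
{
    struct sembuf op;

    op.sem_num = 0;
    op.sem_op = 1;              /* increment the live count */
    op.sem_flg = SEM_UNDO;      /* kernel undoes this on any exit */
    (void) semop(semid, &op, 1);
}

/* The semaphore value is then a trustworthy live-process count. */
static int
live_process_count(int semid)
{
    return semctl(semid, 0, GETVAL);
}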

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-13 Thread Ross J. Reedstrom
On Tue, Jan 11, 2011 at 11:06:18AM -0800, Josh Berkus wrote:
> 
> > It makes it very convenient to set up standbys, without having to worry
> > that you'll conflict e.g with a nightly backup. I don't imagine people
> > will use streaming base backups for very large databases anyway.
> 
> Also, imagine that you're provisioning a 10-node replication cluster on
> EC2.  This would make that worlds easier.

Hmm, perhaps. My concern is that a naive attempt to do that is going to
have 10 base-backups happening at the same time, completely slamming the
master, and none of them completing in a reasonable time. Is this
possible, or will simultaneity buy you hot caches so that backups
#2 -> #10 all run faster?

Ross
-- 
Ross Reedstrom, Ph.D. reeds...@rice.edu
Systems Engineer & Admin, Research Scientistphone: 713-348-6166
Connexions  http://cnx.orgfax: 713-348-3665
Rice University MS-375, Houston, TX 77005
GPG Key fingerprint = F023 82C8 9B0E 2CC6 0D8E  F888 D3AE 810E 88F0 BEDE


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] kill -KILL: What happens?

2011-01-13 Thread Robert Haas
On Thu, Jan 13, 2011 at 2:45 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Thu, Jan 13, 2011 at 2:16 PM, Tom Lane  wrote:
>>> Frankly I'd prefer to get rid of PostmasterIsAlive, not extend its use.
>>> It sucks because you don't get a signal on parent death.  With the
>>> arrival of the latch code, having to check for PostmasterIsAlive
>>> frequently is the only reason for an idle background process to consume
>>> CPU at all.
>
>> What we really need is SIGPARENT.  I wonder if the Linux folks would
>> consider adding such a thing.  Might be useful to others as well.
>
> That's pretty much a dead-end idea unfortunately; it would never be
> portable enough to let us change our system structure to rely on it.
> Even more to the point, "go away when the postmaster does" isn't
> really the behavior we want anyway.  "Go away when the last backend
> does" is what we want.

I'm not convinced.  I was thinking that we could simply treat it like
SIGQUIT, if it's available.  I doubt there's a real use case for
continuing to run queries after the postmaster and all the background
processes are dead.  Expedited death seems like much better behavior.
Even checking PostmasterIsAlive() once per query would be reasonable,
except that it'd add a system call to check for a condition that
almost never holds, which I'm not eager to do.

> I wonder whether we could have some sort of latch-like counter that
> would count the number of active backends and deliver signals when the
> count went to zero.  However, if the goal is to defend against random
> applications of SIGKILL, there's probably no way to make this reliable
> in userspace.

I don't think you can get there 100%.  We could, however, make a rule
that when a background process fails a PostmasterIsAlive() check, it
sends SIGQUIT to everyone it can find in the ProcArray, which would at
least ensure a timely exit in most real-world cases.

> Another idea is to have a "postmaster minder" process that respawns the
> postmaster when it's killed.  The hard part of that is that the minder
> can't be connected to shared memory (else its OOM cross-section is just
> as big as the postmaster's), and that makes it difficult for it to tell
> when all the children have gone away.  I suppose it could be coded to
> just retry every few seconds until success.  This doesn't improve the
> behavior of background processes at all, though.

It hardly seems worth it.  Given a reliable interlock against multiple
postmasters, the real concern is making sure that a half-dead
postmaster gets itself all-dead quickly so that the DBA can start up a
new one before he gets fired.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] kill -KILL: What happens?

2011-01-13 Thread Tom Lane
Robert Haas  writes:
> On Thu, Jan 13, 2011 at 2:16 PM, Tom Lane  wrote:
>> Frankly I'd prefer to get rid of PostmasterIsAlive, not extend its use.
>> It sucks because you don't get a signal on parent death.  With the
>> arrival of the latch code, having to check for PostmasterIsAlive
>> frequently is the only reason for an idle background process to consume
>> CPU at all.

> What we really need is SIGPARENT.  I wonder if the Linux folks would
> consider adding such a thing.  Might be useful to others as well.

That's pretty much a dead-end idea unfortunately; it would never be
portable enough to let us change our system structure to rely on it.
Even more to the point, "go away when the postmaster does" isn't
really the behavior we want anyway.  "Go away when the last backend
does" is what we want.

I wonder whether we could have some sort of latch-like counter that
would count the number of active backends and deliver signals when the
count went to zero.  However, if the goal is to defend against random
applications of SIGKILL, there's probably no way to make this reliable
in userspace.

Another idea is to have a "postmaster minder" process that respawns the
postmaster when it's killed.  The hard part of that is that the minder
can't be connected to shared memory (else its OOM cross-section is just
as big as the postmaster's), and that makes it difficult for it to tell
when all the children have gone away.  I suppose it could be coded to
just retry every few seconds until success.  This doesn't improve the
behavior of background processes at all, though.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Possible bug in pg_settings/pg_depend

2011-01-13 Thread Robert Haas
On Thu, Jan 13, 2011 at 2:04 PM, Joel Jacobson  wrote:
> 2011/1/13 Tom Lane :
>> Yes, probably.  It's certainly possible to have the same linkage occur
>> with different deptypes.  We don't try hard to avoid dups because they
>> don't matter.
>
> "with different deptypes", yes, but in this case there were two
> linkages of the same deptype.
>
> It just seems a bit strange that I found only one such case in the entire
> database; it smells like some kind of bug. It might not be, I don't know.
> I just thought it was worth investigating a bit, but if you're sure about
> it I of course trust you.

Instead of trusting him, you could investigate why it happens.  A
quick test shows this eliminates both dependencies:

drop rule pg_settings_u on pg_settings;

It appears that both of the dependencies in question are from that
rule and pointing to pg_settings.name, and it looks like that rule
mentions the name column of pg_settings twice.  With a little further
experimentation you can probably tease out whether each of the two
mentions produced a separate dependency...  my guess is "yes".

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] kill -KILL: What happens?

2011-01-13 Thread Robert Haas
On Thu, Jan 13, 2011 at 2:16 PM, Tom Lane  wrote:
> Frankly I'd prefer to get rid of PostmasterIsAlive, not extend its use.
> It sucks because you don't get a signal on parent death.  With the
> arrival of the latch code, having to check for PostmasterIsAlive
> frequently is the only reason for an idle background process to consume
> CPU at all.

What we really need is SIGPARENT.  I wonder if the Linux folks would
consider adding such a thing.  Might be useful to others as well.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] kill -KILL: What happens?

2011-01-13 Thread Tom Lane
David Fetter  writes:
> I get that we can't prevent all pilot error, but I was hoping we could
> bullet-proof this a little more, especially in light of a certain
> extremely popular server OS's OOM killer's default behavior.

> Yes, I get that that behavior is crazy, and stupid, and that people
> should shut it off, but it *is* our problem if we let the postmaster
> start (or continue) when it's set that way.

Packagers who are paying attention have fixed that ;-)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] kill -KILL: What happens?

2011-01-13 Thread Tom Lane
Florian Pflug  writes:
> Couldn't normal backends call PostmasterIsAlive and exit if not, just
> like the startup process, the stats collector, autovacuum, bgwriter,
> walwriter, walreceiver, walsender and the wal archiver already do?

> I assumed they do, but now that I grepped the code it seems they don't.

That's intentional: they keep going until the user closes the session or
someone sends them a signal to do otherwise.  The other various
background processes have to watch PostmasterIsAlive because there is no
session to close.

Frankly I'd prefer to get rid of PostmasterIsAlive, not extend its use.
It sucks because you don't get a signal on parent death.  With the
arrival of the latch code, having to check for PostmasterIsAlive
frequently is the only reason for an idle background process to consume
CPU at all.
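
For reference, the polling-style test under discussion amounts to the
classic kill(pid, 0) probe, sketched below (the general technique, not
necessarily PostgreSQL's exact implementation):

#include <sys/types.h>
#include <signal.h>
#include <errno.h>

/*
 * Signal 0 performs only the existence/permission check.  Because no
 * signal arrives when the parent dies, this has to be called
 * periodically, which is exactly the idle-CPU cost described above.
 */
static int
parent_is_alive(pid_t parent_pid)
{
    return kill(parent_pid, 0) == 0 || errno == EPERM;
}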

Another problem with the scheme is that it only works as long as the
background process is providing a *non-critical* service.  Eventually we
are probably going to need some way for bgwriter/walwriter to stay alive
long enough to service orphaned backends, rather than disappearing
instantly if the postmaster goes away.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] kill -KILL: What happens?

2011-01-13 Thread David Fetter
On Thu, Jan 13, 2011 at 12:45:07PM -0600, Kevin Grittner wrote:
> Tom Lane  wrote:
>  
> > I can't see automating it though.  We already have a perfectly
> > good solution to the automated shutdown problem.
>  
> Oh, I totally agree with that.  I somehow thought we'd gotten off
> into how someone could recover after shooting their foot.

I get that we can't prevent all pilot error, but I was hoping we could
bullet-proof this a little more, especially in light of a certain
extremely popular server OS's OOM killer's default behavior.

Yes, I get that that behavior is crazy, and stupid, and that people
should shut it off, but it *is* our problem if we let the postmaster
start (or continue) when it's set that way.

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming base backups

2011-01-13 Thread Magnus Hagander
On Wed, Jan 12, 2011 at 10:39, Fujii Masao  wrote:
> On Mon, Jan 10, 2011 at 11:09 PM, Magnus Hagander  wrote:
>> I've committed the backend side of this, without that. Still working
>> on the client, and on cleaning up Heikki's patch for grammar/parser
>> support.
>
> Great work!
>
> I have some comments:
>
> While walsender is sending a base backup, WalSndWakeup should
> not send the signal to that walsender?

True, it's not necessary. How bad does it actually hurt things though?
Given that the walsender running the backup isn't actually waiting on
the latch, it doesn't actually send a signal, does it?


> In sendFile or elsewhere, we should periodically check whether
> postmaster is alive and whether the flag was set by the signal?

That, however, we probably should.
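
In sketch form, such a check would sit inside the file-send loop,
something like the fragment below (illustrative only;
walsender_shutdown_requested stands in for whatever flag the signal
handler sets, and cnt/buf/fp come from sendFile's context):

	while ((cnt = fread(buf, 1, sizeof(buf), fp)) > 0)
	{
		/*
		 * Bail out promptly if the postmaster died or a shutdown
		 * was signaled; otherwise a long base backup can linger on.
		 */
		if (!PostmasterIsAlive(true) || walsender_shutdown_requested)
			ereport(ERROR,
					(errmsg("shutting down, aborting base backup")));

		/* ... send 'buf' to the client ... */
	}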


> At the end of the backup by walsender, it forces a switch to a new
> WAL file and waits until the last WAL file has been archived. So we
> should change postmaster so that it doesn't cause the archiver to
> end before walsender ends when shutdown is requested?

Um. I have to admit I'm not entirely following what you mean enough to
confirm it, but it *sounds* correct :-)

What scenario exactly is the problematic one?


> Also, when shutdown is requested, the walsender which is
> streaming WAL should not end before another walsender which
> is sending a backup ends, to stream the backup-end WAL?

Not sure I see the reason for that. If we're shutting down in the
middle of the base backup, we don't have any support for continuing
that one after we're back up - you have to start over.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_depend explained

2011-01-13 Thread Joel Jacobson
2011/1/13 David Fetter :
> Please put a self-contained example on the snippets page, and please
> also check that it actually runs before doing so.  You'd mangled
> some aliases in the query you sent, which leads me to believe you
> hadn't actually tried running it.

I actually hadn't really solved the problem at the time I wrote my last
email; it turned out I had to do things a bit differently to avoid
running into corner-case problems.

I will put together a self-contained example like you suggested and
get back shortly :-)

-- 
Best regards,

Joel Jacobson
Glue Finance

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] kill -KILL: What happens?

2011-01-13 Thread Florian Pflug
On Jan13, 2011, at 19:00 , Tom Lane wrote:
> At least on Unix I don't believe there is any other solution.  You
> could try looking at ps output but there's a fundamental race condition,
> ie the postmaster could spawn another child just before you kill it,
> whereupon the child is reassigned to init and there's no longer a good
> way to tell that it came from that postmaster.

Maybe I'm totally confused, but ...

Couldn't normal backends call PostmasterIsAlive and exit if not, just
like the startup process, the stats collector, autovacuum, bgwriter,
walwriter, walreceiver, walsender and the wal archiver already do?

I assumed they do, but now that I grepped the code it seems they don't.

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Possible bug in pg_settings/pg_depend

2011-01-13 Thread Joel Jacobson
2011/1/13 Tom Lane :
> Yes, probably.  It's certainly possible to have the same linkage occur
> with different deptypes.  We don't try hard to avoid dups because they
> don't matter.

"with different deptypes", yes, but in this case there were two
linkages of the same deptype.

It just seems a bit strange that I found only one such case in the entire
database; it smells like some kind of bug. It might not be, I don't know.
I just thought it was worth investigating a bit, but if you're sure about
it I of course trust you.

-- 
Best regards,

Joel Jacobson
Glue Finance

E: j...@gluefinance.com
T: +46 70 360 38 01

Postal address:
Glue Finance AB
Box  549
114 11  Stockholm
Sweden

Visiting address:
Glue Finance AB
Birger Jarlsgatan 14
114 34 Stockholm
Sweden

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] kill -KILL: What happens?

2011-01-13 Thread Kevin Grittner
Tom Lane  wrote:
 
> I can't see automating it though.  We already have a perfectly
> good solution to the automated shutdown problem.
 
Oh, I totally agree with that.  I somehow thought we'd gotten off
into how someone could recover after shooting their foot.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] kill -KILL: What happens?

2011-01-13 Thread Tom Lane
"Kevin Grittner"  writes:
> Tom Lane  wrote:
>> At least on Unix I don't believe there is any other solution.  You
>> could try looking at ps output but there's a fundamental race
>> condition, ie the postmaster could spawn another child just before
>> you kill it, whereupon the child is reassigned to init and there's
>> no longer a good way to tell that it came from that postmaster.
 
> Couldn't you run `ps auxf` and kill any postgres process which is
> not functioning as postmaster (those are pretty easy to distinguish)
> and which isn't the child of such a process?  Is there ever a reason
> to allow such an orphan to run?

That's not terribly hard to do by hand, especially since the cautious
DBA could also do things like checking a process' CWD to verify which
postmaster it had belonged to.  I can't see automating it though.
We already have a perfectly good solution to the automated shutdown
problem.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in pg_describe_object, patch v2

2011-01-13 Thread Tom Lane
Robert Haas  writes:
> On Wed, Jan 12, 2011 at 7:47 PM, Tom Lane  wrote:
>> IMO, what this patch needs is to not output the types unless they are
>> actually different from the default (which can be inferred from the AM
>> type and the function arguments). That would fix my concern about it
>> emitting information that is 99.44% useless.

> I guess we could do that, but I don't understand how you're supposed
> to infer them, which means probably a lot of other people won't
> either.

Read the CREATE OPERATOR CLASS source code.  (It likely would be best to
refactor that a bit so it would expose some way to obtain the implied
defaults --- I don't think that's done explicitly now, and it's
certainly not exported from opclasscmds.c.)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Possible bug in pg_settings/pg_depend

2011-01-13 Thread Tom Lane
Joel Jacobson  writes:
> Are multiple identical entires in pg_depend possible?

Yes, probably.  It's certainly possible to have the same linkage occur
with different deptypes.  We don't try hard to avoid dups because they
don't matter.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] kill -KILL: What happens?

2011-01-13 Thread Kevin Grittner
Tom Lane  wrote:
 
> At least on Unix I don't believe there is any other solution.  You
> could try looking at ps output but there's a fundamental race
> condition, ie the postmaster could spawn another child just before
> you kill it, whereupon the child is reassigned to init and there's
> no longer a good way to tell that it came from that postmaster.
 
Couldn't you run `ps auxf` and kill any postgres process which is
not functioning as postmaster (those are pretty easy to distinguish)
and which isn't the child of such a process?  Is there ever a reason
to allow such an orphan to run?
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] kill -KILL: What happens?

2011-01-13 Thread Tom Lane
David Fetter  writes:
> On Thu, Jan 13, 2011 at 10:41:28AM -0500, Tom Lane wrote:
>> It's just that you're then looking at having to manually clean up the
>> child processes and then restart the postmaster; a process that is not
>> only tedious but does offer the possibility of screwing yourself.

> Does this mean that there's no cross-platform way to ensure that
> killing a process results in its children's timely (i.e. before damage
> can occur) death?  That such a way isn't practical from a performance
> point of view?

The simple, easy, cross-platform solution is this: don't kill -9 the
postmaster.  Send it one of the provisioned shutdown signals and let it
kill its children for you.

At least on Unix I don't believe there is any other solution.  You
could try looking at ps output but there's a fundamental race condition,
ie the postmaster could spawn another child just before you kill it,
whereupon the child is reassigned to init and there's no longer a good
way to tell that it came from that postmaster.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] system views for walsender activity

2011-01-13 Thread Magnus Hagander
On Thu, Jan 13, 2011 at 18:43, Robert Haas  wrote:
> On Thu, Jan 13, 2011 at 11:08 AM, Magnus Hagander  wrote:
>> On Wed, Jan 12, 2011 at 03:03, Robert Haas  wrote:
>>> On Tue, Jan 11, 2011 at 7:24 AM, Magnus Hagander  
>>> wrote:
> No, do this at top
>
> if (walsnd->state == state)
>  return;
>
> Keep spinlocks when actually setting it.
>>>
>>> I think this is safe...
>>>
 Aha. Thanks for the pointers, pfa a new version.
>>>
>>> ...but I think you also need to take the spinlock when reading the value.
>>
>> Even when it can only ever be set by one process (the owning
>> walsender), and the variable is atomic (as it should be, since it's a
>> single enum/int)?
>
> The fact that it can only be modified by one process makes it safe for
> *that process* to read it without taking the lock, but another process
> that wants to read it still needs the lock, I believe - otherwise you
> might get a slightly stale value.  That's probably not a *huge* deal
> in this case, but I think it'd be better to get it right because
> people tend to copy these sorts of things elsewhere, and it'd be bad
> if it got copied into some place more critical.

ok, thanks for the pointers - fix applied.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_depend explained

2011-01-13 Thread David Fetter
On Wed, Jan 12, 2011 at 09:09:31PM +0100, Joel Jacobson wrote:
> (sorry for top posting,

No worries.

> iPhone + drunk)

A dangerous combination indeed.  I hear water, NSAIDs and time can
help with the hangover ;)

> pg_depend_before is a select * from pg_depend before creating the
> test db model

Please put a self-contained example on the snippets page, and please
also check that it actually runs before doing so.  You'd mangled
some aliases in the query you sent, which leads me to believe you
hadn't actually tried running it.

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] system views for walsender activity

2011-01-13 Thread Robert Haas
On Thu, Jan 13, 2011 at 11:08 AM, Magnus Hagander  wrote:
> On Wed, Jan 12, 2011 at 03:03, Robert Haas  wrote:
>> On Tue, Jan 11, 2011 at 7:24 AM, Magnus Hagander  wrote:
 No, do this at top

 if (walsnd->state == state)
  return;

 Keep spinlocks when actually setting it.
>>
>> I think this is safe...
>>
>>> Aha. Thanks for the pointers, pfa a new version.
>>
>> ...but I think you also need to take the spinlock when reading the value.
>
> Even when it can only ever be set by one process (the owning
> walsender), and the variable is atomic (as it should be, since it's a
> single enum/int)?

The fact that it can only be modified by one process makes it safe for
*that process* to read it without taking the lock, but another process
that wants to read it still needs the lock, I believe - otherwise you
might get a slightly stale value.  That's probably not a *huge* deal
in this case, but I think it'd be better to get it right because
people tend to copy these sorts of things elsewhere, and it'd be bad
if it got copied into some place more critical.
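
In code form, the pattern being described looks roughly like this
(field names abbreviated from the walsender slot; an illustrative
fragment, not the committed diff):

	/* Writer: the owning walsender updating its own slot. */
	SpinLockAcquire(&walsnd->mutex);
	walsnd->state = state;
	SpinLockRelease(&walsnd->mutex);

	/*
	 * Reader: any other process must also take the spinlock, or it
	 * may see a stale (or, on some platforms, torn) value.
	 */
	SpinLockAcquire(&walsnd->mutex);
	state = walsnd->state;
	SpinLockRelease(&walsnd->mutex);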

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] C++ keywords in headers (was Re: [GENERAL] #include )

2011-01-13 Thread Bruce Momjian
Tom Lane wrote:
> Alvaro Herrera  writes:
> > Excerpts from Tom Lane's message of Mon Dec 27 13:54:56 -0300 2010:
> >> [ lightbulb ] ... although we could improve that quite a bit if we
> >> processed each .h file separately instead of insisting on smashing
> >> everything into one compilation.  Let me go try that.
> 
> > FWIW I have this patch lingering about that I wrote months ago, to check
> > for header problems (not C++ stuff, just things like forgetting to
> > include some necessary header in some other header).  Since it needs a
> > lot of polish (needs to ignore certain headers, and avoid leaving
> > lingering files around), I didn't commit it; and I haven't updated it to
> > the new Make recursive stuff, either.
> 
> src/tools/pginclude/ already contains several scripts for this sort of
> thing.  Bruce runs them by hand occasionally, although I just found out
> that he's evidently not run the does-each-header-compile-standalone
> test in a while.  It would probably pay to automate these.

It is true I have not run those tests in a while.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] kill -KILL: What happens?

2011-01-13 Thread David Fetter
On Thu, Jan 13, 2011 at 10:41:28AM -0500, Tom Lane wrote:
> David Fetter  writes:
> > I've noticed over the years that we give people dire warnings never to
> > send a KILL signal to the postmaster, but I'm unsure as to what are
> > potential consequences of this, as in just exactly how this can result
> > in problems.  Is there some reference I can look to for explanations
> > of the mechanism(s) whereby the damage occurs?
> 
> There's no risk of data corruption, if that's what you're thinking of.
> It's just that you're then looking at having to manually clean up the
> child processes and then restart the postmaster; a process that is not
> only tedious but does offer the possibility of screwing yourself.

Does this mean that there's no cross-platform way to ensure that
killing a process results in its children's timely (i.e. before damage
can occur) death?  That such a way isn't practical from a performance
point of view?

> In particular the risk is that someone clueless enough to do this would
> next decide that removing $PGDATA/postmaster.pid, rather than killing
> all the existing children, is the quickest way to get the postmaster
> restarted.  Once he's done that, his data will shortly be hosed beyond
> recovery, because now he has two noncommunicating sets of backends
> massaging the same files via separate sets of shared buffers.

Right.

> The reason this sequence of events doesn't seem improbable is that the
> error you get when you try to start a new postmaster, if there are still
> old backends running, is
> 
> FATAL:  pre-existing shared memory block (key 5490001, ID 15609) is still in 
> use
> HINT:  If you're sure there are no old server processes still running, remove 
> the shared memory block or just delete the file "postmaster.pid".
> 
> Maybe we should rewrite that HINT --- while it's *possible* that
> removing the shmem block or deleting postmaster.pid is the right thing
> to do, it's not exactly *likely*.  I think we need to put a bit more
> emphasis on the "If ..." part.  Like "If you are prepared to swear on
> your mother's grave that there are no old server processes still
> running, consider removing postmaster.pid.  But first check for existing
> processes again."

Maybe the hint could give an OS-tailored way to check this...

> (BTW, I notice that this interlock against starting a new postmaster
> appears to be broken in HEAD, which is likely not unrelated to the
> fact that the contents of postmaster.pid seem to be totally bollixed
> :-()

D'oh!  Well, I hope knowing it's a problem gives some kind of glimmer
as to how to solve it :)

Is this worth writing tests for?

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq documentation cleanups (repost 3)

2011-01-13 Thread Bruce Momjian
Robert Haas wrote:
> On Wed, Jan 12, 2011 at 8:54 PM, Bruce Momjian  wrote:
> > I am also attaching a few more of Leslie's changes that I think are
> > useful.  The first clarifies a confusion Leslie had about the fact that
> > "return" is referencing the return value of the function and not the
> > value returned in the pointer.
> 
> Hmm.  Well, if that's the confusion, I don't think inserting the words
> "by the function" is the right way to fix it - it certainly isn't
> returned by anything else.  You could change it to say "It is also
> possible for *errmsg to be NULL even when the return value is also
> NULL; this indicates..."
> 
> > The second change is, I think, better wording.
> 
> OK.
> 
> > The third moves the "deprecated" text to the start of the function
> > description. ?Leslie pointed out that that is how we do it for other
> > libpq functions, so we should move it for consistency.
> 
> That seems to me to read pretty awkwardly.  You could perhaps strike
> the chunk and the whole first paragraph and simply write "PQoidStatus
> is an older, deprecated version of PQoidValue.  It returns its result
> as a character string rather than an Oid, and is not thread-safe." and
> then cut directly to the synopsis.  That would be consistent with what
> we've done elsewhere; moving just that one sentence is not.

OK, I have made the adjustments you mentioned with my own wording
(attached and applied).  Let me know of any more needed adjustments.
Thanks.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 58e593d..fe661b8 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -972,8 +972,8 @@ PQconninfoOption *PQconninfoParse(const char *conninfo, char **errmsg);
If errmsg is not NULL, then *errmsg is set
to NULL on success, else to a malloc'd error string explaining
the problem.  (It is also possible for *errmsg to be
-   set to NULL even when NULL is returned; this indicates an out-of-memory
-   situation.)
+   set to NULL and the function to return NULL;
+   this indicates an out-of-memory condition.)
   
 
   
@@ -1352,7 +1352,7 @@ ConnStatusType PQstatus(const PGconn *conn);
   
See the entry for PQconnectStartParams, PQconnectStart
and PQconnectPoll with regards to other status codes that
-   might be seen.
+   might be returned.
   
  
 
@@ -3163,23 +3163,15 @@ Oid PQoidValue(const PGresult *res);
 
  
   
-   Returns a string with the OID of the inserted row, if the
-   SQL command was an INSERT
-   that inserted exactly one row, or a EXECUTE of
-   a prepared statement consisting of a suitable
-   INSERT.  (The string will be 0 if
-   the INSERT did not insert exactly one row, or
-   if the target table does not have OIDs.)  If the command was not
-   an INSERT, returns an empty string.
+   This function is deprecated in favor of
+   PQoidValue and is not thread-safe.
+   It returns a string with the OID of the inserted row, while
+   PQoidValue returns the OID value.
 
 char *PQoidStatus(const PGresult *res);
 
   
 
-  
-   This function is deprecated in favor of
-   PQoidValue.  It is not thread-safe.
-  
  
 

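For reference, a short caller-side sketch of the semantics described
above (the connection string is arbitrary):

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    char *errmsg = NULL;
    PQconninfoOption *opts =
        PQconninfoParse("host=localhost port=5432", &errmsg);

    if (opts == NULL)
    {
        if (errmsg != NULL)
        {
            fprintf(stderr, "bad conninfo: %s\n", errmsg);
            PQfreemem(errmsg);
        }
        else
            fprintf(stderr, "out of memory\n"); /* the NULL/NULL case */
        return 1;
    }

    PQconninfoFree(opts);
    return 0;
}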

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] arrays as pl/perl input arguments [PATCH]

2011-01-13 Thread Alex Hunsaker
On Thu, Jan 13, 2011 at 01:06, Martijn van Oosterhout  wrote:
> On Thu, Jan 13, 2011 at 12:06:33AM -0700, Alex Hunsaker wrote:
>> > I had supposed that it would be possible to do the string conversion
>> > lazily, ie, only if the string value was actually demanded.
>>
>> Yep, In-fact if we wanted we could even die (or throw an exception in
>> other language speak :) ) when the string value is demanded.
>
> I played with this a little and it is fairly easy to make a variable
> such that $a is the string representation and $a[0] the first value of
> the array. The problem is that you can't pass such a variable into a
> subroutine.

[ snip ]
> my @a=(1,2);
>
> tie $a, "MyClass", \@a;
>
> print "\$a='$a'\n";
> print "\$a[0]='$a[0]'\n";


Erm... the reason you can't seem to pass it to any subroutines is that
it's actually two variables: $a and @a.
When you print "$a\n"; you are using the tied variable that uses @a;
And when you print "$a[0]\n"; you are accessing the array directly.
I think you just had an unfortunate variable name, otherwise strict
would have complained appropriately. :)

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql crashes on encoding mismatch

2011-01-13 Thread Hitoshi Harada
2011/1/13 Tom Lane :
> Hitoshi Harada  writes:
>> I found a crash case (assertion failure) when running psql -f
>> utf8_encoded_script.sql against client_encoding = shift_jis in
>> postgresql.conf. Though encoding mismatch is obviously user's fault, a
>> crash doesn't explain anything to him.
>
> I'm not too impressed with this patch: it seems like the most it will
> accomplish is to move the failure to some other, equally obscure, place
> --- because you'll still have a string that's invalidly encoded.
> Moreover, if you've got wrongly encoded data, it wouldn't be hard at all
> for it to mess up psql's lexing; consider cases such as a
> character-that's-not-as-long-as-we-think just in front of a quote mark.
>
> Shouldn't we instead try to verify the multibyte encoding somewhere
> upstream of here?

I had thought about that before writing the patch, too. However, the
fact that psql (fe-misc.c) has PQmblen() but no PQverifymb() suggested
to me that encoding verification should perhaps be done on the server
side. I may be too ignorant to see the lexing problem with your quote
mark example, but my crash sample has the multibyte characters in an
SQL comment, which is ignored in server parsing. If we decided that
this case should raise an error, wouldn't some existing applications
break? I can imagine they are in the same encoding-mismatch situation
and currently run without hitting the problem I found by chance.
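For what it's worth, a rough client-side sketch of the kind of check
being discussed (not the patch itself; it uses only PQmblen() and
merely detects a multibyte character that would run past the end of
the buffer, which is the situation that trips the lexer):

#include <string.h>
#include <stdbool.h>
#include <libpq-fe.h>

/*
 * Return false if some character, as measured by PQmblen(), would
 * extend past the end of the string.  This does not validate the
 * bytes themselves, so it is weaker than a real verifier.
 */
static bool
mb_lengths_fit(const char *str, int encoding)
{
	const char *p = str;
	const char *end = str + strlen(str);

	while (p < end)
	{
		int			len = PQmblen(p, encoding);

		if (len <= 0 || p + len > end)
			return false;	/* truncated or bogus length */
		p += len;
	}
	return true;
}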

Just for reference I attach the case sql file. To reproduce it:

1. initdb
2. edit client_encoding = shift_jis in postgresql.conf
3. start postgres
4. psql -f case_utf8.sql

Note: the line breaks should be LF, as the file stands; with CR-LF the
problem cannot be reproduced.

Regards,

-- 
Hitoshi Harada


case_utf8.sql
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Walreceiver fsyncs excessively

2011-01-13 Thread Heikki Linnakangas

On 13.01.2011 14:34, Fujii Masao wrote:

On Thu, Jan 13, 2011 at 9:01 PM, Heikki Linnakangas
  wrote:

Attached patch fixes libpqrcv_receive() so that it calls PQconsumeInput()
before concluding that there's no data available. The excessive fsyncing can
lead to very bad performance, so this needs to be applied to 9.0 too.


Seems good.

Can we remove the "justconnected" flag, thanks to the patch?


Yes, good point. Committed with "justconnected" removed.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SSI patch version 8

2011-01-13 Thread Kevin Grittner
Heikki Linnakangas  wrote:
> On 13.01.2011 16:51, Kevin Grittner wrote:
 
>> But we acquired a relation lock up front, when we determined that
>> this would be a heap scan, so we could short-circuit this whole
>> thing if within the heapgettup_pagemode function we could
>> determine that this was a scan of the whole relation.
> 
> That sounds simple enough. Add a boolean field to HeapScanDesc, 
> "rs_relpredicatelocked", and set it when you acquire the relation
> lock.
 
Heikki, I can't thank you enough.  The fix is here:
 
http://git.postgresql.org/gitweb?p=users/kgrittn/postgres.git;a=commitdiff;h=64ca508a0e2fa9c21dc76a5d6a5f549c27f511fa
 
The timings are now:
 
begin transaction isolation level repeatable read;
Time: 324.938 ms
Time: 228.045 ms
Time: 227.963 ms

begin transaction isolation level serializable;
Time: 311.954 ms
Time: 311.928 ms
Time: 311.848 ms

begin transaction isolation level serializable, read only;
Time: 227.471 ms
Time: 228.137 ms
Time: 227.778 ms

begin transaction isolation level serializable, read only,
deferrable;
Time: 227.899 ms
Time: 249.772 ms
Time: 228.026 ms

begin transaction isolation level repeatable read;
Time: 231.173 ms
Time: 245.041 ms
Time: 228.149 ms
 
I'm surprised the difference is still that high as a percentage, and
will investigate, but this seems survivable.  When I do the math,
the difference comes out to 83.885 nanoseconds per row.
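(For reference, the arithmetic behind that figure, assuming the test
table holds one million rows, the only round count consistent with the
timings: (311.9 ms - 228.0 ms) / 1,000,000 rows = ~83.9 ns per row.)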
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] system views for walsender activity

2011-01-13 Thread Magnus Hagander
On Wed, Jan 12, 2011 at 03:03, Robert Haas  wrote:
> On Tue, Jan 11, 2011 at 7:24 AM, Magnus Hagander  wrote:
>>> No, do this at top
>>>
>>> if (walsnd->state == state)
>>>  return;
>>>
>>> Keep spinlocks when actually setting it.
>
> I think this is safe...
>
>> Aha. Thanks for the pointers, pfa a new version.
>
> ...but I think you also need to take the spinlock when reading the value.

Even when it can only ever be set by one process (the owning
walsender), and the variable is atomic (as it should be, since it's a
single enum/int)?

Anyway, it should be as simple as copying it out to a local variable
while it's already under the spinlock and then using that, right?
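For illustration, a minimal sketch of that pattern as a backend-code
fragment (assuming the shared slot struct carries a slock_t mutex, as
other shared structs do, and that WalSndState is the enum in question):

#include "storage/spin.h"		/* SpinLockAcquire / SpinLockRelease */

/* Sketch: snapshot the shared state field under its spinlock. */
static WalSndState
read_walsnd_state(volatile WalSnd *walsnd)
{
	WalSndState	state;

	SpinLockAcquire(&walsnd->mutex);
	state = walsnd->state;		/* copy out while holding the lock */
	SpinLockRelease(&walsnd->mutex);

	return state;				/* the local copy needs no lock */
}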

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] kill -KILL: What happens?

2011-01-13 Thread Tom Lane
David Fetter  writes:
> I've noticed over the years that we give people dire warnings never to
> send a KILL signal to the postmaster, but I'm unsure as to what are
> potential consequences of this, as in just exactly how this can result
> in problems.  Is there some reference I can look to for explanations
> of the mechanism(s) whereby the damage occurs?

There's no risk of data corruption, if that's what you're thinking of.
It's just that you're then looking at having to manually clean up the
child processes and then restart the postmaster; a process that is not
only tedious but does offer the possibility of screwing yourself.

In particular the risk is that someone clueless enough to do this would
next decide that removing $PGDATA/postmaster.pid, rather than killing
all the existing children, is the quickest way to get the postmaster
restarted.  Once he's done that, his data will shortly be hosed beyond
recovery, because now he has two noncommunicating sets of backends
massaging the same files via separate sets of shared buffers.

The reason this sequence of events doesn't seem improbable is that the
error you get when you try to start a new postmaster, if there are still
old backends running, is

FATAL:  pre-existing shared memory block (key 5490001, ID 15609) is still in use
HINT:  If you're sure there are no old server processes still running, remove 
the shared memory block or just delete the file "postmaster.pid".

Maybe we should rewrite that HINT --- while it's *possible* that
removing the shmem block or deleting postmaster.pid is the right thing
to do, it's not exactly *likely*.  I think we need to put a bit more
emphasis on the "If ..." part.  Like "If you are prepared to swear on
your mother's grave that there are no old server processes still
running, consider removing postmaster.pid.  But first check for existing
processes again."
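To illustrate the interlock being discussed, a minimal standalone
sketch, assuming SysV shared memory (the real postmaster check is more
careful; the function name here is made up):

#include <sys/ipc.h>
#include <sys/shm.h>
#include <stdbool.h>

/*
 * Sketch: return true if the shared memory segment identified by
 * 'key' still exists and has processes attached to it, i.e. old
 * backends may still be alive.
 */
static bool
shmem_block_in_use(key_t key)
{
	struct shmid_ds buf;
	int			shmid = shmget(key, 0, 0);

	if (shmid < 0)
		return false;			/* no such segment: safe to proceed */
	if (shmctl(shmid, IPC_STAT, &buf) < 0)
		return false;			/* cannot stat it: not our segment */
	return buf.shm_nattch > 0;	/* old backends still attached */
}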

(BTW, I notice that this interlock against starting a new postmaster
appears to be broken in HEAD, which is likely not unrelated to the fact
that the contents of postmaster.pid seem to be totally bollixed :-()

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SSI patch version 8

2011-01-13 Thread Kevin Grittner
Heikki Linnakangas  wrote:
 
> That sounds simple enough. Add a boolean field to HeapScanDesc, 
> "rs_relpredicatelocked", and set it when you acquire the relation
> lock.
 
I'll take a look at doing that.  Thanks!
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SSI patch version 8

2011-01-13 Thread Heikki Linnakangas

On 13.01.2011 16:51, Kevin Grittner wrote:

Right.  As each tuple is read we need to ensure that there is a
predicate lock to cover it.  Since finer-grained locks can be
combined into coarser-grained locks we need to start with the fine
grained and move toward checking the coarser grains, to avoid
missing a lock during promotion.  So for each tuple we calculate a
hash, find a partition, lock it, and lookup the tuple as a lock
target.  When that's not found we do the same thing for the page.
When that's not found we do the same thing for the relation.

But we acquired a relation lock up front, when we determined that
this would be a heap scan, so we could short-circuit this whole
thing if within the heapgettup_pagemode function we could determine
that this was a scan of the whole relation.


That sounds simple enough. Add a boolean field to HeapScanDesc, 
"rs_relpredicatelocked", and set it when you acquire the relation lock.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SSI patch version 8

2011-01-13 Thread Kevin Grittner
Heikki Linnakangas  wrote:
 
> where exactly is the extra overhead coming from?
 
Keep in mind that this is a sort of worst case scenario.  The data
is fully cached in shared memory and we're doing a sequential pass
just counting the rows.  In an earlier benchmark (which I should
re-do after all this refactoring), random access queries against a
fully cached data set only increased run time by 1.8%.  Throw some
disk access into the mix, and the overhead is likely to get lost in
the noise.
 
But, as I said, count(*) seems to be the first thing many people try
as a benchmark, and this is a symptom of a more general issue, so
I'd like to find a good solution.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SSI patch version 8

2011-01-13 Thread Kevin Grittner
Heikki Linnakangas  wrote:
 
> Pardon my ignorance, but where exactly is the extra overhead
> coming from? Searching for a predicate lock?
 
Right.  As each tuple is read we need to ensure that there is a
predicate lock to cover it.  Since finer-grained locks can be
combined into coarser-grained locks we need to start with the fine
grained and move toward checking the coarser grains, to avoid
missing a lock during promotion.  So for each tuple we calculate a
hash, find a partition, lock it, and lookup the tuple as a lock
target.  When that's not found we do the same thing for the page. 
When that's not found we do the same thing for the relation.
 
But we acquired a relation lock up front, when we determined that
this would be a heap scan, so we could short-circuit this whole
thing if within the heapgettup_pagemode function we could determine
that this was a scan of the whole relation.
 
The profiling also showed that it was spending an obscene amount of
time calculating hash values (over 10% of total run time!).  I'm
inclined to think that a less sophisticated algorithm (like just
adding oid, page, and tuple offset numbers) would generate very
*real* savings with the down side being a very hypothetical
*possible* cost to longer chains in the HTAB.  But that's a separate
issue, best settled on the basis of benchmarks rather than
theoretical discussions.
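For concreteness, a purely hypothetical sketch of such a cheap hash
over the lock-target identifiers:

#include "postgres.h"
#include "storage/block.h"
#include "storage/off.h"

/*
 * Hypothetical cheap hash: just sum the identifiers.  Fast to
 * compute; the open question is whether the poorer distribution
 * lengthens HTAB chains enough to matter.
 */
static uint32
cheap_target_hash(Oid relid, BlockNumber blkno, OffsetNumber offnum)
{
	return (uint32) relid + (uint32) blkno + (uint32) offnum;
}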
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] kill -KILL: What happens?

2011-01-13 Thread David Fetter
Folks,

I've noticed over the years that we give people dire warnings never to
send a KILL signal to the postmaster, but I'm unsure as to what are
potential consequences of this, as in just exactly how this can result
in problems.  Is there some reference I can look to for explanations
of the mechanism(s) whereby the damage occurs?

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add function dependencies

2011-01-13 Thread Joel Jacobson
Thanks to the new pg_stat_xact_user_functions and
pg_stat_xact_user_tables views in 9.1, it will be possible to
automatically "sample" which functions uses which functions/tables to
generate a nice directional graph of the dependency tree, based on
recent real-life activity, excluding any relations/functions that are
no longer in use. Excluding these is actually a feature, as they would
make the graph a lot more complicated.

If you want a graph of the activity during Mondays between 2:30pm and
2:31pm, it could easily be generated; likewise one for 30 days (which
would probably include a lot more edges in the graph). :-)

It would be quite easy to automatically inject some small code
snippets into the top and bottom of each user function, to get the diff
of select * from pg_stat_xact_user_functions and
pg_stat_xact_user_tables between the entry point of each function and
the exit point.

It would be a lot nicer if PostgreSQL could automatically sample such
data for you, providing nice system views with information on the
sampled data per function, allowing you to query it and ask:

 - What functions has function public.myfunc(int) called, and what
tables has it inserted/selected/updated/deleted from, since the last
time I reset the statistics?

Just an idea...

-- 
Best regards,

Joel Jacobson
Glue Finance

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Warning compiling pg_dump (MinGW, Windows XP)

2011-01-13 Thread Pavel Golub
Hello, Pgsql-hackers.

I'm getting the following warnings:

pg_dump.c: In function 'dumpSequence':
pg_dump.c:11449:2: warning: unknown conversion type character 'l' in format
pg_dump.c:11449:2: warning: too many arguments for format
pg_dump.c:11450:2: warning: unknown conversion type character 'l' in format
pg_dump.c:11450:2: warning: too many arguments for format

Line numbers may not be the same as in the official sources, because I've
made some changes. But the lines are:

snprintf(bufm, sizeof(bufm), INT64_FORMAT, SEQ_MINVALUE);
snprintf(bufx, sizeof(bufx), INT64_FORMAT, SEQ_MAXVALUE);

In my opinion, configure failed for MinGW+Windows in this case. Am I
right? Can someone give me a hint on how to avoid this?
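If that guess is right (INT64_FORMAT probed as "%lld" while MinGW's
gcc checks formats against the MS C runtime, which historically wanted
"%I64d"), a standalone illustration of a portable spelling, using
<inttypes.h> instead of PostgreSQL's macro, might look like this:

#include <stdio.h>
#include <inttypes.h>

int
main(void)
{
	char		bufm[32];
	int64_t		minval = INT64_MIN;	/* stand-in for SEQ_MINVALUE */

	/* PRId64 expands to whatever this C runtime actually accepts */
	snprintf(bufm, sizeof(bufm), "%" PRId64, minval);
	puts(bufm);
	return 0;
}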

Thanks in advance

-- 
With best wishes,
 Pavel  mailto:pa...@gf.microolap.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Walreceiver fsyncs excessively

2011-01-13 Thread Fujii Masao
On Thu, Jan 13, 2011 at 9:01 PM, Heikki Linnakangas
 wrote:
> While testing Fujii-san's patch to flush any already-written WAL on error in
> walreceiver, I added a debugging elog to XLogWalRcvFlush() to print out how
> far it has written and flushed.
>
> I saw an unexpected pattern while the standby catches up with the master:
>
> LOG:  streaming replication successfully connected to primary
> LOG:   flushing flush=0/0 write=0/1E02
> LOG:   flushing flush=0/1E02 write=0/1E04
> LOG:   flushing flush=0/1E04 write=0/1E06
> LOG:   flushing flush=0/1E06 write=0/1E08
> LOG:   flushing flush=0/1E08 write=0/1E0A
> LOG:   flushing flush=0/1E0A write=0/1E0C
> LOG:   flushing flush=0/1E0C write=0/1E0E
> LOG:   flushing flush=0/1E0E write=0/1E10
> LOG:   flushing flush=0/1E10 write=0/1E12
> LOG:   flushing flush=0/1E12 write=0/1E14
>
> The master sends the WAL at full speed, but walreceiver always fsyncs it in
> 128 kB chunks. That's excessive; it should be able to read and write the
> whole WAL segment to disk before flushing.
>
> There's a little flaw in the walreceiver logic that tries to read all the
> available WAL before flushing and sleeping. The way libpqrcv_receive is
> written, when it's called with timeout==0 it will not call PQconsumeInput.
> So what happens is that when walreceiver main loop calls libpqrcv_receive()
> in a loop to fetch all WAL that's available without blocking, it actually
> only reads the WAL that's in the libpq receive buffer - it will not read the
> WAL that's in the TCP read buffer.
>
> Attached patch fixes libpqrcv_receive() so that it calls PQconsumeInput()
> before concluding that there's no data available. The excessive fsyncing can
> lead to very bad performance, so this needs to be applied to 9.0 too.

Seems good.

Can we remove the "justconnected" flag, thanks to the patch?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: pg_ctl failover Re: [HACKERS] Latches, signals, and waiting

2011-01-13 Thread Fujii Masao
On Thu, Jan 13, 2011 at 7:00 PM, Heikki Linnakangas
 wrote:
> +1 for "promote". People unfamiliar with the replication stuff might not
> immediately understand that it's related to replication, but they wouldn't
> have any use for the option anyway. It should be clear to anyone who needs
> it.

I did s/failover/promote. Here is the updated patch.

>> - pg_ctl should unlink failover_files when it failed to send failover 
>> signals.

Done.

And, I changed some descriptions about trigger in high-availability.sgml
and recovery-config.sgml.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


pg_ctl_failover_v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Walreceiver fsyncs excessively

2011-01-13 Thread Heikki Linnakangas
While testing Fujii-san's patch to flush any already-written WAL on 
error in walreceiver, I added a debugging elog to XLogWalRcvFlush() to 
print out how far it has written and flushed.


I saw an unexpected pattern while the standby catches up with the master:

LOG:  streaming replication successfully connected to primary
LOG:   flushing flush=0/0 write=0/1E02
LOG:   flushing flush=0/1E02 write=0/1E04
LOG:   flushing flush=0/1E04 write=0/1E06
LOG:   flushing flush=0/1E06 write=0/1E08
LOG:   flushing flush=0/1E08 write=0/1E0A
LOG:   flushing flush=0/1E0A write=0/1E0C
LOG:   flushing flush=0/1E0C write=0/1E0E
LOG:   flushing flush=0/1E0E write=0/1E10
LOG:   flushing flush=0/1E10 write=0/1E12
LOG:   flushing flush=0/1E12 write=0/1E14

The master sends the WAL at full speed, but walreceiver always fsyncs it
in 128 kB chunks. That's excessive; it should be able to read and write
the whole WAL segment to disk before flushing.


There's a little flaw in the walreceiver logic that tries to read all 
the available WAL before flushing and sleeping. The way libpqrcv_receive 
is written, when it's called with timeout==0 it will not call 
PQconsumeInput. So what happens is that when the walreceiver main loop calls 
libpqrcv_receive() in a loop to fetch all WAL that's available without 
blocking, it actually only reads the WAL that's in the libpq receive 
buffer - it will not read the WAL that's in the TCP read buffer.


Attached patch fixes libpqrcv_receive() so that it calls 
PQconsumeInput() before concluding that there's no data available. The 
excessive fsyncing can lead to very bad performance, so this needs to be 
applied to 9.0 too.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index 5aac85d..9e8504b 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -351,28 +351,33 @@ libpqrcv_receive(int timeout, unsigned char *type, char **buffer, int *len)
 		PQfreemem(recvBuf);
 	recvBuf = NULL;
 
-	/*
-	 * If the caller requested to block, wait for data to arrive. But if this
-	 * is the first call after connecting, don't wait, because there might
-	 * already be some data in libpq buffer that we haven't returned to
-	 * caller.
-	 */
-	if (timeout > 0 && !justconnected)
+	/* Try to receive a CopyData message */
+	rawlen = PQgetCopyData(streamConn, &recvBuf, 1);
+	if (rawlen == 0)
 	{
-		if (!libpq_select(timeout))
-			return false;
+		/*
+		 * No data available yet. If the caller requested to block, wait for
+		 * more data to arrive. But if this is the first call after connecting,
+		 * don't wait, because there might already be some data in libpq buffer
+		 * that we haven't returned to caller.
+		 */
+		if (timeout > 0 && !justconnected)
+		{
+			if (!libpq_select(timeout))
+				return false;
+		}
+		justconnected = false;
 
 		if (PQconsumeInput(streamConn) == 0)
 			ereport(ERROR,
 	(errmsg("could not receive data from WAL stream: %s",
 			PQerrorMessage(streamConn;
-	}
-	justconnected = false;
 
-	/* Receive CopyData message */
-	rawlen = PQgetCopyData(streamConn, &recvBuf, 1);
-	if (rawlen == 0)			/* no data available yet, then return */
-		return false;
+		/* Now that we've consumed some input, try again */
+		rawlen = PQgetCopyData(streamConn, &recvBuf, 1);
+		if (rawlen == 0)
+			return false;
+	}
 	if (rawlen == -1)			/* end-of-streaming or error */
 	{
 		PGresult   *res;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fixing GIN for empty/null/full-scan cases

2011-01-13 Thread Dimitri Fontaine
Tom Lane  writes:
> "David E. Wheeler"  writes:
>> On Jan 12, 2011, at 4:35 PM, Tom Lane wrote:
>>> No, what we need is a decent extension package manager ;-)
>
>> Yeah. Maybe you can do that this weekend? Or, I dunno, while you “sleep” 
>> tonight?
>
> Supposedly it's in the queue for the upcoming CF :-)

Hehe, and some provision have been made to support upgrading from 9.0 to
9.1 too:

  http://pgsql.tapoueh.org/extensions/doc/html/extend-extension.html#AEN50748

But that won't solve the dump-from-9.0-and-restore-into-9.1 case by
itself; the only way I can think of to solve that problem would be to
backpatch a new feature.  "Do it the old way until you upgrade from 9.1
to a later release" might be our answer here.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: pg_ctl failover Re: [HACKERS] Latches, signals, and waiting

2011-01-13 Thread Robert Haas
On Thu, Jan 13, 2011 at 5:00 AM, Heikki Linnakangas
 wrote:
> On 13.01.2011 04:29, Itagaki Takahiro wrote:
>>
>> On Thu, Jan 13, 2011 at 00:14, Fujii Masao  wrote:

 pg_ctl failover ? At the moment, the location of the trigger file is
 configurable, but if we accept a constant location like
 "$PGDATA/failover"
 pg_ctl could do the whole thing: create the file and send the signal.
 pg_ctl on Windows already knows how to send the "signal" via the named
 pipe signal emulation.
>>>
>>> The attached patch implements the above-mentioned pg_ctl failover.
>>
>> I have three comments:
>> - Will we call it "failover"? We will use the command in "switchover"
>>   operations too. "pg_ctl promote" might be more neutral, but users might
>>   find it hard to infer the replication feature from "promote".
>
> I agree that "failover" or even "switchover" is too specific. You might want
> promote a server even if you keep the old master still running, if you're
> creating a temporary copy of the master repository for testing purposes etc.
>
> +1 for "promote". People unfamiliar with the replication stuff might not
> immediately understand that it's related to replication, but they wouldn't
> have any use for the option anyway. It should be clear to anyone who needs
> it.

I agree.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in pg_dump

2011-01-13 Thread Christian Ullrich

* Joel Jacobson wrote:


The example from Tom Lane below results in a database that cannot be
dumped correctly using pg_dump.

The view v1 strangely becomes a table in the dump output?!


This is no bug, it's a feature (tm).

pg_dump is clever enough to detect the circular dependency and break it 
open by creating v1 in two steps.


A view in PostgreSQL is simply an empty table with an ON SELECT DO 
INSTEAD rule named "_RETURN" on it. pg_dump first creates the empty 
table, then view v2 depending on that table, and finally the _RETURN 
rule turning v1 into a view and reintroducing the circular dependency.


--
Christian

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_primary_conninfo

2011-01-13 Thread Robert Haas
On Wed, Jan 12, 2011 at 11:52 PM, Fujii Masao  wrote:
> So I'm thinking to make ProcessConfigFile() parse not only postgresql.conf
> but also recovery.conf rather than move all the recovery parameters to
> postgresql.conf.
>
> Comments?

+1.

Actually moving the settings can be done later in about 5 seconds if
we all agree it's a good idea, but let's not get bogged down in that
now.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in pg_dump

2011-01-13 Thread Marko Tiikkaja

On 2011-01-13 11:31 AM +0200, Joel Jacobson wrote:

The example from Tom Lane below results in a database that cannot be
dumped correctly using pg_dump.

The view v1 strangely becomes a table in the dump output?!



CREATE RULE "_RETURN" AS ON SELECT TO v1 DO INSTEAD SELECT v2.f1, v2.f2 FROM v2;


This statement turns the table into a view.


Regards,
Marko Tiikkaja

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq documentation cleanups (repost 3)

2011-01-13 Thread Robert Haas
On Wed, Jan 12, 2011 at 8:54 PM, Bruce Momjian  wrote:
> I am also attaching a few more of Leslie's changes that I think are
> useful.  The first clarifies a confusion Leslie had about the fact that
> "return" is referencing the return value of the function and not the
> value returned in the pointer.

Hmm.  Well, if that's the confusion, I don't think inserting the words
"by the function" is the right way to fix it - it certainly isn't
returned by anything else.  You could change it to say "It is also
possible for *errmsg to be NULL even when the return value is also
NULL; this indicates..."

> The second change is, I think, better wording.

OK.

> The third moves the "deprecated" text to the start of the function
> description.  Leslie pointed out that that is how we do it for other
> libpq functions, so we should move it for consistency.

That seems to me to read pretty awkwardly.  You could perhaps strike
the chunk and the whole first paragraph and simply write "PQoidStatus
is an older, deprecated version of PQoidValue.  It returns its result
as a character string rather than an Oid, and is not thread-safe." and
then cut directly to the synopsis.  That would be consistent with what
we've done elsewhere; moving just that one sentence is not.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in pg_describe_object, patch v2

2011-01-13 Thread Robert Haas
On Wed, Jan 12, 2011 at 7:47 PM, Tom Lane  wrote:
> Andreas Karlsson  writes:
>> Here is a very simple change of the patch to make the output look more
>> like the syntax of ALTER OPERATOR FAMILY to improve consistency.
>
> IMO, what this patch needs is to not output the types unless they are
> actually different from the default (which can be inferred from the AM
> type and the function arguments).  That would fix my concern about it
> emitting information that is 99.44% useless.

I guess we could do that, but I don't understand how you're supposed
to infer them, which means probably a lot of other people won't
either.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

