[PERFORM] Kill a session

2006-07-11 Thread Craig A. James

There have been dozens, perhaps hundreds, of entries in the pg-admin, 
pg-general, and pg-performance lists regarding killing a session, but as far as 
I can tell, there is no Postgres solution.  Did I miss something?

This raises the question: Why doesn't Postgres have a "kill session" command that works?  
Oracle has it, and it's invaluable; there is no substitute.  Various writers to these PG lists have 
raised the question repeatedly.  Is it just a matter that nobody has had the time to do it (which I 
respect!), or is there a reason why the Postgres team decided a "kill session" is a bad 
idea?

The rest of this email is just to illustrate the convoluted solution I've had 
to adopt, and even with this, I can't get it to work quite right.

Background: In our web app, we give our users a fair amount of power to formulate 
difficult queries.  These long-running queries are fork/exec'd from the Apache CGI, and 
we give the user a "job status" page, with the option to kill the job.

I can kill off the CGI, since Apache owns the process.  But the "stock answer" 
of

   kill -2 backend-pid

won't work, because I don't want my Apache jobs running as super-user (!) or as 
setuid processes.

So here's my solution:  Install a couple of C extensions like this:

   #include "postgres.h"
   #include "fmgr.h"
   #include <signal.h>
   #include <unistd.h>

   PG_FUNCTION_INFO_V1(get_session_id);

   Datum get_session_id(PG_FUNCTION_ARGS)
   {
       PG_RETURN_INT32(getpid());
   }

   PG_FUNCTION_INFO_V1(kill_session);

   Datum kill_session(PG_FUNCTION_ARGS)
   {
       int32 session_id = PG_GETARG_INT32(0);
       int   status;

       fprintf(stderr, "KILLING SESSION: %d, 15\n", session_id);
       status = kill(session_id, SIGTERM);   /* signal 15 */
       PG_RETURN_BOOL(status == 0);
   }

These are installed with the appropriate "CREATE OR REPLACE ..." SQL.  Although 
this is dangerous (anyone who can log in to Postgres can kill any Postgres job!), it's 
safe enough in a controlled environment.  It allows an Apache CGI to issue the kill(2) 
command through the Postgres backend, which is running as the Postgres user, and thus has 
permission to do the deed.  When I start a job, I record the backend's PID, which allows 
another process to connect and kill the first one.  Alright, it's a hack, but it's the 
best I could think of.
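[Editorial note: the Unix mechanics the hack relies on are simple — SIGTERM (signal 15) terminates a process whose default disposition is unchanged, and kill(2)'s return value tells the caller whether delivery succeeded, which is all kill_session()'s boolean reports. A self-contained sketch of that round trip; terminate_child_demo() is a made-up name, and this is not PostgreSQL code:]

```c
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child, send it SIGTERM (signal 15), and report how it died.
 * Returns the terminating signal number, or -1 on error. */
int terminate_child_demo(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        pause();              /* child idles, like a backend waiting on work */
        _exit(0);
    }
    kill(pid, SIGTERM);       /* the same call kill_session() makes */
    waitpid(pid, &status, 0);
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

Note that a successful kill() only means the signal was delivered; as the rest of the thread shows, what the receiving process does with it is another matter.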

But in spite of earlier postings in these forums saying that killing the backend was the 
way to go, this doesn't really work.  First, even though the "postgres" backend job is properly 
killed, a "postmaster" job keeps running at 99% CPU, which is pretty useless.  Killing 
the client's backend didn't kill the process actually doing the work!

Second, the "KILLING SESSION" message to stderr is only printed in the PG log file sporadically.  
This confuses me, since the "KILLING SESSION" is printed by a *different* process than the one 
being killed, so it shouldn't be affected.  So what happens to fprintf()'s output?  Most of the time, I just 
get "unexpected EOF on client connection" in the log which is presumably the postmaster complaining 
that the postgres child process died.

I know the kill_session() is working because it returns "true", and the job is 
in fact killed.  But the query keeps running in postmaster (or is it something else, like 
a rollback?), and the stderr output disappears.

Thanks,
Craig

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] Kill a session

2006-07-11 Thread Tino Wildenhain

Craig A. James wrote:
There have been dozens, perhaps hundreds, of entries in the pg-admin, 
pg-general, and pg-performance lists regarding killing a session, but as 
far as I can tell, there is no Postgres solution.  Did I miss something?


This raises the question: Why doesn't Postgres have a "kill session" 
command that works?  Oracle has it, and it's invaluable; there is no 
substitute.  Various writers to these PG lists have raised the question 
repeatedly.  Is it just a matter that nobody has had the time to do it 
(which I respect!), or is there a reason why the Postgres team decided a 
"kill session" is a bad idea?


Are you sure you read:


http://www.postgresql.org/docs/8.1/interactive/protocol-flow.html#AEN60635

?


Regards
Tino Wildenhain



Re: [PERFORM] Kill a session

2006-07-11 Thread Magnus Hagander
> There have been dozens, perhaps hundreds, of entries in the 
> pg-admin, pg-general, and pg-performance lists regarding 
> killing a session, but as far as I can tell, there is no 
> Postgres solution.  Did I miss something?
> 
> This raises the question: Why doesn't Postgres have a "kill 
> session" command that works?  Oracle has it, and it's 
> invaluable; there is no substitute.  Various writers to these 
> PG lists have raised the question repeatedly.  Is it just a 
> matter that nobody has had the time to do it (which I 
> respect!), or is there a reason why the Postgres team decided 
> a "kill session" is a bad idea?

[snip]

I believe the function to kill a backend is actually in the codebase;
it's just commented out because it's considered dangerous. There are
some possible issues (see the -hackers archives) with sending SIGTERM
to a single backend without shutting down the whole cluster.

Writing the client-side function to call it is the easy part.

In many cases you just need to cancel a query, in which case you can use
pg_cancel_backend(), for example. If you need to actually kill it, the
only supported way is to restart postgresql. 

> But in spite earlier posting in these forums that say the 
> killing the backend was the way to go, this doesn't really 
> work.  First, even though the "postgres" backend job is 
> properly killed, a "postmaster" job keeps running at 99% CPU, 
> which is pretty useless.  Killing the client's backend didn't 
> kill the process actually doing the work!

Then you killed the wrong backend...


> Second, the "KILLING SESSION" message to stderr is only 
> printed in the PG log file sporadically.  This confuses me, 
> since the "KILLING SESSION" is printed by a *different* 
> process than the one being killed, so it shouldn't be 
> affected.  So what happens to fprintf()'s output?  Most of 
> the time, I just get "unexpected EOF on client connection" in 
> the log which is presumably the postmaster complaining that 
> the postgres child process died.

No, that's the postgres child process complaining that your client
(the CGI?) died without sending a close message.


> I know the kill_session() is working because it returns 
> "true", and the job is in fact killed.  But the query keeps 
> running in postmaster (or is it something else, like a 
> rollback?), and the stderr output disappears.

No queries run in postmaster. They all run in postgres backends. The
postmaster does very little actual work, other than keeping track of
everybody else.
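[Editorial note: the process model described above can be sketched in a few lines — the postmaster is essentially a fork-and-reap supervisor, and every query runs in a separately forked backend, so signaling one backend cannot take down the supervisor. A toy version; the names are made up and this deliberately simplifies the real postmaster:]

```c
#include <sys/wait.h>
#include <unistd.h>

/* Toy postmaster: run "query" in a forked child ("backend") and return
 * its exit code.  The parent does no query work at all; it only forks
 * and reaps, which is why killing a backend never kills the postmaster. */
int run_query_in_backend(int (*query)(void))
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(query());        /* all real work happens in the child */
    waitpid(pid, &status, 0);  /* parent just tracks its children */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

int sample_query(void) { return 42; }
```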

//Magnus



[PERFORM] Unsubscribe

2006-07-11 Thread Jghaffou
Best regards,
Jamal




Re: [PERFORM] High CPU Usage - PostgreSQL 7.3

2006-07-11 Thread Bruno Wolff III
On Mon, Jul 10, 2006 at 17:55:38 +1000,
  Neil Hepworth <[EMAIL PROTECTED]> wrote:
> 
> running on our server (obviously we need to update certain queries,
> e.g. delete .. using.. and test with 8.1 first) - I will be pushing
> for an upgrade as soon as possible.  And the fsync=false is a

You can set add_missing_from if you want to delay rewriting queries that
use that feature.



Re: [PERFORM] High CPU Usage - PostgreSQL 7.3

2006-07-11 Thread Neil Hepworth
Thanks for the tip; I'll try that when we initially upgrade, hopefully soon.
 
Neil 
On 12/07/06, Bruno Wolff III <[EMAIL PROTECTED]> wrote:
> On Mon, Jul 10, 2006 at 17:55:38 +1000, Neil Hepworth <[EMAIL PROTECTED]> wrote:
> > running on our server (obviously we need to update certain queries,
> > e.g. delete .. using.. and test with 8.1 first) - I will be pushing
> > for an upgrade as soon as possible.  And the fsync=false is a
>
> You can set add_missing_from if you want to delay rewriting queries that
> use that feature.


Re: [PERFORM] High CPU Usage - PostgreSQL 7.3

2006-07-11 Thread Jeff Frost

On Wed, 12 Jul 2006, Neil Hepworth wrote:


I am using PostgreSQL version 7.3.10 (RPM:
postgresql73-rhel21-7.3.10-2).  Unfortunately vacuumdb -a -v does not
give the FSM info at the end (need a newer version of postgres for
that).  Running the same queries on 8.1 reduces the time taken to
about 16 minutes, though I didn't run the test on the same hardware or
OS as I want to keep my test server as close to production as
possible, so I ran the 8.1 server on my Windows laptop (2GHz Centrino
Duo with 2GB of RAM, yes the laptop is brand new :).


Well, looks like you're at least fairly up to date, but there is a fix in 
7.3.11 that you might want to get by upgrading to 7.3.15:


 * Fix race condition in transaction log management
   There was a narrow window in which an I/O operation could be
   initiated for the wrong page, leading to an Assert failure or data
   corruption.

It also appears that you can run autovacuum with 7.3 (I thought maybe it only 
went back as far as 7.4).


So, is the 16 minutes on your laptop with 8.1 for windows vs 1hr on the server 
for the whole set of loops?  If so, 4x isn't a bad improvement. :-)  So, 
assuming you dumped/loaded the same DB onto your laptop's postgresql server, 
what does the vacuumdb -a -v say on the laptop?  Perhaps we can use it to see 
if your fsm settings are ok.


BTW, did you see Scott's posting here:

http://archives.postgresql.org/pgsql-performance/2006-07/msg00091.php

Since we didn't hear from you for a while, I thought perhaps Scott had hit on 
the fix.  Have you tried that yet?  It certainly would help the planner out.


You might also want to turn on autovacuum and see if that helps.

What's your disk subsystem like?  In fact, what's the entire DB server 
hardware like?




I run through a loop, executing the following or similar queries 8
times (well actually 12 but the last 4 don't do anything) - Jeff I've
attached complete outputs as files.

A debug output is further below (the numbers after each method call name,
above each SQL statement, are the times to run that statement in
milliseconds; the times on the lines "" are cumulative).  So the total for
one loop is 515 seconds; multiply by 8 and that gets me to over an
hour.  It is actually the deletes that take the most time: 179 seconds
and 185 seconds each loop.



CREATE TABLE fttemp670743219 AS SELECT * FROM ftone LIMIT 0
INSERT INTO fttemp670743219 ( epId, start, direction, classid,
consolidation, cnt )  SELECT epId, TO_TIMESTAMP(start, 'YYYY-MM-DD
HH24:00:00.0')::timestamp AS start, direction, classid, 60 AS
consolidation, SUM(cnt) AS cnt FROM ftone WHERE consolidation = 0 AND
start < TO_TIMESTAMP('2006-07-11 14:04:34.156433+1000', 'YYYY-MM-DD
HH24:00:00.0')::timestamp GROUP BY epId, direction,
TO_TIMESTAMP(start, 'YYYY-MM-DD HH24:00:00.0')::timestamp, classid
DELETE FROM ONLY ftone WHERE ftone.epId = fttemp670743219.epId AND
ftone.direction = fttemp670743219.direction AND ftone.start =
fttemp670743219.start AND ftone.consolidation =
fttemp670743219.consolidation AND ftone.classid =
fttemp670743219.classid
INSERT INTO ftone ( epId, start, consolidation, direction, classid,
cnt ) SELECT epId, start, consolidation, direction, classid, cnt FROM
fttemp670743219
DROP TABLE fttemp670743219
DELETE FROM ftone WHERE consolidation = 0 AND start <
TO_TIMESTAMP((TO_TIMESTAMP('2006-07-11 14:04:34.156433+1000',
'YYYY-MM-DD 00:00:00.0')::timestamp - INTERVAL '10080 MINUTE'),
'YYYY-MM-DD 00:00:00.0')::timestamp



ftone: 0:
createConsolidatedInTemporary: 188:
CREATE TABLE fttemp678233382 AS SELECT * FROM ftone LIMIT 0
createConsolidatedInTemporary: 76783:
INSERT INTO fttemp678233382 ( epPairdefnid, start, direction, classid,
consolidation, cnt )  SELECT epPairdefnid, TO_TIMESTAMP(start,
'YYYY-MM-DD HH24:00:00.0')::timestamp AS start, direction, classid, 60
AS consolidation, SUM(cnt) AS cnt FROM ftone WHERE consolidation = 0
AND start < TO_TIMESTAMP('2006-07-12 11:02:13.865444+1000',
'YYYY-MM-DD HH24:00:00.0')::timestamp GROUP BY epPairdefnid,
direction, TO_TIMESTAMP(start, 'YYYY-MM-DD HH24:00:00.0')::timestamp,
classid
replaceConsolidatedInMainTable: 179178:
DELETE FROM ONLY ftone WHERE ftone.epPairdefnid =
fttemp678233382.epPairdefnid AND ftone.direction =
fttemp678233382.direction AND ftone.start = fttemp678233382.start AND
ftone.consolidation = fttemp678233382.consolidation AND ftone.classid
= fttemp678233382.classid
replaceConsolidatedInMainTable: 61705:
INSERT INTO ftone ( epPairdefnid, start, consolidation, direction,
classid, cnt ) SELECT epPairdefnid, start, consolidation, direction,
classid, cnt FROM fttemp678233382
consolidate: 2656:
DROP TABLE fttemp678233382
MAIN LOOP TOTAL consolidate: 320526
deleteOlderThan: 184616:
DELETE FROM ftone WHERE consolidation = 0 AND start <
TO_TIMESTAMP((TO_TIMESTAMP('2006-07-12 11:02:13.865444+1000',
'YYYY-MM-DD 00:00:00.0')::timestamp - INTERVAL '10080 MINUTE'),
'YYYY-MM-DD 00:00:00.0')::timestamp
MAIN LOOP TOTAL deleteExpiredData: 505142
MAIN LOOP TOTAL generateStatistics: 515611

Re: [PERFORM] High CPU Usage - PostgreSQL 7.3

2006-07-11 Thread Jeff Frost

On Tue, 11 Jul 2006, Jeff Frost wrote:


On Wed, 12 Jul 2006, Neil Hepworth wrote:

You might also want to turn on autovacuum and see if that helps.

What's your disk subsystem like?  In fact, what's the entire DB server 
hardware like?


By the way, how big does the temp table get?  If it's large, it might make the 
DELETE slow because it doesn't have any indexes on any of the comparison 
columns.


DELETE FROM ONLY ftone WHERE ftone.epId = fttemp670743219.epId AND 
ftone.direction = fttemp670743219.direction AND ftone.start = 
fttemp670743219.start AND ftone.consolidation = fttemp670743219.consolidation 
AND ftone.classid = fttemp670743219.classid


In your explain analyze from before, it seems that there were 0 rows in that 
table:



  ->  Seq Scan on fttemp1600384653  (cost=0.00..0.00 rows=1
width=4) (actual time=206.00..206.00 rows=0 loops=1)
Total runtime: 3767.52 msec


but that was with the smaller set size I believe.





I run through a loop, executing the following or similar queries 8
times (well actually 12 but the last 4 don't do anything) - Jeff I've
attached complete outputs as files.

A debug output is further below (the numbers after each method call name,
above each SQL statement, are the times to run that statement in
milliseconds; the times on the lines "" are cumulative).  So the total for
one loop is 515 seconds; multiply by 8 and that gets me to over an
hour.  It is actually the deletes that take the most time: 179 seconds
and 185 seconds each loop.



CREATE TABLE fttemp670743219 AS SELECT * FROM ftone LIMIT 0
INSERT INTO fttemp670743219 ( epId, start, direction, classid,
consolidation, cnt )  SELECT epId, TO_TIMESTAMP(start, 'YYYY-MM-DD
HH24:00:00.0')::timestamp AS start, direction, classid, 60 AS
consolidation, SUM(cnt) AS cnt FROM ftone WHERE consolidation = 0 AND
start < TO_TIMESTAMP('2006-07-11 14:04:34.156433+1000', 'YYYY-MM-DD
HH24:00:00.0')::timestamp GROUP BY epId, direction,
TO_TIMESTAMP(start, 'YYYY-MM-DD HH24:00:00.0')::timestamp, classid
DELETE FROM ONLY ftone WHERE ftone.epId = fttemp670743219.epId AND
ftone.direction = fttemp670743219.direction AND ftone.start =
fttemp670743219.start AND ftone.consolidation =
fttemp670743219.consolidation AND ftone.classid =
fttemp670743219.classid
INSERT INTO ftone ( epId, start, consolidation, direction, classid,
cnt ) SELECT epId, start, consolidation, direction, classid, cnt FROM
fttemp670743219
DROP TABLE fttemp670743219
DELETE FROM ftone WHERE consolidation = 0 AND start <
TO_TIMESTAMP((TO_TIMESTAMP('2006-07-11 14:04:34.156433+1000',
'YYYY-MM-DD 00:00:00.0')::timestamp - INTERVAL '10080 MINUTE'),
'YYYY-MM-DD 00:00:00.0')::timestamp



ftone: 0:
createConsolidatedInTemporary: 188:
CREATE TABLE fttemp678233382 AS SELECT * FROM ftone LIMIT 0
createConsolidatedInTemporary: 76783:
INSERT INTO fttemp678233382 ( epPairdefnid, start, direction, classid,
consolidation, cnt )  SELECT epPairdefnid, TO_TIMESTAMP(start,
'YYYY-MM-DD HH24:00:00.0')::timestamp AS start, direction, classid, 60
AS consolidation, SUM(cnt) AS cnt FROM ftone WHERE consolidation = 0
AND start < TO_TIMESTAMP('2006-07-12 11:02:13.865444+1000',
'YYYY-MM-DD HH24:00:00.0')::timestamp GROUP BY epPairdefnid,
direction, TO_TIMESTAMP(start, 'YYYY-MM-DD HH24:00:00.0')::timestamp,
classid
replaceConsolidatedInMainTable: 179178:
DELETE FROM ONLY ftone WHERE ftone.epPairdefnid =
fttemp678233382.epPairdefnid AND ftone.direction =
fttemp678233382.direction AND ftone.start = fttemp678233382.start AND
ftone.consolidation = fttemp678233382.consolidation AND ftone.classid
= fttemp678233382.classid
replaceConsolidatedInMainTable: 61705:
INSERT INTO ftone ( epPairdefnid, start, consolidation, direction,
classid, cnt ) SELECT epPairdefnid, start, consolidation, direction,
classid, cnt FROM fttemp678233382
consolidate: 2656:
DROP TABLE fttemp678233382
MAIN LOOP TOTAL consolidate: 320526
deleteOlderThan: 184616:
DELETE FROM ftone WHERE consolidation = 0 AND start <
TO_TIMESTAMP((TO_TIMESTAMP('2006-07-12 11:02:13.865444+1000',
'YYYY-MM-DD 00:00:00.0')::timestamp - INTERVAL '10080 MINUTE'),
'YYYY-MM-DD 00:00:00.0')::timestamp
MAIN LOOP TOTAL deleteExpiredData: 505142
MAIN LOOP TOTAL generateStatistics: 515611



Thanks again,
  Neil


On 11/07/06, Jeff Frost <[EMAIL PROTECTED]> wrote:



On Mon, 10 Jul 2006, Neil Hepworth wrote:

> I should also explain that I run through these queries on multiple
> tables and with some slightly different parameters for the
> "consolidation" so I run through those 3 queries (or similar) 9 times
> and this takes a total of about 2 hours, with high CPU usage.  And I
> am running the queries from a remote Java application (using JDBC),
> the client is using postgresql-8.0-311.jdbc3.jar.  The explain analyse
> results I have provided below are from running via pgAdmin, n

Re: [PERFORM] High CPU Usage - PostgreSQL 7.3

2006-07-11 Thread Neil Hepworth
Yes, it was the same DB, so yes, 8.1 gives roughly a four-fold improvement (assuming the hardware and OS differences aren't that significant - I'd expect the Linux version to be faster, if anything), which certainly ain't bad! :)

 
Good idea on running vacuumdb -a -v on the laptop; I re-imported the database and then ran it, output below:
 
INFO:  free space map contains 949 pages in 537 relations
DETAIL:  A total of 9024 page slots are in use (including overhead).
9024 page slots are required to track all free space.
Current limits are:  2 page slots, 1000 relations, using 186 KB.
VACUUM
 
I am about to start testing Scott's suggestion now (thanks Scott - wasn't ignoring you, just didn't have time yesterday), and I'll get back with the results.
 
Before I posted the problem to this list I was focusing more on the settings in postgresql.conf than on optimising the query, as I thought this might be a general problem, for all my DB updates/queries, with the way the planner was optimising queries; maybe it was assuming CPU cost was too cheap?  Do you think I was off track in my initial thinking?  Optimising these queries is certainly beneficial, but I don't want postgres to hog the CPU for any extended period (other apps also run on the server), so I was wondering if the general config settings could be tuned to always prevent this (regardless of how poorly written my queries are :)?

 
Neil 
On 12/07/06, Jeff Frost <[EMAIL PROTECTED]> wrote:

[snip]