Re: [GENERAL] Query using partitioned table hangs

2009-12-08 Thread Rob W
--- On Mon, 12/7/09, Tom Lane wrote:
 Have you looked into pg_locks to see if it's blocked
 waiting for a lock?
 The TRUNCATE in particular would require exclusive lock on
 the table, so it could be waiting for some other process 
 that's touched the table.

Thanks Tom - while pg_locks did not reveal any locks, you gave me the clue that 
solved the mystery: the truncate command.

pg_locks showed no locks at all on this table but, just in case, I tried removing 
an ANALYZE on the table being truncated. That ANALYZE was done right at the 
beginning of the sproc and should have finished long before the TRUNCATE was 
run - and hey presto, no more deadlocks.
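
(For anyone else chasing something similar: a query along these lines pairs 
ungranted lock requests with whoever holds a granted lock on the same relation. 
A minimal sketch - it only matches relation-level locks, so it can miss other 
lock types.)

-- Pair each ungranted lock request with any granted lock on the
-- same relation held by a different backend.
SELECT w.pid AS waiting_pid, w.mode AS wanted_mode,
       w.relation::regclass AS rel,
       h.pid AS holding_pid, h.mode AS held_mode
FROM pg_locks w
JOIN pg_locks h ON h.relation = w.relation
               AND h.database = w.database
               AND h.granted
WHERE NOT w.granted
  AND w.pid <> h.pid;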

A couple of things about this are weird: first, that pg_locks showed no locks, 
and second, that it used to work fine before we started partitioning an 
unrelated table. 

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Query using partitioned table hangs

2009-12-07 Thread Rob W
A re-post, since I'm really stuck on this and could use some advice on how to 
troubleshoot it...

I have an app that was previously using a large unpartitioned table with no 
problems. I partitioned this table and am now experiencing intermittent hangs 
when inserting data into the partitioned table. The stored procedure that does 
the insert seems to run to completion even when it 'hangs'. There are no 
messages in the log file, no errors/exceptions that I'm aware of, and I'm at a 
loss as to what is causing this, so any help would be much appreciated.

Here are the details:

PostgreSQL version: 8.4
OS: Reproduced on both RHEL 5.3 and OS X 10.5
Application: Java 1.6, using the PostgreSQL 8.4 JDBC type 4 driver.

The application invokes a pl/pgsql stored procedure (with autocommit set to 
true). This sproc does several inserts - see below for the code. This works 
fine with a single unpartitioned table, even for large inserts.

With the table partitioned, it hangs intermittently, usually within 20-30 
minutes of running the application, after the sproc has been invoked ~25 times. 
When it hangs, it hangs indefinitely - I know because I inadvertently left it in 
the 'hung' state for 24 hours, so it's not just slow.

The last command of the sproc truncates the staging table, raw_data. The table 
raw_data is empty when the query is hung, which suggests that the sproc is 
running to completion. I tried sending a SIGHUP to the postmaster process (kill 
-1) while the sproc was hung, thinking it might be a thread blocked somewhere, 
but that had no effect.
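
A quick way to see what each backend is doing while it appears hung is a query 
along these lines (a sketch, using the 8.4 pg_stat_activity column names):

-- List every backend with its current query and whether it is
-- blocked waiting on a lock (8.4: procpid / current_query / waiting).
SELECT procpid, usename, waiting, query_start, current_query
FROM pg_stat_activity
ORDER BY query_start;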

The relevant tables, queries etc are listed below:


-- The parent table
CREATE TABLE event (
    id BIGSERIAL PRIMARY KEY,
    evt_time TIMESTAMP WITH TIME ZONE NOT NULL,
    fk_host INTEGER REFERENCES dim_host NOT NULL,
    fk_user INTEGER REFERENCES dim_user
);


-- Example of a child table:
CREATE TABLE event_y2009m09
(
    CONSTRAINT event_y2009m09_evt_time_check
        CHECK (evt_time >= '2009-09-01'::date AND evt_time < '2009-10-01'::date)
)
INHERITS (event)
WITH (
    OIDS=FALSE
);
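
As a sanity check on those CHECK constraints (the dates here are just 
illustrative), constraint exclusion should prune a range query down to the 
empty parent plus the single matching child; the 8.4 default of 
constraint_exclusion = partition is sufficient for this:

-- The plan should scan only the parent and event_y2009m09.
EXPLAIN SELECT * FROM event
WHERE evt_time >= '2009-09-15'::date AND evt_time < '2009-09-16'::date;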


-- Example of trigger function used
CREATE OR REPLACE FUNCTION event_insert_trigger()
  RETURNS trigger AS
$BODY$
BEGIN
    IF ( NEW.evt_time >= DATE '2009-08-01'
         AND NEW.evt_time < DATE '2009-09-01' ) THEN
        INSERT INTO event_y2009m08 VALUES (NEW.*);
    ELSIF ( NEW.evt_time >= DATE '2009-09-01'
            AND NEW.evt_time < DATE '2009-10-01' ) THEN
        INSERT INTO event_y2009m09 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'Date out of range.';
    END IF;
    RETURN NULL;
END;
$BODY$
LANGUAGE plpgsql;
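
The function is attached to the parent with a BEFORE INSERT trigger along these 
lines (the trigger name here is just illustrative):

-- Route every insert on the parent through the partitioning function.
CREATE TRIGGER insert_event_trigger
    BEFORE INSERT ON event
    FOR EACH ROW EXECUTE PROCEDURE event_insert_trigger();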


-- The sproc that hangs
CREATE OR REPLACE FUNCTION normalize_data()
  RETURNS void AS
$BODY$
DECLARE
    str_value text;
    match record;
BEGIN
    -- Run ANALYZE to keep statistics up to date;
    -- raw_data contains ~60,000 records at this point.
    ANALYZE raw_data;

    INSERT INTO dim_host (name_str)
    SELECT DISTINCT host_name FROM raw_data
    EXCEPT
    SELECT name_str FROM dim_host;
    ANALYZE dim_host;

    -- Do a few more inserts like the above.

    -- Then copy all the records from the staging table.
    -- Simplified for brevity - the real query is a 12-way join.
    -- All FK constraints on table event are dropped before
    -- we run this query. All appropriate partitions and triggers
    -- have been created in advance of running this query.

    INSERT INTO event (evt_time, fk_host, fk_user)
    SELECT r.evt_time, dim_host.id, dim_user.id
    FROM raw_data AS r
    JOIN dim_host ON r.host_name = dim_host.name_str
    LEFT JOIN dim_user ON r.user_name = dim_user.user_name;

    TRUNCATE raw_data;

    RETURN;
END;
$BODY$ LANGUAGE plpgsql;
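
The application fires the sproc over JDBC as a plain query, equivalent to:

SELECT normalize_data();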


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Efficiently move data from one table to another, with FK constraints?

2009-07-06 Thread Rob W

I am using COPY to bulk load large volumes of data (i.e. multi-GB range) into a 
staging table in PostgreSQL 8.3. For performance, the staging table has no 
constraints, no primary key, etc. I want to move that data into the real 
tables, but need some advice on how to do that efficiently.

Here's a simple, abbreviated example of tables and relations I'm working with 
(in reality there are a lot more columns and foreign keys).

/* The raw bulk-loaded data. No indexes or constraints. */
CREATE TABLE log_entry (
    req_time TIMESTAMP NOT NULL,
    url TEXT NOT NULL,
    bytes INTEGER NOT NULL
);
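
The bulk load itself is a straight COPY, along these lines (the file path is 
just illustrative):

-- Load the raw log records straight into the staging table.
COPY log_entry (req_time, url, bytes)
FROM '/tmp/access_log.csv' WITH CSV;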

/* Where the data will be moved to. Will have indexes, etc. */
CREATE TABLE request (
    id BIGSERIAL PRIMARY KEY,
    req_time TIMESTAMP WITH TIME ZONE NOT NULL,
    bytes INTEGER NOT NULL,
    fk_url INTEGER REFERENCES url NOT NULL
);

CREATE TABLE url (
    id SERIAL PRIMARY KEY,
    path TEXT UNIQUE NOT NULL
);

Is there a way to move this data in bulk efficiently? Specifically, I'm 
wondering how to handle the foreign keys. The naive approach (sketched in SQL 
below) is:

1) For each column that is a foreign key in the target table, do an INSERT ... 
SELECT DISTINCT ... to copy the distinct values into the appropriate referenced 
tables.
2) For each row in log_entry, do a similar insert to copy the data across with 
the appropriate foreign keys.
3) Delete the contents of log_entry using TRUNCATE.
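
In SQL terms, that naive approach comes out something like this (a sketch 
against the abbreviated schema above):

-- Step 1: add any new paths to the url table.
INSERT INTO url (path)
SELECT DISTINCT url FROM log_entry
EXCEPT
SELECT path FROM url;

-- Step 2: copy the rows across, resolving each path to its key.
INSERT INTO request (req_time, bytes, fk_url)
SELECT l.req_time, l.bytes, u.id
FROM log_entry l
JOIN url u ON u.path = l.url;

-- Step 3: clear the staging table.
TRUNCATE log_entry;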

Obviously, this would be very slow when handling tens of millions of records. 
Are there faster approaches to solving this problem?

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Optimizing tuning and table design for large analytics DB

2009-05-07 Thread Rob W

Can anyone point me towards good articles or books that would help a PostgreSQL 
novice (i.e. me) learn the optimal approaches to setting up a DB for analytics?

In this particular case, I need to efficiently analyze approximately 300 
million system log events (i.e. time series data). It's log data, so it's only 
ever appended to the table and never updated. Only 90 days' worth of data will 
be retained, so old records need to be deleted periodically. Query performance 
will only be important for small subsets of the data (e.g. when analyzing a 
day's or a week's worth); the rest of the reports will be run in batch mode. 
There will likely be only one user at a time doing ad-hoc queries.

This is a follow-up to the earlier suggestions that PostgreSQL will handle 
the volumes of data I plan to work with, so I figured I'd give it a shot. 

Rob

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general