[PERFORM] DB Performance decreases due to often written/accessed table

Jens Schipkowski Thu, 19 Oct 2006 06:12:41 -0700

Hi,

we've got performance problems due to repeating SELECT, UPDATE, DELETE,INSERT statements. This statements have to be executed every 10 seconds,but they run into a timeout.To obviate problems regarding to our Java Software and JDBC drivers, weput the repeating sequence of statements to a file more than 100k times(half a million statements) and executed "psql ourDB -f ourFile.sql -o/dev/null". To accelerate the occurence of the performance drop, westarted 6 instances of this command.The performance drop occured after 10 minutes shifting the server to 0percent idle and 85 - 95 percent user.For tracing the statement which raised the load, we are using pg_locks,pg_stat_activity with current_query enabled. The responsible statement isthe DELETE, it hangs until its canceled by timeout. The first run on anvacuumed DB took 300 - 600ms.In a second test we removed the DELETE statements to see wich statementsalso needs longer time by increasing the amount of data. After half anhour the SELECT statements timed out.An additional VACUUM - every 1 minute - does extend the timeout occurenceby factor 5 - 6.It does not sound professional, but the database seems to be aging by theincrease of executed statements.


The Statements
---------------
// normal insert - does not seem to be the problem - runtime is ok

INSERT INTO tbl_reg(idreg,idtype,attr1,...,attr6,loc1,...,loc3,register)VALUES(nextval('tbl_reg_idreg_seq'),1,[attr],[loc],1);


// select finds out which one has not an twin
// a twin is defined as record with the same attr* values
// decreases speed over time until timeout by postgresql
SELECT *
FROM tbl_reg reg
WHERE register <> loc1 AND
        idreg NOT IN
                (
                SELECT reg.idreg
                FROM tbl_reg reg, tbl_reg regtwin
                WHERE regtwin.register = 1 AND
                        regtwin.type <> 20 AND
                        reg.attr1 = regtwin.attr1 AND
                        reg.attr2 = regtwin.attr2 AND
                        reg.attr3 = regtwin.attr3 AND
                        reg.attr4 = regtwin.attr4 AND
                        reg.attr5 = regtwin.attr5 AND
                        reg.attr6 = regtwin.attr6 AND
                        reg.idreg <> regtwin.idreg AND
                        reg.register = 2
                );

I tried to optimize the seslect statement but the group by having count(*)

1 solution is half as fast as this statement - relating to the query

plan of EXPLAIN ANALYZE.

// delete data without a twin
// drastically decreases speed over time until timeout by postgresql
DELETE
FROM tbl_reg
WHERE idregs IN
        (
        SELECT reg.idreg
        FROM tbl_reg reg, tbl_reg regtwin
        WHERE regtwin.register = 1 AND
                regtwin.type <> 20 AND
                reg.attr1 = regtwin.attr1 AND
                reg.attr2 = regtwin.attr2 AND
                reg.attr3 = regtwin.attr3 AND
                reg.attr4 = regtwin.attr4 AND
                reg.attr5 = regtwin.attr5 AND
                reg.attr6 = regtwin.attr6 AND
                reg.idreg <> regtwin.idreg AND
                reg.register = 2
        ) OR
        (loc1 = '2' AND loc2 = '2');

The runtime of this statement increases until it will canceled byPostgreSQL.


// the where clause of this update statement is normally build in java
UPDATE tbl_reg SET loc1=2 WHERE idreg IN ('...',...,'...');

The Table
---------------
Tested with: 20.000, 80.000, 500.000 records

CREATE TABLE tbl_reg
(
  idreg bigserial NOT NULL,
  idtype int8 DEFAULT 0,
  attr1 int4,
  attr2 int4,
  attr3 varchar(20),
  attr4 varchar(20),
  attr5 int4,
  attr6 varchar(140) DEFAULT ''::character varying,
  loc1 int2 DEFAULT 0,
  loc2 int2 DEFAULT 0,
  loc3 int2 DEFAULT 0,
  register int2 DEFAULT 1,
  "timestamp" timestamp DEFAULT now(),
  CONSTRAINT tbl_reg_pkey PRIMARY KEY (idreg)
)
WITHOUT OIDS;

The Hardware
----------------
Dual Xeon 3.2GHz Hyperthreading
SCSI harddrives
RAID and non-RAID tested

We have the problem, that we cannot see any potential to improve SQLstatements. Indexing the attr* columns seems not to be an solution,because the data mustn't be unique (twins) and changes really often soreindexing will took too long.



thanks,
Jens Schipkowski

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

              http://archives.postgresql.org

[PERFORM] DB Performance decreases due to often written/accessed table

Reply via email to