Greetings,
I'm running into a big performance problem and I would like to ask you what the reason might be, but first I need to explain how it happens.
Let's suppose I can't use sequences (it seems unbelievable, I know, but my boss doesn't like database-specific features like this one).
I don't see how using PL/pgSQL is any better with respect to database-specific features, but let's set that aside for a moment.
Your function not only misses the required FOR UPDATE when reading the (possibly existing) current value, it also contains a general race condition: multiple concurrent transactions could try to insert the same new key, and all but one of them would error out with a duplicate key error.
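For illustration, the read alone would at least have to look like this (a sketch only, using the names from your function below; it still does not fix the duplicate key race on the initial INSERT):

    SELECT current_key INTO new_key_value
        FROM cnfg_key_generation
        WHERE department = the_department
          AND table_name = the_table_name
        FOR UPDATE;  -- locks the row so concurrent callers wait here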
Sequence values are int8 and are by default safe against integer rollover.
Sequences do not roll back and therefore don't need to wait for concurrent transactions to finish. Your table-based replacement is a major concurrency bottleneck: as soon as a transaction has inserted into a table, it blocks every other transaction from inserting into that table until it either commits or rolls back.
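A quick demonstration of that non-transactional behavior (assuming a sequence named my_test_seq, which is made up for this example):

    BEGIN;
    SELECT nextval('my_test_seq');  -- returns, say, 42
    ROLLBACK;
    SELECT nextval('my_test_seq');  -- returns 43; the rolled-back value
                                    -- is simply lost, and nobody waited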
Your VACUUM theory is only partially correct. A frequent VACUUM will prevent the key table from growing, but you'd have to run it very often, since the excess obsolete index entries pointing to dead tuples also degrade your performance. Additionally, if there is a very low number of keys (sequences) in that table, an ANALYZE run might cause the planner to go for a sequential scan and ignore the index on the table, at which point your function will actually cause two sequential scans over all live and dead tuples of all sequences per call.
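Concretely, "very often" here means something like a cron job issuing

    VACUUM cnfg_key_generation;

every couple of minutes (the exact interval is a guess; it depends on your insert rate).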
Sequences are specially designed to overcome all these issues.
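To make this concrete (a sketch only; the sequence name is made up, and you would need one sequence per department/table combination):

    CREATE SEQUENCE my_test_seq;

    INSERT INTO my_test
        VALUES (nextval('my_test_seq'), ...other fields...);

Or simply declare the key column as type serial and PostgreSQL creates the sequence for you.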
If you cannot convince your boss to use sequences, he is a good example of why people who have difficulty understanding technical issues should not assume leadership positions in IT projects.
Jan
To simulate sequences, I created a table called cnfg_key_generation in which each tuple holds the information for one of my tables (table name, current sequential number). Let's check:
    CREATE TABLE cnfg_key_generation (
        department  integer     NOT NULL,
        table_name  varchar(20) NOT NULL,
        current_key integer     NOT NULL,
        CONSTRAINT XPKcnfg_key_generation PRIMARY KEY (department, table_name)
    );
For example, for a table called 'my_test' I would have the following values:

    department  = 1
    table_name  = 'my_test'
    current_key = 1432
Every time I want a new key to use in the my_test primary key, I just increment the current_key value. For this job, I've created a simple stored procedure called key_generation:
    CREATE FUNCTION key_generation(integer, varchar(20)) RETURNS integer AS '
    DECLARE
        the_department ALIAS FOR $1;
        the_table_name ALIAS FOR $2;
        new_key_value  integer;
    BEGIN
        new_key_value := 0;

        LOCK TABLE cnfg_key_generation IN ROW EXCLUSIVE MODE;

        SELECT current_key INTO new_key_value
            FROM cnfg_key_generation
            WHERE department = the_department
              AND table_name = the_table_name;

        IF NOT FOUND THEN
            new_key_value := 1;
            INSERT INTO cnfg_key_generation
                VALUES (the_department, the_table_name, new_key_value);
        ELSE
            new_key_value := new_key_value + 1;
            UPDATE cnfg_key_generation
                SET current_key = new_key_value
                WHERE department = the_department
                  AND table_name = the_table_name;
        END IF;

        RETURN new_key_value;
    END;
    ' LANGUAGE 'plpgsql';
Data insertion is then done the following way:
INSERT INTO my_test VALUES (key_generation(1, 'my_test'), ...other fields...);
OK, this works, but it has a huge performance problem. I read in the PostgreSQL docs that every time I run an UPDATE, a new tuple is created and the old one is marked as invalid. For the tuple that holds the sequential key generation, every time a new key is generated a new tuple is created inside cnfg_key_generation. That means that after 2 million key generations for the same table, performance will be completely degraded, because there will be 2 million old versions of the same tuple. For instance, I have a table called 'cham_chamada' that currently holds 1.5 million tuples. The response time of a key_generation execution for this table is more than 5 seconds. In the same situation, if I execute key_generation for a table that has just a few values (current_key = 5, for example), the response time is just a few milliseconds (30 to 50 ms).
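If it helps to verify this theory: I believe a

    VACUUM VERBOSE cnfg_key_generation;

should report how many dead row versions have accumulated in that table (I mention it only as a way to check my reading of the docs).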
I tried to fix this problem with a VACUUM and it was completely ineffective: after it ran, the problem was still there. Later, after running every kind of vacuum I knew (with and without ANALYZE, specific to that table, vacuumdb from the shell, ...), I gave up and tried a COPY TO/DROP/CREATE/COPY FROM. That finally fixed the performance problem.
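(Perhaps something like

    VACUUM FULL cnfg_key_generation;
    REINDEX TABLE cnfg_key_generation;

would have rebuilt everything in place instead of the COPY dance? I never tried REINDEX, so I am only guessing here.)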
What can I do to solve this problem without rebuilding the table? Is this a performance degradation in PostgreSQL that can't be solved? If I have a huge table with millions of rows being updated constantly, what can I do to keep good performance if vacuum isn't 100% effective?
Do PostgreSQL sequences deal with these kinds of performance questions? Are they clean, fast and secure to use? Maybe I can still change my boss's mind about this subject if I get good arguments here to justify the use of sequences.
Am I doing something stupid?
Best regards,
-----------------------------------------------------------------
José Vilson de Mello de Farias
Software Engineer

Dígitro Tecnologia Ltda - www.digitro.com.br
APC - Customer Oriented Applications
E-mail: [EMAIL PROTECTED]
Tel.: +55 48 281 7158
ICQ 11866179
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== [EMAIL PROTECTED] #