From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Genc, Ömer
Sent: Friday, August 21, 2015 8:49 AM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Performance bottleneck due to array manipulation

Hey,

i have a very long running stored procedure, due to array manipulation in a 
stored procedure. The following procedure takes 13 seconds to finish.

BEGIN
    point_ids_older_than_one_hour := '{}';
    object_ids_to_be_invalidated := '{}';

    select ARRAY(SELECT
                    point_id
                from ONLY
                    public.ims_point as p
                where
                    p.timestamp < m_before_one_hour
                )
    into point_ids_older_than_one_hour ; -- this array has a size of 20k

    select ARRAY(SELECT
                        object_id
                  from
                        public.ims_object_header h
                  WHERE
                        h.last_point_id= ANY(point_ids_older_than_one_hour)
                 )
    into object_ids_to_be_invalidated; -- this array has a size of 100

    --    current_last_point_ids will have a size of 100k
    current_last_point_ids := ARRAY( SELECT
                                            last_point_id
                                      from
                                            public.ims_object_header h
                                     );
    -- START OF PERFORMANCE BOTTLENECK
    IF(array_length(current_last_point_ids, 1) > 0)
    THEN
        FOR i IN 0 .. array_upper(current_last_point_ids, 1)
        LOOP
            point_ids_older_than_one_hour = 
array_remove(point_ids_older_than_one_hour, current_last_point_ids[i]::bigint);
        END LOOP;
    END IF;
    -- END OF PERFORMANCE BOTTLENECK
END;

The array manipulation part is the performance bottleneck. I am pretty sure, 
that there is a better way of doing this, however I couldn't find one.
What I have is two table, lets call them ims_point and ims_object_header. 
ims_object_header references some entries of ims_point in the column 
last_point_id.
Now I want to delete all entries from ims_point, where the timestamp is older 
than one hour. The currently being referenced ids of the table 
ims_object_header should be excluded from this deletion. Therefore I stored the 
ids in arrays and iterate over those arrays to exclude the referenced values 
from being deleted.

However, I not sure if using an array for an operation like this is the best 
approach.

Can anyone give me some advice how this could be enhanced.

Thanks in advance.


I think in this case (as is in many other cases) "pure" SQL does the job much 
better than procedural language:

DELETE FROM public.ims_point as P
WHERE  P.timestamp < m_before_one_hour
     AND NOT EXISTS (SELECT 1 FROM  public.ims_object_header OH
                                                WHERE OH.last_point_id = 
P.object_id);

Is that what you are trying to accomplish?

Regards,
Igor Neyman




Reply via email to