I'm not sure if anyone else is using satellite-sync with PostgreSQL, but it's 
quite slow when diffing packages; almost unusable right now.  I wanted to run 
by a possible fix with everyone since the sync code is somewhat complicated and 
I'm not super-familiar with all of it.

diff --git a/backend/satellite_tools/satsync.py 
b/backend/satellite_tools/satsync.py
index 488d23e..b391200 100644
--- a/backend/satellite_tools/satsync.py
+++ b/backend/satellite_tools/satsync.py
@@ -884,13 +884,14 @@ Please contact your RHN representative""") % (generation, 
sat_cert.generation))
     _query_compare_packages = """
         select p.id, c.checksum_type, c.checksum, p.path, p.package_size,
                TO_CHAR(p.last_modified, 'YYYYMMDDHH24MISS') last_modified
-          from rhnPackage p, rhnChecksumView c
-         where p.name_id = lookup_package_name(:name)
+          from rhnPackage p
+            inner join rhnPackageName pn on pn.id = p.name_id
+            inner join rhnChecksumView c on p.checksum_id = c.id
+         where pn.name = :name
            and p.evr_id = lookup_evr(:epoch, :version, :release)
            and p.package_arch_id = lookup_package_arch(:arch)
            and (p.org_id = :org_id or
                (p.org_id is null and :org_id is null))
-           and p.checksum_id = c.id
     """

     def _diff_packages_process(self, chunk, channel_label):

Time difference for rhel-x86_64-productivity-5 from disk (no new packages added 
in either run):
real    10m45.416s
vs
real    0m5.818s

I tracked the problem down to backend/satellite_tools/satsync.py and the 
_query_compare_packages statement.  Using the 'lookup_package_name' function 
causes a sequence scan on rhnPackage, which makes the query quite slow:

spaceschema=# explain analyze select p.id, c.checksum_type, c.checksum, p.path, 
p.package_size,
spaceschema-#                        TO_CHAR(p.last_modified, 
'YYYYMMDDHH24MISS') last_modified
spaceschema-#                   from rhnPackage p, rhnChecksumView c
spaceschema-#                  where p.name_id = 
lookup_package_name(E'python-imaging')
spaceschema-#                    and p.evr_id = lookup_evr(NULL, E'1.1.5', 
E'5.el5')
spaceschema-#                    and p.package_arch_id = 
lookup_package_arch(E'x86_64')
spaceschema-#                    and (p.org_id = 1 or
spaceschema(#                        (p.org_id is null and 1 is null))
spaceschema-#                    and p.checksum_id = c.id;

                           QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------
 Nested Loop  (cost=0.00..23072.52 rows=1 width=212) (actual 
time=455.849..737.877 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..23072.24 rows=1 width=213) (actual 
time=455.729..737.755 rows=1 loops=1)
         ->  Seq Scan on rhnpackage p  (cost=0.00..23063.58 rows=1 width=168) 
(actual time=455.699..737.723 rows=1 loops=1)
               Filter: ((org_id = 1::numeric) AND (name_id = 
lookup_package_name('python-imaging'::character varying, 0::num
eric)) AND (evr_id = lookup_evr(NULL::character varying, '1.1.5'::character 
varying, '5.el5'::character varying)) AND (packa
ge_arch_id = lookup_package_arch('x86_64'::character varying)))
         ->  Index Scan using rhnchecksum_pk on rhnchecksum c  (cost=0.00..8.64 
rows=1 width=61) (actual time=0.025..0.025 r
ows=1 loops=1)
               Index Cond: (c.id = p.checksum_id)
   ->  Index Scan using rhn_checksumtype_id_pk on rhnchecksumtype ct  
(cost=0.00..0.27 rows=1 width=13) (actual time=0.026..
0.028 rows=1 loops=1)
         Index Cond: (ct.id = c.checksum_type_id)
 Total runtime: 738.081 ms
(9 rows)


Replacing the 'lookup_package_name' call with an inner join drastically 
improves the performance of the query:

spaceschema=# explain analyze select p.id, c.checksum_type, c.checksum, p.path, 
p.package_size,
                       TO_CHAR(p.last_modified, 'YYYYMMDDHH24MISS') 
last_modified
                  from rhnPackage p
                     inner join rhnPackageName pn on pn.id = p.name_id
                     inner join rhnChecksumView c on p.checksum_id = c.id
                 where pn.name = 'python-imaging'
                   and p.evr_id = lookup_evr(NULL, E'1.1.5', E'5.el5')
                   and p.package_arch_id = lookup_package_arch(E'x86_64')
                   and (p.org_id = 1 or
                       (p.org_id is null and 1 is null));
                                                                                
                                  QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..43.71 rows=1 width=212) (actual time=0.348..0.351 
rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..43.43 rows=1 width=213) (actual 
time=0.330..0.332 rows=1 loops=1)
         ->  Nested Loop  (cost=0.00..34.77 rows=1 width=168) (actual 
time=0.316..0.317 rows=1 loops=1)
               ->  Index Scan using rhn_pn_name_uq on rhnpackagename pn  
(cost=0.00..8.27 rows=1 width=7) (actual time=0.033
..0.034 rows=1 loops=1)
                     Index Cond: ((name)::text = 'python-imaging'::text)
               ->  Index Scan using rhn_package_nid_id_idx on rhnpackage p  
(cost=0.00..26.49 rows=1 width=175) (actual time
=0.275..0.275 rows=1 loops=1)
                     Index Cond: (p.name_id = pn.id)
                     Filter: ((p.org_id = 1::numeric) AND (p.evr_id = 
lookup_evr(NULL::character varying, '1.1.5'::character
 varying, '5.el5'::character varying)) AND (p.package_arch_id = 
lookup_package_arch('x86_64'::character varying)))
         ->  Index Scan using rhnchecksum_pk on rhnchecksum c  (cost=0.00..8.64 
rows=1 width=61) (actual time=0.012..0.012 r
ows=1 loops=1)
               Index Cond: (c.id = p.checksum_id)
   ->  Index Scan using rhn_checksumtype_id_pk on rhnchecksumtype ct  
(cost=0.00..0.27 rows=1 width=13) (actual time=0.003..
0.004 rows=1 loops=1)
         Index Cond: (ct.id = c.checksum_type_id)
 Total runtime: 0.445 ms
(13 rows)

I noticed 'lookup_package_name' creates a new record in rhnPackageName if it's 
not found, but 'satellite-sync' will create that record just fine later on for 
non-existent package names (I assume from another call to 
'lookup_package_name', I didn't actually look, just tested syncing with unique 
package names).  Since that is the case, I don't see any detrimental effects 
from this fix.  Let me know if you see anything wrong with it.

/aron

_______________________________________________
Spacewalk-devel mailing list
Spacewalk-devel@redhat.com
https://www.redhat.com/mailman/listinfo/spacewalk-devel

Reply via email to