I'm not sure if anyone else is using satellite-sync with PostgreSQL, but it's quite slow when diffing packages; almost unusable right now. I wanted to run by a possible fix with everyone since the sync code is somewhat complicated and I'm not super-familiar with all of it.
diff --git a/backend/satellite_tools/satsync.py b/backend/satellite_tools/satsync.py index 488d23e..b391200 100644 --- a/backend/satellite_tools/satsync.py +++ b/backend/satellite_tools/satsync.py @@ -884,13 +884,14 @@ Please contact your RHN representative""") % (generation, sat_cert.generation)) _query_compare_packages = """ select p.id, c.checksum_type, c.checksum, p.path, p.package_size, TO_CHAR(p.last_modified, 'YYYYMMDDHH24MISS') last_modified - from rhnPackage p, rhnChecksumView c - where p.name_id = lookup_package_name(:name) + from rhnPackage p + inner join rhnPackageName pn on pn.id = p.name_id + inner join rhnChecksumView c on p.checksum_id = c.id + where pn.name = :name and p.evr_id = lookup_evr(:epoch, :version, :release) and p.package_arch_id = lookup_package_arch(:arch) and (p.org_id = :org_id or (p.org_id is null and :org_id is null)) - and p.checksum_id = c.id """ def _diff_packages_process(self, chunk, channel_label): Time difference for rhel-x86_64-productivity-5 from disk (no new packages added in either run): real 10m45.416s vs real 0m5.818s I tracked the problem down to backend/satellite_tools/satsync.py and the _query_compare_packages statement. Using the 'lookup_package_name' function causes a sequence scan on rhnPackage, which makes the query quite slow: spaceschema=# explain analyze select p.id, c.checksum_type, c.checksum, p.path, p.package_size, spaceschema-# TO_CHAR(p.last_modified, 'YYYYMMDDHH24MISS') last_modified spaceschema-# from rhnPackage p, rhnChecksumView c spaceschema-# where p.name_id = lookup_package_name(E'python-imaging') spaceschema-# and p.evr_id = lookup_evr(NULL, E'1.1.5', E'5.el5') spaceschema-# and p.package_arch_id = lookup_package_arch(E'x86_64') spaceschema-# and (p.org_id = 1 or spaceschema(# (p.org_id is null and 1 is null)) spaceschema-# and p.checksum_id = c.id; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------- Nested Loop (cost=0.00..23072.52 rows=1 width=212) (actual time=455.849..737.877 rows=1 loops=1) -> Nested Loop (cost=0.00..23072.24 rows=1 width=213) (actual time=455.729..737.755 rows=1 loops=1) -> Seq Scan on rhnpackage p (cost=0.00..23063.58 rows=1 width=168) (actual time=455.699..737.723 rows=1 loops=1) Filter: ((org_id = 1::numeric) AND (name_id = lookup_package_name('python-imaging'::character varying, 0::num eric)) AND (evr_id = lookup_evr(NULL::character varying, '1.1.5'::character varying, '5.el5'::character varying)) AND (packa ge_arch_id = lookup_package_arch('x86_64'::character varying))) -> Index Scan using rhnchecksum_pk on rhnchecksum c (cost=0.00..8.64 rows=1 width=61) (actual time=0.025..0.025 r ows=1 loops=1) Index Cond: (c.id = p.checksum_id) -> Index Scan using rhn_checksumtype_id_pk on rhnchecksumtype ct (cost=0.00..0.27 rows=1 width=13) (actual time=0.026.. 0.028 rows=1 loops=1) Index Cond: (ct.id = c.checksum_type_id) Total runtime: 738.081 ms (9 rows) Replacing the 'lookup_package_name' call with an inner join drastically improves the performance of the query: spaceschema=# explain analyze select p.id, c.checksum_type, c.checksum, p.path, p.package_size, TO_CHAR(p.last_modified, 'YYYYMMDDHH24MISS') last_modified from rhnPackage p inner join rhnPackageName pn on pn.id = p.name_id inner join rhnChecksumView c on p.checksum_id = c.id where pn.name = 'python-imaging' and p.evr_id = lookup_evr(NULL, E'1.1.5', E'5.el5') and p.package_arch_id = lookup_package_arch(E'x86_64') and (p.org_id = 1 or (p.org_id is null and 1 is null)); QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------- ------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=0.00..43.71 rows=1 width=212) (actual time=0.348..0.351 rows=1 loops=1) -> Nested Loop (cost=0.00..43.43 rows=1 width=213) (actual time=0.330..0.332 rows=1 loops=1) -> Nested Loop (cost=0.00..34.77 rows=1 width=168) (actual time=0.316..0.317 rows=1 loops=1) -> Index Scan using rhn_pn_name_uq on rhnpackagename pn (cost=0.00..8.27 rows=1 width=7) (actual time=0.033 ..0.034 rows=1 loops=1) Index Cond: ((name)::text = 'python-imaging'::text) -> Index Scan using rhn_package_nid_id_idx on rhnpackage p (cost=0.00..26.49 rows=1 width=175) (actual time =0.275..0.275 rows=1 loops=1) Index Cond: (p.name_id = pn.id) Filter: ((p.org_id = 1::numeric) AND (p.evr_id = lookup_evr(NULL::character varying, '1.1.5'::character varying, '5.el5'::character varying)) AND (p.package_arch_id = lookup_package_arch('x86_64'::character varying))) -> Index Scan using rhnchecksum_pk on rhnchecksum c (cost=0.00..8.64 rows=1 width=61) (actual time=0.012..0.012 r ows=1 loops=1) Index Cond: (c.id = p.checksum_id) -> Index Scan using rhn_checksumtype_id_pk on rhnchecksumtype ct (cost=0.00..0.27 rows=1 width=13) (actual time=0.003.. 0.004 rows=1 loops=1) Index Cond: (ct.id = c.checksum_type_id) Total runtime: 0.445 ms (13 rows) I noticed 'lookup_package_name' creates a new record in rhnPackageName if it's not found, but 'satellite-sync' will create that record just fine later on for non-existent package names (I assume from another call to 'lookup_package_name', I didn't actually look, just tested syncing with unique package names). Since that is the case, I don't see any detrimental effects from this fix. Let me know if you see anything wrong with it. /aron _______________________________________________ Spacewalk-devel mailing list Spacewalk-devel@redhat.com https://www.redhat.com/mailman/listinfo/spacewalk-devel