Re: [Spacewalk-devel] [PATCH] Speedup Taskomatic cleanup-query X-fold!?!

2011-08-23 Thread Tomas Lestach
On Mon, Aug 22, 2011 at 01:45:11PM +0200, Jonathan Hoser wrote:
 Hello Tomas,
 
 Yes, of course I know that an 'ORDER BY' actually increases the CPU
 effort for the query.
 However, my question was: why does the original query run seemingly forever?
 My thinking was: the IDs to be filtered against in the DELETE [...]
 WHERE id NOT IN ([subselect]) were randomly ordered;
 I know that working on unsorted input requires much more
 effort than doing a quick sort first and working on the result.
 So that was the reason for the ORDER BY.
 
 But let me get these 'explain's for you:
 ###
 ORIGINAL:
 ###
 spacewalk_1_5=# explain Delete from rhnPackageChangeLogData WHERE id NOT
 IN ( SELECT changelog_data_id FROM rhnPackageChangeLogRec);
                         QUERY PLAN
 ----------------------------------------------------------
  Seq Scan on rhnpackagechangelogdata  (cost=108146.08..11134104860.94 rows=151984 width=6)
    Filter: (NOT (SubPlan 1))
    SubPlan 1
      ->  Materialize  (cost=108146.08..170240.97 rows=4465189 width=8)
            ->  Seq Scan on rhnpackagechangelogrec  (cost=0.00..86237.89 rows=4465189 width=8)
 (5 rows)
 
 
 MODIFICATION:
 
 spacewalk_1_5=# explain Delete from rhnPackageChangeLogData WHERE id NOT
 IN ( SELECT DISTINCT changelog_data_id FROM rhnPackageChangeLogRec ORDER
 BY changelog_data_id ASC);
                         QUERY PLAN
 ----------------------------------------------------------
  Seq Scan on rhnpackagechangelogdata  (cost=104819.05..115188.64 rows=151984 width=6)
    Filter: (NOT (hashed SubPlan 1))
    SubPlan 1
      ->  Sort  (cost=104433.39..104626.22 rows=77132 width=8)
            Sort Key: rhnpackagechangelogrec.changelog_data_id
            ->  HashAggregate  (cost=97400.86..98172.18 rows=77132 width=8)
                  ->  Seq Scan on rhnpackagechangelogrec  (cost=0.00..86237.89 rows=4465189 width=8)
 (7 rows)
 
 

All right.
I see there is some speed improvement in the DISTINCT variant on PostgreSQL.
We also checked the explain plans on Oracle and saw no difference there.
Committing as:
c529c7ea18211e9b5f0ee95469625893d9f8e30e

The fix will be available in the nightly repo in spacewalk-java-1.6.28-1.

Regards,
Tomas
--
Tomas Lestach
RHN Satellite Engineering, Red Hat


 Ok, so I ran 'EXPLAIN ANALYZE' on three queries:
 the original, my 'original' modification, and the modification without the
 'ORDER BY'.
 It turns out that without the ORDER BY the query is another 600 msec faster,
 while the 'EXPLAIN ANALYZE' of the original is *still* executing
 (for more than 5 minutes now).
 So my bad for not testing without the ORDER BY - a hunch of mine in the
 wrong direction.
 
 Anyhow: Here are the 'EXPLAIN ANALYSE' of Mod, and Mod-without-ORDER-BY:
 
 #
 Modified:
 #
 spacewalk_1_5=# explain analyze Delete from rhnPackageChangeLogData
 WHERE id NOT IN ( SELECT DISTINCT changelog_data_id FROM
 rhnPackageChangeLogRec ORDER BY changelog_data_id ASC);
 
                         QUERY PLAN
 ----------------------------------------------------------
  Seq Scan on rhnpackagechangelogdata  (cost=104819.05..115188.64 rows=151984 width=6) (actual time=3476.682..3476.682 rows=0 loops=1)
    Filter: (NOT (hashed SubPlan 1))
    SubPlan 1
      ->  Sort  (cost=104433.39..104626.22 rows=77132 width=8) (actual time=3149.620..3192.113 rows=301937 loops=1)
            Sort Key: rhnpackagechangelogrec.changelog_data_id
            Sort Method:  quicksort  Memory: 26442kB
            ->  HashAggregate  (cost=97400.86..98172.18 rows=77132 width=8) (actual time=2473.983..2553.576 rows=301937 loops=1)
                  ->  Seq Scan on rhnpackagechangelogrec  (cost=0.00..86237.89 rows=4465189 width=8) (actual time=0.008..730.139 rows=4466337 loops=1)
  Total runtime: 3483.172 ms
 (9 rows)
 
 
 Modified without ORDER-BY
 
 spacewalk_1_5=# explain analyze Delete from rhnPackageChangeLogData
 WHERE id NOT IN ( SELECT DISTINCT changelog_data_id FROM
 rhnPackageChangeLogRec);
 
                         QUERY PLAN
 ----------------------------------------------------------
  Seq Scan on rhnpackagechangelogdata  (cost=98365.01..108734.60 rows=151984 width=6) (actual time=2869.294..2869.294 rows=0 loops=1)
    Filter: (NOT (hashed SubPlan 1))
    SubPlan 1
      ->  HashAggregate  (cost=97400.86..98172.18 rows=77132 width=8) (actual time=2494.079..2577.750 rows=301937 loops=1)
            ->  Seq Scan on rhnpackagechangelogrec  (cost=0.00..86237.89 rows=4465189 width=8) (actual time=0.011..741.078 rows=4466337 loops=1)
  Total runtime: 2873.545 ms
 (6 rows)
 
 ##
 Original...
 ##
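
The plans in this thread differ mainly in the filter: the original uses `NOT (SubPlan 1)` over a `Materialize` node, while both modified queries use `NOT (hashed SubPlan 1)`. A likely explanation, which the thread itself does not state, is that the planner only hashes the subquery result when it expects it to fit in memory, and DISTINCT lowers the row estimate from 4465189 to 77132. The Python sketch below is only an analogy for the two strategies, not PostgreSQL's implementation. The function names and the toy data are made up for illustration.

```python
def orphans_by_rescan(data_ids, referenced_ids):
    """Analogy for the plain SubPlan: scan the whole list for every outer row."""
    return [i for i in data_ids if i not in referenced_ids]  # list: linear scan per lookup


def orphans_by_hash(data_ids, referenced_ids):
    """Analogy for the hashed SubPlan: build a hash table once, then probe it."""
    referenced = set(referenced_ids)  # also removes duplicates, much like DISTINCT
    return [i for i in data_ids if i not in referenced]  # set: constant time on average


# Toy data (hypothetical): ids 0..9 exist, and the changelog records
# reference only the even ids, each of them many times over.
data_ids = list(range(10))
referenced_ids = list(range(0, 10, 2)) * 1000

# Both strategies select the same rows for deletion; only the cost differs.
assert orphans_by_rescan(data_ids, referenced_ids) == orphans_by_hash(data_ids, referenced_ids)
print(orphans_by_hash(data_ids, referenced_ids))  # [1, 3, 5, 7, 9]
```

With about 150 thousand outer rows and 4.4 million subquery rows, as in the plans above, the per-row rescan is what makes the original query look as if it never finishes.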

Re: [Spacewalk-devel] [PATCH] Filters on reposync

2011-08-23 Thread Baptiste AGASSE
Hi Jan,

Following your advice, I modified my code.
It now gets the include and exclude filters from the database.

I ran the following SQL statement on my Oracle XE database:
ALTER TABLE rhnContentSource ADD (include_filter VARCHAR(255), exclude_filter VARCHAR(255));

Regards.

Baptiste

- Original Message -
From: Baptiste AGASSE baptiste.aga...@lyra-network.com
To: spacewalk-devel@redhat.com
Sent: Thursday, 18 August 2011 22:50:34
Subject: Re: [Spacewalk-devel] [PATCH] Filters on reposync

Hi Jan,
OK, I can take a look at this next week.

Regards.

Baptiste

- Original Message -
From: Jan Pazdziora jpazdzi...@redhat.com
To: spacewalk-devel@redhat.com
Sent: Thursday, 18 August 2011 13:39:57
Subject: Re: [Spacewalk-devel] [PATCH] Filters on reposync

On Wed, Aug 17, 2011 at 08:23:36PM +0200, Baptiste AGASSE wrote:
 
 Following your advice, I have modified my code:
 - You can now include and/or exclude packages (with the --include and/or 
 --exclude options)
 - The include filter takes priority over the exclude filter: if a package matches 
 both an 'include' and an 'exclude' rule, it will be included
   e.g.:
 exclude = [ 'openoffice.org-langpack-*', ...]
 include = [ 'openoffice.org-langpack-en-*', ...]
 
 - Package filtering is done in yum_src.py
 - The yum dependency resolver is now used to find the dependencies of the selected packages
 - All versions of the packages excluded by a filter are now deleted from the DB 
 and the filesystem
 - The elapsed time is printed at the end of the sync
 
 Any comments are welcome.
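
The priority rule described in the list above (include wins over exclude) can be sketched in a few lines of Python. This is only an illustration of the rule, not the code from the patch: the function name `is_selected` is hypothetical, the version suffixes in the package names are invented, and using `fnmatch`-style glob matching is an assumption based on the `*` patterns in the example.

```python
from fnmatch import fnmatchcase


def is_selected(package_name, include, exclude):
    """Return True if the package should be synced.

    A package matching any include pattern is always kept; otherwise it is
    dropped only if it matches an exclude pattern.
    """
    if any(fnmatchcase(package_name, pattern) for pattern in include):
        return True
    return not any(fnmatchcase(package_name, pattern) for pattern in exclude)


# Patterns taken from the example in the mail.
exclude = ['openoffice.org-langpack-*']
include = ['openoffice.org-langpack-en-*']

# Hypothetical package names, for illustration only.
names = [
    'openoffice.org-langpack-en-3.2.1',
    'openoffice.org-langpack-fr-3.2.1',
    'openoffice.org-core-3.2.1',
]

# Keeps the English langpack (include wins) and the core package
# (no rule matches), and drops the French langpack.
print([name for name in names if is_selected(name, include, exclude)])
```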

I don't really like the fact that the exclude/include options are
specified on the spacewalk-repo-sync runtime, rather than being
properties of the repository. In other words -- these exclude/include
lists should be specified in the database, so that they would be used
any time spacewalk-repo-sync is run, no matter if it is run from the
command line or via scheduled event by taskomatic.

Now that the core functionality is in place, could you amend it some
more and have the lists stored in some database table and used from
there?

-- 
Jan Pazdziora
Principal Software Engineer, Satellite Engineering, Red Hat

___
Spacewalk-devel mailing list
Spacewalk-devel@redhat.com
https://www.redhat.com/mailman/listinfo/spacewalk-devel
diff --git a/backend/satellite_tools/repo_plugins/yum_src.py b/backend/satellite_tools/repo_plugins/yum_src.py
index bfc6161..9871a32 100644
--- a/backend/satellite_tools/repo_plugins/yum_src.py
+++ b/backend/satellite_tools/repo_plugins/yum_src.py
@@ -74,14 +74,15 @@ class YumUpdateMetadata(UpdateMetadata):
 no = self._no_cache.setdefault(file['name'], set())
 no.add(un)
 
-class ContentSource:
+class ContentSource(yum.YumBase):
     url = None
     name = None
-    repo = None
     cache_dir = '/var/cache/rhn/reposync/'
-    def __init__(self, url, name):
-        self.url = url
-        self.name = name
+    repo_id = None
+    filters = {'include': [], 'exclude': []}
+
+    def __init__(self, url, name, filters = None ):
+        yum.YumBase.__init__(self)
         self._clean_cache(self.cache_dir + name)
 
         # read the proxy configuration in /etc/rhn/rhn.conf
@@ -97,14 +98,14 @@ class ContentSource:
         else:
             self.proxy_url = None
 
-    def list_packages(self):
-        """ list packages"""
-        repo = yum.yumRepo.YumRepository(self.name)
-        self.repo = repo
+        if filters:
+            self.filters = filters
+
+        repo = yum.yumRepo.YumRepository(name)
         repo.cache = 0
         repo.metadata_expire = 0
-        repo.mirrorlist = self.url
-        repo.baseurl = [self.url]
+        repo.mirrorlist = url
+        repo.baseurl = [url]
         repo.basecachedir = self.cache_dir
         if self.proxy_url is not None:
             repo.proxy = self.proxy_url
@@ -113,13 +114,36 @@ class ContentSource:
         warnings.disable()
         repo.baseurlSetup()
         warnings.restore()
-
         repo.setup(False)
-        sack = repo.getPackageSack()
-        sack.populate(repo, 'metadata', None, 0)
-        list = sack.returnPackages()
-        to_return = []
-        for pack in list:
+        repos = self.repos.findRepos('*')
+        for rep in repos:
+            self.repos.disableRepo(rep.id)
+            self.repos.delete(rep.id)
+
+        self.repo_id = repo.id
+        self.repos.add(repo)
+        self.pkgSack = self.repos.getRepo(self.repo_id).getPackageSack()
+        self.pkgSack.populate(self.repos.getRepo(self.repo_id), 'metadata', None, 0)
+
+    def getName(self):
+        return self.repos.getRepo(self.repo_id).id
+
+    def getUrl(self):
+        return self.repos.getRepo(self.repo_id).mirrorlist
+
+    def getFilters(self):
+        return self.filters
+
+    def getRepo(self):
+        return self.repos.getRepo(self.repo_id)
+
+    def list_packages(self):
+        """ list packages"""
+        return self._list_packages(self.pkgSack.returnPackages())
+
+    def 

Re: [Spacewalk-devel] Fedora 15 rhnreg_ks error with nightly build

2011-08-23 Thread Miroslav Suchý
On 08/19/2011 06:56 PM, Shelby, James wrote:
 I'm using a simple default kickstart that seems to fail when it gets to the 
 registration part.  If I reboot the system and then run rhnreg_ks, it 
 works fine; it only seems to fail when run during the kickstart.  Any ideas?  
 I have checked that the clocks on both systems are in sync.
 
 [Fri Aug 19 16:45:56 2011] up2date 
 Traceback (most recent call last):
   File "/usr/sbin/rhnreg_ks", line 205, in <module>
     cli.run()
   File "/usr/share/rhn/up2date_client/rhncli.py", line 74, in run
     sys.exit(self.main() or 0)
   File "/usr/sbin/rhnreg_ks", line 99, in main
     hardwareList = hardware.Hardware()
   File "/usr/share/rhn/up2date_client/hardware.py", line 662, in Hardware
     allhw = get_devices()
   File "/usr/share/rhn/up2date_client/hardware_gudev.py", line 45, in get_devices
     'desc': _get_device_desc(device),
   File "/usr/share/rhn/up2date_client/hardware_gudev.py", line 306, in _get_device_desc
     (vendor_id, model_id) = device.get_property('product').split('/')[:2]
 <type 'exceptions.AttributeError'>: 'NoneType' object has no attribute 'split'


This was fixed in commit ebd9948a92945026db88932eefd23c4c0c7738ea
and should be fixed as of rhn-client-tools-1.5.2-1.

You are using rhn-client-tools-1.3.12-2.fc15, which is the version in
Fedora and which has this bug.
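
The traceback shows `device.get_property('product')` returning None, so the chained `.split('/')` fails. For illustration, a defensive version of that step could look like the sketch below. This is not the change made in the commit mentioned above, which is not shown in this thread. `FakeDevice` and `get_usb_ids` are hypothetical names, and the product string is an invented sample value.

```python
class FakeDevice:
    """Stand-in for a gudev device; only get_property() is modelled here."""

    def __init__(self, properties):
        self._properties = properties

    def get_property(self, name):
        # Like the call in the traceback, this yields None for a missing property.
        return self._properties.get(name)


def get_usb_ids(device):
    """Return (vendor_id, model_id), or (None, None) if 'product' is unusable."""
    product = device.get_property('product')
    if not product or '/' not in product:
        return None, None
    vendor_id, model_id = product.split('/')[:2]
    return vendor_id, model_id


print(get_usb_ids(FakeDevice({'product': '46d/c52b/1201'})))  # ('46d', 'c52b')
print(get_usb_ids(FakeDevice({})))                            # (None, None)
```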


-- 
Miroslav Suchy
Red Hat Satellite Engineering
