On Sat, 12 Dec 2009, Dimitrios Apostolou wrote:
Hello list,
I have been investigating the slow performance of pacman regarding the cold
caches scenario and I'm trying to write some proof of concept code that
improves things a lot for some cases. However I need your help regarding some
facts I might have misunderstood, and any pointers to the source code you
also give me would also help a lot. I wouldn't like to lose time changing
stuff that would break current functionality. So here are some first
questions that come to mind, just by using strace:
When doing "pacman -Q blah" I can see that besides the getdents() syscalls in
/var/lib/pacman/local (probably caused by readdir()), there are also stat()
and access() calls for every single subdirectory. Why are the last ones
necessary? Isn't readdir enough?
The same goes when doing "pacman -S blah". But in that case it stat()'s both
'local' and 'sync' directories, so worst case is really bad, it will stat()
all contents of local, core, extra and community...
Regarding the stat() and access() operations I finally found out why they
happen exactly:
In case of corrupted db the sync, for example, directory might contain
files, not subdirectories. So in that case _alpm_db_populate() just makes
sure it's a directory. However stat()ing thousands of files is too much of
a price to pay. Similarly, access() checks it is accessible by the user.
In the attached patch I have just removed the relevant lines, with the
following rationale: In the rare case of corrupted db, even if we do
open("sync/not_a_dir/depends") it will still fail and we'll catch the
failure there, no need to investigate the cause further, just write a
message like "couldn't access sync/not_a_dir/depends".
By dropping caches ("echo 3 > /proc/sys/vm/drop_caches") before running, I
measure a nice performance boost on my old laptop: "pacman -Q gdb" time is
reduced from about 7s to 2.5s.
What do you think? Is it possible to remove those checks?
Dimitris
P.S. Now all that remains is the depends/conflicts/requiredby stuff which
is by far the hardest... I'm still trying to decipher the patch
implementing REQUIREDBY that was posted earlier.diff --git a/lib/libalpm/be_files.c b/lib/libalpm/be_files.c
index 90e97a5..7d80ea7 100644
--- a/lib/libalpm/be_files.c
+++ b/lib/libalpm/be_files.c
@@ -222,8 +222,6 @@ int _alpm_db_populate(pmdb_t *db)
{
int count = 0;
struct dirent *ent = NULL;
- struct stat sbuf;
- char path[PATH_MAX];
const char *dbpath;
DIR *dbdir;
@@ -243,12 +241,6 @@ int _alpm_db_populate(pmdb_t *db)
if(strcmp(name, ".") == 0 || strcmp(name, "..") == 0) {
continue;
}
- /* stat the entry, make sure it's a directory */
- snprintf(path, PATH_MAX, "%s%s", dbpath, name);
- if(stat(path, &sbuf) != 0 || !S_ISDIR(sbuf.st_mode)) {
- continue;
- }
-
pkg = _alpm_pkg_new();
if(pkg == NULL) {
closedir(dbdir);
@@ -337,13 +329,6 @@ int _alpm_db_read(pmdb_t *db, pmpkg_t *info, pmdbinfrq_t
inforeq)
pkgpath = get_pkgpath(db, info);
- if(access(pkgpath, F_OK)) {
- /* directory doesn't exist or can't be opened */
- _alpm_log(PM_LOG_DEBUG, "cannot find '%s-%s' in db '%s'\n",
- info->name, info->version, db->treename);
- goto error;
- }
-
/* DESC */
if(inforeq & INFRQ_DESC) {
snprintf(path, PATH_MAX, "%sdesc", pkgpath);