On Sat, 12 Dec 2009, Nagy Gabor wrote:
On Sat, 12 Dec 2009, Dimitrios Apostolou wrote:
Regarding the stat() and access() operations I finally found out why
they happen exactly:

In case of corrupted db the sync, for example, directory might
contain files, not subdirectories. So in that case
_alpm_db_populate() just makes sure it's a directory. However
stat()ing thousands of files is too much of a price to pay.
Similarly, access() checks it is accessible by the user.

In the attached patch I have just removed the relevant lines, with
the following rationale: In the rare case of corrupted db, even if we
do open("sync/not_a_dir/depends") it will still fail and we'll catch
the failure there, no need to investigate the cause further, just
write a message like "couldn't access sync/not_a_dir/depends".

By dropping caches ("echo 3 > /proc/sys/vm/drop_caches") before
running, I measure a nice performance boost on my old laptop: "pacman
-Q gdb" time is reduced from about 7s to 2.5s.

Hm. This is a nice time boost... Did you test this with other
operations, too?

I didn't time it, but strace shows this improvement applies to -Qi, -Si, -Su as well. It doesn't show that much however because all these operations actually read() thousands of files (depends, desc) which is much worse than stat(). :-)


What do you think? Is it possible to remove those checks?
Dimitris

The best solution would be to rewrite our whole database crap as Dan said. I am pretty sure that this patch would not cause any harm irl, but

Because I really like the ease of use of the current format, I'll try improving things with minimum changes to it. If we can avoid a complete backend rewrite with minor changes, that is a good think, isn't it?

our code would become a little bit more dangerous: As I see, db_read(INFRQ_BASE) would become a ~NOP function and db_populate would become a simple "ls" function (the only remaining sanity check is splitname).

Exactly! Just a simple ls should be necessary, that was my initial motivation. And I have thought of a way to even avoid that readdir(), but I should get some measurements first.


Dimitris

Reply via email to