On Sat, 12 Dec 2009, Nagy Gabor wrote:
On Sat, 12 Dec 2009, Dimitrios Apostolou wrote:
Regarding the stat() and access() operations I finally found out why
they happen exactly:
If the db is corrupted, the sync directory, for example, might
contain plain files instead of subdirectories. To guard against that,
_alpm_db_populate() stat()s each entry just to make sure it's a
directory. However, stat()ing thousands of entries is too high a price
to pay. Similarly, access() checks that each entry is accessible by
the user.
In the attached patch I have just removed the relevant lines, with
the following rationale: in the rare case of a corrupted db, even if we
do open("sync/not_a_dir/depends") it will still fail and we'll catch
the failure there. There is no need to investigate the cause further;
we just write a message like "couldn't access sync/not_a_dir/depends".
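
To illustrate the rationale above, here is a minimal sketch (not the
attached patch, and not libalpm code) of opening the file directly and
reporting whatever errno comes back, with no stat()/access() beforehand.
The helper name try_open_entry() is hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of libalpm: open a db entry file
 * directly, without stat()ing or access()ing its parent first.
 * Returns 0 on success, or the errno value of the failed fopen(). */
int try_open_entry(const char *path)
{
	FILE *fp = fopen(path, "r");
	if(fp == NULL) {
		int err = errno; /* save it before fprintf() can clobber it */
		fprintf(stderr, "error: couldn't access %s (%s)\n",
				path, strerror(err));
		return err;
	}
	fclose(fp);
	return 0;
}
```

If "not_a_dir" is a plain file, try_open_entry("sync/not_a_dir/depends")
should fail with ENOTDIR on Linux, so the corrupted entry is still caught
at the point where it is actually used.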
By dropping caches ("echo 3 > /proc/sys/vm/drop_caches") before
running, I measure a nice performance boost on my old laptop: "pacman
-Q gdb" time is reduced from about 7s to 2.5s.
Hm. This is a nice time boost... Did you test this with other
operations, too?
I didn't time it, but strace shows this improvement applies to -Qi, -Si,
-Su as well. It doesn't show that much however because all these
operations actually read() thousands of files (depends, desc) which is
much worse than stat(). :-)
What do you think? Is it possible to remove those checks?
Dimitris
The best solution would be to rewrite our whole database crap as Dan
said. I am pretty sure that this patch would not cause any harm irl, but
Because I really like the ease of use of the current format, I'll try
improving things with minimum changes to it. If we can avoid a complete
backend rewrite with minor changes, that is a good thing, isn't it?
our code would become a little bit more dangerous: as I see it,
db_read(INFRQ_BASE) would become a ~NOP function and db_populate would
become a simple "ls" function (the only remaining sanity check is
splitname).
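
For reference, a rough sketch of what such an "ls"-like populate could
look like: a single readdir() pass whose only sanity check is splitting
the entry name. This is an assumption about the shape of the code, not
the real db_populate; split_entry() and populate_by_listing() are
hypothetical names, and the "name-version-release" layout is assumed.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the splitname check: an entry is assumed to
 * look like "name-version-release". Fills name and version ("ver-rel")
 * and returns 0, or returns -1 if the entry is malformed. */
int split_entry(const char *entry, char *name, size_t nlen,
		char *version, size_t vlen)
{
	const char *rel = strrchr(entry, '-'); /* hyphen before the release */
	const char *ver;
	size_t n;

	if(rel == NULL || rel == entry) {
		return -1;
	}
	/* walk backwards to the hyphen before the version */
	ver = rel - 1;
	while(ver > entry && *ver != '-') {
		ver--;
	}
	if(*ver != '-' || ver == entry) {
		return -1; /* only one hyphen, or empty package name */
	}
	n = (size_t)(ver - entry);
	if(n >= nlen || strlen(ver + 1) >= vlen) {
		return -1; /* would not fit in the caller's buffers */
	}
	memcpy(name, entry, n);
	name[n] = '\0';
	strcpy(version, ver + 1);
	return 0;
}

/* List dbpath once with readdir(); no per-entry stat() or access().
 * Returns the number of well-formed entries, or -1 if the directory
 * itself cannot be opened. */
int populate_by_listing(const char *dbpath)
{
	DIR *dir = opendir(dbpath);
	struct dirent *ent;
	int count = 0;

	if(dir == NULL) {
		return -1;
	}
	while((ent = readdir(dir)) != NULL) {
		char name[256], version[256];
		if(strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0) {
			continue;
		}
		if(split_entry(ent->d_name, name, sizeof(name),
					version, sizeof(version)) != 0) {
			fprintf(stderr, "warning: invalid entry name '%s'\n", ent->d_name);
			continue;
		}
		count++;
	}
	closedir(dir);
	return count;
}
```

With this shape, a stray file that happens to be named like a package
entry passes the listing and is only detected later, when opening its
desc or depends file fails.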
Exactly! Just a simple ls should be necessary, that was my initial
motivation. And I have thought of a way to even avoid that readdir(), but
I should get some measurements first.
Dimitris