On Thu, Mar 14, 2013 at 10:55:07AM +0200, Panu Matilainen wrote:
> Yup, detecting and automatically regenerating out-of-sync indexes is pretty
> much a must (yet something we currently don't have either, sigh)
>
> Some other "issues" in the current implementation AFAICS:
> - The ability to grab all keys of an index is missing, which would be
>   needed for the newish index iterator API. I always had the feeling that API
>   might come back to bite us at some point...

I already added both rpmidxList() and rpmpkgList() last night. ;)

> - Index keys are limited to strings whereas we currently have others too,
>   but then all the actually interesting indexes have string keys, and we
>   might well be able just to eliminate the others (or convert the data into
>   strings)

Yes, I noticed that after checking rpm's current database code. I can
easily switch the rpmidx functions to use binary keys if you like; it
just makes the rpmidxList function a bit awkward, as it can no longer
return an array of strings.

> BTW shouldn't those h2be() and be2h() calls be htonl() and ntohl() instead?

Yes, we could use those instead. I just didn't like including the
"arpa/inet.h" header file; it kinda felt wrong. There's also
htobe32/be32toh in endian.h if we define _BSD_SOURCE; that seems to be
a better choice. As I wasn't sure what to do, I decided to postpone the
issue by using my own inline functions for now ;)

> The idea seems to be keeping the database and indexes in big-endian, ie
> network byte order (which is good IMO), but currently it's unconditionally
> byteswapping so big-endian systems would have the db's in little endian
> format and little endian systems in big endian. Or am I totally missing
> something here?

Yes, the code always uses big endian. It doesn't unconditionally swap.
(It also does unaligned reads/writes, but we don't really need that.)

Coming back to automatically regenerating out-of-sync indexes, there's
still another way to do the implementation: keep those indexes in
memory and don't store them on disk at all. This means that the indexes
need to be generated on the fly at first access by reading all headers,
so we would additionally need to store a stripped version of each header
that contains just the interesting bits.
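A minimal sketch of what building such an in-core index at first access could look like, assuming a hypothetical stripped header that carries only a package number and up to four values of one indexed tag (e.g. provides). All names here (struct stripped_hdr, memidx_build, memidx_lookup, memidx_free) are illustrative and not part of rpm's API. The index is built by one scan over all stripped headers, sorted once, and then queried with a binary search.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stripped header: the package number plus up to four values
 * of the indexed tag; unused slots are NULL. */
struct stripped_hdr {
    unsigned int pkgidx;
    const char *keys[4];
};

/* One in-core index entry: a key and the package that carries it. */
struct memidx_ent {
    const char *key;
    unsigned int pkgidx;
};

struct memidx {
    struct memidx_ent *ents;
    size_t n;
};

/* Order by key, then by package number, so equal keys form one run. */
static int ent_cmp(const void *a, const void *b)
{
    const struct memidx_ent *ea = a, *eb = b;
    int r = strcmp(ea->key, eb->key);
    if (r)
        return r;
    return (ea->pkgidx > eb->pkgidx) - (ea->pkgidx < eb->pkgidx);
}

/* Build the index by scanning every stripped header once.
 * Keys are not copied, so the headers must outlive the index.
 * Returns 0 on success, -1 on allocation failure. */
static int memidx_build(struct memidx *idx, const struct stripped_hdr *hdrs,
                        size_t nhdrs)
{
    size_t cnt = 0, i, j;

    for (i = 0; i < nhdrs; i++)
        for (j = 0; j < 4 && hdrs[i].keys[j]; j++)
            cnt++;
    idx->ents = malloc((cnt ? cnt : 1) * sizeof(*idx->ents));
    if (!idx->ents)
        return -1;
    idx->n = 0;
    for (i = 0; i < nhdrs; i++)
        for (j = 0; j < 4 && hdrs[i].keys[j]; j++) {
            idx->ents[idx->n].key = hdrs[i].keys[j];
            idx->ents[idx->n].pkgidx = hdrs[i].pkgidx;
            idx->n++;
        }
    assert(idx->n == cnt);      /* both passes must agree */
    qsort(idx->ents, idx->n, sizeof(*idx->ents), ent_cmp);
    return 0;
}

/* Find all packages carrying 'key'. Returns the number of matches and sets
 * *first to the first matching entry (or one past the last entry). */
static size_t memidx_lookup(const struct memidx *idx, const char *key,
                            const struct memidx_ent **first)
{
    size_t lo = 0, hi = idx->n, cnt = 0;

    while (lo < hi) {           /* lower bound: first entry >= key */
        size_t mid = lo + (hi - lo) / 2;
        if (strcmp(idx->ents[mid].key, key) < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    *first = idx->ents + lo;
    while (lo + cnt < idx->n && strcmp(idx->ents[lo + cnt].key, key) == 0)
        cnt++;
    return cnt;
}

static void memidx_free(struct memidx *idx)
{
    free(idx->ents);
    idx->ents = NULL;
    idx->n = 0;
}
```

Because ties are ordered by package number, all packages sharing a key end up adjacent, so a lookup returns them as one contiguous run, and walking the sorted array also gives a cheap way to list all keys of an index.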

Advantages:
- just one single database file
- no out-of-sync indexes possible

Disadvantage:
- needs a bit of time to generate the in-core indexes

For my system (2102 installed rpms) the stripped headers would be about
2.2 MBytes to read, which takes about 0.34 seconds with my slow disk and
dropped caches; that is quite noticeable.

Cheers,
  Michael.

-- 
Michael Schroeder                                   m...@suse.de
SUSE LINUX Products GmbH,  GF Jeff Hawn, HRB 16746 AG Nuernberg
main(_){while(_=~getchar())putchar(~_-1/(~(_|32)/13*2-11)*13);}
_______________________________________________
Rpm-maint mailing list
Rpm-maint@lists.rpm.org
http://lists.rpm.org/mailman/listinfo/rpm-maint