On 05/11/13 13:58, Dan McGee wrote:
> On Mon, Nov 4, 2013 at 7:23 PM, Allan McRae <[email protected]> wrote:
>
>> Hi,
>>
>> We currently have both .db and .files databases, with .files being a
>> superset of .db.
>>
>> An idea was formed on IRC to completely separate these. I.e. .db stays
>> as it is and .files only includes the file lists. We would then add
>> .source to include the source package information. I would set
>> repo-add to automatically create all these files.
>>
>> We would then add something like "-S --refresh-files" and "-S
>> --refresh-source" to download those files as a one off, printing a
>> warning when using them if they are out of date compared to the repo.
>> Another option is to use Usage as a flag for when to download them, but
>> refreshing all those every update seems excessive.
>>
>> This would also allow us to have some basic pkgfile functionality in
>> pacman (-So).
>>
>> So, there is much to work out, but does the general idea sound good to
>> people?
>>
>
> No, this sounds like a step backwards to me, so -1 (multiplied by as many
> times as I'm allowed to vote -1).
>
> For a while, repo-add didn't know how to create .files databases. This was
> added in January 2011:
> https://projects.archlinux.org/pacman.git/commit/scripts/repo-add.sh.in?id=eda4d9ec00be1108ab4336a438299a283c5a0a90
>
> That allowed us to commit a large change to the way dbscripts generated
> these package files (which was error-prone, slow, and they were not
> immediately up-to-date like they are now):
> https://projects.archlinux.org/dbscripts.git/commit/?id=fc6a6ab07bde03c7f20d5a4ed971f8e699ee9b20
>
> Why did I start down this road? Because it was absolutely impossible to get
> consistent, "transactional", database data in any way shape or form that
> didn't require 82 special cases in Archweb to handle parsing and loading
> the data into a database. Once I open a .files database file, I know I
> don't need anything else to have a consistent view of that database. As
> soon as we have to pry into two different files, things are an absolute
> mess, and one has to cross-reference two different files, guess and pray
> that the architectures are actually correct on the files data (because
> there is no way to tell if you don't have the other data, keep this in
> mind), and have no real way of telling which database file lags the other.
>
> I'm not sure what the rationale is for removing the non-files data from the
> files databases. Does it make them notably larger or slower to process?
>

The non-files data makes up ~5% of the files database.

But I do not understand your argument against this. My idea is to have
repo-add ALWAYS create both the .db and .files databases, instead of having
to run repo-add twice to generate the separate files. In that case I find
it redundant to have the .db information within the .files database.

But I really want to implement repo-add generating/updating both the .db
and .files databases in a single call, regardless of what information stays
in the .files database.

I suppose this comes down to the following questions:

Where should the source package information go? The .db file? At a rough
guess, the PGP signature for the source package would increase the repo
database by an extra 30-40%. So perhaps a separate .source db? If
separate, what information should go there? And should there be a type of
database containing ALL information?

Allan
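
To make the cross-referencing concern in the quoted reply concrete, here is a minimal sketch of the check a consumer would need if .db and .files carried disjoint data. It assumes each database is a tar archive with one `name-version/` directory per package, holding a `desc` entry in .db and, under the proposed split, only a `files` entry in .files. The function names and the in-memory demo archives are hypothetical and are not part of repo-add, dbscripts, or Archweb.

```python
import io
import tarfile


def package_entries(archive_bytes):
    """Map each top-level package directory to the set of file names inside it."""
    entries = {}
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:*") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            pkgdir, _, name = member.name.partition("/")
            entries.setdefault(pkgdir, set()).add(name)
    return entries


def find_lag(db_bytes, files_bytes):
    """Report package directories that appear in only one of the two databases."""
    db = set(package_entries(db_bytes))
    files = set(package_entries(files_bytes))
    return {"only_in_db": sorted(db - files), "only_in_files": sorted(files - db)}


def make_archive(layout):
    """Hypothetical helper: build a small gzip tarball in memory for the demo."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for path, text in layout.items():
            data = text.encode()
            info = tarfile.TarInfo(path)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


# Assumed scenario: .db was already updated to foo 1.1-1, .files still has 1.0-1.
db = make_archive({"foo-1.1-1/desc": "%NAME%\nfoo\n"})
files = make_archive({"foo-1.0-1/files": "%FILES%\nusr/bin/foo\n"})
print(find_lag(db, files))
```

A mismatch only shows that the two files disagree. It does not say which one is stale, which is the point raised in the quoted reply about not knowing which database lags the other.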
