On Tue, Mar 14, 2006 at 04:52:14PM +0200, tvali wrote:
> > You're talking about the cache, take a look at the cache subsystem and
> > write a mysql module for it. This will never become a default though (we
> > would get killed if portage starts to depend on mysql).
>
> I think that it should not become default as mysql module, but if it
> is working, it should become default as "portable" sql module.
>
> # emerge sqlite pysqlite
>
> I havent used sqlite, but it seems to be small and usable. I think
> that it should start with it.
>
> I think that portage should *support* sql by default, but of course it
> should not be default before it's clear that many people like it and
> use it. What is imho more important is how to make one usable
> interface, which would cover both fs and sql portage db's so that
> development didnt go into two products.

See the restrictions framework I've started:
http://gentooexperimental.org/~ferringb/blog/archives/2005-07.html#e2005-07-13T01_21_42.txt
http://gentooexperimental.org/~ferring/bzr/pkgcore/dev-notes/framework/restrictions

Short version is that converting to sql internally sucks badly, since
you'll have to parse (ad hoc) sql statements for any file based backend.
Using sql directly in portage requires encapsulating the sql code so
that rdbms syntax differences (replace comes to mind) can be worked
around...

Re: rdbms being faster than an on disk file db... it's only faster in
certain cases. With properly designed/coded backends, an RDBMS is _only_
faster than a local file db when it's returning N records.

As to why adding rdbms into stable is a bad idea right now, the problem
is in querying; you _could_ add a sql backend (pretty easy, 2.1 ships
with a sql_template and sqlite backend from my earlier work), but it'll
actually be slower.

Portage does cache lookups individually; want the data for all bsdiff
versions? Portage does this:

    keys = []
    for x in portdb.cp_all("dev-util/bsdiff"):
        keys.append(portdb.aux_get(x, ["DEPENDS"]))

Each lookup is a separate call; there is no way to leverage rdbms speed
for N record return if the calling api is (effectively) single row
queries.

To fully leverage a rdbms backend, portage calls need to be restructured
so that they deal in lists instead of individual elements. For example,
under the rewrite:

    repository.match(atom("dev-util/bsdiff"))

Via that (and the restriction framework it uses) the api calls are
designed so that rdbms can shine; instead of N calls, the
repository/cache backend can convert the restrictions into a sql
statement and run _one_ search.

Finally... rdbms still has problems. If the repository isn't 'frozen'
(eg, it can regen its metadata, as all portage trees in stable
currently can) you cannot rely on the cache backend aside from doing
random access lookups in it. Why?

Say the cache holds dev-util/bsdiff-4.2 and dev-util/bsdiff-4.3, but not
dev-util/bsdiff-4.4. If you hand the query off to the cache backend,
it'll return just those two, when it should return all 3.

~harring
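
A minimal sketch of the two points above (N single-row lookups versus
one query, and a non-frozen tree whose cache is missing a version). It
assumes a hypothetical sqlite table; the schema, the helper names
lookup_one/lookup_many, and the dependency strings are illustrative
placeholders, not portage's actual cache API or real metadata.

```python
import sqlite3

# Hypothetical cache schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cache (cpv TEXT PRIMARY KEY, cp TEXT, depend TEXT)")
conn.executemany(
    "INSERT INTO cache VALUES (?, ?, ?)",
    [
        ("dev-util/bsdiff-4.2", "dev-util/bsdiff", "placeholder-dep-a"),
        ("dev-util/bsdiff-4.3", "dev-util/bsdiff", "placeholder-dep-b"),
    ],
)


def lookup_one(cpv):
    """Single-row lookup: one sql statement per package version."""
    row = conn.execute(
        "SELECT depend FROM cache WHERE cpv = ?", (cpv,)
    ).fetchone()
    return row[0] if row else None


def lookup_many(cp):
    """List-style lookup: one sql statement for every cached version."""
    rows = conn.execute(
        "SELECT cpv, depend FROM cache WHERE cp = ? ORDER BY cpv", (cp,)
    ).fetchall()
    return dict(rows)


# What the (non-frozen) tree really contains; 4.4 has no cache entry yet.
tree_versions = [
    "dev-util/bsdiff-4.2",
    "dev-util/bsdiff-4.3",
    "dev-util/bsdiff-4.4",
]

# N queries, one per version, mirroring the per-package calling pattern.
per_row = {cpv: lookup_one(cpv) for cpv in tree_versions}

# One query, but it can only see what the cache already holds.
batched = lookup_many("dev-util/bsdiff")

# Versions the cache alone would silently drop from the result.
missing = [cpv for cpv in tree_versions if cpv not in batched]
```

Here the batched call runs a single statement where the loop runs
three, but `missing` shows why the batched result cannot be trusted on
its own while the tree can still regenerate metadata.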