Re: [Rpm-maint] Rpm Database musings

2013-04-19 Thread Jan Zelený
On 19. 4. 2013 at 12:08:42, Panu Matilainen wrote: > On 04/18/2013 03:50 PM, Michael Schroeder wrote: > > On Thu, Apr 18, 2013 at 03:30:52PM +0300, Panu Matilainen wrote: > >> BTW there seems to be a bug in newrpmdb, related to the pkgidx/datidx > >> handling for the cases where ovldata is non-zero

Re: [Rpm-maint] Rpm Database musings

2013-04-19 Thread Panu Matilainen
On 04/18/2013 03:50 PM, Michael Schroeder wrote: On Thu, Apr 18, 2013 at 03:30:52PM +0300, Panu Matilainen wrote: BTW there seems to be a bug in newrpmdb, related to the pkgidx/datidx handling for the cases where ovldata is non-zero. It's masked by a typo/thinko in the testit.c header data size

Re: [Rpm-maint] Rpm Database musings

2013-04-18 Thread Panu Matilainen
On 04/18/2013 12:04 PM, Michael Schroeder wrote: On Wed, Apr 17, 2013 at 05:17:42PM +0300, Panu Matilainen wrote: Time for a status report, just to let you know I haven't forgotten or abandoned this "project". That's good to hear ;-) All direct BDB ties in rpmdb.c were cut out last month, be

Re: [Rpm-maint] Rpm Database musings

2013-04-18 Thread Michael Schroeder
On Wed, Apr 17, 2013 at 05:17:42PM +0300, Panu Matilainen wrote: > Time for a status report, just to let you know I haven't forgotten or > abandoned this "project". That's good to hear ;-) > All direct BDB ties in rpmdb.c were cut out last month, been pondering > about the backend API since the

Re: [Rpm-maint] Rpm Database musings

2013-04-17 Thread Panu Matilainen
On 03/09/2013 12:30 PM, Panu Matilainen wrote: On 03/08/2013 04:37 PM, Michael Schroeder wrote: Anyway, attached is a little Packages database implementation I did yesterday and today. The code is very careful not to destroy things if the database is corrupt, i.e. it makes sure that it does not

Re: [Rpm-maint] Rpm Database musings

2013-04-02 Thread Panu Matilainen
On 04/02/2013 05:17 PM, Michael Schroeder wrote: On Thu, Mar 14, 2013 at 03:33:44PM +0200, Panu Matilainen wrote: I think strings are fine, just thought to note that there are those couple of non-string indexes which we need to do something about. Sigmd5 is probably better just axed, Installtid

Re: [Rpm-maint] Rpm Database musings

2013-04-02 Thread Michael Schroeder
On Thu, Mar 14, 2013 at 03:33:44PM +0200, Panu Matilainen wrote: > I think strings are fine, just thought to note that there are those couple > of non-string indexes which we need to do something about. Sigmd5 is > probably better just axed, Installtid we might want to keep but that can > just a

Re: [Rpm-maint] Rpm Database musings

2013-03-27 Thread Michael Schroeder
On Thu, Mar 14, 2013 at 03:33:44PM +0200, Panu Matilainen wrote: > What I've had in mind is lumping all the index stuff (possibly along with > actual data for the critical parts) into a single file so there'd be just > two db-related files to worry about. But for now, I'm just happy to > h

Re: [Rpm-maint] Rpm Database musings

2013-03-16 Thread Panu Matilainen
On 03/14/2013 05:45 PM, Michael Schroeder wrote: On Thu, Mar 14, 2013 at 03:33:44PM +0200, Panu Matilainen wrote: On 03/14/2013 01:10 PM, Michael Schroeder wrote: On Thu, Mar 14, 2013 at 10:55:07AM +0200, Panu Matilainen wrote: Yup, detecting and automatically regenerating out-of-sync indexes

Re: [Rpm-maint] Rpm Database musings

2013-03-14 Thread Michael Schroeder
On Thu, Mar 14, 2013 at 03:33:44PM +0200, Panu Matilainen wrote: > On 03/14/2013 01:10 PM, Michael Schroeder wrote: >> On Thu, Mar 14, 2013 at 10:55:07AM +0200, Panu Matilainen wrote: >>> Yup, detecting and automatically regenerating out-of-sync indexes is pretty >>> much a must (yet something we c

Re: [Rpm-maint] Rpm Database musings

2013-03-14 Thread Panu Matilainen
On 03/14/2013 01:10 PM, Michael Schroeder wrote: On Thu, Mar 14, 2013 at 10:55:07AM +0200, Panu Matilainen wrote: Yup, detecting and automatically regenerating out-of-sync indexes is pretty much a must (yet something we currently don't have either, sigh) Some other "issues" in the current implem

Re: [Rpm-maint] Rpm Database musings

2013-03-14 Thread Michael Schroeder
On Thu, Mar 14, 2013 at 10:55:07AM +0200, Panu Matilainen wrote: > Yup, detecting and automatically regenerating out-of-sync indexes is pretty > much a must (yet something we currently don't have either, sigh) > > Some other "issues" in the current implementation AFAICS: > - The ability to grab all

Re: [Rpm-maint] Rpm Database musings

2013-03-14 Thread Panu Matilainen
On 03/13/2013 03:19 PM, Michael Schroeder wrote: On Fri, Mar 08, 2013 at 03:37:12PM +0100, Michael Schroeder wrote: I kind of like to have all the data in one file. Anyway, attached is a little Packages database implementation I did yesterday and today. Attached is the current version of my l

Re: [Rpm-maint] Rpm Database musings

2013-03-13 Thread Michael Schroeder
On Fri, Mar 08, 2013 at 03:37:12PM +0100, Michael Schroeder wrote: > I kind of like to have all the data in one file. > > Anyway, attached is a little Packages database implementation I did yesterday > and today. Attached is the current version of my little experiments. The main changes are: - I

Re: [Rpm-maint] Rpm Database musings

2013-03-11 Thread Panu Matilainen
On 03/11/2013 02:14 PM, Michael Schroeder wrote: On Fri, Mar 08, 2013 at 09:21:33PM +0200, Panu Matilainen wrote: It has its advantages of course. Having headers spread in different files would probably make some things easier but also slower, so you'd really want to avoid having to go to the he

Re: [Rpm-maint] Rpm Database musings

2013-03-11 Thread Michael Schroeder
On Fri, Mar 08, 2013 at 09:21:33PM +0200, Panu Matilainen wrote: > It has its advantages of course. Having headers spread in different files > would probably make some things easier but also slower, so you'd really > want to avoid having to go to the headers. I did a quick test-case in > python

Re: [Rpm-maint] Rpm Database musings

2013-03-09 Thread Panu Matilainen
On 03/09/2013 03:19 PM, Thierry Vignaud wrote: On 7 March 2013 21:28, Panu Matilainen wrote: I wouldn't worry too much about hash algorithms and storage optimization at this point: that's something that can be tweaked and tuned over time as long as the cache structure is internally versioned so

Re: [Rpm-maint] Rpm Database musings

2013-03-09 Thread Thierry Vignaud
On 7 March 2013 21:28, Panu Matilainen wrote: > I wouldn't worry too much about hash algorithms and storage optimization at > this point: that's something that can be tweaked and tuned over time as long > as the cache structure is internally versioned so we know when we need to > rebuild it. > > R

Re: [Rpm-maint] Rpm Database musings

2013-03-09 Thread Panu Matilainen
On 03/08/2013 04:37 PM, Michael Schroeder wrote: Anyway, attached is a little Packages database implementation I did yesterday and today. The code is very careful not to destroy things if the database is corrupt, i.e. it makes sure that it does not overwrite data. Apart from implementation deta

Re: [Rpm-maint] Rpm Database musings

2013-03-08 Thread Panu Matilainen
On 03/08/2013 04:37 PM, Michael Schroeder wrote: On Thu, Mar 07, 2013 at 10:28:41PM +0200, Panu Matilainen wrote: Right now I'm more interested in what the overall design of this all might look like. Like said, I'd like to see the cache be a "read-only media" so there is zero locking needed for

Re: [Rpm-maint] Rpm Database musings

2013-03-08 Thread Michael Schroeder
On Thu, Mar 07, 2013 at 10:28:41PM +0200, Panu Matilainen wrote: > Right now I'm more interested in what the overall design of this all might > look like. Like said, I'd like to see the cache be a "read-only media" so > there is zero locking needed for queries that only need data from the > cac

Re: [Rpm-maint] Rpm Database musings

2013-03-07 Thread Panu Matilainen
On 03/05/2013 08:11 PM, Michael Schroeder wrote: On Mon, Mar 04, 2013 at 12:22:31PM +0100, Michael Schroeder wrote: For 2000 packages we have about... ugh, that's actually hard to tell as the avg and the median differ that much. Let's use the average: 2000 * 130 = 260000 files. I would hash the

Re: [Rpm-maint] Rpm Database musings

2013-03-05 Thread Michael Schroeder
On Mon, Mar 04, 2013 at 12:22:31PM +0100, Michael Schroeder wrote: > For 2000 packages we have about... ugh, that's actually hard > to tell as the avg and the median differ that much. Let's > use the average: 2000 * 130 = 260000 files. > > I would hash them using just a 32-bit number for each hash

Re: [Rpm-maint] Rpm Database musings

2013-03-04 Thread Panu Matilainen
On 03/04/2013 01:23 PM, Michael Schroeder wrote: On Mon, Mar 04, 2013 at 12:19:34PM +0100, Ales Kozumplik wrote: On 03/04/2013 11:21 AM, Michael Schroeder wrote: Actually libsolv can do an "incremental" update if it has an old solv file available, i.e. it takes the unchanged content from the old

Re: [Rpm-maint] Rpm Database musings

2013-03-04 Thread Panu Matilainen
On 03/04/2013 12:21 PM, Michael Schroeder wrote: On Sun, Mar 03, 2013 at 05:46:10PM +0200, Panu Matilainen wrote: Right, in this context compression does indeed seem quite attractive. When we talked about this in the devconf, I was thinking about the way rpm itself currently keeps (re)loading th

Re: [Rpm-maint] Rpm Database musings

2013-03-04 Thread Michael Schroeder
On Mon, Mar 04, 2013 at 03:12:51PM +0100, Florian Festi wrote: > On 03/01/2013 05:32 PM, Michael Schroeder wrote: > > (the median is quite different from the avg, that means that > > some packages are quite big.) > > ... > > > - That means, if I have 2000 packages installed on my system > >

Re: [Rpm-maint] Rpm Database musings

2013-03-04 Thread Florian Festi
On 03/01/2013 05:32 PM, Michael Schroeder wrote: > (the median is quite different from the avg, that means that > some packages are quite big.) ... > - That means, if I have 2000 packages installed on my system > (which is about the real number), the concatenated headers will > use 20 MBy

Re: [Rpm-maint] Rpm Database musings

2013-03-04 Thread Ales Kozumplik
On 03/04/2013 12:23 PM, Michael Schroeder wrote: On Mon, Mar 04, 2013 at 12:19:34PM +0100, Ales Kozumplik wrote: On 03/04/2013 11:21 AM, Michael Schroeder wrote: Actually libsolv can do an "incremental" update if it has an old solv file available, i.e. it takes the unchanged content from the old

Re: [Rpm-maint] Rpm Database musings

2013-03-04 Thread Michael Schroeder
On Mon, Mar 04, 2013 at 12:19:34PM +0100, Ales Kozumplik wrote: > On 03/04/2013 11:21 AM, Michael Schroeder wrote: >> Actually libsolv can do an "incremental" update if it has an old >> solv file available, i.e. it takes the unchanged content from the >> old solv file and only queries new headers fr

Re: [Rpm-maint] Rpm Database musings

2013-03-04 Thread Michael Schroeder
More numbers ahead: rpm's scanned: 28423 uncompressed: - unc: sum: 777290960, avg: 27348, median: 10600 clunc: sum: 168381220, avg: 5925, median: 2707 flunc: sum: 553276838, avg: 19466, median: 2102 xxunc: sum: 56542438, avg: 1990, median: 1684 bnunc: sum: 69125101, avg: 24
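The averages in the snippet above can be cross-checked against the quoted sums and the package count. The following is a minimal sketch, not code from the thread: the names `RPMS_SCANNED`, `REPORTED`, and `ceil_div` are made up for illustration, and the assumption that the averages were rounded up is an inference from the numbers rather than something the post states. Only the four rows that appear in full are used; the truncated `bnunc` row is left out.

```python
# Figures copied from the complete rows of the snippet above.
RPMS_SCANNED = 28423

# name -> (sum in bytes, average as quoted in the post)
REPORTED = {
    "unc": (777290960, 27348),
    "clunc": (168381220, 5925),
    "flunc": (553276838, 19466),
    "xxunc": (56542438, 1990),
}


def ceil_div(total, count):
    """Integer division rounded up (assumed rounding mode, not confirmed by the thread)."""
    return (total + count - 1) // count


# Recompute each average from its sum and the number of rpms scanned.
computed = {name: ceil_div(total, RPMS_SCANNED) for name, (total, _) in REPORTED.items()}

for name, (total, quoted_avg) in REPORTED.items():
    print(f"{name}: sum={total} recomputed avg={computed[name]} quoted avg={quoted_avg}")
```

Under the round-up assumption, the recomputed values match the quoted averages for all four rows.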

Re: [Rpm-maint] Rpm Database musings

2013-03-04 Thread Ales Kozumplik
On 03/04/2013 11:21 AM, Michael Schroeder wrote: Actually libsolv can do an "incremental" update if it has an old solv file available, i.e. it takes the unchanged content from the old solv file and only queries new headers from the rpm database. Ales doesn't yet use this method in hawkey. Did

Re: [Rpm-maint] Rpm Database musings

2013-03-04 Thread Michael Schroeder
On Sun, Mar 03, 2013 at 05:46:10PM +0200, Panu Matilainen wrote: > Right, in this context compression does indeed seem quite attractive. When > we talked about this in the devconf, I was thinking about the way rpm > itself currently keeps (re)loading the headers from Packages and adding > repeat

Re: [Rpm-maint] Rpm Database musings

2013-03-04 Thread Jan Zeleny
On Sun, 3 March 2013 at 17:46:10, Panu Matilainen wrote: > On 03/01/2013 06:32 PM, Michael Schroeder wrote: > > Hi Panu et al, > > > > here are some numbers/musings about changing the database > > implementation to just one single packages file: > > > > - I assume that we still want to store al

Re: [Rpm-maint] Rpm Database musings

2013-03-03 Thread Panu Matilainen
On 03/01/2013 06:32 PM, Michael Schroeder wrote: Hi Panu et al, here are some numbers/musings about changing the database implementation to just one single packages file: - I assume that we still want to store all the headers (in some format) anyway. Nod, I think the headers need to stay,

[Rpm-maint] Rpm Database musings

2013-03-01 Thread Michael Schroeder
Hi Panu et al, here are some numbers/musings about changing the database implementation to just one single packages file: - I assume that we still want to store all the headers (in some format) anyway. - I checked all the headers of the i586/noarch packages from FC18 to get some understandi