Re: [Rpm-maint] Rpm Database musings

2013-04-19 Thread Panu Matilainen
On 04/18/2013 03:50 PM, Michael Schroeder wrote: On Thu, Apr 18, 2013 at 03:30:52PM +0300, Panu Matilainen wrote: BTW there seems to be a bug in newrpmdb, related to the pkgidx/datidx handling for the cases where ovldata is non-zero. It's masked by a typo/thinko in the testit.c header data size

Re: [Rpm-maint] Rpm Database musings

2013-04-19 Thread Jan Zelený
On 19. 4. 2013 at 12:08:42, Panu Matilainen wrote: On 04/18/2013 03:50 PM, Michael Schroeder wrote: On Thu, Apr 18, 2013 at 03:30:52PM +0300, Panu Matilainen wrote: BTW there seems to be a bug in newrpmdb, related to the pkgidx/datidx handling for the cases where ovldata is non-zero. It's

Re: [Rpm-maint] Rpm Database musings

2013-04-18 Thread Michael Schroeder
On Wed, Apr 17, 2013 at 05:17:42PM +0300, Panu Matilainen wrote: Time for a status report, just to let you know I haven't forgotten or abandoned this project. That's good to hear ;-) All direct BDB ties in rpmdb.c were cut out last month, been pondering about the backend API since then.

Re: [Rpm-maint] Rpm Database musings

2013-04-18 Thread Panu Matilainen
On 04/18/2013 12:04 PM, Michael Schroeder wrote: On Wed, Apr 17, 2013 at 05:17:42PM +0300, Panu Matilainen wrote: Time for a status report, just to let you know I haven't forgotten or abandoned this project. That's good to hear ;-) All direct BDB ties in rpmdb.c were cut out last month,

Re: [Rpm-maint] Rpm Database musings

2013-04-17 Thread Panu Matilainen
On 03/09/2013 12:30 PM, Panu Matilainen wrote: On 03/08/2013 04:37 PM, Michael Schroeder wrote: Anyway, attached is a little Packages database implementation I did yesterday and today. The code is very careful not to destroy things if the database is corrupt, i.e. it makes sure that it does not

Re: [Rpm-maint] Rpm Database musings

2013-04-02 Thread Michael Schroeder
On Thu, Mar 14, 2013 at 03:33:44PM +0200, Panu Matilainen wrote: I think strings are fine, just thought to note that there are those couple of non-string indexes which we need to do something about. Sigmd5 is probably better just axed, Installtid we might want to keep but that can just as

Re: [Rpm-maint] Rpm Database musings

2013-04-02 Thread Panu Matilainen
On 04/02/2013 05:17 PM, Michael Schroeder wrote: On Thu, Mar 14, 2013 at 03:33:44PM +0200, Panu Matilainen wrote: I think strings are fine, just thought to note that there are those couple of non-string indexes which we need to do something about. Sigmd5 is probably better just axed, Installtid

Re: [Rpm-maint] Rpm Database musings

2013-03-27 Thread Michael Schroeder
On Thu, Mar 14, 2013 at 03:33:44PM +0200, Panu Matilainen wrote: What I've had in mind is lumping all the index stuff (possibly along with actual data for the critical parts) into a single file so there'd be just two db-related files to worry about. But for now, I'm just happy to have

Re: [Rpm-maint] Rpm Database musings

2013-03-16 Thread Panu Matilainen
On 03/14/2013 05:45 PM, Michael Schroeder wrote: On Thu, Mar 14, 2013 at 03:33:44PM +0200, Panu Matilainen wrote: On 03/14/2013 01:10 PM, Michael Schroeder wrote: On Thu, Mar 14, 2013 at 10:55:07AM +0200, Panu Matilainen wrote: Yup, detecting and automatically regenerating out-of-sync indexes

Re: [Rpm-maint] Rpm Database musings

2013-03-14 Thread Panu Matilainen
On 03/13/2013 03:19 PM, Michael Schroeder wrote: On Fri, Mar 08, 2013 at 03:37:12PM +0100, Michael Schroeder wrote: I kind of like to have all the data in one file. Anyway, attached is a little Packages database implementation I did yesterday and today. Attached is the current version of my

Re: [Rpm-maint] Rpm Database musings

2013-03-14 Thread Michael Schroeder
On Thu, Mar 14, 2013 at 10:55:07AM +0200, Panu Matilainen wrote: Yup, detecting and automatically regenerating out-of-sync indexes is pretty much a must (yet something we currently don't have either, sigh) Some other issues in the current implementation AFAICS: - The ability to grab all keys

Re: [Rpm-maint] Rpm Database musings

2013-03-14 Thread Panu Matilainen
On 03/14/2013 01:10 PM, Michael Schroeder wrote: On Thu, Mar 14, 2013 at 10:55:07AM +0200, Panu Matilainen wrote: Yup, detecting and automatically regenerating out-of-sync indexes is pretty much a must (yet something we currently don't have either, sigh) Some other issues in the current

Re: [Rpm-maint] Rpm Database musings

2013-03-14 Thread Michael Schroeder
On Thu, Mar 14, 2013 at 03:33:44PM +0200, Panu Matilainen wrote: On 03/14/2013 01:10 PM, Michael Schroeder wrote: On Thu, Mar 14, 2013 at 10:55:07AM +0200, Panu Matilainen wrote: Yup, detecting and automatically regenerating out-of-sync indexes is pretty much a must (yet something we currently

Re: [Rpm-maint] Rpm Database musings

2013-03-13 Thread Michael Schroeder
On Fri, Mar 08, 2013 at 03:37:12PM +0100, Michael Schroeder wrote: I kind of like to have all the data in one file. Anyway, attached is a little Packages database implementation I did yesterday and today. Attached is the current version of my little experiments. The main changes are: - I

Re: [Rpm-maint] Rpm Database musings

2013-03-11 Thread Michael Schroeder
On Fri, Mar 08, 2013 at 09:21:33PM +0200, Panu Matilainen wrote: It has its advantages of course. Having headers spread in different files would probably make some things easier but also slower, so you'd really want to avoid having to go to the headers. I did a quick test-case in python

Re: [Rpm-maint] Rpm Database musings

2013-03-09 Thread Thierry Vignaud
On 7 March 2013 21:28, Panu Matilainen pmati...@laiskiainen.org wrote: I wouldn't worry too much about hash algorithms and storage optimization at this point: that's something that can be tweaked and tuned over time as long as the cache structure is internally versioned so we know when we need

Re: [Rpm-maint] Rpm Database musings

2013-03-05 Thread Michael Schroeder
On Mon, Mar 04, 2013 at 12:22:31PM +0100, Michael Schroeder wrote: For 2000 packages we have about... ugh, that's actually hard to tell as the avg and the median differ that much. Let's use the average: 2000 * 130 = 260000 files. I would hash them using just a 32-bit number for each hash

Re: [Rpm-maint] Rpm Database musings

2013-03-04 Thread Jan Zeleny
On Sun, 3 March 2013 at 17:46:10, Panu Matilainen wrote: On 03/01/2013 06:32 PM, Michael Schroeder wrote: Hi Panu et al, here are some numbers/musings about changing the database implementation to just one single packages file: - I assume that we still want to store all the

Re: [Rpm-maint] Rpm Database musings

2013-03-04 Thread Michael Schroeder
On Sun, Mar 03, 2013 at 05:46:10PM +0200, Panu Matilainen wrote: Right, in this context compression does indeed seem quite attractive. When we talked about this in the devconf, I was thinking about the way rpm itself currently keeps (re)loading the headers from Packages and adding repeated

Re: [Rpm-maint] Rpm Database musings

2013-03-04 Thread Michael Schroeder
On Mon, Mar 04, 2013 at 12:19:34PM +0100, Ales Kozumplik wrote: On 03/04/2013 11:21 AM, Michael Schroeder wrote: Actually libsolv can do an incremental update if it has an old solv file available, i.e. it takes the unchanged content from the old solv file and only queries new headers from the

Re: [Rpm-maint] Rpm Database musings

2013-03-04 Thread Florian Festi
On 03/01/2013 05:32 PM, Michael Schroeder wrote: (the median is quite different from the avg, that means that some packages are quite big.) ... - That means, if I have 2000 packages installed on my system (which is about the real number), the concatenated headers will use 20 MByte

Re: [Rpm-maint] Rpm Database musings

2013-03-04 Thread Michael Schroeder
On Mon, Mar 04, 2013 at 03:12:51PM +0100, Florian Festi wrote: On 03/01/2013 05:32 PM, Michael Schroeder wrote: (the median is quite different from the avg, that means that some packages are quite big.) ... - That means, if I have 2000 packages installed on my system (which is

Re: [Rpm-maint] Rpm Database musings

2013-03-04 Thread Panu Matilainen
On 03/04/2013 12:21 PM, Michael Schroeder wrote: On Sun, Mar 03, 2013 at 05:46:10PM +0200, Panu Matilainen wrote: Right, in this context compression does indeed seem quite attractive. When we talked about this in the devconf, I was thinking about the way rpm itself currently keeps (re)loading

Re: [Rpm-maint] Rpm Database musings

2013-03-04 Thread Panu Matilainen
On 03/04/2013 01:23 PM, Michael Schroeder wrote: On Mon, Mar 04, 2013 at 12:19:34PM +0100, Ales Kozumplik wrote: On 03/04/2013 11:21 AM, Michael Schroeder wrote: Actually libsolv can do an incremental update if it has an old solv file available, i.e. it takes the unchanged content from the old

[Rpm-maint] Rpm Database musings

2013-03-01 Thread Michael Schroeder
Hi Panu et al, here are some numbers/musings about changing the database implementation to just one single packages file: - I assume that we still want to store all the headers (in some format) anyway. - I checked all the headers of the i586/noarch packages from FC18 to get some