Bug#372712: apt: periodically roll up pdiffs

2006-07-27 Thread Matt Taggart
I had an idea similar to the one Andrea Mennucc mentions in #372712 for the problem of
so many pdiffs. It resembles a scheme you might use for nightly
incremental backups: you might run a level-0 backup once a month, a level-1
backup every 15 days, a level-2 every 7, a level-3 every 3 and a level-4 every
day. For example:

      July 2006                Aug 2006
 Su Mo Tu We Th Fr Sa     Su Mo Tu We Th Fr Sa
                    0            0  4  4  3  2
  4  4  3  4  4  3  2      4  3  4  4  3  4  2
  4  4  3  4  4  3  2      3  4  4  1  4  4  2
  1  3  4  4  3  4  2      4  4  3  4  4  3  2
  3  4  4  3  4  4  2      4  3  4  4  1
  4  1


On any given day you'd need at most 5 patches, and on many days far fewer
than that. The reason for doing this is not just to reduce the number of
files but also the overall amount of data, as much of the data in the diffs
is redundant. Consider the case of a package that is updated every day for
a month. Under the current scheme, a client that does not update for that
month would need to download the differences for that package 30 times,
right? Under an incremental scheme the worst case is 5 diffs for that
package. It's an even bigger win over longer periods of time; the current
scheme will really start falling down once we get a few more months of pdiffs.

Thanks,

-- 
Matt Taggart
[EMAIL PROTECTED]




-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



Bug#372712: apt: periodically roll up pdiffs

2006-07-27 Thread Goswin von Brederlow
Matt Taggart [EMAIL PROTECTED] writes:

 I had an idea similar to the one Andrea Mennucc mentions in #372712 for the problem of
 so many pdiffs. It resembles a scheme you might use for nightly
 incremental backups: you might run a level-0 backup once a month, a level-1
 backup every 15 days, a level-2 every 7, a level-3 every 3 and a level-4 every
 day. For example:

       July 2006                Aug 2006
  Su Mo Tu We Th Fr Sa     Su Mo Tu We Th Fr Sa
                     0            0  4  4  3  2
   4  4  3  4  4  3  2      4  3  4  4  3  4  2
   4  4  3  4  4  3  2      3  4  4  1  4  4  2
   1  3  4  4  3  4  2      4  4  3  4  4  3  2
   3  4  4  3  4  4  2      4  3  4  4  1
   4  1


 On any given day you'd need at most 5 patches, and on many days far fewer
 than that. The reason for doing this is not just to reduce the number of
 files but also the overall amount of data, as much of the data in the diffs
 is redundant. Consider the case of a package that is updated every day for
 a month. Under the current scheme, a client that does not update for that
 month would need to download the differences for that package 30 times,
 right? Under an incremental scheme the worst case is 5 diffs for that
 package. It's an even bigger win over longer periods of time; the current
 scheme will really start falling down once we get a few more months of pdiffs.

 Thanks,

But then again, why have incremental diffs at all?

Two patches can be merged by taking a file with enough unique lines,
applying both patches and diffing again. There is no need to work off
the actual Packages file; the old versions don't have to be stored for this.
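The merge trick can be sketched in Python as below, assuming pdiffs are plain ed scripts made of `N[,M]a`, `N[,M]c` and `N[,M]d` commands with the highest line numbers first, as `diff -e` writes them. The helpers (`apply_ed`, `ed_diff`, `merge_patches`) and the rule for sizing the dummy file are hypothetical illustrations, not apt, rred or dak code, and `ed_diff` uses Python's `difflib` in place of GNU diff. It works because the dummy lines are unique and never appear in patch text: re-diffing recovers exactly which line positions survived both patches, and since ed commands only address line numbers, the result applies to the real Packages file.

```python
import difflib
import re

# Command headers as emitted by `diff -e`: "N[,M]a", "N[,M]c" or "N[,M]d".
CMD = re.compile(r"^(\d+)(?:,(\d+))?([acd])$")


def parse_ed(script):
    """Split an ed script into (start, end, op, text) tuples."""
    lines, cmds, i = script.splitlines(), [], 0
    while i < len(lines):
        m = CMD.match(lines[i])
        if not m:
            raise ValueError(f"unsupported ed command: {lines[i]!r}")
        start = int(m.group(1))
        end = int(m.group(2) or start)
        op, text = m.group(3), []
        i += 1
        if op in "ac":
            # Replacement text runs until a line holding a single ".".
            while lines[i] != ".":
                text.append(lines[i])
                i += 1
            i += 1
        cmds.append((start, end, op, text))
    return cmds


def apply_ed(lines, script):
    """Apply an ed script to a list of lines, one command at a time."""
    out = list(lines)
    for start, end, op, text in parse_ed(script):
        if op == "a":
            out[start:start] = text      # insert after line `start`
        elif op == "c":
            out[start - 1:end] = text    # replace lines start..end
        else:
            del out[start - 1:end]       # delete lines start..end
    return out


def ed_diff(old, new):
    """Ed script turning `old` into `new`, highest line numbers first
    (like `diff -e`) so each command's numbers are still valid when it runs."""
    sm = difflib.SequenceMatcher(None, old, new, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in reversed(sm.get_opcodes()):
        rng = str(i1 + 1) if i2 - i1 == 1 else f"{i1 + 1},{i2}"
        if tag == "insert":
            out += [f"{i1}a", *new[j1:j2], "."]
        elif tag == "replace":
            out += [f"{rng}c", *new[j1:j2], "."]
        elif tag == "delete":
            out.append(f"{rng}d")
    return "".join(line + "\n" for line in out)


def max_line(script):
    """Highest line number an ed script addresses."""
    return max((end for _, end, _, _ in parse_ed(script)), default=0)


def merge_patches(p1, p2):
    """Merge two consecutive patches without the real Packages file:
    apply both to a file of unique dummy lines, then diff again."""
    # Hypothetical sizing rule: enough lines for both patches to address.
    n = max_line(p1) + max_line(p2) + 1
    # Unique, NUL-prefixed dummy lines that cannot clash with patch text.
    base = [f"\0dummy {k}" for k in range(n)]
    return ed_diff(base, apply_ed(apply_ed(base, p1), p2))
```

Folding a series of patches, e.g. `functools.reduce(merge_patches, patches)` for a hypothetical list `patches`, gives one combined patch per starting point. A line consisting of a single `.` cannot be expressed in this ed subset, which is not a concern for Packages files.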

It is true that every day all the combined patch files will grow (minus
the savings from packages with multiple updates in that time), but they
aren't very big, and compression works better on larger files.


Given the crawling speed of the rred method, downloading more than a few
days' worth of patches (~300k) is slower than downloading the full file
(3MB), even on a slow DSL line. A combined patch would need only one
download, one gunzip and one rred run. I think that would be worth the
increase in space for the patch files.

I would recommend naming the combined patch files after the md5sum
(or sha1) of the Packages/Sources file they patch. That way no index
needs to be downloaded.
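A minimal sketch of the client side of that naming scheme, in Python: it hashes the local, uncompressed index file and derives the name of the combined patch to fetch. The `<hexdigest>.gz` layout and `combined_patch_name` are hypothetical; only `hashlib` is a real API here.

```python
import hashlib


def combined_patch_name(packages_path, algo="sha1"):
    """Hypothetical name of the combined patch for this exact file:
    the hex digest of the local, uncompressed Packages/Sources file."""
    digest = hashlib.new(algo)  # "sha1" or "md5", as suggested above
    with open(packages_path, "rb") as f:
        # Hash in 64 KiB chunks so large index files need little memory.
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest() + ".gz"
```

A client would then request that single name from the mirror; if no such file exists, presumably it falls back to the full Packages file, so no Index file is needed in either case.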

Regards,
Goswin

---
Sizes of the combined patches (each one updates the Packages file of the given date to the current one):

-rw-r--r--  1 reprepro nogroup 26K Jul 27 13:55 comb.2006-07-26-1318.02.gz
-rw-r--r--  1 reprepro nogroup 54K Jul 27 13:55 comb.2006-07-25-1313.19.gz
-rw-r--r--  1 reprepro nogroup 90K Jul 27 13:55 comb.2006-07-24-1338.19.gz
-rw-r--r--  1 reprepro nogroup 132K Jul 27 13:55 comb.2006-07-24-0235.54.gz
-rw-r--r--  1 reprepro nogroup 170K Jul 27 13:55 comb.2006-07-22-1308.51.gz
-rw-r--r--  1 reprepro nogroup 186K Jul 27 13:55 comb.2006-07-21-1255.40.gz
-rw-r--r--  1 reprepro nogroup 206K Jul 27 13:55 comb.2006-07-20-1302.38.gz
-rw-r--r--  1 reprepro nogroup 226K Jul 27 13:56 comb.2006-07-19-1301.33.gz
-rw-r--r--  1 reprepro nogroup 246K Jul 27 13:56 comb.2006-07-18-1311.49.gz
-rw-r--r--  1 reprepro nogroup 289K Jul 27 13:56 comb.2006-07-17-1328.22.gz
-rw-r--r--  1 reprepro nogroup 332K Jul 27 13:56 comb.2006-07-16-2314.28.gz
-rw-r--r--  1 reprepro nogroup 351K Jul 27 13:57 comb.2006-07-15-1308.02.gz
-rw-r--r--  1 reprepro nogroup 370K Jul 27 13:57 comb.2006-07-14-1250.45.gz
-rw-r--r--  1 reprepro nogroup 392K Jul 27 13:57 comb.2006-07-13-1257.25.gz
-rw-r--r--  1 reprepro nogroup 424K Jul 27 13:57 comb.2006-07-12-1242.39.gz
-rw-r--r--  1 reprepro nogroup 443K Jul 27 13:58 comb.2006-07-11-1246.14.gz
-rw-r--r--  1 reprepro nogroup 462K Jul 27 13:58 comb.2006-07-10-1321.18.gz
-rw-r--r--  1 reprepro nogroup 495K Jul 27 13:58 comb.2006-07-10-0029.06.gz
-rw-r--r--  1 reprepro nogroup 538K Jul 27 13:59 comb.2006-07-08-1242.03.gz
-rw-r--r--  1 reprepro nogroup 547K Jul 27 13:59 comb.2006-07-07-1233.30.gz

