Re: These new diffs are great, but...

2006-08-24 Thread Florian Weimer
* Goswin von Brederlow:

 However, patching rred to apply patches in a single run would be a
 good start because all further optimizations will need it.

 Why should the number of chunks matter?

If you use the naïve algorithm, it does.  But rred implements
something more involved, leading to an O(patch-files * lines + changes)
complexity (or something very similar).

 What matters is reading, parsing and writing the file O(lines) and
 then the number of changes (lines of changes) O(changes). Combined
 this gives O(lines + changes) if the file is read once at the start
 and then all patches are applied.

I guess you'll find it very hard to get down to O(lines + changes). 8-)

Either you need to make multiple passes over the original file (like
the current code does), or you need to combine patch files before
applying them.  The latter involves some kind of sorting, and unless
you postulate an upper bound on the number of lines in a file, you'll
end up with super-linear complexity.

 You can do that by combining the individual patch files or by teaching
 rred to do a single run with multiple patch files. Same result. Both
 solve the problem of O(lines * chunks + changes) complexity.

The real problem is that APT's architecture doesn't allow rred to look
at all patches at once.  At least that's what I've been told.

 As for using pointers to lines and shuffling them, that seems to be
 the only sane thing to do. All patch operations are line based so it
 is essential that a line can be found and replaced in O(1). A simple
 array of pointers to lines solves that.

Inserting in the middle of an array is not a constant-time operation.
That's why the naïve algorithm is slow.
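For illustration, a quick Python sketch of that cost (the sizes are made
up; Python lists store pointers, so this is exactly the pointer-array
case being discussed):

    lines = ["x\n"] * 1_000_000        # line-pointer array for a big file
    # A single naive hunk application: even though only pointers move,
    # inserting in the middle shifts ~500,000 of them -- O(lines), not O(1).
    lines.insert(500_000, "new line\n")
    # One such splice per hunk over many pdiffs gives the dreaded
    # O(lines * chunks) behaviour.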



Re: These new diffs are great, but...

2006-08-24 Thread Goswin von Brederlow
Florian Weimer [EMAIL PROTECTED] writes:

 * Goswin von Brederlow:

 However, patching rred to apply patches in a single run would be a
 good start because all further optimizations will need it.

 Why should the number of chunks matter?

 If you use the naïve algorithm, it does.  But rred implements
 something more involved, leading to an O(patch-files * lines + changes)
 complexity (or something very similar).

 What matters is reading, parsing and writing the file O(lines) and
 then the number of changes (lines of changes) O(changes). Combined
 this gives O(lines + changes) if the file is read once at the start
 and then all patches are applied.

 I guess you'll find it very hard to get down to O(lines + changes). 8-)

 Either you need to make multiple passes over the original file (like
 the current code does), or you need to combine patch files before
 applying them.  The latter involves some kind of sorting, and unless
 you postulate an upper bound on the number of lines in a file, you'll
 end up with super-linear complexity.

Each change in the patches is a single-line insert, remove or replace.
Thinking about it, I see that insert/remove is a problem if you have
an array of line pointers. But with a tree you can still
insert/remove/replace each line in O(log lines), or O(changes * log
lines) in total, which is better than O(lines * chunks).

Or you keep an array of line pointers and copy it for every chunk you
process. Copying 50 pointers on every chunk doesn't sound too slow. The
theoretical complexity would be O(lines * chunks), but the constant
factor should be way low.
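For illustration, a Python sketch of that pointer-copying pass (the hunk
representation here is made up: 0-based, sorted, non-overlapping
(start, end, replacement) triples):

    def apply_hunks_one_pass(lines, hunks):
        # Build the new pointer array in one sweep over the old one:
        # copy the untouched pointers, splice each hunk in as it is
        # reached. O(lines + changes) per pass; line contents never move.
        out = []
        pos = 0
        for start, end, replacement in hunks:
            out.extend(lines[pos:start])    # untouched region: pointers only
            out.extend(replacement)         # inserted / replacement lines
            pos = end                       # skip the removed region
        out.extend(lines[pos:])
        return out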

 You can do that by combining the individual patch files or by teaching
 rred to do a single run with multiple patch files. Same result. Both
 solve the problem of O(lines * chunks + changes) complexity.

 The real problem is that APT's architecture doesn't allow rred to look
 at all patches at once.  At least that's what I've been told.

True. The apt methods aren't designed to process multiple files at
once. You can't use something designed for gunzipping a single file to
combine multiple patches into a new Packages file. You have to design
something new.

 As for using pointers to lines and shuffling them, that seems to be
 the only sane thing to do. All patch operations are line based so it
 is essential that a line can be found and replaced in O(1). A simple
 array of pointers to lines solves that.

 Inserting in the middle of an array is not a constant-time operation.
 That's why the naïve algorithm is slow.

Yeah, I noticed that too now. As said above, use a tree [O(log n) per
operation] or copy the array [O(n) total per pass]. Copying a pointer
to a line is better than copying the line itself, since the hidden
constant is way lower.

MfG
Goswin





Re: These new diffs are great, but...

2006-08-23 Thread Goswin von Brederlow
Florian Weimer [EMAIL PROTECTED] writes:

 * Goswin von Brederlow:

 What code do you need there? If the rred method keeps the full Index
 file in memory during patching it can just be fed all the patches one
 after another and only write out the final result at the
 end. Combining the patches is a simple cat.

 #383881 suggests that I/O bandwidth is not the issue.  In fact, if you
 keep the file in memory and repeatedly patch it, you won't get away
 from the O(n*m) complexity (n being the file size, m the number of
 hunks in the patches), or whatever complexity it is.  Shuffling
 pointers instead of full lines only saves a constant factor, which
 might not be enough.

 However, patching rred to apply patches in a single run would be a
 good start because all further optimizations will need it.

Why should the number of chunks matter?

What matters is reading, parsing and writing the file O(lines) and
then the number of changes (lines of changes) O(changes). Combined
this gives O(lines + changes) if the file is read once at the start
and then all patches are applied.

You can do that by combining the individual patch files or by teaching
rred to do a single run with multiple patch files. Same result. Both
solve the problem of O(lines * chunks + changes) complexity.

As for using pointers to lines and shuffling them, that seems to be
the only sane thing to do. All patch operations are line based so it
is essential that a line can be found and replaced in O(1). A simple
array of pointers to lines solves that.
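For illustration, a Python sketch of such a single run (apply_patch is
a hypothetical stand-in for rred's per-patch hunk logic):

    def rred_single_run(packages_path, patch_paths, out_path, apply_patch):
        # Read and split the file once: an array of line "pointers".
        with open(packages_path) as f:
            lines = f.readlines()               # O(lines)
        # Feed every pdiff through the in-memory array, oldest first.
        for path in patch_paths:
            with open(path) as f:
                lines = apply_patch(lines, f.read())
        # Write the final result exactly once at the end.
        with open(out_path, "w") as f:
            f.writelines(lines)                 # O(lines)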

MfG
Goswin





Re: These new diffs are great, but...

2006-08-20 Thread Florian Weimer
* Goswin von Brederlow:

 What code do you need there? If the rred method keeps the full Index
 file in memory during patching it can just be fed all the patches one
 after another and only write out the final result at the
 end. Combining the patches is a simple cat.

#383881 suggests that I/O bandwidth is not the issue.  In fact, if you
keep the file in memory and repeatedly patch it, you won't get away
from the O(n*m) complexity (n being the file size, m the number of
hunks in the patches), or whatever complexity it is.  Shuffling
pointers instead of full lines only saves a constant factor, which
might not be enough.

However, patching rred to apply patches in a single run would be a
good start because all further optimizations will need it.





Re: These new diffs are great, but...

2006-07-18 Thread Paul Slootman
On Fri 30 Jun 2006, Martin Schulze wrote:
 
 You know that you can easily turn off this feature by adjusting apt.conf:
 
Acquire::Pdiffs { false; };

Ah, great :)

After not having done aptitude update for a month or so, after
downloading all the hundreds (!) of diffs, I got the following dreaded
message again:

E: Dynamic MMap ran out of room
E: Read error - read (14 Bad address)
E: The package lists or status file could not be parsed or opened.
E: Dynamic MMap ran out of room
E: Read error - read (14 Bad address)
E: The package lists or status file could not be parsed or opened.

I thought that this was something from bygone days... pretty dismal,
having such a fixed limit on an amd64 with 2GB of memory.
After adding APT::Cache-Limit 2000; in /etc/apt/apt.conf
(the old example of 8MB wasn't enough) it proceeded, and now I don't
need that anymore. The high memory need was apparently related to
processing all those diffs.


Paul Slootman





Re: These new diffs are great, but...

2006-07-07 Thread Marc Haber
On Fri, 30 Jun 2006 11:10:37 +0200, Eduard Bloch [EMAIL PROTECTED] wrote:
I have doubts, have you measured the real difference?

Yes, I have. Test was done in a sid chroot that hasn't been updated
for like two weeks.

|$ time sudo aptitude update
|Reading package lists... Done
|Building dependency tree... Done
|Reading extended state information
|Initializing package states... Done
|Building tag database... Done
|Ign http://zg.debian.zugschlus.de zg/sid Release.gpg
|Ign http://zg.debian.zugschlus.de zg/sid Release
|Ign http://zg.debian.zugschlus.de zg/sid/main Packages/DiffIndex
|Ign http://zg.debian.zugschlus.de zg/sid/contrib Packages/DiffIndex
|Get:1 http://zg.debian.zugschlus.de zg/sid/main Packages [7506B]
|Hit http://zg.debian.zugschlus.de zg/sid/contrib Packages
|Get:2 http://debian.debian.zugschlus.de sid Release.gpg [189B]
|Get:3 http://debian.debian.zugschlus.de sid Release [38.3kB]
|Ign http://debian.debian.zugschlus.de sid Release
|Get:4 http://debian.debian.zugschlus.de sid/main Packages/DiffIndex [12.6kB]
|Get:5 http://debian.debian.zugschlus.de sid/contrib Packages/DiffIndex [12.5kB]
|Get:6 http://debian.debian.zugschlus.de sid/main Sources/DiffIndex [12.5kB]
|Get:7 http://debian.debian.zugschlus.de sid/contrib Sources/DiffIndex [12.5kB]
|Get:8 2006-06-22-1336.37.pdiff [16.0kB]
|Get:9 2006-06-22-1336.37.pdiff [16.0kB]
|Get:10 2006-06-22-1336.37.pdiff [180B]
|Get:11 2006-06-22-1336.37.pdiff [9491B]
|Get:12 2006-06-22-1336.37.pdiff [180B]
|Get:13 2006-06-22-1336.37.pdiff [16.0kB]
|Get:14 2006-06-22-1336.37.pdiff [172B]
|Get:15 2006-06-22-1336.37.pdiff [9491B]
|Get:16 2006-06-22-1336.37.pdiff [172B]
|Get:17 2006-06-22-1336.37.pdiff [180B]
|Get:18 2006-06-23-1348.02.pdiff [25.9kB]
|Get:19 2006-06-22-1336.37.pdiff [9491B]
|Get:20 2006-06-23-1348.02.pdiff [25.9kB]
|Get:21 2006-06-23-1348.02.pdiff [329B]
|Get:22 2006-06-23-1348.02.pdiff [329B]
|Get:23 2006-06-22-1336.37.pdiff [172B]
|Get:24 2006-06-23-1348.02.pdiff [25.9kB]
|Get:25 2006-06-23-1348.02.pdiff [13.9kB]
|Get:26 2006-06-23-1348.02.pdiff [13.9kB]
|Get:27 2006-06-23-1348.02.pdiff [139B]
|Get:28 2006-06-23-1348.02.pdiff [139B]
|Get:29 2006-06-23-1348.02.pdiff [329B]
|Get:30 2006-06-24-1338.23.pdiff [39.6kB]
|Get:31 2006-06-23-1348.02.pdiff [13.9kB]
|Get:32 2006-06-24-1338.23.pdiff [39.6kB]
|Get:33 2006-06-24-1338.23.pdiff [155B]
|Get:34 2006-06-24-1338.23.pdiff [155B]
|Get:35 2006-06-23-1348.02.pdiff [139B]
|Get:36 2006-06-24-1338.23.pdiff [18.9kB]
|Get:37 2006-06-24-1338.23.pdiff [39.6kB]
|Get:38 2006-06-24-1338.23.pdiff [18.9kB]
|Get:39 2006-06-24-1338.23.pdiff [165B]
|Get:40 2006-06-24-1338.23.pdiff [165B]
|Get:41 2006-06-24-1338.23.pdiff [155B]
|Get:42 2006-06-25-1342.37.pdiff [20.3kB]
|Get:43 2006-06-24-1338.23.pdiff [18.9kB]
|Get:44 2006-06-25-1342.37.pdiff [20.3kB]
|Get:45 2006-06-25-1342.37.pdiff [205B]
|Get:46 2006-06-25-1342.37.pdiff [205B]
|Get:47 2006-06-24-1338.23.pdiff [165B]
|Get:48 2006-06-25-1342.37.pdiff [20.3kB]
|Get:49 2006-06-25-1342.37.pdiff [12.0kB]
|Get:50 2006-06-25-1342.37.pdiff [12.0kB]
|Get:51 2006-06-25-1342.37.pdiff [147B]
|Get:52 2006-06-25-1342.37.pdiff [147B]
|Get:53 2006-06-25-1342.37.pdiff [205B]
|Get:54 2006-06-25-1342.37.pdiff [12.0kB]
|Get:55 2006-06-26-1346.13.pdiff [24.6kB]
|Get:56 2006-06-26-1346.13.pdiff [24.6kB]
|Get:57 2006-06-26-1346.13.pdiff [151B]
|Get:58 2006-06-26-1346.13.pdiff [151B]
|Get:59 2006-06-25-1342.37.pdiff [147B]
|Get:60 2006-06-26-1346.13.pdiff [24.6kB]
|Get:61 2006-06-26-1346.13.pdiff [11.5kB]
|Get:62 2006-06-26-1346.13.pdiff [11.5kB]
|Get:63 2006-06-26-1346.13.pdiff [207B]
|Get:64 2006-06-26-1346.13.pdiff [207B]
|Get:65 2006-06-26-1346.13.pdiff [151B]
|Get:66 2006-06-26-1346.13.pdiff [11.5kB]
|Get:67 2006-06-27-1308.52.pdiff [1269kB]
|Get:68 2006-06-27-1308.52.pdiff [1269kB]
|Get:69 2006-06-27-1308.52.pdiff [14.7kB]
|Get:70 2006-06-27-1308.52.pdiff [14.7kB]
|Get:71 2006-06-26-1346.13.pdiff [207B]
|Get:72 2006-06-27-1308.52.pdiff [13.0kB]
|Get:73 2006-06-27-1308.52.pdiff [1269kB]
|Get:74 2006-06-27-1308.52.pdiff [13.0kB]
|Get:75 2006-06-27-1308.52.pdiff [29B]
|Get:76 2006-06-27-1308.52.pdiff [29B]
|Get:77 2006-06-27-1308.52.pdiff [14.7kB]
|Get:78 2006-06-28-1245.43.pdiff [25.3kB]
|Get:79 2006-06-27-1308.52.pdiff [13.0kB]
|Get:80 2006-06-28-1245.43.pdiff [25.3kB]
|Get:81 2006-06-28-1245.43.pdiff [391B]
|Get:82 2006-06-28-1245.43.pdiff [391B]
|Get:83 2006-06-27-1308.52.pdiff [29B]
|Get:84 2006-06-28-1245.43.pdiff [25.3kB]
|Get:85 2006-06-28-1245.43.pdiff [7672B]
|Get:86 2006-06-28-1245.43.pdiff [7672B]
|Get:87 2006-06-28-1245.43.pdiff [146B]
|Get:88 2006-06-28-1245.43.pdiff [146B]
|Get:89 2006-06-28-1245.43.pdiff [391B]
|Get:90 2006-06-29-1245.21.pdiff [39.1kB]
|Get:91 2006-06-28-1245.43.pdiff [7672B]
|Get:92 2006-06-29-1245.21.pdiff [39.1kB]
|Get:93 2006-06-29-1245.21.pdiff [818B]
|Get:94 2006-06-29-1245.21.pdiff [818B]
|Get:95 2006-06-28-1245.43.pdiff [146B]
|Get:96 2006-06-29-1245.21.pdiff [11.9kB]
|Get:97 2006-06-29-1245.21.pdiff [39.1kB]
|Get:98 2006-06-29-1245.21.pdiff 

Re: These new diffs are great, but...

2006-07-07 Thread Ralf Hildebrandt
* Marc Haber [EMAIL PROTECTED]:

 Yes, I have. Test was done in a sid chroot that hasn't been updated
 for like two weeks.
...
 Updating with pdiffs took one minute nine seconds while downloading a
 completely new set of list files took eight seconds.
 
 Test environment was quite unfair though (an old machine with an 1200
 MHz CPU and a single, slow disk on an 100 MBit link to a rather local
 mirror).

There should be a limit on the number of diffs that will be downloaded
before the whole list is used instead.

-- 
Ralf Hildebrandt (i.A. des IT-Zentrums) [EMAIL PROTECTED]
Charite - Universitätsmedizin BerlinTel.  +49 (0)30-450 570-155
Gemeinsame Einrichtung von FU- und HU-BerlinFax.  +49 (0)30-450 570-962
IT-Zentrum Standort CBF send no mail to [EMAIL PROTECTED]





Re: These new diffs are great, but...

2006-07-07 Thread Goswin von Brederlow
Florian Weimer [EMAIL PROTECTED] writes:

 * Marc Haber:

 The machine in question is a P3 with 1200 MHz. What's making the
 process slow is the turnaround time for the http requests, as observed
 multiple times in this thread alone.

 Then your setup is very broken.  APT performs HTTP pipelining.

Actually it does NOT, from what strace shows me. The apt http method
uses keep-alive but not pipelining. For example, apt-get source bash
will send a GET request, read the file, send the next GET, read the
file, send the third GET, read that file. With pipelining it should
send all 3 GETs at once, or at least intermixed with reading the files.
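For illustration, a Python sketch of the difference (host and paths are
placeholders): with pipelining every request goes out before any
response is read, while with plain keep-alive each response is read in
full before the next GET is sent.

    import socket

    def pipelined_get(host, paths):
        s = socket.create_connection((host, 80))
        for i, path in enumerate(paths):
            req = "GET %s HTTP/1.1\r\nHost: %s\r\n" % (path, host)
            if i == len(paths) - 1:
                req += "Connection: close\r\n"  # server closes after last reply
            s.sendall((req + "\r\n").encode())  # no read between requests
        chunks = []
        while True:
            chunk = s.recv(4096)                # responses arrive in order
            if not chunk:
                break
            chunks.append(chunk)
        s.close()
        return b"".join(chunks)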

But even with pipelining that would not help, since the pdiff files are
not queued up with the http method in advance but handed to it one
after the other.

 On my machines, I see the behavior Miles described: lots of disk I/O.
 Obviously, APT reconstructs every intermediate version of the packages
 file.

Yes, I noticed that too. Patching a 15MB Packages file takes a lot of
time. You can watch the progress during rred runs most of the time
even on a modern amd64 system.

 The fix is to combine the diffs before applying them, so that you only
 need to process the large Packages file once.  I happen to have ML
 code which does this (including the conversion to a patch
 representation which is more amenable to this kind of optimization)
 and would be willing to port it to C++, but someone else would need to
 deal with the APT integration because I'm not familiar with its
 architecture.

What code do you need there? If the rred method keeps the full Index
file in memory during patching it can just be fed all the patches one
after another and only write out the final result at the
end. Combining the patches is a simple cat.

MfG
Goswin





Re: These new diffs are great, but...

2006-07-07 Thread George Danchev
On Friday 07 July 2006 15:36, Goswin von Brederlow wrote:
 Florian Weimer [EMAIL PROTECTED] writes:
  * Marc Haber:
  The machine in question is a P3 with 1200 MHz. What's making the
  process slow is the turnaround time for the http requests, as observed
  multiple times in this thread alone.
 
  Then your setup is very broken.  APT performs HTTP pipelining.

 Actually it does NOT, from what strace shows me. The apt http method
 uses keep-alive but not pipelining. For example, apt-get source bash
 will send a GET request, read the file, send the next GET, read the
 file, send the third GET, read that file. With pipelining it should
 send all 3 GETs at once, or at least intermixed with reading the files.

Well, pipelining also depends on the remote httpd server (AFAIK thttpd
does not support it), and apt's http method is smart enough to shut
down pipelining when it talks to HTTP/1.0-only capable servers, to
speed things up.

-- 
pub 4096R/0E4BD0AB 2003-03-18 people.fccf.net/danchev/key pgp.mit.edu
fingerprint 1AE7 7C66 0A26 5BFF DF22 5D55 1C57 0C89 0E4B D0AB 





Re: These new diffs are great, but...

2006-07-07 Thread Martijn van Oosterhout

On 7/7/06, Goswin von Brederlow [EMAIL PROTECTED] wrote:

What code do you need there? If the rred method keeps the full Index
file in memory during patching it can just be fed all the patches one
after another and only write out the final result at the
end. Combining the patches is a simple cat.


As far as I can see from the code, it reads the input file and the
patch with fgets() and writes the output file with fputs(). Since the
diffs in the file are in reverse order, it first reads one ed command
and recurses, building up on the stack the file offsets of all the
patches. As it unwinds, it scans forward through the data file once to
apply the patches.

Not a terribly bad algorithm as such, but it's got quite a bit of disk
overhead if the individual files are on disk. It would appear that the
algorithm would allow itself to stream output from one patch applier
to another, but it would seem to be easiest to simply combine the
diffs into one large diff. Techniques for combining diffs are not new,
I imagine someone just needs to code it...
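For illustration, a Python sketch of applying one such ed-style script
in memory (a simplification of what rred does in C; the commands in a
pdiff are ordered bottom-up, so in-place splicing never invalidates a
later, smaller address):

    import re

    ED_CMD = re.compile(r"^(\d+)(?:,(\d+))?([acd])$")

    def apply_ed_script(lines, script):
        it = iter(script.splitlines(True))
        for cmd in it:
            m = ED_CMD.match(cmd.strip())
            if not m:
                raise ValueError("bad ed command: %r" % cmd)
            start = int(m.group(1))
            end = int(m.group(2) or m.group(1))
            op = m.group(3)
            text = []
            if op in "ac":                   # payload ends at a lone "."
                for line in it:
                    if line.rstrip("\n") == ".":
                        break
                    text.append(line)
            if op == "a":
                lines[start:start] = text    # append after line `start`
            elif op == "c":
                lines[start - 1:end] = text  # replace lines start..end
            else:
                lines[start - 1:end] = []    # "d": delete lines start..end
        return lines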

Hope this helps,
--
Martijn van Oosterhout [EMAIL PROTECTED] http://svana.org/kleptog/





Re: These new diffs are great, but...

2006-07-07 Thread Adam Borowski
On Fri, Jul 07, 2006 at 02:28:49PM +0200, Marc Haber wrote:
 Updating with pdiffs took one minute nine seconds while downloading a
 completely new set of list files took eight seconds.
 
 Test environment was quite unfair though (an old machine with an 1200
 MHz CPU and a single, slow disk on an 100 MBit link to a rather local
 mirror).

1200MHz is slow these days?  I ran this very test on Wednesday night,
on a super-duper machine overclocked to a whopping 120MHz.

But even though it was only ~10 diffs (installed from Sunday's d-i
daily), I have to concur with your results :p

And the installation isn't complete yet...

-- 
1KB // Microsoft corollary to Hanlon's razor:
//  Never attribute to stupidity what can be
//  adequately explained by malice.





Re: These new diffs are great, but...

2006-07-02 Thread Marc Haber
On Fri, 30 Jun 2006 18:29:40 +0200, Florian Weimer [EMAIL PROTECTED]
wrote:
* Marc Haber:
 The machine in question is a P3 with 1200 MHz. What's making the
 process slow is the turnaround time for the http requests, as observed
 multiple times in this thread alone.

Then your setup is very broken.

Then it is broken by default. I didn't touch any of apt's
configuration.

Greetings
Marc

-- 
-- !! No courtesy copies, please !! -
Marc Haber |Questions are the | Mailadresse im Header
Mannheim, Germany  | Beginning of Wisdom  | http://www.zugschlus.de/
Nordisch by Nature | Lt. Worf, TNG Rightful Heir | Fon: *49 621 72739834



Re: These new diffs are great, but...

2006-07-02 Thread Martijn van Oosterhout

On 6/30/06, Florian Weimer [EMAIL PROTECTED] wrote:

* Marc Haber:

 The machine in question is a P3 with 1200 MHz. What's making the
 process slow is the turnaround time for the http requests, as observed
 multiple times in this thread alone.

Then your setup is very broken.  APT performs HTTP pipelining.


Judging by the way I saw it run, it looked like it was pipelining but
only maybe five at a time. Maybe some of the mirrors restrict the
number of pipelined requests? Is there a way of detecting such a
situation?

Have a nice day,
--
Martijn van Oosterhout [EMAIL PROTECTED] http://svana.org/kleptog/





Re: These new diffs are great, but...

2006-07-02 Thread Anthony Towns
On Fri, Jun 30, 2006 at 05:38:31PM +0200, Steinar H. Gunderson wrote:
 On Fri, Jun 30, 2006 at 04:55:58PM +0200, Martin Schulze wrote:
  You know that you can easily turn off this feature by adjusting apt.conf:
 Sure, and I've done so for several of my machines now. Actually, for
 enough machines that it's becoming bothersome...

Are you running a public mirror, or one specifically for your local
network?  Have you tried just not mirroring the Packages.diff and
Sources.diff directories?

Cheers,
aj





Re: These new diffs are great, but...

2006-06-30 Thread Marc Haber
On Thu, 29 Jun 2006 11:43:45 -0700, Tyler MacDonald [EMAIL PROTECTED]
wrote:
Steinar H. Gunderson [EMAIL PROTECTED] wrote:
 I usually notice the difference -- the other way. aptitude update on a
 machine that hasn't been updated in a while suddenly takes minutes instead of
 seconds...

   Yes, this is what I'm getting at. :-) Should this be considered a
bug in apt-get (file:// urls should never use diffs)?

file:// URLs are not the only issue here - aptitude update is also
much slower than before on a hosted box which has 100 Mbit/s
connectivity and could load the Packages.gz in, like, two seconds.

Greetings
Marc

-- 
-- !! No courtesy copies, please !! -
Marc Haber |Questions are the | Mailadresse im Header
Mannheim, Germany  | Beginning of Wisdom  | http://www.zugschlus.de/
Nordisch by Nature | Lt. Worf, TNG Rightful Heir | Fon: *49 621 72739834



Re: These new diffs are great, but...

2006-06-30 Thread Joey Hess
Miles Bader wrote:
 Yeah I noticed this too -- some .pdiff files appeared to be downloaded
 dozens of times!

It prints the same pdiff filenames when downloading files with the same
basename from different paths.

Just to confuse things, it does print out each separate pdiff file 3 times,
although my squid logs show it downloads each exactly once. My guess w/o
reading the code is that one represents the download, one the extraction, and
one the application of the diff.

Get:15 2006-06-29-1245.21.pdiff [39.1kB]   
Get:16 2006-06-29-1245.21.pdiff [39.1kB]   
Get:17 2006-06-29-1245.21.pdiff [39.1kB]   
Get:18 2006-06-29-1245.21.pdiff [818B] 
Get:19 2006-06-29-1245.21.pdiff [818B] 
Get:20 2006-06-29-1245.21.pdiff [330B] 
Get:21 2006-06-29-1245.21.pdiff [330B] 
Get:22 2006-06-29-1245.21.pdiff [11.9kB]   
Get:23 2006-06-29-1245.21.pdiff [818B] 
Get:24 2006-06-29-1245.21.pdiff [330B] 
Get:25 2006-06-29-1245.21.pdiff [249B] 
Get:26 2006-06-29-1245.21.pdiff [11.9kB]   
Get:27 2006-06-29-1245.21.pdiff [249B] 
Get:28 2006-06-29-1245.21.pdiff [11.9kB]   
Get:29 2006-06-29-1245.21.pdiff [249B] 
Get:30 2006-06-29-1245.21.pdiff [4801B]
Get:31 2006-06-29-1245.21.pdiff [4801B]
Get:32 2006-06-29-1245.21.pdiff [4801B]

1151639470.951   4248 192.168.1.7 TCP_REFRESH_MISS/200 12975 GET 
http://archive.progeny.com/debian/dists/unstable/main/binary-i386/Packages.diff/Index
 - DIRECT/216.37.55.114 text/plain
1151639474.134   3183 192.168.1.7 TCP_REFRESH_MISS/200 12883 GET 
http://archive.progeny.com/debian/dists/unstable/contrib/binary-i386/Packages.diff/Index
 - DIRECT/216.37.55.114 text/plain
1151639477.004   2871 192.168.1.7 TCP_REFRESH_MISS/200 12884 GET 
http://archive.progeny.com/debian/dists/unstable/non-free/binary-i386/Packages.diff/Index
 - DIRECT/216.37.55.114 text/plain
1151639479.933   2929 192.168.1.7 TCP_REFRESH_MISS/200 12882 GET 
http://archive.progeny.com/debian/dists/unstable/main/source/Sources.diff/Index 
- DIRECT/216.37.55.114 text/plain
1151639480.474541 192.168.1.7 TCP_REFRESH_HIT/304 250 GET 
http://archive.progeny.com/debian/dists/unstable/contrib/source/Sources.diff/Index
 - DIRECT/216.37.55.114 -
1151639483.453   2980 192.168.1.7 TCP_REFRESH_MISS/200 12884 GET 
http://archive.progeny.com/debian/dists/unstable/non-free/source/Sources.diff/Index
 - DIRECT/216.37.55.114 text/plain
1151639486.507   3053 192.168.1.7 TCP_REFRESH_MISS/200 12884 GET 
http://archive.progeny.com/debian/dists/experimental/main/binary-i386/Packages.diff/Index
 - DIRECT/216.37.55.114 text/plain
1151639486.886379 192.168.1.7 TCP_REFRESH_HIT/304 249 GET 
http://archive.progeny.com/debian/dists/experimental/contrib/binary-i386/Packages.diff/Index
 - DIRECT/216.37.55.114 -
1151639487.205319 192.168.1.7 TCP_REFRESH_HIT/304 250 GET 
http://archive.progeny.com/debian/dists/experimental/non-free/binary-i386/Packages.diff/Index
 - DIRECT/216.37.55.114 -
1151639500.962  13757 192.168.1.7 TCP_MISS/200 39456 GET 
http://archive.progeny.com/debian/dists/unstable/main/binary-i386/Packages.diff/2006-06-29-1245.21.gz
 - DIRECT/216.37.55.114 text/plain
1151639501.693731 192.168.1.7 TCP_MISS/200 1214 GET 
http://archive.progeny.com/debian/dists/unstable/contrib/binary-i386/Packages.diff/2006-06-29-1245.21.gz
 - DIRECT/216.37.55.114 text/plain
1151639502.173480 192.168.1.7 TCP_MISS/200 727 GET 
http://archive.progeny.com/debian/dists/unstable/non-free/binary-i386/Packages.diff/2006-06-29-1245.21.gz
 - DIRECT/216.37.55.114 text/plain
1151639506.532   4358 192.168.1.7 TCP_MISS/200 12277 GET 
http://archive.progeny.com/debian/dists/unstable/main/source/Sources.diff/2006-06-29-1245.21.gz
 - DIRECT/216.37.55.114 text/plain
1151639506.973441 192.168.1.7 TCP_MISS/200 645 GET 
http://archive.progeny.com/debian/dists/unstable/non-free/source/Sources.diff/2006-06-29-1245.21.gz
 - DIRECT/216.37.55.114 text/plain
1151639508.952   1980 192.168.1.7 TCP_MISS/200 5200 GET 
http://archive.progeny.com/debian/dists/experimental/main/binary-i386/Packages.diff/2006-06-29-1245.21.gz
 - DIRECT/216.37.55.114 text/plain

-- 
see shy jo




Re: These new diffs are great, but...

2006-06-30 Thread Martin Michlmayr
* Joey Hess [EMAIL PROTECTED] [2006-06-30 02:05]:
 Just to confuse things, it does print out each separate pdiff file 3
 times, although my squid logs show it downloads each exactly once.
 My guess w/o reading the code is that one represents the download,
 one the extraction, and one the application of the diff.

Your guess is correct, see #372504.  This is currently a UI problem.
It displays the line three times, but it only downloads it in the
first. The other two lines are unpack and rred (patch).
-- 
Martin Michlmayr
http://www.cyrius.com/





Re: These new diffs are great, but...

2006-06-30 Thread Marc Haber
On Fri, 30 Jun 2006 08:44:30 +0200, Martin Michlmayr [EMAIL PROTECTED]
wrote:
Your guess is correct, see #372504.  This is currently a UI problem.
It displays the line three times, but it only downloads it in the
first. The other two lines are unpack and rred (patch).

So the "rred" is not a badly formatted and partially overwritten
"transferred" but an actual string?

Greetings
Marc

-- 
-- !! No courtesy copies, please !! -
Marc Haber |Questions are the | Mailadresse im Header
Mannheim, Germany  | Beginning of Wisdom  | http://www.zugschlus.de/
Nordisch by Nature | Lt. Worf, TNG Rightful Heir | Fon: *49 621 72739834



Re: These new diffs are great, but...

2006-06-30 Thread Eduard Bloch
#include hallo.h
* Marc Haber [Fri, Jun 30 2006, 08:00:57AM]:
 On Thu, 29 Jun 2006 11:43:45 -0700, Tyler MacDonald [EMAIL PROTECTED]
 wrote:
 Steinar H. Gunderson [EMAIL PROTECTED] wrote:
  I usually notice the difference -- the other way. aptitude update on a
  machine that hasn't been updated in a while suddenly takes minutes instead 
  of
  seconds...
 
  Yes, this is what I'm getting at. :-) Should this be considered a
 bug in apt-get (file:// urls should never use diffs)?
 
 file:// URLs are not the only issue here - aptitude update is also
 much slower than before on a hosted box which has 100 Mbit/s
 connectivity and could load the Packages.gz in, like, two seconds.

I have doubts, have you measured the real difference? If your box is
already too slow for patching then it would have fun with bzip2 as well.

Eduard.





Re: These new diffs are great, but...

2006-06-30 Thread Marc Haber
On Fri, 30 Jun 2006 11:10:37 +0200, Eduard Bloch [EMAIL PROTECTED] wrote:
* Marc Haber [Fri, Jun 30 2006, 08:00:57AM]:
 file:// URLs are not the only issue here - aptitude update is also
 much slower than before on a hosted box which has 100 Mbit/s
 connectivity and could load the Packages.gz in, like, two seconds.

I have doubts, have you measured the real difference? If your box is
already too slow for patching then it would have fun with bzip2 as well.

The machine in question is a P3 with 1200 MHz. What's making the
process slow is the turnaround time for the http requests, as observed
multiple times in this thread alone.

Greetings
Marc

-- 
-- !! No courtesy copies, please !! -
Marc Haber |Questions are the | Mailadresse im Header
Mannheim, Germany  | Beginning of Wisdom  | http://www.zugschlus.de/
Nordisch by Nature | Lt. Worf, TNG Rightful Heir | Fon: *49 621 72739834



Re: These new diffs are great, but...

2006-06-30 Thread Andreas Barth
* Marc Haber ([EMAIL PROTECTED]) [060630 10:58]:
 On Fri, 30 Jun 2006 08:44:30 +0200, Martin Michlmayr [EMAIL PROTECTED]
 wrote:
 Your guess is correct, see #372504.  This is currently a UI problem.
 It displays the line three times, but it only downloads it in the
 first. The other two lines are unpack and rred (patch).
 
 So the rred is not a badly formatted and partially overwritten
 transferred but an actual string?

It's a restricted version of ed.


Cheers,
Andi
-- 
  http://home.arcor.de/andreas-barth/





Re: These new diffs are great, but...

2006-06-30 Thread Martin Schulze
Steinar H. Gunderson wrote:
 On Thu, Jun 29, 2006 at 08:35:41PM +0200, martin f krafft wrote:
 Not really. pdiffs mainly reduce download size for low bandwidth
  connections. file:// is pretty high bandwidth, you won't notice the
  difference.
 
 I usually notice the difference -- the other way. aptitude update on a
 machine that hasn't been updated in a while suddenly takes minutes instead of
 seconds...

You know that you can easily turn off this feature by adjusting apt.conf:

   Acquire::Pdiffs { false; };

Regards,

Joey

-- 
Those who don't understand Unix are condemned to reinvent it, poorly.

Please always Cc to me when replying to me on the lists.





Re: These new diffs are great, but...

2006-06-30 Thread Steinar H. Gunderson
On Fri, Jun 30, 2006 at 04:55:58PM +0200, Martin Schulze wrote:
 You know that you can easily turn off this feature by adjusting apt.conf:

Sure, and I've done so for several of my machines now. Actually, for
enough machines that it's becoming bothersome...

/* Steinar */
-- 
Homepage: http://www.sesse.net/





Re: These new diffs are great, but...

2006-06-30 Thread Florian Weimer
* Marc Haber:

 The machine in question is a P3 with 1200 MHz. What's making the
 process slow is the turnaround time for the http requests, as observed
 multiple times in this thread alone.

Then your setup is very broken.  APT performs HTTP pipelining.

On my machines, I see the behavior Miles described: lots of disk I/O.
Obviously, APT reconstructs every intermediate version of the packages
file.

The fix is to combine the diffs before applying them, so that you only
need to process the large Packages file once.  I happen to have ML
code which does this (including the conversion to a patch
representation which is more amenable to this kind of optimization)
and would be willing to port it to C++, but someone else would need to
deal with the APT integration because I'm not familiar with its
architecture.





Re: These new diffs are great, but...

2006-06-30 Thread Mike Hommey
On Fri, Jun 30, 2006 at 06:29:40PM +0200, Florian Weimer [EMAIL PROTECTED] 
wrote:
 * Marc Haber:
 
  The machine in question is a P3 with 1200 MHz. What's making the
  process slow is the turnaround time for the http requests, as observed
  multiple times in this thread alone.
 
 Then your setup is very broken.  APT performs HTTP pipelining.
 
 On my machines, I see the behavior Miles described: lots of disk I/O.
 Obviously, APT reconstructs every intermediate version of the packages
 file.
 
 The fix is to combine the diffs before applying them, so that you only
 need to process the large Packages file once.  I happen to have ML
 code which does this (including the conversion to a patch
 representation which is more amenable to this kind of optimization)
 and would be willing to port it to C++, but someone else would need to
 deal with the APT integration because I'm not familiar with its
 architecture.

You're looking for combinediff in the patchutils package.

Mike





Re: These new diffs are great, but...

2006-06-30 Thread Florian Weimer
* Mike Hommey:

 The fix is to combine the diffs before applying them, so that you only
 need to process the large Packages file once.  I happen to have ML
 code which does this (including the conversion to a patch
 representation which is more amenable to this kind of optimization)
 and would be willing to port it to C++, but someone else would need to
 deal with the APT integration because I'm not familiar with its
 architecture.

 You're looking for combinediff in the patchutils package.

combinediff doesn't work for the ed patches used by APT.





Re: These new diffs are great, but...

2006-06-29 Thread martin f krafft
also sprach Tyler MacDonald [EMAIL PROTECTED] [2006.06.29.2005 +0200]:
 Is it at all useful/better for apt-get to use the .pdiff files
 when dealing with a local (file://) debian repo?

Not really. pdiffs mainly reduce download size for low bandwidth
connections. file:// is pretty high bandwidth, you won't notice the
difference.

-- 
Please do not send copies of list mail to me; I read the list!
 
 .''`. martin f. krafft [EMAIL PROTECTED]
: :'  :proud Debian developer and author: http://debiansystem.info
`. `'`
  `-  Debian - when you have better things to do than fixing a system
 
only by counting could humans demonstrate
their independence of computers.
-- douglas adams, the hitchhiker's guide to the galaxy




Re: These new diffs are great, but...

2006-06-29 Thread Steinar H. Gunderson
On Thu, Jun 29, 2006 at 08:35:41PM +0200, martin f krafft wrote:
 Not really. pdiffs mainly reduce download size for low bandwidth
 connections. file:// is pretty high bandwidth, you won't notice the
 difference.

I usually notice the difference -- the other way. aptitude update on a
machine that hasn't been updated in a while suddenly takes minutes instead of
seconds...

/* Steinar */
-- 
Homepage: http://www.sesse.net/





Re: These new diffs are great, but...

2006-06-29 Thread Tyler MacDonald
Steinar H. Gunderson [EMAIL PROTECTED] wrote:
 On Thu, Jun 29, 2006 at 08:35:41PM +0200, martin f krafft wrote:
  Not really. pdiffs mainly reduce download size for low bandwidth
  connections. file:// is pretty high bandwidth, you won't notice the
  difference.
 
 I usually notice the difference -- the other way. aptitude update on a
 machine that hasn't been updated in a while suddenly takes minutes instead of
 seconds...

Yes, this is what I'm getting at. :-) Should this be considered a
bug in apt-get (file:// urls should never use diffs)?

- Tyler





Re: These new diffs are great, but...

2006-06-29 Thread Bastian Venthur
Steinar H. Gunderson wrote:
 On Thu, Jun 29, 2006 at 08:35:41PM +0200, martin f krafft wrote:
 Not really. pdiffs mainly reduce download size for low bandwidth
 connections. file:// is pretty high bandwidth, you won't notice the
 difference.
 
 I usually notice the difference -- the other way. aptitude update on a
 machine that hasn't been updated in a while suddenly takes minutes instead of
 seconds...

Same here. Very annoying on a box where you only update every few weeks
or something. Wouldn't it be possible to make snapshots every week and
only pdiff from this snapshot?


Cheers,

Bastian

-- 
Bastian Venthur
http://venthur.de





Re: These new diffs are great, but...

2006-06-29 Thread Alec Berryman
Tyler MacDonald on 2006-06-29 11:43:45 -0700:

 Steinar H. Gunderson [EMAIL PROTECTED] wrote:
  On Thu, Jun 29, 2006 at 08:35:41PM +0200, martin f krafft wrote:
   Not really. pdiffs mainly reduce download size for low bandwidth
   connections. file:// is pretty high bandwidth, you won't notice the
   difference.
  
  I usually notice the difference -- the other way. aptitude update on a
  machine that hasn't been updated in a while suddenly takes minutes instead 
  of
  seconds...
 
   Yes, this is what I'm getting at. :-) Should this be considered a
 bug in apt-get (file:// urls should never use diffs)?

See bug #372712.  To disable pdiffs, use:

   apt-get update -o Acquire::PDiffs=false




Re: These new diffs are great, but...

2006-06-29 Thread Steinar H. Gunderson
On Thu, Jun 29, 2006 at 09:15:13PM +0200, Bastian Venthur wrote:
 Same here. Very annoying on a box where you only update every few weeks
 or something. Wouldn't it be possible to make snapshots every week and
 only pdiff from this snapshot?

You can turn off pdiffs if you'd like to; the old packages files are still
there. The question here is what to do with the default.

/* Steinar */
-- 
Homepage: http://www.sesse.net/





Re: These new diffs are great, but...

2006-06-29 Thread Bastian Venthur
Steinar H. Gunderson wrote:
 On Thu, Jun 29, 2006 at 09:15:13PM +0200, Bastian Venthur wrote:
 Same here. Very annoying on a box where you only update every few weeks
 or something. Wouldn't it be possible to make snapshots every week and
 only pdiff from this snapshot?
 
 You can turn off pdiffs if you'd like to; the old packages files are still
 there. The question here is what to do with the default.

Ah ok, I was not aware of this feature. But since downloading pdiffs of
x days might in fact result in a larger download than downloading the
package files directly, aptitude (or apt-get) should decide
automatically when to use pdiffs and when not.

Someone could make stats to calculate the average day count x at which
the sum of the pdiffs becomes larger than the package files. Then
aptitude (or apt-get) could check whether the last update is more than
x days away and decide what to use.

Should not be too hard to implement. We would still have the advantage
of using pdiffs for the users who update regularly, but no drawbacks
for all the other ones.


Cheers,

Bastian

-- 
Bastian Venthur
http://venthur.de





Re: These new diffs are great, but...

2006-06-29 Thread martin f krafft
also sprach Bastian Venthur [EMAIL PROTECTED] [2006.06.29.2135 +0200]:
 Someone could make stats to calculate the average day count x at which
 the sum of the pdiffs becomes larger than the package files. Then
 aptitude (or apt-get) could check whether the last update is more than
 x days away and decide what to use.

Nice idea. I suggest Someone == Bastian Venthur :)

-- 
Please do not send copies of list mail to me; I read the list!
 
 .''`. martin f. krafft [EMAIL PROTECTED]
: :'  :proud Debian developer and author: http://debiansystem.info
`. `'`
  `-  Debian - when you have better things to do than fixing a system
 
common sense is the collection
 of prejudices acquired by age eighteen
-- albert einstein




Re: These new diffs are great, but...

2006-06-29 Thread Bastian Venthur
martin f krafft wrote:
 also sprach Bastian Venthur [EMAIL PROTECTED] [2006.06.29.2135 +0200]:
 Someone could make stats to calculate the average day count x at which
 the sum of the pdiffs becomes larger than the package files. Then
 aptitude (or apt-get) could check whether the last update is more than
 x days away and decide what to use.
 
 Nice idea. I suggest Someone == Bastian Venthur :)

Sure, I will be on vacation for the next week, but when I'm back, if
nothing has changed, I'll have a look at it.


Best regards,

Bastian

-- 
Bastian Venthur
http://venthur.de





Re: These new diffs are great, but...

2006-06-29 Thread Robert Lemmen
On Thu, Jun 29, 2006 at 09:35:09PM +0200, Bastian Venthur wrote:
 Someone could make stats to calculate the average day count x at which
 the sum of the pdiffs becomes larger than the package files. Then
 aptitude (or apt-get) could check whether the last update is more than
 x days away and decide what to use.

it might be easier to just generate fewer diffs on the server side, if
there is no matching diff available apt will fall back to using the
standard method. you will however find out that the size of all diffs
together is already less than the size of the regular packages file.

disabling diffs for file/cdrom urls makes perfect sense imho...

cu  robert

-- 
Robert Lemmen   http://www.semistable.com 




Re: These new diffs are great, but...

2006-06-29 Thread Steinar H. Gunderson
On Thu, Jun 29, 2006 at 10:53:14PM +0200, Robert Lemmen wrote:
 it might be easier to just generate fewer diffs on the server side, if
 there is no matching diff available apt will fall back to using the
 standard method. you will however find out that the size of all diffs
 together is already less than the size of the regular packages file.

There is a penalty per-request (both on the client and the server side),
and a penalty for piecing the diffs back together on the receiving end,
both of which are > ε.

/* Steinar */
-- 
Homepage: http://www.sesse.net/





Re: These new diffs are great, but...

2006-06-29 Thread Kurt Roeckx
On Thu, Jun 29, 2006 at 09:35:09PM +0200, Bastian Venthur wrote:
 Steinar H. Gunderson wrote:
  On Thu, Jun 29, 2006 at 09:15:13PM +0200, Bastian Venthur wrote:
  Same here. Very annoying on a box where you only update every few weeks
  or something. Wouldn't it be possible to make snapshots every week and
  only pdiff from this snapshot?
  
  You can turn off pdiffs if you'd like to; the old packages files are still
  there. The question here is what to do with the default.
 
 Ah ok, I was not aware of this feature. But since downloading pdiffs of
 x days might in fact result in a larger download than downloading the
 package files directly, aptitude (or apt-get) should decide
 automatically when to use pdiffs and when not.
 
 Someone could make stats to calculate the average day count x at which
 the sum of the pdiffs becomes larger than the package files. Then
 aptitude (or apt-get) could check whether the last update is more than
 x days away and decide what to use.

You don't need stats for that; the Packages.diff/Index has the sizes
in it.
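For illustration, a Python sketch of that check (assuming the Index
file's SHA1-History/SHA1-Patches stanzas; full_size, the cost of just
fetching the complete file instead, would have to come from the Release
file or similar):

    def pdiffs_are_smaller(index_text, have_sha1, full_size):
        history, patches, section = [], [], None
        for line in index_text.splitlines():
            if line.startswith("SHA1-History:"):
                section = history
            elif line.startswith("SHA1-Patches:"):
                section = patches
            elif line.startswith(" ") and section is not None:
                sha1, size, name = line.split()
                section.append((sha1, int(size), name))
            else:
                section = None
        # Our file's sha1 tells us where we stand in the history; every
        # patch from that point on would have to be downloaded.
        for i, (sha1, _, _) in enumerate(history):
            if sha1 == have_sha1:
                return sum(size for _, size, _ in patches[i:]) < full_size
        return False                # too far behind: no usable patch chain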

But what I don't get is that it seems to be downloading every file more
than once. It at least looks to be downloading twice as many files as
it should, but it looks more like it's downloading the same file 3
times if I look at the sizes.


Kurt





Re: These new diffs are great, but...

2006-06-29 Thread Bastian Venthur
Robert Lemmen wrote:

 standard method. you will however find out that the size of all diffs
 together is already less than the size of the regular packages file.

Yeah, looking at the average filesize of a diff compared to a packages
file, I guess you'll need to wait like 100-200 days until the sum of the
diffs becomes larger than the package file itself. However, downloading
a 5meg file takes a few seconds on my boxes while downloading the diffs
from 10-20 days can take a few minutes, which is not very attractive.

This is quite a dilemma, since I understand that the bandwidth of
volunteering archive mirrors is not free.

Since the main problem seems to be that downloading many small files can
take much longer than downloading one big file, a compromise could be to
provide only one diff. The trick: generate x diffs for:

today-1day, today-2days .. today-x days

so you only have to download one file if your last update is less than x
days ago.

A good compromise for x could be 50 days or something. The diffs would
be reasonably small and fast to download, and if your last update is
more than x days ago you could still download the package file directly.

This solution should keep the bandwidth utilization on the servers
small (older diffs are less likely to be downloaded than the most
recent ones) while being faster than the current solution (and even
faster than downloading the whole Packages file).

Plus, you don't have to keep all the old diffs (only the last x ones) on
the servers.
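For illustration, a Python sketch of the server-side job under the
simplest bookkeeping (keeping the last x plain snapshots and
regenerating all x cumulative diffs each day; the file names are made
up, and difflib merely stands in for whatever diff format the archive
scripts really use):

    import difflib, os, shutil

    def publish_daily_diffs(snap_dir, yesterday_path, today_path, x=50):
        # Age the snapshot window: Packages.k becomes Packages.(k+1),
        # then yesterday's file becomes Packages.1.
        for age in range(x - 1, 0, -1):
            src = os.path.join(snap_dir, "Packages.%d" % age)
            if os.path.exists(src):
                shutil.move(src, os.path.join(snap_dir, "Packages.%d" % (age + 1)))
        shutil.copy(yesterday_path, os.path.join(snap_dir, "Packages.1"))
        with open(today_path) as f:
            today = f.readlines()
        # One cumulative diff per age: a client that last updated `age`
        # days ago downloads exactly one file.
        for age in range(1, x + 1):
            snap = os.path.join(snap_dir, "Packages.%d" % age)
            if not os.path.exists(snap):
                break
            with open(snap) as f:
                old = f.readlines()
            with open(os.path.join(snap_dir, "diff-age-%d" % age), "w") as out:
                out.writelines(difflib.unified_diff(
                    old, today, "Packages.%d" % age, "Packages"))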


Any ideas?


Cheers,

Bastian

-- 
Bastian Venthur
http://venthur.de





Re: These new diffs are great, but...

2006-06-29 Thread Miles Bader
Robert Lemmen [EMAIL PROTECTED] writes:
 it might be easier to just generate fewer diffs on the server side, if
 there is no matching diff available apt will fall back to using the
 standard method. you will however find out that the size of all diffs
 together is already less than the size of the regular packages file.

The problem for me is not the _size_ or the network bandwidth, it's that
apt seems to spend a lot more _CPU_ (and disk I/O) time dealing with
zillions of diffs -- e.g., it seems to save the updated Packages file to
disk about every 10 downloaded pdiff files or so, and on my machine
saving the Package file takes a good 20 seconds or so.  As I've got a
fast network connection, the pdiff method ends up being far more
painful because of this behavior if it's been a long time since my last
update.

Of course, I can just disable pdiffs, but (1) they're actually nice
when I update frequently, and (2) it really ought to do something more
optimal by default -- novice users won't know how to configure stuff.

-Miles

-- 
It wasn't the Exxon Valdez captain's driving that caused the Alaskan oil spill.
It was yours.  [Greenpeace advertisement, New York Times, 25 February 1990]





Re: These new diffs are great, but...

2006-06-29 Thread Miles Bader
Kurt Roeckx [EMAIL PROTECTED] writes:
 But what I don't get is that it seems to be downloading every
 file more than once. It at least looks to be downloading twice as
 many files as it should, but it looks more like it's downloading the
 same file 3 times if I look at the sizes.

Yeah I noticed this too -- some .pdiff files appeared to be downloaded
dozens of times!

-Miles
-- 
The secret to creativity is knowing how to hide your sources.
  --Albert Einstein





Re: These new diffs are great, but...

2006-06-29 Thread Joe Smith


Bastian Venthur [EMAIL PROTECTED] wrote in message 
news:[EMAIL PROTECTED]

Robert Lemmen wrote:


standard method. you will however find out that the size of all diffs
together is already less than the size of the regular packages file.


Yeah, looking at the average filesize of a diff compared to a packages
file, I guess you'll need to wait like 100-200 days until the sum of the
diffs becomes larger than the package file itself. However, downloading
a 5meg file takes a few seconds on my boxes while downloading the diffs
from 10-20 days can take a few minutes, which is not very attractive.

This is quite a dilemma, since I understand that the bandwidth of
volunteering archive mirrors is not free.

Since the main problem seems to be that downloading many small files can
take much longer than downloading one big file, a compromise could be to
provide only one diff. The trick: generate x diffs for:

today-1day, today-2days .. today-x days

so you only have to download one file if your last update is less than x
days ago.

A good compromise for x could be 50 days or something. The diffs would
be reasonably small and fast to download, and if your last update is
more than x days ago you could still download the package file directly.

This solution should keep the bandwidth utilization on the servers
small (older diffs are less likely to be downloaded than the most
recent ones) while being faster than the current solution (and even
faster than downloading the whole Packages file).

Plus, you don't have to keep all the old diffs (only the last x ones) on
the servers.


Any ideas?



A very good idea. This is trading a slight increase in file space for 
bandwidth and speed. There is some additional server-side processing 
required, but diffing is relatively cheap.


If reversible diffs are used, then generating today's diffs requires
only yesterday's Packages file, the most recent (x-1) diffs from
yesterday, and today's Packages file. Scripting a program to update the
diffs would not be terribly hard. Once the diffs are updated, everything
from yesterday can be discarded.


Apt would always download the main package file if it was smaller than the 
appropriate diff. If it turns out that some of the diffs (the ones around 
today-x) are pretty large they can be compressed like the main package file.



Regardless, diffs should obviously not be used for file:// Sources or cd-rom 
sources unless the user explicitly says otherwise. This is because it is 
normally faster to fetch the main file when using those sources.




