Re: an idea for next generation APT archive caching

2004-10-25 Thread martin f krafft
also sprach Wouter Verhelst <[EMAIL PROTECTED]> [2004.10.22.2121 +0200]:
> Oh, absolutely. One way could be to talk to the apt-cacher and
> apt-proxy developers and help fixing bugs in their software,
> instead of calling your not even fully thought out idea (which
> surely hasn't proven itself) the "next generation".

It's an RFC. It's an idea that I liked. Sorry for calling it next
generation.

-- 
Please do not CC me when replying to lists; I read them!
 
 .''`. martin f. krafft <[EMAIL PROTECTED]>
: :'  :proud Debian developer, admin, and user
`. `'`
  `-  Debian - when you have better things to do than fixing a system
 
Invalid/expired PGP subkeys? Use subkeys.pgp.net as keyserver!




Re: an idea for next generation APT archive caching

2004-10-25 Thread Chris Halls
On Sat, 2004-10-23 at 05:45, Brian May wrote:
> No, max_versions is not correct. It will only work if all my computers
> use the same distribution; if some computers use unstable while others
> use stable for example, then the stable version will get deleted after
> n revisions of the unstable version of the package.

Early versions of v2 from Ranty had this behaviour, but I fixed it a
while ago.

apt-proxy (1.9.12) experimental; urgency=low
[...]
  * Fix max_versions to work in the same way as version 1
did, taking distributions into account (part of #242197)

 -- Chris Halls <[EMAIL PROTECTED]>  Sun, 30 May 2004 07:32:18 +0200




Re: an idea for next generation APT archive caching

2004-10-24 Thread Brian May
> "Wouter" == Wouter Verhelst <[EMAIL PROTECTED]> writes:

Wouter> It's not actually version 2 yet, but the current apt-proxy
Wouter> in unstable is supposed to be apt-proxy v2.

This version isn't in testing, hence part of my confusion. The other
part comes from the fact that apt-proxy 1.9.18 in unstable is probably v2.

(As for why it is not in testing, see bug #267880 + others).
-- 
Brian May <[EMAIL PROTECTED]>




Re: an idea for next generation APT archive caching

2004-10-24 Thread Robert Collins
On Sat, 2004-10-23 at 12:57 -0500, Manoj Srivastava wrote:
> On Fri, 22 Oct 2004 23:04:32 -0700, Matt Zimmerman <[EMAIL PROTECTED]> said: 
> 
> > On Wed, Oct 20, 2004 at 02:11:44AM +0200, martin f krafft wrote:
> >> Here's an idea I just had about apt-proxy/apt-cacher NG. Maybe this
> >> could be interesting, maybe it's just crap. Your call.
> 
> > My position on special-purpose proxy caches for APT is that
> > general-purpose proxy caches (like squid) seem to work fine for me.
> > What advantages do they have for others?
> 
>   Optimization?  With a special-purpose proxy I can control
>  how the cache gets updated. For example, I want to keep two versions
>  of packages I use around -- the current, and the previous one, no
>  matter how old. Hard to do with Squid, which does not know these two
>  files are different versions of the same package.

It can be taught that. A custom cache replacement policy, for one cache
dir, for example.

Rob

-- 
GPG key available at: .




Re: an idea for next generation APT archive caching

2004-10-23 Thread Wouter Verhelst
On Sat, Oct 23, 2004 at 02:45:54PM +1000, Brian May wrote:
> > "Chris" == Chris Halls <[EMAIL PROTECTED]> writes:
> 
> Chris> Hmm, seems you are talking about version 1, which has been
> Chris> rewritten.  The new version isn't bug free yet but it does
> Chris> fix several problems.  It doesn't use wget.
> 
> It would appear apt-proxy v2 isn't in Debian (or that I can't find
> it).

It's not actually version 2 yet, but the current apt-proxy in unstable
is supposed to be apt-proxy v2.

-- 
 EARTH
 smog  |   bricks
 AIR  --  mud  -- FIRE
soda water |   tequila
 WATER
 -- with thanks to fortune




Re: an idea for next generation APT archive caching

2004-10-23 Thread Brian May
> "Chris" == Chris Halls <[EMAIL PROTECTED]> writes:

Chris> Hmm, seems you are talking about version 1, which has been
Chris> rewritten.  The new version isn't bug free yet but it does
Chris> fix several problems.  It doesn't use wget.

It would appear apt-proxy v2 isn't in Debian (or that I can't find
it).

Chris> There are several cleaning algorithms, controlled by
Chris> different parameters.  The 'only correct way' algorithm
Chris> described above is controlled by the max_versions parameter
Chris> (in version 1 & 2)

No, max_versions is not correct. It will only work if all my computers
use the same distribution; if some computers use unstable while others
use stable for example, then the stable version will get deleted after
n revisions of the unstable version of the package.
-- 
Brian May <[EMAIL PROTECTED]>




Re: an idea for next generation APT archive caching

2004-10-23 Thread Manoj Srivastava
On Fri, 22 Oct 2004 23:04:32 -0700, Matt Zimmerman <[EMAIL PROTECTED]> said: 

> On Wed, Oct 20, 2004 at 02:11:44AM +0200, martin f krafft wrote:
>> Here's an idea I just had about apt-proxy/apt-cacher NG. Maybe this
>> could be interesting, maybe it's just crap. Your call.

> My position on special-purpose proxy caches for APT is that
> general-purpose proxy caches (like squid) seem to work fine for me.
> What advantages do they have for others?

Optimization?  With a special-purpose proxy I can control
 how the cache gets updated. For example, I want to keep two versions
 of packages I use around -- the current, and the previous one, no
 matter how old. Hard to do with Squid, which does not know these two
 files are different versions of the same package.
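
A purely illustrative sketch of such a policy (the cache directory is
invented, and GNU sort -V only approximates dpkg's version ordering;
dpkg --compare-versions would be exact):

#!/bin/sh
# keep only the newest two versions of each package in a flat cache dir
CACHE=/var/cache/deb-cache        # hypothetical location
cd "$CACHE" || exit 1

# cached files are named package_version_arch.deb
ls *.deb 2>/dev/null | cut -d_ -f1 | sort -u |
while read pkg; do
    # sort this package's files by version and drop all but the last two
    ls "${pkg}"_*.deb | sort -V | head -n -2 | xargs -r rm -f
done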


Also, I could work with code that understood apt methods, but
 did not understand http proxies (this is not a strong argument, I
 know).

manoj
-- 
I always had a repulsive need to be something more than human. David Bowie
Manoj Srivastava   <[EMAIL PROTECTED]>  
1024D/BF24424C print 4966 F272 D093 B493 410B  924B 21BA DABB BF24 424C




Re: an idea for next generation APT archive caching

2004-10-23 Thread Matthias Urlichs
Hi, martin f krafft wrote:

> I will have to think about the premature EOF.

It's a file. Files don't have "premature" EOFs, so you need some sort of
lock, which in turn requires a (non-shell ;-) script.

In other words, this rapidly approaches the complexity of
apt-proxy-or-whatever.

-- 
Matthias Urlichs   |   {M:U} IT Design @ m-u-it.de   |  [EMAIL PROTECTED]




Re: an idea for next generation APT archive caching

2004-10-23 Thread Matt Zimmerman
On Wed, Oct 20, 2004 at 02:11:44AM +0200, martin f krafft wrote:

> Here's an idea I just had about apt-proxy/apt-cacher NG. Maybe this
> could be interesting, maybe it's just crap. Your call.

My position on special-purpose proxy caches for APT is that general-purpose
proxy caches (like squid) seem to work fine for me.  What advantages do they
have for others?

-- 
 - mdz




Re: an idea for next generation APT archive caching

2004-10-22 Thread Hamish Moffatt
On Thu, Oct 21, 2004 at 02:59:17PM -0500, Manoj Srivastava wrote:
> Hi,
> 
> I can mostly live with the current apt-proxy, except for the
>  fact that it does not seem to want to play nice with debootstrap:
>  debootstrap just hangs.

Happens here too. My apt-proxy and debootstrap client (pbuilder) are on
different machines. I've done this before, so I think the problem is new
with apt-proxy v2.

Hamish
-- 
Hamish Moffatt VK3SB <[EMAIL PROTECTED]> <[EMAIL PROTECTED]>




Re: an idea for next generation APT archive caching

2004-10-22 Thread Peter Palfrader
On Wed, 20 Oct 2004, martin f krafft wrote:

> Here's an idea I just had about apt-proxy/apt-cacher NG. Maybe this
> could be interesting, maybe it's just crap. Your call.

rapt proxy is an actual HTTP proxy that caches Debian packages.  It's
written in Ruby, and since all you have to do to use it is set the
http_proxy environment variable (or configure apt to use the proxy), it
will rock.  No more stupid editing of sources.list just to get caching.

It's available on alioth.
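
To illustrate the two ways mentioned (the cache host and port here are
invented):

# per-invocation, via the environment
http_proxy=http://cache.example.org:3128/ apt-get update

# or permanently, via apt's own configuration
echo 'Acquire::http::Proxy "http://cache.example.org:3128/";' \
    > /etc/apt/apt.conf.d/01proxy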

-- 
 PGP signed and encrypted  |  .''`.  ** Debian GNU/Linux **
messages preferred.| : :' :  The  universal
   | `. `'  Operating System
 http://www.palfrader.org/ |   `-http://www.debian.org/




Re: an idea for next generation APT archive caching

2004-10-22 Thread Manoj Srivastava
On Fri, 22 Oct 2004 15:44:12 +1000, Brian May <[EMAIL PROTECTED]> said: 

>> "Manoj" == Manoj Srivastava <[EMAIL PROTECTED]> writes:
Manoj> Hi, I can mostly live with the current apt-proxy, except for
Manoj> the fact that it does not seem to want to play nice with
Manoj> debootstrap: debootstrap just hangs.

> Strange. I have never had any problems with debootstrap and
> apt-proxy myself (unless it was because the server was down or
> something at the time).  -- Brian May <[EMAIL PROTECTED]>

Hmm. The server was on the same machine as debootstrap, and it
 did not seem to be down (other clients were being served) --- only
 debootstrap seemed to hang. I'll keep a sharper lookout when I do
 the 2.6.9 UMLs.

manoj
-- 
To spot the expert, pick the one who predicts the job will take the
longest and cost the most.
Manoj Srivastava   <[EMAIL PROTECTED]>  
1024D/BF24424C print 4966 F272 D093 B493 410B  924B 21BA DABB BF24 424C




Re: an idea for next generation APT archive caching

2004-10-22 Thread Chris Halls
On Thu, 2004-10-21 at 04:04, Brian May wrote:
> * If the above point wasn't bad enough by itself, the apt-proxy binary has 
> hard coded:
> 
> WGET_CMD="$WGET --timestamping --no-host-directories --tries=5 
> --no-directories -P $DL_DESTDIR"

Hmm, seems you are talking about version 1, which has been rewritten. 
The new version isn't bug free yet but it does fix several problems.  It
doesn't use wget.

> * No thought put into the file deletion algorithm. IMHO, deleting
> files based on age is wrong (consider how long stable files
> last). Deleting files based on number of different copies is also
> wrong (consider if you have some systems setup with stable and another
> is unstable). IMHO, the only correct way is to scan the most recently
> downloaded Packages and Source index files and delete files that
> aren't mentioned anymore. This could be made more aggressive though if
> disk space is low.

There are several cleaning algorithms, controlled by different
parameters.  The 'only correct way' algorithm described above is
controlled by the max_versions parameter (in version 1 & 2)

Chris




Re: an idea for next generation APT archive caching

2004-10-22 Thread Wouter Verhelst
On Fri, Oct 22, 2004 at 08:22:57PM +0200, martin f krafft wrote:
> also sprach Matthias Urlichs <[EMAIL PROTECTED]> [2004.10.22.2011 +0200]:
> > This rapidly turns from a plain 404 error script into a somewhat
> > nontrivial Perl-or-Python-or-whatever document handler.
> 
> It's still rather simple. I will have to think about the premature
> EOF. I am sure there is a way to do it.

Oh, absolutely. One way could be to talk to the apt-cacher and apt-proxy
developers and help fixing bugs in their software, instead of calling
your not even fully thought out idea (which surely hasn't proven itself)
the "next generation".

-- 
 EARTH
 smog  |   bricks
 AIR  --  mud  -- FIRE
soda water |   tequila
 WATER
 -- with thanks to fortune




Re: an idea for next generation APT archive caching

2004-10-22 Thread martin f krafft
also sprach Matthias Urlichs <[EMAIL PROTECTED]> [2004.10.22.2011 +0200]:
> >> exec wget -O - $MIRROR/$RPATH | tee $LPATH
> > 
> Don't forget
>   mkdir -p "$(dirname "$LPATH")"

Why the extra two processes?

mkdir -p "${LPATH%/*}"

> The above pipe needs either bash 3 or a subshell, if you want to
> be able to catch any errors wget might die from.

Yes, errors are not easy to handle in shell. That's why this
should be done with Perl/Python, or at least curl.

> One more problem -- what happens if client #2 wants the file while
> the wget is still in progress?

Couldn't you somehow tell apache to not read the premature EOF? The
second client could read the file and the connection would just
block until more data becomes available.

> NB: What to do about no-cache pragmas?

Serve the file but don't store it? Just leave out the tee(1) call in
the above.

> This rapidly turns from a plain 404 error script into a somewhat
> nontrivial Perl-or-Python-or-whatever document handler.

It's still rather simple. I will have to think about the premature
EOF. I am sure there is a way to do it.

-- 
Please do not CC me when replying to lists; I read them!
 
 .''`. martin f. krafft <[EMAIL PROTECTED]>
: :'  :proud Debian developer, admin, and user
`. `'`
  `-  Debian - when you have better things to do than fixing a system
 
Invalid/expired PGP subkeys? Use subkeys.pgp.net as keyserver!




Re: an idea for next generation APT archive caching

2004-10-22 Thread Matthias Urlichs
Hi, martin f krafft wrote:

> also sprach martin f krafft <[EMAIL PROTECTED]> [2004.10.20.1155 +0200]:
>> #!/bin/sh -e
>> 
>> echo 200 OK
>> echo Content-type: application/x-debian-package
>> echo
>> 
>> exec wget -O - $MIRROR/$RPATH | tee $LPATH
> 
Don't forget
  mkdir -p "$(dirname "$LPATH")"

The above pipe needs either bash 3 or a subshell, if you want to be able
to catch any errors wget might die from.

> one might want to parse wget's error output and return 404 as before
> if it returns 404. then again, in the end, this should be
> implemented in perl or C/C++ anyway.

One more problem -- what happens if client #2 wants the file while the
wget is still in progress?

In other words, you need locking. I'd also ask the remote server for
a Content-Length: (via HEAD or whatever) so that broken transfers can be
detected a bit more reliably.
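
To make those requirements concrete, a rough, untested sketch of the 404
handler with a lock and a length check ($MIRROR, $RPATH and $LPATH as in
the earlier pseudo-code; the flock/curl/stat usage is an assumption, not
a finished implementation):

#!/bin/sh -e
LOCK="$LPATH.lock"
mkdir -p "${LPATH%/*}"

# ask the mirror for the expected size so broken transfers can be spotted
LENGTH=$(curl -sI "$MIRROR/$RPATH" | tr -d '\r' \
         | grep -i '^Content-Length:' | awk '{print $2}')

echo "Status: 200 OK"
echo "Content-Type: application/x-debian-package"
echo

# serialise downloads of the same file; a second client blocks here until
# the first one is done and then gets the cached copy
(
    flock 9
    if [ -s "$LPATH" ] && [ "$(stat -c %s "$LPATH")" = "$LENGTH" ]; then
        cat "$LPATH"                          # complete copy already cached
    else
        wget -q -O - "$MIRROR/$RPATH" | tee "$LPATH"
    fi
) 9>"$LOCK"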

NB: What to do about no-cache pragmas?


This rapidly turns from a plain 404 error script into a somewhat
nontrivial Perl-or-Python-or-whatever document handler.

-- 
Matthias Urlichs   |   {M:U} IT Design @ m-u-it.de   |  [EMAIL PROTECTED]




Re: an idea for next generation APT archive caching

2004-10-22 Thread martin f krafft
also sprach Robert Collins <[EMAIL PROTECTED]> [2004.10.22.0019 +0200]:
> store_avg_object_size should have no impact on what is and is not
> cached.

Ah, interesting. I guess my testing results were influenced by my
expectations then.

Thanks for your tips!

-- 
Please do not CC me when replying to lists; I read them!
 
 .''`. martin f. krafft <[EMAIL PROTECTED]>
: :'  :proud Debian developer, admin, and user
`. `'`
  `-  Debian - when you have better things to do than fixing a system
 
Invalid/expired PGP subkeys? Use subkeys.pgp.net as keyserver!




Re: an idea for next generation APT archive caching

2004-10-22 Thread Eduard Bloch
#include 
* Adeodato Simó [Fri, Oct 22 2004, 04:40:52AM]:
> > Further, I wish there could be pre-caching. Means: if a file was
> > downloaded and that file was mentioned in packages-file A, and after the
> > next update A has a newer version of this package, then the new version
> > could be downloaded. This would be an optional feature, of course, but
> > it could be implemented without millions of LOC.
> 
>   well, that would change apt-cacher from a simple webserver
>   script to a daemon-via-cron application.

...and? The clean script is already cron-triggered.

>   what problem is there with a cron.daily script that runs like:
> 
> apt-get -qq update && apt-get -qqdy dist-upgrade

That the hosting machine is maybe not a Debian box? Or runs a different
Debian branch than its clients?

Regards,
Eduard.
-- 
Sieht die Magd den Bauern nackt, wird vom Brechreiz sie gepackt.




Re: an idea for next generation APT archive caching

2004-10-22 Thread Paul Hampson
On Fri, Oct 22, 2004 at 02:21:17PM +1000, Jonathan Oxer wrote:
> On Fri, 2004-10-22 at 13:43 +1000, Paul Hampson wrote:
> 
> > Is there anything such a system would want to fetch from a Debian
> > mirror that doesn't show up in Packages.gz or Sources.gz?

> Yes, lots of things as I found out the hard way when I implemented
> object type checking in apt-cacher - even plain old .tar.gz if you want
> people to be able to fetch sources. Not good from a "don't use this as a
> general purpose relay" standpoint! The current checks in apt-cacher look
> like this:

> if ($filename =~ /(\.deb|\.rpm|\.dsc|\.tar\.gz|\.diff\.gz|\.udeb)$/) {

> } elsif ($filename =~ /(Packages.gz|Release|Sources.gz)$/) {
> ...
> } else {
> etc.

.rpm
To my mind:
Packages.gz refers to the .deb or .udeb
Sources.gz refers to the .dsc, .orig.tar.gz and the .diff.gz
Release refers to Packages, but is this either necessary, or
widely used outside the Debian mirrors themselves? (Does apt
even use Release?)

I'm not even going to think about non-apt uses of this. ^_^
(Although circumvention of any checking is relatively easy... A
Sources.gz that refers to a .orig.tar.gz which may contain anything
the web site owner wishes.)

And of course, all of this is a complete shutout for apt archives
without a Packages.gz. ^_^

Once a file's in the cache tree, then Apache can serve it directly,
and it'll be there until it's cleaned by some other process.

Now that I look in my /var/lib/apt/lists directory, I'm reminded of
the other gain I'd like to see made here... The output from apt-cache
policy shows the host name/IP and then the path under 'dists'. Which
means it can't visually distinguish packages from:
deb http://192.168.0.1:/debian sid main
deb http://192.168.0.1:/ipv6 sid main
both give: 500 http://192.168.0.1 sid/main Packages
(This is a mock-up... the IPv6 archive has ipv6 as pool, not 'main'.
I can't seem to find an example now, but I'm sure I used to hit one
all the time before. >_<)

Using this as a proxy means the source names don't change, and so the
hostnames become sensible/usable again. (Even though they're not
necessarily accurate ^_^).

> (It's trapping the Packages.gz etc files separately because you can't
> just cache them directly: you'd have namespace collisions all over the
> shop. They have to be stored separately in the cache based on the
> requested host, distro etc and then the names mapped back again when
> another request comes in).

Hopefully _that_ will be avoided by storing in a mirror-structured tree,
rooted at the mirror-source or something.

And for that I'm thinking something like the apt-proxy configuration
where the admin defines a mirror-type, hostnames to recognise, and
mirror sources to talk to.

Also an option for "dynamic mirrors" would be good, for any unrecognised
hostname to effectively autogenerate its own mirror directory.

Of course, now I might be asking too much of mod_rewrite and/or
mod_proxy. I'll need to do some reading myself to determine if this is
possible in the form I hope for.

-- 
---
Paul "TBBle" Hampson, MCSE
7th year CompSci/Asian Studies student, ANU
The Boss, Bubblesworth Pty Ltd (ABN: 51 095 284 361)
[EMAIL PROTECTED]

"No survivors? Then where do the stories come from I wonder?"
-- Capt. Jack Sparrow, "Pirates of the Caribbean"

This email is licensed to the recipient for non-commercial
use, duplication and distribution.
---




Re: an idea for next generation APT archive caching

2004-10-22 Thread Brian May
> "martin" == martin f krafft <[EMAIL PROTECTED]> writes:

martin> also sprach Jonathan Oxer <[EMAIL PROTECTED]>
martin> [2004.10.21.0617 +0200]:
>> So it's necessary to keep fetching the Packages files within
>> their expiry time or the cache gets nuked.

martin> Why delete them at all?

Need more disk space?

When I last partitioned my hard disk, I allocated 2 gig for caching
Debian files with apt-proxy. I thought this would be heaps to meet my
requirements.

It currently has approx 350 Meg free, and no doubt if I kept my
testing systems up to date with testing, this would vanish in several
days time.
-- 
Brian May <[EMAIL PROTECTED]>




Re: an idea for next generation APT archive caching

2004-10-22 Thread Brian May
> "Manoj" == Manoj Srivastava <[EMAIL PROTECTED]> writes:

Manoj> Hi, I can mostly live with the current apt-proxy, except
Manoj> for the fact that it does not seem to want to play nice
Manoj> with debootstrap: debootstrap just hangs.

Strange. I have never had any problems with debootstrap and apt-proxy
myself (unless it was because the server was down or something at the
time).
-- 
Brian May <[EMAIL PROTECTED]>




Re: an idea for next generation APT archive caching

2004-10-21 Thread Jonathan Oxer
On Fri, 2004-10-22 at 13:43 +1000, Paul Hampson wrote:

> Is there anything such a system would want to fetch from a Debian
> mirror that doesn't show up in Packages.gz or Sources.gz?

Yes, lots of things as I found out the hard way when I implemented
object type checking in apt-cacher - even plain old .tar.gz if you want
people to be able to fetch sources. Not good from a "don't use this as a
general purpose relay" standpoint! The current checks in apt-cacher look
like this:

if ($filename =~ /(\.deb|\.rpm|\.dsc|\.tar\.gz|\.diff\.gz|\.udeb)$/) {
...
} elsif ($filename =~ /(Packages.gz|Release|Sources.gz)$/) {
...
} else {
etc.

(It's trapping the Packages.gz etc files separately because you can't
just cache them directly: you'd have namespace collisions all over the
shop. They have to be stored separately in the cache based on the
requested host, distro etc and then the names mapped back again when
another request comes in).

Cheers  :-)

Jonathan




Re: an idea for next generation APT archive caching

2004-10-21 Thread Paul Hampson
On Wed, Oct 20, 2004 at 02:11:44AM +0200, martin f krafft wrote:
> Here's an idea I just had about apt-proxy/apt-cacher NG. Maybe this
> could be interesting, maybe it's just crap. Your call.

> Based on a normal mirror layout, the idea is to use apache's 404
> hook for packages. When an existing package is requested, it is
> served regularly. If the file is not found, a 404 is triggered,
> which can be served by a CGI-like thingie that goes to retrieve the
> package, returns 200 instead of 404 and streams the package as the
> 404 error document contents while writing it to the filesystem
> (tee(1) style).

This all sounds great. I've been thinking about it since this posting,
and one thing I'd like to see would be this hooked up to mod_proxy, so
I only have to change one setting in apt.conf, rather than a couple of
lines in sources.list, when I move from site to site.

I'm not sure yet myself how this would work, but I guess it would
catch anything that was looking for a Packages.gz, and rewrite it
internally as a connection to a local mirror or mirrors. If that
didn't work, trying the client-supplied mirror server as a fallback
would allow it to neatly cache those files which were coming from
Debian-external package pools, while still ensuring only files that
are listed in a Packages.gz somewhere get through.

That last fallback would have to be optional, or anyone could put a
Packages.gz on a webserver, and suddenly any file would be gettable
through the proxy, which would defeat one of the purposes to which
_I_ currently put apt-proxy (which is providing unmetered Debian
archive access for otherwise metered ISP customers. ^_^)

But for my home install, it'd be nice to know that experimental,
the IPv6 archive and the WMI, CenterICQ and irssi daily build
archives are all being cached and cleaned automatically when I'm
there, and otherwise I comment out the proxy option and they work
when I'm off-site for a week.

Is there anything such a system would want to fetch from a Debian
mirror that doesn't show up in Packages.gz or Sources.gz?

> For Release, Package, Sources, and Contents files, we need
> a RewriteRule. When one of these is accessed, a call to a mirror
> should be made to check for updates. If there is one, download it
> and stream it.

> How do you send the newly retrieved file instead of the static file
> present on the filesystem? Essentially, this is the only need for
> a proxy, which could be implemented with a RewriteRule and a CGI. Or
> maybe apache can do this somehow?

> I think this would be an extremely simple implementation, using the
> proven apache for most of the work (and not the buggy twisted module
> that apt-proxy uses). Thus, the entire thing is reduced to a couple
> of httpd.conf entries and two extremely simple (?) CGIs.

> In addition, a cronjob runs daily to purge all files in the
> filesystem space, which are not referenced from any of the
> Packages/Sources files.

> This is a braindump. Please comment. Am I missing something? Would
> someone like to try this? I really don't have the time right now...

-- 
---
Paul "TBBle" Hampson, MCSE
7th year CompSci/Asian Studies student, ANU
The Boss, Bubblesworth Pty Ltd (ABN: 51 095 284 361)
[EMAIL PROTECTED]

"No survivors? Then where do the stories come from I wonder?"
-- Capt. Jack Sparrow, "Pirates of the Caribbean"

This email is licensed to the recipient for non-commercial
use, duplication and distribution.
---




Re: an idea for next generation APT archive caching

2004-10-21 Thread Adeodato Simó
* Eduard Bloch [Thu, 21 Oct 2004 18:12:42 +0200]:

> And though I like apt-cacher in general (it worked immediately while I
> did not manage to make apt-proxy work within 15 minutes and dropped the
> crap), this is the only method I do not like at all.
> It could be done better. I suggest you switch from wget to curl and use
> If-Modified-Since calls to update the Package/Source/Release file only
> when needed. And only when the local copy has changed (and the update
> was clean) should the deb files be purged (when the next cleanup cycle
> comes). You could even check for index file updates based
> on time periods instead of triggering it by the user. Actually, it could
> be a cron-job that downloads them to /dev/null.

> Further, I wish there could be pre-caching. Means: if a file was
> downloaded and that file was mentioned in packages-file A, and after the
> next update A has a newer version of this package, then the new version
> could be downloaded. This would be an optional feature, of course, but
> it could be implemented without millions of LOC.

  well, that would change apt-cacher from a simple webserver
  script to a daemon-via-cron application.

  what problem is there with a cron.daily script that runs like:

apt-get -qq update && apt-get -qqdy dist-upgrade

  ?

-- 
Adeodato Simó
EM: asp16 [ykwim] alu.ua.es | PK: DA6AE621
 
When all is summed up, a man never speaks of himself without loss; his
accusations of himself are always believed; his praises never.
-- Michel de Montaigne




Re: an idea for next generation APT archive caching

2004-10-21 Thread Robert Collins
On Fri, 2004-10-22 at 03:36 +0200, Tobias Hertkorn wrote:
> One bad thing (among others) that happens if you use squid - first of all 
> you have to make your clients use the proxy settings

Set it up in reverse proxy mode (way easier in 3.0) and you don't use
proxy settings - you use it as your repository. AIUI, with apt-proxy you
use it in much the same way.

>  AND way more important - 
> a request for http://yourserver/testing/.../apache...deb will not create a 
> hit if requested as http://yourserver/sid/.../apache...deb . Furthermore 
> requests to similar mirrors will not create cache hits. So everybody has to 
> use the same sources list, down to the same requests by symlinks.

A fairly simple redirector will accommodate this, but even if not done,
the pool is common - and the bulk of sid & testing updates is in the
pool.
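
Purely as an illustration, such a redirector (hooked in with squid's
redirect_program) could fold the suite/codename variants and multiple
mirror hosts onto one canonical URL; the hostname and mappings below are
invented, and the exact helper protocol should be checked against the
local squid version:

#!/bin/sh
# squid hands us one request per line ("URL client/fqdn ident method");
# we answer with the possibly rewritten URL, one line per request.
while read url rest; do
    echo "$url" | sed \
        -e 's#^http://[^/]*/debian/#http://ftp.debian.org/debian/#' \
        -e 's#/dists/unstable/#/dists/sid/#' \
        -e 's#/dists/testing/#/dists/sarge/#'
done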


Rob

-- 
GPG key available at: .




Re: an idea for next generation APT archive caching

2004-10-21 Thread Jonathan Oxer
On Thu, 2004-10-21 at 13:13 +0200, martin f krafft wrote:

> also sprach Jonathan Oxer <[EMAIL PROTECTED]> [2004.10.21.0617 +0200]:
> > So it's necessary to keep fetching the Packages files within their
> > expiry time or the cache gets nuked.
> 
> Why delete them at all?

Because then they are never re-fetched, and they'll be out of date.

Cheers  :-)

Jonathan Oxer




Re: an idea for next generation APT archive caching

2004-10-21 Thread Jonathan Oxer
On Fri, 2004-10-22 at 03:36 +0200, Tobias Hertkorn wrote:

> a request for http://yourserver/testing/.../apache...deb will not create a 
> hit if requested as http://yourserver/sid/.../apache...deb . Furthermore 
> requests to similar mirrors will not create cache hits. So everybody has to 
> use the same sources list, down to the same requests by symlinks.

Yep, that's annoying and it's one of the reasons for the flat cache
design in apt-cacher: if clients fetch the same package from different
mirrors it will cause a cache hit since the package names are the same.

However, something that was raised at Debian Miniconf2 (IIRC) was that
this allows cache poisoning: by creating a compromised package and
sticking it on any random web server, a cracker can then fetch the
package themselves through the cache and any user who subsequently
fetches it (even using a genuine mirror in the sources.list) will get
the poisoned package.

Good argument for package signatures.

Cheers  :-)

Jonathan Oxer
--
The Debian Universe: Installing, managing and using Debian GNU/Linux
http://www.debianuniverse.com/




Re: an idea for next generation APT archive caching

2004-10-21 Thread Tobias Hertkorn
One bad thing (among others) that happens if you use squid - first of all 
you have to make your clients use the proxy settings AND way more important - 
a request for http://yourserver/testing/.../apache...deb will not create a 
hit if requested as http://yourserver/sid/.../apache...deb . Furthermore 
requests to similar mirrors will not create cache hits. So everybody has to 
use the same sources list, down to the same requests by symlinks.
You can make squid better than the default, but excellent for clients that 
are not owned by the very same admin? I doubt it.

Greets,
Tobi
- Original Message - 
From: "Robert Collins" <[EMAIL PROTECTED]>
To: "Chris Halls" <[EMAIL PROTECTED]>
Cc: 
Sent: Friday, October 22, 2004 1:56 AM
Subject: Re: an idea for next generation APT archive caching

Caching for concurrent clients is non-trivial :). There's not a lot squid
would need done to make it an excellent archive-specific cache:




Re: an idea for next generation APT archive caching

2004-10-21 Thread Robert Collins
On Thu, 2004-10-21 at 17:31 +0100, Chris Halls wrote:

> There have been quite a lot of attempts to make a better apt-proxy, but
> almost always the authors discovered the problem is rather difficult to 
> get right, especially when you start worrying about streaming while
> downloading, multiple clients downloading simultaneously and cache
> cleaning algorithms.  Based on previous attempts, I'd advise you not to
> underestimate the task.  Is apt-proxy really so broken that the only way
> to make something better is to rewrite the whole thing from scratch?

Caching for concurrent clients is non-trivial :). There's not a lot squid
would need done to make it an excellent archive-specific cache:

If we allowed cache dir selection by request tag, or just by standards,
then ACLs could trivially tag requests that should go into a dedicated
cache dir, thus preventing normal traffic from ejecting packages or archive
metadata from the cache. Beyond that, see my other email for some
settings that should make squid much better than the default for apt
caching.

Rob

-- 
GPG key available at: .




Re: an idea for next generation APT archive caching

2004-10-21 Thread Robert Collins
On Wed, 2004-10-20 at 12:11 +0200, martin f krafft wrote:
> also sprach martin f krafft <[EMAIL PROTECTED]> [2004.10.20.0211 +0200]:
> > Here's an idea I just had about apt-proxy/apt-cacher NG. Maybe this
> > could be interesting, maybe it's just crap. Your call.

> 3. squid:
>   Squid works reliably, but it has no concept of the APT repository
>   and thus it is impossible to control what is cached and for how
>   long. The release-codename symlinks can be worked around with
>   a simple rewriter, but other than that, there are three parameters
>   that seem relevant:
> 
>   maximum_object_size 131072 KB
>   cache_dir aufs /var/spool/squid-apt 1024 16 256
>   store_avg_object_size 100 Kb
> 
>   These values are what I came up with after two days of testing.
>   The problematic one is the last one. It's at 13 Kb per default,
>   and this causes squid not to reliably cache objects larger than
>   35 Mb. Increasing it to 100 Kb causes even openoffice.org to be
>   cached for some time, but the high average also causes smaller
>   files to be removed earlier than they should be.

store_avg_object_size should have no impact on what is and is not
cached. It is used to estimate the hash size required to fully populate
the cache. Having too low a value there will simply cause squid to
create a hash table that is larger than optimal: it will not enable or
prevent caching of debs, nor will it cause smaller files to be removed.

If you are using bloom inter-cache digests, the average object size
estimator is also used there to tune the bloom digest for optimal
density. Again, no impact on the local squid's caching or not of any
given object.

I usually bump my max object size up past 720MB, so that I can cache
isos.
maximum_object_size 740 MB

One of the problems with sid debs is that they are often very recent,
and delivered without cache expiry metadata, so squid's heuristic, which
looks at time since modification, gives them an inappropriately low
maximum lifetime. So let's specify 1 day minimum for debs without expiry
metadata, 1/5 of the object's age as the age-based freshness, and an
upper cap of 1 month. This is heavily geared, and if you are happy with
revalidation - i.e. your deb mirror returns the same mod time & etag
when squid checks, and you aren't trying to use this offline - then this
can be reduced. As deb names are not reused, this should be safe as is.

refresh_pattern deb$ 14400 20% 2592000

likewise the control files, but these we expect to change daily.

refresh_pattern (Packages(.gz)?|Release|Sources.gz)$ 14300 20% 14400

(I think that regex is right, haven't tested it).

You probably want heap LFUDA cache replacement policy.

cache_replacement_policy heap LFUDA


Cheers,
Rob

-- 
GPG key available at: .




Re: an idea for next generation APT archive caching

2004-10-21 Thread Manoj Srivastava
Hi,

I can mostly live with the current apt-proxy, except for the
 fact that it does not seem to want to play nice with debootstrap:
 debootstrap just hangs.

manoj
-- 
Philogyny recapitulates erogeny; erogeny recapitulates philogyny.
Manoj Srivastava   <[EMAIL PROTECTED]>  
1024D/BF24424C print 4966 F272 D093 B493 410B  924B 21BA DABB BF24 424C




Re: an idea for next generation APT archive caching

2004-10-21 Thread Chris Halls
On Wed, 2004-10-20 at 11:11, martin f krafft wrote:
> 1. apt-proxy:
>   While I love the concept of apt-proxy, it works very unreliably.
>   Frequently, the proxy fails to download the package or imposes
>   very long delays (#272217, and others).

This seems to be the result of a patch that fixed one problem and caused
another.  You can work around it for now by using this in the config
file:

disable_pipelining=1

>   If it does work, it's a performance hog. On my Opteron 3600+, my
>   mouse starts to go jaggy when more than one machine accesses the
>   cache at the same time.

Yes, this needs looking into.  If you actually have some time on your
hands to do this sort of thing, I'd encourage you to look at apt-proxy
seriously.  It hasn't had the loving attention it needs since ranty
died, although Otavio has been fixing some of the problems recently.

There have been quite a lot of attempts to make a better apt-proxy, but
almost always the authors discovered the problem is rather difficult to 
get right, especially when you start worrying about streaming while
downloading, multiple clients downloading simultaneously and cache
cleaning algorithms.  Based on previous attempts, I'd advise you not to
underestimate the task.  Is apt-proxy really so broken that the only way
to make something better is to rewrite the whole thing from scratch?

Chris




Re: an idea for next generation APT archive caching

2004-10-21 Thread Eduard Bloch
#include 
* Jonathan Oxer [Thu, Oct 21 2004, 02:17:49PM]:

> > is unstable). IMHO, the only correct way is to scan the most recently
> > downloaded Packages and Source index files and delete files that
> > aren't mentioned anymore.
> 
> That's how apt-cacher does it. Early versions of apt-cacher did no cache
> cleaning and it was the #1 requested feature for a while, but once I sat
> down to actually start implementing it I discovered something that's not
> obvious until you actually try to do it yourself: Writing Cache Expiry
> Algorithms Is Bloody Hard(TM).
> 
> In the end I settled on a combination: Packages and Release files are
> expired based on age, and .debs are purged based on reference within a

And though I like apt-cacher in general (it worked immediately while I
did not manage to make apt-proxy work within 15 minutes and dropped the
crap), this is the only method I do not like at all.
It could be done better. I suggest you switch from wget to curl and use
If-Modified-Since calls to update the Package/Source/Release file only
when needed. And only when the local copy has changed (and the update
was clean) should the deb files be purged (when the next cleanup cycle
comes). You could even check for index file updates based
on time periods instead of triggering it by the user. Actually, it could
be a cron-job that downloads them to /dev/null.
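
For instance, a conditional index fetch with curl could look roughly
like this (the paths and URL are made up; -z sends If-Modified-Since
based on the cached copy's mtime, -R keeps the server's timestamp on
the downloaded file):

#!/bin/sh
INDEX=/var/cache/apt-cacher/sid_main_Packages.gz      # hypothetical
URL=http://ftp.debian.org/debian/dists/sid/main/binary-i386/Packages.gz
TMP="$INDEX.new"

if [ -f "$INDEX" ]; then
    curl -sfR -z "$INDEX" -o "$TMP" "$URL"
else
    curl -sfR -o "$TMP" "$URL"
fi

if [ -s "$TMP" ]; then
    mv "$TMP" "$INDEX"     # index changed: a deb cleanup cycle makes sense now
else
    rm -f "$TMP"           # 304 Not Modified (or error): keep the old copy
fi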

Further, I wish there could be pre-caching. Means: if a file was
downloaded and that file was mentioned in packages-file A, and after the
next update A has a newer version of this package, then the new version
could be downloaded. This would be an optional feature, of course, but
it could be implemented without millions of LOC.

Regards,
Eduard.
-- 
Wer die Form zerstört, beschädigt auch den Inhalt.
-- Herbert von Karajan




Re: an idea for next generation APT archive caching

2004-10-21 Thread martin f krafft
also sprach Tobias Hertkorn <[EMAIL PROTECTED]> [2004.10.20.1449 +0200]:
> As a part of speeding up delivery, I programmed an apache module called 
> mirror_mod.
> For documentation look here:
> http://hacktor.fs.uni-bayreuth.de/apt-got/docs/mod_mirror.html

Interesting! Though I am not sure this would solve my problem...



also sprach Brian May <[EMAIL PROTECTED]> [2004.10.21.0504 +0200]:
> I have looked at (and flamed) apt-proxy in particular, but
> I suspect at least some of the issues here might also be relevant
> to other caching packages.
> 
> If you want a reliable caching service, I think some thought needs
> to be put into some of the issues above. Some issues might be easy
> to fix, others might be harder (e.g. minimizing latency so the
> client doesn't time out and to minimize download time but choosing
> the best server at the same time).

I realise, and I don't want to bash apt-proxy for being ambitious!

However, I am a big fan of simple solutions, and what I have in
mind just sounds like it's a whole lot easier than a separate
daemon. Simplicity -> robustness, that is all...



also sprach Jonathan Oxer <[EMAIL PROTECTED]> [2004.10.21.0617 +0200]:
> So it's necessary to keep fetching the Packages files within their
> expiry time or the cache gets nuked.

Why delete them at all?

Also, if there is no Packages file, then the deletion should not
proceed.

-- 
Please do not CC me when replying to lists; I read them!
 
 .''`. martin f. krafft <[EMAIL PROTECTED]>
: :'  :proud Debian developer, admin, and user
`. `'`
  `-  Debian - when you have better things to do than fixing a system
 
Invalid/expired PGP subkeys? Use subkeys.pgp.net as keyserver!




Re: an idea for next generation APT archive caching

2004-10-20 Thread Jonathan Oxer
On Thu, 2004-10-21 at 13:04 +1000, Brian May wrote:

> * No thought put into the file deletion algorithm. IMHO, deleting
> files based on age is wrong (consider how long stable files
> last). Deleting files based on number of different copies is also
> wrong (consider if you have some systems setup with stable and another
> is unstable). IMHO, the only correct way is to scan the most recently
> downloaded Packages and Source index files and delete files that
> aren't mentioned anymore.

That's how apt-cacher does it. Early versions of apt-cacher did no cache
cleaning and it was the #1 requested feature for a while, but once I sat
down to actually start implementing it I discovered something that's not
obvious until you actually try to do it yourself: Writing Cache Expiry
Algorithms Is Bloody Hard(TM).

In the end I settled on a combination: Packages and Release files are
expired based on age, and .debs are purged based on reference within a
Packages file. However, that's not a 100% solution either because what
happens if several days go by without any clients doing an 'apt-get
update'? The Packages file is purged by the cache cleaning script
because it's too old, but then all the .debs are purged too because
there's no matching Packages file! Doh.

So it's necessary to keep fetching the Packages files within their
expiry time or the cache gets nuked.

> If you want a reliable caching service, I think some thought needs to
> be put into some of the issues above. Some issues might be easy to
> fix, others might be harder (e.g. minimizing latency so the client
> doesn't time out and to minimize download time but choosing the best
> server at the same time).

I haven't looked at them for this purpose in detail but I still think
p2p systems are a natural for this. Layering .deb package retrieval onto
BitTorrent or similar would rock. I'm sure others know much more about
the issues though.

> You mean via HTTP? This would be possible to add, I think. I guess it
> hasn't been considered a priority.

Not necessarily, it depends on the cache architecture. Trying to do this
with apt-cacher, for example, would suck mightily because it uses a flat
cache structure. What's really needed to make it trivially browseable is
a cache that stores objects in a structure that mimics the original
mirror structure. My understanding is that apt-proxy v2 was written with
this in mind, but as usual I'm probably wrong.

Cheers  :-)

Jonathan Oxer
--
The Debian Universe: Installing, managing and using Debian GNU/Linux
http://www.debianuniverse.com/




Re: an idea for next generation APT archive caching

2004-10-20 Thread Brian May
> "martin" == martin f krafft <[EMAIL PROTECTED]> writes:

martin> 1. apt-proxy: While I love the concept of apt-proxy, it
martin> works very unreliably.  Frequently, the proxy fails to
martin> download the package or imposes very long delays (#272217,
martin> and others).

apt-proxy is generally good. I use it all the time here, and my
current ISP uses it too (which means I can download Debian packages
without it counting towards my usage quota!).

It has some limitations though (sorry, I haven't read the bug reports;
I am not sure if these are reported). I don't consider these show
stoppers, but they can be irritating at times.

* Sometimes if all servers return file not found (e.g. because the
package is obsolete and has been deleted), the file not found is not
passed on to the client. Instead the client times out. Sometimes. When
I test it out in order to write bug reports, it works fine. Arrghh!

* Can't cope very well if a server fails, and it will repeatedly try to
contact the dead server for every request, meaning the client is likely
to time out even if the 2nd server listed is OK.

* If the above point wasn't bad enough by itself, the apt-proxy binary has 
hard coded:

WGET_CMD="$WGET --timestamping --no-host-directories --tries=5 --no-directories 
-P $DL_DESTDIR"

So if the client doesn't time out after the first try, it will time out
after 5 more tries. This is insane! I hacked my binary to remove this
parameter. Also the 30 second timeout seems like a while, but at least this
is configurable (and may actually be good if the upstream server is
running apt-proxy, in case the upstream server encounters delays).

* No thought put into the file deletion algorithm. IMHO, deleting
files based on age is wrong (consider how long stable files
last). Deleting files based on number of different copies is also
wrong (consider if you have some systems setup with stable and another
is unstable). IMHO, the only correct way is to scan the most recently
downloaded Packages and Source index files and delete files that
aren't mentioned anymore. This could be made more aggressive though if
disk space is low.

* Can't cope with out of disk space errors, it will keep trying to
download regardless, giving the client time out errors.

* Previously, due to a bug in squid, if you told apt-proxy to go via
squid, it would never get updated, as squid always returned the cached
HTTP headers which said the file hadn't changed. This in turn
indicated to wget not to download the new file. I filed a bug report
on this, and it was closed, so I assume the issue has been fixed (not
tested myself).

I have looked at (and flamed) apt-proxy in particular, but I suspect
at least some of the issues here might also be relevant to other
caching packages.

If you want a reliable caching service, I think some thought needs to
be put into some of the issues above. Some issues might be easy to
fix, others might be harder (e.g. minimizing latency so the client
doesn't time out and to minimize download time but choosing the best
server at the same time).

martin>   If it does work, it's a performance hog. On my Opteron
martin> 3600+, my mouse starts to go jaggy when more than one
martin> machine accesses the cache at the same time.

Strange. I guess I only have used one machine at a time. I haven't
heard anyone complain of similar problems before though.

martin>   I have always missed the ability to surf the local
martin> repository.

You mean via HTTP? This would be possible to add, I think. I guess it
hasn't been considered a priority.
-- 
Brian May <[EMAIL PROTECTED]>




Re: an idea for next generation APT archive caching

2004-10-20 Thread Tobias Hertkorn
Hi,
I did an advanced programming project on Debian mirroring called apt-got. It 
is a standalone java server with mirroring capability.
As a part of speeding up delivery, I programmed an apache module called 
mirror_mod.
For documentation look here:
http://hacktor.fs.uni-bayreuth.de/apt-got/docs/mod_mirror.html

The philosophy behind this approach is that mirror engine and apache plugin 
share the directories created by the mirror engine. Requests from apt-get 
are directed to the apache plugin. If the plugin finds a file at that URL, 
it gets delivered by apache. If it does not find it on the other hand, it 
sends back a redirect URL, causing apt-get to try and get the file from your 
mirror engine directly. For example apt-proxy, apt-cacher or apt-got.

Well, basically think of mod_mirror as a very lightweight and specialized 
rewriting engine.

Greets,
Tobi
- Original Message - 
From: "Adrian 'Dagurashibanipal' von Bidder" <[EMAIL PROTECTED]>
To: "debian developers" 
Sent: Wednesday, October 20, 2004 2:08 PM
Subject: Re: an idea for next generation APT archive caching

But your approach of catching apache's 404 certainly sounds interesting,
and would be extremely lightweight in the case of cache hits.  Just be 
sure to
relay 'real' 404s properly.




Re: an idea for next generation APT archive caching

2004-10-20 Thread Adrian 'Dagurashibanipal' von Bidder
On Wednesday 20 October 2004 12.11, martin f krafft wrote:

> 2. apt-cacher:
>   Also a very nice concept, I have found it rather unusable. Clients
>   would time out as the streaming does not work reliably. Also,
>   after using it for a day or two, I found 30 or more Perl zombies
>   on the system from the CGI.

With the current apt-cacher from testing, on {apache,perl,libc6}/testing, I can't 
confirm this.  apt-cacher works very reliably for me.

The only thing I'd like is to have apt-cacher appear as a 'native' Debian 
repository instead of needing to specify the full URL of the mirror 
(reason: so I can switch the mirror to be used from the apt-cacher config 
without the need to modify all the sources.lists.)

>   Here too, it is not possible to browse the repository.

Hasn't bothered me so far.


But your approach of catching apache's 404 certainly sounds interesting, and 
would be extremely lightweight in the case of cache hits.  Just be sure to 
relay 'real' 404s properly.

cheers
-- vbi

-- 
Oops




Re: an idea for next generation APT archive caching

2004-10-20 Thread paddy
Martin,

On Wed, Oct 20, 2004 at 12:11:12PM +0200, martin f krafft wrote:
> 3. squid:
>   Squid works reliably, but it has no concept of the APT repository
>   and thus it is impossible to control what is cached and for how
>   long. 

I've long wondered whether the best answer might not be to teach an existing 
proxy cache about the content, perhaps by way of some 'plug-in' interface.

>   If you have a better squid configuration for Debian, please share
>   it!

Sorry! I use apt-proxy (well, both of them, but in different contexts)

> Thus, my proposal for an apache-integrated approach, which solves
> most of the issues above and appears to me to be very simple and
> transparent.

Not to mention very entertaining, many thanks!

Regards,
Paddy
-- 
Perl 6 will give you the big knob. -- Larry Wall




Re: an idea for next generation APT archive caching

2004-10-20 Thread martin f krafft
also sprach martin f krafft <[EMAIL PROTECTED]> [2004.10.20.0211 +0200]:
> Here's an idea I just had about apt-proxy/apt-cacher NG. Maybe this
> could be interesting, maybe it's just crap. Your call.

Some people asked how this differs from existing methods. Here are
my experiences:

1. apt-proxy:
  While I love the concept of apt-proxy, it works very unreliably.
  Frequently, the proxy fails to download the package or imposes
  very long delays (#272217, and others).

  If it does work, it's a performance hog. On my Opteron 3600+, my
  mouse starts to go jaggy when more than one machine accesses the
  cache at the same time.

  I have always missed the ability to surf the local repository.

2. apt-cacher:
  Also a very nice concept, I have found it rather unusable. Clients
  would time out as the streaming does not work reliably. Also,
  after using it for a day or two, I found 30 or more Perl zombies
  on the system from the CGI.

  Here too, it is not possible to browse the repository.

3. squid:
  Squid works reliably, but it has no concept of the APT repository
  and thus it is impossible to control what is cached and for how
  long. The release-codename symlinks can be worked around with
  a simple rewriter, but other than that, there are three parameters
  that seem relevant:

  maximum_object_size 131072 KB
  cache_dir aufs /var/spool/squid-apt 1024 16 256
  store_avg_object_size 100 Kb

  These values are what I came up with after two days of testing.
  The problematic one is the last one. It's at 13 Kb per default,
  and this causes squid not to reliably cache objects larger than
  35 Mb. Increasing it to 100 Kb causes even openoffice.org to be
  cached for some time, but the high average also causes smaller
  files to be removed earlier than they should be.

  If you have a better squid configuration for Debian, please share
  it!

  Squid works, but by Murphy's law it has always pruned exactly
  those packages that I later need, which are larger than what my
  cable line can handle in a couple of seconds, and if I am under
  time pressure.

  Squid, too, does not allow browsing the archive.

Thus, my proposal for an apache-integrated approach, which solves
most of the issues above and appears to me to be very simple and
transparent.

-- 
Please do not CC me when replying to lists; I read them!
 
 .''`. martin f. krafft <[EMAIL PROTECTED]>
: :'  :proud Debian developer, admin, and user
`. `'`
  `-  Debian - when you have better things to do than fixing a system
 
Invalid/expired PGP subkeys? Use subkeys.pgp.net as keyserver!




Re: an idea for next generation APT archive caching

2004-10-20 Thread martin f krafft
also sprach martin f krafft <[EMAIL PROTECTED]> [2004.10.20.1155 +0200]:
> #!/bin/sh -e
> 
> echo 200 OK
> echo Content-type: application/x-debian-package
> echo
> 
> exec wget -O - $MIRROR/$RPATH | tee $LPATH

one might want to parse wget's error output and return 404 as before
if it returns 404. then again, in the end, this should be
implemented in perl or C/C++ anyway.

-- 
Please do not CC me when replying to lists; I read them!
 
 .''`. martin f. krafft <[EMAIL PROTECTED]>
: :'  :proud Debian developer, admin, and user
`. `'`
  `-  Debian - when you have better things to do than fixing a system
 
Invalid/expired PGP subkeys? Use subkeys.pgp.net as keyserver!




Re: an idea for next generation APT archive caching

2004-10-20 Thread martin f krafft
also sprach Michelle Konzack <[EMAIL PROTECTED]> [2004.10.20.1107 +0200]:
> Because in Apache you can manipulate error messages, it is
> possible. The only problem is the timeout of 'apt-get' if the
> requested server is not fast enough.

That's why there should be streaming going on. This has been my
major problem with apt-cacher, which I am trying to solve...

> I have done this already in a PHP script, but unfortunately my
> apt-404-proxy-cgi downloads the file and then forwards it to
> apt-get.

Yeah, this is not ideal. However, I think it would be trivial to
implement. Here's the pseudo-code. I assume that $RPATH and $LPATH
contain the path appended to a mirror and $LPATH the local
filesystem path. $MIRROR is a mirror. These variables should be set
from the CGI environment as much as possible, and MIRROR comes from
a conffile. And: this is shell... :)

#!/bin/sh -e

echo 200 OK
echo Content-type: application/x-debian-package
echo

exec wget -O - $MIRROR/$RPATH | tee $LPATH

Would this work?

> Hmmm for those files I have a cronjob, so there are always the
> newest versions available.

This is also a possibility, but will be hard to get in sync properly
with the mirrors...

> > How do you send the newly retrieved file instead of the static file
> > present on the filesystem? Essentially, this is the only need for
> > a proxy, which could be implemented with a RewriteRule and a CGI. Or
> > maybe apache can do this somehow?
> 
> ???  -  'apache' allows using a CGI to serve errors.

I am almost sure it does.

> If an error occurs, the referring URL is passed to the Error-404 CGI,
> and while the CGI downloads the file from a Debian mirror, it can
> stream the Error-404 as "application/x-debian-package" while
> rewriting the HTTP header from 404 to 200.

See above.

-- 
Please do not CC me when replying to lists; I read them!
 
 .''`. martin f. krafft <[EMAIL PROTECTED]>
: :'  :proud Debian developer, admin, and user
`. `'`
  `-  Debian - when you have better things to do than fixing a system
 
Invalid/expired PGP subkeys? Use subkeys.pgp.net as keyserver!




Re: an idea for next generation APT archive caching

2004-10-20 Thread Michelle Konzack
Hello Martin, 

On 2004-10-20 02:11:44, martin f krafft wrote:
> Here's an idea I just had about apt-proxy/apt-cacher NG. Maybe this
> could be interesting, maybe it's just crap. Your call.

:-)

> Based on a normal mirror layout, the idea is to use apache's 404
> hook for packages. When an existing package is requested, it is
> served regularly. If the file is not found, a 404 is triggered,
> which can be served by a CGI-like thingie that goes to retrieve the
> package, returns 200 instead of 404 and streams the package as the
> 404 error document contents while writing it to the filesystem
> (tee(1) style).

Really cool, I have been thinking the same thing for 2 years or
something like that.

Because in Apache you can manipulate error messages, it is possible.
The only problem is the timeout of 'apt-get' if the requested server
is not fast enough.

I have done this already in a PHP script, but unfortunately my
apt-404-proxy-cgi downloads the file and then forwards it
to apt-get.

I was not able to stream it while downloading.

> For Release, Package, Sources, and Contents files, we need
> a RewriteRule. When one of these is accessed, a call to a mirror
> should be made to check for updates. If there is one, download it
> and stream it.

Hmmm for those files I have a cronjob, so there are always the
newest versions available.

> How do you send the newly retrieved file instead of the static file
> present on the filesystem? Essentially, this is the only need for
> a proxy, which could be implemented with a RewriteRule and a CGI. Or
> maybe apache can do this somehow?

???  -  'apache' allows using a CGI to serve errors.

If an error occurs, the referring URL is passed to the Error-404 CGI,
and while the CGI downloads the file from a Debian mirror, it can
stream the Error-404 as "application/x-debian-package" while
rewriting the HTTP header from 404 to 200.

> I think this would be an extremely simple implementation, using the
> proven apache for most of the work (and not the buggy twisted module
> that apt-proxy uses). Thus, the entire thing is reduced to a couple
> of httpd.conf entries and two extremely simple (?) CGIs.

Agreed

> In addition, a cronjob runs daily to purge all files in the
> filesystem space, which are not referenced from any of the
> Packages/Sources files.

Perfectly

> This is a braindump. Please comment. Am I missing something? Would
> someone like to try this? I really don't have the time right now...

Currently no time to work on it :-(

Greetings
Michelle

-- 
Linux-User #280138 with the Linux Counter, http://counter.li.org/ 
Michelle Konzack   Apt. 917  ICQ #328449886
   50, rue de Soultz MSM LinuxMichi
0033/3/8845235667100 Strasbourg/France   IRC #Debian (irc.icq.com)




an idea for next generation APT archive caching

2004-10-19 Thread martin f krafft
Here's an idea I just had about apt-proxy/apt-cacher NG. Maybe this
could be interesting, maybe it's just crap. Your call.

Based on a normal mirror layout, the idea is to use apache's 404
hook for packages. When an existing package is requested, it is
served regularly. If the file is not found, a 404 is triggered,
which can be served by a CGI-like thingie that goes to retrieve the
package, returns 200 instead of 404 and streams the package as the
404 error document contents while writing it to the filesystem
(tee(1) style).

For Release, Package, Sources, and Contents files, we need
a RewriteRule. When one of these is accessed, a call to a mirror
should be made to check for updates. If there is one, download it
and stream it.

How do you send the newly retrieved file instead of the static file
present on the filesystem? Essentially, this is the only need for
a proxy, which could be implemented with a RewriteRule and a CGI. Or
maybe apache can do this somehow?

I think this would be an extremely simple implementation, using the
proven apache for most of the work (and not the buggy twisted module
that apt-proxy uses). Thus, the entire thing is reduced to a couple
of httpd.conf entries and two extremely simple (?) CGIs.

In addition, a cronjob runs daily to purge all files in the
filesystem space, which are not referenced from any of the
Packages/Sources files.
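
A rough sketch of such a cronjob, assuming the cache mirrors the archive
layout under an invented $CACHE (Sources indices are ignored here for
brevity and would need the same treatment):

#!/bin/sh
CACHE=/var/cache/apt-mirror-cache       # hypothetical cache root
REF=$(mktemp)

# collect every pool path still referenced by a cached Packages index
find "$CACHE" -name 'Packages.gz' | while read idx; do zcat "$idx"; done \
    | sed -n 's/^Filename: //p' | sort -u > "$REF"

# refuse to clean anything if no index was found at all
[ -s "$REF" ] || { rm -f "$REF"; exit 0; }

# delete package files that no index mentions any more
find "$CACHE/pool" \( -name '*.deb' -o -name '*.udeb' \) |
while read f; do
    grep -qxF "${f#$CACHE/}" "$REF" || rm -f "$f"
done

rm -f "$REF"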

This is a braindump. Please comment. Am I missing something? Would
someone like to try this? I really don't have the time right now...

-- 
Please do not CC me when replying to lists; I read them!
 
 .''`. martin f. krafft <[EMAIL PROTECTED]>
: :'  :proud Debian developer, admin, and user
`. `'`
  `-  Debian - when you have better things to do than fixing a system
 
Invalid/expired PGP subkeys? Use subkeys.pgp.net as keyserver!

