Re: Bug#893397: adapt the cron scripts to use http:// instead of ftp://
Hi,

On Thu, Mar 22, 2018 at 10:08:58PM +0800, Paul Wise wrote:
> On Wed, Mar 21, 2018 at 11:50 PM, Osamu Aoki wrote:
>
> > One negative point of the above approach is that it always downloads
> > the doc deb packages even if there are no changes.
>
> This will prevent downloading the same package twice:
>
>   chdist apt old install --download-only emacsen-common

Oh, that's it. I should have known ... this is simple enough. (I have
always had real chroot systems, so I never used it.)

> > Also the unpacking code is still complicated to make it future proof.
>
> Hmm, do you have any more details? Is dpkg -x not enough?

Probably enough.

> > Then I thought it may be a good idea to create a very small Debian
> > unstable chroot, then install the pertinent doc packages. For the old
> > deb from snapshot, we can wget and dpkg -i. Then this chroot can be
> > updated and dist-upgraded. This downloads all base package updates but
> > skips repeated downloads of the doc packages. Hmmm... which is worse?
>
> A chroot needs root so probably isn't going to work here.

That's fair. I now agree. Thanks for the idea.

Osamu
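A rough sketch of that workflow end to end (assuming the chdist tool from
devscripts; the "old" alias comes from the example above, while the mirror
URL and the cache path under ~/.chdist are assumptions, so adjust as
needed):

  # create a private apt tree named "old" once; no root needed
  chdist create old http://deb.debian.org/debian sid main
  chdist apt old update
  # fetches the .deb only if it is not already cached in the tree
  chdist apt old install --download-only emacsen-common
  # unpack without root, as suggested above (the cache path may differ)
  dpkg -x "$HOME"/.chdist/old/var/cache/apt/archives/emacsen-common_*.deb ./emacsen-common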
Re: Bug#893397: adapt the cron scripts to use http:// instead of ftp://
On Wed, Mar 21, 2018 at 11:50 PM, Osamu Aoki wrote:

> One negative point of the above approach is that it always downloads
> the doc deb packages even if there are no changes.

This will prevent downloading the same package twice:

  chdist apt old install --download-only emacsen-common

> Also the unpacking code is still complicated to make it future proof.

Hmm, do you have any more details? Is dpkg -x not enough?

> Then I thought it may be a good idea to create a very small Debian
> unstable chroot, then install the pertinent doc packages. For the old
> deb from snapshot, we can wget and dpkg -i. Then this chroot can be
> updated and dist-upgraded. This downloads all base package updates but
> skips repeated downloads of the doc packages. Hmmm... which is worse?

A chroot needs root so probably isn't going to work here.

-- 
bye,
pabs

https://wiki.debian.org/PaulWise
Re: Bug#893397: adapt the cron scripts to use http:// instead of ftp://
Hi,

On Wed, Mar 21, 2018 at 11:26:23AM +0800, Paul Wise wrote:
> On Tue, Mar 20, 2018 at 10:55 PM, Osamu Aoki wrote:
>
> > apt-get has a -C option to use a non-standard /etc/apt/sources.list
> > listing the unstable distribution:
>
> This is the wrong way to completely override the system apt
> configuration. The right way is to set the APT_CONFIG environment
> variable. See the apt.conf(5) manual page: APT_CONFIG is the first
> method that apt uses to find the config file and the -C command-line
> option is the very last one, so it cannot prevent loading all the
> system config. The chdist tool (from devscripts), the apt-venv package
> and the derivatives census code get this correct.

Great point!

> I definitely agree that apt-get could be used to download packages
> from repositories, especially since it will verify hashes and
> signatures.

One negative point of the above approach is that it always downloads
the doc deb packages even if there are no changes. Also the unpacking
code is still complicated to make it future proof.

Then I thought it may be a good idea to create a very small Debian
unstable chroot, then install the pertinent doc packages. For the old
deb from snapshot, we can wget and dpkg -i. Then this chroot can be
updated and dist-upgraded. This downloads all base package updates but
skips repeated downloads of the doc packages. Hmmm... which is worse?

I think that such a chroot may need a few filesystem mount points from
the parent system:

  mounting the /proc filesystem
  mounting the /sys filesystem
  creating /{dev,run}/shm
  mounting the /dev/pts filesystem
  redirecting /dev/ptmx to /dev/pts/ptmx

Osamu
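For reference, those mount points would be set up roughly like this (a
sketch only; $CHROOT is a hypothetical location, and every command needs
root, which is the objection Paul raises in his reply above):

  CHROOT=/srv/unstable-chroot                # hypothetical chroot location
  mount -t proc proc "$CHROOT/proc"          # /proc filesystem
  mount -t sysfs sysfs "$CHROOT/sys"         # /sys filesystem
  mkdir -p "$CHROOT/dev/shm" "$CHROOT/run/shm"
  mount -t devpts -o ptmxmode=0666 devpts "$CHROOT/dev/pts"
  ln -sf pts/ptmx "$CHROOT/dev/ptmx"         # redirect /dev/ptmx to /dev/pts/ptmx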
Re: Bug#893397: adapt the cron scripts to use http:// instead of ftp://
On Tue, Mar 20, 2018 at 10:55 PM, Osamu Aoki wrote:

> apt-get has a -C option to use a non-standard /etc/apt/sources.list
> listing the unstable distribution:

This is the wrong way to completely override the system apt
configuration. The right way is to set the APT_CONFIG environment
variable. See the apt.conf(5) manual page: APT_CONFIG is the first
method that apt uses to find the config file and the -C command-line
option is the very last one, so it cannot prevent loading all the
system config. The chdist tool (from devscripts), the apt-venv package
and the derivatives census code get this correct.

I definitely agree that apt-get could be used to download packages
from repositories, especially since it will verify hashes and
signatures.

-- 
bye,
pabs

https://wiki.debian.org/PaulWise
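For comparison with the -C idea, a minimal sketch of the APT_CONFIG
approach (the directory layout and the package name are illustrative
assumptions; a real setup may need a few more keys):

  base=/tmp/aptdir                    # hypothetical private apt tree
  mkdir -p $base/lists/partial $base/archives/partial $base/sources.list.d
  : > $base/status                    # empty dpkg status file
  echo 'deb http://deb.debian.org/debian/ sid main contrib non-free' > $base/sources.list
  printf '%s\n' \
    "Dir::Etc::SourceList \"$base/sources.list\";" \
    "Dir::Etc::SourceParts \"$base/sources.list.d\";" \
    "Dir::State::Lists \"$base/lists\";" \
    "Dir::State::status \"$base/status\";" \
    "Dir::Cache \"$base\";" \
    "Dir::Cache::Archives \"$base/archives\";" > $base/apt.conf
  APT_CONFIG=$base/apt.conf apt-get update
  APT_CONFIG=$base/apt.conf apt-get --download-only install emacsen-common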
Re: Bug#893397: adapt the cron scripts to use http:// instead of ftp://
Hi,

On Tue, Mar 20, 2018 at 11:55:01PM +0900, Osamu Aoki wrote:
> What you did is one way ;-)
>
> On Mon, Mar 19, 2018 at 03:27:25PM +0100, Laura Arjona Reina wrote:
...
> > pattern specified with -A. Maybe there is a more efficient way to do
> > this?
>
> Most wgetfiles calls download the latest unstable binary packages.
> Why re-invent the binary package downloader when we have "apt-get"?
>
> apt-get has a -C option to use a non-standard /etc/apt/sources.list
> listing the unstable distribution:
>
>   deb http://deb.debian.org/debian/ sid main contrib non-free
>
> We need to use non-standard directories so as not to contaminate the
> system ones:
>
>   /var/cache/apt/archives/
>   /var/lib/apt/lists/
>
> This can be done by setting these items via the -o option:
>
>   Dir::Cache::Archives
>   Dir::State::Lists
>
> By setting all these and a few more options as needed, we should be
> able to download the latest binary package from the archive using the
> proven tool.

Just to be clear, in the previous post I was thinking of doing the
following to get the latest binary package:

  apt-get update
  apt-get -d install $PACKAGENAME

Then go to the redirected /var/cache/apt/archives/ location to pick up
the latest package. This is just an outline; a few more chores may be
needed to get this working.

> Of course, the obsolete dpkg-doc from snapshot should use the original
> wgetfiles.
>
> Installation guide: I need to check how it should be handled.
>
> What do you think?
>
> Regards,
>
> Osamu
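Spelled out, that outline might look like this (a bare sketch; the
directory is hypothetical, it still reads the system sources.list unless
that is redirected too, and as noted above a few more chores may remain):

  base=$PWD/aptfiles
  mkdir -p $base/lists/partial $base/archives/partial
  opts="-o Dir::State=$base -o Dir::Cache=$base \
        -o Dir::State::Lists=$base/lists -o Dir::Cache::Archives=$base/archives"
  apt-get $opts update
  apt-get $opts -d install $PACKAGENAME
  ls $base/archives/                  # pick up the latest .deb here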
Re: Bug#893397: adapt the cron scripts to use http:// instead of ftp://
What you did is one way ;-)

On Mon, Mar 19, 2018 at 03:27:25PM +0100, Laura Arjona Reina wrote:
> Hello
> I've been doing some tests and this is what I have, for now:
>
> * the current script is in:
> https://anonscm.debian.org/cgit/debwww/cron.git/tree/parts/1ftpfiles
> (and it works, not sure when/if it will stop working...).
...
> This seems to work (I've run the script locally and later checked that
> the files were downloaded in /srv/www.debian.org/cron/ftpfiles), but
> it needs improvements, because:
>
> * I've worked around robots.txt with "-e robots=off", but I guess that
> this is not the correct/elegant/respectful way?
>
> * wget downloads all the files and then removes the ones that don't
> match the pattern specified with -A. Maybe there is a more efficient
> way to do this?

Most wgetfiles calls download the latest unstable binary packages.
Why re-invent the binary package downloader when we have "apt-get"?

apt-get has a -C option to use a non-standard /etc/apt/sources.list
listing the unstable distribution:

  deb http://deb.debian.org/debian/ sid main contrib non-free

We need to use non-standard directories so as not to contaminate the
system ones:

  /var/cache/apt/archives/
  /var/lib/apt/lists/

This can be done by setting these items via the -o option:

  Dir::Cache::Archives
  Dir::State::Lists

By setting all these and a few more options as needed, we should be
able to download the latest binary package from the archive using the
proven tool.

Of course, the obsolete dpkg-doc from snapshot should use the original
wgetfiles.

Installation guide: I need to check how it should be handled.

What do you think?

Regards,

Osamu
Re: Bug#893397: adapt the cron scripts to use http:// instead of ftp://
Hello

I've been doing some tests and this is what I have, for now:

* the current script is in:
https://anonscm.debian.org/cgit/debwww/cron.git/tree/parts/1ftpfiles
(and it works, not sure when/if it will stop working...).

* There, changing ftp:// to http:// is not enough, because we were
getting files using wildcards (which I guess worked for ftp:// but does
not work for http://).

* The relevant code is a function "wgetfiles" which is called in 3 ways:

wgetfiles "" "" ftp://${ftpsite}/debian/doc/ 2 doc
wgetfiles emacsen-common emacsen-common_*.deb
wgetfiles dpkg dpkg-doc_1.9.21_all.deb http://snapshot.debian.org/archive/debian/20050312T00Z/pool/main 7

And the specific wget call inside the function wgetfiles is, currently:

wget --timeout=60 --quiet --recursive --timestamping --no-host-directories \
	--cut-dirs=${cutdirs} --directory-prefix=${prefix} \
	${ftpurl}/${initial}/${namesrc}/${namebin}

I've learned that with wget we can use wildcards with --recursive and
-A "pattern", and we should probably add --no-parent to keep the
recursion from wandering to other places. So I've transformed the wget
call into this:

wget -e robots=off --no-parent --timeout=60 --recursive --timestamping --no-host-directories \
	--cut-dirs=${cutdirs} --directory-prefix=${prefix} \
	--reject "*.html*" -A "${namebin}" ${ftpurl}/${initial}/${namesrc}/

(see also the complete diff and the 'new' 1ftpfiles, attached).

This seems to work (I've run the script locally and later checked that
the files were downloaded in /srv/www.debian.org/cron/ftpfiles), but it
needs improvements, because:

* I've worked around robots.txt with "-e robots=off", but I guess that
this is not the correct/elegant/respectful way?

* wget downloads all the files and then removes the ones that don't
match the pattern specified with -A. Maybe there is a more efficient
way to do this?

Cheers
-- 
Laura Arjona Reina
https://wiki.debian.org/LauraArjona

[attachment: the diff]

diff --git a/parts/1ftpfiles b/parts/1ftpfiles
index f4fcade..3bd4229 100755
--- a/parts/1ftpfiles
+++ b/parts/1ftpfiles
@@ -14,7 +14,7 @@ wget --timeout=60 --quiet --timestamping http://${ftpsite}/debian/indices/Maintainers
 [ -d $webtopdir/webwml/english/devel/wnpp ] || mkdir -p $webtopdir/webwml/english/devel/wnpp
 ln -sf $crondir/ftpfiles/Maintainers $webtopdir/webwml/english/devel/wnpp/Maintainers
 
-ftpurlmain=ftp://${ftpsite}/debian/pool/main
+ftpurlmain=http://${ftpsite}/debian/pool/main
 
 wgetfiles() {
 namesrc=$1	# source package name: dpkg
@@ -24,9 +24,9 @@ cutdirs=${4:-5}	# number of / in to drop + 3: default 5
 prefix=${5:-pool}	# download directory
 echo -n " ${namesrc}"
 initial=$(echo "${namesrc}"|sed -e "s/^\(.\).*$/\1/")
-wget --timeout=60 --quiet --recursive --timestamping --no-host-directories \
+wget -e robots=off --no-parent --timeout=60 --recursive --timestamping --no-host-directories \
 	--cut-dirs=${cutdirs} --directory-prefix=${prefix} \
-	${ftpurl}/${initial}/${namesrc}/${namebin}
+	--reject "*.html*" -A "${namebin}" ${ftpurl}/${initial}/${namesrc}/
 }
 
 # needed for 7doc_updates
@@ -34,7 +34,7 @@ wget --timeout=60 --quiet --recursive --timestamping --no-host-directories \
 
 # Refresh $crondir/ftpfiles/doc
 rm -rf $crondir/ftpfiles/doc
-wgetfiles "" "" ftp://${ftpsite}/debian/doc/ 2 doc
+wgetfiles "" "" http://${ftpsite}/debian/doc/ 2 doc
 
 # Refresh $crondir/ftpfiles/pool
 rm -rf $crondir/ftpfiles/pool

[attachment: the 'new' 1ftpfiles]

#!/bin/sh -e
# this script fetches some stuff from the FTP site that's needed
# and puts it in /srv/www.debian.org/cron/ftpfiles

. `dirname $0`/../common.sh

[ -d $crondir/ftpfiles ] || mkdir -p $crondir/ftpfiles
cd $crondir/ftpfiles

ftpsite=ftp.de.debian.org

# needed for WNPP, webwml/english/devel/wnpp/wnpp.pl
wget --timeout=60 --quiet --timestamping http://${ftpsite}/debian/indices/Maintainers
[ -d $webtopdir/webwml/english/devel/wnpp ] || mkdir -p $webtopdir/webwml/english/devel/wnpp
ln -sf $crondir/ftpfiles/Maintainers $webtopdir/webwml/english/devel/wnpp/Maintainers

ftpurlmain=http://${ftpsite}/debian/pool/main

wgetfiles() {
namesrc=$1	# source package name: dpkg
namebin=$2	# binary package name (glob): dpkg-doc_*.deb
ftpurl=${3:-$ftpurlmain}	# ${ftpsite}/
cutdirs=${4:-5}	# number of / in to drop + 3: default 5
prefix=${5:-pool}	# download directory
echo -n " ${namesrc}"
initial=$(echo "${namesrc}"|sed -e "s/^\(.\).*$/\1/")
wget -e robots=off --no-parent --timeout=60 --recursive --timestamping --no-host-directories \
	--cut-dirs=${cutdirs} --directory-prefix=${prefix} \
	--reject "*.html*" -A "${namebin}" ${ftpurl}/${initial}/${namesrc}/
}

# needed for 7doc_updates
# this is FTP because otherwise we get all those ugly HTML thingies

# Refresh $crondir/ftpfiles/doc
rm -rf $crondir/ftpfiles/doc
wgetfiles "" "" http://${ftpsite}/debian/doc/ 2 doc

# Refresh $crondir/ftpfiles/pool
rm -rf $crondir/ftpfiles/pool
wgetfiles emacsen-common emacsen-common_*.deb
wgetfiles dpkg dpkg-doc_1.9.21_all.deb http://snapshot.debian.org/archive/debian/20050312T00Z/pool/main 7
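To make the defaults concrete: with ftpsite=ftp.de.debian.org, the call
"wgetfiles emacsen-common emacsen-common_*.deb" above expands to roughly
this single invocation (initial=e, cutdirs=5, prefix=pool):

  wget -e robots=off --no-parent --timeout=60 --recursive --timestamping --no-host-directories \
	--cut-dirs=5 --directory-prefix=pool \
	--reject "*.html*" -A "emacsen-common_*.deb" \
	http://ftp.de.debian.org/debian/pool/main/e/emacsen-common/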