Re: Bug#893397: adapt the cron scripts to use http:// instead of ftp://

2018-03-25 Thread Osamu Aoki
Hi,
On Thu, Mar 22, 2018 at 10:08:58PM +0800, Paul Wise wrote:
> On Wed, Mar 21, 2018 at 11:50 PM, Osamu Aoki wrote:
> 
> > One negative point of the above approach is that it always downloads the doc
> > deb packages even if there are no changes.
> 
> This will prevent downloading the same package twice:
> 
> chdist apt old install --download-only emacsen-common

Oh, that's it.  I should have known ... this is simple enough.
(I have always had real chroot systems, so I never used it.)

> > Also, the unpacking code is still complicated to make future-proof.
> 
> Hmm, do you have any more details? Is dpkg -x not enough?

Probably enough.
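
Something like this should do, I guess (just a rough sketch; the chdist tree
name "old", its default data directory under ~/.chdist, and the target
directory are assumptions on my side):

 # extract the downloaded deb from the chdist cache into a scratch directory
 dpkg -x ~/.chdist/old/var/cache/apt/archives/emacsen-common_*.deb doc-tmp/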

> > Then I thought it might be a good idea to create a very small Debian unstable
> > chroot and install the pertinent doc packages in it.  For old debs from
> > snapshot, we can wget and dpkg -i.  Then this chroot can be updated and
> > dist-upgraded.  This downloads all base package updates but skips the
> > repeated download of the doc packages.  Hmmm... which is worse?
> 
> A chroot needs root so probably isn't going to work here.

That's fair.

So now I agree.  Thanks for the idea.

Osamu



Re: Bug#893397: adapt the cron scripts to use http:// instead of ftp://

2018-03-22 Thread Paul Wise
On Wed, Mar 21, 2018 at 11:50 PM, Osamu Aoki wrote:

> One negative point of the above approach is that it always downloads the doc
> deb packages even if there are no changes.

This will prevent downloading the same package twice:

chdist apt old install --download-only emacsen-common
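
For completeness, the tree would need to be created and refreshed first,
roughly like this (assuming a tree named "old" and the deb.debian.org mirror):

chdist create old http://deb.debian.org/debian sid main
chdist apt old update
chdist apt old install --download-only emacsen-common

The downloaded debs should then end up under the chdist data directory
(by default ~/.chdist/old/var/cache/apt/archives/).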

> Also, the unpacking code is still complicated to make future-proof.

Hmm, do you have any more details? Is dpkg -x not enough?

> Then I thought it might be a good idea to create a very small Debian unstable
> chroot and install the pertinent doc packages in it.  For old debs from
> snapshot, we can wget and dpkg -i.  Then this chroot can be updated and
> dist-upgraded.  This downloads all base package updates but skips the
> repeated download of the doc packages.  Hmmm... which is worse?

A chroot needs root so probably isn't going to work here.

-- 
bye,
pabs

https://wiki.debian.org/PaulWise



Re: Bug#893397: adapt the cron scripts to use http:// instead of ftp://

2018-03-21 Thread Osamu Aoki
Hi,

On Wed, Mar 21, 2018 at 11:26:23AM +0800, Paul Wise wrote:
> On Tue, Mar 20, 2018 at 10:55 PM, Osamu Aoki wrote:
> 
> > apt-get has -C option to use non-standard /etc/apt/sources.list
> > listing unstable distribution:
> 
> This is the wrong way to completely override the system apt
> configuration. The right way is to set the APT_CONFIG environment
> variable. See the apt.conf(5) manual page, APT_CONFIG is the first
> method that apt uses to find the config file and the -C command-line
> option is the very last one so it cannot prevent loading all the
> system config. The chdist tool (from devscripts), the apt-venv package
> and the derivatives census code get this correct.

Great point!
 
> I definitely agree that apt-get could be used to download packages
> from repositories, especially since it will verify hashes and
> signatures.

One negative point of the above approach is that it always downloads the doc
deb packages even if there are no changes.  Also, the unpacking code is still
complicated to make future-proof.

Then I thought it might be a good idea to create a very small Debian unstable
chroot and install the pertinent doc packages in it.  For old debs from
snapshot, we can wget and dpkg -i.  Then this chroot can be updated and
dist-upgraded.  This downloads all base package updates but skips the
repeated download of the doc packages.  Hmmm... which is worse?

I think that the chroot may need a few filesystem mounts from the parent system:
 mounting /proc filesystem
 mounting /sys filesystem
 creating /{dev,run}/shm
 mounting /dev/pts filesystem
 redirecting /dev/ptmx to /dev/pts/ptmx

Osamu



Re: Bug#893397: adapt the cron scripts to use http:// instead of ftp://

2018-03-20 Thread Paul Wise
On Tue, Mar 20, 2018 at 10:55 PM, Osamu Aoki wrote:

> apt-get has -C option to use non-standard /etc/apt/sources.list
> listing unstable distribution:

This is the wrong way to completely override the system apt
configuration. The right way is to set the APT_CONFIG environment
variable. See the apt.conf(5) manual page, APT_CONFIG is the first
method that apt uses to find the config file and the -C command-line
option is the very last one so it cannot prevent loading all the
system config. The chdist tool (from devscripts), the apt-venv package
and the derivatives census code get this correct.
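
Roughly, that means a dedicated apt.conf that points everything at its own
tree, for example (paths purely illustrative, and the status file can just be
an empty file):

 Dir "/srv/doc-cron/apt/";
 Dir::Etc::sourcelist "/srv/doc-cron/apt/sources.list";
 Dir::State "/srv/doc-cron/apt/state/";
 Dir::State::status "/srv/doc-cron/apt/state/status";
 Dir::Cache "/srv/doc-cron/apt/cache/";

and then (possibly after creating the lists/ and archives/ directories by hand):

 APT_CONFIG=/srv/doc-cron/apt/apt.conf apt-get update
 APT_CONFIG=/srv/doc-cron/apt/apt.conf apt-get --download-only install emacsen-common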

I definitely agree that apt-get could be used to download packages
from repositories, especially since it will verify hashes and
signatures.

-- 
bye,
pabs

https://wiki.debian.org/PaulWise



Re: Bug#893397: adapt the cron scripts to use http:// instead of ftp://

2018-03-20 Thread Osamu Aoki
Hi,

On Tue, Mar 20, 2018 at 11:55:01PM +0900, Osamu Aoki wrote:
> What you did is one way ;-)
> 
> On Mon, Mar 19, 2018 at 03:27:25PM +0100, Laura Arjona Reina wrote:
...
> > pattern specified with -A. Maybe there is a more efficient way to do this?
> 
> Most wgetfiles are downloading the latest unstable binary packages.
> Why re-invent the binary package downloader when we have "apt-get"?
> 
> apt-get has -C option to use non-standard /etc/apt/sources.list
> listing unstable distribution:
> 
> deb http://deb.debian.org/debian/ sid main contrib non-free
> 
> We need to use non-standard directories and not to contaminate the
> system ones:
>   /var/cache/apt/archives/
>   /var/lib/apt/lists/
> 
> This can be done by setting the configuration items
>   Dir::Cache::Archives
>   Dir::State::Lists
> via the -o option.
> 
> By setting all these and a few more options as needed, we should be
> able to download the latest binary package from the archive using the
> proven tool.

Just to be clear, in the previous post I was thinking of doing the following
to get the latest binary package:

 apt-get update
 apt-get -d install $PACKAGENAME

Then go to the redirected /var/cache/apt/archives/ location to pick up the
latest package.

This is just an outline.  A few more chores may be needed to get
this working.
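
Spelled out a little more (rough and untested; the aptfiles directory and its
layout are just placeholders), it could look like:

 aptdir=$crondir/aptfiles
 aptopts="-o Dir::Etc::SourceList=$aptdir/sources.list \
  -o Dir::State::Lists=$aptdir/lists \
  -o Dir::Cache::Archives=$aptdir/archives"
 # refresh the private lists, download the deb, then pick it up from the
 # redirected archives directory
 apt-get $aptopts update
 apt-get $aptopts -d install $PACKAGENAME
 cp $aptdir/archives/${PACKAGENAME}_*.deb $crondir/ftpfiles/pool/
 # running this unprivileged probably needs a couple more overrides
 # (dpkg status file, lock files); those are the extra chores.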

> Of course, the obsolete dpkg-doc from snapshot should use the original wgetfiles.
> 
> Installation guide: I need to check how it should be handled.
> 
> What do you think?
> 
> Regards,
> 
> Osamu
> 



Re: Bug#893397: adapt the cron scripts to use http:// instead of ftp://

2018-03-20 Thread Osamu Aoki
What you did is one way ;-)

On Mon, Mar 19, 2018 at 03:27:25PM +0100, Laura Arjona Reina wrote:
> Hello
> I've been doing some tests and this is what I have, for now:
> 
> * the current script is in:
> https://anonscm.debian.org/cgit/debwww/cron.git/tree/parts/1ftpfiles
> (and it works, not sure when/if it will stop working...).
...
> This seems to work (I've run the script locally and later checked that the
> files were downloaded in /srv/www.debian.org/cron/ftpfiles), but it needs
> improvements, because:
> 
> * I've worked around the robots.txt with "-e robots=off" but I guess this is
> not the correct/elegant/respectful way?
> 
> * wget downloads all the files and then removes the ones that don't match the
> pattern specified with -A. Maybe there is a more efficient way to do this?

Most wgetfiles are downloading the latest unstable binary packages.
Why re-invent the binary package downloader when we have "apt-get"?

apt-get has -C option to use non-standard /etc/apt/sources.list
listing unstable distribution:

deb http://deb.debian.org/debian/ sid main contrib non-free

We need to use non-standard directories and not to contaminate the
system ones:
  /var/cache/apt/archives/
  /var/lib/apt/lists/

This can be done by setting the configuration items
  Dir::Cache::Archives
  Dir::State::Lists
via the -o option.

By setting all these and a few more options as needed, we should be
able to download the latest binary package from the archive using the
proven tool.
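
For example, something along these lines (a rough sketch, untested; the file
locations are only placeholders):

 # private sources.list, e.g. $crondir/aptfiles/sources.list
 deb http://deb.debian.org/debian/ sid main contrib non-free

 # refresh the private package lists without touching the system ones
 apt-get -o Dir::Etc::SourceList=$crondir/aptfiles/sources.list \
  -o Dir::State::Lists=$crondir/aptfiles/lists \
  -o Dir::Cache::Archives=$crondir/aptfiles/archives \
  update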

Of course, the obsolete dpkg-doc from snapshot should use the original wgetfiles.

Installation guide: I need to check how it should be handled.

What do you think?

Regards,

Osamu



Re: Bug#893397: adapt the cron scripts to use http:// instead of ftp://

2018-03-19 Thread Laura Arjona Reina
Hello
I've been doing some tests and this is what I have, for now:

* the current script is in:
https://anonscm.debian.org/cgit/debwww/cron.git/tree/parts/1ftpfiles
(and it works, not sure when/if it will stop working...).

* There, changing ftp:// to http:// is not enough, because we were getting
files using wildcards (which I guess worked for ftp:// but does not work for
http://).

* The relevant code is a function "wgetfiles", which is called in 3 ways:

wgetfiles "" ""  ftp://${ftpsite}/debian/doc/ 2 doc
wgetfiles emacsen-common emacsen-common_*.deb
wgetfiles dpkg dpkg-doc_1.9.21_all.deb http://snapshot.debian.org/archive/debian/20050312T00Z/pool/main 7

And the specific wget call inside the function wgetfiles is, currently:

wget --timeout=60 --quiet --recursive  --timestamping --no-host-directories \
  --cut-dirs=${cutdirs} --directory-prefix=${prefix} \
  ${ftpurl}/${initial}/${namesrc}/${namebin}

I've learned that with wget we can use wildcards with --recursive and -A
"pattern", and that we should probably add --no-parent to keep the recursion
from going to other places.

Then I've transformed the wget call into this:

wget -e robots=off --no-parent --timeout=60  --recursive  --timestamping --no-host-directories \
 --cut-dirs=${cutdirs} --directory-prefix=${prefix} \
 --reject "*.html*" -A "${namebin}" ${ftpurl}/${initial}/${namesrc}/

(see also the complete diff and the 'new' 1ftpfiles, attached).

This seems to work (I've run the script locally and later checked that the
files were downloaded in /srv/www.debian.org/cron/ftpfiles), but it needs
improvements, because:

* I've worked around the robots.txt with "-e robots=off" but I guess this is
not the correct/elegant/respectful way?

* wget downloads all the files and then removes the ones that don't match the
pattern specified with -A. Maybe there is a more efficient way to do this?


Cheers

-- 
Laura Arjona Reina
https://wiki.debian.org/LauraArjona
diff --git a/parts/1ftpfiles b/parts/1ftpfiles
index f4fcade..3bd4229 100755
--- a/parts/1ftpfiles
+++ b/parts/1ftpfiles
@@ -14,7 +14,7 @@ wget --timeout=60 --quiet --timestamping http://${ftpsite}/debian/indices/Mainta
 [ -d $webtopdir/webwml/english/devel/wnpp ] || mkdir -p $webtopdir/webwml/english/devel/wnpp
 ln -sf $crondir/ftpfiles/Maintainers $webtopdir/webwml/english/devel/wnpp/Maintainers
 
-ftpurlmain=ftp://${ftpsite}/debian/pool/main
+ftpurlmain=http://${ftpsite}/debian/pool/main
 wgetfiles()
 {
 namesrc=$1 # source package name:  dpkg
@@ -24,9 +24,9 @@ cutdirs=${4:-5} # number of / in  to drop + 3: default 5
 prefix=${5:-pool} # download directory
 echo -n " ${namesrc}"
 initial=$(echo "${namesrc}"|sed -e "s/^\(.\).*$/\1/")
-wget --timeout=60 --quiet --recursive  --timestamping --no-host-directories \
+wget -e robots=off --no-parent --timeout=60  --recursive  --timestamping --no-host-directories \
  --cut-dirs=${cutdirs} --directory-prefix=${prefix} \
- ${ftpurl}/${initial}/${namesrc}/${namebin}
+ --reject "*.html*" -A "${namebin}" ${ftpurl}/${initial}/${namesrc}/
 }
 
 # needed for 7doc_updates
@@ -34,7 +34,7 @@ wget --timeout=60 --quiet --recursive  --timestamping --no-host-directories \
 
 # Refresh $crondir/ftpfiles/doc
 rm -rf $crondir/ftpfiles/doc
-wgetfiles "" ""  ftp://${ftpsite}/debian/doc/ 2 doc
+wgetfiles "" ""  http://${ftpsite}/debian/doc/ 2 doc
 
 # Refresh $crondir/ftpfiles/pool
 rm -rf $crondir/ftpfiles/pool
#!/bin/sh -e

# this script fetches some stuff from the FTP stuff that's needed
# and puts it in /srv/www.debian.org/cron/ftpfiles

. `dirname $0`/../common.sh

[ -d $crondir/ftpfiles ] || mkdir -p $crondir/ftpfiles
cd $crondir/ftpfiles
ftpsite=ftp.de.debian.org

# needed for WNPP, webwml/english/devel/wnpp/wnpp.pl
wget --timeout=60 --quiet --timestamping http://${ftpsite}/debian/indices/Maintainers
[ -d $webtopdir/webwml/english/devel/wnpp ] || mkdir -p $webtopdir/webwml/english/devel/wnpp
ln -sf $crondir/ftpfiles/Maintainers $webtopdir/webwml/english/devel/wnpp/Maintainers

ftpurlmain=http://${ftpsite}/debian/pool/main
wgetfiles()
{
namesrc=$1 # source package name:  dpkg
namebin=$2 # binary package name (glob):   dpkg-doc_*.deb
ftpurl=${3:-$ftpurlmain}  # ${ftpsite}/
cutdirs=${4:-5} # number of / in  to drop + 3: default 5
prefix=${5:-pool} # download directory
echo -n " ${namesrc}"
initial=$(echo "${namesrc}"|sed -e "s/^\(.\).*$/\1/")
wget -e robots=off --no-parent --timeout=60  --recursive  --timestamping --no-host-directories \
 --cut-dirs=${cutdirs} --directory-prefix=${prefix} \
 --reject "*.html*" -A "${namebin}" ${ftpurl}/${initial}/${namesrc}/
}

# needed for 7doc_updates
# this is FTP because otherwise we get all those ugly HTML thingies

# Refresh $crondir/ftpfiles/doc
rm -rf $crondir/ftpfiles/doc
wgetfiles "" ""  http://${ftpsite}/debian/doc/ 2 doc

# Refresh $crondir/ftpfiles/pool
rm -rf $crondir/ftpfiles/pool
wgetfiles emacsen-common emacsen-common_*.deb
wgetfiles