Package: debmirror
Version: 1:2.14
Severity: wishlist
Tags: patch

I'd like to be able to include or exclude packages by arbitrary fields
in the Packages file.  --exclude-deb-section and --limit-priority
provide limited forms of this; I'd like to have something more general.

My particular use case is that at some point soonish I intend to
collapse Ubuntu's "main" and "universe" components down into simply
"main", which would then be properly analogous to Debian main; but I
keep a local mirror on the wrong end of a rather slow ADSL line, and I
would like to be able to mirror something that roughly corresponds to
what I currently mirror, rather than something about five times as
large.  However, I can imagine similar cases in Debian too, particularly
with the Tag field, and when we used to have a Task field it would have
been useful for that too.  I don't think it makes sense to introduce
even more irregularly-named options for specific fields, but I think it
would make sense to have something generalised.

The semantics of this patch might be a matter for debate.  I opted to
take the approach where if you just say --include-field=Foo=bar and
nothing else, then debmirror will only mirror packages matching that
inclusion; I felt this was the most convenient approach.  However, I can
imagine something more rsync-like, where you would have to say
--exclude-field=Foo= explicitly to exclude everything else.  Let me know
what you think.
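
As a rough illustration of those semantics, here is a minimal standalone
sketch that mirrors the logic of check_field_filters from the patch below.
The name stanza_permitted, the passing of the filter hashes as arguments,
and the two sample stanzas are all made up for this example; they are not
part of debmirror.

```perl
use strict;
use warnings;

# Hypothetical standalone variant of the patch's check_field_filters():
# the two filter hashes are passed in rather than read from globals.
sub stanza_permitted {
  my ($stanza, $includes, $excludes) = @_;
  for my $name (keys %$includes) {
    my $regex = $includes->{$name};
    if ($stanza =~ /^\Q$name\E:\s+(.*)/im) {
      my $value = $1;
      return 1 if $value =~ /$regex/;
    }
  }
  # Once any include filter exists, a stanza that matched none is dropped.
  return 0 if keys %$includes;
  for my $name (keys %$excludes) {
    my $regex = $excludes->{$name};
    if ($stanza =~ /^\Q$name\E:\s+(.*)/im) {
      my $value = $1;
      return 0 if $value =~ /$regex/;
    }
  }
  return 1;
}

# Made-up stanzas, trimmed to the fields used here.
my $hello = "Package: hello\nSection: devel\nPriority: optional\n";
my $frob  = "Package: frob\nSection: universe/games\nPriority: extra\n";

# Include filter alone: only matching packages are kept.
printf "%d %d\n",
  stanza_permitted($hello, { Section => '^devel' }, {}),
  stanza_permitted($frob,  { Section => '^devel' }, {});

# Exclude filter alone: everything except the matches is kept.
printf "%d %d\n",
  stanza_permitted($hello, {}, { Priority => 'extra' }),
  stanza_permitted($frob,  {}, { Priority => 'extra' });

# Both: a matching include takes precedence over a matching exclude.
printf "%d %d\n",
  stanza_permitted($hello, { Section => 'games' }, { Priority => 'extra' }),
  stanza_permitted($frob,  { Section => 'games' }, { Priority => 'extra' });
```

In this sketch, the more rsync-like behaviour would roughly correspond to
dropping the "return 0 if keys %$includes" line, so that includes only
override excludes instead of restricting the mirror by themselves.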

diff --git a/debmirror b/debmirror
index 9e9158a..9126548 100755
--- a/debmirror
+++ b/debmirror
@@ -291,6 +291,20 @@ science, ...) match the regex. May be used multiple times.
 Limit download to files whose Debian Priority (required, extra,
 optional, ...) match the regex. May be used multiple times.
 
+=item B<--exclude-field>=I<fieldname>=I<regex>
+
+Never download any binary packages where the contents of I<fieldname> match
+the regex. May be used multiple times. If this option is used and the mirror
+includes source packages, only those source packages corresponding to
+included binary packages will be downloaded.
+
+=item B<--include-field>=I<fieldname>=I<regex>
+
+Don't exclude any binary packages where the contents of I<fieldname> match
+the regex. May be used multiple times. If this option is used and the mirror
+includes source packages, only those source packages corresponding to
+included binary packages will be downloaded.
+
 =item B<-t>, B<--timeout>=I<seconds>
 
 Specifies the timeout to use for network operations (either FTP or rsync).
@@ -564,6 +578,7 @@ our ($debug, $progress, $verbose, $passive, $skippackages, $getcontents, $i18n);
 our ($ua, $proxy, $ftp);
 our (@dists, @sections, @arches, @ignores, @excludes, @includes, @keyrings);
 our (@excludes_deb_section, @limit_priority);
+our (%excludes_field, %includes_field);
 our (@di_dists, @di_arches, @rsync_extra);
 our $state_cache_days = 0;
 our $verify_checksums = 0;
@@ -687,6 +702,8 @@ GetOptions('debug'                  => \$debug,
           'exclude-deb-section=s'  => \@excludes_deb_section,
           'limit-priority=s'       => \@limit_priority,
           'include=s'              => \@includes,
+          'exclude-field=s'        => \%excludes_field,
+          'include-field=s'        => \%includes_field,
           'skippackages'           => \$skippackages,
           'i18n'                   => \$i18n,
           'getcontents'            => \$getcontents,
@@ -1099,6 +1116,9 @@ say("Parsing Packages and Sources files ...");
   my $exclude_deb_section =
     "(".join("|", @excludes_deb_section).")" if @excludes_deb_section;
   my $limit_priority = "(".join("|", @limit_priority).")" if @limit_priority;
+  my $field_filters =
+    scalar(keys %includes_field) || scalar(keys %excludes_field);
+  my %binaries;
 
   foreach my $file (@package_files) {
     next if (!-f $file);
@@ -1121,6 +1141,9 @@ say("Parsing Packages and Sources files ...");
        next if (defined($limit_priority) && defined($deb_priority)
                 && ! ($deb_priority=~/$limit_priority/o));
       }
+      next if $field_filters && !check_field_filters($_);
+      my ($package)=m/^Package:\s+(.*)/im;
+      $binaries{$package} = 1;
       # File was listed in state cache, or file occurs multiple times
       if (exists $files{$filename}) {
        if ($files{$filename} >= 0) {
@@ -1148,9 +1171,10 @@ say("Parsing Packages and Sources files ...");
     }
     close(FILE);
   }
-SOURCE:  foreach my $file (@source_files) {
+  foreach my $file (@source_files) {
     next if (!-f $file);
     open(FILE, "<", $file) or die "$file: $!";
+SOURCE:
     for (;;) {
       my $stanza;
       unless (defined( $stanza = <FILE> )) {
@@ -1186,6 +1210,19 @@ SOURCE:  foreach my $file (@source_files) {
                      next SOURCE if (defined($limit_priority) && defined($deb_priority)
                                && ! ($deb_priority=~/$limit_priority/o));
              }
+             elsif ($line=~/^Binary:\s+(.*)/i) {
+                     if ($field_filters) {
+                             my @binary_names=split(/\s*,\s*/,$1);
+                             my $fetching_binary=0;
+                             for my $binary_name (@binary_names) {
+                                     if (exists $binaries{$binary_name}) {
+                                             $fetching_binary=1;
+                                             last;
+                                     }
+                             }
+                             next SOURCE unless $fetching_binary;
+                     }
+             }
              elsif ($line=~/^Files:/i) {
                      $parse_source_files->("MD5Sum");
              }
@@ -1488,6 +1525,26 @@ sub add_bytes_gotten {
   }
 }
 
+# Return true if a package stanza is permitted by
+# --include-field/--exclude-field.
+sub check_field_filters {
+  my $stanza = shift;
+  for my $name (keys %includes_field) {
+    if ($stanza=~/^\Q$name\E:\s+(.*)/im) {
+      my $value=$1;
+      return 1 if $value=~/$includes_field{$name}/;
+    }
+  }
+  return 0 if keys %includes_field;
+  for my $name (keys %excludes_field) {
+    if ($stanza=~/^\Q$name\E:\s+(.*)/im) {
+      my $value=$1;
+      return 0 if $value=~/$excludes_field{$name}/;
+    }
+  }
+  return 1;
+}
+
 # Takes named parameters: filename, size.
 # 
 # Optionally can also be passed parameters specifying expected checksums

Thanks,

-- 
Colin Watson                                       [cjwat...@ubuntu.com]

