[PATCH v2 1/1] import-tars: read overlong names from pax extended header

2018-05-23 Thread Pedro Alvarez
From: Pedro Alvarez Piedehierro <palvare...@gmail.com>

Importing gcc tarballs[1] with the import-tars script (in contrib) fails
when it hits a pax extended header.

Make sure we always read the extended attributes from the pax entries,
and store the 'path' value, if found, for use in the next ustar entry.

The code to parse pax extended headers was written consulting the
Pax Interchange Format documentation [2].

[1] http://ftp.gnu.org/gnu/gcc/gcc-7.3.0/gcc-7.3.0.tar.xz
[2] https://www.freebsd.org/cgi/man.cgi?manpath=FreeBSD+8-current&query=tar&sektion=5

Signed-off-by: Pedro Alvarez <palvare...@gmail.com>
---
 contrib/fast-import/import-tars.perl | 31 +++++++++++++++++++++++++++++--
 1 file changed, 29 insertions(+), 2 deletions(-)

diff --git a/contrib/fast-import/import-tars.perl b/contrib/fast-import/import-tars.perl
index d60b4315ed..e800d9f5c9 100755
--- a/contrib/fast-import/import-tars.perl
+++ b/contrib/fast-import/import-tars.perl
@@ -63,6 +63,8 @@ foreach my $tar_file (@ARGV)
my $have_top_dir = 1;
my ($top_dir, %files);
 
+   my $next_path = '';
+
while (read(I, $_, 512) == 512) {
my ($name, $mode, $uid, $gid, $size, $mtime,
$chksum, $typeflag, $linkname, $magic,
@@ -70,6 +72,13 @@ foreach my $tar_file (@ARGV)
$prefix) = unpack 'Z100 Z8 Z8 Z8 Z12 Z12
Z8 Z1 Z100 Z6
Z2 Z32 Z32 Z8 Z8 Z*', $_;
+
+   unless ($next_path eq '') {
+   # Recover name from previous extended header
+   $name = $next_path;
+   $next_path = '';
+   }
+
last unless length($name);
if ($name eq '././@LongLink') {
# GNU tar extension
@@ -90,13 +99,31 @@ foreach my $tar_file (@ARGV)
Z8 Z1 Z100 Z6
Z2 Z32 Z32 Z8 Z8 Z*', $_;
}
-   next if $name =~ m{/\z};
$mode = oct $mode;
$size = oct $size;
$mtime = oct $mtime;
next if $typeflag == 5; # directory
 
-   if ($typeflag != 1) { # handle hard links later
+   if ($typeflag eq 'x') { # extended header
+   # If extended header, check for path
+   my $pax_header = '';
+   while ($size > 0 && read(I, $_, 512) == 512) {
+   $pax_header = $pax_header . substr($_, 0, $size);
+   $size -= 512;
+   }
+
+   my @lines = split /\n/, $pax_header;
+   foreach my $line (@lines) {
+   my ($len, $entry) = split / /, $line;
+   my ($key, $value) = split /=/, $entry;
+   if ($key eq 'path') {
+   $next_path = $value;
+   }
+   }
+   next;
+   } elsif ($name =~ m{/\z}) { # directory
+   next;
+   } elsif ($typeflag != 1) { # handle hard links later
print FI "blob\n", "mark :$next_mark\n";
if ($typeflag == 2) { # symbolic link
print FI "data ", length($linkname), "\n",
-- 
2.11.0



[PATCH v2 0/1] import-tars: read overlong names from pax extended header

2018-05-23 Thread Pedro Alvarez
From: Pedro Alvarez Piedehierro <palvare...@gmail.com>

Hello!

In this version I've trimmed and improved the commit message as suggested.

As Jeff mentioned, error handling could be improved throughout the
script. I can do that if it is needed to get this patch approved.

Thanks for reviewing and giving me some feedback!

Pedro.

Pedro Alvarez Piedehierro (1):
  import-tars: read overlong names from pax extended header

 contrib/fast-import/import-tars.perl | 31 +++++++++++++++++++++++++++++--
 1 file changed, 29 insertions(+), 2 deletions(-)

-- 
2.11.0



[PATCH] Add initial support for pax extended attributes

2018-05-22 Thread Pedro Alvarez
From: Pedro Alvarez Piedehierro <palvare...@gmail.com>

Sometimes the tar files will contain pax extended attributes to deal
with cases where the information needed doesn't fit in a standard
ustar entry.

One of these cases is when the path is longer than 100 characters. A
pax entry then appears, made up of two standard ustar entries. The first
entry has an 'x' typeflag and contains the extended attributes.

The pax extended attributes contain one or multiple records constructed as
follows:

"%d %s=%s\n", <length>, <keyword>, <value>
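
For illustration only, and not the code in this patch: since the leading
decimal number is the length of the whole record (length field, space,
keyword, '=', value and trailing newline), a parser can walk the header
by length instead of splitting on newlines and spaces. The sketch below
assumes that reading of the format; parse_pax_records is a hypothetical
helper name, and the sample 'path' record is made up.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of import-tars.perl): turn the body of a
# pax extended header into a hash reference of keyword => value.
sub parse_pax_records {
    my ($data) = @_;
    my %attrs;
    my $pos = 0;
    while ($pos < length($data)) {
        # Every record starts with its own total length, then one space.
        my $rest = substr($data, $pos);
        last unless $rest =~ /\A(\d+) /;
        my $len    = $1;
        my $prefix = length($1) + 1;    # digits plus the space
        # Stop on a record that is too short or runs past the data.
        last if $len <= $prefix || $pos + $len > length($data);
        # What remains of the record is "keyword=value\n".
        my $body = substr($data, $pos + $prefix, $len - $prefix);
        $body =~ s/\n\z//;
        # Split on the first '=' only, so a value may contain '='.
        my ($key, $value) = split /=/, $body, 2;
        $attrs{$key} = $value if defined $value;
        $pos += $len;
    }
    return \%attrs;
}

# The first record is taken from the crash report in this message; the
# second is an invented path containing both '=' and a space.
my $attrs = parse_pax_records(
    "27 mtime=1483272463.905435\n" . "22 path=dir/a=b c.txt\n");
print "$attrs->{path}\n";
```

Walking by length keeps values intact even when they contain spaces,
'=' or newlines, which a plain split on those characters would break.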

This commit makes sure that we always read the extended attributes from
pax entries, and in the case of finding one, we parse its records
looking for 'path' information. If this information is found, it's
stored to be used in the next ustar entry.
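
As a rough sketch of the reading step, assuming the usual tar layout in
which an entry's data is stored in 512-byte blocks and the last block is
NUL-padded: the payload of the 'x' entry can be collected block by block
and trimmed to the size given in its header. read_payload is a
hypothetical helper name, and the in-memory filehandle is only there to
keep the example self-contained.

```perl
use strict;
use warnings;

# Hypothetical helper: read a $size-byte payload stored in 512-byte
# blocks, dropping the padding at the end of the last block.
sub read_payload {
    my ($fh, $size) = @_;
    my $payload = '';
    while ($size > 0) {
        my $n = read($fh, my $block, 512);
        die "truncated tar payload\n" unless defined $n && $n == 512;
        # Keep a full block, or only the remaining bytes of the last one.
        $payload .= substr($block, 0, $size > 512 ? 512 : $size);
        $size -= 512;
    }
    return $payload;
}

# Build one padded block holding a single record, followed by a block
# standing in for the next ustar header.
my $record = "27 mtime=1483272463.905435\n";
my $data   = $record . ("\0" x (512 - length $record)) . ("A" x 512);
open(my $fh, '<', \$data) or die "cannot open in-memory file: $!";
print read_payload($fh, length $record);
```

After the call the filehandle is positioned at the next 512-byte
boundary, so the main loop can carry on with the following header.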

Information about the Pax Interchange Format can be found at:

https://www.freebsd.org/cgi/man.cgi?manpath=FreeBSD+8-current&query=tar&sektion=5

Before this change, importing gcc tarballs[1] would fail with the
following error:

fast-import crash report:
fast-import process: 82899
parent process : 82897
at 2018-05-21 12:35:27 +

fatal: Unsupported command: 29 atime=1516870168.93527949

Most Recent Commands Before Crash
---------------------------------
  M 644 :22495 gcc-7.3.0/libstdc++-v3/testsuite/20_util/duration/PaxHeaders.4467/comparison_operators
  M 644 :140367 gcc-7.3.0/gcc/ada/s-gloloc-mingw.adb
  M 644 :75143 gcc-7.3.0/gcc/testsuite/gcc.c-torture/execute/builtins/PaxHeaders.4467/strncat-chk-lib.c

  

  M 644 :135585 gcc-7.3.0/gcc/testsuite/c-c++-common/attr-warn-unused-result.c
  M 644 :54956 gcc-7.3.0/gcc/testsuite/go.test/test/fixedbugs/PaxHeaders.4467/bug335.dir
  M 644 :20632 27 mtime=1483272463.905435
* 29 atime=1516870168.93527949

[1]: http://ftp.gnu.org/gnu/gcc/gcc-7.3.0/gcc-7.3.0.tar.xz

Signed-off-by: Pedro Alvarez <palvare...@gmail.com>
---
 contrib/fast-import/import-tars.perl | 32 ++++++++++++++++++++++++++++++--
 1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/contrib/fast-import/import-tars.perl b/contrib/fast-import/import-tars.perl
index d60b4315ed..c2e54ec7a3 100755
--- a/contrib/fast-import/import-tars.perl
+++ b/contrib/fast-import/import-tars.perl
@@ -63,6 +63,8 @@ foreach my $tar_file (@ARGV)
my $have_top_dir = 1;
my ($top_dir, %files);
 
+   my $next_path = '';
+
while (read(I, $_, 512) == 512) {
my ($name, $mode, $uid, $gid, $size, $mtime,
$chksum, $typeflag, $linkname, $magic,
@@ -70,6 +72,13 @@ foreach my $tar_file (@ARGV)
$prefix) = unpack 'Z100 Z8 Z8 Z8 Z12 Z12
Z8 Z1 Z100 Z6
Z2 Z32 Z32 Z8 Z8 Z*', $_;
+
+   unless ($next_path eq '') {
+   # Recover name from previous extended header
+   $name = $next_path;
+   $next_path = '';
+   }
+
last unless length($name);
if ($name eq '././@LongLink') {
# GNU tar extension
@@ -90,13 +99,32 @@ foreach my $tar_file (@ARGV)
Z8 Z1 Z100 Z6
Z2 Z32 Z32 Z8 Z8 Z*', $_;
}
-   next if $name =~ m{/\z};
$mode = oct $mode;
$size = oct $size;
$mtime = oct $mtime;
next if $typeflag == 5; # directory
 
-   if ($typeflag != 1) { # handle hard links later
+   if ($typeflag eq 'x') { # extended header
+   # If extended header, check for path
+   my $pax_header = '';
+   while ($size > 0 && read(I, $_, 512) == 512) {
+   $pax_header = $pax_header . substr($_, 0, $size);
+   $size -= 512;
+   }
+
+   my @lines = split /\n/, $pax_header;
+   foreach my $line (@lines) {
+   my ($len, $entry) = split / /, $line;
+   my ($key, $value) = split /=/, $entry;
+   if ($key eq 'path') {
+   $next_path = $value;
+   }
+   }
+   next;
+   } elsif ($name =~ m{/\z}) {
+   # If it's a folder, ignore
+   next;
+   } elsif ($typeflag != 1) { # handle hard links later
print FI "blob\n", "mark :$next_mark\n";
if ($typeflag == 2) { # symbolic link
print FI "data ", length($linkname), "\n",
-- 
2.11.0