Re: [Bug-tar] Faster excludes

2009-08-04 Thread Carl Worth
On Sat, 2009-08-01 at 01:46 +0100, Phil Proudman wrote:
 I've just updated the patch for tar-1.22. Please find attached.

Thanks, Phil!

Phil and Sergey, I just noticed I dropped bug-tar@gnu.org from the CC
list. That was accidental.

Anyone looking for Phil's updated patch can find it here:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=221482#21

-Carl



signature.asc
Description: This is a digitally signed message part


[Bug-tar] --backup can destroy an extracted member when followed by an existing directory

2009-08-04 Thread Carl Worth
On Mon, 08 Dec 2008 13:41:57 -0500 Eric Lammerts wrote:
 When extracting, tar sometimes renames a backup back to the original name
 (overwriting the newly-extracted file) for no good reason.
 
 Example:
 # mkdir dir1 dir2
 # echo bla  dir1/file1
 # tar cf test.tar dir1 dir2
 # tar xfv test.tar --backup
 dir1/
 dir1/file1
 Renaming `dir1/file1' to `dir1/file1~'
 dir2/
 Renaming `dir1/file1~' back to `dir1/file1'
 
 This problem is made worse by the fact that if you do the above with tar
 xf instead of tar xfv, tar doesn't print anything at all (and exits
 with status 0)!! Failure to extract a file should be an error, not just a
 message printed only in verbose mode.

Hi Eric,

Thank you for the bug report against the Debian tar package. I've
verified that this bug also exists in tar version 1.22-1.1 as packaged
by Debian.

I've traced through the code a bit and the problem is due to the
handling of directory entries in extract.c:extract_archive(). Here is
what happens when processing dir2 in the above scenario:

1. maybe_backup_file() returns true, but without changing
   after_backup_file (which was set previously to dir1/file1~)

2. extract_dir(), (called through *fun) returns -1 since the
   directory already exists

3. after detecting that an archive member failed to extract, the code
   calls undo_last_backup() to restore the backup. However, at this
   point the file names after_backup_file and before_backup_file still
   correspond to dir1/file1~ and dir1/file1 due to step #1.

So step 1 is definitely a bug, and step 2 might potentially be a bug as
well. I've attached a simple patch to fix the bug in step 1. Sergey,
perhaps you'll have a cleaner fix. Please let me know.

-Carl


From 0465ed84d91583387763ab5ed016d1b58abc1ad2 Mon Sep 17 00:00:00 2001
From: Carl Worth cwo...@cworth.org
Date: Tue, 4 Aug 2009 13:34:52 -0700
Subject: [PATCH] maybe_backup_file: Clear previously-set after_backup_name

Without this, under various conditions, (including file_name being
a directory), maybe_backup_file would return true without having
done anything, nor having set after_backup_name. Then, if any
error was detected while extracting the member, (such as directory
already exists), the previously backed-up file would be restored,
potentially destroying user data.

This closes Debian bug #508199, with thanks to Eric Lammerts for
reporting it with a concise test case.
---
 debian/changelog |3 ++-
 src/misc.c   |4 
 2 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/debian/changelog b/debian/changelog
index 1e26c81..daa0223 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -7,8 +7,9 @@ tar (1.22-1.2) UNRELEASED; urgency=low
 Thanks to Ted T'so for the idea and Sergey Poznyakoff for
 cleaning up my original implementation.
   * Respect DEB_BUILD_OPTIONS=nocheck to conform with Policy 3.8.2
+  * Avoid incorrectly restoring old backup file, closes: #508199
 
- -- Carl Worth cwo...@cworth.org  Tue, 04 Aug 2009 12:07:06 -0700
+ -- Carl Worth cwo...@cworth.org  Tue, 04 Aug 2009 13:36:56 -0700
 
 tar (1.22-1.1) unstable; urgency=low
 
diff --git a/src/misc.c b/src/misc.c
index 951449e..c400bcd 100644
--- a/src/misc.c
+++ b/src/misc.c
@@ -422,6 +422,10 @@ maybe_backup_file (const char *file_name, bool this_is_the_archive)
   if (this_is_the_archive  _remdev (file_name))
 return true;
 
+  /* Ensure that no previously backed-up file remains in case we
+   * return early. */
+  assign_string (after_backup_name, 0);
+
   if (stat (file_name, file_stat))
 {
   if (errno == ENOENT)
-- 
1.6.3.3



signature.asc
Description: This is a digitally signed message part


[Bug-tar] Avoid undefined behavior of passing NULL to strcmp

2009-08-04 Thread Carl Worth
On Mon, 19 Feb 2007 12:52:12 +0100, Raúl Sánchez Siles wrote:
 To reproduce the error do this:
 
 1- echo filename  filelist
 2- echo  filelist
 3- tar xvfz tarfile.tgz -T filelist
 
 Note: its not necessary that filename exists.

 When you use -T, the contents of the file is dumped into the argv
 vector. Each line should contain a file name, optionally with its path.
 
 You will have a segment violation because on line 421 of lib/getopt.c
 you have:
 
 if (d-optind != argc  !strcmp (argv[d-optind], --))

Hi Raúl,

Thank you very much for the bug report.

As it turns out, the bug may or may not a segmentation fault. To be
precise, the code is passing a NULL pointer to strcmp, which is
undefined behavior according the the relevant specification. So
conforming implementations can do anything here, (including a
segmentation fault as well as silently treating NULL as an empty
string). My system, for example, seems to behave with the
silent-treat-NULL-like-empty-string behavior.

Regardless, this undefined behavior is something the program should
avoid doing.

 There are two possible options as I see it.
 
 1- Changing that line to: 
 if (d-optind != argc  argv[d-optind]  !strcmp (argv[d-optind], --))
 
 2- Fix the parsing of -T so an empty line won't be included in the dump
 to argv vector.

I've attached a patch for #1. Doing #2 looked like a bit more work to
me, and I'm lazy that way, (but Sergey might want to do something about
it).

 I tried to report this upstream but bug reporting is limited to memebers
 of the project.

I don't know if anything has changed with respect to bug reporting for
upstream tar in the past couple of years. But I've started reporting
bugs recently, and have had no problem doing so without any membership.
I simply send bug reports to bug-tar@gnu.org, (as I am doing with this
email), and I've found upstream to be particularly responsive to
high-quality bug reports, (thanks, Sergey!).

Happy hacking,

-Carl

From 0328eb91a46840bf5020a7f627aa5664a2948621 Mon Sep 17 00:00:00 2001
From: =?utf-8?q?Ra=C3=BAl=20S=C3=A1nchez=20Siles?= r...@barracuda.es
Date: Tue, 4 Aug 2009 14:17:12 -0700
Subject: [PATCH] Avoid undefined behavior of passing NULL to strcmp

Having a blank line in the file passed to the -T option will cause
an entry in the argv array to be NULL. To avoid invoking undefined
behavior, we take care not to pass these NULL values to strcmp.
---
 debian/changelog |4 +++-
 lib/getopt.c |2 +-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/debian/changelog b/debian/changelog
index 1e26c81..3238ff3 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -7,8 +7,10 @@ tar (1.22-1.2) UNRELEASED; urgency=low
 Thanks to Ted T'so for the idea and Sergey Poznyakoff for
 cleaning up my original implementation.
   * Respect DEB_BUILD_OPTIONS=nocheck to conform with Policy 3.8.2
+  * Avoid undefined behavior of passing NULL to strcmp, closes: #411485
+Thanks to Raúl Sánchez Siles for proposing this patch.
 
- -- Carl Worth cwo...@cworth.org  Tue, 04 Aug 2009 12:07:06 -0700
+ -- Carl Worth cwo...@cworth.org  Tue, 04 Aug 2009 14:16:09 -0700
 
 tar (1.22-1.1) unstable; urgency=low
 
diff --git a/lib/getopt.c b/lib/getopt.c
index f1e6d1f..f1c0e1f 100644
--- a/lib/getopt.c
+++ b/lib/getopt.c
@@ -413,7 +413,7 @@ _getopt_internal_r (int argc, char **argv, const char *optstring,
 	 then exchange with previous non-options as if it were an option,
 	 then skip everything else like a non-option.  */
 
-  if (d-optind != argc  !strcmp (argv[d-optind], --))
+  if (d-optind != argc  argv[d-optind]  !strcmp (argv[d-optind], --))
 	{
 	  d-optind++;
 
-- 
1.6.3.3



signature.asc
Description: This is a digitally signed message part


[Bug-tar] [PATCH] Clarify default values for --same-owner and --no-same-owner

2009-08-04 Thread Carl Worth
Most other options which have different defaults for superuser and
ordinary users already say as much. Fix these to match.
---
 src/tar.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/tar.c b/src/tar.c
index e10b804..309f9d7 100644
--- a/src/tar.c
+++ b/src/tar.c
@@ -481,9 +481,9 @@ static struct argp_option options[] = {
   {touch, 'm', 0, 0,
N_(don't extract file modified time), GRID+1 },
   {same-owner, SAME_OWNER_OPTION, 0, 0,
-   N_(try extracting files with the same ownership), GRID+1 },
+   N_(try extracting files with the same ownership as exists in the archive 
(default for superuser)), GRID+1 },
   {no-same-owner, NO_SAME_OWNER_OPTION, 0, 0,
-   N_(extract files as yourself), GRID+1 },
+   N_(extract files as yourself (default for ordinary users)), GRID+1 },
   {numeric-owner, NUMERIC_OWNER_OPTION, 0, 0,
N_(always use numbers for user/group names), GRID+1 },
   {preserve-permissions, 'p', 0, 0,
-- 
1.6.3.3





[Bug-tar] Preserve timestamp of symlinks when extracting (if utimensat available)

2009-08-04 Thread Carl Worth
On Sun, 12 Jun 2005 14:22:53 -0400, D Goel wrote:
 Let's try the tar-trick to copy the contents of dir1 to dir2: 
 (cd dir1  tar -clpsf- *) | (cd dir2  tar -xpsf -)
 
 This copies the contents, including correct timestamps and ownership,
 except that if there is a symlink (named, say s) in dir1.  Then the
 copied file dir2/s fails to get the same timestamp as dir1/s. 

Hi DG,

Thank you for the bug report. I've confirmed that this bug still exists
in the latest version of tar packaged in Debian (1.22-1.1). And I almost
replied saying that Linux doesn't provide the necessary functionality to
do what you want, (I noticed that cp -a also exhibits the same
bug---see bug #26766).

But rsync does not have the bug, so I looked a little closer and
found... the necessary magic is in the flags argument to the POSIX
utimensat function call:

   The flags field is a bit mask that may be 0, or include  the  following
   constant, defined in fcntl.h:

   AT_SYMLINK_NOFOLLOW
  If  pathname  specifies  a  symbolic link, then update the time‐
  stamps of the link, rather than the file to which it refers.

The tar program already prefers to call utimensat if available, but
currently always passes 0 for the flags argument. It also avoids ever
calling this function for symlinks, (which would update the timestamp of
the target to the archived time stamp of the link which is certainly not
desired).

I've attached a patch that fixes the bug when utimensat is available,
and should have no effect when the function is not available, (though I
didn't test on any such system). I also didn't write a new test case for
this behavior, (I'm not sure how to make a test case that would not be
expected to pass on some systems.)

Any feedback would be appreciated,

-Carl



From 2fe8827ad1874abf3295a5aa96eba18372b98c12 Mon Sep 17 00:00:00 2001
From: Carl Worth cwo...@cworth.org
Date: Tue, 4 Aug 2009 17:00:35 -0700
Subject: [PATCH] Preserve timestamp of symlinks when extracting (if utimensat available)

If the utimensat function is not available, then do nothing with
symlink time stamps, (which is the same as the current code).
---
 lib/utimens.c |   26 +++
 lib/utimens.h |4 +-
 src/extract.c |   61 +++-
 src/misc.c|2 +-
 4 files changed, 53 insertions(+), 40 deletions(-)

diff --git a/lib/utimens.c b/lib/utimens.c
index 708de10..ae8b0a6 100644
--- a/lib/utimens.c
+++ b/lib/utimens.c
@@ -72,11 +72,16 @@ struct utimbuf
use just futimes (or equivalent) instead of utimes (or equivalent),
and fail if on an old system without futimes (or equivalent).
If TIMESPEC is null, set the time stamps to the current time.
+   If the file is a symlink and IS_SYMLINK is set, then the
+   time stamps of the symlink itself will be updated if
+   possible, (but if not supported by the operating system
+   then no change will occur).
Return 0 on success, -1 (setting errno) on failure.  */
 
 int
 gl_futimens (int fd ATTRIBUTE_UNUSED,
-	 char const *file, struct timespec const timespec[2])
+	 char const *file, struct timespec const timespec[2],
+	 int is_symlink)
 {
   /* Some Linux-based NFS clients are buggy, and mishandle time stamps
  of files in NFS file systems in some cases.  We have no
@@ -102,7 +107,8 @@ gl_futimens (int fd ATTRIBUTE_UNUSED,
 #if HAVE_UTIMENSAT
   if (fd  0)
 {
-  int result = utimensat (AT_FDCWD, file, timespec, 0);
+  int flags = is_symlink ? AT_SYMLINK_NOFOLLOW : 0;
+  int result = utimensat (AT_FDCWD, file, timespec, flags);
 # ifdef __linux__
   /* Work around what might be a kernel bug:
  http://bugzilla.redhat.com/442352
@@ -119,6 +125,12 @@ gl_futimens (int fd ATTRIBUTE_UNUSED,
 return result;
 }
 #endif
+
+  /* Without utimensat we have no way to update a symlink rather than
+   * the target, so just return immediately. */
+  if (is_symlink)
+  return 0;
+
 #if HAVE_FUTIMENS
   {
 int result = futimens (fd, timespec);
@@ -219,9 +231,13 @@ gl_futimens (int fd ATTRIBUTE_UNUSED,
 }
 
 /* Set the access and modification time stamps of FILE to be
-   TIMESPEC[0] and TIMESPEC[1], respectively.  */
+   TIMESPEC[0] and TIMESPEC[1], respectively.
+   If the file is a symlink and is_symlink is set, then the
+   time stamps of the symlink itself will be updated if
+   possible, (but if not supported by the operating system
+   then no change will occur). */
 int
-utimens (char const *file, struct timespec const timespec[2])
+utimens (char const *file, struct timespec const timespec[2], int is_symlink)
 {
-  return gl_futimens (-1, file, timespec);
+  return gl_futimens (-1, file, timespec, is_symlink);
 }
diff --git a/lib/utimens.h b/lib/utimens.h
index 169521d..625785c 100644
--- a/lib/utimens.h
+++ b/lib/utimens.h
@@ -1,3 +1,3 @@
 #include time.h
-int gl_futimens (int, char const *, struct timespec const [2]);
-int utimens (char const

[Bug-tar] tar fails to preserve hard links with --remove-files

2009-07-30 Thread Carl Worth
A user of Debian noticed that tar (1.22) does not always preserve hard
links when creating an archive with the --remove-files option. Ted Ts'o
provided the following analysis:

On Sun, 13 Apr 2003 15:45:27 -0400, Theodore Ts'o ty...@mit.edu wrote:
 I'm pretty sure, by the way, that the problem is that tar is keying
 off of the st_nlink to decide whether or not to do hard link
 processing as an optimization.  When --remove-files is present, then
 st_nlink of the hard-linked inode is dropping, and when st_nlink is
 one, tar can't tell that it was previously a hard-linked file.  The
 fix would require that tar check every single file's inode number
 against previously written files to see if it was a hard linked file
 (instead of just checking files where st_nlink  1), in the case when
 --remove-file option is in use.

I've attached two patches to fix this bug. The first implements Ted's
suggestion, (using the hard links hash table for all files when the
--remove-files option is in effect, regardless of the value of
st_nlink). The second patch adds a test case for the bug, (failing
before the first patch is added and passing afterwards).

Please let me know if you need anything else,

-Carl

PS. If you could preserve the CC list in any replies that would be
appreciated.
From f1ed85d46043c523cd5b8196c1d266f3606a2531 Mon Sep 17 00:00:00 2001
From: Carl Worth cwo...@cworth.org
Date: Wed, 29 Jul 2009 20:45:58 -0700
Subject: [PATCH 1/2] Preserve hard links with --remove-files

When the --remove-files option is in effect, it is no longer
reliable to use a file's link count to determine if we should
use the hash table for hard links. Instead, we look into the
hash table for every file when under the influence of the
--remove-files option.
---
 debian/changelog |3 ++-
 src/create.c |4 ++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/debian/changelog b/debian/changelog
index df3a125..747988e 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -3,8 +3,9 @@ tar (1.22-1.2) UNRELEASED; urgency=low
   * Add Carl Worth as an uploader.
   * Fix to allow parallel build (-j2), closes #535319
   * Don't close file stream before EOF, closes #525818
+  * Preserve hard links with --remove-files, closes #188663
 
- -- Carl Worth cwo...@cworth.org  Wed, 29 Jul 2009 16:18:18 -0700
+ -- Carl Worth cwo...@cworth.org  Wed, 29 Jul 2009 21:28:45 -0700
 
 tar (1.22-1.1) unstable; urgency=low
 
diff --git a/src/create.c b/src/create.c
index fde7ed1..559aaa0 100644
--- a/src/create.c
+++ b/src/create.c
@@ -1377,7 +1377,7 @@ static Hash_table *link_table;
 static bool
 dump_hard_link (struct tar_stat_info *st)
 {
-  if (link_table  st-stat.st_nlink  1)
+  if (link_table  (st-stat.st_nlink  1 || remove_files_option))
 {
   struct link lp;
   struct link *duplicate;
@@ -1424,7 +1424,7 @@ file_count_links (struct tar_stat_info *st)
 {
   if (hard_dereference_option)
 return;
-  if (st-stat.st_nlink  1)
+  if (st-stat.st_nlink  1 || remove_files_option)
 {
   struct link *duplicate;
   struct link *lp = xmalloc (offsetof (struct link, name)
-- 
1.6.3.3

From a75570c728ed2c3f65fb075491a07a9b4ade407f Mon Sep 17 00:00:00 2001
From: Carl Worth cwo...@cworth.org
Date: Wed, 29 Jul 2009 21:26:23 -0700
Subject: [PATCH 2/2] Add hardlinks test (to ensure they are preserved with --remove-files)

The new hardlinks.at test case verifies the fix in the previous
commit, (without that change the test fails, and with the change
the test passes).
---
 tests/hardlinks.at |   50 ++
 tests/testsuite.at |2 ++
 2 files changed, 52 insertions(+), 0 deletions(-)
 create mode 100644 tests/hardlinks.at

diff --git a/tests/hardlinks.at b/tests/hardlinks.at
new file mode 100644
index 000..9e01ec3
--- /dev/null
+++ b/tests/hardlinks.at
@@ -0,0 +1,50 @@
+# Process this file with autom4te to create testsuite. -*- Autotest -*-
+
+# Test suite for GNU tar.
+# Copyright (C) 2009 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at your option)
+# any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+# 02110-1301, USA.
+
+# Problem: hard links not preserved with --remove-files
+# Reported by: Theodore Y. Ts'o ty...@mit.edu
+# References: e194eae-0001le...@think.thunk.org
+# http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=188663
+
+AT_SETUP([preserve hard links with --remove

Re: [Bug-tar] tar fails to preserve hard links with --remove-files

2009-07-30 Thread Carl Worth
On Thu, 2009-07-30 at 23:52 +0300, Sergey Poznyakoff wrote:
 Thanks for reporting.

Thank you, Sergey. And thanks for the quick reply.

 This one is mostly OK, except that I don't see any reason to test the
 remove_files_option value in file_count_links. This will unnecessarily
 inflate the link_table and slow down archive creation.

I had expected to only have to add it in one location, and I thought I
even tested it that way first. (But maybe I made the mistake and added
it to the wrong one.) Anyway, I'm glad to see a version of this land
upstream.

 I have installed the attached patch.

Excellent. I'll incorporate that into Debian's package for now, and look
forward to seeing this change appear in a future release.

-Carl



signature.asc
Description: This is a digitally signed message part