Re: [Bug-tar] Faster excludes
On Sat, 2009-08-01 at 01:46 +0100, Phil Proudman wrote: I've just updated the patch for tar-1.22. Please find attached. Thanks, Phil! Phil and Sergey, I just noticed I dropped bug-tar@gnu.org from the CC list. That was accidental. Anyone looking for Phil's updated patch can find it here: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=221482#21 -Carl signature.asc Description: This is a digitally signed message part
[Bug-tar] --backup can destroy an extracted member when followed by an existing directory
On Mon, 08 Dec 2008 13:41:57 -0500 Eric Lammerts wrote: When extracting, tar sometimes renames a backup back to the original name (overwriting the newly-extracted file) for no good reason. Example: # mkdir dir1 dir2 # echo bla dir1/file1 # tar cf test.tar dir1 dir2 # tar xfv test.tar --backup dir1/ dir1/file1 Renaming `dir1/file1' to `dir1/file1~' dir2/ Renaming `dir1/file1~' back to `dir1/file1' This problem is made worse by the fact that if you do the above with tar xf instead of tar xfv, tar doesn't print anything at all (and exits with status 0)!! Failure to extract a file should be an error, not just a message printed only in verbose mode. Hi Eric, Thank you for the bug report against the Debian tar package. I've verified that this bug also exists in tar version 1.22-1.1 as packaged by Debian. I've traced through the code a bit and the problem is due to the handling of directory entries in extract.c:extract_archive(). Here is what happens when processing dir2 in the above scenario: 1. maybe_backup_file() returns true, but without changing after_backup_file (which was set previously to dir1/file1~) 2. extract_dir(), (called through *fun) returns -1 since the directory already exists 3. after detecting that an archive member failed to extract, the code calls undo_last_backup() to restore the backup. However, at this point the file names after_backup_file and before_backup_file still correspond to dir1/file1~ and dir1/file1 due to step #1. So step 1 is definitely a bug, and step 2 might potentially be a bug as well. I've attached a simple patch to fix the bug in step 1. Sergey, perhaps you'll have a cleaner fix. Please let me know. -Carl From 0465ed84d91583387763ab5ed016d1b58abc1ad2 Mon Sep 17 00:00:00 2001 From: Carl Worth cwo...@cworth.org Date: Tue, 4 Aug 2009 13:34:52 -0700 Subject: [PATCH] maybe_backup_file: Clear previously-set after_backup_name Without this, under various conditions, (including file_name being a directory), maybe_backup_file would return true without having done anything, nor having set after_backup_name. Then, if any error was detected while extracting the member, (such as directory already exists), the previously backed-up file would be restored, potentially destroying user data. This closes Debian bug #508199, with thanks to Eric Lammerts for reporting it with a concise test case. --- debian/changelog |3 ++- src/misc.c |4 2 files changed, 6 insertions(+), 1 deletions(-) diff --git a/debian/changelog b/debian/changelog index 1e26c81..daa0223 100644 --- a/debian/changelog +++ b/debian/changelog @@ -7,8 +7,9 @@ tar (1.22-1.2) UNRELEASED; urgency=low Thanks to Ted T'so for the idea and Sergey Poznyakoff for cleaning up my original implementation. * Respect DEB_BUILD_OPTIONS=nocheck to conform with Policy 3.8.2 + * Avoid incorrectly restoring old backup file, closes: #508199 - -- Carl Worth cwo...@cworth.org Tue, 04 Aug 2009 12:07:06 -0700 + -- Carl Worth cwo...@cworth.org Tue, 04 Aug 2009 13:36:56 -0700 tar (1.22-1.1) unstable; urgency=low diff --git a/src/misc.c b/src/misc.c index 951449e..c400bcd 100644 --- a/src/misc.c +++ b/src/misc.c @@ -422,6 +422,10 @@ maybe_backup_file (const char *file_name, bool this_is_the_archive) if (this_is_the_archive _remdev (file_name)) return true; + /* Ensure that no previously backed-up file remains in case we + * return early. */ + assign_string (after_backup_name, 0); + if (stat (file_name, file_stat)) { if (errno == ENOENT) -- 1.6.3.3 signature.asc Description: This is a digitally signed message part
[Bug-tar] Avoid undefined behavior of passing NULL to strcmp
On Mon, 19 Feb 2007 12:52:12 +0100, Raúl Sánchez Siles wrote: To reproduce the error do this: 1- echo filename filelist 2- echo filelist 3- tar xvfz tarfile.tgz -T filelist Note: its not necessary that filename exists. When you use -T, the contents of the file is dumped into the argv vector. Each line should contain a file name, optionally with its path. You will have a segment violation because on line 421 of lib/getopt.c you have: if (d-optind != argc !strcmp (argv[d-optind], --)) Hi Raúl, Thank you very much for the bug report. As it turns out, the bug may or may not a segmentation fault. To be precise, the code is passing a NULL pointer to strcmp, which is undefined behavior according the the relevant specification. So conforming implementations can do anything here, (including a segmentation fault as well as silently treating NULL as an empty string). My system, for example, seems to behave with the silent-treat-NULL-like-empty-string behavior. Regardless, this undefined behavior is something the program should avoid doing. There are two possible options as I see it. 1- Changing that line to: if (d-optind != argc argv[d-optind] !strcmp (argv[d-optind], --)) 2- Fix the parsing of -T so an empty line won't be included in the dump to argv vector. I've attached a patch for #1. Doing #2 looked like a bit more work to me, and I'm lazy that way, (but Sergey might want to do something about it). I tried to report this upstream but bug reporting is limited to memebers of the project. I don't know if anything has changed with respect to bug reporting for upstream tar in the past couple of years. But I've started reporting bugs recently, and have had no problem doing so without any membership. I simply send bug reports to bug-tar@gnu.org, (as I am doing with this email), and I've found upstream to be particularly responsive to high-quality bug reports, (thanks, Sergey!). Happy hacking, -Carl From 0328eb91a46840bf5020a7f627aa5664a2948621 Mon Sep 17 00:00:00 2001 From: =?utf-8?q?Ra=C3=BAl=20S=C3=A1nchez=20Siles?= r...@barracuda.es Date: Tue, 4 Aug 2009 14:17:12 -0700 Subject: [PATCH] Avoid undefined behavior of passing NULL to strcmp Having a blank line in the file passed to the -T option will cause an entry in the argv array to be NULL. To avoid invoking undefined behavior, we take care not to pass these NULL values to strcmp. --- debian/changelog |4 +++- lib/getopt.c |2 +- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/debian/changelog b/debian/changelog index 1e26c81..3238ff3 100644 --- a/debian/changelog +++ b/debian/changelog @@ -7,8 +7,10 @@ tar (1.22-1.2) UNRELEASED; urgency=low Thanks to Ted T'so for the idea and Sergey Poznyakoff for cleaning up my original implementation. * Respect DEB_BUILD_OPTIONS=nocheck to conform with Policy 3.8.2 + * Avoid undefined behavior of passing NULL to strcmp, closes: #411485 +Thanks to Raúl Sánchez Siles for proposing this patch. - -- Carl Worth cwo...@cworth.org Tue, 04 Aug 2009 12:07:06 -0700 + -- Carl Worth cwo...@cworth.org Tue, 04 Aug 2009 14:16:09 -0700 tar (1.22-1.1) unstable; urgency=low diff --git a/lib/getopt.c b/lib/getopt.c index f1e6d1f..f1c0e1f 100644 --- a/lib/getopt.c +++ b/lib/getopt.c @@ -413,7 +413,7 @@ _getopt_internal_r (int argc, char **argv, const char *optstring, then exchange with previous non-options as if it were an option, then skip everything else like a non-option. */ - if (d-optind != argc !strcmp (argv[d-optind], --)) + if (d-optind != argc argv[d-optind] !strcmp (argv[d-optind], --)) { d-optind++; -- 1.6.3.3 signature.asc Description: This is a digitally signed message part
[Bug-tar] [PATCH] Clarify default values for --same-owner and --no-same-owner
Most other options which have different defaults for superuser and ordinary users already say as much. Fix these to match. --- src/tar.c |4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/src/tar.c b/src/tar.c index e10b804..309f9d7 100644 --- a/src/tar.c +++ b/src/tar.c @@ -481,9 +481,9 @@ static struct argp_option options[] = { {touch, 'm', 0, 0, N_(don't extract file modified time), GRID+1 }, {same-owner, SAME_OWNER_OPTION, 0, 0, - N_(try extracting files with the same ownership), GRID+1 }, + N_(try extracting files with the same ownership as exists in the archive (default for superuser)), GRID+1 }, {no-same-owner, NO_SAME_OWNER_OPTION, 0, 0, - N_(extract files as yourself), GRID+1 }, + N_(extract files as yourself (default for ordinary users)), GRID+1 }, {numeric-owner, NUMERIC_OWNER_OPTION, 0, 0, N_(always use numbers for user/group names), GRID+1 }, {preserve-permissions, 'p', 0, 0, -- 1.6.3.3
[Bug-tar] Preserve timestamp of symlinks when extracting (if utimensat available)
On Sun, 12 Jun 2005 14:22:53 -0400, D Goel wrote: Let's try the tar-trick to copy the contents of dir1 to dir2: (cd dir1 tar -clpsf- *) | (cd dir2 tar -xpsf -) This copies the contents, including correct timestamps and ownership, except that if there is a symlink (named, say s) in dir1. Then the copied file dir2/s fails to get the same timestamp as dir1/s. Hi DG, Thank you for the bug report. I've confirmed that this bug still exists in the latest version of tar packaged in Debian (1.22-1.1). And I almost replied saying that Linux doesn't provide the necessary functionality to do what you want, (I noticed that cp -a also exhibits the same bug---see bug #26766). But rsync does not have the bug, so I looked a little closer and found... the necessary magic is in the flags argument to the POSIX utimensat function call: The flags field is a bit mask that may be 0, or include the following constant, defined in fcntl.h: AT_SYMLINK_NOFOLLOW If pathname specifies a symbolic link, then update the time‐ stamps of the link, rather than the file to which it refers. The tar program already prefers to call utimensat if available, but currently always passes 0 for the flags argument. It also avoids ever calling this function for symlinks, (which would update the timestamp of the target to the archived time stamp of the link which is certainly not desired). I've attached a patch that fixes the bug when utimensat is available, and should have no effect when the function is not available, (though I didn't test on any such system). I also didn't write a new test case for this behavior, (I'm not sure how to make a test case that would not be expected to pass on some systems.) Any feedback would be appreciated, -Carl From 2fe8827ad1874abf3295a5aa96eba18372b98c12 Mon Sep 17 00:00:00 2001 From: Carl Worth cwo...@cworth.org Date: Tue, 4 Aug 2009 17:00:35 -0700 Subject: [PATCH] Preserve timestamp of symlinks when extracting (if utimensat available) If the utimensat function is not available, then do nothing with symlink time stamps, (which is the same as the current code). --- lib/utimens.c | 26 +++ lib/utimens.h |4 +- src/extract.c | 61 +++- src/misc.c|2 +- 4 files changed, 53 insertions(+), 40 deletions(-) diff --git a/lib/utimens.c b/lib/utimens.c index 708de10..ae8b0a6 100644 --- a/lib/utimens.c +++ b/lib/utimens.c @@ -72,11 +72,16 @@ struct utimbuf use just futimes (or equivalent) instead of utimes (or equivalent), and fail if on an old system without futimes (or equivalent). If TIMESPEC is null, set the time stamps to the current time. + If the file is a symlink and IS_SYMLINK is set, then the + time stamps of the symlink itself will be updated if + possible, (but if not supported by the operating system + then no change will occur). Return 0 on success, -1 (setting errno) on failure. */ int gl_futimens (int fd ATTRIBUTE_UNUSED, - char const *file, struct timespec const timespec[2]) + char const *file, struct timespec const timespec[2], + int is_symlink) { /* Some Linux-based NFS clients are buggy, and mishandle time stamps of files in NFS file systems in some cases. We have no @@ -102,7 +107,8 @@ gl_futimens (int fd ATTRIBUTE_UNUSED, #if HAVE_UTIMENSAT if (fd 0) { - int result = utimensat (AT_FDCWD, file, timespec, 0); + int flags = is_symlink ? AT_SYMLINK_NOFOLLOW : 0; + int result = utimensat (AT_FDCWD, file, timespec, flags); # ifdef __linux__ /* Work around what might be a kernel bug: http://bugzilla.redhat.com/442352 @@ -119,6 +125,12 @@ gl_futimens (int fd ATTRIBUTE_UNUSED, return result; } #endif + + /* Without utimensat we have no way to update a symlink rather than + * the target, so just return immediately. */ + if (is_symlink) + return 0; + #if HAVE_FUTIMENS { int result = futimens (fd, timespec); @@ -219,9 +231,13 @@ gl_futimens (int fd ATTRIBUTE_UNUSED, } /* Set the access and modification time stamps of FILE to be - TIMESPEC[0] and TIMESPEC[1], respectively. */ + TIMESPEC[0] and TIMESPEC[1], respectively. + If the file is a symlink and is_symlink is set, then the + time stamps of the symlink itself will be updated if + possible, (but if not supported by the operating system + then no change will occur). */ int -utimens (char const *file, struct timespec const timespec[2]) +utimens (char const *file, struct timespec const timespec[2], int is_symlink) { - return gl_futimens (-1, file, timespec); + return gl_futimens (-1, file, timespec, is_symlink); } diff --git a/lib/utimens.h b/lib/utimens.h index 169521d..625785c 100644 --- a/lib/utimens.h +++ b/lib/utimens.h @@ -1,3 +1,3 @@ #include time.h -int gl_futimens (int, char const *, struct timespec const [2]); -int utimens (char const
[Bug-tar] tar fails to preserve hard links with --remove-files
A user of Debian noticed that tar (1.22) does not always preserve hard links when creating an archive with the --remove-files option. Ted Ts'o provided the following analysis: On Sun, 13 Apr 2003 15:45:27 -0400, Theodore Ts'o ty...@mit.edu wrote: I'm pretty sure, by the way, that the problem is that tar is keying off of the st_nlink to decide whether or not to do hard link processing as an optimization. When --remove-files is present, then st_nlink of the hard-linked inode is dropping, and when st_nlink is one, tar can't tell that it was previously a hard-linked file. The fix would require that tar check every single file's inode number against previously written files to see if it was a hard linked file (instead of just checking files where st_nlink 1), in the case when --remove-file option is in use. I've attached two patches to fix this bug. The first implements Ted's suggestion, (using the hard links hash table for all files when the --remove-files option is in effect, regardless of the value of st_nlink). The second patch adds a test case for the bug, (failing before the first patch is added and passing afterwards). Please let me know if you need anything else, -Carl PS. If you could preserve the CC list in any replies that would be appreciated. From f1ed85d46043c523cd5b8196c1d266f3606a2531 Mon Sep 17 00:00:00 2001 From: Carl Worth cwo...@cworth.org Date: Wed, 29 Jul 2009 20:45:58 -0700 Subject: [PATCH 1/2] Preserve hard links with --remove-files When the --remove-files option is in effect, it is no longer reliable to use a file's link count to determine if we should use the hash table for hard links. Instead, we look into the hash table for every file when under the influence of the --remove-files option. --- debian/changelog |3 ++- src/create.c |4 ++-- 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/debian/changelog b/debian/changelog index df3a125..747988e 100644 --- a/debian/changelog +++ b/debian/changelog @@ -3,8 +3,9 @@ tar (1.22-1.2) UNRELEASED; urgency=low * Add Carl Worth as an uploader. * Fix to allow parallel build (-j2), closes #535319 * Don't close file stream before EOF, closes #525818 + * Preserve hard links with --remove-files, closes #188663 - -- Carl Worth cwo...@cworth.org Wed, 29 Jul 2009 16:18:18 -0700 + -- Carl Worth cwo...@cworth.org Wed, 29 Jul 2009 21:28:45 -0700 tar (1.22-1.1) unstable; urgency=low diff --git a/src/create.c b/src/create.c index fde7ed1..559aaa0 100644 --- a/src/create.c +++ b/src/create.c @@ -1377,7 +1377,7 @@ static Hash_table *link_table; static bool dump_hard_link (struct tar_stat_info *st) { - if (link_table st-stat.st_nlink 1) + if (link_table (st-stat.st_nlink 1 || remove_files_option)) { struct link lp; struct link *duplicate; @@ -1424,7 +1424,7 @@ file_count_links (struct tar_stat_info *st) { if (hard_dereference_option) return; - if (st-stat.st_nlink 1) + if (st-stat.st_nlink 1 || remove_files_option) { struct link *duplicate; struct link *lp = xmalloc (offsetof (struct link, name) -- 1.6.3.3 From a75570c728ed2c3f65fb075491a07a9b4ade407f Mon Sep 17 00:00:00 2001 From: Carl Worth cwo...@cworth.org Date: Wed, 29 Jul 2009 21:26:23 -0700 Subject: [PATCH 2/2] Add hardlinks test (to ensure they are preserved with --remove-files) The new hardlinks.at test case verifies the fix in the previous commit, (without that change the test fails, and with the change the test passes). --- tests/hardlinks.at | 50 ++ tests/testsuite.at |2 ++ 2 files changed, 52 insertions(+), 0 deletions(-) create mode 100644 tests/hardlinks.at diff --git a/tests/hardlinks.at b/tests/hardlinks.at new file mode 100644 index 000..9e01ec3 --- /dev/null +++ b/tests/hardlinks.at @@ -0,0 +1,50 @@ +# Process this file with autom4te to create testsuite. -*- Autotest -*- + +# Test suite for GNU tar. +# Copyright (C) 2009 Free Software Foundation, Inc. + +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3, or (at your option) +# any later version. + +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. + +# You should have received a copy of the GNU General Public License +# along with this program; if not, write to the Free Software +# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA +# 02110-1301, USA. + +# Problem: hard links not preserved with --remove-files +# Reported by: Theodore Y. Ts'o ty...@mit.edu +# References: e194eae-0001le...@think.thunk.org +# http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=188663 + +AT_SETUP([preserve hard links with --remove
Re: [Bug-tar] tar fails to preserve hard links with --remove-files
On Thu, 2009-07-30 at 23:52 +0300, Sergey Poznyakoff wrote: Thanks for reporting. Thank you, Sergey. And thanks for the quick reply. This one is mostly OK, except that I don't see any reason to test the remove_files_option value in file_count_links. This will unnecessarily inflate the link_table and slow down archive creation. I had expected to only have to add it in one location, and I thought I even tested it that way first. (But maybe I made the mistake and added it to the wrong one.) Anyway, I'm glad to see a version of this land upstream. I have installed the attached patch. Excellent. I'll incorporate that into Debian's package for now, and look forward to seeing this change appear in a future release. -Carl signature.asc Description: This is a digitally signed message part