Re: [Bug-tar] Tar 1.27 has become non-deterministic
Nathan Stratton Treadway wrote:
> On Mon, Nov 04, 2013 at 15:37:48 -0500, Nathan Stratton Treadway wrote:
> > (But I don't know if that level of cross-version consistency is
> > important enough to bother with.)
>
> I just ran across a discussion in the Debian bug tracking system where
> the change in output between 1.26 and 1.27 was causing problems:
>
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=726963
> "cannot reproduce several *.tar.xz files (possibly tar issue?)"

What do you mean by "the output"? The archive format, or the listing output?

BTW: star has archived all three times since 1985 (longer than gtar has
existed), and archiving more than the mtime will of course always create
"different" archives. Well, since 1992 star has allowed you to select the
archive format, and thus could allow you to select a less powerful format
that only archives the mtime.

I recommend being careful when creating archives that are going to be
distributed. I, for example, use this:

star -Hustar -cPM -find star-1.5.3 ! -type l -chown root -chgrp bin > star-1.5.3.tar

when I create a star source tarball. As you see, I do not let normal IDs
appear in the archive, and I use the historical POSIX.1-1988 archive format.

With previous gtar versions, similar precautions would not always help.
I know that gtar does not use libfind, but gtar also had the problem that
it did not guarantee to write archives only in the selected archive
format. This, for example, caused some projects (such as mysql) to create
source archives that could not be extracted by tar implementations other
than gtar, because there were unexpected vendor-specific enhancements.

Jörg

-- 
 EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
       j...@cs.tu-berlin.de (uni)
       joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
Re: [Bug-tar] Tar 1.27 has become non-deterministic
On Mon, Nov 04, 2013 at 15:37:48 -0500, Nathan Stratton Treadway wrote:
> (But I don't know if that level of cross-version consistency is
> important enough to bother with.)

I just ran across a discussion in the Debian bug tracking system where
the change in output between 1.26 and 1.27 was causing problems:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=726963
"cannot reproduce several *.tar.xz files (possibly tar issue?)"

Nathan

Nathan Stratton Treadway - natha...@ontko.com - Mid-Atlantic region
Ray Ontko & Co. - Software consulting services - http://www.ontko.com/
GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239
Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
Re: [Bug-tar] Tar 1.27 has become non-deterministic
On Mon, Nov 04, 2013 at 20:33:32 +0200, Sergey Poznyakoff wrote:
> > (For what it's worth, with [just] this patch, David's scripts will still
> > detect "phantom" changes in the archives when he first moves from 1.26
> > to 1.27, due to the change in the umask field values for these LongLink
> > entries.)
>
> Yes, I know. But that cannot be helped: start_private_header is used
> in other places where safe umask is important (perhaps in this case
> too).

(Note: I wrote "umask field" earlier, but actually the field in question
is called "mode".)

Well, in v1.26 the write_gnu_long_link() function had the line
    FILL (header->header.mode, '0');
after the start_private_header() call, so I gather the mode field on
these LongLink entries is not used.

Thus, presumably, if it seemed worth keeping cross-version consistency,
you could still explicitly set the field back to "000" inside of
write_gnu_long_link() (without changing start_private_header).

(But I don't know if that level of cross-version consistency is
important enough to bother with.)

Nathan
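Nathan's suggestion can be sketched in C. This is a minimal illustration under stated assumptions, not tar's code: `struct longlink_header_sketch`, `FILL_SKETCH`, `start_header_sketch` and `restore_126_mode` are hypothetical names, and the "0000644" starting value is simply the mode observed for LongLink headers in the 1.27 hexdumps in this thread. It shows that overwriting the mode field after the header has been started restores the 1.26 bytes without changing start_private_header itself.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, trimmed-down header holding only the 8-byte mode field
   (at offset 100 in a real 512-byte tar header). */
struct longlink_header_sketch {
    char mode[8];
};

/* Modeled on the FILL() calls quoted from tar 1.26: pad all but the last
   byte of FIELD with BYTE and NUL-terminate it.  This definition is an
   assumption for illustration, not copied from tar. */
#define FILL_SKETCH(field, byte)                    \
    do {                                            \
        memset((field), (byte), sizeof(field) - 1); \
        (field)[sizeof(field) - 1] = '\0';          \
    } while (0)

/* Stand-in for start_private_header(): leaves the mode seen in the 1.27
   output ("0000644" plus its NUL fills all 8 bytes). */
static void start_header_sketch(struct longlink_header_sketch *h)
{
    assert(h != NULL);
    memcpy(h->mode, "0000644", sizeof h->mode);
}

/* The 1.26-compatible step: overwrite mode with "0000000" plus a NUL. */
static void restore_126_mode(struct longlink_header_sketch *h)
{
    assert(h != NULL);
    FILL_SKETCH(h->mode, '0');
}
```

Calling `start_header_sketch` and then `restore_126_mode` leaves `mode` holding "0000000", the value the 1.26 hexdump shows for these entries.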
Re: [Bug-tar] Tar 1.27 has become non-deterministic
Hi Nathan,

> This patch works to produce unchanging tar archives in my test case
> here (on Linux).

Great.

> I was curious if you had a particular reason to use "-1" instead of "0"
> here?

No particular reason at all, except that it forces start_private_header
to use 0 as mtime. Using 0 instead is much better indeed.

> (For what it's worth, with [just] this patch, David's scripts will still
> detect "phantom" changes in the archives when he first moves from 1.26
> to 1.27, due to the change in the umask field values for these LongLink
> entries.)

Yes, I know. But that cannot be helped: start_private_header is used
in other places where safe umask is important (perhaps in this case
too).

Regards,
Sergey
Re: [Bug-tar] Tar 1.27 has become non-deterministic
On Sun, Nov 03, 2013 at 21:31:29 +0200, Sergey Poznyakoff wrote:
> +  header = start_private_header ("././@LongLink", size, (time_t)-1);

This patch works to produce unchanging tar archives in my test case
here (on Linux).

I was curious if you had a particular reason to use "-1" instead of "0"
here? When time_t is signed, start_private_header will use "000" for
the mtime field in either case, but if there are systems with an
unsigned time_t, will "-1" still do "the right thing"?

(For what it's worth, with [just] this patch, David's scripts will still
detect "phantom" changes in the archives when he first moves from 1.26
to 1.27, due to the change in the umask field values for these LongLink
entries.)

Nathan
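Nathan's question about an unsigned time_t comes down to C's conversion rules: converting -1 to an unsigned type yields that type's maximum value rather than a negative number. The sketch below assumes a hypothetical platform with a 32-bit time_t (the typedef names are invented for illustration) and shows what an 11-digit octal mtime field would hold in each case. It does not reproduce how start_private_header itself treats the value.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for a 32-bit signed and a 32-bit unsigned time_t. */
typedef int32_t  signed_time_sketch;
typedef uint32_t unsigned_time_sketch;

/* Nonzero if integer type T is signed: (T)-1 is negative only then. */
#define TYPE_IS_SIGNED_SKETCH(T) ((T)-1 < (T)0)

/* Write VALUE as 11 zero-padded octal digits plus a NUL, the layout of a
   12-byte tar mtime field.  Returns 0 if VALUE needs more than 11 digits. */
static int format_mtime_octal(uint64_t value, char out[12])
{
    assert(out != NULL);
    if (value > UINT64_C(077777777777))   /* largest 11-digit octal value */
        return 0;
    snprintf(out, 12, "%011" PRIo64, value);
    return 1;
}
```

With the signed type, -1 stays negative and can be told apart from any real timestamp; with the unsigned type it becomes 4294967295, which fits the field as "37777777777". Passing 0, as Sergey agrees is better, formats as "00000000000" regardless of signedness.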
Re: [Bug-tar] Tar 1.27 has become non-deterministic
Good news. I tried the patch and it fixes the problem!

# 1.27 unpatched
> for f in $(seq 200); do tar -cp -T filelist -f - | md5sum; done | uniq
179a60d139644f62142b35fd2c292e68 -
2e2ed298b902d6abc338358bafbd3772 -
8c5f62d601c1f20dc42a8e94f6a39cb0 -
195c0c2c4814729ee063e08af4fc5ca5 -
e01aebfa80464e78d5374c926766cb48 -

# 1.27 patched
> for f in $(seq 200); do ./tar2 -cp -T filelist -f - | md5sum; done | uniq
06b8780d15e90f7f026ea5da2b1e5aaf -

Cheers,
David
Re: [Bug-tar] Tar 1.27 has become non-deterministic
Hi David,

> The new version of tar has become non-deterministic.
> It looks like now it stores its own creation time or something internally.

Please try the attached patch.

Regards,
Sergey

diff --git a/src/create.c b/src/create.c
index e14e13d..ab5bb6e 100644
--- a/src/create.c
+++ b/src/create.c
@@ -543,7 +543,7 @@ write_gnu_long_link (struct tar_stat_info *st, const char *p, char type)
   union block *header;
   char *tmpname;
 
-  header = start_private_header ("././@LongLink", size, start_time.tv_sec);
+  header = start_private_header ("././@LongLink", size, (time_t)-1);
   uid_to_uname (0, &tmpname);
   UNAME_TO_CHARS (tmpname, header->header.uname);
   free (tmpname);
Re: [Bug-tar] Tar 1.27 has become non-deterministic
On Sun, Nov 03, 2013 at 07:15:55 -0500, Nathan Stratton Treadway wrote:
> On Sun, Nov 03, 2013 at 09:27:04 +1100, David Barri wrote:
> > Sure. Here's a diff between two tars I created using tar -cp -T filelist
> >
> > > diff -u <(xxd a.tar) <(xxd b.tar)
> > --- /proc/self/fd/11 2013-11-03 09:21:32.144102315 +1100
> > +++ /proc/self/fd/12 2013-11-03 09:21:32.147435636 +1100
> > @@ -188775,7 +188775,7 @@
> >  02e1660: 3030 3030 3634 3400 3030 3030 644.
> >  02e1670: 3030 3000 3030 3030 3030 3000 3030 3030 000.000.
> >  02e1680: 3030 3030 3136 3100 3132 3233 3530 3236 161.12235026
> > -02e1690: 3436 3300 3031 3136 3433 0020 4c00 463.011643. L...
> > +02e1690: 3531 3100 3031 3136 3335 0020 4c00 511.011635. L...
>
> Are these both created with tar 1.27? If so, how does the v1.26 file
> look in those same spots?

Actually, I think I see the problem...

The "L" near the end of the changed line is a flag that indicates that
the archive member in question is of type "GNUTYPE_LONGNAME". All of the
blocks shown in your "diff" message show that flag, so it would seem
that the issue is confined to the handling of members with long names
(i.e. over 100 characters long).

The tar archive header for such files is written by the
create.c:write_gnu_long_link() function, and comparing that function
between 1.26 and 1.27 I see:

-  header = start_private_header ("././@LongLink", size, time (NULL));
-  FILL (header->header.mtime, '0');
-  FILL (header->header.mode, '0');
-  FILL (header->header.uid, '0');
-  FILL (header->header.gid, '0');
-  FILL (header->header.devmajor, 0);
-  FILL (header->header.devminor, 0);
+  header = start_private_header ("././@LongLink", size, start_time.tv_sec);

Presumably the idea was that those fields are all already set to their
"default" value by the start_private_header() call, and thus there's no
reason to explicitly clear them... but in this case the removal of the
    FILL (header->header.mtime, '0');
line does mean that the header generated in v1.26 had a blank mtime
field, while in 1.27 the value found there is based on the current time
(as of the start of this tar run).

Anyway, I was able to reproduce this with a 101-character filename:

$ echo hi > 12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901
$ tar -cp -f longname_1.27.tar 12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901

Comparing that with a tar file generated using v1.26, using

$ diff -U 9 <(xxd longname_1.26.tar) <(xxd longname_1.27.tar)

I get:

 000: 2e2f 2e2f 404c 6f6e 674c 696e 6b00 ././@LongLink...
 010:
 020:
 030:
 040:
 050:
-060: 3030 3030 3030 3000 3030 3030 000.
+060: 3030 3030 3634 3400 3030 3030 644.
 070: 3030 3000 3030 3030 3030 3000 3030 3030 000.000.
-080: 3030 3030 3134 3600 3030 3030 3030 3030 146.
-090: 3030 3000 3031 3135 3636 0020 4c00 000.011566. L...
+080: 3030 3030 3134 3600 3132 3233 3534 3533 146.12235453
+090: 3330 3000 3031 3136 3430 0020 4c00 300.011640. L...
 0a0:
 0b0:
 0c0:
 0d0:
 0e0:
 0f0:
 100: 0075 7374 6172 2020 0072 6f6f 7400 .ustar .root...
 110:
 120: 0072 6f6f 7400 .root...

Nathan
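To see why exactly these bytes change, it helps to lay out the 512-byte header. The struct below follows the POSIX ustar field layout; the GNU LongLink header (typeflag 'L') shares at least the first 345 bytes of it, which covers every field discussed here. The names `struct ustar_header_sketch` and `header_checksum` are ours, not tar's. With this layout, offset 0x88 (136) is mtime, 0x94 (148) is chksum and 0x9c (156) is typeflag, matching the changed region in the dump above, and because the checksum is a byte sum over the whole header, any change to mtime also changes chksum.

```c
#include <assert.h>
#include <stddef.h>

/* POSIX ustar header layout (512 bytes).  Every member is a char or char
   array, so there is no padding and offsetof() gives on-disk offsets. */
struct ustar_header_sketch {
    char name[100];     /*   0 (0x000) */
    char mode[8];       /* 100 (0x064) */
    char uid[8];        /* 108 (0x06c) */
    char gid[8];        /* 116 (0x074) */
    char size[12];      /* 124 (0x07c) */
    char mtime[12];     /* 136 (0x088) */
    char chksum[8];     /* 148 (0x094) */
    char typeflag;      /* 156 (0x09c), 'L' for a GNU LongLink header */
    char linkname[100]; /* 157 (0x09d) */
    char magic[6];      /* 257 (0x101) */
    char version[2];    /* 263 (0x107) */
    char uname[32];     /* 265 (0x109) */
    char gname[32];     /* 297 (0x129) */
    char devmajor[8];   /* 329 (0x149) */
    char devminor[8];   /* 337 (0x151) */
    char prefix[155];   /* 345 (0x159) */
    char pad[12];       /* 500 (0x1f4), unused */
};

/* Header checksum: the unsigned sum of all 512 bytes, with the 8-byte
   chksum field itself counted as ASCII spaces. */
static unsigned long header_checksum(const unsigned char block[512])
{
    unsigned long sum = 0;
    assert(block != NULL);
    for (size_t i = 0; i < 512; i++) {
        int in_chksum = i >= offsetof(struct ustar_header_sketch, chksum) &&
                        i < offsetof(struct ustar_header_sketch, typeflag);
        sum += in_chksum ? (unsigned char)' ' : block[i];
    }
    return sum;
}
```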
Re: [Bug-tar] Tar 1.27 has become non-deterministic
David Barri wrote:
> Fair enough. I was hoping just the description of the problem would be
> enough for someone to say "Ah! Yes, that will be because of !" but no
> such luck it seems.

No reason for apologies. For someone familiar with the tar archive
format, it takes less than 30 seconds to understand what's happening
(even though the "hdump -a" output would make it easier to see the
problem).

BTW: Your previous assumption that there was a change in the gtar
archive format could be proven from the output you sent.

Jörg
Re: [Bug-tar] Tar 1.27 has become non-deterministic
On Sun, Nov 03, 2013 at 09:27:04 +1100, David Barri wrote:
> Sure. Here's a diff between two tars I created using tar -cp -T filelist
>
> > diff -u <(xxd a.tar) <(xxd b.tar)
> --- /proc/self/fd/11 2013-11-03 09:21:32.144102315 +1100
> +++ /proc/self/fd/12 2013-11-03 09:21:32.147435636 +1100
> @@ -188775,7 +188775,7 @@
>  02e1660: 3030 3030 3634 3400 3030 3030 644.
>  02e1670: 3030 3000 3030 3030 3030 3000 3030 3030 000.000.
>  02e1680: 3030 3030 3136 3100 3132 3233 3530 3236 161.12235026
> -02e1690: 3436 3300 3031 3136 3433 0020 4c00 463.011643. L...
> +02e1690: 3531 3100 3031 3136 3335 0020 4c00 511.011635. L...

Are these both created with tar 1.27? If so, how does the v1.26 file
look in those same spots?

Also, if you showed some more lines of context (i.e. "-U 9" or so), I
think you'd see the path names of the archive members in question,
which might give you a useful hint...

Nathan
Re: [Bug-tar] Tar 1.27 has become non-deterministic
Fair enough. I was hoping just the description of the problem would be
enough for someone to say "Ah! Yes, that will be because of !" but no
such luck it seems.

Ok, I'll try to come up with a set of data that reliably reproduces it.

Cheers

On 3 November 2013 18:52, Paul Eggert wrote:
> David Barri wrote:
> > Sure. Here's a diff between two tars I created using tar -cp -T filelist
> >
> >> > diff -u <(xxd a.tar) <(xxd b.tar)
>
> I'm afraid that's not enough information for me to reproduce the problem,
> nor do I get the significance of those differences.
>
> Here's how I tried (and failed) to reproduce the problem:
>
> $ echo foo >foo
> $ tar -cf tar1 foo
> $ tar -cf tar2 foo
> $ cmp tar1 tar2
> $
Re: [Bug-tar] Tar 1.27 has become non-deterministic
David Barri wrote:
> Sure. Here's a diff between two tars I created using tar -cp -T filelist
>
>> > diff -u <(xxd a.tar) <(xxd b.tar)

I'm afraid that's not enough information for me to reproduce the problem,
nor do I get the significance of those differences.

Here's how I tried (and failed) to reproduce the problem:

$ echo foo >foo
$ tar -cf tar1 foo
$ tar -cf tar2 foo
$ cmp tar1 tar2
$
Re: [Bug-tar] Tar 1.27 has become non-deterministic
Sure. Here's a diff between two tars I created using tar -cp -T filelist

> diff -u <(xxd a.tar) <(xxd b.tar)
--- /proc/self/fd/11 2013-11-03 09:21:32.144102315 +1100
+++ /proc/self/fd/12 2013-11-03 09:21:32.147435636 +1100
@@ -188775,7 +188775,7 @@
 02e1660: 3030 3030 3634 3400 3030 3030 644.
 02e1670: 3030 3000 3030 3030 3030 3000 3030 3030 000.000.
 02e1680: 3030 3030 3136 3100 3132 3233 3530 3236 161.12235026
-02e1690: 3436 3300 3031 3136 3433 0020 4c00 463.011643. L...
+02e1690: 3531 3100 3031 3136 3335 0020 4c00 511.011635. L...
 02e16a0:
 02e16b0:
 02e16c0:
@@ -189095,7 +189095,7 @@
 02e2a60: 3030 3030 3634 3400 3030 3030 644.
 02e2a70: 3030 3000 3030 3030 3030 3000 3030 3030 000.000.
 02e2a80: 3030 3030 3136 3100 3132 3233 3530 3236 161.12235026
-02e2a90: 3436 3300 3031 3136 3433 0020 4c00 463.011643. L...
+02e2a90: 3531 3100 3031 3136 3335 0020 4c00 511.011635. L...
 02e2aa0:
 02e2ab0:
 02e2ac0:
@@ -189415,7 +189415,7 @@
 02e3e60: 3030 3030 3634 3400 3030 3030 644.
 02e3e70: 3030 3000 3030 3030 3030 3000 3030 3030 000.000.
 02e3e80: 3030 3030 3136 3100 3132 3233 3530 3236 161.12235026
-02e3e90: 3436 3300 3031 3136 3433 0020 4c00 463.011643. L...
+02e3e90: 3531 3100 3031 3136 3335 0020 4c00 511.011635. L...
 02e3ea0:
 02e3eb0:
 02e3ec0:
@@ -189735,7 +189735,7 @@
 02e5260: 3030 3030 3634 3400 3030 3030 644.
 02e5270: 3030 3000 3030 3030 3030 3000 3030 3030 000.000.
 02e5280: 3030 3030 3136 3100 3132 3233 3530 3236 161.12235026
-02e5290: 3436 3300 3031 3136 3433 0020 4c00 463.011643. L...
+02e5290: 3531 3100 3031 3136 3335 0020 4c00 511.011635. L...
 02e52a0:
 02e52b0:
 02e52c0:
@@ -193255,7 +193255,7 @@
 02f2e60: 3030 3030 3634 3400 3030 3030 644.
 02f2e70: 3030 3000 3030 3030 3030 3000 3030 3030 000.000.
 02f2e80: 3030 3030 3135 3300 3132 3233 3530 3236 153.12235026
-02f2e90: 3436 3300 3031 3136 3434 0020 4c00 463.011644. L...
+02f2e90: 3531 3100 3031 3136 3336 0020 4c00 511.011636. L...
 02f2ea0:
 02f2eb0:
 02f2ec0:
@@ -193479,7 +193479,7 @@
 02f3c60: 3030 3030 3634 3400 3030 3030 644.
 02f3c70: 3030 3000 3030 3030 3030 3000 3030 3030 000.000.
 02f3c80: 3030 3030 3135 3300 3132 3233 3530 3236 153.12235026
-02f3c90: 3436 3300 3031 3136 3434 0020 4c00 463.011644. L...
+02f3c90: 3531 3100 3031 3136 3336 0020 4c00 511.011636. L...
 02f3ca0:
 02f3cb0:
 02f3cc0:
@@ -193703,7 +193703,7 @@
 02f4a60: 3030 3030 3634 3400 3030 3030 644.
 02f4a70: 3030 3000 3030 3030 3030 3000 3030 3030 000.000.
 02f4a80: 3030 3030 3135 3300 3132 3233 3530 3236 153.12235026
-02f4a90: 3436 3300 3031 3136 3434 0020 4c00 463.011644. L...
+02f4a90: 3531 3100 3031 3136 3336 0020 4c00 511.011636. L...
 02f4aa0:
 02f4ab0:
 02f4ac0:
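The differing bytes in this diff can be decoded with a few lines of C. This is a minimal sketch assuming each header starts on a 512-byte boundary of the archive (as tar's block structure guarantees) and using the POSIX ustar field offsets; `parse_octal_field` and `ustar_field_at` are hypothetical helper names. Applied to the first hunk, offset 0x2e1690 falls in the mtime field, and the two mtime values differ by 22 seconds, consistent with a timestamp taken from each tar run.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Parse an octal tar header field of at most LEN bytes, skipping leading
   spaces and stopping at the first non-octal byte (NUL or space). */
static unsigned long long parse_octal_field(const char *field, size_t len)
{
    unsigned long long value = 0;
    size_t i = 0;
    assert(field != NULL);
    while (i < len && field[i] == ' ')
        i++;
    for (; i < len && field[i] >= '0' && field[i] <= '7'; i++)
        value = value * 8 + (unsigned long long)(field[i] - '0');
    return value;
}

/* Name of the ustar header field holding archive byte OFFSET, assuming
   OFFSET lies inside a header block (headers are 512-byte aligned). */
static const char *ustar_field_at(size_t offset)
{
    static const struct { size_t start; const char *name; } fields[] = {
        {0, "name"},     {100, "mode"},     {108, "uid"},
        {116, "gid"},    {124, "size"},     {136, "mtime"},
        {148, "chksum"}, {156, "typeflag"}, {157, "linkname"},
        {257, "magic"},  {263, "version"},  {265, "uname"},
        {297, "gname"},  {329, "devmajor"}, {337, "devminor"},
        {345, "prefix"}, {500, "pad"},
    };
    const char *name = fields[0].name;
    offset %= 512;   /* position within the header block */
    for (size_t i = 0; i < sizeof fields / sizeof fields[0]; i++)
        if (offset >= fields[i].start)
            name = fields[i].name;
    return name;
}
```

In the first hunk the mtime digits "463" became "511". The ASCII bytes of "511" sum to 6 less than those of "463", so the chksum field drops by exactly 6, from octal 011643 to 011635, which is why both fields differ on the same dump line.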
Re: [Bug-tar] Tar 1.27 has become non-deterministic
David Barri wrote:
> It looks like now it stores its own creation time or something internally.

Could you give more details about the problem? What exactly differs
between the two tar files? You can use the "od -cX" command to look at
them in more detail.