Re: [Bug-tar] Tar 1.27 has become non-deterministic

2013-11-06 Thread Joerg Schilling
Nathan Stratton Treadway  wrote:

> On Mon, Nov 04, 2013 at 15:37:48 -0500, Nathan Stratton Treadway wrote:
> > (But I don't know if that level of cross-version consistency is
> > important enough to bother with.)
>
> I just ran across a discussion in the Debian bug tracking system where
> the change in output between 1.26 and 1.27 was causing problems:
>
>   http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=726963
>  "cannot reproduce several *.tar.xz files (possibly tar issue?)"

What do you mean by "the output"? The archive format, or the listing output?

BTW: star has archived all three timestamps since 1985 (longer than gtar has 
existed), and archiving more than the mtime will of course always create 
"different" archives. Since 1992, star has let you select the archive format, 
so you could select a less powerful format that only archives the mtime.

I recommend being careful when creating archives that are going to be 
distributed. For example, I use this:

star -Hustar -cPM -find star-1.5.3 ! -type l -chown root -chgrp bin > star-1.5.3.tar

when I create a star source tarball.

As you see, I do not let normal IDs appear in the archive and I use the 
historical POSIX.1-1988 archive format.
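
The same precaution can be sketched outside of star. The following is a minimal illustration using Python's standard tarfile module, not star's implementation; the helper names normalize_owner and build_archive are hypothetical, and it only covers the owner/group normalization and the ustar format choice (it does not reproduce the "! -type l" filtering):

```python
import io
import tarfile


def normalize_owner(info: tarfile.TarInfo) -> tarfile.TarInfo:
    """Replace the packer's own IDs with fixed, neutral values."""
    info.uid = 0
    info.gid = 0
    info.uname = "root"
    info.gname = "bin"
    return info


def build_archive(files: dict[str, bytes]) -> bytes:
    """Write a name -> content mapping as a ustar archive held in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in sorted(files):  # fixed member order
            info = tarfile.TarInfo(name)  # mtime defaults to 0
            info.size = len(files[name])
            tar.addfile(normalize_owner(info), io.BytesIO(files[name]))
    return buf.getvalue()


archive = build_archive({"star-1.5.3/README": b"hello\n"})
with tarfile.open(fileobj=io.BytesIO(archive)) as tar:
    member = tar.getmembers()[0]
    print(member.uname, member.gname, member.uid, member.gid)
```

Because nothing taken from the packing user or the clock ends up in the headers, two runs over the same input give byte-identical output.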

With previous gtar versions, similar precautions would not always help. I know 
that gtar does not use libfind, but gtar also had the problem that it did not 
guarantee that archives were written only in the selected archive format. This 
caused some projects (such as mysql) to create source archives that could not be 
extracted by any tar implementation other than gtar, because they contained 
unexpected vendor-specific enhancements.



Jörg

-- 
 EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
   j...@cs.tu-berlin.de(uni)  
   joerg.schill...@fokus.fraunhofer.de (work) Blog: 
http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily



Re: [Bug-tar] Tar 1.27 has become non-deterministic

2013-11-05 Thread Nathan Stratton Treadway
On Mon, Nov 04, 2013 at 15:37:48 -0500, Nathan Stratton Treadway wrote:
> (But I don't know if that level of cross-version consistency is
> important enough to bother with.)

I just ran across a discussion in the Debian bug tracking system where
the change in output between 1.26 and 1.27 was causing problems:

  http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=726963
 "cannot reproduce several *.tar.xz files (possibly tar issue?)"

Nathan


Nathan Stratton Treadway  -  natha...@ontko.com  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239



Re: [Bug-tar] Tar 1.27 has become non-deterministic

2013-11-04 Thread Nathan Stratton Treadway
On Mon, Nov 04, 2013 at 20:33:32 +0200, Sergey Poznyakoff wrote:
> > (For what it's worth, with [just] this patch, David's scripts will still
> > detect "phantom" changes in the archives when he first moves from 1.26
> > to 1.27, due to the change in the umask field values for these LongLink
> > entries.)
> 
> Yes, I know.  But that cannot be helped: start_private_header is used
> in other places where safe umask is important (perhaps in this case
> too).

(Note: I wrote "umask field" earlier, but actually the field in question
is called "mode".)

Well, in v1.26 the write_gnu_long_link() function had the line
  FILL (header->header.mode, '0');
after the start_private_header() call, so I gather the mode field on
these LongLink entries is not used.  

Thus, presumably, if it seemed worth keeping cross-version consistency,
you could still explicitly set the field back to "0000000" inside of
write_gnu_long_link() (without changing start_private_header).

(But I don't know if that level of cross-version consistency is
important enough to bother with.)

Nathan




Re: [Bug-tar] Tar 1.27 has become non-deterministic

2013-11-04 Thread Sergey Poznyakoff
Hi Nathan,

> This patch works to produce unchanging tar archives in my test case
> here (on Linux).

Great. 

> I was curious if you had a particular reason to use "-1" instead of "0"
> here?

No particular reason at all, except that it forces start_private_header
to use 0 as mtime.  Using 0 instead is much better indeed.
 
> (For what it's worth, with [just] this patch, David's scripts will still
> detect "phantom" changes in the archives when he first moves from 1.26
> to 1.27, due to the change in the umask field values for these LongLink
> entries.)

Yes, I know.  But that cannot be helped: start_private_header is used
in other places where safe umask is important (perhaps in this case
too).

Regards,
Sergey



Re: [Bug-tar] Tar 1.27 has become non-deterministic

2013-11-04 Thread Nathan Stratton Treadway
On Sun, Nov 03, 2013 at 21:31:29 +0200, Sergey Poznyakoff wrote:
> +  header = start_private_header ("././@LongLink", size, (time_t)-1);

This patch works to produce unchanging tar archives in my test case
here (on Linux).

I was curious if you had a particular reason to use "-1" instead of "0"
here?  When time_t is signed, start_private_header will use
"00000000000" for the mtime field in either case, but if there are
systems with unsigned time_t, will "-1" still do "the right thing"?
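
To make the concern concrete, here is a small sketch of what the cast "(time_t)-1" yields for signed and unsigned types. The helper cast_time_t is hypothetical, 64 bits is an assumed width, and what start_private_header then does with the value is not modelled:

```python
def cast_time_t(value: int, signed: bool, bits: int = 64) -> int:
    """Reinterpret an integer the way a C cast to a time_t of that width would."""
    wrapped = value % (1 << bits)  # keep only the low `bits` bits
    if signed and wrapped >= 1 << (bits - 1):
        wrapped -= 1 << bits  # two's-complement interpretation
    return wrapped


print(cast_time_t(-1, signed=True))   # negative, so a "t < 0" test can catch it
print(cast_time_t(-1, signed=False))  # largest representable value instead
print(cast_time_t(0, signed=True), cast_time_t(0, signed=False))
```

With an unsigned type the value is never negative, so only "0" is unambiguous on both kinds of system.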



(For what it's worth, with [just] this patch, David's scripts will still
detect "phantom" changes in the archives when he first moves from 1.26
to 1.27, due to the change in the umask field values for these LongLink
entries.)

Nathan





Re: [Bug-tar] Tar 1.27 has become non-deterministic

2013-11-04 Thread David Barri
Good news. I tried the patch and it fixes the problem!

# 1.27 unpatched
> for f in $(seq 200); do tar -cp -T filelist -f - | md5sum; done | uniq
179a60d139644f62142b35fd2c292e68  -
2e2ed298b902d6abc338358bafbd3772  -
8c5f62d601c1f20dc42a8e94f6a39cb0  -
195c0c2c4814729ee063e08af4fc5ca5  -
e01aebfa80464e78d5374c926766cb48  -

# 1.27 patched
> for f in $(seq 200); do ./tar2 -cp -T filelist -f - | md5sum; done | uniq
06b8780d15e90f7f026ea5da2b1e5aaf  -
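
The same "build it many times and count the distinct hashes" check can be sketched in Python. This is only an illustration of the technique: distinct_digests and build_long_name_archive are hypothetical names, and the builder uses Python's tarfile module as a stand-in rather than invoking GNU tar.

```python
import hashlib
import io
import tarfile
from typing import Callable


def distinct_digests(build: Callable[[], bytes], runs: int = 200) -> set[str]:
    """Build the archive `runs` times and collect the distinct SHA-256 digests."""
    return {hashlib.sha256(build()).hexdigest() for _ in range(runs)}


def build_long_name_archive() -> bytes:
    """Stand-in archive writer; the long name forces a ././@LongLink entry."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        info = tarfile.TarInfo("d/" + "n" * 120)
        info.size = 3
        info.mtime = 1383487168  # fixed member mtime; only the writer could add variation
        tar.addfile(info, io.BytesIO(b"hi\n"))
    return buf.getvalue()


digests = distinct_digests(build_long_name_archive)
print(len(digests))  # a single digest means the writer is deterministic
```

To check a tar binary instead, the build callable would have to run that binary and return its standard output.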

Cheers,
David


Re: [Bug-tar] Tar 1.27 has become non-deterministic

2013-11-03 Thread Sergey Poznyakoff
Hi David,

> The new version of tar has become non-deterministic.
> It looks like now it stores its own creation time or something internally.

Please try the attached patch.

Regards,
Sergey

diff --git a/src/create.c b/src/create.c
index e14e13d..ab5bb6e 100644
--- a/src/create.c
+++ b/src/create.c
@@ -543,7 +543,7 @@ write_gnu_long_link (struct tar_stat_info *st, const char *p, char type)
   union block *header;
   char *tmpname;

-  header = start_private_header ("././@LongLink", size, start_time.tv_sec);
+  header = start_private_header ("././@LongLink", size, (time_t)-1);
   uid_to_uname (0, &tmpname);
   UNAME_TO_CHARS (tmpname, header->header.uname);
   free (tmpname);


Re: [Bug-tar] Tar 1.27 has become non-deterministic

2013-11-03 Thread Nathan Stratton Treadway
On Sun, Nov 03, 2013 at 07:15:55 -0500, Nathan Stratton Treadway wrote:
> On Sun, Nov 03, 2013 at 09:27:04 +1100, David Barri wrote:
> > Sure. Here's a diff between two tars I created using tar -cp -T filelist
> > 
> > > diff -u <(xxd a.tar) <(xxd b.tar)
> > --- /proc/self/fd/11 2013-11-03 09:21:32.144102315 +1100
> > +++ /proc/self/fd/12 2013-11-03 09:21:32.147435636 +1100
> > @@ -188775,7 +188775,7 @@
> >  02e1660: 0000 0000 3030 3030 3634 3400 3030 3030  ....0000644.0000
> >  02e1670: 3030 3000 3030 3030 3030 3000 3030 3030  000.0000000.0000
> >  02e1680: 3030 3030 3136 3100 3132 3233 3530 3236  0000161.12235026
> > -02e1690: 3436 3300 3031 3136 3433 0020 4c00 0000  463.011643. L...
> > +02e1690: 3531 3100 3031 3136 3335 0020 4c00 0000  511.011635. L...
> 
> Are these both created with tar 1.27?  If so, how does the v1.26 file
> look in those same spots?

Actually, I think I see the problem...

The "L" near the end of the changed line is a flag that indicates that
the archive member in question is of type "GNUTYPE_LONGNAME".  All of
the blocks shown in your "diff" message show that flag, so it would seem
that the issue is confined to the handling of members with long names
(i.e. over 100 characters long).
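
As a rough illustration of that rule, the sketch below writes two empty members with Python's tarfile module in GNU format (an assumption: this is Python's writer, not GNU tar) and reads the name and type flag straight out of the 512-byte header blocks. The helper header_summary is hypothetical; the offsets (name at 0, type flag at 156) are those of the standard tar header.

```python
import io
import tarfile

BLOCK = 512


def header_summary(block: bytes) -> tuple[str, bytes]:
    """Return the (name, typeflag) pair stored in one tar header block."""
    name = block[0:100].split(b"\0", 1)[0].decode("ascii")
    return name, block[156:157]


buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
    for member_name in ("a" * 100, "b" * 101):
        tar.addfile(tarfile.TarInfo(member_name))  # zero-length regular files
data = buf.getvalue()

first = header_summary(data[0:BLOCK])           # 100-character name fits
second = header_summary(data[BLOCK:2 * BLOCK])  # 101 characters needs a helper entry
print(len(first[0]), first[1])
print(second)
```

The 100-character name is stored directly in an ordinary header, while the 101-character name is preceded by a "././@LongLink" header of type "L".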

The tar archive header for such files is written by the
create.c:write_gnu_long_link()  function, and comparing that function
between 1.26 and 1.27 I see:

-  header = start_private_header ("././@LongLink", size, time (NULL));
-  FILL (header->header.mtime, '0');
-  FILL (header->header.mode, '0');
-  FILL (header->header.uid, '0');
-  FILL (header->header.gid, '0');
-  FILL (header->header.devmajor, 0);
-  FILL (header->header.devminor, 0);
+  header = start_private_header ("././@LongLink", size, start_time.tv_sec);


Presumably the idea was that those fields are all already set to their
"default" value by the start_private_header() call, and thus there's no
reason to explicitly clear them... but in this case the removal of the
  FILL (header->header.mtime, '0');
line does mean that the header generated in v1.26 had a blank mtime
field, while in 1.27 the value found there is based on the current time
(as of the start of this tar run).
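
For reference, the mtime field is a 12-byte, NUL-terminated octal number. The sketch below decodes the two values seen in my test headers (all zeros from v1.26, "12235453300" from v1.27); parse_octal_field is a hypothetical helper, not a tar function.

```python
from datetime import datetime, timezone


def parse_octal_field(field: bytes) -> int:
    """Decode a NUL- or space-terminated octal number from a tar header field."""
    digits = field.rstrip(b"\0 ").decode("ascii")
    return int(digits, 8) if digits else 0


old_mtime = parse_octal_field(b"00000000000\0")  # v1.26 LongLink header
new_mtime = parse_octal_field(b"12235453300\0")  # v1.27 LongLink header
stamp = datetime.fromtimestamp(new_mtime, tz=timezone.utc)
print(old_mtime)
print(new_mtime, stamp.isoformat())
```

The v1.27 value decodes to a time on 2013-11-03, i.e. when the archive was created, not anything taken from the file itself.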


Anyway, I was able to reproduce this with a 101-character filename:

  $ echo hi > 12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901
  $ tar -cp -f longname_1.27.tar 12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901
  

Comparing that with a tar file generated using v1.26 (using
  $ diff -U 9  <(xxd longname_1.26.tar) <(xxd longname_1.27.tar)
), I get:

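 0000000: 2e2f 2e2f 404c 6f6e 674c 696e 6b00 0000  ././@LongLink...
 0000010: 0000 0000 0000 0000 0000 0000 0000 0000  ................
 0000020: 0000 0000 0000 0000 0000 0000 0000 0000  ................
 0000030: 0000 0000 0000 0000 0000 0000 0000 0000  ................
 0000040: 0000 0000 0000 0000 0000 0000 0000 0000  ................
 0000050: 0000 0000 0000 0000 0000 0000 0000 0000  ................
-0000060: 0000 0000 3030 3030 3030 3000 3030 3030  ....0000000.0000
+0000060: 0000 0000 3030 3030 3634 3400 3030 3030  ....0000644.0000
 0000070: 3030 3000 3030 3030 3030 3000 3030 3030  000.0000000.0000
-0000080: 3030 3030 3134 3600 3030 3030 3030 3030  0000146.00000000
-0000090: 3030 3000 3031 3135 3636 0020 4c00 0000  000.011566. L...
+0000080: 3030 3030 3134 3600 3132 3233 3534 3533  0000146.12235453
+0000090: 3330 3000 3031 3136 3430 0020 4c00 0000  300.011640. L...
 00000a0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
 00000b0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
 00000c0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
 00000d0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
 00000e0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
 00000f0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
 0000100: 0075 7374 6172 2020 0072 6f6f 7400 0000  .ustar  .root...
 0000110: 0000 0000 0000 0000 0000 0000 0000 0000  ................
 0000120: 0000 0000 0000 0000 0072 6f6f 7400 0000  .........root...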
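
The third changed value in that dump, the checksum field (011566 vs. 011640), follows from the other two: a tar header checksum is the plain sum of the header bytes, so its change must equal the change in the byte sums of the mode and mtime fields. A small arithmetic check, with byte_sum as a hypothetical helper:

```python
def byte_sum(field: str) -> int:
    """Sum of the byte values of an ASCII header field."""
    return sum(field.encode("ascii"))


# Field contents as they appear in the v1.26 and v1.27 headers.
mode_delta = byte_sum("0000644") - byte_sum("0000000")
mtime_delta = byte_sum("12235453300") - byte_sum("00000000000")
chksum_delta = int("011640", 8) - int("011566", 8)

print(mode_delta, mtime_delta, chksum_delta)
```

The two field deltas add up to the checksum delta, so mode and mtime are the only header bytes that differ between the two versions.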



Nathan



Re: [Bug-tar] Tar 1.27 has become non-deterministic

2013-11-03 Thread Joerg Schilling
David Barri  wrote:

> Fair enough. I was hoping just the description of the problem would be
> enough for someone to say "Ah! Yes, that will be because of !" but no
> such luck it seems.

No reason for apologies.

For someone familiar with the tar archive format, it takes less than 30 seconds 
to understand what's happening (even though the "hdump -a" output would make it 
easier to see the problem).

BTW: Your previous assumption that there was a change in the gtar archive 
format could be proven from the output you sent.

Jörg




Re: [Bug-tar] Tar 1.27 has become non-deterministic

2013-11-03 Thread Nathan Stratton Treadway
On Sun, Nov 03, 2013 at 09:27:04 +1100, David Barri wrote:
> Sure. Here's a diff between two tars I created using tar -cp -T filelist
> 
> > diff -u <(xxd a.tar) <(xxd b.tar)
> --- /proc/self/fd/11 2013-11-03 09:21:32.144102315 +1100
> +++ /proc/self/fd/12 2013-11-03 09:21:32.147435636 +1100
> @@ -188775,7 +188775,7 @@
>  02e1660: 0000 0000 3030 3030 3634 3400 3030 3030  ....0000644.0000
>  02e1670: 3030 3000 3030 3030 3030 3000 3030 3030  000.0000000.0000
>  02e1680: 3030 3030 3136 3100 3132 3233 3530 3236  0000161.12235026
> -02e1690: 3436 3300 3031 3136 3433 0020 4c00 0000  463.011643. L...
> +02e1690: 3531 3100 3031 3136 3335 0020 4c00 0000  511.011635. L...

Are these both created with tar 1.27?  If so, how does the v1.26 file
look in those same spots?

Also, if you showed some more lines of context (i.e. "-U 9" or so) I
think you'd see the path names of the archive members in question, which
might give you a useful hint...


Nathan





Re: [Bug-tar] Tar 1.27 has become non-deterministic

2013-11-03 Thread David Barri
Fair enough. I was hoping just the description of the problem would be
enough for someone to say "Ah! Yes, that will be because of !" but no
such luck it seems.

Ok, I'll try to come up with a set of data that reliably reproduces it.

Cheers


On 3 November 2013 18:52, Paul Eggert  wrote:

> David Barri wrote:
> > Sure. Here's a diff between two tars I created using tar -cp -T filelist
> >
> >> > diff -u <(xxd a.tar) <(xxd b.tar)
>
> I'm afraid that's not enough information for me to reproduce the problem,
> nor do I get the significance of those differences.
>
> Here's how I tried (and failed) to reproduce the problem:
>
> $ echo foo >foo
> $ tar -cf tar1 foo
> $ tar -cf tar2 foo
> $ cmp tar1 tar2
> $
>
>


Re: [Bug-tar] Tar 1.27 has become non-deterministic

2013-11-03 Thread Paul Eggert
David Barri wrote:
> Sure. Here's a diff between two tars I created using tar -cp -T filelist
> 
>> > diff -u <(xxd a.tar) <(xxd b.tar)

I'm afraid that's not enough information for me to reproduce the problem,
nor do I get the significance of those differences.

Here's how I tried (and failed) to reproduce the problem:

$ echo foo >foo
$ tar -cf tar1 foo
$ tar -cf tar2 foo
$ cmp tar1 tar2
$ 




Re: [Bug-tar] Tar 1.27 has become non-deterministic

2013-11-02 Thread David Barri
Sure. Here's a diff between two tars I created using tar -cp -T filelist

> diff -u <(xxd a.tar) <(xxd b.tar)
--- /proc/self/fd/11 2013-11-03 09:21:32.144102315 +1100
+++ /proc/self/fd/12 2013-11-03 09:21:32.147435636 +1100
@@ -188775,7 +188775,7 @@
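 02e1660: 0000 0000 3030 3030 3634 3400 3030 3030  ....0000644.0000
 02e1670: 3030 3000 3030 3030 3030 3000 3030 3030  000.0000000.0000
 02e1680: 3030 3030 3136 3100 3132 3233 3530 3236  0000161.12235026
-02e1690: 3436 3300 3031 3136 3433 0020 4c00 0000  463.011643. L...
+02e1690: 3531 3100 3031 3136 3335 0020 4c00 0000  511.011635. L...
 02e16a0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
 02e16b0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
 02e16c0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
@@ -189095,7 +189095,7 @@
 02e2a60: 0000 0000 3030 3030 3634 3400 3030 3030  ....0000644.0000
 02e2a70: 3030 3000 3030 3030 3030 3000 3030 3030  000.0000000.0000
 02e2a80: 3030 3030 3136 3100 3132 3233 3530 3236  0000161.12235026
-02e2a90: 3436 3300 3031 3136 3433 0020 4c00 0000  463.011643. L...
+02e2a90: 3531 3100 3031 3136 3335 0020 4c00 0000  511.011635. L...
 02e2aa0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
 02e2ab0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
 02e2ac0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
@@ -189415,7 +189415,7 @@
 02e3e60: 0000 0000 3030 3030 3634 3400 3030 3030  ....0000644.0000
 02e3e70: 3030 3000 3030 3030 3030 3000 3030 3030  000.0000000.0000
 02e3e80: 3030 3030 3136 3100 3132 3233 3530 3236  0000161.12235026
-02e3e90: 3436 3300 3031 3136 3433 0020 4c00 0000  463.011643. L...
+02e3e90: 3531 3100 3031 3136 3335 0020 4c00 0000  511.011635. L...
 02e3ea0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
 02e3eb0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
 02e3ec0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
@@ -189735,7 +189735,7 @@
 02e5260: 0000 0000 3030 3030 3634 3400 3030 3030  ....0000644.0000
 02e5270: 3030 3000 3030 3030 3030 3000 3030 3030  000.0000000.0000
 02e5280: 3030 3030 3136 3100 3132 3233 3530 3236  0000161.12235026
-02e5290: 3436 3300 3031 3136 3433 0020 4c00 0000  463.011643. L...
+02e5290: 3531 3100 3031 3136 3335 0020 4c00 0000  511.011635. L...
 02e52a0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
 02e52b0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
 02e52c0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
@@ -193255,7 +193255,7 @@
 02f2e60: 0000 0000 3030 3030 3634 3400 3030 3030  ....0000644.0000
 02f2e70: 3030 3000 3030 3030 3030 3000 3030 3030  000.0000000.0000
 02f2e80: 3030 3030 3135 3300 3132 3233 3530 3236  0000153.12235026
-02f2e90: 3436 3300 3031 3136 3434 0020 4c00 0000  463.011644. L...
+02f2e90: 3531 3100 3031 3136 3336 0020 4c00 0000  511.011636. L...
 02f2ea0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
 02f2eb0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
 02f2ec0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
@@ -193479,7 +193479,7 @@
 02f3c60: 0000 0000 3030 3030 3634 3400 3030 3030  ....0000644.0000
 02f3c70: 3030 3000 3030 3030 3030 3000 3030 3030  000.0000000.0000
 02f3c80: 3030 3030 3135 3300 3132 3233 3530 3236  0000153.12235026
-02f3c90: 3436 3300 3031 3136 3434 0020 4c00 0000  463.011644. L...
+02f3c90: 3531 3100 3031 3136 3336 0020 4c00 0000  511.011636. L...
 02f3ca0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
 02f3cb0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
 02f3cc0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
@@ -193703,7 +193703,7 @@
 02f4a60: 0000 0000 3030 3030 3634 3400 3030 3030  ....0000644.0000
 02f4a70: 3030 3000 3030 3030 3030 3000 3030 3030  000.0000000.0000
 02f4a80: 3030 3030 3135 3300 3132 3233 3530 3236  0000153.12235026
-02f4a90: 3436 3300 3031 3136 3434 0020 4c00 0000  463.011644. L...
+02f4a90: 3531 3100 3031 3136 3336 0020 4c00 0000  511.011636. L...
 02f4aa0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
 02f4ab0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
 02f4ac0: 0000 0000 0000 0000 0000 0000 0000 0000  ................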



Re: [Bug-tar] Tar 1.27 has become non-deterministic

2013-11-01 Thread Paul Eggert
David Barri wrote:
> It looks like now it stores its own creation time or something internally.

Could you give more details about the problem?
What exactly differs between the two tar files?
You can use the "od -cX" command to look at them
in more detail.
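
Since tar archives are made of 512-byte blocks, a quick way to narrow down where two archives differ before dumping them is to compare them block by block. A minimal sketch, with differing_blocks as a hypothetical helper and synthetic data in place of real archives:

```python
def differing_blocks(a: bytes, b: bytes, block_size: int = 512) -> list[int]:
    """Return the byte offsets of the blocks whose contents differ."""
    length = max(len(a), len(b))
    return [
        offset
        for offset in range(0, length, block_size)
        if a[offset:offset + block_size] != b[offset:offset + block_size]
    ]


# Two synthetic three-block "archives" that differ only in the last block.
first = bytes(1024) + b"A" * 512
second = bytes(1024) + b"B" * 512
offsets = differing_blocks(first, second)
print([hex(offset) for offset in offsets])
```

The reported offsets can then be passed to a hex dump tool to look at just those blocks.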