Bug#763119: [libtar] Bug#763119: misinterprets old-style GNU headers

2014-10-13 Thread Tim Kientzle

On Oct 13, 2014, at 10:38 AM, Magnus Holmgren holmg...@debian.org wrote:

 The 
 difference is that ustar is followed by two spaces, whereas in tar files 
 created by libtar it's followed by a null character.

The history behind this may help make it clearer:

There has been a POSIX standard for the tar file format since 1996.  It used to 
be part of the specification for the tar command-line program, but the file 
format is now part of the specification for the pax command-line program:

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/pax.html#tag_20_92_13_06

That standard specifies a 6-byte “magic” field containing “ustar\0” followed by 
a 2-byte version field containing the ASCII characters “00” (zero zero).  These 
8 bytes together are the canonical test for POSIX-compliant tar headers.

GNU tar is derived from pdtar, which predated the POSIX standard.  Instead of a 
6-byte field followed by a 2-byte field, pdtar used a single 8-byte field 
containing “ustar\x20\x20\0”.  (I presume the author of pdtar got this from an 
early draft of the POSIX standard but I don’t know that for sure.)

Checking these 8 bytes provides a good test for GNU tar format headers vs. 
POSIX-standard ustar format headers.

Note that GNU tar can now generate POSIX-compliant ustar archives or pax 
extended format archives (with suitable options), so it’s important to 
distinguish the formats, not the programs.

And yes, there are definitely plenty of tar programs that write tar archives 
that are not compliant with either of these (which is probably why that option 
exists, to suppress the format check for non-standard tar archives).

I wrote a tar.5 man page to document my research into this:
  http://www.freebsd.org/cgi/man.cgi?query=tar&sektion=5
It documents a lot of different tar variations and includes some discussion of 
how to distinguish them.

Cheers,

Tim

P.S.  strcmp() here is a very bad idea.  Other tar files may not have null 
bytes where you expect them (indeed, someone might deliberately craft a tar 
file without null bytes in order to force a crash).  You should actually use 
memcmp() for these tests since it will check exactly the bytes you expect.


--
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#659294: [libarchive-discuss] Fwd: Bug#659294: libarchive: FTBFS on various architectures (hurd, mipsel, s390, s390x)

2012-02-21 Thread Tim Kientzle

On Feb 21, 2012, at 3:40 AM, Pino Toscano wrote:

 Hi,
 
 (greetings from your favourite Hurd porter)
 
 On Monday, 13 February 2012, Tim Kientzle wrote:
 So on hurd, I see a couple of interesting failures for bsdtar:
 [...]
 
 Actually, libarchive is pretty fine on Hurd, as it was after I fixed 
 libarchive 3.0.2 (and in 3.0.3 there are no changes leading to issues).
 
 The problem is that the test suite run (just like the whole package 
 build) is done within fakeroot (which means fakeroot-tcp), triggering 
 Debian's #534879.

Thanks, Pino.

Libarchive's test suite does a lot of file operations, including
a lot of cross-checks of file modes, ownership, and other
properties.

The races described in #534879 would likely manifest
as essentially random failures in libarchive's test suite.

Tim







Bug#659294: [libarchive-discuss] Fwd: Bug#659294: libarchive: FTBFS on various architectures (hurd, mipsel, s390, s390x)

2012-02-12 Thread Tim Kientzle

On Feb 11, 2012, at 1:40 PM, Samuel Thibault wrote:

 Hello,
 
 Andres Mejia, on Fri, 10 Feb 2012 16:34:40 -0500, wrote:
 Hi. The new version of libarchive uploaded to unstable is failing the
 test suite (and thus failing to build the deb packages). We're going
 to need copies of the test directories from the test suites, e.g.,
 
  Details for failing tests: /tmp/libarchive_test.2012-02-06T23.02.12-000
 
 Please provide these test directories to libarchive-disc...@googlegroups.com.
 
 Here they are.

Thank you!  That helps.

So on hurd, I see a couple of interesting failures for bsdtar:

tar/test_copy:  bsdtar is getting a nonsense file mode.

   This test creates a bunch of files and directories with varying
permissions and file name lengths.  One directory (ending in _194)
is getting read as having a file type of 0, which is obviously nonsense.
As a result, it's not getting archived.  The test is recording two failures:
 1) when bsdtar emits an error to stdout when trying to archive
 this directory and
 2) when the directory doesn't appear in the restored copy
This is especially confusing because it's not happening for
any other files or directories in this test.

An strace or truss of the process might clarify things.

tar/test_option_H_upper and tar/test_option_L_upper: restoring incorrect 
permissions on a symlink to a directory

   The test archives and restores a number of files, directories, and symlinks
to those files and directories.  It looks like symlinks to directories are
getting restored with different permissions than expected (expected 0755,
seeing 0700).

Does hurd handle symlinks to directories differently than other
systems?  Is the configuration script not finding lchmod() or
lstat() correctly?

Again, an strace or truss of the process might clarify things.

You can run a single test by running the bsdtar_test program
manually and specifying the name of the test:

$ bsdtar_test -vvv -p /full/path/to/bsdtar -r /full/path/to/tar/test test_copy

If you can strace or truss this (following children so we find
out what bsdtar is doing as well), that would be appreciated.

Thanks,

Tim Kientzle







Bug#659294: [libarchive-discuss] Fwd: Bug#659294: libarchive: FTBFS on various architectures (hurd, mipsel, s390, s390x)

2012-02-09 Thread Tim Kientzle
Each of these reports includes the name of the test directory, e.g.,

  Details for failing tests: /tmp/libarchive_test.2012-02-06T23.02.12-000

Can we get the contents of those directories (which include detailed
logs for each failure, the files involved, and other details)?

Tim


On Feb 9, 2012, at 4:20 PM, Andres Mejia wrote:

 There are some build failures on various architectures in Debian. Note
 that they're failures in the test suite.
 
 
 -- Forwarded message --
 From: Julien Cristau jcris...@debian.org
 Date: Thu, Feb 9, 2012 at 5:52 PM
 Subject: Bug#659294: libarchive: FTBFS on various architectures (hurd,
 mipsel, s390, s390x)
 To: Debian Bug Tracking System sub...@bugs.debian.org
 
 
 Source: libarchive
 Version: 3.0.3-3
 Severity: serious
 Justification: fails to build from source (but built successfully in the past)
 
 libarchive FTBFS on various buildds, with test failures:
 https://buildd.debian.org/status/package.php?p=libarchive
 
 mipsel:
  Totals:
Tests run:  172
Tests failed: 1
Assertions checked:12407225
Assertions failed:3
Skips reported:  73
 
  Failing tests:
60: test_read_disk_directory_traversals (3 failures)
 
  Details for failing tests: /tmp/libarchive_test.2012-02-06T23.02.12-000
 
  FAIL: libarchive_test
 
 s390:
  Totals:
Tests run:  172
Tests failed: 1
Assertions checked:12407234
Assertions failed:3
Skips reported:  73
 
  Failing tests:
60: test_read_disk_directory_traversals (3 failures)
 
  Details for failing tests: /tmp/libarchive_test.2012-02-06T22.43.00-000
 
  FAIL: libarchive_test
 
 s390x:
  Totals:
Tests run:   31
Tests failed: 1
Assertions checked:7460
Assertions failed:2
Skips reported:   1
 
  Failing tests:
13: test_option_b (2 failures)
 
  Details for failing tests: /tmp/bsdtar_test.2012-02-06T22.40.24-000
 
  FAIL: bsdtar_test
 
 hurd-i386:
  Totals:
Tests run:   31
Tests failed: 2
Assertions checked:7459
Assertions failed:3
Skips reported:   1
 
  Failing tests:
7: test_option_H_upper (1 failures)
8: test_option_L_upper (2 failures)
 
  Details for failing tests: /tmp/bsdtar_test.2012-02-07T00.14.52-000
 
  FAIL: bsdtar_test
 
  [...]
 
  Totals:
Tests run:   28
Tests failed: 2
Assertions checked: 923
Assertions failed:   14
Skips reported:   1
 
  Failing tests:
1: test_basic (13 failures)
26: test_passthrough_reverse (1 failures)
 
  Details for failing tests: /tmp/bsdcpio_test.2012-02-07T00.22.32-000
 
  FAIL: bsdcpio_test
 
 Cheers,
 Julien
 
 
 -- 
 ~ Andres
 
 -- 
 You received this message because you are subscribed to the Google Groups 
 libarchive-discuss group.
 To post to this group, send email to libarchive-disc...@googlegroups.com.
 To unsubscribe from this group, send email to 
 libarchive-discuss+unsubscr...@googlegroups.com.
 For more options, visit this group at 
 http://groups.google.com/group/libarchive-discuss?hl=en.
 
 
 







Bug#136231: [Bug-tar] Failure with --owner and --group when names cannot be mapped to IDs

2011-08-22 Thread Tim Kientzle

On Aug 13, 2011, at 10:29 AM, Paul Eggert wrote:

 On 08/08/2011 03:28 AM, Thayne Harbaugh wrote:
 
 Attached is a patch that allows archives to be created with
 arbitrary owner or group names.
 
 Thanks for the bug report and patch; I was unaware of the problem.
 This runs into another area that I'd been meaning to enhance for some
 time: tar doesn't let you specify both user name and number (only one
 or the other), and similarly for groups.  I wrote and installed the
 following patch into GNU tar, to address both enhancement requests
 simultaneously. 

FYI:  bsdtar uses separate options:  --uname and --uid for user
name/id, --gname and --gid for group name/id.

Tim







Bug#610783: [libarchive-discuss] Re: Bug#610783: bsdtar: Doesn't extract the install* and isolinux* directories of d-i images

2011-01-25 Thread Tim Kientzle
Thomas,

It will be a day or two before I can dig deeply into this.

Unfortunately, it's been a while since I looked at that
part of the code in detail, but I don't recall libarchive
requiring all directories to precede all files and
I recall a bunch of test cases over the last couple of years
dealing with various kinds of empty content.  So I
suspect the situation is a little less dire than
you think.  ;-)

Hmmm  Are you working with libarchive 2.8.4?
There have been a number of fixes in trunk specifically
to handle symlinks and other empty data files; maybe some
of those need to be backported.

Tim

CC: Michihiro, who has been doing a lot of work in
this part of libarchive recently.


On Jan 25, 2011, at 7:57 AM, Thomas Schmitt wrote:

 Hi,
 
 the situation now appears a bit better than first perceived in
 http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=610783
 
 The demand of libarchive that all directory entries have to come
 before any content block eases the task of producing digestible
 addresses for symbolic links, device files and empty data files.
 
 It suffices to let all these files point to an arbitrary block
 after the directory tree.
 In the general situation i would have to make them point to
 their neighbors. A much more ill situation, and also much more
 demanding towards the current libisofs architecture.
 
 If libarchive ever gives up the demand of directories-first, then
 it will have to compute own suitable keys for the affected file
 types which have no data content.
 Another problem would be hard links, which are quite common
 in ISO images. So my change proposal for libarchive appears more
 and more cumbersome.
 
 The simpler change in libisofs will probably allow me to make it
 suitable for unchanged libarchive by default.
 A dedicated block of 2048 zero bytes should avoid any ambiguity
 with non-empty data files.
 I am currently testing an implementation sketch which looks
 quite trustworthy.
 
 
 Another insight:
 The reason why bsdtar with genisoimage did not create two hard links
 to vmlinuz lies in my Linux.  It never shows hardlink siblings in mounted
 ISO images because it computes the inode number from the byte address
 of the directory entry. Two entries = two different inode numbers.
 So ISO images produced from /mnt contain two copies of vmlinuz.
 
 
 Have a nice day :)
 
 Thomas
 
 
 







Bug#546185: Username lookup failures with bsdtar --chroot

2010-03-14 Thread Tim Kientzle

After some discussion, which you can read at:

http://code.google.com/p/libarchive/issues/detail?id=69

we've decided not to do anything about this at this time.

Basically, this seems like a glibc limitation;
bsdtar asks glibc to do the lookup and apparently
glibc needs more than just /etc/password and /etc/group.

We're willing to consider arguments to the contrary;
please feel free to add your comments to the bug
linked to just above.

Cheers,

Tim Kientzle









Bug#546185: bsdtar: doesn't warn when library open after --chroot fails

2010-02-24 Thread Tim Kientzle

I've filed a bug on libarchive.googlecode.com
to track this issue upstream:

http://code.google.com/p/libarchive/issues/detail?id=69






Bug#530301: libarchive on HURD

2010-02-24 Thread Tim Kientzle

I've filed a bug upstream to track this:
http://code.google.com/p/libarchive/issues/detail?id=68

The UF_NODUMP issue is fixed as of libarchive 2.8.0.

I've sent a request to the original poster for clarification
on the PATH_MAX issue.

It's not clear from the original bug report whether HURD:
 a) Has no limit on the length of a path argument
to system calls such as open(), stat(), etc.
 b) Has some other way to determine that limit.

As soon as I get clarification on this issue, it will
be quite easy to fix.

Tim Kientzle
libarchive author and maintainer






Bug#565474: [Bug-cpio] Re: Bug#565474: cpio makes device nodes into hard links when copying out of a cramfs image

2010-01-17 Thread Tim Kientzle

Carl Miller wrote:

On Sat, Jan 16, 2010 at 05:09:45AM +, Clint Adams wrote:

You mean something like this?

-  (d->header.c_dev_min == min) )
+  (d->header.c_dev_min == min) &&
+  ((d->header.c_mode & CP_IFBLK) != CP_IFBLK) &&
+  ((d->header.c_mode & CP_IFCHR) != CP_IFCHR) )


These tests should look like:

  (d->header.c_mode & CP_IFMT) != CP_IFBLK

Note the use of CP_IFMT to mask the file type
(which is a four-bit field).

Cheers,

Tim






Bug#565474: [Bug-cpio] Re: Bug#565474: cpio makes device nodes into hard links when copying out of a cramfs image

2010-01-17 Thread Tim Kientzle

On Fri, Jan 15, 2010 at 08:06:54PM -0800, Carl Miller wrote:

cramfs takes a shortcut with device nodes, and assigns them all inode 1.


I presume it also assigns nlinks == 1?


When using cpio to copy files out of a cramfs image, cpio turns the second
and all subsequent copied device nodes into hard links to the first copied
out device node, based on them all having the same st_dev and st_ino.


Another possible solution:  When checking for hard links
during copy-out, do not generate hardlink entries if
nlinks < 2.

Tim






Bug#42158: a FreeBSD reference in disagreement with pax's behavior

2008-08-24 Thread Tim Kientzle

My documentation for newc is based primarily on studying the
implementation of GNU cpio.  I've not found any good
references for the history of this format.


OK, this is good to know. I'm not saying one or the other program is
wrong, but having a piece of documentation describing an
implementation is of course not the same as a standard. 


POSIX considers cpio to be deprecated, so there's
no chance that POSIX will ever formally standardize
any cpio format variant other than the odc variant
documented under pax.

LSB documents this format since it's used by RPM.
That's the only de jure standard I've found that
discusses this particular cpio variant.  Unfortunately,
the LSB documentation for this format is pretty
incomplete.  It certainly doesn't discuss hardlink
handling.

The de facto standard for this format would be
the implementation of cpio that originally shipped
with SVr4.  I don't know if SVr4 includes any
documentation for the format apart from the implementation
itself.  I don't have access to SVr4 source code.

Cheers,

Tim Kientzle







Bug#42158: a FreeBSD reference in disagreement with pax's behavior

2008-08-18 Thread Tim Kientzle

The discusison started with the OpenBSD pax implementation, which also
does cpio. OpenBSD pax has the same roots as the FreeBSD one, so I
suspect some of the problems are shared. 


This would be Keith Muller's old combined implementation
of pax/cpio/tar.  Here's the situation as I understand it:

NetBSD and OpenBSD both use Keith Muller's old implementation
for pax, cpio, and tar.  I understand that both projects
have done a lot of work on it over the years.

FreeBSD's situation is in transition:
  * Uses my libarchive-based bsdtar implementation since
FreeBSD 6.0.  (Used GNU tar prior to that.)
  * Uses GNU cpio today, but might switch to my libarchive-based
bsdpcio in FreeBSD 8.0
  * Uses Keith Muller's pax implementation.  (A libarchive-based
pax is still a year or two out.)

I should test this bug against the FreeBSD pax (another divergent
tree based on Keith Muller's work).


I think it would be good to compare to OpenSolaris cpio, being a third
independent implementation of cpio. At the moment I do not have access
to one, but I'll try to setup something today. 


Let us know what you find.

Cheers,

Tim Kientzle






Bug#42158: a FreeBSD reference in disagreement with pax's behavior

2008-08-18 Thread Tim Kientzle

For Tim's reference: we're discussing pax here:
http://bugs.debian.org/42158


I think it would be good to compare to OpenSolaris cpio, being a third
independent implementation of cpio. At the moment I do not have access
to one, but I'll try to setup something today. 


Oh, yeah.  Gunnar Ritter's Heirloom toolchest
(based on open-sourced AT&T code) is also a good
comparison point:

 http://heirloom.sourceforge.net/tools.html







Bug#42158: a FreeBSD reference in disagreement with pax's behavior

2008-08-18 Thread Tim Kientzle

Tim Kientzle wrote:

For Tim's reference: we're discussing pax here:
http://bugs.debian.org/42158


I think it would be good to compare to OpenSolaris cpio, being a third
independent implementation of cpio. At the moment I do not have access
to one, but I'll try to setup something today. 


Oh, yeah.  Gunnar Ritter's Heirloom toolchest
(based on open-sourced ATT code) is also a good
comparison point:

 http://heirloom.sourceforge.net/tools.html


From Gunnar Ritter's cpio.1 manpage:

The -c format was introduced with System V Release 4. Except
for the file size, it imposes no practical limitations on
files archived. The original SVR4 implementation stores the
contents of hard linked files only once and with the last
archived link. This cpio ensures compatibility with SVR4.
With archives created by implementations that employ other
methods for storing hard linked files, each file is extracted
as a single link, and some of these files may be empty.

I'm not sure what exactly this last sentence is supposed to
mean.

Tim






Bug#42158: a FreeBSD reference in disagreement with pax's behavior

2008-08-15 Thread Tim Kientzle

My documentation for newc is based primarily on studying the
implementation of GNU cpio.  I've not found any good
references for the history of this format.

I'm a little unclear what pax implementation you're
discussing.   Based on the description below, I would
suggest you test whether this program duplicates bodies
for each hardlink it stores.  This is easy to test:  Make
two hardlinks to the same large file, archive them
and see if the resulting archive is twice as big
as the file.  The odc (POSIX-1988) format should
duplicate bodies for hardlinks.  GNU cpio's implementation
of newc format does not.  Tar formats (including
the POSIX-2001 pax extended format) do not as a rule,
though the pax extended format does permit it as an
option.

My sympathies for the maintainers of the pax you're
discussing; it is surprisingly difficult to correctly
handle all three common approaches for hardlink management
within a single program.

Tim Kientzle


Daniel Kahn Gillmor wrote:

Tim Kientzle of FreeBSD (author of libarchive, attempting to CC here)
describes the cpio format here:

 http://people.freebsd.org/~kientzle/libarchive/man/cpio.5.txt

This document states about the SVR4 (newc) format (magic 070701, which
is what we're dealing with):

 In this format, hardlinked files are handled by setting the
 filesize to zero for each entry except the last one that appears
 in the archive.

So this interpretation is shared by at least GNU and FreeBSD,
afaict.

pax appears to be in disagreement with these systems as far as its
creation of SVR4/newc archives goes, since it stores a non-zero
filesize for each entry of a hardlinked file.  It's in dangerous
disagreement with GNU and FreeBSD during the unpacking stage, because
it re-creates hardlinked files as 0 bytes in length if it encounters
archives created by the other utilities.

Hope this is a useful reference,

--dkg

For Tim's reference: we're discussing pax here:
http://bugs.debian.org/42158








Bug#494169: [Fwd: FW: Bug#494169: libarchive-dev: Please add a way to precompute (non-compressed) archive size]

2008-08-08 Thread Tim Kientzle

Thibaut,

John Goerzen forwarded your idea to me.

You can actually implement this on top of the current libarchive
code quite efficiently.  Use the low-level archive_write_open()
call and provide your own callbacks that just count the write
requests.  Then go through and write the archive as usual,
except skip the write_data() part (for tar and cpio formats,
libarchive will automatically pad the entry with NUL bytes).

This may sound slow, but it's really not.  One of the libarchive
unit tests uses this approach to write 1TB archives in just a couple
of seconds.  (This test checks libarchive's handling of very large
archives with very large entries.)  Look at test_tar_large.c
for the details of how this particular test works.  (test_tar_large.c
actually does more than just count the data, but it should
give you the general idea.)

This will work very well with all of the tar and cpio formats.
It won't work well with some other formats where the length
does actually depend on the data.

Cheers,

Tim Kientzle

 Original Message 
Date: Thu, 7 Aug 2008 21:31:27 -0500
From: John Goerzen [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED], [EMAIL PROTECTED]
Subject: FW: Bug#494169: libarchive-dev: Please add a way to precompute 
(non-compressed) archive size


Hi Tim,

We received the below feature request at Debian.  Not sure if it is
something you would be interested in implementing, but thought I'd
pass it along.

-- John

- Forwarded message from Thibaut VARENE [EMAIL PROTECTED] -

From: Thibaut VARENE [EMAIL PROTECTED]
Date: Thu, 07 Aug 2008 17:37:10 +0200
Reply-To: Thibaut VARENE [EMAIL PROTECTED], [EMAIL PROTECTED]
To: Debian Bug Tracking System [EMAIL PROTECTED]
Subject: Bug#494169: libarchive-dev: Please add a way to precompute
(non-compressed) archive size

Package: libarchive-dev
Severity: wishlist

Hi,

I thought I already reported this, but apparently I didn't so here's the
idea: I'm the author of mod_musicindex, in which I use libarchive to
send on-the-fly tar archives to remote clients.

Right now, the remote client's browser cannot display any ETA /
%complete for the current download since I cannot tell before hand what
will be the exact size of the archive I'm sending them.

It would be very nice if there were some API allowing for the
precomputation of the final size of a non-compressed archive that would
allow me to do something like:

archive_size = archive_size_header(a);
for (filename in file list) {
archive_size += archive_size_addfile(filename);
/* or using stat() and eg archive_size_addstat() */
}
archive_size += archive_size_footer(a);

(brainfart pseudo code, I hope you get the idea)

so that in the end archive_size will be exactly the size of the output
archive (header/padding included), without having to actually read files
or write the archive itself.

I could thus send the remote client the actual size of the data they're
going to be send beforehand.

The trick is, this size cannot be approximate: the browser will cut the
transfer even if I'm still sending them data if it has received as many
bits as it was told.

I'm under the impression that since this is about non-compressed
archive, and considering the structure of a tar archive, my goal should
be feasible without even having to read any input file. Am I wrong?

Hope I'm quite clear, thanks for your help

T-Bone

-- System Information:
Debian Release: lenny/sid
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: hppa (parisc64)

Kernel: Linux 2.6.22.14 (SMP w/4 CPU cores)
Locale: LANG=C, LC_CTYPE=C (charmap=ANSI_X3.4-1968)
Shell: /bin/sh linked to /bin/bash




- End forwarded message -






Bug#494169: [Fwd: FW: Bug#494169: libarchive-dev: Please add a way to precompute (non-compressed) archive size]

2008-08-08 Thread Tim Kientzle

Thibaut VARENE wrote:

On Fri, Aug 8, 2008 at 8:42 AM, Tim Kientzle [EMAIL PROTECTED] wrote:


Thibaut,

John Goerzen forwarded your idea to me.

You can actually implement this on top of the current libarchive
code quite efficiently.  Use the low-level archive_write_open()
call and provide your own callbacks that just count the write
requests.  Then go through and write the archive as usual,
except skip the write_data() part (for tar and cpio formats,
libarchive will automatically pad the entry with NUL bytes).



Hum, I'm not quite sure I get this right... By count the write
requests and skip the write_data() part, you mean count the number
of bytes that should have been written, without writing them?


Yes.


This may sound slow, but it's really not.  One of the libarchive
unit tests uses this approach to write 1TB archives in just a couple
of seconds.  (This test checks libarchive's handling of very large
archives with very large entries.)  Look at test_tar_large.c
for the details of how this particular test works.  (test_tar_large.c
actually does more than just count the data, but it should
give you the general idea.)



I will have to look into that code indeed. If I get this right tho,
you're basically suggesting that I read the input files twice: once
without writing the data, and the second time writing the data?


No.  I'm suggesting you use three passes:
  1) Get the information for all of the files, create archive_entry 
objects.
  2) Create a fake archive using the technique above.  You don't need 
to read the file data here!  After you call archive_write_close(), 
you'll know the size of the complete archive.  (This is really just your 
original idea.)
  3) Write the real archive as usual, including reading the actual file 
data and writing it to the archive.



Arguably the second read would come from the VFS cache, but that's
only assuming the server isn't too busy serving hundreds of other
files, which is why I'm a bit concerned about optimality... My limited
understanding of the tar format made me believe that it was possible
to know the space taken by a given file in a tar archive just by
looking at its size and adding the necessary padding bytes. Was I
wrong?


You could make this work.  If you're using plain ustar (no tar 
extensions!), then each file has the data padded to a multiple of 512 
bytes and there is a 512 byte header for each file.  Then you need to 
round the total result up to a multiple of the block size.  (Default is 
10240 bytes, you probably should set the block size to 512 bytes.)



For reference, here's the (relatively short) code I use:
http://www.parisc-linux.org/~varenet/musicindex/doc/html/output-tarball_8c-source.html



This will work very well with all of the tar and cpio formats.
It won't work well with some other formats where the length
does actually depend on the data.



Yep, that was quite clear indeed ;)

Thanks for your input!









Bug#474400: Testsuite failure of bsdtar

2008-05-12 Thread Tim Kientzle

Bernhard R. Link wrote:


My guess is that the order readdir returns them implies which
file is stored as regular file and which is stored as hard link.
Usually the f_ file is stored as regular file and the l_ file
as hardlink. But on my filesystem the l_ file is stored as regular
file and the f_ file as hardlink.


Of course, you're absolutely right.  There are
different length limits for the source path and the target
and I'm testing right up to those limits, so for this test
it does actually matter which one gets stored in which way.

I think I see an easy way to restructure this test to avoid
this problem on all platforms.  I'll get that fix into 2.5.4.

Thank you for your patience.  I'll let you know as soon as I
have a candidate fix.

Cheers,

Tim Kientzle






Bug#474400: Testsuite failure of bsdtar

2008-05-11 Thread Tim Kientzle

Good guess, but I don't think that explains anything, since
the order in which hardlinked files get stored doesn't matter.
(There are a few tests that do detailed format verification;
those would be affected by the order in which files get stored,
but none of those depend on the ordering of readdir().)

The reference to 92 characters in that test refers to the length
of the filename not including the directory portion.
(Details: original/ is 9 characters, the link length limit for
ustar is 100 characters, and the test code only verifies the
final path element, hence should never see anything over 91
chars; I should probably put some more detailed comments around
that part of the test code.)

Have you had a chance to try the libarchive 2.5.3b package?
I've fixed a subtle issue with handling almost-too-long filenames
in ustar format (which doesn't appear to explain the problems
you're having) and also reworked a couple of the tests to give
more information.  Maybe that would shed additional light on
this problem.

I've also just committed a change to the test_format_newc code
that allows for a 1-second slop in that test, which should eliminate
the occasional failure you mentioned.  Thanks for reporting that.

Cheers,

Tim

Bernhard R. Link wrote:

I think I found the problem:
# tar -tvvf ustar/archive 2>&1 | grep 'original/f_abcdefghijklmnopqrstuvwxyzabcdefghijkl '
says:
hrw-r--r-- brl/brl   0 2008-05-11 11:41 original/f_abcdefghijklmnopqrstuvwxyzabcdefghijkl link to original/l_abcdefghijklmnopqrstuvwxyzabcdefghijkl

This is the file with 92 characters, which is (as far as I understand
test_copy.c:111) not expressible as a link.

I guess the reason for this is that, depending on the filesystem options,
readdir returns the previously created files in some arbitrary order, so
sometimes l_* gets returned before f_* (judging from the Debian buildds,
on Linux actually more often than not). And thus the test case fails.

Hochachtungsvoll,
Bernhard R. Link

By the way, I think there is also a race condition in
cpio/test/test_format_newc.c in test_format_newc. In one run,
one of the assertEqualInt(t, from_hex(e + 46, 8)); assertions failed for
me with the two numbers differing by one. (I guess the second ticked over
at just the wrong moment.) Dunno if that is important enough to fix, as
seldom as it seems to happen.











Bug#474400: Fwd: Bug#474400: libarchive build failures

2008-05-07 Thread Tim Kientzle

John Goerzen wrote:


Here's some more info on libarchive for you.  Hope this helps.


Paul Cannon wrote:

Running tests on: /home/paul/packages/libarchive-2.4.17/bsdtar
0: test_basic
tar/test/test_basic.c:53: Assertion failed: Ints not equal
  r=256
  0=0
   Description: Error invoking /home/paul/packages/libarchive-2.4.17/bsdtar xf archive 


This is the basic copy test, which simply invokes bsdtar
to archive a file, a directory, a symlink, and a hardlink
and then invokes bsdtar again to restore the created archive to
a different directory.

For some reason, bsdtar is returning an error when
dearchiving.  The first question I have is whether
this is because the created archive is corrupt or
whether the restore process failed.

These tests are all run in a directory
/tmp/bsdtar_test_X/test_basic; the files named
below are all relative to this:
  * 'filelist' is a list of the files to be archived
  * the created archive is in 'copy/archive'
  * stdout/stderr from archiving should be in
'copy/pack.out' and 'copy/pack.err'
  * stdout/stderr from dearchiving should be in
'copy/unpack.out' and 'copy/unpack.err'
  * The restored files should be in the 'copy/' dir

You should be able to manually try unpacking the
archive with:
  $ cd /tmp/bsdtar_test_X/test_basic
  $ mkdir mytest
  $ cd mytest
  $ /home/paul/packages/libarchive-2.4.17/bsdtar xf ../copy/archive

If the archive itself seems correct, then an strace of
bsdtar while extracting might be very illuminating.

If this doesn't shed any light, send me the contents
of /tmp/bsdtar_test_X/test_basic for one of the
failed tests.  Maybe there are some other clues in there.

Cheers,

Tim Kientzle






Bug#419793: libarchive-dev: archive_write_data seems to ignore wrappers in some circumstances

2007-08-18 Thread Tim Kientzle

Bernhard R. Link wrote:


There might be ways to help libraries still link against libarchive
without having to change their off_t to 64 bits, by using off64_t in
those cases.


Actually, I believe I've laid the groundwork to
make the 64-bit/32-bit issue completely transparent
to software using libarchive.  It requires someone
knowledgable about this area of Linux to help me
fill in the few remaining pieces, but should not
require any deep knowledge of libarchive internals.

The most important piece is to create three versions
of archive_entry_stat() and archive_entry_copy_stat()
(which copy data between a platform-native
struct stat and libarchive's internal storage):

 * archive_entry_stat32 and archive_entry_copy_stat32,
which deal with struct stat32
 * Similar archive_entry_stat64 and archive_entry_copy_stat64
 * archive_entry_stat() and archive_entry_copy_stat()
would be defined twice: in code as synonyms for the 64-bit
versions (to preserve ABI compat for programs using the
shared libraries) and as macros that map to the 32-bit or
64-bit versions depending on the off_t size being
used by the program.

There are a couple of entry points defined in archive.h
that use off_t directly, but those are rarely used
and so are less critical.  Once the above is working,
similar techniques should apply to them.

Getting all of the configuration right so this builds
correctly on platforms that don't have two different
off_t/struct stat definitions is probably the
trickiest part.

This is a low priority for me right now, though I
do believe that I've factored the internals well
enough to make this a feasible project for someone
who knows nothing about libarchive.  If anyone is
interested, let me know.

Tim Kientzle





Bug#419793: libarchive-dev: archive_write_data seems to ignore wrappers in some circumstances

2007-04-18 Thread Tim Kientzle
Note the use of __xstat64/fopen64 for the working code instead of
__xstat/fopen for the non-working code. At this point I believed it had
something to do with the use of 64-bit offsets (USE_FILE_OFFSET64), but
simply adding this define to the compiler command line didn't fix the
issue.


I'm not familiar with USE_FILE_OFFSET64, I thought the correct
incantation was this:

   gcc -D_FILE_OFFSET_BITS=64

Have you tried this?





Bug#419793: libarchive-dev: archive_write_data seems to ignore wrappers in some circumstances

2007-04-17 Thread Tim Kientzle

Thibaut VARENE wrote:
  I'm the author and maintainer of libapache-mod-musicindex. Recently a
bug was reported to me whereby the tarball download implemented
in mod-musicindex wouldn't work properly with apache2, while it did work
with apache1.3.


Note the use of __xstat64/fopen64 for the working code instead of 
__xstat/fopen for the non-working code.


Linux has two struct stat definitions, and code compiled
with one cannot be used with the other.  Libarchive always
compiles with the 64-bit version so it can correctly handle
very large archives.  If your code is compiled to use the
32-bit version, it won't work.

Apparently, httpd.h is somehow forcing your code to use the
32-bit stat, which is incompatible with libarchive.

Tim Kientzle





Bug#419793: libarchive-dev: archive_write_data seems to ignore wrappers in some circumstances

2007-04-17 Thread Tim Kientzle

Thibaut VARENE wrote:

bip 8192
Do I win?
bip 8192
Do I win?
bip 8192

(note, with libarchive 2, 'bip' (the output of archive_write) is 0, 
which seems a bit more coherent).


Yes, libarchive 1 does return wrong values from archive_write.
As you observed, this bug is fixed in libarchive 2.

Tim Kientzle

