Re: all estimate timed out

2013-04-12 Thread Chris Hoogendyk

Just to follow up on this. Amanda backups have been running smoothly for a week 
now.

For this one DLE, I set up amgtar and disabled the sparse option. It ran, but took most of Saturday 
to complete. Then, having a full backup of that, I broke it up into 6 DLE's using excludes and 
includes. I added one a day back into the disklist. It now has them all and can spread the fulls 
over the week. Backups for the last couple of days have completed around 4am.
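
[ For reference, a sketch of what that kind of setup looks like in amanda.conf and the disklist.
The application, dumptype, and subdirectory names here are illustrative placeholders, not taken
from the actual configuration:

    define application-tool app_amgtar {
        plugin   "amgtar"
        property "SPARSE" "NO"         # turn off gtar's sparse-file handling
    }

    define dumptype herbarium-amgtar {
        program     "APPLICATION"
        application "app_amgtar"
    }

    # one of several sub-DLEs carved out of the same filesystem with an include,
    # given relative to the mount point:
    localhost /export/herbarium-acanthaceae /export/herbarium {
        herbarium-amgtar
        include list "./ACANTHACEAE"
    } ]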


As a followup, in case anyone cares to discuss technicalities and examples, has anyone run into this 
before? It seems any site doing lots of sizable scanned images, or GIS systems with tiff maps, would 
have run into it. I don't know how often sparse file treatment is an important thing. Database files 
can be sparse, but proper procedure is to use the database tools (e.g. mysqldump) for backups and 
not to just backup the data directory. It's not clear to me exactly what gnutar is doing with sparse 
or why it is so inefficient (timewise). I don't think these tif files are sparse. They are just 
large. And gnutar is not just doubling the time as described in 
http://www.gnu.org/software/tar/manual/html_node/sparse.html. I was experiencing on the order of 400 
times as much time for the sparse option compared to when I removed the sparse option.


[ Recalling details from earlier messages -- Amanda 3.3.2 with gtar 1.23 (/usr/sfw/bin/gtar) on 
Solaris 10 on a T5220 (UltraSPARC, 8 core, 32G memory) with multipath SAS interface to J4500 for 
storage using zfs raidz with 2TB drives. Nightly backups go out to an AIT5 tape library on an 
Ultra320 LVD SCSI interface. Backing up on the order of 100 DLEs from 5 machines over GigE on this 
Amanda server. Problem DLE was on localhost on the J4500. ]



On 4/5/13 3:16 PM, Jean-Louis Martineau wrote:

On 04/05/2013 12:09 PM, Chris Hoogendyk wrote:
OK, folks, it is the --sparse option that Amanda is putting on the gtar. This is 
/usr/sfw/bin/tar version 1.23 on Solaris 10. I have a test script that runs the runtar and a test 
directory with just 10 of the tif files in it.


Without the --sparse option, time tells me that it takes 0m0.57s to run the 
script.

With the --sparse option, time tells me that it takes 3m14.91s to run the 
script.

Scale that from 10 to 1300 tif files, and I have serious issues.

Now what? Can I tell Amanda not to do that? What difference will it make? Is 
this a bug in gtar?

Use the amgtar application instead of the GNUTAR program; it allows you to disable
the sparse option.

tar can't know where the holes are; it must read the file to find them.

You WANT the sparse option; otherwise your backup will be large because tar
fills the holes with zeros.

Your best option is to use the calcsize or server estimate.

Jean-Louis



--
---

Chris Hoogendyk

-
   O__   Systems Administrator
  c/ /'_ --- Biology  Geology Departments
 (*) \(*) -- 140 Morrill Science Center
~~ - University of Massachusetts, Amherst

hoogen...@bio.umass.edu

---

Erdös 4



Re: all estimate timed out

2013-04-12 Thread Nathan Stratton Treadway
On Fri, Apr 12, 2013 at 12:59:39 -0400, Chris Hoogendyk wrote:
 As a followup, in case anyone cares to discuss technicalities and
 examples, has anyone run into this before? It seems any site doing
 lots of sizable scanned images, or GIS systems with tiff maps, would
 have run into it. I don't know how often sparse file treatment is an
 important thing. Database files can be sparse, but proper procedure
 is to use the database tools (e.g. mysqldump) for backups and not to
 just backup the data directory. It's not clear to me exactly what
 gnutar is doing with sparse or why it is so inefficient (timewise).
 I don't think these tif files are sparse. They are just large. And
 gnutar is not just doubling the time as described in
 http://www.gnu.org/software/tar/manual/html_node/sparse.html. I was
 experiencing on the order of 400 times as much time for the sparse
 option compared to when I removed the sparse option.

I have been meaning to reply to your earlier messages but haven't had a
chance to finish the background research I wanted to do first;
meanwhile, a few quick comments and questions:

* When you did your manual test runs with and without --sparse, did the
  estimated sizes shown at the end of the run change any?

* I'll have to go back and see if things were any different with GNU tar
  v1.23, but when I was looking at the latest version's source code,
  it was clear that at least the intention was that using --sparse would
  only change behavior when the input files are sparse -- so I am
  curious to know for sure if your tiff files are actually sparse.

  The check that tar uses to decide this is to see if the inode's block
  count times the block size for the filesystem is less than the inode's
  listed file size.   (That is, does the file have less space allocated
  than its listed size?)

  Here are a few ways I've used in the past to search for sparse files:

  - if you have GNU ls installed on this system:
  ls -sl --block-size=1 
and then check to see if the number in the first column is smaller
than the number in the file size column.

  - if you have GNU stat installed you can run
  stat -c "%n: alloc: %b * %B  size: %s"
, and then check to see if the %b times %B value is less than the %s 
value.

  - using the standard Sun ls, you can do
  ls -sl 
, and then multiply the value in the first column by 512.  (I assume
the block size used is a constant 512 in that case, regardless of
file system.)
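
  (A rough sketch, not from the original message, that ties those checks together: walk a
  tree and flag any file whose allocated space is smaller than its logical size.  It assumes
  the Sun ls -sl layout above -- 512-byte blocks in column 1, byte size in column 6 -- and
  filenames without spaces.)

      find "${1:-.}" -type f | while read f; do
          set -- $(ls -sl "$f")
          alloc=$(( $1 * 512 ))      # column 1: allocated 512-byte blocks
          size=$6                    # column 6: logical size in bytes
          [ "$alloc" -lt "$size" ] && echo "possibly sparse: $f  alloc=$alloc  size=$size"
      done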


* The doubling of the time mentioned in the man page is in the context
  of making an actual dump, but the slowdown is much worse for the
  estimate phase.  That's because normally when tar notices that the
  output file is /dev/null, it realizes that you don't actually want the
  data from the input files, and thus doesn't actually read through
  their contents, but simply looks at the file size (from the inode
  information) and adds that to the dump-size tally before moving on to
  the next file.  So the time spent during the estimate is almost
  entirely due to reading through the directory tree, and won't depend
  on the size of the files in question.

  In the case of a file that's actually sparse, though, if the --sparse
  option is enabled then tar has to actually read in the entire file to
  see how much of it is zero blocks.  So, if many of your files are
  indeed actually sparse, then what will happen is that the estimate
  time will be about the same as the actual dump time, rather than the
  usual much-shorter estimate time.
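
  (A quick way to see that difference in isolation -- a sketch, using dd with seek to make an
  obviously sparse test file; the paths and sizes are made up, and /usr/sfw/bin/gtar is the
  Solaris gtar path already mentioned in this thread.)

      mkdir /tmp/sparsetest
      echo hi | dd of=/tmp/sparsetest/holes.dat seek=2000000   # ~1GB logical size, almost no blocks allocated
      time /usr/sfw/bin/gtar --create --file /dev/null /tmp/sparsetest
      time /usr/sfw/bin/gtar --create --file /dev/null --sparse /tmp/sparsetest
      # per the explanation above, the first run only stats the file; the second reads it all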

Nathan






Nathan Stratton Treadway  -  natha...@ontko.com  -  Mid-Atlantic region
Ray Ontko  Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239


Re: all estimate timed out

2013-04-12 Thread Chris Hoogendyk

Thank you, Nathan. Informative.

The "Total bytes written:" figure was identical with and without the --sparse option (right down to the
last byte ;-) ). It was the time taken to arrive at that estimate that was so very different:


Total bytes written: 2086440960 (2.0GiB, 11MiB/s)
real    3m14.91s

Total bytes written: 2086440960 (2.0GiB, 17GiB/s)
real    0m0.57s


However, if I do an `ls -sl` on the directory and multiply the first column by 512, that does not 
quite match the length in bytes column. It is the same order of magnitude, but they are slightly 
different. I'm not sure what causes that, but I don't think the tif files are really sparse in the 
usual sense of that. Any imaginable gain in efficiency with regard to space would be minimal, and 
the cost in time is ridiculous.


Here is an example of one directory:

marlin:/export/herbarium/mellon/Masstypes_Scans_Server/ACANTHACEAE# ls -sl

total 4072318
410608 -rw-rw   1 ariehtal herbarum 210246048 Dec 10 11:04 AC00312847.tif
402936 -rw-rw   1 ariehtal herbarum 206423224 Dec  5 16:09 AC00312848.tif
412398 -rw-rw   1 ariehtal herbarum 211246700 Dec  5 16:16 AC00312849.tif
405493 -rw-rw   1 ariehtal herbarum 207676904 Dec 12 11:52 AC00312850.tif
408052 -rw-rw   1 ariehtal herbarum 209052412 Dec  5 15:13 AC00312937.tif
412909 -rw-rw   1 ariehtal herbarum 211451884 Dec  5 15:35 AC00312939.tif
415468 -rw-rw   1 ariehtal herbarum 212788668 Dec 12 11:46 AC00312940.tif
390142 -rw-rw   1 ariehtal herbarum 199753780 Nov 13 11:28 
AC00312941-sj0.tif
406004 -rw-rw   1 ariehtal herbarum 207925584 Dec 10 11:17 AC00312942.tif
408308 -rw-rw   1 ariehtal herbarum 209102728 Dec 10 11:28 AC00312943.tif

marlin:/export/herbarium/mellon/Masstypes_Scans_Server/ACANTHACEAE#



On 4/12/13 3:41 PM, Nathan Stratton Treadway wrote:

On Fri, Apr 12, 2013 at 12:59:39 -0400, Chris Hoogendyk wrote:

As a followup, in case anyone cares to discuss technicalities and
examples, has anyone run into this before? It seems any site doing
lots of sizable scanned images, or GIS systems with tiff maps, would
have run into it. I don't know how often sparse file treatment is an
important thing. Database files can be sparse, but proper procedure
is to use the database tools (e.g. mysqldump) for backups and not to
just backup the data directory. It's not clear to me exactly what
gnutar is doing with sparse or why it is so inefficient (timewise).
I don't think these tif files are sparse. They are just large. And
gnutar is not just doubling the time as described in
http://www.gnu.org/software/tar/manual/html_node/sparse.html. I was
experiencing on the order of 400 times as much time for the sparse
option compared to when I removed the sparse option.

I have been meaning to reply to your earlier messages but haven't had a
chance to finish the background research I wanted to do first;
meanwhile, a few quick comments and questions:

* When you did your manual test runs with and without --sparse, did the
   estimated sizes shown at the end of the run change any?

* I'll have to go back and see if things were any different with GNU tar
   v1.23, but when I was looking at the latest version's source code,
   it was clear that at least the intention was that using --sparse would
   only change behavior when the input files are sparse -- so I am
   curious to know for sure if your tiff files are actually sparse.

   The check that tar uses to decide this is to see if the inode's block
   count times the block size for the filesystem is less than the inode's
   listed file size.   (That is, does the file have less space allocated
   than its listed size?)

   Here are a few ways I've used in the past to search for sparse files:

   - if you have GNU ls installed on this system:
   ls -sl --block-size=1
 and then check to see if the number in the first column is smaller
 than the number in the file size column.

   - if you have GNU stat installed you can run
   stat -c "%n: alloc: %b * %B  size: %s"
 , and then check to see if the %b times %B value is less than the %s
 value.

   - using the standard Sun ls, you can do
   ls -sl
 , and then multiply the value in the first column by 512.  (I assume
 the block size used is a constant 512 in that case, regardless of
 file system.)


* The doubling of the time mentioned in the man page is in the context
   of making an actual dump, but the slowdown is much worse for the
   estimate phase.  That's because normally when tar notices that the
   output file is /dev/null, it realizes that you don't actually want the
   data from the input files, and thus doesn't actually read through
   their contents, but simply looks at the file size (from the inode
   information) and adds that to the dump-size tally before moving on to
   the next file.  So the time spent during the estimate is almost
   entirely due to reading through the directory tree, and won't depend
   on the size of the files 

Re: all estimate timed out

2013-04-12 Thread Nathan Stratton Treadway
On Fri, Apr 12, 2013 at 17:09:11 -0400, Chris Hoogendyk wrote:
 The "Total bytes written:" figure was identical with and without the
 --sparse option (right down to the last byte ;-) ). It was the time
 taken to arrive at that estimate that was so very different:
 
 Total bytes written: 2086440960 (2.0GiB, 11MiB/s)
 real    3m14.91s
 
 Total bytes written: 2086440960 (2.0GiB, 17GiB/s)
 real    0m0.57s
 
 
 However, if I do an `ls -sl` on the directory and multiply the first
 column by 512, that does not quite match the length in bytes column.
 It is the same order of magnitude, but they are slightly different.
 I'm not sure what causes that, but I don't think the tif files are
 really sparse in the usual sense of that. Any imaginable gain in
 efficiency with regard to space would be minimal, and the cost in
 time is ridiculous.
 
 Here is an example of one directory:
 
 marlin:/export/herbarium/mellon/Masstypes_Scans_Server/ACANTHACEAE# ls -sl
 
 total 4072318
 410608 -rw-rw   1 ariehtal herbarum 210246048 Dec 10 11:04 AC00312847.tif
 402936 -rw-rw   1 ariehtal herbarum 206423224 Dec  5 16:09 AC00312848.tif

Well, unless the length of the file is an exact multiple of the block
size, you'll normally find that the figures will be slightly
different... but the allocated space is always larger for non-sparse
files.

In your case, though, it's slightly smaller -- which is why you are
having this problem 

410608 * 512 = 210231296, 14752 less than 210246048
402936 * 512 = 206303232, 119992 less than 206423224
etc.
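
(The same arithmetic for a whole directory, as a one-line sketch -- it assumes the Sun ls -sl
layout quoted above, with 512-byte blocks in column 1 and the byte size in column 6, and
filenames without spaces:)

    ls -sl | awk 'NF > 6 { alloc = $1 * 512; if (alloc < $6) print $NF, "short by", $6 - alloc, "bytes" }'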


However, when tar puts the files into the archive, it has its own
blocking factor, and it would seem that the space savings from the
sparseness in your files are so small that they're lost within that blocking
factor.  So yes, you are definitely in a lots-of-pain-and-no-gain
situation :(

Do you know how these TIF files are getting written onto your system?
You could avoid this problem if you were able to get that process altered so
that it didn't create sparse files...

If the files are static, you could consider doing a pass through to
un-sparsify them somehow.  For example, doing a simple cp seems to
produce normal files:

$ uname -a
SunOS myhost 5.9 Generic_122300-66 sun4u sparc SUNW,Netra-210
$ which cp
/usr/bin/cp
$ mkdir test1
$ echo hi | dd of=test1/t.t seek=10000
0+1 records in
0+1 records out
$ cp -Rp test1 test2 
$ ls -ls test1 test2
test1:
total 48
   48 -rw-r-   1 x474712  other5120003 Apr 12 18:05 t.t

test2:
total 10032
10032 -rw-r-   1 x474712  other5120003 Apr 12 18:05 t.t


(Note that the copy of the file found in test2/ is fully allocated.)



However, it sounds like in your particular situation the workaround of
using amgtar with --sparse turned off might be good enough (given
that it's actually okay for the backup to ignore the fact that the original
files are sparse).



Nathan




Nathan Stratton Treadway  -  natha...@ontko.com  -  Mid-Atlantic region
Ray Ontko  Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239


Re: all estimate timed out

2013-04-05 Thread Jean-Louis Martineau

On 04/05/2013 12:09 PM, Chris Hoogendyk wrote:
OK, folks, it is the --sparse option that Amanda is putting on the 
gtar. This is /usr/sfw/bin/tar version 1.23 on Solaris 10. I have a 
test script that runs the runtar and a test directory with just 10 of 
the tif files in it.


Without the --sparse option, time tells me that it takes 0m0.57s to 
run the script.


With the --sparse option, time tells me that it takes 3m14.91s to 
run the script.


Scale that from 10 to 1300 tif files, and I have serious issues.

Now what? Can I tell Amanda not to do that? What difference will it 
make? Is this a bug in gtar?


Use the amgtar application instead of the GNUTAR program; it allows you to
disable the sparse option.


tar can't know where the holes are; it must read the file to find them.

You WANT the sparse option; otherwise your backup will be large because
tar fills the holes with zeros.


Your best option is to use the calcsize or server estimate.
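
(For reference, the estimate method is a dumptype setting; a minimal sketch, with an
illustrative dumptype name:)

    define dumptype herbarium-fast-estimate {
        program "GNUTAR"
        estimate calcsize    # or: estimate server
    }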

Jean-Louis




Re: all estimate timed out

2013-04-05 Thread Brian Cuttler

Chris,

I don't know what tif files look like internally, don't know how
they compress.

Just out of left field... does your zpool have compression
enabled? I realize zfs will compress or not on a per-block
basis, but I don't know what, if any, overhead is being incurred;
if the tif files are not compressed then there should be no
additional overhead to decompress them on read.

I would also probably hesitate to enable compression of a zfs
file system that was used for amanda work area, since you are
storing data that has already been zip'd. Though this also has
no impact on the estimate phase.

Our site has tended to use gzip --fast rather than --best, and have
moved to pigz on a few of our Amanda servers. Again, potential
amdump issues but not amcheck issues.

Sanity check, the zpool itself is healthy? The drives are all of
the same architecture and spindle speeds?

good luck,

Brian


On Fri, Apr 05, 2013 at 11:09:16AM -0400, Chris Hoogendyk wrote:
 Thank you!
 
 Not sure why the debug file would list runtar in the form of a parameter, 
 when it's not to be used as such. Anyway, that got it working.
 
 Which brings me back to my original problem. As indicated previously, the 
 filesystem in question only has 2806 files and 140 directories. As I watch 
 the runtar in verbose mode, when it hits the tif files, it is taking 20 
 seconds on each tif file. The tif files are scans of herbarium type 
 specimens and are pretty uniformly 200MB each. If I do a find on all the 
 tif files, piped to `wc -l`, there are 1300 of them. Times 20 seconds each 
 gives me the 26000 seconds that shows up in the sendsize debug file for 
 this filesystem.
 
 So, why would these tif files only be going by at 10MB/s into /dev/null? No 
 compression involved. My (real) tapes run much faster than that. I also 
 pointed out that I have more than a dozen other filesystems on the same 
 zpool that are giving me no trouble (five 2TB drives in a raidz1 on a J4500 
 with multipath SAS).
 
 Any ideas how to speed that up?
 
 I think I may start out by breaking them down into sub DLE's. There are 129 
 directories corresponding to taxonomic families.
 
 
 On 4/4/13 8:05 PM, Jean-Louis Martineau wrote:
 On 04/04/2013 02:48 PM, Chris Hoogendyk wrote:
 I may just quietly go nuts. I'm trying to run the command directly. In 
 the debug file, one example is:
 
 Mon Apr  1 08:05:49 2013: thd-32a58: sendsize: Spawning 
 /usr/local/libexec/amanda/runtar runtar daily 
 /usr/local/etc/amanda/tools/gtar --create --file /dev/null 
 --numeric-owner --directory /export/herbarium --one-file-system 
 --listed-incremental 
 /usr/local/var/amanda/gnutar-lists/localhost_export_herbarium_1.new 
 --sparse --ignore-failed-read --totals . in pipeline
 
 So, I created a script working off that and adding verbose:
 
#!/bin/ksh

OPTIONS="--create --file /dev/null --numeric-owner --directory /export/herbarium --one-file-system --listed-incremental";
OPTIONS="${OPTIONS} /usr/local/var/amanda/gnutar-lists/localhost_export_herbarium_1.new --sparse --ignore-failed-read --totals --verbose .";

COMMAND="/usr/local/libexec/amanda/runtar runtar daily /usr/local/etc/amanda/tools/gtar ${OPTIONS}";
#COMMAND="/usr/sfw/bin/gtar ${OPTIONS}";

 remove the 'runtar' argument


exec ${COMMAND};
 
 
 If I run that as user amanda, I get:
 
runtar: Can only be used to create tar archives
 
 
 If I exchange the two commands so that I'm using gtar directly rather 
 than runtar, then I get:
 
/usr/sfw/bin/gtar: Cowardly refusing to create an empty archive
Try `/usr/sfw/bin/gtar --help' or `/usr/sfw/bin/gtar --usage' for more
information.
 
 -- 
 ---
 
 Chris Hoogendyk
 
 -
O__   Systems Administrator
   c/ /'_ --- Biology  Geology Departments
  (*) \(*) -- 140 Morrill Science Center
 ~~ - University of Massachusetts, Amherst
 
 hoogen...@bio.umass.edu
 
 ---
 
 Erdös 4
 
---
   Brian R Cuttler brian.cutt...@wadsworth.org
   Computer Systems Support        (v) 518 486-1697
   Wadsworth Center                (f) 518 473-6384
   NYS Department of Health        Help Desk 518 473-0773



Re: all estimate timed out

2013-04-05 Thread Chris Hoogendyk
OK, folks, it is the --sparse option that Amanda is putting on the gtar. This is /usr/sfw/bin/tar 
version 1.23 on Solaris 10. I have a test script that runs the runtar and a test directory with just 
10 of the tif files in it.


Without the --sparse option, time tells me that it takes 0m0.57s to run the 
script.

With the --sparse option, time tells me that it takes 3m14.91s to run the 
script.

Scale that from 10 to 1300 tif files, and I have serious issues.

Now what? Can I tell Amanda not to do that? What difference will it make? Is 
this a bug in gtar?



On 4/5/13 11:09 AM, Chris Hoogendyk wrote:

Thank you!

Not sure why the debug file would list runtar in the form of a parameter, when it's not to be used 
as such. Anyway, that got it working.


Which brings me back to my original problem. As indicated previously, the filesystem in question 
only has 2806 files and 140 directories. As I watch the runtar in verbose mode, when it hits the 
tif files, it is taking 20 seconds on each tif file. The tif files are scans of herbarium type 
specimens and are pretty uniformly 200MB each. If I do a find on all the tif files, piped to `wc 
-l`, there are 1300 of them. Times 20 seconds each gives me the 26000 seconds that shows up in the 
sendsize debug file for this filesystem.


So, why would these tif files only be going by at 10MB/s into /dev/null? No compression involved. 
My (real) tapes run much faster than that. I also pointed out that I have more than a dozen other 
filesystems on the same zpool that are giving me no trouble (five 2TB drives in a raidz1 on a 
J4500 with multipath SAS).


Any ideas how to speed that up?

I think I may start out by breaking them down into sub DLE's. There are 129 directories 
corresponding to taxonomic families.



On 4/4/13 8:05 PM, Jean-Louis Martineau wrote:

On 04/04/2013 02:48 PM, Chris Hoogendyk wrote:
I may just quietly go nuts. I'm trying to run the command directly. In the debug file, one 
example is:


Mon Apr  1 08:05:49 2013: thd-32a58: sendsize: Spawning /usr/local/libexec/amanda/runtar runtar 
daily /usr/local/etc/amanda/tools/gtar --create --file /dev/null --numeric-owner --directory 
/export/herbarium --one-file-system --listed-incremental 
/usr/local/var/amanda/gnutar-lists/localhost_export_herbarium_1.new --sparse 
--ignore-failed-read --totals . in pipeline


So, I created a script working off that and adding verbose:

   #!/bin/ksh

   OPTIONS="--create --file /dev/null --numeric-owner --directory /export/herbarium --one-file-system --listed-incremental";
   OPTIONS="${OPTIONS} /usr/local/var/amanda/gnutar-lists/localhost_export_herbarium_1.new --sparse --ignore-failed-read --totals --verbose .";

   COMMAND="/usr/local/libexec/amanda/runtar runtar daily /usr/local/etc/amanda/tools/gtar ${OPTIONS}";

   #COMMAND="/usr/sfw/bin/gtar ${OPTIONS}";


remove the 'runtar' argument



   exec ${COMMAND};


If I run that as user amanda, I get:

   runtar: Can only be used to create tar archives


If I exchange the two commands so that I'm using gtar directly rather than 
runtar, then I get:

   /usr/sfw/bin/gtar: Cowardly refusing to create an empty archive
   Try `/usr/sfw/bin/gtar --help' or `/usr/sfw/bin/gtar --usage' for more
   information.




--
---

Chris Hoogendyk

-
   O__   Systems Administrator
  c/ /'_ --- Biology  Geology Departments
 (*) \(*) -- 140 Morrill Science Center
~~ - University of Massachusetts, Amherst

hoogen...@bio.umass.edu

---

Erdös 4



Re: all estimate timed out

2013-04-05 Thread Chris Hoogendyk

Thank you!

Not sure why the debug file would list runtar in the form of a parameter, when it's not to be used 
as such. Anyway, that got it working.


Which brings me back to my original problem. As indicated previously, the filesystem in question 
only has 2806 files and 140 directories. As I watch the runtar in verbose mode, when it hits the tif 
files, it is taking 20 seconds on each tif file. The tif files are scans of herbarium type specimens 
and are pretty uniformly 200MB each. If I do a find on all the tif files, piped to `wc -l`, there 
are 1300 of them. Times 20 seconds each gives me the 26000 seconds that shows up in the sendsize 
debug file for this filesystem.


So, why would these tif files only be going by at 10MB/s into /dev/null? No compression involved. My 
(real) tapes run much faster than that. I also pointed out that I have more than a dozen other 
filesystems on the same zpool that are giving me no trouble (five 2TB drives in a raidz1 on a J4500 
with multipath SAS).


Any ideas how to speed that up?

I think I may start out by breaking them down into sub DLE's. There are 129 directories 
corresponding to taxonomic families.



On 4/4/13 8:05 PM, Jean-Louis Martineau wrote:

On 04/04/2013 02:48 PM, Chris Hoogendyk wrote:
I may just quietly go nuts. I'm trying to run the command directly. In the debug file, one 
example is:


Mon Apr  1 08:05:49 2013: thd-32a58: sendsize: Spawning /usr/local/libexec/amanda/runtar runtar 
daily /usr/local/etc/amanda/tools/gtar --create --file /dev/null --numeric-owner --directory 
/export/herbarium --one-file-system --listed-incremental 
/usr/local/var/amanda/gnutar-lists/localhost_export_herbarium_1.new --sparse --ignore-failed-read 
--totals . in pipeline


So, I created a script working off that and adding verbose:

   #!/bin/ksh

   OPTIONS="--create --file /dev/null --numeric-owner --directory /export/herbarium --one-file-system --listed-incremental";
   OPTIONS="${OPTIONS} /usr/local/var/amanda/gnutar-lists/localhost_export_herbarium_1.new --sparse --ignore-failed-read --totals --verbose .";

   COMMAND="/usr/local/libexec/amanda/runtar runtar daily /usr/local/etc/amanda/tools/gtar ${OPTIONS}";

   #COMMAND="/usr/sfw/bin/gtar ${OPTIONS}";


remove the 'runtar' argument



   exec ${COMMAND};


If I run that as user amanda, I get:

   runtar: Can only be used to create tar archives


If I exchange the two commands so that I'm using gtar directly rather than 
runtar, then I get:

   /usr/sfw/bin/gtar: Cowardly refusing to create an empty archive
   Try `/usr/sfw/bin/gtar --help' or `/usr/sfw/bin/gtar --usage' for more
   information.


--
---

Chris Hoogendyk

-
   O__   Systems Administrator
  c/ /'_ --- Biology  Geology Departments
 (*) \(*) -- 140 Morrill Science Center
~~ - University of Massachusetts, Amherst

hoogen...@bio.umass.edu

---

Erdös 4



Re: all estimate timed out

2013-04-04 Thread Brian Cuttler

Chris,

sorry for the email trouble, this is a new phenomenon and I
don't know what is causing it, if you can identify the bad
header please let me know. We updated our mailhost a few months
ago, but my MUA (mutt) has not changed nor has my editor (emacs).

My large directories are exceptions, even here, and I am educating
the users to do things differently. However I do have lots of files
on zfs in general...

I don't believe that gzip is used in the estimate phase, I think
that it produces raw dump size for dump scheduling and that tape
allocation is left for later in the process. If gzip is used you
should see it in # ps, or top (or prstat), you could always  start
a dump after disabling estimate and see if that phase runs any better.
Since you can be sure of finishing estimate phase by checking
# amstatus, you can always abort the dump if you don't want a
non-compressed backup. (Jean-Louis will know off-hand)

How does the dump phase perform?


On Wed, Apr 03, 2013 at 05:42:12PM -0400, Chris Hoogendyk wrote:
 For some reason, the headers in the particular message from the list (from 
 Brian) are causing my mail client or something to completely strip the 
 message so that it is blank when I reply. That is, I compose a message, it 
 looks good, and I send it. But then I get a blank bcc, brian gets a blank 
 message, and the list gets a blank message. Weird. So I'm replying to 
 Christoph Scheeder's message and pasting in the contents for replying to 
 Brian. That will put the list thread somewhat out of order, but better than 
 completely disconnecting from the thread. Here goes (for the third time):
 
 ---
 
 So, Brian, this is the puzzle. Your file systems have a reason for being 
 difficult. They have several hundred thousand files PER directory.
 
 The filesystem that is causing me trouble, as I indicated, only has 2806 
 total files and 140 total directories. That's basically nothing.
 
 So, is this gzip choking on tif files? Is gzip even involved when sending 
 estimates? If I remove compression will it fix this? I could break it up 
 into multiple DLE's, but Amanda will still need estimates of all the pieces.
 
 Or is it something entirely different? And, if so, how should I go about 
 looking for it?
 
 
 
 On 4/3/13 1:14 PM, Brian Cuttler wrote:
 Chris,
 
 for larger file systems I've moved to server estimate, less
 accurate but takes the entire estimate phase out of the equation.
 
 We have had a lot of success with pigz rather than regular
 gzip, as it'll take advantage of the multiple CPUs and give
 parallelization during compression, which is often our bottleneck
 during actual dumping. In one system I cut DLE dump time from
 13 to 8 hours, a huge savings (I think those were the numbers,
 I can look them up...).
 
 ZFS will allow unlimited capacity, and enough files per directory
 to choke access; we have backups that run very badly here, with
 literally several hundred thousand files PER directory, and
 multiple such directories.
 
 For backups themselves, I do use snapshots where I can on my
 ZFS file systems.
 
 On Wed, Apr 03, 2013 at 11:26:01AM -0400, Chris Hoogendyk wrote:
 This seems like an obvious read the FAQ situation, but . . .
 
 I'm running Amanda 3.3.2 on a Sun T5220 with Solaris 10 and a J4500 jbod
 disk array with multipath SAS. It all should be fast and is on the local
 server, so there isn't any network path outside localhost for the DLE's
 that are giving me trouble. They are zfs on raidz1 with five 2TB drives.
 Gnutar is v1.23. This server is successfully backing up several other
 servers as well as many more DLE's on the localhost. Output to an AIT5 
 tape
 library.
 
 I've upped the etimeout to 1800 and the dtimeout to 3600, which both seem
 outrageously long (jumped from the default 5 minutes to 30 minutes, and
 from the default 30 minutes to an hour).
 
 The filesystem (DLE) that is giving me trouble (hasn't backed up in a
 couple of weeks) is /export/herbarium, which looks like:
 
 marlin:/export/herbarium# df -k .
 Filesystemkbytesused   avail capacity  Mounted on
 J4500-pool1/herbarium
  2040109465 262907572 1777201893    13%
   /export/herbarium
 marlin:/export/herbarium# find . -type f | wc -l
  2806
 marlin:/export/herbarium# find . -type d | wc -l
   140
 marlin:/export/herbarium#
 
 
 So, it is only 262G and only has 2806 files. Shouldn't be that big a deal.
 They are typically tif scans.
 
 One thought that hits me is: possibly, because it is over 200G of tif
 scans, compression is causing trouble? But this is just getting estimates,
 output going to /dev/null.
 
 Here is a segment from the very end of the sendsize debug file from April 
 1
 (the debug file ends after these lines):
 
 Mon Apr  1 08:05:49 2013: thd-32a58: sendsize: .
 Mon Apr  1 08:05:49 2013: thd-32a58: sendsize: estimate time for
 /export/herbarium level 0: 

Re: all estimate timed out

2013-04-04 Thread Chris Hoogendyk
Still getting blank emails on a test reply (just to myself) to Brian's emails. So, I'm replying to 
my own email to the list and then pasting in the reply to Brian. It's clearly a weirdness in the 
headers coming from Brian, but it could also be some misbehavior in response to those by my mail 
client -- Thunderbird 17.0.5.


I changed the dump type to not use compression. If tif files are not going to compress anyway, then
I might as well not even ask Amanda to try. However, it never gets to the dump, because it gets "all
estimate timed out".


I will try breaking it into multiple DLE's and also changing it to server estimate. But, until I 
know what is really causing the problem, I'm not optimistic about the possibility of a successful dump.


As I said, everything else runs without trouble, including DLE's that are different zfs filesystems 
on the same zpool.



On 4/4/13 9:39 AM, Brian Cuttler wrote:

Chris,

sorry for the email trouble, this is a new phenomenon and I
don't know what is causing it, if you can identify the bad
header please let me know. We updated our mailhost a few months
ago, but my MUA (mutt) has not changed nor has my editor (emacs).

My large directories are exceptions, even here, and I am educating
the users to do things differently. However I do have lots of files
on zfs in general...

I don't believe that gzip is used in the estimate phase, I think
that it produces raw dump size for dump scheduling and that tape
allocation is left for later in the process. If gzip is used you
should see it in # ps, or top (or prstat), you could always  start
a dump after disabling estimate and see if that phase runs any better.
Since you can be sure of finishing estimate phase by checking
# amstatus, you can always abort the dump if you don't want a
non-compressed backup. (Jean-Louis will know off-hand)

How does the dump phase perform?


On Wed, Apr 03, 2013 at 05:42:12PM -0400, Chris Hoogendyk wrote:

For some reason, the headers in the particular message from the list (from
Brian) are causing my mail client or something to completely strip the
message so that it is blank when I reply. That is, I compose a message, it
looks good, and I send it. But then I get a blank bcc, brian gets a blank
message, and the list gets a blank message. Weird. So I'm replying to
Christoph Scheeder's message and pasting in the contents for replying to
Brian. That will put the list thread somewhat out of order, but better than
completely disconnecting from the thread. Here goes (for the third time):

---

So, Brian, this is the puzzle. Your file systems have a reason for being
difficult. They have several hundred thousand files PER directory.

The filesystem that is causing me trouble, as I indicated, only has 2806
total files and 140 total directories. That's basically nothing.

So, is this gzip choking on tif files? Is gzip even involved when sending
estimates? If I remove compression will it fix this? I could break it up
into multiple DLE's, but Amanda will still need estimates of all the pieces.

Or is it something entirely different? And, if so, how should I go about
looking for it?



On 4/3/13 1:14 PM, Brian Cuttler wrote:

Chris,

for larger file systems I've moved to server estimate, less
accurate but takes the entire estimate phase out of the equation.

We have had a lot of success with pigz rather than regular
gzip, as it'll take advantage of the multiple CPUs and give
parallelization during compression, which is often our bottleneck
during actual dumping. In one system I cut DLE dump time from
13 to 8 hours, a huge savings (I think those were the numbers,
I can look them up...).

ZFS will allow unlimited capacity, and enough files per directory
to choke access; we have backups that run very badly here, with
literally several hundred thousand files PER directory, and
multiple such directories.

For backups themselves, I do use snapshots where I can on my
ZFS file systems.

On Wed, Apr 03, 2013 at 11:26:01AM -0400, Chris Hoogendyk wrote:

This seems like an obvious read the FAQ situation, but . . .

I'm running Amanda 3.3.2 on a Sun T5220 with Solaris 10 and a J4500 jbod
disk array with multipath SAS. It all should be fast and is on the local
server, so there isn't any network path outside localhost for the DLE's
that are giving me trouble. They are zfs on raidz1 with five 2TB drives.
Gnutar is v1.23. This server is successfully backing up several other
servers as well as many more DLE's on the localhost. Output to an AIT5
tape
library.

I've upped the etimeout to 1800 and the dtimeout to 3600, which both seem
outrageously long (jumped from the default 5 minutes to 30 minutes, and

from the default 30 minutes to an hour).

The filesystem (DLE) that is giving me trouble (hasn't backed up in a
couple of weeks) is /export/herbarium, which looks like:

marlin:/export/herbarium# df -k .
Filesystemkbytesused   avail capacity  Mounted on
J4500-pool1/herbarium

Re: all estimate timed out

2013-04-04 Thread Brian Cuttler

Reply using thunderbird rather than mutt.

Any way to vet the zfs file system? Make sure it's sane and doesn't
contain some kind of a bad link causing a loop?

If you were to run the command used by the estimate, which I believe is
displayed in the debug file, can you run it successfully on the command
line? If you run it verbose, can you see where it hangs or where it
slows down?

On 4/4/2013 12:34 PM, Chris Hoogendyk wrote:
Still getting blank emails on a test reply (just to myself) to Brian's 
emails. So, I'm replying to my own email to the list and then pasting 
in the reply to Brian. It's clearly a weirdness in the headers coming 
from Brian, but it could also be some misbehavior in response to those 
by my mail client -- Thunderbird 17.0.5.


I changed the dump type to not use compression. If tif files are not 
going to compress anyway, then I might as well not even ask Amanda to 
try. However, it never gets to the dump, because it gets "all estimate 
timed out".


I will try breaking it into multiple DLE's and also changing it to 
server estimate. But, until I know what is really causing the 
problem, I'm not optimistic about the possibility of a successful dump.


As I said, everything else runs without trouble, including DLE's that 
are different zfs filesystems on the same zpool.



On 4/4/13 9:39 AM, Brian Cuttler wrote:

Chris,

sorry for the email trouble, this is a new phenomenon and I
don't know what is causing it, if you can identify the bad
header please let me know. We updated our mailhost a few months
ago, but my MUA (mutt) has not changed nor has my editor (emacs).

My large directories are exceptions, even here, and I am educating
the users to do things differently. However I do have lots of files
on zfs in general...

I don't believe that gzip is used in the estimate phase, I think
that it produces raw dump size for dump scheduling and that tape
allocation is left for later in the process. If gzip is used you
should see it in # ps, or top (or prstat), you could always  start
a dump after disabling estimate and see if that phase runs any better.
Since you can be sure of finishing estimate phase by checking
# amstatus, you can always abort the dump if you don't want a
non-compressed backup. (Jean-Louis will know off-hand)

How does the dump phase perform?


On Wed, Apr 03, 2013 at 05:42:12PM -0400, Chris Hoogendyk wrote:
For some reason, the headers in the particular message from the list 
(from

Brian) are causing my mail client or something to completely strip the
message so that it is blank when I reply. That is, I compose a 
message, it
looks good, and I send it. But then I get a blank bcc, brian gets a 
blank

message, and the list gets a blank message. Weird. So I'm replying to
Christoph Scheeder's message and pasting in the contents for 
replying to
Brian. That will put the list thread somewhat out of order, but 
better than
completely disconnecting from the thread. Here goes (for the third 
time):


---

So, Brian, this is the puzzle. Your file systems have a reason for 
being

difficult. They have several hundred thousand files PER directory.

The filesystem that is causing me trouble, as I indicated, only has 
2806

total files and 140 total directories. That's basically nothing.

So, is this gzip choking on tif files? Is gzip even involved when 
sending
estimates? If I remove compression will it fix this? I could break 
it up
into multiple DLE's, but Amanda will still need estimates of all the 
pieces.


Or is it something entirely different? And, if so, how should I go 
about

looking for it?



On 4/3/13 1:14 PM, Brian Cuttler wrote:

Chris,

for larger file systems I've moved to server estimate, less
accurate but takes the entire estimate phase out of the equation.

We have had a lot of success with pigz rather than regular
gzip, as it'll take advantage of the multiple CPUs and give
parallelization during compression, which is often our bottleneck
during actual dumping. In one system I cut DLE dump time from
13 to 8 hours, a huge savings (I think those were the numbers,
I can look them up...).

ZFS will allow unlimited capacity, and enough files per directory
to choke access; we have backups that run very badly here, with
literally several hundred thousand files PER directory, and
multiple such directories.

For backups themselves, I do use snapshots where I can on my
ZFS file systems.

On Wed, Apr 03, 2013 at 11:26:01AM -0400, Chris Hoogendyk wrote:

This seems like an obvious read the FAQ situation, but . . .

I'm running Amanda 3.3.2 on a Sun T5220 with Solaris 10 and a 
J4500 jbod
disk array with multipath SAS. It all should be fast and is on the 
local
server, so there isn't any network path outside localhost for the 
DLE's
that are giving me trouble. They are zfs on raidz1 with five 2TB 
drives.

Gnutar is v1.23. This server is successfully backing up several other
servers as well as many more DLE's on the localhost. Output to an 
AIT5

tape
library.


Re: all estimate timed out

2013-04-04 Thread Chris Hoogendyk

I may just quietly go nuts. I'm trying to run the command directly. In the 
debug file, one example is:

Mon Apr  1 08:05:49 2013: thd-32a58: sendsize: Spawning /usr/local/libexec/amanda/runtar runtar 
daily /usr/local/etc/amanda/tools/gtar --create --file /dev/null --numeric-owner --directory 
/export/herbarium --one-file-system --listed-incremental 
/usr/local/var/amanda/gnutar-lists/localhost_export_herbarium_1.new --sparse --ignore-failed-read 
--totals . in pipeline


So, I created a script working off that and adding verbose:

   #!/bin/ksh

   OPTIONS="--create --file /dev/null --numeric-owner --directory /export/herbarium --one-file-system --listed-incremental";
   OPTIONS="${OPTIONS} /usr/local/var/amanda/gnutar-lists/localhost_export_herbarium_1.new --sparse --ignore-failed-read --totals --verbose .";

   COMMAND="/usr/local/libexec/amanda/runtar runtar daily /usr/local/etc/amanda/tools/gtar ${OPTIONS}";
   #COMMAND="/usr/sfw/bin/gtar ${OPTIONS}";

   exec ${COMMAND};


If I run that as user amanda, I get:

   runtar: Can only be used to create tar archives


If I exchange the two commands so that I'm using gtar directly rather than 
runtar, then I get:

   /usr/sfw/bin/gtar: Cowardly refusing to create an empty archive
   Try `/usr/sfw/bin/gtar --help' or `/usr/sfw/bin/gtar --usage' for more
   information.



On 4/4/13 1:22 PM, Brian Cuttler wrote:

Reply using thunderbird rather than mutt.

Any way to vet the zfs file system? Make sure it's sane and doesn't contain some
kind of a bad link causing a loop?

If you were to run the command used by the estimate, which I believe is displayed in
the debug file, can you run it successfully on the command line? If you run it
verbose, can you see where it hangs or where it slows down?

On 4/4/2013 12:34 PM, Chris Hoogendyk wrote:
Still getting blank emails on a test reply (just to myself) to Brian's emails. So, I'm replying 
to my own email to the list and then pasting in the reply to Brian. It's clearly a weirdness in 
the headers coming from Brian, but it could also be some misbehavior in response to those by my 
mail client -- Thunderbird 17.0.5.


I changed the dump type to not use compression. If tif files are not going to compress anyway, 
then I might as well not even ask Amanda to try. However, it never gets to the dump, because it 
gets "all estimate timed out".


I will try breaking it into multiple DLE's and also changing it to server estimate. But, until 
I know what is really causing the problem, I'm not optimistic about the possibility of a 
successful dump.


As I said, everything else runs without trouble, including DLE's that are different zfs 
filesystems on the same zpool.



On 4/4/13 9:39 AM, Brian Cuttler wrote:

Chris,

sorry for the email trouble, this is a new phenomenon and I
don't know what is causing it, if you can identify the bad
header please let me know. We updated our mailhost a few months
ago, but my MUA (mutt) has not changed nor has my editor (emacs).

My large directories are exceptions, even here, and I am educating
the users to do things differently. However I do have lots of files
on zfs in general...

I don't believe that gzip is used in the estimate phase, I think
that it produces raw dump size for dump scheduling and that tape
allocation is left for later in the process. If gzip is used you
should see it in # ps, or top (or prstat), you could always start
a dump after disabling estimate and see if that phase runs any better.
Since you can be sure of finishing estimate phase by checking
# amstatus, you can always abort the dump if you don't want a
non-compressed backup. (Jean-Louis will know off-hand)

How does the dump phase perform?


On Wed, Apr 03, 2013 at 05:42:12PM -0400, Chris Hoogendyk wrote:

For some reason, the headers in the particular message from the list (from
Brian) are causing my mail client or something to completely strip the
message so that it is blank when I reply. That is, I compose a message, it
looks good, and I send it. But then I get a blank bcc, brian gets a blank
message, and the list gets a blank message. Weird. So I'm replying to
Christoph Scheeder's message and pasting in the contents for replying to
Brian. That will put the list thread somewhat out of order, but better than
completely disconnecting from the thread. Here goes (for the third time):

---

So, Brian, this is the puzzle. Your file systems have a reason for being
difficult. They have several hundred thousand files PER directory.

The filesystem that is causing me trouble, as I indicated, only has 2806
total files and 140 total directories. That's basically nothing.

So, is this gzip choking on tif files? Is gzip even involved when sending
estimates? If I remove compression will it fix this? I could break it up
into multiple DLE's, but Amanda will still need estimates of all the pieces.

Or is it something entirely different? And, if so, how should I go about
looking for it?



On 4/3/13 1:14 PM, Brian 

Re: all estimate timed out

2013-04-04 Thread Jean-Louis Martineau

On 04/04/2013 02:48 PM, Chris Hoogendyk wrote:
I may just quietly go nuts. I'm trying to run the command directly. In 
the debug file, one example is:


Mon Apr  1 08:05:49 2013: thd-32a58: sendsize: Spawning 
/usr/local/libexec/amanda/runtar runtar daily 
/usr/local/etc/amanda/tools/gtar --create --file /dev/null 
--numeric-owner --directory /export/herbarium --one-file-system 
--listed-incremental 
/usr/local/var/amanda/gnutar-lists/localhost_export_herbarium_1.new 
--sparse --ignore-failed-read --totals . in pipeline


So, I created a script working off that and adding verbose:

   #!/bin/ksh

   OPTIONS="--create --file /dev/null --numeric-owner --directory /export/herbarium --one-file-system --listed-incremental";
   OPTIONS="${OPTIONS} /usr/local/var/amanda/gnutar-lists/localhost_export_herbarium_1.new --sparse --ignore-failed-read --totals --verbose .";

   COMMAND="/usr/local/libexec/amanda/runtar runtar daily /usr/local/etc/amanda/tools/gtar ${OPTIONS}";

   #COMMAND="/usr/sfw/bin/gtar ${OPTIONS}";


remove the 'runtar' argument



   exec ${COMMAND};


If I run that as user amanda, I get:

   runtar: Can only be used to create tar archives


If I exchange the two commands so that I'm using gtar directly rather 
than runtar, then I get:


   /usr/sfw/bin/gtar: Cowardly refusing to create an empty archive
   Try `/usr/sfw/bin/gtar --help' or `/usr/sfw/bin/gtar --usage' for more
   information.






Re: all estimate timed out

2013-04-04 Thread Nathan Stratton Treadway
On Thu, Apr 04, 2013 at 17:48:46 -0400, Chris Hoogendyk wrote:
 If I exchange the two commands so that I'm using gtar directly rather
 than runtar, then I get:
 
/usr/sfw/bin/gtar: Cowardly refusing to create an empty archive
Try `/usr/sfw/bin/gtar --help' or `/usr/sfw/bin/gtar --usage' for more
information.

I can't see why this is happening offhand, but generally that means
that either the trailing "." is missing from the command that was
actually executed, or that argument is getting eaten by some other
option.  You might try printing out ${COMMAND} immediately before
running it, just to make sure nothing obvious is missing that way.

(Also, any particular reason you are using exec here?  I don't know
why it would be eating the . under ksh, but you might try without that
and see if the problem goes away.)

Worst case, try adding the name of a file found in your
/export/herbarium directory after the . and see if that at least
allows gtar to run.
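
(A sketch of that debugging step in the same ksh script -- not from the original message:)

    print -r -- "about to run: ${COMMAND}"   # show the exact argument list
    ${COMMAND}                               # run without exec so the shell survives
    echo "exit status: $?"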

Nathan


Nathan Stratton Treadway  -  natha...@ontko.com  -  Mid-Atlantic region
Ray Ontko  Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239


Re: all estimate timed out

2013-04-04 Thread Nathan Stratton Treadway
On Thu, Apr 04, 2013 at 17:48:46 -0400, Chris Hoogendyk wrote:
 So, I created a script working off that and adding verbose:
 
#!/bin/ksh

OPTIONS="--create --file /dev/null --numeric-owner --directory /export/herbarium --one-file-system --listed-incremental";
OPTIONS="${OPTIONS} /usr/local/var/amanda/gnutar-lists/localhost_export_herbarium_1.new --sparse --ignore-failed-read --totals --verbose .";

COMMAND="/usr/local/libexec/amanda/runtar runtar daily /usr/local/etc/amanda/tools/gtar ${OPTIONS}";
#COMMAND="/usr/sfw/bin/gtar ${OPTIONS}";

exec ${COMMAND};
 
 
 If I run that as user amanda, I get:
 
runtar: Can only be used to create tar archives

(Personally I'd do my initial investigation using gtar directly, but I see
that runtar prints that error message when it finds that argv[3] isn't
"--create", and also that it expects argv[1] to be the config name.  So I
think it would work if you just left out the standalone "runtar" from
the command:

 COMMAND="/usr/local/libexec/amanda/runtar daily /usr/local/etc/amanda/tools/gtar ${OPTIONS}"
)

Nathan


Nathan Stratton Treadway  -  natha...@ontko.com  -  Mid-Atlantic region
Ray Ontko  Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239


Re: all estimate timed out

2013-04-03 Thread C.Scheeder

Hi Chris,

Am 03.04.2013 17:26, schrieb Chris Hoogendyk:

This seems like an obvious read the FAQ situation, but . . .

I'm running Amanda 3.3.2 on a Sun T5220 with Solaris 10 and a J4500 jbod disk 
array with multipath
SAS. It all should be fast and is on the local server, so there isn't any 
network path outside
localhost for the DLE's that are giving me trouble. They are zfs on raidz1 with 
five 2TB drives.
Gnutar is v1.23. This server is successfully backing up several other servers 
as well as many more
DLE's on the localhost. Output to an AIT5 tape library.

I've upped the etimeout to 1800 and the dtimeout to 3600, which both seem 
outrageously long (jumped
from the default 5 minutes to 30 minutes, and from the default 30 minutes to an 
hour).

The filesystem (DLE) that is giving me trouble (hasn't backed up in a couple of 
weeks) is
/export/herbarium, which looks like:

marlin:/export/herbarium# df -k .
Filesystemkbytesused   avail capacity  Mounted on
J4500-pool1/herbarium
  2040109465 262907572 1777201893    13% 
/export/herbarium
marlin:/export/herbarium# find . -type f | wc -l
 2806
marlin:/export/herbarium# find . -type d | wc -l
  140
marlin:/export/herbarium#


So, it is only 262G and only has 2806 files. Shouldn't be that big a deal. They 
are typically tif
scans.

One thought that hits me is: possibly, because it is over 200G of tif scans, 
compression is causing
trouble? But this is just getting estimates, output going to /dev/null.

Here is a segment from the very end of the sendsize debug file from April 1 
(the debug file ends
after these lines):

Mon Apr  1 08:05:49 2013: thd-32a58: sendsize: .
Mon Apr  1 08:05:49 2013: thd-32a58: sendsize: estimate time for 
/export/herbarium level 0: 26302.500


Nice, it took 7 hours, 18 minutes, and 22 seconds to get the level-0 estimate.


Mon Apr  1 08:05:49 2013: thd-32a58: sendsize: estimate size for 
/export/herbarium level 0:
262993150 KB
Mon Apr  1 08:05:49 2013: thd-32a58: sendsize: waiting for runtar 
/export/herbarium child
Mon Apr  1 08:05:49 2013: thd-32a58: sendsize: after runtar /export/herbarium 
wait
Mon Apr  1 08:05:49 2013: thd-32a58: sendsize: getting size via gnutar for 
/export/herbarium level 1
Mon Apr  1 08:05:49 2013: thd-32a58: sendsize: Spawning 
/usr/local/libexec/amanda/runtar runtar
daily /usr/local/etc/amanda/tools/gtar --create --file /dev/null 
--numeric-owner --directory
/export/herbarium --one-file-system --listed-incremental
/usr/local/var/amanda/gnutar-lists/localhost_export_herbarium_1.new --sparse 
--ignore-failed-read
--totals . in pipeline
Mon Apr  1 10:16:17 2013: thd-32a58: sendsize: Total bytes written: 77663795200 
(73GiB, 9.5MiB/s)
Mon Apr  1 10:16:17 2013: thd-32a58: sendsize: .
Mon Apr  1 10:16:17 2013: thd-32a58: sendsize: estimate time for 
/export/herbarium level 1: 7827.571
Mon Apr  1 10:16:17 2013: thd-32a58: sendsize: estimate size for 
/export/herbarium level 1: 75843550 KB
Mon Apr  1 10:16:17 2013: thd-32a58: sendsize: waiting for runtar 
/export/herbarium child
Mon Apr  1 10:16:17 2013: thd-32a58: sendsize: after runtar /export/herbarium 
wait
Mon Apr  1 10:16:17 2013: thd-32a58: sendsize: done with amname 
/export/herbarium dirname
/export/herbarium spindle 45002


and additionally it took 2 hours 11 minutes to get the level-1 estimate.

In sum it took about nine and a half hours to get the estimates,

so your etimeout of 30 minutes is a little bit low for this machine, isn't it?

You should consider using another method of getting estimates for that machine,
or you should find out what makes the estimates on that machine so slow,
as the backup itself will likely take longer than the estimates.

Christoph



Re: all estimate timed out

2013-04-03 Thread Chris Hoogendyk


On 4/3/13 12:15 PM, C.Scheeder wrote:

Hi Chris,

Am 03.04.2013 17:26, schrieb Chris Hoogendyk:

This seems like an obvious read the FAQ situation, but . . .

I'm running Amanda 3.3.2 on a Sun T5220 with Solaris 10 and a J4500 jbod disk 
array with multipath
SAS. It all should be fast and is on the local server, so there isn't any 
network path outside
localhost for the DLE's that are giving me trouble. They are zfs on raidz1 with 
five 2TB drives.
Gnutar is v1.23. This server is successfully backing up several other servers 
as well as many more
DLE's on the localhost. Output to an AIT5 tape library.

I've upped the etimeout to 1800 and the dtimeout to 3600, which both seem 
outrageously long (jumped
from the default 5 minutes to 30 minutes, and from the default 30 minutes to an 
hour).

The filesystem (DLE) that is giving me trouble (hasn't backed up in a couple of 
weeks) is
/export/herbarium, which looks like:

marlin:/export/herbarium# df -k .
Filesystemkbytesused   avail capacity Mounted on
J4500-pool1/herbarium
  2040109465 262907572 1777201893    13% 
/export/herbarium
marlin:/export/herbarium# find . -type f | wc -l
 2806
marlin:/export/herbarium# find . -type d | wc -l
  140
marlin:/export/herbarium#


So, it is only 262G and only has 2806 files. Shouldn't be that big a deal. They 
are typically tif
scans.

One thought that hits me is: possibly, because it is over 200G of tif scans, 
compression is causing
trouble? But this is just getting estimates, output going to /dev/null.

Here is a segment from the very end of the sendsize debug file from April 1 
(the debug file ends
after these lines):

Mon Apr  1 08:05:49 2013: thd-32a58: sendsize: .
Mon Apr  1 08:05:49 2013: thd-32a58: sendsize: estimate time for /export/herbarium level 0: 
26302.500


Nice, it took 7 hours, 18 minutes, and 22 seconds to get the level-0 estimate.


Mon Apr  1 08:05:49 2013: thd-32a58: sendsize: estimate size for 
/export/herbarium level 0:
262993150 KB
Mon Apr  1 08:05:49 2013: thd-32a58: sendsize: waiting for runtar 
/export/herbarium child
Mon Apr  1 08:05:49 2013: thd-32a58: sendsize: after runtar /export/herbarium 
wait
Mon Apr  1 08:05:49 2013: thd-32a58: sendsize: getting size via gnutar for 
/export/herbarium level 1
Mon Apr  1 08:05:49 2013: thd-32a58: sendsize: Spawning 
/usr/local/libexec/amanda/runtar runtar
daily /usr/local/etc/amanda/tools/gtar --create --file /dev/null 
--numeric-owner --directory
/export/herbarium --one-file-system --listed-incremental
/usr/local/var/amanda/gnutar-lists/localhost_export_herbarium_1.new --sparse 
--ignore-failed-read
--totals . in pipeline
Mon Apr  1 10:16:17 2013: thd-32a58: sendsize: Total bytes written: 77663795200 
(73GiB, 9.5MiB/s)
Mon Apr  1 10:16:17 2013: thd-32a58: sendsize: .
Mon Apr  1 10:16:17 2013: thd-32a58: sendsize: estimate time for 
/export/herbarium level 1: 7827.571
Mon Apr  1 10:16:17 2013: thd-32a58: sendsize: estimate size for /export/herbarium level 1: 
75843550 KB

Mon Apr  1 10:16:17 2013: thd-32a58: sendsize: waiting for runtar 
/export/herbarium child
Mon Apr  1 10:16:17 2013: thd-32a58: sendsize: after runtar /export/herbarium 
wait
Mon Apr  1 10:16:17 2013: thd-32a58: sendsize: done with amname 
/export/herbarium dirname
/export/herbarium spindle 45002


and additionally it took 2 hours 11 minutes to get the level-1 estimate.

In sum, it took about nine and a half hours to get the estimates.

So your etimeout of 30 minutes is a little bit low for this machine, isn't it?

You should consider using another method of getting estimates for that machine,
or you should find out what makes the estimates on that machine so slow,
as the backup itself will likely take longer than the estimates.


I should just note that when you say "that machine", it is really "that DLE". There are many other
DLE's on that machine, on that disk array, and even on that same zpool, that return estimates and
that successfully back up.
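
Since it is one DLE rather than the whole machine, the estimate method can also be overridden for just that disklist entry; a sketch, assuming an existing comp-user-tar dumptype (the spindle number is the one from the sendsize log):

  # disklist -- only this entry gets the server-side estimate
  localhost /export/herbarium {
      comp-user-tar
      estimate server
  } 45002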



--
---

Chris Hoogendyk

-
   O__   Systems Administrator
  c/ /'_ --- Biology & Geology Departments
 (*) \(*) -- 140 Morrill Science Center
~~ - University of Massachusetts, Amherst

hoogen...@bio.umass.edu

---

Erdös 4



Re: all estimate timed out

2013-04-03 Thread Brian Cuttler

Chris,

for larger file systems I've moved to server estimate, less
accurate but takes the entire estimate phase out of the equation.

We have had a lot of success with pigz rather than regular
gzip, as it'll take advantage of the multiple CPUs and give
parallelization during compression, which is often our bottleneck
during actual dumping. In one system I cut DLE dump time from
13 to 8 hours, a huge savings (I think those were the numbers,
I can look them up...).
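
For what it's worth, pigz can be wired into a dumptype through the custom-compression options; a sketch, assuming those options are available in this Amanda version, with the pigz path being a guess at the local install:

  # dumptype sketch -- hand client-side compression to pigz
  define dumptype comp-user-tar-pigz {
      comp-user-tar                                  # assumed base dumptype
      compress client custom
      client_custom_compress "/usr/local/bin/pigz"   # path is an assumption
  }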

ZFS will allow unlimited capacity, and enough files per directory
to choke access. We have backups that run very badly here, with
literally several hundred thousand files PER directory, and
multiple such directories.

For backups themselves, I do use snapshots where I can on my
ZFS file systems.
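
Rolled by hand, the snapshot approach amounts to snapshotting the filesystem, backing up from the snapshot path under the mountpoint, and dropping the snapshot afterwards. The pool/filesystem name below comes from the df output in this thread; the snapshot name is illustrative, and recent Amanda releases also provide an amzfs-snapshot script that can automate this if it was built:

  # sketch -- back up from a consistent point-in-time copy
  zfs snapshot J4500-pool1/herbarium@amanda-nightly
  ls /export/herbarium/.zfs/snapshot/amanda-nightly    # point the DLE here
  # ...after the dump completes, clean up:
  zfs destroy J4500-pool1/herbarium@amanda-nightly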

On Wed, Apr 03, 2013 at 11:26:01AM -0400, Chris Hoogendyk wrote:
 This seems like an obvious "read the FAQ" situation, but . . .
 
 I'm running Amanda 3.3.2 on a Sun T5220 with Solaris 10 and a J4500 jbod 
 disk array with multipath SAS. It all should be fast and is on the local 
 server, so there isn't any network path outside localhost for the DLE's 
 that are giving me trouble. They are zfs on raidz1 with five 2TB drives. 
 Gnutar is v1.23. This server is successfully backing up several other 
 servers as well as many more DLE's on the localhost. Output to an AIT5 tape 
 library.
 
 I've upped the etimeout to 1800 and the dtimeout to 3600, which both seem 
 outrageously long (jumped from the default 5 minutes to 30 minutes, and 
 from the default 30 minutes to an hour).
 
 The filesystem (DLE) that is giving me trouble (hasn't backed up in a 
 couple of weeks) is /export/herbarium, which looks like:
 
marlin:/export/herbarium# df -k .
 Filesystem             kbytes      used      avail capacity  Mounted on
 J4500-pool1/herbarium
                    2040109465 262907572 1777201893    13%    /export/herbarium
marlin:/export/herbarium# find . -type f | wc -l
 2806
marlin:/export/herbarium# find . -type d | wc -l
  140
marlin:/export/herbarium#
 
 
 So, it is only 262G and only has 2806 files. Shouldn't be that big a deal. 
 They are typically tif scans.
 
 One thought that hits me is: possibly, because it is over 200G of tif 
 scans, compression is causing trouble? But this is just getting estimates, 
 output going to /dev/null.
 
 Here is a segment from the very end of the sendsize debug file from April 1 
 (the debug file ends after these lines):
 
 Mon Apr  1 08:05:49 2013: thd-32a58: sendsize: .
 Mon Apr  1 08:05:49 2013: thd-32a58: sendsize: estimate time for 
 /export/herbarium level 0: 26302.500
 Mon Apr  1 08:05:49 2013: thd-32a58: sendsize: estimate size for 
 /export/herbarium level 0: 262993150 KB
 Mon Apr  1 08:05:49 2013: thd-32a58: sendsize: waiting for runtar 
 /export/herbarium child
 Mon Apr  1 08:05:49 2013: thd-32a58: sendsize: after runtar 
 /export/herbarium wait
 Mon Apr  1 08:05:49 2013: thd-32a58: sendsize: getting size via gnutar for 
 /export/herbarium level 1
 Mon Apr  1 08:05:49 2013: thd-32a58: sendsize: Spawning 
 /usr/local/libexec/amanda/runtar runtar daily 
 /usr/local/etc/amanda/tools/gtar --create --file /dev/null --numeric-owner 
 --directory /export/herbarium --one-file-system --listed-incremental 
 /usr/local/var/amanda/gnutar-lists/localhost_export_herbarium_1.new 
 --sparse --ignore-failed-read --totals . in pipeline
 Mon Apr  1 10:16:17 2013: thd-32a58: sendsize: Total bytes written: 
 77663795200 (73GiB, 9.5MiB/s)
 Mon Apr  1 10:16:17 2013: thd-32a58: sendsize: .
 Mon Apr  1 10:16:17 2013: thd-32a58: sendsize: estimate time for 
 /export/herbarium level 1: 7827.571
 Mon Apr  1 10:16:17 2013: thd-32a58: sendsize: estimate size for 
 /export/herbarium level 1: 75843550 KB
 Mon Apr  1 10:16:17 2013: thd-32a58: sendsize: waiting for runtar 
 /export/herbarium child
 Mon Apr  1 10:16:17 2013: thd-32a58: sendsize: after runtar 
 /export/herbarium wait
 Mon Apr  1 10:16:17 2013: thd-32a58: sendsize: done with amname 
 /export/herbarium dirname /export/herbarium spindle 45002
 
 
 -- 
 ---
 
 Chris Hoogendyk
 
 -
O__   Systems Administrator
   c/ /'_ --- Biology & Geology Departments
  (*) \(*) -- 140 Morrill Science Center
 ~~ - University of Massachusetts, Amherst
 
 hoogen...@bio.umass.edu
 
 ---
 
 Erdös 4
 
---
   Brian R Cuttler              brian.cutt...@wadsworth.org
   Computer Systems Support     (v) 518 486-1697
   Wadsworth Center             (f) 518 473-6384
   NYS Department of Health     Help Desk 518 473-0773



Re: all estimate timed out

2013-04-03 Thread Chris Hoogendyk
For some reason, the headers in the particular message from the list (from Brian) are causing my 
mail client or something to completely strip the message, so that it is blank when I reply. That is, 
I compose a message, it looks good, and I send it. But then I get a blank BCC, Brian gets a blank 
message, and the list gets a blank message. Weird. So I'm replying to Christoph Scheeder's message 
and pasting in the contents for replying to Brian. That will put the list thread somewhat out of 
order, but better than completely disconnecting from the thread. Here goes (for the third time):


---

So, Brian, this is the puzzle. Your file systems have a reason for being difficult. They have 
several hundred thousand files PER directory.


The filesystem that is causing me trouble, as I indicated, only has 2806 total files and 140 total 
directories. That's basically nothing.


So, is this gzip choking on tif files? Is gzip even involved when sending estimates? If I remove 
compression, will it fix this? I could break it up into multiple DLE's, but Amanda will still need 
estimates of all the pieces.
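
Splitting could look something like the sketch below: several disklist entries that share one diskdevice but include different subdirectories. The subdirectory names are placeholders, and comp-user-tar is assumed to exist:

  # disklist sketch -- two DLEs carved out of one filesystem
  localhost /export/herbarium-1 /export/herbarium {
      comp-user-tar
      include list "./maps"
  }
  localhost /export/herbarium-2 /export/herbarium {
      comp-user-tar
      include list "./scans"
  }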


Or is it something entirely different? And, if so, how should I go about 
looking for it?
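
One low-tech way to start looking is to copy a handful of the tif files into a scratch directory and time the same gtar the sendsize log shows, toggling one option at a time (for instance --sparse); the scratch path here is a placeholder:

  # compare the exact tool Amanda runs, with and without a given option
  time /usr/local/etc/amanda/tools/gtar --create --file /dev/null \
      --directory /export/herbarium-test --one-file-system --sparse .
  time /usr/local/etc/amanda/tools/gtar --create --file /dev/null \
      --directory /export/herbarium-test --one-file-system .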



On 4/3/13 1:14 PM, Brian Cuttler wrote:

Chris,

for larger file systems I've moved to server estimate, less
accurate but takes the entire estimate phase out of the equation.

We have had a lot of success with pigz rather than regular
gzip, as it'll take advantage of the multiple CPUs and give
parallelization during compression, which is often our bottleneck
during actual dumping. In one system I cut DLE dump time from
13 to 8 hours, a huge savings (I think those were the numbers,
I can look them up...).

ZFS will allow unlimited capacity, and enough files per directory
to choke access. We have backups that run very badly here, with
literally several hundred thousand files PER directory, and
multiple such directories.

For backups themselves, I do use snapshots where I can on my
ZFS file systems.

On Wed, Apr 03, 2013 at 11:26:01AM -0400, Chris Hoogendyk wrote:

This seems like an obvious "read the FAQ" situation, but . . .

I'm running Amanda 3.3.2 on a Sun T5220 with Solaris 10 and a J4500 jbod
disk array with multipath SAS. It all should be fast and is on the local
server, so there isn't any network path outside localhost for the DLE's
that are giving me trouble. They are zfs on raidz1 with five 2TB drives.
Gnutar is v1.23. This server is successfully backing up several other
servers as well as many more DLE's on the localhost. Output to an AIT5 tape
library.

I've upped the etimeout to 1800 and the dtimeout to 3600, which both seem
outrageously long (jumped from the default 5 minutes to 30 minutes, and
from the default 30 minutes to an hour).

The filesystem (DLE) that is giving me trouble (hasn't backed up in a
couple of weeks) is /export/herbarium, which looks like:

marlin:/export/herbarium# df -k .
Filesystem             kbytes      used      avail capacity  Mounted on
J4500-pool1/herbarium
                   2040109465 262907572 1777201893    13%    /export/herbarium
marlin:/export/herbarium# find . -type f | wc -l
 2806
marlin:/export/herbarium# find . -type d | wc -l
  140
marlin:/export/herbarium#


So, it is only 262G and only has 2806 files. Shouldn't be that big a deal.
They are typically tif scans.

One thought that hits me is: possibly, because it is over 200G of tif
scans, compression is causing trouble? But this is just getting estimates,
output going to /dev/null.

Here is a segment from the very end of the sendsize debug file from April 1
(the debug file ends after these lines):

Mon Apr  1 08:05:49 2013: thd-32a58: sendsize: .
Mon Apr  1 08:05:49 2013: thd-32a58: sendsize: estimate time for
/export/herbarium level 0: 26302.500
Mon Apr  1 08:05:49 2013: thd-32a58: sendsize: estimate size for
/export/herbarium level 0: 262993150 KB
Mon Apr  1 08:05:49 2013: thd-32a58: sendsize: waiting for runtar
/export/herbarium child
Mon Apr  1 08:05:49 2013: thd-32a58: sendsize: after runtar
/export/herbarium wait
Mon Apr  1 08:05:49 2013: thd-32a58: sendsize: getting size via gnutar for
/export/herbarium level 1
Mon Apr  1 08:05:49 2013: thd-32a58: sendsize: Spawning
/usr/local/libexec/amanda/runtar runtar daily
/usr/local/etc/amanda/tools/gtar --create --file /dev/null --numeric-owner
--directory /export/herbarium --one-file-system --listed-incremental
/usr/local/var/amanda/gnutar-lists/localhost_export_herbarium_1.new
--sparse --ignore-failed-read --totals . in pipeline
Mon Apr  1 10:16:17 2013: thd-32a58: sendsize: Total bytes written:
77663795200 (73GiB, 9.5MiB/s)
Mon Apr  1 10:16:17 2013: thd-32a58: sendsize: .
Mon Apr  1 10:16:17 2013: thd-32a58: sendsize: estimate time for
/export/herbarium level 1: 7827.571
Mon Apr  1 10:16:17 2013: thd-32a58: sendsize: estimate size for
/export/herbarium level 1: 75843550 KB
Mon Apr  1 

Re: all estimate timed out

2008-06-13 Thread Marc Muehlfeld

John Heim wrote:
  marvin  /var   lev 0  FAILED [disk /var, all estimate timed out]
  marvin  /etc   lev 0  FAILED [disk /etc, all estimate timed out]
  marvin  /backup/ulam/current/mail  lev 0  FAILED [disk /backup/ulam/current/mail, all estimate timed out]

 planner: ERROR Request to marvin failed: timeout waiting for REP


Have you checked the amanda.conf etimeout parameter? Maybe try increasing it.
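
For reference, that is a single line in amanda.conf; the default is 300 seconds and the value here is only illustrative:

  etimeout 900    # estimate timeout, in seconds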




--
Marc Muehlfeld (Head of IT)
Zentrum fuer Humangenetik und Laboratoriumsmedizin Dr. Klein und Dr. Rost
Lochhamer Str. 29 - D-82152 Martinsried
Phone: +49(0)89/895578-0 - Fax: +49(0)89/895578-78
http://www.medizinische-genetik.de