Re: all estimate timed out
On Thu, Apr 04, 2013 at 17:48:46 -0400, Chris Hoogendyk wrote:
> So, I created a script working off that and adding verbose:
>
>     #!/bin/ksh
>
>     OPTIONS=" --create --file /dev/null --numeric-owner --directory /export/herbarium --one-file-system --listed-incremental";
>     OPTIONS="${OPTIONS} /usr/local/var/amanda/gnutar-lists/localhost_export_herbarium_1.new --sparse --ignore-failed-read --totals --verbose .";
>
>     COMMAND="/usr/local/libexec/amanda/runtar runtar daily /usr/local/etc/amanda/tools/gtar ${OPTIONS}";
>     #COMMAND="/usr/sfw/bin/gtar ${OPTIONS}";
>
>     exec ${COMMAND};
>
> If I run that as user amanda, I get:
>
>     runtar: Can only be used to create tar archives

(Personally I'd do my initial investigation using gtar directly, but I see that runtar prints that error message when it finds that argv[3] isn't "--create", and also that it expects argv[1] to be the config name. So I think it would work if you just left out the standalone "runtar " from the command:

    COMMAND="/usr/local/libexec/amanda/runtar daily /usr/local/etc/amanda/tools/gtar ${OPTIONS}"

)

Nathan

Nathan Stratton Treadway - natha...@ontko.com - Mid-Atlantic region
Ray Ontko & Co. - Software consulting services - http://www.ontko.com/
GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239
Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
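Nathan's fix can be sketched as a short sh snippet. The paths and options are the ones from Chris's script; the snippet only inspects how the shell splits the command into arguments (runtar itself is never executed here):

```shell
#!/bin/sh
# Sketch of Nathan's suggested fix: runtar expects argv[1] to be the
# Amanda config name and argv[3] to be "--create", so the extra
# standalone "runtar" word must not appear in the argument list.
OPTIONS="--create --file /dev/null --numeric-owner --directory /export/herbarium --one-file-system"

# Broken: argv[1] is "runtar" and argv[3] is "daily", which triggers
# "runtar: Can only be used to create tar archives".
BROKEN="/usr/local/libexec/amanda/runtar runtar daily /usr/local/etc/amanda/tools/gtar ${OPTIONS}"

# Fixed: argv[1] is the config name "daily", argv[3] is "--create".
FIXED="/usr/local/libexec/amanda/runtar daily /usr/local/etc/amanda/tools/gtar ${OPTIONS}"

# Show the split as runtar would see it, instead of exec'ing it.
set -- ${FIXED}
echo "argv[1]=$2 argv[3]=$4"
```

Running this prints `argv[1]=daily argv[3]=--create`, matching the layout runtar checks for.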
Re: all estimate timed out
On Thu, Apr 04, 2013 at 17:48:46 -0400, Chris Hoogendyk wrote:
> If I exchange the two commands so that I'm using gtar directly rather
> than runtar, then I get:
>
>     /usr/sfw/bin/gtar: Cowardly refusing to create an empty archive
>     Try `/usr/sfw/bin/gtar --help' or `/usr/sfw/bin/gtar --usage' for more information.

I can't see why this is happening offhand, but generally that means that either the trailing "." is missing from the command that was actually executed, or that argument is getting "eaten" by some other option. You might try printing out ${COMMAND} immediately before running it, just to make sure nothing obvious is missing that way.

(Also, any particular reason you are using "exec" here? I don't know why it would be eating the "." under ksh, but you might try without it and see if the problem goes away.)

Worst case, try adding the name of a file found in your /export/herbarium directory after the "." and see if that at least allows gtar to run.

Nathan
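Nathan's "print ${COMMAND} first" suggestion might look like the sketch below. The gtar path is the one from the thread and is not actually run; the snippet only prints each argument on its own line so a missing trailing "." is immediately visible:

```shell
#!/bin/sh
# Debugging sketch: show exactly what arguments the shell will pass,
# one per line, before exec'ing anything. A missing trailing "." (the
# directory gtar is told to archive) shows up immediately, since
# without it gtar says "Cowardly refusing to create an empty archive".
COMMAND="/usr/sfw/bin/gtar --create --file /dev/null --totals --verbose ."

printf '%s\n' ${COMMAND}    # one argument per line, after word splitting

# Verify that the last word really is "."
set -- ${COMMAND}
eval "LAST=\${$#}"
if [ "$LAST" = "." ]; then
    echo "trailing . present"
else
    echo "trailing . MISSING"
fi
```

The `set -- ${COMMAND}` line reproduces the same field splitting that `exec ${COMMAND}` performs, so whatever it shows is what gtar would actually receive.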
Re: all estimate timed out
I may just quietly go nuts. I'm trying to run the command directly. In the debug file, one example is:

    Mon Apr 1 08:05:49 2013: thd-32a58: sendsize: Spawning "/usr/local/libexec/amanda/runtar runtar daily /usr/local/etc/amanda/tools/gtar --create --file /dev/null --numeric-owner --directory /export/herbarium --one-file-system --listed-incremental /usr/local/var/amanda/gnutar-lists/localhost_export_herbarium_1.new --sparse --ignore-failed-read --totals ." in pipeline

So, I created a script working off that and adding verbose:

    #!/bin/ksh

    OPTIONS=" --create --file /dev/null --numeric-owner --directory /export/herbarium --one-file-system --listed-incremental";
    OPTIONS="${OPTIONS} /usr/local/var/amanda/gnutar-lists/localhost_export_herbarium_1.new --sparse --ignore-failed-read --totals --verbose .";

    COMMAND="/usr/local/libexec/amanda/runtar runtar daily /usr/local/etc/amanda/tools/gtar ${OPTIONS}";
    #COMMAND="/usr/sfw/bin/gtar ${OPTIONS}";

    exec ${COMMAND};

If I run that as user amanda, I get:

    runtar: Can only be used to create tar archives

If I exchange the two commands so that I'm using gtar directly rather than runtar, then I get:

    /usr/sfw/bin/gtar: Cowardly refusing to create an empty archive
    Try `/usr/sfw/bin/gtar --help' or `/usr/sfw/bin/gtar --usage' for more information.

On 4/4/13 1:22 PM, Brian Cuttler wrote:
> Reply using thunderbird rather than mutt.
>
> Any way to vet the zfs file system? Make sure it's sane and doesn't contain some kind of a bad link causing a loop?
>
> If you were to run the command used by estimate, which I believe displays in the debug file, can you run that successfully on the command line? If you run it verbose, can you see where it hangs or where it slows down?
>
> On 4/4/2013 12:34 PM, Chris Hoogendyk wrote:
>> Still getting blank emails on a test reply (just to myself) to Brian's emails. So, I'm replying to my own email to the list and then pasting in the reply to Brian. It's clearly a weirdness in the headers coming from Brian, but it could also be some misbehavior in response to those by my mail client -- Thunderbird 17.0.5.
>>
>> I changed the dump type to not use compression. If tif files are not going to compress anyway, then I might as well not even ask Amanda to try. However, it never gets to the dump, because it gets "all estimate timed out." I will try breaking it into multiple DLE's and also changing it to "server estimate". But, until I know what is really causing the problem, I'm not optimistic about the possibility of a successful dump. As I said, everything else runs without trouble, including DLE's that are different zfs filesystems on the same zpool.
>>
>> On 4/4/13 9:39 AM, Brian Cuttler wrote:
>>> Chris,
>>>
>>> sorry for the email trouble; this is a new phenomenon and I don't know what is causing it. If you can identify the bad header, please let me know. We updated our mailhost a few months ago, but my MUA (mutt) has not changed, nor has my editor (emacs).
>>>
>>> My "large" directories are exceptions, even here, and I am educating the users to do things differently. However, I do have lots of files on zfs in general...
>>>
>>> I don't believe that gzip is used in the estimate phase; I think it produces "raw" dump sizes for dump scheduling, and tape allocation is left for later in the process. If gzip is used you should see it in # ps, or top (or prstat). You could always start a dump after disabling estimate and see if that phase runs any better. Since you can be sure of finishing the estimate phase by checking # amstatus, you can always abort the dump if you don't want a non-compressed backup. (Jean-Louis will know off-hand.)
>>>
>>> How does the dump phase perform?
>>>
>>> On Wed, Apr 03, 2013 at 05:42:12PM -0400, Chris Hoogendyk wrote:
>>>> For some reason, the headers in the particular message from the list (from Brian) are causing my mail client or something to completely strip the message so that it is blank when I reply. That is, I compose a message, it looks good, and I send it. But then I get a blank bcc, Brian gets a blank message, and the list gets a blank message. Weird. So I'm replying to Christoph Scheeder's message and pasting in the contents for replying to Brian. That will put the list thread somewhat out of order, but better than completely disconnecting from the thread. Here goes (for the third time):
>>>>
>>>> ---
>>>>
>>>> So, Brian, this is the puzzle. Your file systems have a reason for being difficult. They have "several hundred thousand files PER directory."
>>>>
>>>> The filesystem that is causing me trouble, as I indicated, only has 2806 total files and 140 total directories. That's basically nothing.
>>>>
>>>> So, is this gzip choking on tif files? Is gzip even involved when sending estimates? If I remove compression, will it fix this? I could break it up into multiple DLE's, but Amanda will still need estimates of all the pieces.
>>>>
>>>> Or is it something entirely different? And, if so, how should I go about looking for it?
>>>>
>>>> On 4/3/13 1:
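Brian's "vet the zfs file system" suggestion could be tried with a small sketch like this. The file and directory counts should match what Chris already reported (2806 and 140), and any symlink pointing back inside the tree is a candidate for the loop Brian asks about. The path is the DLE from the thread; pass a different directory to try it elsewhere:

```shell
#!/bin/sh
# Sketch: walk the tree the way tar would and sanity-check it.
# DIR defaults to the DLE discussed in the thread.
DIR=${1:-/export/herbarium}

# Count files and directories (Chris reported 2806 and 140).
echo "files: $(find "$DIR" -type f 2>/dev/null | wc -l)"
echo "dirs:  $(find "$DIR" -type d 2>/dev/null | wc -l)"

# List symlinks; any that resolve back inside $DIR could cause a
# traversal loop for tools that follow links.
find "$DIR" -type l -exec ls -l {} \; 2>/dev/null
```

Timing the two `find` runs (e.g. with `time`) would also show whether plain traversal of the filesystem is slow, independent of gtar and Amanda.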
Re: all estimate timed out
Reply using thunderbird rather than mutt.

Any way to vet the zfs file system? Make sure it's sane and doesn't contain some kind of a bad link causing a loop?

If you were to run the command used by estimate, which I believe displays in the debug file, can you run that successfully on the command line? If you run it verbose, can you see where it hangs or where it slows down?

On 4/4/2013 12:34 PM, Chris Hoogendyk wrote:
> Still getting blank emails on a test reply (just to myself) to Brian's emails. So, I'm replying to my own email to the list and then pasting in the reply to Brian. It's clearly a weirdness in the headers coming from Brian, but it could also be some misbehavior in response to those by my mail client -- Thunderbird 17.0.5.
>
> I changed the dump type to not use compression. If tif files are not going to compress anyway, then I might as well not even ask Amanda to try. However, it never gets to the dump, because it gets "all estimate timed out." I will try breaking it into multiple DLE's and also changing it to "server estimate". But, until I know what is really causing the problem, I'm not optimistic about the possibility of a successful dump. As I said, everything else runs without trouble, including DLE's that are different zfs filesystems on the same zpool.
>
> On 4/4/13 9:39 AM, Brian Cuttler wrote:
>> Chris,
>>
>> sorry for the email trouble; this is a new phenomenon and I don't know what is causing it. If you can identify the bad header, please let me know. We updated our mailhost a few months ago, but my MUA (mutt) has not changed, nor has my editor (emacs).
>>
>> My "large" directories are exceptions, even here, and I am educating the users to do things differently. However, I do have lots of files on zfs in general...
>>
>> I don't believe that gzip is used in the estimate phase; I think it produces "raw" dump sizes for dump scheduling, and tape allocation is left for later in the process. If gzip is used you should see it in # ps, or top (or prstat). You could always start a dump after disabling estimate and see if that phase runs any better. Since you can be sure of finishing the estimate phase by checking # amstatus, you can always abort the dump if you don't want a non-compressed backup. (Jean-Louis will know off-hand.)
>>
>> How does the dump phase perform?
>>
>> On Wed, Apr 03, 2013 at 05:42:12PM -0400, Chris Hoogendyk wrote:
>>> For some reason, the headers in the particular message from the list (from Brian) are causing my mail client or something to completely strip the message so that it is blank when I reply. That is, I compose a message, it looks good, and I send it. But then I get a blank bcc, Brian gets a blank message, and the list gets a blank message. Weird. So I'm replying to Christoph Scheeder's message and pasting in the contents for replying to Brian. That will put the list thread somewhat out of order, but better than completely disconnecting from the thread. Here goes (for the third time):
>>>
>>> ---
>>>
>>> So, Brian, this is the puzzle. Your file systems have a reason for being difficult. They have "several hundred thousand files PER directory."
>>>
>>> The filesystem that is causing me trouble, as I indicated, only has 2806 total files and 140 total directories. That's basically nothing.
>>>
>>> So, is this gzip choking on tif files? Is gzip even involved when sending estimates? If I remove compression, will it fix this? I could break it up into multiple DLE's, but Amanda will still need estimates of all the pieces.
>>>
>>> Or is it something entirely different? And, if so, how should I go about looking for it?
>>>
>>> On 4/3/13 1:14 PM, Brian Cuttler wrote:
>>>> Chris,
>>>>
>>>> for larger file systems I've moved to "server estimate", less accurate but takes the entire estimate phase out of the equation.
>>>>
>>>> We have had a lot of success with pigz rather than regular gzip, as it'll take advantage of the multiple CPUs and give parallelization during compression, which is often our bottleneck during actual dumping. In one system I cut DLE dump time from 13 to 8 hours, a huge savings (I think those were the numbers; I can look them up...).
>>>>
>>>> ZFS will allow unlimited capacity, and enough files per directory to choke access; we have backups that run very badly here, with literally several hundred thousand files PER directory, and multiple such directories.
>>>>
>>>> For backups themselves, I do use snapshots where I can on my ZFS file systems.
>>>>
>>>> On Wed, Apr 03, 2013 at 11:26:01AM -0400, Chris Hoogendyk wrote:
>>>>> This seems like an obvious "read the FAQ" situation, but . . .
>>>>>
>>>>> I'm running Amanda 3.3.2 on a Sun T5220 with Solaris 10 and a J4500 "jbod" disk array with multipath SAS. It all should be fast and is on the local server, so there isn't any network path outside localhost for the DLE's that are giving me trouble. They are zfs on raidz1 with five 2TB drives. Gnutar is v1.23. This server is successfully backing up several other servers as well as many more DLE's on the localhost. Output to an AIT5
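Brian's pigz suggestion amounts to swapping the compressor at the end of the tar pipeline, since pigz is a drop-in parallel replacement for gzip. A minimal sketch, with an assumed fallback to plain gzip and placeholder paths (neither is from the thread):

```shell
#!/bin/sh
# Sketch: compress a tar stream with pigz when available, falling back
# to gzip. pigz spreads compression across all CPUs, which is the
# dump-phase bottleneck Brian describes.
if command -v pigz >/dev/null 2>&1; then
    ZIP=pigz    # parallel gzip
else
    ZIP=gzip    # fallback when pigz isn't installed
fi

# DIR and OUT are placeholders: a throwaway demo directory is created
# unless a real path is passed as the first argument.
if [ -n "$1" ]; then
    DIR=$1
else
    DIR=$(mktemp -d)
    : > "$DIR/demo.txt"
fi
OUT=${2:-$DIR.tgz}

tar -cf - -C "$DIR" . | "$ZIP" > "$OUT"
echo "wrote $OUT with $ZIP"
```

Because pigz emits standard gzip streams, archives made this way can still be unpacked with ordinary `gzip -d` or `tar -xzf` anywhere.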
Re: all estimate timed out
Still getting blank emails on a test reply (just to myself) to Brian's emails. So, I'm replying to my own email to the list and then pasting in the reply to Brian. It's clearly a weirdness in the headers coming from Brian, but it could also be some misbehavior in response to those by my mail client -- Thunderbird 17.0.5.

I changed the dump type to not use compression. If tif files are not going to compress anyway, then I might as well not even ask Amanda to try. However, it never gets to the dump, because it gets "all estimate timed out." I will try breaking it into multiple DLE's and also changing it to "server estimate". But, until I know what is really causing the problem, I'm not optimistic about the possibility of a successful dump. As I said, everything else runs without trouble, including DLE's that are different zfs filesystems on the same zpool.

On 4/4/13 9:39 AM, Brian Cuttler wrote:
> Chris,
>
> sorry for the email trouble; this is a new phenomenon and I don't know what is causing it. If you can identify the bad header, please let me know. We updated our mailhost a few months ago, but my MUA (mutt) has not changed, nor has my editor (emacs).
>
> My "large" directories are exceptions, even here, and I am educating the users to do things differently. However, I do have lots of files on zfs in general...
>
> I don't believe that gzip is used in the estimate phase; I think it produces "raw" dump sizes for dump scheduling, and tape allocation is left for later in the process. If gzip is used you should see it in # ps, or top (or prstat). You could always start a dump after disabling estimate and see if that phase runs any better. Since you can be sure of finishing the estimate phase by checking # amstatus, you can always abort the dump if you don't want a non-compressed backup. (Jean-Louis will know off-hand.)
>
> How does the dump phase perform?
>
> On Wed, Apr 03, 2013 at 05:42:12PM -0400, Chris Hoogendyk wrote:
>> For some reason, the headers in the particular message from the list (from Brian) are causing my mail client or something to completely strip the message so that it is blank when I reply. That is, I compose a message, it looks good, and I send it. But then I get a blank bcc, Brian gets a blank message, and the list gets a blank message. Weird. So I'm replying to Christoph Scheeder's message and pasting in the contents for replying to Brian. That will put the list thread somewhat out of order, but better than completely disconnecting from the thread. Here goes (for the third time):
>>
>> ---
>>
>> So, Brian, this is the puzzle. Your file systems have a reason for being difficult. They have "several hundred thousand files PER directory."
>>
>> The filesystem that is causing me trouble, as I indicated, only has 2806 total files and 140 total directories. That's basically nothing.
>>
>> So, is this gzip choking on tif files? Is gzip even involved when sending estimates? If I remove compression, will it fix this? I could break it up into multiple DLE's, but Amanda will still need estimates of all the pieces.
>>
>> Or is it something entirely different? And, if so, how should I go about looking for it?
>>
>> On 4/3/13 1:14 PM, Brian Cuttler wrote:
>>> Chris,
>>>
>>> for larger file systems I've moved to "server estimate", less accurate but takes the entire estimate phase out of the equation.
>>>
>>> We have had a lot of success with pigz rather than regular gzip, as it'll take advantage of the multiple CPUs and give parallelization during compression, which is often our bottleneck during actual dumping. In one system I cut DLE dump time from 13 to 8 hours, a huge savings (I think those were the numbers; I can look them up...).
>>>
>>> ZFS will allow unlimited capacity, and enough files per directory to choke access; we have backups that run very badly here, with literally several hundred thousand files PER directory, and multiple such directories.
>>>
>>> For backups themselves, I do use snapshots where I can on my ZFS file systems.
>>>
>>> On Wed, Apr 03, 2013 at 11:26:01AM -0400, Chris Hoogendyk wrote:
>>>> This seems like an obvious "read the FAQ" situation, but . . .
>>>>
>>>> I'm running Amanda 3.3.2 on a Sun T5220 with Solaris 10 and a J4500 "jbod" disk array with multipath SAS. It all should be fast and is on the local server, so there isn't any network path outside localhost for the DLE's that are giving me trouble. They are zfs on raidz1 with five 2TB drives. Gnutar is v1.23. This server is successfully backing up several other servers as well as many more DLE's on the localhost. Output to an AIT5 tape library.
>>>>
>>>> I've upped the etimeout to 1800 and the dtimeout to 3600, which both seem outrageously long (jumped from the default 5 minutes to 30 minutes, and from the default 30 minutes to an hour).
>>>>
>>>> The filesystem (DLE) that is giving me trouble (hasn't backed up in a couple of weeks) is /export/herbarium, which looks like:
>>>>
>>>>     marlin:/export/herbarium# df -k .
>>>>     Filesystem             kbytes      used       avail       capacity  Mounted on
>>>>     J4500-pool1/h
Re: all estimate timed out
Chris,

sorry for the email trouble; this is a new phenomenon and I don't know what is causing it. If you can identify the bad header, please let me know. We updated our mailhost a few months ago, but my MUA (mutt) has not changed, nor has my editor (emacs).

My "large" directories are exceptions, even here, and I am educating the users to do things differently. However, I do have lots of files on zfs in general...

I don't believe that gzip is used in the estimate phase; I think it produces "raw" dump sizes for dump scheduling, and tape allocation is left for later in the process. If gzip is used you should see it in # ps, or top (or prstat). You could always start a dump after disabling estimate and see if that phase runs any better. Since you can be sure of finishing the estimate phase by checking # amstatus, you can always abort the dump if you don't want a non-compressed backup. (Jean-Louis will know off-hand.)

How does the dump phase perform?

On Wed, Apr 03, 2013 at 05:42:12PM -0400, Chris Hoogendyk wrote:
> For some reason, the headers in the particular message from the list (from Brian) are causing my mail client or something to completely strip the message so that it is blank when I reply. That is, I compose a message, it looks good, and I send it. But then I get a blank bcc, Brian gets a blank message, and the list gets a blank message. Weird. So I'm replying to Christoph Scheeder's message and pasting in the contents for replying to Brian. That will put the list thread somewhat out of order, but better than completely disconnecting from the thread. Here goes (for the third time):
>
> ---
>
> So, Brian, this is the puzzle. Your file systems have a reason for being difficult. They have "several hundred thousand files PER directory."
>
> The filesystem that is causing me trouble, as I indicated, only has 2806 total files and 140 total directories. That's basically nothing.
>
> So, is this gzip choking on tif files? Is gzip even involved when sending estimates? If I remove compression, will it fix this? I could break it up into multiple DLE's, but Amanda will still need estimates of all the pieces.
>
> Or is it something entirely different? And, if so, how should I go about looking for it?
>
> On 4/3/13 1:14 PM, Brian Cuttler wrote:
>> Chris,
>>
>> for larger file systems I've moved to "server estimate", less accurate but takes the entire estimate phase out of the equation.
>>
>> We have had a lot of success with pigz rather than regular gzip, as it'll take advantage of the multiple CPUs and give parallelization during compression, which is often our bottleneck during actual dumping. In one system I cut DLE dump time from 13 to 8 hours, a huge savings (I think those were the numbers; I can look them up...).
>>
>> ZFS will allow unlimited capacity, and enough files per directory to choke access; we have backups that run very badly here, with literally several hundred thousand files PER directory, and multiple such directories.
>>
>> For backups themselves, I do use snapshots where I can on my ZFS file systems.
>>
>> On Wed, Apr 03, 2013 at 11:26:01AM -0400, Chris Hoogendyk wrote:
>>> This seems like an obvious "read the FAQ" situation, but . . .
>>>
>>> I'm running Amanda 3.3.2 on a Sun T5220 with Solaris 10 and a J4500 "jbod" disk array with multipath SAS. It all should be fast and is on the local server, so there isn't any network path outside localhost for the DLE's that are giving me trouble. They are zfs on raidz1 with five 2TB drives. Gnutar is v1.23. This server is successfully backing up several other servers as well as many more DLE's on the localhost. Output to an AIT5 tape library.
>>>
>>> I've upped the etimeout to 1800 and the dtimeout to 3600, which both seem outrageously long (jumped from the default 5 minutes to 30 minutes, and from the default 30 minutes to an hour).
>>>
>>> The filesystem (DLE) that is giving me trouble (hasn't backed up in a couple of weeks) is /export/herbarium, which looks like:
>>>
>>>     marlin:/export/herbarium# df -k .
>>>     Filesystem             kbytes      used       avail       capacity  Mounted on
>>>     J4500-pool1/herbarium  2040109465  262907572  1777201893  13%       /export/herbarium
>>>     marlin:/export/herbarium# find . -type f | wc -l
>>>     2806
>>>     marlin:/export/herbarium# find . -type d | wc -l
>>>     140
>>>     marlin:/export/herbarium#
>>>
>>> So, it is only 262G and only has 2806 files. Shouldn't be that big a deal. They are typically tif scans.
>>>
>>> One thought that hits me is: possibly, because it is over 200G of tif scans, compression is causing trouble? But this is just getting estimates, output going to /dev/null.
>>>
>>> Here is a segment from the very end of the sendsize debug file from April 1
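The changes discussed in the thread (raised timeouts, no compression, server-side estimates) would look roughly like this in amanda.conf. The dumptype name is invented for illustration; the timeout values are the ones Chris mentions, and `estimate server` is the setting Brian recommends for skipping the client-side estimate:

```
# amanda.conf -- global timeouts Chris says he raised
etimeout 1800      # estimate timeout, seconds (up from the 300 default)
dtimeout 3600      # data timeout, seconds (up from the 1800 default)

# Hypothetical dumptype for the /export/herbarium DLE
define dumptype herbarium-nocomp {
    program "GNUTAR"
    estimate server    # skip the client-side size estimate entirely
    compress none      # tif scans won't compress anyway
}
```

With `estimate server`, Amanda schedules the dump from historical sizes instead of running gtar on the client, which takes the hanging estimate phase out of the picture at the cost of less accurate scheduling.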