Re: My 3.1.2 server starts ok, but taper never writes anything
On Mon, Sep 20, 2010 at 3:21 PM, Dustin J. Mitchell wrote: > It's impractical to run this check in ./configure, but what do you > think about running this check even on 'make all'? I committed this - on FreeBSD, this test will occur on 'make install'. > Do you mind adding a FreeBSD problem report, if there's not one > already? Can you give me a pointer so I can add myself as cc? This would still be great.. Dustin -- Open Source Storage Engineer http://www.zmanda.com
Re: My 3.1.2 server starts ok, but taper never writes anything
On Sun, Sep 19, 2010 at 4:48 PM, Stephen Corbesero wrote: > That did it Or may have done it. Sounds like this has worked out. I had added a test to Amanda, which runs in 'make check' that should have detected this: 536 thread-check: libTests.la 537 $(PERL) -I$(builddir) -I$(builddir)/.libs -I$(srcdir) \ 538 -MAmanda::Tests -e 'alarm(10); Amanda::Tests::try_threads' \ 539 || echo "Perl cannot run extensions which use threads; consider linking perl" \ 540 "with -pthreads or compiling perl with threading enabled" I suppose we should be more aggressive about running this test, at least on FreeBSD, and at least until this is fixed or worked around in the FreeBSD ports. It's impractical to run this check in ./configure, but what do you think about running this check even on 'make all'? Do you mind adding a FreeBSD problem report, if there's not one already? Can you give me a pointer so I can add myself as cc? Dustin -- Open Source Storage Engineer http://www.zmanda.com
Re: My 3.1.2 server starts ok, but taper never writes anything
On Sun, Sep 19, 2010 at 11:05 PM, John Hein wrote: > (Currently, I > don't think the amanda client executes threaded code, but that may > change in the future - or may have already in 3.1/3.2 code - Dustin?). There is now a good bit of Perl running on the client side, but none of it uses threads. We haven't foresworn threads on the client, though, so I would expect that to change eventually. At some point, there were a few folks thinking about branching an older version of Amanda like 2.5.2 to create a maintained "minimal client" that could be installed quite widely, even if it didn't support fancy features like the Application API. I'm not sure what happened to that particular project, but it could be a way around problems with increasing client-side requirements. Dustin -- Open Source Storage Engineer http://www.zmanda.com
Re: My 3.1.2 server starts ok, but taper never writes anything
On Sun, Sep 19, 2010 at 4:48 PM, Stephen Corbesero wrote: > I upgraded my FreeBSD sources to the latest 8.x stable and recompiled > the world (and kernel). I then compiled and installed Perl-5.10.1 for > my FreeBSD by hand, specifically enabling its threaded option. > Finally, I recompiled and reinstalled Amanda 3.1.2. I'm glad it worked! There's some info at http://wiki.zmanda.com/index.php/Installation/OS_Specific_Notes/Installing_Amanda_on_FreeBSD which apparently could use some additional information. Can you fill in what you know there, to help the next guy? Dustin -- Open Source Storage Engineer http://www.zmanda.com
Re: My 3.1.2 server starts ok, but taper never writes anything
That did it Or may have done it. I upgraded my FreeBSD sources to the latest 8.x stable and recompiled the world (and kernel). I then compiled and installed Perl-5.10.1 for my FreeBSD by hand, specifically enabling its threaded option. Finally, I recompiled and reinstalled Amanda 3.1.2. I made a single virtual tape disk and tested the new configuration. The taper process was accumulating run time in that configuration and the single file system was backed up and transferred to 'tape'. :) I re-enabled my virtual tape configuration and the chg-disk changer, and as I type this, I am flushing all the backup files that have been accumulating from my tests over the last week. That appears to be working as well. Thank you. On Sun, Sep 19, 2010 at 11:58:04AM -0500, Dustin J. Mitchell wrote: > On Sun, Sep 19, 2010 at 11:16 AM, Stephen Corbesero wrote: > > It still seems to hang right after the message about the 'Final > > linkage'. That is coming from the link_elements() subroutine in > > xfer-src/xfer.c. > > Which is basically where the threading starts.. > > > I am not sure what to do next to get more evidence. > > Have you checked that libc_r, glib, perl, and Amanda are all compiled > with the same threading library? John Hein dug up a thread from when > we tried to work this out back in March: > > http://lists.freebsd.org/pipermail/freebsd-perl/2010-March/002511.html > > the thread went un-answered at the time. The solution we found was to > build a threaded perl, using the same underlying threading library > that everything else is built against. If I recall, John ended up > doing a *lot* of custom builds for this. I'm not sure what the > "right" solution is here. > > Dustin > > -- > Open Source Storage Engineer > http://www.zmanda.com > -- Stephen CorbeseroIt's always darkest Bethlehem, PA 18015 before pitch black. corbes...@ptd.net
Re: My 3.1.2 server starts ok, but taper never writes anything
On Sun, Sep 19, 2010 at 11:16 AM, Stephen Corbesero wrote: > It still seems to hang right after the message about the 'Final > linkage'. That is coming from the link_elements() subroutine in > xfer-src/xfer.c. Which is basically where the threading starts.. > I am not sure what to do next to get more evidence. Have you checked that libc_r, glib, perl, and Amanda are all compiled with the same threading library? John Hein dug up a thread from when we tried to work this out back in March: http://lists.freebsd.org/pipermail/freebsd-perl/2010-March/002511.html the thread went un-answered at the time. The solution we found was to build a threaded perl, using the same underlying threading library that everything else is built against. If I recall, John ended up doing a *lot* of custom builds for this. I'm not sure what the "right" solution is here. Dustin -- Open Source Storage Engineer http://www.zmanda.com
Re: My 3.1.2 server starts ok, but taper never writes anything
The saga continues. I changed my debugging statements to the Amanda::Debug::debug method as your suggested, so my messages are the SGC:'s in the taper debug log file below. I also added some of these debug calls to the Amanda::Taper::Scribe routines, but I don't see much more information. It still seems to hang right after the message about the 'Final linkage'. That is coming from the link_elements() subroutine in xfer-src/xfer.c. I am not sure what to do next to get more evidence. Sep 19 11:49:19 2010: taper: pid 3545 ruid 601 euid 601 version 3.1.2: rename at Sun Sep 19 11:49:19 2010 Sun Sep 19 11:49:19 2010: taper: SGC: main::Controller::new() Sun Sep 19 11:49:19 2010: taper: SGC: main::Controller::start() Sun Sep 19 11:49:19 2010: taper: SGC: main::Controller::_assert_in_state(init) Sun Sep 19 11:49:19 2010: taper: Amanda::Changer::compat initialized with script /usr/local/libexec/amanda/chg-disk, temporary directory /usr/local/etc/amanda/daily Sun Sep 19 11:49:19 2010: taper: SGC: Amanda::Taper::Scribe::new() Sun Sep 19 11:49:19 2010: taper: SGC: Amanda::Taper::Scribe::Dev_Handling::new() Sun Sep 19 11:49:19 2010: taper: driver/taper << START-TAPER 20100919114917 Sun Sep 19 11:49:19 2010: taper: SGC: main::Controller::_assert_in_state(init) Sun Sep 19 11:49:19 2010: taper: SGC: Amanda::Taper::Scribe::start() Sun Sep 19 11:49:19 2010: taper: Amanda::Taper::Scribe: starting Sun Sep 19 11:49:19 2010: taper: SGC: Amanda::Taper::Scribe::Dev_Handling::start() Sun Sep 19 11:49:19 2010: taper: SGC: Amanda::Taper::Scribe::Dev_Handling::_start_scanning() Sun Sep 19 11:49:19 2010: taper: Amanda::Taper::Scan::traditional stage 1: search for oldest reusable volume Sun Sep 19 11:49:19 2010: taper: Amanda::Taper::Scan::traditional no oldest reusable volume Sun Sep 19 11:49:19 2010: taper: Amanda::Taper::Scan::traditional stage 2: scan for any reusable volume Sun Sep 19 11:49:19 2010: taper: Amanda::Changer::compat: invoking /usr/local/libexec/amanda/chg-disk with -info Sun Sep 19 11:49:21 2010: taper: Amanda::Changer::compat: Got response '1 12 1' with exit status 0 Sun Sep 19 11:49:21 2010: taper: Amanda::Changer::compat: invoking /usr/local/libexec/amanda/chg-disk with -slot current Sun Sep 19 11:49:22 2010: taper: Amanda::Changer::compat: Got response '1 file:/amanda/vtl/daily' with exit status 0 Sun Sep 19 11:49:22 2010: taper: Amanda::Taper::Scan::traditional result: 'DAILY-1' on file:/amanda/vtl/daily slot 1, mode 2 Sun Sep 19 11:49:22 2010: taper: SGC: Amanda::Taper::Scribe::Dev_Handling::_maybe_callback() Sun Sep 19 11:49:22 2010: taper: SGC: main::Controller::_scribe_started_cb() Sun Sep 19 11:49:22 2010: taper: driver/taper >> TAPER-OK Sun Sep 19 11:50:12 2010: taper: driver/taper << FILE-WRITE 00-2 /amanda/holding-b/20100919114917/galadriel.corbesero.net._.2 galadriel.corbesero.net / 2 20100919114917 0 130875 Sun Sep 19 11:50:12 2010: taper: SGC: main::Controller::_assert_in_state(idle) Sun Sep 19 11:50:12 2010: taper: SGC: main::Controller::setup_and_start_dump() Sun Sep 19 11:50:12 2010: taper: SGC: main::Controller::_assert_in_state(idle) Sun Sep 19 11:50:12 2010: taper: SGC: main::Controller::get_splitting_config() Sun Sep 19 11:50:12 2010: taper: warning: Use of uninitialized value $params{"data_path"} in string eq at /usr/local/libexec/amanda/taper line 606. Sun Sep 19 11:50:12 2010: taper: SGC: Amanda::Taper::Scribe::get_xfer_dest() Sun Sep 19 11:50:12 2010: taper: Amanda::Taper::Scribe: get_xfer_dest(split_method=none) Sun Sep 19 11:50:12 2010: taper: Amanda::Taper::Scribe setting up a transfer with split method none Sun Sep 19 11:50:12 2010: taper: SGC: Amanda::Taper::Scribe::get_device() Sun Sep 19 11:50:12 2010: taper: SGC: Amanda::Taper::Scribe::Dev_Handling::peek_device() -> Amanda::Device::Device=HASH(0x29f7c71c) Sun Sep 19 11:50:12 2010: taper: XDT thd-0x298ecf70: using slab_size 32768 and max_slabs 2 Sun Sep 19 11:50:12 2010: taper: Starting -> )> Sun Sep 19 11:50:12 2010: taper: Final linkage: -(PULL_BUFFER)-> -(PUSH_BUFFER)-> On Sat, Sep 18, 2010 at 04:43:53PM -0500, Dustin J. Mitchell wrote: > On Sat, Sep 18, 2010 at 4:17 PM, Stephen Corbesero wrote: > > Also, if it is worth anything, the state of the process is listed as > > 'umtxn' from top/ps. (This is a FreeBSD 8.1 system.) > > Huh, so that sounds like a kernel lock: > http://old.nabble.com/what-is-umtxn-td20860047.html > > Can you identify further the last few lines of perl executed? You > should use Amanda::Debug::debug instead, so that your logging will be > intermixed with the other information from taper_debug, in the taper > debug log. You mentioned that you enabled taper_debug, but never sent > along the debug log - can you do that now? > -- Stephen CorbeseroIt's always darkest Bethlehem, PA 18015 before pitch black. corbes...@ptd.net
Re: My 3.1.2 server starts ok, but taper never writes anything
On Sat, Sep 18, 2010 at 4:17 PM, Stephen Corbesero wrote: > Also, if it is worth anything, the state of the process is listed as > 'umtxn' from top/ps. (This is a FreeBSD 8.1 system.) Huh, so that sounds like a kernel lock: http://old.nabble.com/what-is-umtxn-td20860047.html Can you identify further the last few lines of perl executed? You should use Amanda::Debug::debug instead, so that your logging will be intermixed with the other information from taper_debug, in the taper debug log. You mentioned that you enabled taper_debug, but never sent along the debug log - can you do that now? Dustin -- Open Source Storage Engineer http://www.zmanda.com
Re: My 3.1.2 server starts ok, but taper never writes anything
Here it is. Again, this is the output from gdb after all the loading symbols messages. Also, if it is worth anything, the state of the process is listed as 'umtxn' from top/ps. (This is a FreeBSD 8.1 system.) [Switching to Thread 28469480 (LWP 100132)] 0x28af6225 in __error () from /lib/libthr.so.3 (gdb) bt #0 0x28af6225 in __error () from /lib/libthr.so.3 #1 0x28af603b in __error () from /lib/libthr.so.3 #2 0x28af8f60 in _thread_state_running () from /lib/libthr.so.3 #3 0x0011 in ?? () #4 0x in ?? () #5 0x in ?? () #6 0x in ?? () #7 0x0001 in ?? () #8 0xbfbfe3b8 in ?? () #9 0x28700548 in ?? () from /usr/local/lib/libglib-2.0.so.0 #10 0x28469480 in ?? () #11 0x08049c60 in ?? () #12 0xbfbfe2f8 in ?? () #13 0x28af0807 in pthread_mutex_getyieldloops_np () from /lib/libthr.so.3 #14 0x00018724 in ?? () #15 0x28087c00 in ?? () #16 0x00018724 in ?? () #17 0x28338000 in ?? () #18 0x28338200 in ?? () #19 0x28338400 in ?? () #20 0x28af7bf4 in ?? () from /lib/libthr.so.3 #21 0x in ?? () #22 0x28469480 in ?? () #23 0xbfbfe318 in ?? () #24 0x28af112f in pthread_mutex_trylock () from /lib/libthr.so.3 Previous frame inner to this frame (corrupt stack?) (gdb) On Sat, Sep 18, 2010 at 03:58:22PM -0500, Dustin J. Mitchell wrote: > On Sat, Sep 18, 2010 at 3:35 PM, Stephen Corbesero wrote: > > (gdb) info threads > > * 1 Thread 28469480 (LWP 100148) 0x28af6225 in __error () > > from /lib/libthr.so.3 > > Can you run the 'bt' command here to see what that thread is up to? > > (gdb) bt > > Dustin > > -- > Open Source Storage Engineer > http://www.zmanda.com > -- Stephen CorbeseroIt's always darkest Bethlehem, PA 18015 before pitch black. corbes...@ptd.net
Re: My 3.1.2 server starts ok, but taper never writes anything
On Sat, Sep 18, 2010 at 3:35 PM, Stephen Corbesero wrote: > (gdb) info threads > * 1 Thread 28469480 (LWP 100148) 0x28af6225 in __error () > from /lib/libthr.so.3 Can you run the 'bt' command here to see what that thread is up to? (gdb) bt Dustin -- Open Source Storage Engineer http://www.zmanda.com
Re: My 3.1.2 server starts ok, but taper never writes anything
I did the gdb thing, but it really didn't produce much info. I used the command line gdb /usr/bin/perl5 70805 and after all of the 'loaded symbols' messages, I generated the following output. [Switching to Thread 28469480 (LWP 100148)] 0x28af6225 in __error () from /lib/libthr.so.3 (gdb) info threads * 1 Thread 28469480 (LWP 100148) 0x28af6225 in __error () from /lib/libthr.so.3 I then added some simple debugging messages to the taper perl script. I just printed the subroutine's being called to STDERR so they would show up in the amdump log file. The following is the tail of that file from the point where the taper should transferring the file to my virtual tape. driver: send-cmd time 26.691 to taper: FILE-WRITE 00-2 /amanda/holding-b/20100918162652/galadriel.corbesero.net._.2 galadriel.corbesero.net / 2 20100918162652 0 847 SGC: main::Controller::_assert_in_state() SGC: main::Controller::setup_and_start_dump() SGC: main::Controller::_assert_in_state() SGC: main::Controller::get_splitting_config() driver: startaflush: FIRST galadriel.corbesero.net / 109 9216000 driver: state time 26.699 free kps: 8000 space: 16256915 taper: writing idle-dumpers: 4 qlen tapeq: 0 runq: 0 roomq: 0 wakeup: 0 driver-idle: no-dumpers driver: interface-state time 26.699 if default: free 8000 driver: hdisk-state time 26.699 hdisk 0: free 2129920 dumpers 0 hdisk 1: free 14126995 dumpers 0 As you can guess, the SGC: lines are my trace outputs. The end of this amdump is the spot where taper just sits What should I look into next? Should I add more debugging lines to the taper script and the other taper modules? On Mon, Sep 13, 2010 at 11:14:49PM -0500, Dustin J. Mitchell wrote: > On Mon, Sep 13, 2010 at 8:30 PM, Stephen Corbesero wrote: > > If I kill the taper process, everything shuts down "normally", but the > > email says there was a taper protocol error and nothing was written. > > > > I turned on taper_debug in my amanda.conf, but that didn't seem to > > produce any extra output. I have also snooped around various log > > files, but I don't see any clues. > > If you can run gdb or the equivalent on the "hung" taper process, and > send the backtrace, that would be helpful. > > You'll want to use the 'info threads' command in gdb to figure out > what threads are active, and then the 'thread' command to switch to > each one and the 'backtrace' command to generate a backtrace for it. > > Dustin > > -- > Open Source Storage Engineer > http://www.zmanda.com -- Stephen CorbeseroIt's always darkest Bethlehem, PA 18015 before pitch black. corbes...@ptd.net
Re: My 3.1.2 server starts ok, but taper never writes anything
On Mon, Sep 13, 2010 at 8:30 PM, Stephen Corbesero wrote: > If I kill the taper process, everything shuts down "normally", but the > email says there was a taper protocol error and nothing was written. > > I turned on taper_debug in my amanda.conf, but that didn't seem to > produce any extra output. I have also snooped around various log > files, but I don't see any clues. If you can run gdb or the equivalent on the "hung" taper process, and send the backtrace, that would be helpful. You'll want to use the 'info threads' command in gdb to figure out what threads are active, and then the 'thread' command to switch to each one and the 'backtrace' command to generate a backtrace for it. Dustin -- Open Source Storage Engineer http://www.zmanda.com
Re: My 3.1.2 server starts ok, but taper never writes anything
No.. I have only waited a few hours at a time, but it appears to just sit there. If I kill the taper process, everything shuts down "normally", but the email says there was a taper protocol error and nothing was written. I turned on taper_debug in my amanda.conf, but that didn't seem to produce any extra output. I have also snooped around various log files, but I don't see any clues. On Mon, Sep 13, 2010 at 07:38:35PM -0500, Dustin J. Mitchell wrote: > The warning is harmless, and has been fixed in trunk. > > On Mon, Sep 13, 2010 at 5:32 PM, Stephen Corbesero wrote: > > Sun Sep 12 10:29:05 2010: taper: Amanda::Taper::Scribe setting up a > > transfer with split method none > > Sun Sep 12 10:29:05 2010: taper: Starting > ( -> )> > > Sun Sep 12 10:29:05 2010: taper: Final linkage: > > -(PULL_BUFFER)-> > > -(PUSH_BUFFER)-> > > > > This is a FILE-WRITE - a write from holding disk - so there's no > connection to the dumper required. It looks like this was running > fine up until this point. Did it never terminate? If so, can you > tell what it was doing via gdb or strace? > > Dustin > > -- > Open Source Storage Engineer > http://www.zmanda.com -- Stephen CorbeseroIt's always darkest Bethlehem, PA 18015 before pitch black. corbes...@ptd.net
Re: My 3.1.2 server starts ok, but taper never writes anything
The warning is harmless, and has been fixed in trunk. On Mon, Sep 13, 2010 at 5:32 PM, Stephen Corbesero wrote: > Sun Sep 12 10:29:05 2010: taper: Amanda::Taper::Scribe setting up a transfer > with split method none > Sun Sep 12 10:29:05 2010: taper: Starting ( -> )> > Sun Sep 12 10:29:05 2010: taper: Final linkage: > -(PULL_BUFFER)-> > -(PUSH_BUFFER)-> This is a FILE-WRITE - a write from holding disk - so there's no connection to the dumper required. It looks like this was running fine up until this point. Did it never terminate? If so, can you tell what it was doing via gdb or strace? Dustin -- Open Source Storage Engineer http://www.zmanda.com
Re: My 3.1.2 server starts ok, but taper never writes anything
Yes, My guess would be the taper is just sitting there waiting for data. Here is the debug output. It is fairly short. The uninitialized value of data_path is what led me to believe that the send-cmd was bad. Sun Sep 12 10:28:44 2010: taper: pid 4465 ruid 601 euid 601 version 3.1.2: start at Sun Sep 12 10:28:44 2010 Sun Sep 12 10:28:44 2010: taper: pid 4465 ruid 601 euid 601 version 3.1.2: rename at Sun Sep 12 10:28:44 2010 Sun Sep 12 10:28:44 2010: taper: Amanda::Changer::compat initialized with script /usr/local/libexec/amanda/chg-disk, temporary directory /usr/local/etc/amanda/daily Sun Sep 12 10:28:44 2010: taper: Amanda::Taper::Scan::traditional stage 1: search for oldest reusable volume Sun Sep 12 10:28:44 2010: taper: Amanda::Taper::Scan::traditional no oldest reusable volume Sun Sep 12 10:28:44 2010: taper: Amanda::Taper::Scan::traditional stage 2: scan for any reusable volume Sun Sep 12 10:28:44 2010: taper: Amanda::Changer::compat: invoking /usr/local/libexec/amanda/chg-disk with -info Sun Sep 12 10:28:46 2010: taper: Amanda::Changer::compat: Got response '1 12 1' with exit status 0 Sun Sep 12 10:28:46 2010: taper: Amanda::Changer::compat: invoking /usr/local/libexec/amanda/chg-disk with -slot current Sun Sep 12 10:28:47 2010: taper: Amanda::Changer::compat: Got response '1 file:/amanda/vtl/daily' with exit status 0 Sun Sep 12 10:28:47 2010: taper: Amanda::Taper::Scan::traditional result: 'DAILY-1' on file:/amanda/vtl/daily slot 1, mode 2 Sun Sep 12 10:29:05 2010: taper: warning: Use of uninitialized value $params{"data_path"} in string eq at /usr/local/libexec/amanda/taper line 578. Sun Sep 12 10:29:05 2010: taper: Amanda::Taper::Scribe setting up a transfer with split method none Sun Sep 12 10:29:05 2010: taper: Starting -> )> Sun Sep 12 10:29:05 2010: taper: Final linkage: -(PULL_BUFFER)-> -(PUSH_BUFFER)-> On Mon, Sep 13, 2010 at 01:56:54PM -0500, Dustin J. Mitchell wrote: > On Sun, Sep 12, 2010 at 9:54 AM, Stephen Corbesero wrote: > > driver: send-cmd time 22.514 to taper: FILE-WRITE 00-2 > > /amanda/holding-b/20100912102842/galadriel.corbesero.net._.2 > > galadriel.corbesero.net / 2 20100912102842 0 192 > > It looks like the taper is running. The send-cmd component you quoted > is correct - it's the options for that host. > > What are you seeing that's got you thinking the taper never gets any > data? Can you take a look at the taper debug log and see what you > find? > > Dustin > > -- > Open Source Storage Engineer > http://www.zmanda.com -- Stephen CorbeseroIt's always darkest Bethlehem, PA 18015 before pitch black. corbes...@ptd.net
Re: My 3.1.2 server starts ok, but taper never writes anything
On Sun, Sep 12, 2010 at 9:54 AM, Stephen Corbesero wrote: > driver: send-cmd time 22.514 to taper: FILE-WRITE 00-2 > /amanda/holding-b/20100912102842/galadriel.corbesero.net._.2 > galadriel.corbesero.net / 2 20100912102842 0 192 It looks like the taper is running. The send-cmd component you quoted is correct - it's the options for that host. What are you seeing that's got you thinking the taper never gets any data? Can you take a look at the taper debug log and see what you find? Dustin -- Open Source Storage Engineer http://www.zmanda.com