Re: sm crashing on startup
Tomasz Sterna writes:

> On 03.01.2017, Tue at 23:35 -0500, Greg Troxel
> wrote:
>
>> Jabberd mostly works fine, but on boot sm crashes. I have adjusted
>> sequencing, although in theory it should not matter
>
> Does 48125019 [1] fix your issue?
>
> [1]
> https://github.com/jabberd2/jabberd2/commit/48125019452e291b2c57275c789f3d7df87d7146

I applied the patch and rebuilt with -g. I get the same behavior: on first booting the machine (which starts jabberd), sm crashes. If I then run sm again, it starts fine and runs reliably indefinitely.

I am running 2.4.0, with the sqlite3 backend, on NetBSD 6 i386, built from pkgsrc with gcc 4.5.

Reproduction recipe, which may or may not work for you: send sm a HUP; after that, log in.

From the logs, sm connected to the router and disconnected after 6 seconds. This is when the user below (the first to connect) succeeds in authenticating. This same user was the second to authenticate when I restarted sm seconds later.

The last lines of the sm log before the crash were:

Thu Jan 5 06:44:30 2017 [notice] module 'iq-vcard' added to chain 'user-delete' (order 9 index 6 seq 2)
Thu Jan 5 06:44:30 2017 [notice] module 'iq-version' added to chain 'disco-extend' (order 0 index 17 seq 1)
Thu Jan 5 06:44:30 2017 [notice] module 'help' added to chain 'disco-extend' (order 1 index 18 seq 1)
Thu Jan 5 06:44:30 2017 [notice] reopening log ...
Thu Jan 5 06:44:30 2017 [notice] log started

I did notice on startup:

Thu Jan 5 06:44:29 2017 [notice] module 'help' added to chain 'disco-extend' (order 1 index 18 seq 1)
Thu Jan 5 06:44:29 2017 [notice] version: jabberd sm 2.4.0
Thu Jan 5 06:44:29 2017 [notice] [example.com] configured
Thu Jan 5 06:44:29 2017 [notice] attempting connection to router at 127.0.0.1, port=5347
Thu Jan 5 06:44:29 2017 [notice] connection to router established
Thu Jan 5 06:44:29 2017 [notice] sm ready for sessions
Thu Jan 5 06:44:30 2017 [notice] HUP handled. reloading modules...
Thu Jan 5 06:44:30 2017 [notice] modules search path: /usr/pkg/lib/jabberd
Thu Jan 5 06:44:30 2017 [notice] module 'status' added to chain 'sess-start' (order 0 index 0 seq 0)
Thu Jan 5 06:44:30 2017 [notice] module 'status' added to chain 'sess-end' (order 0 index 0 seq 1)
Thu Jan 5 06:44:30 2017 [notice] module 'iq-last' added to chain 'sess-end' (order 1 index 1 seq 0)

The HUP is probably (wildly speculating) because the controlling tty of the init script was the console, sm didn't detach, and when the init scripts finished, the tty was revoked to clean it up for console login. But why the HUP happened is minor; the issue is the behavior when it happened. I don't see a HUP in the router or c2s logs. But now it makes sense why it crashes on boot and not later.

After most of an hour of sm running, I sent a HUP, and sm reloaded modules and stayed up. I logged out and back in, and on login it crashed. Same value "status" in c/h.

Here is the backtrace from the crash on boot, which is similar to the one I posted yesterday. (I have replaced the JID string; but the issue seems to be the first connection, not this particular user.)

#0 0x080635de in xhash_getx (h=0x74617473, key=0xbb7e334c "storage.path", len=12) at xhash.c:174
#1 0x0806364f in xhash_get (h=0x74617473, key=) at xhash.c:187
#2 0x0805c426 in config_get_one (c=0xbb102060, key=0xbb7e334c "storage.path", num=0) at config.c:280
#3 0xbb7e258b in storage_add_type (st=0xbb10b040, driver=0xbb119028 "sqlite", type=0xbb06cb6a "active") at storage.c:114
#4 0xbb7e2a44 in storage_get (st=0xbb10b040, type=0xbb06cb6a "active", owner=0xbb12fc80 "user@examplecom", filter=0x0, os=0xbf7fe258) at storage.c:239
#5 0xbb06ca23 in _active_user_load (mi=0xbb12f660, user=0xbb11b550) at mod_active.c:35

You can see that argument h to frames 0 and 1 is suspect (it should be a pointer). In frame 2, config_get_one has a config_t c which does contain that value in c->hash. But then I realized that c itself, viewed as (char *), is "status".
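To see why h=0x74617473 in frame #0 is really text, recall that i386 is little-endian: the least significant byte of a 32-bit word is stored first in memory. The small helper below (a hypothetical name, not part of jabberd) is a minimal sketch that prints the four bytes of a word in memory order, replacing non-printable bytes with '.'.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Write the four bytes of `word` in little-endian memory order
 * (least significant byte first, as on i386) into `out`, showing
 * non-printable bytes as '.'. `out` must hold at least 5 chars. */
void word_as_le_bytes(uint32_t word, char *out, size_t outlen)
{
    assert(out != NULL && outlen >= 5);
    for (int i = 0; i < 4; i++) {
        unsigned char b = (unsigned char)((word >> (8 * i)) & 0xffu);
        out[i] = isprint(b) ? (char)b : '.';
    }
    out[4] = '\0';
}
```

With the value from frame #0, `word_as_le_bytes(0x74617473, buf, sizeof buf)` yields "stat". Applied to the second field gdb prints for *c, nad = 0xbb007375, it yields "us.." ('u', 's', a NUL byte and one leftover byte), so the first bytes of the object spell "status" plus its terminator, which is exactly what `x/s c` reports.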
(gdb) print c
$1 = (config_t) 0xbb102060
(gdb) print *c
$2 = {hash = 0x74617473, nad = 0xbb007375}
(gdb) x/s c
0xbb102060: "status"

In frame 3, storage_t is OK:

(gdb) print st
$3 = (storage_t) 0xbb10b040
(gdb) print *st
$4 = {config = 0xbb102060, log = 0xbb102068, drivers = 0xbb119000, types = 0xbb11a000, default_drv = 0xbb11b040}
(gdb) print *st->log
$5 = {type = log_FILE, file = 0xbb3d94c0}
(gdb) print *st->drivers
$6 = {p = 0xbb103080, prime = 101, dirty = 1, count = 1, zen = 0xbb122800, free_list = 0x0, iter_bucket = -1, iter_node = 0x0, stat = 0x0}
(gdb) print *st->types
$7 = {p = 0xbb1030c0, prime = 101, dirty = 0, count = 0, zen = 0xbb123000, free_list = 0x0, iter_bucket = -1, iter_node = 0x0, stat = 0x0}

My next step would be to add some guard for config_t, and to turn on the existing guards. But I thought I would post what I have found out so far in case something jumps out at somebody. With no basis, I am suspicious of util/config.c.
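A "guard for config_t" could take the form of a magic-number field checked before each use. The following is a minimal sketch under that assumption: the struct `guarded_config`, the constants and the helper functions are hypothetical names invented for illustration, not jabberd's API; the two pointer fields only mirror the hash and nad members gdb shows for config_t. Because the bytes "status" landed at the very start of the object, a leading magic field would have been overwritten, so a check would fail explicitly instead of letting xhash_getx dereference 0x74617473.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical guard values; not part of jabberd. */
#define CFG_MAGIC      0xC0F16A11u
#define CFG_MAGIC_DEAD 0xDEADC0F1u

struct guarded_config {
    uint32_t magic;   /* first, so an overwrite of the object's start clobbers it */
    void    *hash;    /* mirrors config_t's hash field */
    void    *nad;     /* mirrors config_t's nad field */
};

struct guarded_config *guarded_config_new(void)
{
    struct guarded_config *c = calloc(1, sizeof *c);
    if (c != NULL)
        c->magic = CFG_MAGIC;
    return c;
}

/* Returns 1 if c looks like a live config object, 0 otherwise. */
int guarded_config_ok(const struct guarded_config *c)
{
    return c != NULL && c->magic == CFG_MAGIC;
}

void guarded_config_free(struct guarded_config *c)
{
    if (c == NULL)
        return;
    assert(guarded_config_ok(c));  /* catches double free or corruption */
    c->magic = CFG_MAGIC_DEAD;     /* poison before releasing the memory */
    free(c);
}
```

In use, a lookup such as config_get_one would start with something like `if (!guarded_config_ok(c)) abort();`, turning a wild-pointer crash deep in xhash into a failure at the first suspect access. The poison value only helps while freed memory has not been reused, and reading freed memory is itself undefined behavior, so this is a debugging aid rather than a guarantee.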
Re: sm crashing on startup
On 03.01.2017, Tue at 23:35 -0500, Greg Troxel wrote:

> Jabberd mostly works fine, but on boot sm
> crashes. I have adjusted sequencing, although in theory it should
> not matter

Does 48125019 [1] fix your issue?

[1] https://github.com/jabberd2/jabberd2/commit/48125019452e291b2c57275c789f3d7df87d7146

-- 
 /o__
 (_<^'  Good teaching is one-fourth preparation and three-fourths good theatre.
sm crashing on startup
I have a NetBSD 6 system with jabberd 2.4.0 (still built for NetBSD 5; although binary compatibility at this level is almost beyond suspicion, I am rebuilding all packages). Jabberd mostly works fine, but on boot sm crashes. I have adjusted sequencing, although in theory it should not matter (and, aside from sm, it does not):

Starting router.
Starting s2s.
Starting c2s.
Starting sm.
Starting muc.

(plus some other things interleaved; NetBSD does a topological sort of the startup files in /etc/rc.d/ and then uses that order.)

I found that sm was ~always not running after boot, and that starting sm was enough to make the server work. So I started logging exits due to core dumps and had the sm startup change to a directory where the jabberd pid could write a core file. I got the following backtrace; I know I need to rebuild jabberd with debug symbols. But I wonder if the issue is processing a message from the router before sm startup has completed.

(gdb) bt
#0 0x080612dc in ?? ()
#1 0x0806140a in xhash_getx ()
#2 0x0806145e in xhash_get ()
#3 0x0805a6a4 in config_get_one ()
#4 0xbb7e25ea in storage_add_type () from /usr/pkg/lib/jabberd/libstorage.so.0
#5 0xbb7e2a2a in storage_get () from /usr/pkg/lib/jabberd/libstorage.so.0
#6 0xbb0c9aa2 in _active_user_load () from /usr/pkg/lib/jabberd/mod_active.so
#7 0x0804f67b in mm_user_load ()
#8 0x08053325 in user_load ()
#9 0x080522c7 in sess_start ()
#10 0xbb0df2fa in _session_in_router () from /usr/pkg/lib/jabberd/mod_session.so
#11 0x0804f44b in mm_in_router ()
#12 0x0804e107 in dispatch ()
#13 0x08053021 in sm_sx_callback ()
#14 0x08054f21 in __sx_event ()
#15 0x0805495f in _sx_process_read ()
#16 0x08054e69 in sx_can_read ()
#17 0x08052c93 in sm_mio_callback ()
#18 0x0805a1d7 in ?? ()
#19 0x0804ed22 in main ()
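The core-file setup described above was done from the startup script; the same effect can be had from inside a daemon. This is a minimal sketch, assuming only the POSIX getrlimit/setrlimit/chdir calls; the function name `enable_core_dumps` and the directory passed to it are hypothetical, not part of jabberd or its rc.d script. The older "crashing at startup?" report in this thread shows "core not dumped, err = 13"; errno 13 is EACCES, consistent with the process not being permitted to write a core file where it was running.

```c
#include <assert.h>
#include <sys/resource.h>
#include <unistd.h>

/* Raise the soft core-file size limit to the hard limit (always
 * allowed without privilege) and chdir into `dir`, which should be
 * writable by the daemon's user, since by default the core file is
 * written to the current working directory.
 * Returns 0 on success, -1 on failure (errno set by the failing call). */
int enable_core_dumps(const char *dir)
{
    struct rlimit rl;

    assert(dir != NULL);
    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;
    if (setrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    if (chdir(dir) != 0)
        return -1;
    return 0;
}
```

Calling this early in main(), before any module loads, means a later crash such as the one above leaves a core file that gdb can open together with the binary built with -g.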
Re: crashing at startup?
Greg Troxel writes:

> On a netbsd-5 i386 box (xen domU) that is otherwise stable, with jabberd
> 2.3.2 and mu-conference 0.8.81, on boot we saw:
>
> Feb 6 05:33:34 foo /netbsd: pid 317 (sm), uid 1001: exited on signal 11 (core not dumped, err = 13)
> Feb 6 05:34:26 foo /netbsd: pid 540 (mu-conference), uid 1001: exited on signal 5 (core not dumped, err = 13)
>
> Once up, after restarting servers with /etc/rc.d/jabberd restart (and
> then muc), the jabberd2 programs were stable, but mu-conference has
> occasional crashes.
[...]
> Other than getting a core dump and examining it, which is the obvious
> thing to do, I'm curious if anyone else has seen something like this.

I've not seen anything like that on NetBSD 6_STABLE/alpha. I do see mu-conference crashing semi-regularly, as well as it occasionally running away (but not nearly as much after the last set of patches).

Anything odd/unusual in the sm.xml configuration file?

FYI: I'm running 2.3.2nb8 from pkgsrc-2014Q4 (not that anything has changed in pkgsrc since Q4 was branched).

-- 
Eric Schnoebelen    e...@cirr.com    http://www.cirr.com
I'm not a minion of evil, I'm upper management.
Re: crashing at startup?
e...@cirr.com (Eric Schnoebelen) writes:

> I've not seen anything like that on NetBSD 6_STABLE/alpha. I do see
> mu-conference crashing semi-regularly, as well as it occasionally
> running away (but not nearly as much after the last set of patches).

It's cool you are running this on alpha.

The sm crash happened only once. I am seeing ongoing mu-conference crashes, which look like they are hitting asserts in glib when unlocking mutexes. I have built things with -g but haven't switched to running them yet (production box with only 66 active connections right now, lower than the usual ~90).

> Anything odd/unusual in the sm.xml configuration file?

No. Just switching logging to a file, setting the hostname, and the stuff you basically have to edit.
crashing at startup?
On a netbsd-5 i386 box (xen domU) that is otherwise stable, with jabberd 2.3.2 and mu-conference 0.8.81, on boot we saw:

Feb 6 05:33:34 foo /netbsd: pid 317 (sm), uid 1001: exited on signal 11 (core not dumped, err = 13)
Feb 6 05:34:26 foo /netbsd: pid 540 (mu-conference), uid 1001: exited on signal 5 (core not dumped, err = 13)

Once up, after restarting servers with /etc/rc.d/jabberd restart (and then muc), the jabberd2 programs were stable, but mu-conference has occasional crashes.

I noticed that the order of starting up the components is

c2s sm s2s router muc

which strikes me as odd (why isn't router first?). But it's not as if these are fully up before the parent exits, and in theory all daemons should cope/retry.

Other than getting a core dump and examining it, which is the obvious thing to do, I'm curious if anyone else has seen something like this.

Greg
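Coping with an arbitrary start order usually means retrying a failed connection (for sm, the one to the router at 127.0.0.1 port 5347 seen in the sm log) with a growing delay. The sketch below shows bounded exponential backoff under that assumption; the function names are hypothetical and the code is not taken from jabberd's source. The connect attempt is passed in as a callback, so nothing here assumes jabberd's internals.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Delay in seconds before retry number `attempt` (0-based):
 * base * 2^attempt, but never more than cap. */
unsigned backoff_delay(unsigned attempt, unsigned base, unsigned cap)
{
    unsigned d = base;

    while (attempt-- > 0 && d < cap)
        d *= 2;                    /* stop doubling once cap is reached */
    return d < cap ? d : cap;
}

/* Production sleep function: block for s seconds. */
void sleep_seconds(unsigned s)
{
    (void)sleep(s);
}

/* Call try_once(arg) until it returns 0 (success) or max_tries attempts
 * have failed, sleeping with backoff between attempts. sleep_fn is
 * injectable so tests can pass a no-op.
 * Returns 0 on success, -1 if every attempt failed. */
int retry_with_backoff(int (*try_once)(void *), void *arg,
                       unsigned max_tries, unsigned base, unsigned cap,
                       void (*sleep_fn)(unsigned))
{
    assert(try_once != NULL && sleep_fn != NULL);
    for (unsigned i = 0; i < max_tries; i++) {
        if (try_once(arg) == 0)
            return 0;
        if (i + 1 < max_tries)     /* no sleep after the final attempt */
            sleep_fn(backoff_delay(i, base, cap));
    }
    return -1;
}
```

With base 1 and cap 30, the delays run 1, 2, 4, 8, 16, 30, 30, ... seconds, so a daemon started before the router keeps trying without hammering it; capping the delay bounds how long it waits once the router does come up.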