Re: sm crashing on startup

2017-01-05 Thread Greg Troxel

Tomasz Sterna writes:

> On Tue, 2017-01-03 at 23:35 -0500, Greg Troxel wrote:
>
>> Jabberd mostly works fine, but on boot sm crashes.  I have adjusted
>> sequencing, although in theory it should not matter
>
> Does 48125019 [1] fix your issue?
>
> [1] https://github.com/jabberd2/jabberd2/commit/48125019452e291b2c57275c789f3d7df87d7146

I applied the patch and rebuilt with -g.  I get the same behavior: on
first booting the machine (which starts jabberd), sm crashes.  If I then
run sm again, it starts fine and runs reliably indefinitely.  I am
running 2.4.0, with sqlite3 backend, on NetBSD 6 i386, built from pkgsrc
with gcc 4.5.

Reproduction recipe which may or may not work for you:
  send sm a HUP
  after that, log in

From the logs, sm connected to router, and disconnected after 6 seconds.
This is when the user below (the first to connect) succeeds in
authenticating.  This same user was the second to authenticate when I
restarted sm seconds later.  The last lines of sm log before the crash were

Thu Jan  5 06:44:30 2017 [notice] module 'iq-vcard' added to chain 'user-delete' (order 9 index 6 seq 2)
Thu Jan  5 06:44:30 2017 [notice] module 'iq-version' added to chain 'disco-extend' (order 0 index 17 seq 1)
Thu Jan  5 06:44:30 2017 [notice] module 'help' added to chain 'disco-extend' (order 1 index 18 seq 1)
Thu Jan  5 06:44:30 2017 [notice] reopening log ...
Thu Jan  5 06:44:30 2017 [notice] log started

I did notice on startup

Thu Jan  5 06:44:29 2017 [notice] module 'help' added to chain 'disco-extend' (order 1 index 18 seq 1)
Thu Jan  5 06:44:29 2017 [notice] version: jabberd sm 2.4.0
Thu Jan  5 06:44:29 2017 [notice] [example.com] configured
Thu Jan  5 06:44:29 2017 [notice] attempting connection to router at 127.0.0.1, port=5347
Thu Jan  5 06:44:29 2017 [notice] connection to router established
Thu Jan  5 06:44:29 2017 [notice] sm ready for sessions
Thu Jan  5 06:44:30 2017 [notice] HUP handled. reloading modules...
Thu Jan  5 06:44:30 2017 [notice] modules search path: /usr/pkg/lib/jabberd
Thu Jan  5 06:44:30 2017 [notice] module 'status' added to chain 'sess-start' (order 0 index 0 seq 0)
Thu Jan  5 06:44:30 2017 [notice] module 'status' added to chain 'sess-end' (order 0 index 0 seq 1)
Thu Jan  5 06:44:30 2017 [notice] module 'iq-last' added to chain 'sess-end' (order 1 index 1 seq 0)

The HUP is probably (wildly speculating) because the controlling tty of
the init script was the console, sm didn't detach, and when the init
scripts finished the tty was revoked to clean it up for console login.
But why the HUP happened is minor; the issue is the behavior when it
happened.  I don't see HUP in the router or c2s logs.  But now it makes
sense why it crashes on boot and not later.

After sm had been running for most of an hour, I sent it a HUP; sm
reloaded modules and stayed up.  I then logged out and back in, and on
login it crashed, with the same "status" value in c/h.

Here is the backtrace from the crash on boot, which is similar to one I
posted yesterday.  (I have replaced the JID string; but the issue seems
to be the first connection, not this particular user.)

#0  0x080635de in xhash_getx (h=0x74617473, key=0xbb7e334c "storage.path", len=12) at xhash.c:174
#1  0x0806364f in xhash_get (h=0x74617473, key=) at xhash.c:187
#2  0x0805c426 in config_get_one (c=0xbb102060, key=0xbb7e334c "storage.path", num=0) at config.c:280
#3  0xbb7e258b in storage_add_type (st=0xbb10b040, driver=0xbb119028 "sqlite", type=0xbb06cb6a "active") at storage.c:114
#4  0xbb7e2a44 in storage_get (st=0xbb10b040, type=0xbb06cb6a "active", owner=0xbb12fc80 "user@examplecom", filter=0x0, os=0xbf7fe258) at storage.c:239
#5  0xbb06ca23 in _active_user_load (mi=0xbb12f660, user=0xbb11b550) at mod_active.c:35

You can see that argument h to frames 0 and 1 is suspect (it should be a
pointer).  In frame 2, config_get_one has a config_t c which does
contain that value in c->hash.  But then I realized that (char *) &c->hash
(the same address as c) is "status".

(gdb) print c
$1 = (config_t) 0xbb102060
(gdb) print *c
$2 = {hash = 0x74617473, nad = 0xbb007375}
(gdb) x/s c
0xbb102060:  "status"
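
The two words gdb prints for *c can be checked by hand. The following is a small sketch, not part of the original debugging session, that unpacks them as little-endian bytes (i386 byte order) to show they spell "status" followed by a NUL. The variable names are illustrative only; the two constants are copied from the gdb output above.

```python
import struct

# Values copied from the gdb output above: the two fields of *c.
hash_word = 0x74617473   # printed as c->hash
nad_word = 0xBB007375    # printed as c->nad

# i386 is little-endian, so each 32-bit word is stored low byte first.
raw = struct.pack("<II", hash_word, nad_word)

# Read the memory the way x/s does: bytes up to the first NUL.
c_string = raw.split(b"\x00", 1)[0].decode("ascii")

print(raw)
print(c_string)
```

So the "pointer" h=0x74617473 seen in frames 0 and 1 is just the first four characters of the string, which is consistent with the config_t memory having been overwritten or reused for a module name.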

In frame 3, storage_t is ok

(gdb) print st
$3 = (storage_t) 0xbb10b040
(gdb) print *st
$4 = {config = 0xbb102060, log = 0xbb102068, drivers = 0xbb119000, types = 0xbb11a000, default_drv = 0xbb11b040}
(gdb) print *st->log
$5 = {type = log_FILE, file = 0xbb3d94c0}
(gdb) print *st->drivers
$6 = {p = 0xbb103080, prime = 101, dirty = 1, count = 1, zen = 0xbb122800, free_list = 0x0, iter_bucket = -1, iter_node = 0x0, stat = 0x0}
(gdb) print *st->types
$7 = {p = 0xbb1030c0, prime = 101, dirty = 0, count = 0, zen = 0xbb123000, free_list = 0x0, iter_bucket = -1, iter_node = 0x0, stat = 0x0}

My next step would be some guard for config_t, and to turn on the
existing guards.

But, I thought I would post what I have found out so far in case
something jumps out at somebody.  With no basis, I am suspicious of
util/config.c.



Re: sm crashing on startup

2017-01-04 Thread Tomasz Sterna
On Tue, 2017-01-03 at 23:35 -0500, Greg Troxel wrote:
> Jabberd mostly works fine, but on boot sm
> crashes.  I have adjusted sequencing, although in theory it should
> not matter

Does 48125019 [1] fix your issue?


[1] https://github.com/jabberd2/jabberd2/commit/48125019452e291b2c57275c789f3d7df87d7146


-- 
 /o__ 
(_<^' Good teaching is one-fourth preparation and three-fourths good theatre.




sm crashing on startup

2017-01-03 Thread Greg Troxel
I have a NetBSD 6 system with jabberd 2.4.0 (still built for NetBSD 5;
although binary compat at this level is almost beyond suspicion, I am
rebuilding all packages).  Jabberd mostly works fine, but on boot sm
crashes.  I have adjusted sequencing, although in theory it should not
matter (and aside from sm does not):

  Starting router.
  Starting s2s.
  Starting c2s.
  Starting sm.
  Starting muc.

(plus some other things interleaved; NetBSD does a topological sort of
the startup files in /etc/rc.d/ and then uses that order.)
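
For readers unfamiliar with that ordering: rc.d scripts declare "# PROVIDE:" and "# REQUIRE:" lines, and rcorder(8) sorts them topologically. A minimal sketch of the idea follows, using Python's graphlib. The dependency lists are hypothetical and are not taken from the jabberd package's actual rc.d scripts.

```python
from graphlib import TopologicalSorter

# Hypothetical REQUIRE lists (assumption, not the real pkgsrc scripts):
# each key is a service, each value lists what must start before it.
requires = {
    "router": [],
    "s2s": ["router"],
    "c2s": ["router"],
    "sm": ["router"],
    "muc": ["router"],
}

# static_order() yields every service after all of its prerequisites.
order = list(TopologicalSorter(requires).static_order())
print(order)
```

A topological sort only constrains the order in which the scripts are run. It says nothing about whether an earlier daemon has finished initializing by the time the next one starts.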

I found that ~always, sm was not running after boot, and that starting
sm was enough to make the server work.  So I started logging exits due
to core dumps and had the sm startup script change to a directory where
the jabberd process could write a core file.  I got the following, and I
know I need to rebuild jabberd with debug symbols.  But I wonder if the
issue is processing a message from router before the sm startup has
completed.

(gdb) bt
#0  0x080612dc in ?? ()
#1  0x0806140a in xhash_getx ()
#2  0x0806145e in xhash_get ()
#3  0x0805a6a4 in config_get_one ()
#4  0xbb7e25ea in storage_add_type () from /usr/pkg/lib/jabberd/libstorage.so.0
#5  0xbb7e2a2a in storage_get () from /usr/pkg/lib/jabberd/libstorage.so.0
#6  0xbb0c9aa2 in _active_user_load () from /usr/pkg/lib/jabberd/mod_active.so
#7  0x0804f67b in mm_user_load ()
#8  0x08053325 in user_load ()
#9  0x080522c7 in sess_start ()
#10 0xbb0df2fa in _session_in_router () from /usr/pkg/lib/jabberd/mod_session.so
#11 0x0804f44b in mm_in_router ()
#12 0x0804e107 in dispatch ()
#13 0x08053021 in sm_sx_callback ()
#14 0x08054f21 in __sx_event ()
#15 0x0805495f in _sx_process_read ()
#16 0x08054e69 in sx_can_read ()
#17 0x08052c93 in sm_mio_callback ()
#18 0x0805a1d7 in ?? ()
#19 0x0804ed22 in main ()




Re: crashing at startup?

2015-02-26 Thread Eric Schnoebelen
Greg Troxel writes:
- On a netbsd-5 i386 box (xen domU) that is otherwise stable, with jabberd
- 2.3.2 and mu-conference 0.8.81, on boot we saw:
- 
- Feb  6 05:33:34 foo /netbsd: pid 317 (sm), uid 1001: exited on signal 11 (core not dumped, err = 13)
- Feb  6 05:34:26 foo /netbsd: pid 540 (mu-conference), uid 1001: exited on signal 5 (core not dumped, err = 13)
- 
- Once up, after restarting servers with /etc/rc.d/jabberd restart (and
- then muc), the jabberd2 programs were stable, but mu-conference has
- occasional crashes.

[...]

- Other than getting a core dump and examining it, which is the obvious
- thing to do, I'm curious if anyone else has seen something like this.

I've not seen anything like that on NetBSD 6_STABLE/alpha.  I do
see mu-conference crashing semi-regularly, as well as it
occasionally running away (but not nearly as much after the
last set of patches.)

Anything odd/unusual in the sm.xml configuration file?

fyi: I'm running 2.3.2nb8 from pkgsrc-2014Q4 (not that anything
has changed in pkgsrc since Q4 was branched.)

--
Eric Schnoebelen    e...@cirr.com    http://www.cirr.com
I'm not a minion of evil
I'm upper management.




Re: crashing at startup?

2015-02-26 Thread Greg Troxel

e...@cirr.com (Eric Schnoebelen) writes:

> I've not seen anything like that on NetBSD 6_STABLE/alpha.  I do
> see mu-conference crashing semi-regularly, as well as it
> occasionally running away (but not nearly as much after the
> last set of patches.)

It's cool you are running this on alpha.

The sm crash was only once.  I am seeing ongoing mu-conference crashes,
which look like they are hitting asserts in glib unlocking mutexes.
I have built stuff with -g but haven't flipped it to run yet (production
box with only 66 active connections right now, lower than the usual ~90).

> Anything odd/unusual in the sm.xml configuration file?

No.  Just switching logging to file, setting hostname, and stuff you
basically have to edit.





crashing at startup?

2015-02-09 Thread Greg Troxel

On a netbsd-5 i386 box (xen domU) that is otherwise stable, with jabberd
2.3.2 and mu-conference 0.8.81, on boot we saw:

Feb  6 05:33:34 foo /netbsd: pid 317 (sm), uid 1001: exited on signal 11 (core not dumped, err = 13)
Feb  6 05:34:26 foo /netbsd: pid 540 (mu-conference), uid 1001: exited on signal 5 (core not dumped, err = 13)

Once up, after restarting servers with /etc/rc.d/jabberd restart (and
then muc), the jabberd2 programs were stable, but mu-conference has
occasional crashes.

I noticed that the order of starting up the components is

c2s
sm
s2s
router
muc

which strikes me as odd (why isn't router first?).  But, it's not like
these are fully up before the parent exits, and in theory all daemons
should cope/retry.
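
The "cope/retry" expectation can be illustrated with a generic retry loop. This is only a sketch of the general pattern, not how jabberd2's components actually reconnect to the router; the function name, parameters, and the fake connection below are all hypothetical.

```python
import time


def connect_with_retry(connect, attempts=5, delay=0.5, sleep=time.sleep):
    """Call connect() until it succeeds, doubling the wait after each failure."""
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == attempts:
                raise  # give up after the last attempt
            sleep(delay)
            delay *= 2


# Hypothetical stand-in for a router that is not listening yet:
# the first two attempts are refused, the third succeeds.
calls = {"n": 0}


def fake_connect():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionRefusedError("router not listening yet")
    return "connected"


# Pass a no-op sleep so the demonstration does not actually wait.
result = connect_with_retry(fake_connect, sleep=lambda seconds: None)
print(result, calls["n"])
```

With a loop like this in each component, the start order would matter only for how long startup takes, not for whether it succeeds.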

Other than getting a core dump and examining it, which is the obvious
thing to do, I'm curious if anyone else has seen something like this.

Greg

