crashing at startup?

2015-02-09 Thread Greg Troxel

On a netbsd-5 i386 box (xen domU) that is otherwise stable, with jabberd
2.3.2 and mu-conference 0.8.81, on boot we saw:

Feb  6 05:33:34 foo /netbsd: pid 317 (sm), uid 1001: exited on signal 11 (core not dumped, err = 13)
Feb  6 05:34:26 foo /netbsd: pid 540 (mu-conference), uid 1001: exited on signal 5 (core not dumped, err = 13)

Once up, after restarting servers with /etc/rc.d/jabberd restart (and
then muc), the jabberd2 programs were stable, but mu-conference has
occasional crashes.

I noticed that the order of starting up the components is

c2s
sm
s2s
router
muc

which strikes me as odd (why isn't router first?).  But it's not as if
these are fully up before the parent exits, and in theory all daemons
should cope and retry.

Other than getting a core dump and examining it, which is the obvious
thing to do, I'm curious if anyone else has seen something like this.

Greg




Re: crashing at startup?

2015-02-26 Thread Greg Troxel

e...@cirr.com (Eric Schnoebelen) writes:

> I've not seen anything like that on NetBSD 6_STABLE/alpha.  I do
> see mu-conference crashing semi-regularly, as well as it
> occasionally running away (but not nearly as much after the
> last set of patches).

It's cool you are running this on alpha.

The sm crash was only once.  I am seeing ongoing mu-conference crashes,
which look like they are hitting asserts in glib unlocking mutexes.
I have built stuff with -g but haven't flipped it to run yet (production
box with only 66 active connections right now, lower than the usual ~90).

> Anything odd/unusual in the sm.xml configuration file?

No.  Just switching logging to file, setting hostname, and stuff you
basically have to edit.





Re: XMPP SPAM

2015-11-10 Thread Greg Troxel

Simon Josefsson  writes:

> I've been running my own jabberd2 server for a couple of months.  For the
> past 2-3 weeks I've started receiving XMPP spam (a couple of
> times per week).  Is there some configuration that could help here, or
> how do people handle this?  Sample s2s log output below (IP and hostname
> of spammer de-identified; josefsson.org is my domain, jabber.spammer.net
> is the remote server).

I wonder if greylisting could help.  I almost never receive incoming
jabber messages from people that I don't already have on a roster.  So a
delay of 30m would be ok for new presence requests.  But I realize that
kind of breaks the I in IM.
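A minimal sketch of that greylisting idea, assuming a per-sender table keyed on the remote bare JID and the 30-minute delay mentioned above. Nothing like this is in jabberd2 as far as this thread shows; greylist_check, GREYLIST_DELAY and the fixed-size table without expiry are hypothetical simplifications.

```c
#include <assert.h>
#include <string.h>
#include <time.h>

/* Hypothetical greylist for inbound presence subscription requests.
 * Not a jabberd2 feature; an illustration of the idea only. */
#define GREYLIST_DELAY (30 * 60)   /* 30 minutes */
#define GREYLIST_SLOTS 64          /* no expiry: a real one would age entries out */

typedef enum { GL_DEFER, GL_ACCEPT } gl_verdict;

typedef struct {
    char   sender[256];            /* bare JID of the remote sender */
    time_t first_seen;
} gl_entry;

typedef struct {
    gl_entry entries[GREYLIST_SLOTS];
    size_t   used;
} greylist;

/* Defer a sender until GREYLIST_DELAY seconds after it was first seen.
 * If the table is full, unknown senders simply stay deferred. */
static gl_verdict greylist_check(greylist *gl, const char *sender, time_t now)
{
    assert(gl != NULL && sender != NULL);
    for (size_t i = 0; i < gl->used; i++) {
        if (strcmp(gl->entries[i].sender, sender) == 0)
            return (now - gl->entries[i].first_seen >= GREYLIST_DELAY)
                       ? GL_ACCEPT : GL_DEFER;
    }
    if (gl->used < GREYLIST_SLOTS) {   /* first sighting: remember it */
        gl_entry *e = &gl->entries[gl->used++];
        strncpy(e->sender, sender, sizeof e->sender - 1);
        e->sender[sizeof e->sender - 1] = '\0';
        e->first_seen = now;
    }
    return GL_DEFER;
}
```

Where to hook this (s2s, or an sm module handling subscription requests) and whether to hold or bounce a deferred request are open questions; the sketch only covers the bookkeeping.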

Another thought is an IP-address-based RBL, like the ones used for spam.
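IP-based blocklists (DNSBLs) conventionally work by reversing the IPv4 octets and looking the result up under the list's zone; an A record in the answer means "listed", NXDOMAIN means not. A sketch of building that query name follows; the zone rbl.example.org and the function name are placeholders, and the actual DNS lookup (e.g. via getaddrinfo(3)) and where s2s would call it are left out.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

/* Build "d.c.b.a.<zone>" for IPv4 address "a.b.c.d".
 * Returns 0 on success, -1 on a malformed address or too-small buffer. */
static int rbl_query_name(const char *ipv4, const char *zone,
                          char *out, size_t outlen)
{
    struct in_addr addr;
    assert(ipv4 != NULL && zone != NULL && out != NULL);
    if (inet_pton(AF_INET, ipv4, &addr) != 1)
        return -1;
    /* s_addr is in network byte order, so o[0] is the first octet. */
    const unsigned char *o = (const unsigned char *)&addr.s_addr;
    int n = snprintf(out, outlen, "%d.%d.%d.%d.%s",
                     o[3], o[2], o[1], o[0], zone);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

For example, 192.0.2.1 against rbl.example.org yields 1.2.0.192.rbl.example.org.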




Re: Future of jabberd

2016-05-30 Thread Greg Troxel
I'd also like to see multi-user conferencing integrated.   I did a
little maintenance on the muc plugin a few years ago, but the code is
still too scary.




Re: jabberd-2.4.0 release

2016-05-23 Thread Greg Troxel

Tomasz Sterna  writes:

> Next jabberd2 release is available.
>
> Get 2.4.0 release at GitHub:
> https://github.com/jabberd2/jabberd2/releases
>
> This is a bugfix release.

Does this imply that it should be safe, aside from cautions in NEWS, to
update a machine running 2.3.x to 2.4.0?   Often a minor version change
indicates something more dramatic than bugfixes, so I thought I would
ask.

In other words, why isn't this 2.3.N+1?




Re: stale connections, keepalive?

2016-08-29 Thread Greg Troxel

Tomasz Sterna  writes:

> See io.keepalive [1][2] options.
> Setting this up will flush single whitespace character over the wire
> when the connection hangs idle. This triggers the TCP layer connection
> validation.

I set this up as:
  check every 300
  close connections idle for 86400 (1d)
  send keepalive after 14400 (4h)
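One way to read those three settings, as a hypothetical per-connection check that runs every check_interval seconds. The field names, treating 0 as "disabled", and the split between "no input" (for closing) and "no output" (for the whitespace keepalive) are assumptions for illustration, not a description of jabberd2's code.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical model of the io check settings above. */
typedef struct {
    time_t check_interval;   /* caller runs idle_check this often (300 s) */
    time_t idle_close;       /* close after this long with no input (86400 s); 0 = off */
    time_t keepalive_after;  /* send whitespace after this long with no output (14400 s); 0 = off */
} idle_policy;

typedef enum { CONN_OK, CONN_SEND_KEEPALIVE, CONN_CLOSE } conn_action;

/* Decide what to do with one connection at time `now`. */
static conn_action idle_check(const idle_policy *p, time_t now,
                              time_t last_read, time_t last_write)
{
    assert(p != NULL);
    if (p->idle_close > 0 && now - last_read >= p->idle_close)
        return CONN_CLOSE;
    if (p->keepalive_after > 0 && now - last_write >= p->keepalive_after)
        return CONN_SEND_KEEPALIVE;
    return CONN_OK;
}
```

Because the check only runs every 300 seconds, a close or keepalive can happen up to that much later than the nominal threshold.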

I realized I was seeing the problem on my other server too, just less
(because I usually don't connect to it from my phone).  I'll see if this
resolves the issue.  (My phone getting a wakeup every 4h is going to be
negligible compared to what happens in between, I think.)




stale connections, keepalive?

2016-08-27 Thread Greg Troxel

I have a server running 2.3.6 on NetBSD.  It has been basically working
fine.

I connect to it with Conversations on Android, and the nature of phones
is that they switch from wifi to cellular a lot, and I think this leads
to dangling connections.

TCP keepalive is apparently not on by default on NetBSD, which seems
legitimate.  I have seen large numbers (50?) of connections with some
combination of a cell IP address and wifi networks I am often at.  These
seem to persist.  After a while the server is troubled, I think because
it has hit its open file descriptor limit.

So (without reading the code):

  should jabberd2 force TCP keepalive on?

  should c2s (and s2s probably, but less likely to be an issue) close
  client connections if it has not seen anything from the client in some
  time period, like 8h?

  is there any expectation in the protocol that clients should be doing
  any application-level keep-alive?

I am inclined to patch my server to turn on TCP keepalive.   I don't
think NetBSD has an option to force keepalive on if the program doesn't,
although I have seen this option in OS X.
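Turning on TCP keepalive from the program side means setting the traditional BSD SO_KEEPALIVE option on each accepted socket. A minimal sketch; the function names are hypothetical, and exactly where c2s would call this is an assumption. Probe timing (idle time before the first probe, probe count) is left to system defaults, since the per-socket knobs for it differ between platforms.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Enable TCP keepalive on one socket.
 * Returns 0 on success, -1 (with errno set) on failure. */
static int enable_tcp_keepalive(int fd)
{
    int on = 1;
    assert(fd >= 0);
    return setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on);
}

/* Report whether SO_KEEPALIVE is set: 1 yes, 0 no, -1 on error. */
static int tcp_keepalive_enabled(int fd)
{
    int val = 0;
    socklen_t len = sizeof val;
    if (getsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &val, &len) != 0)
        return -1;
    return val != 0;
}
```

A patched server would call enable_tcp_keepalive() on the descriptor returned by accept(2), before handing it to the rest of the I/O code.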




Re: stale connections, keepalive?

2016-08-29 Thread Greg Troxel

Christof Meerwald <cme...@cmeerw.org> writes:

> On Sun, Aug 28, 2016 at 02:13:34PM +0200, Tomasz Sterna wrote:
>> On Sat, 27.08.2016 at 14:55 -0400, Greg Troxel wrote:
>> >   should jabberd2 force TCP keepalive on?
>> I'm not sure whether it is possible.
>> At least on Linux it is a system-wide setting and requires root to
>> change.
>
> Are you sure? There appear to be some socket options that can be set
> for each socket:
>
> http://www.tldp.org/HOWTO/html_single/TCP-Keepalive-HOWTO/#setsockopt

I'm not running Linux, so we're talking more or less about what POSIX
specifies for the BSD sockets interface.  But in the SO_KEEPALIVE case,
that page describes exactly the traditional BSD socket option for
keepalive, which I suspect dates from about 4.2BSD, although my memory of
the late 80s is now a bit fuzzy.

So yes, I meant to have a way to enable keepalive via SO_KEEPALIVE on
all sockets.  But that's not really the right thing.

Tomasz's point about Linux and system-wide setting is probably about
what the default value is if a program doesn't ask for keepalives. OS X
has a sysctl for this.  NetBSD doesn't; it's up to the program, as it
was historically.

On the system in question, it is surely behind a buggy firewall.
However, that's beyond my control.  It's interesting that this doesn't
show up, because I would expect the mobile to lose the data connection
and fail to close the TCP connection fairly often.

Arguably I have a system-wide problem, not a jabber problem.   But
still, given that clients just vanish, it seems like there should be
some mechanism for connections to get cleaned up.


I will check out the application-level keepalive.   What I think I want
is that for a connection from a client, if there has been no traffic in
or out for 24h, to send a space.  That will break the stale connections
after a day, and it should not cause any additional traffic on real
connections.  For now I can just send a keepalive every 24h, and that's
close enough.
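The "send a space after a day of silence in both directions" idea can be sketched as follows. XMPP allows whitespace between top-level stanzas, which is why a single space is harmless on a live connection. The xmpp_conn struct and whitespace_keepalive() are hypothetical, and the caller is assumed to update last_activity on every read and write.

```c
#include <assert.h>
#include <errno.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical per-connection state. */
typedef struct {
    int    fd;
    time_t last_activity;   /* time of the last byte read or written */
} xmpp_conn;

/* Send one space if the connection has been silent for at least `idle`
 * seconds.  Returns 1 if a space was sent, 0 if not needed, -1 on error. */
static int whitespace_keepalive(xmpp_conn *c, time_t now, time_t idle)
{
    assert(c != NULL && c->fd >= 0);
    if (now - c->last_activity < idle)
        return 0;           /* real traffic recently: add nothing */
    ssize_t n;
    do {
        n = write(c->fd, " ", 1);
    } while (n < 0 && errno == EINTR);
    if (n != 1)
        return -1;
    c->last_activity = now;
    return 1;
}
```

On a dead connection the first write usually still succeeds, because it only lands in the kernel send buffer; the failure shows up on a later read or write once TCP gives up retransmitting or gets an RST. A server doing this should also ignore SIGPIPE so a failed write returns EPIPE instead of killing the process.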




Re: stale connections, keepalive?

2016-08-29 Thread Greg Troxel

Tomasz Sterna  writes:

> jabberd2 has support for application layer keepalives.
>
> See io.keepalive [1][2] options.
> Setting this up will flush single whitespace character over the wire
> when the connection hangs idle. This triggers the TCP layer connection
> validation.

Great - thanks very much for that pointer.

> Having said that, I am running my server with neither application layer
> nor TCP keepalives turned on and see no issues with dangling
> connections.

I run two jabberd2 servers.  One of them is not behind a buggy firewall
and does not have dangling connections.  However, I'm not sure anyone is
using a phone.

> But.. I had them a lot, when my server was behind a buggy Cisco router
> doing NAT. It was dropping idle connections from its NAT table without
> telling anyone, so later when mobile network closed a connection it
> silently dropped RST packets not knowing who to NAT them to. This was
> causing a lot of dangling connections on my server.

This is probably more or less the issue, and if so I can't fix it.

Are you saying that a cell provider tracks TCP state and when the data
connection is lost sends RST packets for open connections?   I hadn't
realized that, but it seems obviously sensible if a little bit of a
layer violation.




Re: jabberd-2.5.0 release

2017-01-05 Thread Greg Troxel

Thanks for making a release.

Updated in pkgsrc, and tested on netbsd-6 i386 (in a Xen domU, not that
it should matter).  Now, sm starts on boot.  I do see 'reopening log' in
the log, so it must still be getting the HUP.  But nothing is amiss, and
I can log into my server without having to restart sm.

https://mail-index.netbsd.org/pkgsrc-changes/2017/01/06/msg151199.html




sm crashing on startup

2017-01-03 Thread Greg Troxel
I have a NetBSD 6 system with jabberd 2.4.0 (still built for NetBSD 5;
although binary compatibility at this level is almost beyond suspicion,
I am rebuilding all packages).  Jabberd mostly works fine, but on boot sm
crashes.  I have adjusted the startup sequencing, although in theory it
should not matter (and, aside from sm, it does not):

  Starting router.
  Starting s2s.
  Starting c2s.
  Starting sm.
  Starting muc.

(plus some other things interleaved; NetBSD does a topological sort of
the startup files in /etc/rc.d/ and then uses that order.)

I found that almost always, sm was not running after boot, and that
starting sm was enough to make the server work.  So I started logging
exits due to core dumps and had the sm startup change to a directory
where the jabberd user could write a core file.  I got the following,
and I know I need to rebuild jabberd with debug symbols.  But I wonder
if the issue is processing a message from router before the sm startup
has completed.

(gdb) bt
#0  0x080612dc in ?? ()
#1  0x0806140a in xhash_getx ()
#2  0x0806145e in xhash_get ()
#3  0x0805a6a4 in config_get_one ()
#4  0xbb7e25ea in storage_add_type () from /usr/pkg/lib/jabberd/libstorage.so.0
#5  0xbb7e2a2a in storage_get () from /usr/pkg/lib/jabberd/libstorage.so.0
#6  0xbb0c9aa2 in _active_user_load () from /usr/pkg/lib/jabberd/mod_active.so
#7  0x0804f67b in mm_user_load ()
#8  0x08053325 in user_load ()
#9  0x080522c7 in sess_start ()
#10 0xbb0df2fa in _session_in_router () from /usr/pkg/lib/jabberd/mod_session.so
#11 0x0804f44b in mm_in_router ()
#12 0x0804e107 in dispatch ()
#13 0x08053021 in sm_sx_callback ()
#14 0x08054f21 in __sx_event ()
#15 0x0805495f in _sx_process_read ()
#16 0x08054e69 in sx_can_read ()
#17 0x08052c93 in sm_mio_callback ()
#18 0x0805a1d7 in ?? ()
#19 0x0804ed22 in main ()




Re: sm crashing on startup

2017-01-05 Thread Greg Troxel

Tomasz Sterna <to...@xiaoka.com> writes:

> On Tue, 03.01.2017 at 23:35 -0500, Greg Troxel wrote:
>
>> Jabberd mostly works fine, but on boot sm crashes.  I have adjusted
>> sequencing, although in theory it should not matter
>
> Does 48125019 [1] fix your issue?
>
> [1] https://github.com/jabberd2/jabberd2/commit/48125019452e291b2c57275c789f3d7df87d7146

I applied the patch and rebuilt with -g.  I get the same behavior: on
first booting the machine (which starts jabberd), sm crashes.  If I then
run sm again, it starts fine and runs reliably indefinitely.  I am
running 2.4.0, with sqlite3 backend, on NetBSD 6 i386, built from pkgsrc
with gcc 4.5.

Reproduction recipe which may or may not work for you:
  send sm a HUP
  after that, log in

From the logs, sm connected to router, and disconnected after 6 seconds.
This is when the user below (the first to connect) succeeds in
authenticating.  This same user was the second to authenticate when I
restarted sm seconds later.  The last lines of sm log before the crash were

Thu Jan  5 06:44:30 2017 [notice] module 'iq-vcard' added to chain 'user-delete' (order 9 index 6 seq 2)
Thu Jan  5 06:44:30 2017 [notice] module 'iq-version' added to chain 'disco-extend' (order 0 index 17 seq 1)
Thu Jan  5 06:44:30 2017 [notice] module 'help' added to chain 'disco-extend' (order 1 index 18 seq 1)
Thu Jan  5 06:44:30 2017 [notice] reopening log ...
Thu Jan  5 06:44:30 2017 [notice] log started

I also noticed this on startup:

Thu Jan  5 06:44:29 2017 [notice] module 'help' added to chain 'disco-extend' (order 1 index 18 seq 1)
Thu Jan  5 06:44:29 2017 [notice] version: jabberd sm 2.4.0
Thu Jan  5 06:44:29 2017 [notice] [example.com] configured
Thu Jan  5 06:44:29 2017 [notice] attempting connection to router at 127.0.0.1, port=5347
Thu Jan  5 06:44:29 2017 [notice] connection to router established
Thu Jan  5 06:44:29 2017 [notice] sm ready for sessions
Thu Jan  5 06:44:30 2017 [notice] HUP handled. reloading modules...
Thu Jan  5 06:44:30 2017 [notice] modules search path: /usr/pkg/lib/jabberd
Thu Jan  5 06:44:30 2017 [notice] module 'status' added to chain 'sess-start' (order 0 index 0 seq 0)
Thu Jan  5 06:44:30 2017 [notice] module 'status' added to chain 'sess-end' (order 0 index 0 seq 1)
Thu Jan  5 06:44:30 2017 [notice] module 'iq-last' added to chain 'sess-end' (order 1 index 1 seq 0)

The HUP is probably (wildly speculating) because the controlling tty of
the init script was the console, sm didn't detach, and when the init
scripts finished the tty was revoked to clean it up for console login.
But why the HUP happened is minor; the issue is the behavior when it
happened.  I don't see HUP in the router or c2s logs.  But now it makes
sense why it crashes on boot and not later.

After most of an hour of sm running, I sent a HUP, and sm reloaded
modules and stayed up.  I logged out and back in, and on login it
crashed, with the same value, "status", in c and h.

Here is the backtrace from the crash on boot, which is similar to one I
posted yesterday.  (I have replaced the JID string; but the issue seems
to be the first connection, not this particular user.)

#0  0x080635de in xhash_getx (h=0x74617473, key=0xbb7e334c "storage.path", len=12) at xhash.c:174
#1  0x0806364f in xhash_get (h=0x74617473, key=) at xhash.c:187
#2  0x0805c426 in config_get_one (c=0xbb102060, key=0xbb7e334c "storage.path", num=0) at config.c:280
#3  0xbb7e258b in storage_add_type (st=0xbb10b040, driver=0xbb119028 "sqlite", type=0xbb06cb6a "active") at storage.c:114
#4  0xbb7e2a44 in storage_get (st=0xbb10b040, type=0xbb06cb6a "active", owner=0xbb12fc80 "user@examplecom", filter=0x0, os=0xbf7fe258) at storage.c:239
#5  0xbb06ca23 in _active_user_load (mi=0xbb12f660, user=0xbb11b550) at mod_active.c:35

You can see that argument h to frames 0 and 1 is suspect (it should be a
pointer).  In frame 2, config_get_one has a config_t c which does
contain that value in c->hash.  But then I realized that the memory c
points to (which is also where c->hash lives), read as a string, is
"status".

(gdb) print c
$1 = (config_t) 0xbb102060
(gdb) print *c
$2 = {hash = 0x74617473, nad = 0xbb007375}
(gdb) x/s c
0xbb102060:  "status"
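The two words printed in *c can be decoded by hand. i386 is little-endian, so hash = 0x74617473 is stored as the bytes 73 74 61 74 ("stat") and nad = 0xbb007375 as 75 73 00 bb ("us" plus a terminator). A small helper that does this decoding independent of the host's byte order (the function name is made up for illustration):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Lay out 32-bit words as little-endian i386 memory holds them
 * (least significant byte first), then NUL-terminate.
 * out must have room for 4 * n + 1 bytes. */
static void le32_to_bytes(const uint32_t *words, size_t n, char *out)
{
    assert(words != NULL && out != NULL);
    memset(out, 0, 4 * n + 1);
    for (size_t w = 0; w < n; w++)
        for (int b = 0; b < 4; b++)
            out[4 * w + b] = (char)((words[w] >> (8 * b)) & 0xff);
}
```

Fed { 0x74617473, 0xbb007375 }, the result reads as the C string "status". Memory that should hold a config_t now holding a module name fits a use-after-free. One plausible reading, an inference rather than something the backtrace proves, is that the HUP reload freed the config while storage_t kept pointing at it: st->config in the *st dump is the same 0xbb102060.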

In frame 3, storage_t st looks OK:

(gdb) print st
$3 = (storage_t) 0xbb10b040
(gdb) print *st
$4 = {config = 0xbb102060, log = 0xbb102068, drivers = 0xbb119000, types = 0xbb11a000, default_drv = 0xbb11b040}
(gdb) print *st->log
$5 = {type = log_FILE, file = 0xbb3d94c0}
(gdb) print *st->drivers
$6 = {p = 0xbb103080, prime = 101, dirty = 1, count = 1, zen = 0xbb122800, free_list = 0x0, iter_bucket = -1, iter_node = 0x0, stat = 0x0}
(gdb) print *st->types
$7 = {p = 0xbb1030c0, prime = 101, dirty = 0, count = 0, zen = 0xbb123000, free_list = 0x0, iter_bucket = -1, iter_node = 0x0, stat = 0x0}

My next step would be some guard for config_t, and to turn on the
existing guards.
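A guard for config_t could be a magic number set when the structure is created, overwritten just before it is freed, and checked on every use. A sketch on a hypothetical guarded_config type; the real config_t, per the gdb output above, has only hash and nad, and the magic values and function names here are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define CONFIG_MAGIC      0xC0F1C0F1u   /* hypothetical "alive" marker */
#define CONFIG_DEAD_MAGIC 0xDEADC0F1u   /* written just before free() */

/* Hypothetical guarded variant of a config structure. */
typedef struct {
    uint32_t magic;
    void    *hash;   /* stands in for xht */
    void    *nad;    /* stands in for nad_t */
} guarded_config;

static guarded_config *guarded_config_new(void)
{
    guarded_config *c = calloc(1, sizeof *c);
    if (c != NULL)
        c->magic = CONFIG_MAGIC;
    return c;
}

/* 1 if c looks like a live config, 0 otherwise. */
static int guarded_config_ok(const guarded_config *c)
{
    return c != NULL && c->magic == CONFIG_MAGIC;
}

static void guarded_config_free(guarded_config *c)
{
    if (c == NULL)
        return;
    assert(guarded_config_ok(c));   /* catches a double free */
    c->magic = CONFIG_DEAD_MAGIC;   /* poison before releasing */
    free(c);
}
```

Checking guarded_config_ok() at the top of something like config_get_one() and aborting with a log message would turn the wild xhash_get() into an immediate, clearly labelled failure. It is best-effort: reused memory can hold anything, but a block now holding "status" would not match the magic.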

But, I

Re: mu-conference, gna, future

2017-09-19 Thread Greg Troxel

  Does anybody know what happened or is happening with GNA?  Might it be
  coming back?

Apparently it went away quite a while ago, and it's not coming back.

  Is anybody still using mu-conference?  (I'm not personally, but do
  know of one significant-size installation.)

No one spoke up.

  Is there a better way to do chatrooms with jabberd2?

No insight...

  Is anyone interested in continuing to maintain mu-conference?

No one spoke up.

  I am sort of inclined to create a project on gitlab (because it seems
  too much work to ask for nongnu.savannah.org, both for me and for the
  FSF staff, and I don't see that FSF hosting mu-conference advances the
  goals of Free Software enough to be worth their resources).
  Encouragement?  Objections?

Given the total silence, I have decided not to do anything about
mu-conference.  I have removed the pkgsrc package (because there is no
functioning upstream and the last release is old).




mu-conference, gna, future

2017-09-03 Thread Greg Troxel

mu-conference is an implementation of xmpp conference rooms.  It worked
with jabberd 1.4 (and might still) and works with jabberd 2.  It might
be the standard approach for conferencing with jabberd2:
  https://github.com/jabberd2/jabberd2/wiki/InstallGuide-MU-Conferencing

Around 2013 hosting moved to gna.org (FSFE?), both repo and ; I'm not
sure where it was before.

At some point I became one of the upstream maintainers and in 2013
released 0.8.81 to fix a timing-dependent startup bug as well as to get
a number of other features in a release.  Since then, there hasn't been
a vast outcry about bugs and it seems to have been mostly stable.

Most recently, it was hosted at gna.org, which I think used to be run by
FSFE, and has apparently disappeared.

Questions arising:

  Does anybody know what happened or is happening with GNA?  Might it be
  coming back?

  Is anybody still using mu-conference?  (I'm not personally, but do
  know of one significant-size installation.)

  Is there a better way to do chatrooms with jabberd2?

  Is anyone interested in continuing to maintain mu-conference?

  I am sort of inclined to create a project on gitlab (because it seems
  too much work to ask for nongnu.savannah.org, both for me and for the
  FSF staff, and I don't see that FSF hosting mu-conference advances the
  goals of Free Software enough to be worth their resources).
  Encouragement?  Objections?

  Anything else?
  

