The server had an uptime of about 50 days before this occurred. There were no problems, and nothing has changed in the 2 or so days since this problem began. Like I said previously, it seems to have started after reflashing and re-registering a student's XO, but I believe that to be a coincidence.

> - Are you perhaps using an AP that does its own DHCP? One way to
> check for certain is to connect an XO, and then grep /var/lib/dhcpd/
> (or is it /var/spool/dhcpd/ ?) for the MAC address of the XO....

We are using 5 wireless APs: 4 are Linksys WRT54Gs running DD-WRT and one is a D-Link modem/AP combo. DHCP is deactivated on all of the above.

> - Did you also leave XOs running connected to it, or were XOs
> completely disconnected?

I believe all XOs were disconnected. It is possible some were left connected while in their charging cabinets, but doubtful.

>Is there anything else that could be odd or non-standard in your
>setup? Are you in a VM? Is eth0 on the XS configured via dhcp with a
>short lease? Is there anything in the network between the XOs and the
>XS?

Nothing non-standard really. eth0 is fixed. However, this server came pre-installed from the folks involved with the Give One Get One program in Rwanda. I'm not sure what was modified from the stock server install. I am debating reinstalling the server from scratch.

I haven't been paying as much attention to the server lately as I should. As it had been running for about 50 days, I only checked in with the school periodically. There were problems, but mainly in relation to the presence service and reliably connecting 30-100 laptops to the network at one time. I attribute this behavior to the Linksys APs, as they only seem to handle about 20 connections per AP reliably. There is also a good amount of wireless interference to contend with; however, the server was working well. As it is a bit under-powered, load averages generally stay within the 1.2-1.5 range.

As I write this, the server has an uptime of about 9 hours. Load averages have reached 25 across the board. The dump files have consumed over a gig of space, filling up the root partition.
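
For the DHCP check suggested above (grepping the lease files for an XO's MAC address), a minimal shell sketch follows. It assumes the leases live under /var/lib/dhcpd; the quoted message itself is unsure whether the directory is /var/spool/dhcpd instead, so the directory is a parameter. The function name find_mac_in_leases and the placeholder MAC are made up for illustration.

```shell
#!/bin/sh
# find_mac_in_leases MAC [DIR]
# Hypothetical helper: list the files under DIR that mention MAC.
# DIR defaults to /var/lib/dhcpd, which is an assumption; pass
# /var/spool/dhcpd (or wherever the leases actually are) if needed.
find_mac_in_leases() {
    mac=$1
    dir=${2:-/var/lib/dhcpd}
    # -r: recurse, -l: print matching file names only,
    # -i: ignore case (lease files usually store MACs in lowercase),
    # -F: treat the MAC as a fixed string, not a regular expression.
    grep -rliF -- "$mac" "$dir"
}
```

Usage, with a placeholder MAC: `find_mac_in_leases AA:BB:CC:DD:EE:FF`. If the server's lease files list the XO, the server handed out the lease; if nothing is printed while the XO has an address, something else on the network is answering DHCP.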

>while true; do (echo `date -u `; vmstat; ps_mem.py | grep ejabberd;
>ejabberdctl connected-users | wc -l) >> mylog ; sleep 60 ; done;

Tried the script at night with the high load, and it cannot complete because the ejabberd node has since crashed. ejabberdctl yields the following error:

_________________________________________________________________________
RPC failed on the node ejabb...@schoolserver: {'EXIT',
    {badarg,
     [{ets,lookup,
       [hooks,
        {ejabberd_ctl_process, global}]},
      {ejabberd_hooks,run_fold,4},
      {ejabberd_ctl,process,1},
      {rpc, '-handle_call/3-fun-0-', 5}]}}
__________________________________________________________________

Individually issuing the commands:

# vmstat
Thu Dec 17 20:07:19 UTC 2009
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
25  0 705768  63912 123132 239040   53   92   153   711 1089  539 61 38  0  1  0

# ps_mem.py | grep ejabberd
No output

I've included a screenshot of htop for your viewing pleasure.
http://omploader.org/vMzBvZQ/htop_screen.jpg

I'll give you more relevant info tomorrow.

On Thu, Dec 17, 2009 at 12:16 PM, Martin Langhoff <martin.langh...@gmail.com> wrote:

> On Thu, Dec 17, 2009 at 1:12 PM, Martin Langhoff
> <martin.langh...@gmail.com> wrote
> > On Thu, Dec 17, 2009 at 11:35 AM, Devon Connolly <dev...@gmail.com>
> wrote:
> >> XS Version: 0.6
> >> 1 GB Physical Ram, 2GB Swap
> >
> > Ok - the RAM is on the low side for an XS but should handle 150 ok.
> >
> >> # ejabberdctl connected-users
> > ...
> > I counted 12 lines in the output of connected-users. That should not
> > cause trouble.
>
> Also - can you get your hands on ps_mem.py, and run it when the
> machine is getting into trouble? I want to correlate the output of
> ps_mem.py for ejabberd vs the number of connected users, run something
> like this on a console
>
>  while true; do (echo `date -u `; vmstat; ps_mem.py | grep ejabberd;
>  ejabberdctl connected-users | wc -l) >> mylog ; sleep 60 ; done;
>
> untested, may need tweaking to work properly. If you run it during the
> day and also during the night, will be most interesting.
>
> cheers,
>
>
> m
> --
>  martin.langh...@gmail.com
>  mar...@laptop.org -- School Server Architect
>  - ask interesting questions
>  - don't get distracted with shiny stuff - working code first
>  - http://wiki.laptop.org/go/User:Martinlanghoff
>
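
Since the logging loop stalls once the ejabberd node is down, here is a rough sketch of a variant that keeps sampling and records the failure instead. It is only a sketch: the function names log_sample and monitor are made up, and it assumes ejabberdctl exits non-zero when the RPC to the node fails, which I have not verified on XS 0.6.

```shell
#!/bin/sh
# log_sample LOGFILE
# Hypothetical helper: append one timestamped sample to LOGFILE.
log_sample() {
    logfile=$1
    {
        date -u
        # Record a note instead of aborting if vmstat is missing or fails.
        vmstat 2>&1 || echo "vmstat: unavailable"
        # grep finds nothing when ejabberd is not running (or ps_mem.py
        # is not installed); log that explicitly.
        ps_mem.py 2>/dev/null | grep ejabberd \
            || echo "ejabberd: not in ps_mem.py output"
        # ASSUMPTION: ejabberdctl returns a non-zero status when the node
        # is down. If so, log the failure rather than a bogus user count.
        if users=$(ejabberdctl connected-users 2>/dev/null); then
            # Count non-empty lines, one per connected user.
            printf 'connected users: %s\n' "$(printf '%s\n' "$users" | grep -c .)"
        else
            echo "ejabberdctl: failed (node down?)"
        fi
    } >> "$logfile"
}

# monitor [LOGFILE] [INTERVAL_SECONDS]
# Sample forever; defaults mirror the quoted loop (mylog, 60 seconds).
monitor() {
    logfile=${1:-mylog}
    interval=${2:-60}
    while true; do
        log_sample "$logfile"
        sleep "$interval"
    done
}
```

To use it, source the file in a console and run `monitor mylog 60`. Keeping the log on a partition other than root would avoid adding to the space problem described above.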
_______________________________________________
Server-devel mailing list
Server-devel@lists.laptop.org
http://lists.laptop.org/listinfo/server-devel