Hi all,

I overlooked it at first, but as I suspected there really was a memory issue, and Java invoked the oom-killer:

-------------------------------------------------
Nov 17 01:02:51 spacewalk1 kernel: java invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
Nov 17 01:02:51 spacewalk1 kernel: java cpuset=/ mems_allowed=0
Nov 17 01:02:51 spacewalk1 kernel: Pid: 2823, comm: java Not tainted 2.6.32-279.9.1.el6.x86_64 #1
Nov 17 01:02:51 spacewalk1 kernel: Call Trace:
Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff810c4c71>] ? cpuset_print_task_mems_allowed+0x91/0xb0
Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff811173e0>] ? dump_header+0x90/0x1b0
Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff81214a0c>] ? security_real_capable_noaudit+0x3c/0x70
Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff81117862>] ? oom_kill_process+0x82/0x2a0
Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff811177a1>] ? select_bad_process+0xe1/0x120
Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff81117ca0>] ? out_of_memory+0x220/0x3c0
Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff811279be>] ? __alloc_pages_nodemask+0x89e/0x940
Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff8115c51a>] ? alloc_pages_current+0xaa/0x110
Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff811147e7>] ? __page_cache_alloc+0x87/0x90
Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff8112a40b>] ? __do_page_cache_readahead+0xdb/0x210
Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff8112a561>] ? ra_submit+0x21/0x30
Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff81115b13>] ? filemap_fault+0x4c3/0x500
Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff8113ef14>] ? __do_fault+0x54/0x510
Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff8113f4c7>] ? handle_pte_fault+0xf7/0xb50
Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff810a467e>] ? futex_wake+0x10e/0x120
Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff81140104>] ? handle_mm_fault+0x1e4/0x2b0
Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff810a65e0>] ? do_futex+0x100/0xb60
Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff810444c9>] ? __do_page_fault+0x139/0x480
Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff81278bec>] ? rb_erase+0x1bc/0x310
Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff810097cc>] ? __switch_to+0x1ac/0x320
Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff814fddd0>] ? thread_return+0x4e/0x76e
Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff8150380e>] ? do_page_fault+0x3e/0xa0
Nov 17 01:02:51 spacewalk1 kernel: [<ffffffff81500bc5>] ? page_fault+0x25/0x30
-------------------------------------------------

Further:

-------------------------------------------------
Nov 17 01:02:51 spacewalk1 kernel: Out of memory: Kill process 2934 (java) score 118 or sacrifice child
Nov 17 01:02:51 spacewalk1 kernel: Killed process 2934, UID 0, (java) total-vm:1889112kB, anon-rss:193328kB, file-rss:228kB
Nov 17 01:02:51 spacewalk1 wrapper[2909]: JVM exited unexpectedly.
Nov 17 01:02:51 spacewalk1 wrapper[2909]: JVM exited in response to signal SIGKILL (9).
Nov 17 01:02:55 spacewalk1 wrapper[2909]: Launching a JVM...
Nov 17 01:03:26 spacewalk1 wrapper[2909]: Startup failed: Timed out waiting for signal from JVM.
Nov 17 01:03:26 spacewalk1 wrapper[2909]: JVM did not exit on request, terminated
Nov 17 01:03:26 spacewalk1 wrapper[2909]: JVM exited in response to signal SIGKILL (9).
Nov 17 01:03:31 spacewalk1 wrapper[2909]: Launching a JVM...
Nov 17 01:04:00 spacewalk1 wrapper[2909]: Startup failed: Timed out waiting for signal from JVM.
Nov 17 01:04:00 spacewalk1 wrapper[2909]: JVM did not exit on request, terminated
Nov 17 01:04:00 spacewalk1 wrapper[2909]: JVM exited in response to signal SIGKILL (9).
Nov 17 01:04:04 spacewalk1 wrapper[2909]: Launching a JVM...
Nov 17 01:04:34 spacewalk1 wrapper[2909]: Startup failed: Timed out waiting for signal from JVM.
Nov 17 01:04:34 spacewalk1 wrapper[2909]: JVM did not exit on request, terminated
Nov 17 01:04:34 spacewalk1 wrapper[2909]: JVM exited in response to signal SIGKILL (9).
Nov 17 01:04:38 spacewalk1 wrapper[2909]: Launching a JVM...
Nov 17 01:05:07 spacewalk1 wrapper[2909]: Startup failed: Timed out waiting for signal from JVM.
Nov 17 01:05:08 spacewalk1 wrapper[2909]: JVM did not exit on request, terminated
Nov 17 01:05:08 spacewalk1 wrapper[2909]: JVM exited in response to signal SIGKILL (9).
Nov 17 01:05:12 spacewalk1 wrapper[2909]: Launching a JVM...
Nov 17 01:05:41 spacewalk1 wrapper[2909]: Startup failed: Timed out waiting for signal from JVM.
Nov 17 01:05:41 spacewalk1 wrapper[2909]: JVM did not exit on request, terminated
Nov 17 01:05:41 spacewalk1 wrapper[2909]: JVM exited in response to signal SIGKILL (9).
Nov 17 01:05:41 spacewalk1 wrapper[2909]: There were 5 failed launches in a row, each lasting less than 300 seconds. Giving up.
Nov 17 01:05:41 spacewalk1 wrapper[2909]: There may be a configuration problem: please check the logs.
Nov 17 01:05:41 spacewalk1 wrapper[2909]: <-- Wrapper Stopped
-------------------------------------------------

The box has 2GB RAM (which is the minimum requirement according to https://fedorahosted.org/spacewalk/wiki/HowToInstall) and is currently only managing ~10 hosts. So maybe this is a Spacewalk issue after all.

Regards,

Wolfgang

----- Original Message -----
From: "Paul Robert Marino" <[email protected]>
To: [email protected]
Sent: Monday, 19 November, 2012 5:05:04 PM
Subject: Re: [Spacewalk-list] Spacewalk 1.7 w/ postgresql crashed

Well, here is the thing: someone restarted the database after it was killed by signal 9, and that's not something that happens on its own. So it was either an admin or a rogue app; either way, it wasn't Spacewalk.
I am curious, however: if it was on Fedora 17, there is a chance systemd may have respawned it, but I'm not sure.

On Mon, Nov 19, 2012 at 10:26 AM, Wolfgang Neudorfer <[email protected]> wrote:
> Hello Paul,
>
> nobody was logged in and the host is only reachable from a very small network
> range. I think I can say that nobody did "anything naughty".
>
> I cannot rule out that there was a memory issue and the oomkiller started its
> madness - but I don't see anything related to this in /var/log/messages.
>
> Any other ideas?
>
> Regards,
>
> Wolfgang
>
> ----- Original Message -----
> From: "Paul Robert Marino" <[email protected]>
> To: [email protected]
> Sent: Monday, 19 November, 2012 3:35:56 PM
> Subject: Re: [Spacewalk-list] Spacewalk 1.7 w/ postgresql crashed
>
> PostgreSQL was killed with a -9, which means someone hard-killed the process
> and then restarted it. Looks like someone was doing something naughty on your
> box.
> This is not a Spacewalk problem; this is a sysadmin who made a mistake and then
> didn't fess up to it.
>
> On Nov 19, 2012 4:18 AM, "Wolfgang Neudorfer" < [email protected] > wrote:
>
> Hi,
>
> starting Saturday 17/11/2012 01:46, our Spacewalk server started to send out
> multiple mails per minute (probably on each connection attempt of a client?)
> like this:
>
> -------------------------------------------------
> RHN TRACEBACK from spacewalk1:
>
> Exception reported from spacewalk1
> Time: Sat Nov 17 01:45:30 2012
> Exception type <class 'spacewalk.server.rhnSQL.sql_base.SQLConnectError'>
> Request object information:
> URI: /XMLRPC
> Remote Host: 192.168.254.xxx
> Server Name: spacewalk1:443
> Headers passed in:
> Accept-Encoding: identity
> CONTENT_LENGTH: 2325
> CONTENT_TYPE: text/xml
> DOCUMENT_ROOT: /var/www/html
> GATEWAY_INTERFACE: CGI/1.1
> HTTPS: 1
> HTTP_ACCEPT_ENCODING: identity
> HTTP_HOST: spacewalk1
> HTTP_USER_AGENT: rhn.rpclib.py/$Revision$
> HTTP_X_CLIENT_VERSION: 1
> HTTP_X_INFO: RPC Processor (C) Red Hat, Inc (version $Revision$)
> HTTP_X_RHN_TRANSPORT_CAPABILITY: follow-redirects=3
> HTTP_X_TRANSPORT_INFO: Extended Capabilities Transport (C) Red Hat, Inc (version $Revision$)
> Host: tsasecspacewalk1.sec
> PATH_INFO:
> QUERY_STRING:
> REMOTE_ADDR: 192.168.254.xxx
> REMOTE_PORT: 59649
> REQUEST_METHOD: POST
> REQUEST_URI: /XMLRPC
> SCRIPT_FILENAME: /usr/share/rhn/wsgi/xmlrpc.py
> SCRIPT_NAME: /XMLRPC
> SCRIPT_URI: https://tsasecspacewalk1.sec/XMLRPC
> SCRIPT_URL: /XMLRPC
> SERVER_ADDR: 192.168.254.xxx
> SERVER_ADMIN: root@localhost
> SERVER_NAME: spacewalk1
> SERVER_PORT: 443
> SERVER_PROTOCOL: HTTP/1.1
> SERVER_SIGNATURE: <address>Apache Server at spacewalk1 Port 443</address>
> SERVER_SOFTWARE: Apache
> User-Agent: rhn.rpclib.py/$Revision$
> X-Client-Version: 1
> X-Info: RPC Processor (C) Red Hat, Inc (version $Revision$)
> X-RHN-Transport-Capability: follow-redirects=3
> X-Transport-Info: Extended Capabilities Transport (C) Red Hat, Inc (version $Revision$)
> mod_wsgi.application_group: tsasecspacewalk1.sec|/xmlrpc
> mod_wsgi.callable_object: application
> mod_wsgi.handler_script:
> mod_wsgi.input_chunked: 0
> mod_wsgi.listener_host:
> mod_wsgi.listener_port: 443
> mod_wsgi.process_group:
> mod_wsgi.request_handler: wsgi-script
> mod_wsgi.script_reloading: 1
> mod_wsgi.version: (3, 2)
> wsgi.errors: <mod_wsgi.Log object at 0x7f8e4a83d370>
> wsgi.file_wrapper: <built-in method file_wrapper of mod_wsgi.Adapter object at 0x7f8e4a83c300>
> wsgi.input: <mod_wsgi.Input object at 0x7f8e4a83d330>
> wsgi.multiprocess: True
> wsgi.multithread: False
> wsgi.run_once: False
> wsgi.url_scheme: https
> wsgi.version: (1, 1)
> -------------------------------------------------
>
> Apparently, something happened to the postgres server. In the log I see:
>
> -------------------------------------------------
> LOG: server process (PID 31999) was terminated by signal 9: Killed
> LOG: terminating any other active server processes
> WARNING: terminating connection because of crash of another server process
> DETAIL: The postmaster has commanded this server process to roll back the
> current transaction and exit, because another server process exited
> abnormally and possibly corrupted shared memory.
>
> ... (the last 2 lines appear multiple times)
>
> FATAL: the database system is in recovery mode
> FATAL: the database system is in recovery mode
> FATAL: the database system is in recovery mode
> FATAL: the database system is in recovery mode
>
> ... (this line appears multiple times)
> -------------------------------------------------
>
> The hard disk was not full, and RAM was also OK. I restarted the host and
> Spacewalk seems to be fine. I can log in and all hosts are there.
>
> Any hints? I am running Spacewalk 1.7 on CentOS x64 6.3 with PostgreSQL
> 8.4.13.
>
> Thanks,
>
> Wolfgang
>
> _______________________________________________
> Spacewalk-list mailing list
> [email protected]
> https://www.redhat.com/mailman/listinfo/spacewalk-list
