Re: [systemd-devel] systemd+dbus: system boot stops at terminal login screen sometimes
Hi, 2011/12/17 Michal Schmidt mschm...@redhat.com: Chen Jie wrote: Let me explain it further. First syslogv() in dbus may block, while rsyslog.service is starting and meanwhile the kernel socket buffer was full. Attachment syslog-test.c was a program simulates the situation. So while rsyslog.service is starting, on the systemd side: 1. [ExecStartPre] stops systemd-kmsg-syslogd.service 2. It now running bus_init() to publish its API to dbus 3. Step 2 timeout, because dbus-daemon didn't reply in time. Why dbus-daemon didn't response quickly? Because it blocked on syslogv(), which was waiting for someone consumes the message(the kernel socket buffer was full), but sadly the consumer -- rsyslog didn't be started because systemd blocked. So systemd waits for dbus, and dbus waits for startup of rsyslog, rsyslog waits for systemd to start it. Thanks for debugging this. You are right. This kind of a deadlock is possible. Does the deadlock go away if you just modify rsyslog.service so that instead of stopping systemd-kmsg-syslogd in ExecStartPre it would do it in ExecStartPost? Yeah, it do the trick. IMHO, We need to put functions that may block in separate threads, for example bus_init(), shutdown_connection(), log_meta(). Oh, I'm sure this works, but I'm scared of the additional complexity of threads. Agree, but IMHO, from the architectural point of view, smooth running of systemd should not depend on quick response of outside processes(rsyslogd, dbus). Regards, -- Chen Jie ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] systemd+dbus: system boot stops at terminal login screen sometimes
- Original Message - Does the deadlock go away if you just modify rsyslog.service so that instead of stopping systemd-kmsg-syslogd in ExecStartPre it would do it in ExecStartPost? Yeah, it do the trick. Great. People who are hitting this deadlock can use this as a temprorary workaround. IMHO, We need to put functions that may block in separate threads, for example bus_init(), shutdown_connection(), log_meta(). Oh, I'm sure this works, but I'm scared of the additional complexity of threads. Agree, but IMHO, from the architectural point of view, smooth running of systemd should not depend on quick response of outside processes(rsyslogd, dbus). Yes. systemd at least partially defends itself from a hanging syslog by setting SO_SNDTIMEO to 5 seconds. Maybe even that is too much and we could just make the socket completely non-blocking and just fallback to kmsg when we get EAGAIN. I believe there is a way to connect and register to DBus fully asynchronously too. I'm looking into this. Michal ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] systemd+dbus: system boot stops at terminal login screen sometimes
Michal Schmidt wrote: systemd at least partially defends itself from a hanging syslog by setting SO_SNDTIMEO to 5 seconds. Maybe even that is too much and we could just make the socket completely non-blocking and just fallback to kmsg when we get EAGAIN. I believe there is a way to connect and register to DBus fully asynchronously too. I'm looking into this. It's now implemented in git. Please give it a try. Michal ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] systemd+dbus: system boot stops at terminal login screen sometimes
Hi, 2011/12/20 Michal Schmidt mschm...@redhat.com: Michal Schmidt wrote: systemd at least partially defends itself from a hanging syslog by setting SO_SNDTIMEO to 5 seconds. Maybe even that is too much and we could just make the socket completely non-blocking and just fallback to kmsg when we get EAGAIN. I believe there is a way to connect and register to DBus fully asynchronously too. I'm looking into this. It's now implemented in git. Please give it a try. Thank you for the work. Just as the commit message mentioned -- dbus_connection_open_private() itself is still synchronous, I did find some dbus_connection_open_private() failures in my debugging before(though at a very low occurrence rate). -- that's why I chose the thread way. Also, the shutdown_connection() may block in theory, since it invokes dbus_connection_flush() Regards, -- Chen Jie ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] systemd+dbus: system boot stops at terminal login screen sometimes
Chen Jie wrote: Let me explain it further. First syslogv() in dbus may block, while rsyslog.service is starting and meanwhile the kernel socket buffer was full. Attachment syslog-test.c was a program simulates the situation. So while rsyslog.service is starting, on the systemd side: 1. [ExecStartPre] stops systemd-kmsg-syslogd.service 2. It now running bus_init() to publish its API to dbus 3. Step 2 timeout, because dbus-daemon didn't reply in time. Why dbus-daemon didn't response quickly? Because it blocked on syslogv(), which was waiting for someone consumes the message(the kernel socket buffer was full), but sadly the consumer -- rsyslog didn't be started because systemd blocked. So systemd waits for dbus, and dbus waits for startup of rsyslog, rsyslog waits for systemd to start it. Thanks for debugging this. You are right. This kind of a deadlock is possible. Does the deadlock go away if you just modify rsyslog.service so that instead of stopping systemd-kmsg-syslogd in ExecStartPre it would do it in ExecStartPost? IMHO, We need to put functions that may block in separate threads, for example bus_init(), shutdown_connection(), log_meta(). Oh, I'm sure this works, but I'm scared of the additional complexity of threads. Michal ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] systemd+dbus: system boot stops at terminal login screen sometimes
2011/12/2 Chen Jie ch...@lemote.com: Hey, After did many experiments, I found it was one iteration of manager_loop spent ~25s: * dbus.service: service_enter_start_post() - service_enter_running() -- service_set_state(s, SERVICE_RUNNING) --- unit_notify() bus_init() - bus_init_api() bus_ini_api() failed to add_match for 'NameOwnerChanged' (reason: timeout, and the default timeout value in dbus lib was 25s) In another hand, dbus-daemon failed to reply in time because it tried to log a message, meanwhile rsyslog.service was starting: 1. In ExecStartPre stage(/bin/systemctl stop systemd-kmsg-syslogd.service), systemctl failed to receive replies in time because the manager_loop of systemd was blocked! or 2. After ExecStartPre stage, manager_loop was blocked for 25s, and then do ExecStart. Then dbus-daemon prompted Failed to activate service 'org.freedesktop.systemd1': timed out, and then entered a busy-loop in pending_activation_failed() http://cgit.freedesktop.org/dbus/dbus/tree/bus/activation.c?id=7dbfa45153167ee53487630f8109d32860cb6263#n1280 (An _dbus_return_val_if_fail(reply_serial != 0, FALSE) failure was observed in gdb, still don't how this happened) Let me explain it further. First syslogv() in dbus may block, while rsyslog.service is starting and meanwhile the kernel socket buffer was full. Attachment syslog-test.c was a program simulates the situation. So while rsyslog.service is starting, on the systemd side: 1. [ExecStartPre] stops systemd-kmsg-syslogd.service 2. It now running bus_init() to publish its API to dbus 3. Step 2 timeout, because dbus-daemon didn't reply in time. Why dbus-daemon didn't response quickly? Because it blocked on syslogv(), which was waiting for someone consumes the message(the kernel socket buffer was full), but sadly the consumer -- rsyslog didn't be started because systemd blocked. So systemd waits for dbus, and dbus waits for startup of rsyslog, rsyslog waits for systemd to start it. IMHO, We need to put functions that may block in separate threads, for example bus_init(), shutdown_connection(), log_meta(). Attachment 0001-Add-new-API-manager_add_work_thread.patch and 0002-dbus.c-hooks-to-the-manager_add_work_thread-API.patch are two patches that make bus_init run in thread and hence not block the manager_loop. Regards, -- Chen Jie #include stdio.h #include unistd.h #include syslog.h #include sys/time.h #include stdint.h int main(void) { int c, i; struct timeval tv1, tv2; openlog(slog, LOG_PID | LOG_PERROR, LOG_DAEMON); printf(Now edit /lib/systemd/system/rsyslog.service.\n Add a line \ExecStartPre=/bin/sleep 30\ just after the original ExecStartPre\n Run \systemctl daemon-reload systemctl restart rsyslog.service\\n Then press enter to continue? See how many time it may spend on sending a log\n); c = getc(stdin); printf(Starting the test...\n); for (i = 0; i 20; i++) { gettimeofday(tv1, NULL); syslog(LOG_INFO, iteration %d, i); gettimeofday(tv2, NULL); if (tv2.tv_sec - tv1.tv_sec 2) { printf(In %d iteration: syslog() spend more than 2 secs (%lu,%lu) - (%lu,%lu)\n, i, tv1.tv_sec, tv1.tv_usec, tv2.tv_sec, tv2.tv_usec); break; } } printf(Test finished.\n); closelog(); return 0; } From 78a56b263844b1db7fb29a7f6c0c4a4b0d07ad80 Mon Sep 17 00:00:00 2001 From: cee1 fykc...@gmail.com Date: Wed, 14 Dec 2011 14:25:39 +0800 Subject: [PATCH 1/2] Add new API: manager_add_work_thread() The work thread was used to finish work in a seperate thread. It can be specified a notify function, which will be called in manager_loop, When process watch of type WATCH_WORK_THREAD. --- src/manager.c | 197 - src/manager.h | 22 ++- 2 files changed, 217 insertions(+), 2 deletions(-) diff --git a/src/manager.c b/src/manager.c index 67a..387e130 100644 --- a/src/manager.c +++ b/src/manager.c @@ -36,6 +36,7 @@ #include sys/types.h #include sys/stat.h #include dirent.h +#include pthread.h #ifdef HAVE_AUDIT #include libaudit.h @@ -71,6 +72,8 @@ #define NOTIFY_SOCKET_SYSTEM /run/systemd/notify #define NOTIFY_SOCKET_USER @/org/freedesktop/systemd1/notify +static void wait_all_work_threads_done(Manager *m); + static int manager_setup_notify(Manager *m) { union { struct sockaddr sa; @@ -217,6 +220,27 @@ static int manager_setup_signals(Manager *m) { return 0; } +static int manager_setup_work_threads(Manager *m) { +int p[2] = { -1, -1 }; +struct epoll_event ev; + +if (pipe2(p, O_NONBLOCK | O_CLOEXEC) 0) +return -errno; + +m-work_thread_watch.type = WATCH_WORK_THREAD; +m-work_thread_watch.fd = p[0]; +m-work_thread_notify_fd = p[1]; + +zero(ev); +ev.events = EPOLLIN; +ev.data.ptr = m-work_thread_watch; + +if (epoll_ctl(m-epoll_fd, EPOLL_CTL_ADD, m-work_thread_watch.fd, ev) 0) +
Re: [systemd-devel] systemd+dbus: system boot stops at terminal login screen sometimes
nobody reply? 于 2011年11月30日 17:49, Chen Jie 写道: Hi all, I found my system boot stops at terminal login screen sometimes under systemd v29 + dbus 1.4.16. I tried to login, after supplied username and password, it blocked with a blinking cursor. A primary debug shows: * dbus-daemon was doing a busy-loop in pending_activation_failed() (see http://lists.freedesktop.org/archives/dbus/2011-November/014790.html) * There was a message Failed to activate service 'org.freedesktop.systemd1': timed out in syslog (see the full syslog at http://lists.freedesktop.org/archives/dbus/attachments/2025/e9c204bb/attachment-0001.obj) After I added more log funcs with gettimeofday generated timestamp (tv_sec, tv_usec), I found: * bus_init_api() calling dbus_bus_add_match failed with timeout -- the function starts at (1322641754,371276), ends at (1322641779,396943) * In dbus side, I found it spends ~25s to send a log to syslog: timestamp: (1322641754,372197) bus_context_log (activation-context, DBUS_SYSTEM_LOG_INFO, Activating systemd to hand-off: service name='%s' unit='%s', service_name, entry-systemd_service);// bus_activation_activate_service() @bus/activation.c timestamp: (1322641780,555) I guess there was something wrong with startup of rsyslog.service, and I found: * (1322641752,26061): [rsyslog.service]service_enter_start_pre() * (1322641753,382491): systemd-kmsg-syslogd[705]: systemd-kmsg-syslogd stopped as pid 705 * (1322641779,692927): [rsyslog.service]service_enter_start() Start rsyslog.service will stop systemd-kmsg-syslogd.service first, so I added more log funcs in systemctl: * systemctl[1216]: start_unit_one(): (1322641753,229083)-(1322641753,383118) * systemctl[1216]: wait_filter() (1322641753,465392)Got D-Bus request: org.freedesktop.systemd1.Manager.JobRemoved() on /org/freedesktop/systemd1 * systemctl[1216]: wait_for_jobs(): (1322641753,468987) finished while (!set_isempty(s) dbus_connection_read_write_dispatch(bus, -1)) systemctl stop systemd-kmsg-syslogd.service didn't much time. The content of rsyslog.service: [Unit] Description=System Logging Service [Service] ExecStartPre=/bin/systemctl stop systemd-kmsg-syslogd.service ExecStart=/usr/sbin/rsyslogd -n -c5 Sockets=syslog.socket [Install] WantedBy=multi-user.target Any idea? BTW, the gap between systemd-kmsg-syslogd.service stopped and rsyslog not ready will make it lost some log messages. Regards, --- Chen Jie ___ dbus mailing list d...@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dbus -- Best Regards Tang Ke (Application develop of software department) Tel:0086-512-52308628 Fax:0086-512-52308688 Phone:18962393077 E-mail:ta...@lemote.com MSN:ta...@lemote.com, mumut...@gmail.com Web: http://www.lemote.com http://www.lemote.com/ JiangSu ZhongKe Lemote Technology Co.,Ltd MengLan Industry Park,YuShan,ChangShu City,JiangSu,China attachment: tangk.vcf___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] systemd+dbus: system boot stops at terminal login screen sometimes
Hi, Thanks for the reply. 2011/11/30 Michal Schmidt mschm...@redhat.com: On 11/30/2011 10:49 AM, Chen Jie wrote: (see the full syslog at http://lists.freedesktop.org/archives/dbus/attachments/2025/e9c204bb/attachment-0001.obj) The kernel modules failing to load indicate a problem with your kernel installation. Please sort this problem out first before attempting to debug anything else. I didn't install the modules of matched version for convenient. I've built-in basic modules, and this problem can be easily reproduced on a sanity kernel (just do a repeat reboot). I added more logs, and found two paths may cause the bus_context_log() to spend ~25s -- then Failed to add_match for 'NameOwnerChanged' in bus_init_api(): 1. ExecStartPre=/bin/systemctl stop systemd-kmsg-syslogd.service spent too much time in need_daemon_reload(), it encountered timeout at http://cgit.freedesktop.org/systemd/tree/src/systemctl.c?id=v29#n1080 2. In http://cgit.freedesktop.org/systemd/tree/src/manager.c?id=v29#n2051, a previous UNIT_VTABLE(u)-sigchld_event spent too much time ((1322740559, 397870)-(1322740585,528000)) BTW, the gap between systemd-kmsg-syslogd.service stopped and rsyslog not ready will make it lost some log messages. No, the socket remains open all the time and the messages will be buffered. Got it, I begin to remember the socket-activation mechanism. But I do lost some messages in systemctl(ExecStartPre of rsyslog.service), and after I adding a log_set_target(LOG_TARGET_KMSG), the messages appear in syslog. Maybe log_parse_environment() set log target to 'console'? Regards, -- Chen Jie ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
[systemd-devel] systemd+dbus: system boot stops at terminal login screen sometimes
Hi all, I found my system boot stops at terminal login screen sometimes under systemd v29 + dbus 1.4.16. I tried to login, after supplied username and password, it blocked with a blinking cursor. A primary debug shows: * dbus-daemon was doing a busy-loop in pending_activation_failed() (see http://lists.freedesktop.org/archives/dbus/2011-November/014790.html) * There was a message Failed to activate service 'org.freedesktop.systemd1': timed out in syslog (see the full syslog at http://lists.freedesktop.org/archives/dbus/attachments/2025/e9c204bb/attachment-0001.obj) After I added more log funcs with gettimeofday generated timestamp (tv_sec, tv_usec), I found: * bus_init_api() calling dbus_bus_add_match failed with timeout -- the function starts at (1322641754,371276), ends at (1322641779,396943) * In dbus side, I found it spends ~25s to send a log to syslog: timestamp: (1322641754,372197) bus_context_log (activation-context, DBUS_SYSTEM_LOG_INFO, Activating systemd to hand-off: service name='%s' unit='%s', service_name, entry-systemd_service);// bus_activation_activate_service() @bus/activation.c timestamp: (1322641780,555) I guess there was something wrong with startup of rsyslog.service, and I found: * (1322641752,26061): [rsyslog.service]service_enter_start_pre() * (1322641753,382491): systemd-kmsg-syslogd[705]: systemd-kmsg-syslogd stopped as pid 705 * (1322641779,692927): [rsyslog.service]service_enter_start() Start rsyslog.service will stop systemd-kmsg-syslogd.service first, so I added more log funcs in systemctl: * systemctl[1216]: start_unit_one(): (1322641753,229083)-(1322641753,383118) * systemctl[1216]: wait_filter() (1322641753,465392)Got D-Bus request: org.freedesktop.systemd1.Manager.JobRemoved() on /org/freedesktop/systemd1 * systemctl[1216]: wait_for_jobs(): (1322641753,468987) finished while (!set_isempty(s) dbus_connection_read_write_dispatch(bus, -1)) systemctl stop systemd-kmsg-syslogd.service didn't much time. The content of rsyslog.service: [Unit] Description=System Logging Service [Service] ExecStartPre=/bin/systemctl stop systemd-kmsg-syslogd.service ExecStart=/usr/sbin/rsyslogd -n -c5 Sockets=syslog.socket [Install] WantedBy=multi-user.target Any idea? BTW, the gap between systemd-kmsg-syslogd.service stopped and rsyslog not ready will make it lost some log messages. Regards, --- Chen Jie ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] systemd+dbus: system boot stops at terminal login screen sometimes
On 11/30/2011 10:49 AM, Chen Jie wrote: (see the full syslog at http://lists.freedesktop.org/archives/dbus/attachments/2025/e9c204bb/attachment-0001.obj) The kernel modules failing to load indicate a problem with your kernel installation. Please sort this problem out first before attempting to debug anything else. BTW, the gap between systemd-kmsg-syslogd.service stopped and rsyslog not ready will make it lost some log messages. No, the socket remains open all the time and the messages will be buffered. Michal ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel