Re: [systemd-devel] systemd+dbus: system boot stops at terminal login screen sometimes

2011-12-19 Thread Chen Jie
Hi,

2011/12/17 Michal Schmidt mschm...@redhat.com:
 Chen Jie wrote:
 Let me explain it further.
 First syslogv() in dbus may block, while rsyslog.service is starting
 and meanwhile the kernel socket buffer was full.
 Attachment syslog-test.c was a program simulates the situation.

 So while rsyslog.service is starting, on the systemd side:
 1. [ExecStartPre] stops systemd-kmsg-syslogd.service
 2. It now running bus_init() to publish its API to dbus
 3. Step 2 timeout, because dbus-daemon didn't reply in time.

 Why dbus-daemon didn't response quickly? Because it blocked on
 syslogv(), which was waiting for someone consumes the message(the
 kernel socket buffer was full), but sadly the consumer -- rsyslog
 didn't be started because systemd blocked.

 So systemd waits for dbus, and dbus waits for startup of rsyslog,
 rsyslog waits for systemd to start it.

 Thanks for debugging this.
 You are right. This kind of a deadlock is possible.

 Does the deadlock go away if you just modify rsyslog.service so that
 instead of stopping systemd-kmsg-syslogd in ExecStartPre it would
 do it in ExecStartPost?
Yeah, it do the trick.


 IMHO, We need to put functions that may block in separate threads,
 for
 example bus_init(), shutdown_connection(), log_meta().

 Oh, I'm sure this works, but I'm scared of the additional
 complexity of threads.
Agree, but IMHO, from the architectural point of view, smooth running
of systemd should not depend on quick response of outside
processes(rsyslogd, dbus).



Regards,
-- Chen Jie
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] systemd+dbus: system boot stops at terminal login screen sometimes

2011-12-19 Thread Chen Jie
Hi,

2011/12/20 Michal Schmidt mschm...@redhat.com:
 Michal Schmidt wrote:
 systemd at least partially defends itself from a hanging syslog
 by setting SO_SNDTIMEO to 5 seconds. Maybe even that is too much and
 we could just make the socket completely non-blocking and just
 fallback to kmsg when we get EAGAIN.

 I believe there is a way to connect and register to DBus fully
 asynchronously too. I'm looking into this.

 It's now implemented in git. Please give it a try.
Thank you for the work.

Just as the commit message mentioned --
dbus_connection_open_private() itself is still synchronous, I did
find some dbus_connection_open_private() failures in my debugging
before(though at a very low occurrence rate). -- that's why I chose
the thread way.

Also, the shutdown_connection() may block in theory, since it invokes
dbus_connection_flush()



Regards,
-- Chen Jie
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] systemd+dbus: system boot stops at terminal login screen sometimes

2011-12-14 Thread Chen Jie
2011/12/2 Chen Jie ch...@lemote.com:
 Hey,
 After did many experiments, I found it was one iteration of
 manager_loop spent ~25s:
 * dbus.service: service_enter_start_post()
  - service_enter_running()
  -- service_set_state(s, SERVICE_RUNNING)
  --- unit_notify()
   bus_init() - bus_init_api()
 bus_ini_api()  failed to add_match for 'NameOwnerChanged' (reason:
 timeout, and the default timeout value in dbus lib was 25s)

 In another hand, dbus-daemon failed to reply in time because it tried
 to log a message, meanwhile rsyslog.service was starting:
 1. In ExecStartPre stage(/bin/systemctl stop
 systemd-kmsg-syslogd.service), systemctl failed to receive replies in
 time because the manager_loop of systemd was blocked!   or
 2. After ExecStartPre stage, manager_loop was blocked for 25s, and
 then do ExecStart.

 Then dbus-daemon prompted Failed to activate service
 'org.freedesktop.systemd1': timed out, and then entered a busy-loop
 in pending_activation_failed()
 http://cgit.freedesktop.org/dbus/dbus/tree/bus/activation.c?id=7dbfa45153167ee53487630f8109d32860cb6263#n1280
 (An _dbus_return_val_if_fail(reply_serial != 0, FALSE)  failure was
 observed in gdb, still don't how this happened)

Let me explain it further.
First syslogv() in dbus may block, while rsyslog.service is starting
and meanwhile the kernel socket buffer was full.
Attachment syslog-test.c was a program simulates the situation.

So while rsyslog.service is starting, on the systemd side:
1. [ExecStartPre] stops systemd-kmsg-syslogd.service
2. It now running bus_init() to publish its API to dbus
3. Step 2 timeout, because dbus-daemon didn't reply in time.

Why dbus-daemon didn't response quickly? Because it blocked on
syslogv(), which was waiting for someone consumes the message(the
kernel socket buffer was full), but sadly the consumer -- rsyslog
didn't be started because systemd blocked.

So systemd waits for dbus, and dbus waits for startup of rsyslog,
rsyslog waits for systemd to start it.

IMHO, We need to put functions that may block in separate threads, for
example bus_init(), shutdown_connection(), log_meta().

Attachment 0001-Add-new-API-manager_add_work_thread.patch and
0002-dbus.c-hooks-to-the-manager_add_work_thread-API.patch are two
patches that make bus_init run in thread and hence not block the
manager_loop.



Regards,
-- Chen Jie
#include stdio.h
#include unistd.h
#include syslog.h
#include sys/time.h
#include stdint.h

int main(void)
{
	int c, i;
	struct timeval tv1, tv2;

	openlog(slog, LOG_PID | LOG_PERROR, LOG_DAEMON);

	printf(Now edit /lib/systemd/system/rsyslog.service.\n
	   Add a line \ExecStartPre=/bin/sleep 30\ just after the original ExecStartPre\n
	   Run \systemctl daemon-reload  systemctl restart rsyslog.service\\n
	   Then press enter to continue? See how many time it may spend on sending a log\n);
	c = getc(stdin);

	printf(Starting the test...\n);
	for (i = 0; i  20; i++) {
		gettimeofday(tv1, NULL);
		syslog(LOG_INFO, iteration %d, i);
		gettimeofday(tv2, NULL);

		if (tv2.tv_sec - tv1.tv_sec  2) {
			printf(In %d iteration: syslog() spend more than 2 secs (%lu,%lu) - (%lu,%lu)\n,
			   i, tv1.tv_sec, tv1.tv_usec, tv2.tv_sec, tv2.tv_usec);
			break;
		}
	}
	printf(Test finished.\n);

	closelog();

	return 0;
}
From 78a56b263844b1db7fb29a7f6c0c4a4b0d07ad80 Mon Sep 17 00:00:00 2001
From: cee1 fykc...@gmail.com
Date: Wed, 14 Dec 2011 14:25:39 +0800
Subject: [PATCH 1/2] Add new API: manager_add_work_thread()

The work thread was used to finish work in a seperate thread.
It can be specified a notify function, which will be called in manager_loop,
When process watch of type WATCH_WORK_THREAD.
---
 src/manager.c |  197 -
 src/manager.h |   22 ++-
 2 files changed, 217 insertions(+), 2 deletions(-)

diff --git a/src/manager.c b/src/manager.c
index 67a..387e130 100644
--- a/src/manager.c
+++ b/src/manager.c
@@ -36,6 +36,7 @@
 #include sys/types.h
 #include sys/stat.h
 #include dirent.h
+#include pthread.h
 
 #ifdef HAVE_AUDIT
 #include libaudit.h
@@ -71,6 +72,8 @@
 #define NOTIFY_SOCKET_SYSTEM /run/systemd/notify
 #define NOTIFY_SOCKET_USER @/org/freedesktop/systemd1/notify
 
+static void wait_all_work_threads_done(Manager *m);
+
 static int manager_setup_notify(Manager *m) {
 union {
 struct sockaddr sa;
@@ -217,6 +220,27 @@ static int manager_setup_signals(Manager *m) {
 return 0;
 }
 
+static int manager_setup_work_threads(Manager *m) {
+int p[2] = { -1, -1 };
+struct epoll_event ev;
+
+if (pipe2(p, O_NONBLOCK | O_CLOEXEC)  0)
+return -errno;
+
+m-work_thread_watch.type = WATCH_WORK_THREAD;
+m-work_thread_watch.fd = p[0];
+m-work_thread_notify_fd = p[1];
+
+zero(ev);
+ev.events = EPOLLIN;
+ev.data.ptr = m-work_thread_watch;
+
+if (epoll_ctl(m-epoll_fd, EPOLL_CTL_ADD, m-work_thread_watch.fd, ev)  0

Re: [systemd-devel] systemd+dbus: system boot stops at terminal login screen sometimes

2011-12-01 Thread Chen Jie
Hi,

Thanks for the reply.

2011/11/30 Michal Schmidt mschm...@redhat.com:
 On 11/30/2011 10:49 AM, Chen Jie wrote:

 (see the full syslog at

 http://lists.freedesktop.org/archives/dbus/attachments/2025/e9c204bb/attachment-0001.obj)


 The kernel modules failing to load indicate a problem with your kernel
 installation. Please sort this problem out first before attempting to debug
 anything else.
I didn't install the modules of matched version for convenient. I've
built-in basic modules, and this problem can be easily reproduced on a
sanity kernel (just do a repeat reboot).

I added more logs, and found two paths may cause the
bus_context_log() to spend ~25s -- then Failed to add_match for
'NameOwnerChanged' in bus_init_api():
1. ExecStartPre=/bin/systemctl stop systemd-kmsg-syslogd.service
spent too much time in need_daemon_reload(), it encountered timeout at
http://cgit.freedesktop.org/systemd/tree/src/systemctl.c?id=v29#n1080

2. In http://cgit.freedesktop.org/systemd/tree/src/manager.c?id=v29#n2051,
a previous UNIT_VTABLE(u)-sigchld_event spent too much time
((1322740559, 397870)-(1322740585,528000))



 BTW, the gap between systemd-kmsg-syslogd.service stopped and rsyslog
 not ready will make it lost some log messages.


 No, the socket remains open all the time and the messages will be buffered.
Got it, I begin to remember the socket-activation mechanism. But I do
lost some messages in systemctl(ExecStartPre of rsyslog.service), and
after I adding a log_set_target(LOG_TARGET_KMSG), the messages appear
in syslog. Maybe log_parse_environment() set log target to 'console'?



Regards,
-- Chen Jie
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] systemd+dbus: system boot stops at terminal login screen sometimes

2011-11-30 Thread Chen Jie
Hi all,

I found my system boot stops at terminal login screen sometimes under
systemd v29 + dbus 1.4.16. I tried to login, after supplied username
and password, it blocked with a blinking cursor.

A primary debug shows:
* dbus-daemon was doing a busy-loop in pending_activation_failed()
(see http://lists.freedesktop.org/archives/dbus/2011-November/014790.html)
* There was a message Failed to activate service
'org.freedesktop.systemd1': timed out in syslog (see the full syslog
at 
http://lists.freedesktop.org/archives/dbus/attachments/2025/e9c204bb/attachment-0001.obj)

After I added more log funcs with gettimeofday generated timestamp
(tv_sec, tv_usec), I found:
* bus_init_api() calling dbus_bus_add_match failed with timeout -- the
function starts at (1322641754,371276), ends at  (1322641779,396943)
* In dbus side, I found it spends ~25s to send a log to syslog:
timestamp: (1322641754,372197)
bus_context_log (activation-context,
   DBUS_SYSTEM_LOG_INFO, Activating
systemd to hand-off: service name='%s' unit='%s',
   service_name,
   entry-systemd_service);//
bus_activation_activate_service() @bus/activation.c
timestamp: (1322641780,555)

I guess there was something wrong with startup of rsyslog.service, and I found:
* (1322641752,26061): [rsyslog.service]service_enter_start_pre()
* (1322641753,382491): systemd-kmsg-syslogd[705]: systemd-kmsg-syslogd
stopped as pid 705
* (1322641779,692927): [rsyslog.service]service_enter_start()

Start rsyslog.service will stop systemd-kmsg-syslogd.service first, so
I added more log funcs in systemctl:
* systemctl[1216]: start_unit_one(): (1322641753,229083)-(1322641753,383118)
* systemctl[1216]: wait_filter() (1322641753,465392)Got D-Bus request:
org.freedesktop.systemd1.Manager.JobRemoved() on
/org/freedesktop/systemd1
* systemctl[1216]: wait_for_jobs(): (1322641753,468987) finished
while (!set_isempty(s) dbus_connection_read_write_dispatch(bus,
-1))
systemctl stop systemd-kmsg-syslogd.service didn't much time.


The content of rsyslog.service:
[Unit]
Description=System Logging Service

[Service]
ExecStartPre=/bin/systemctl stop systemd-kmsg-syslogd.service
ExecStart=/usr/sbin/rsyslogd -n -c5
Sockets=syslog.socket

[Install]
WantedBy=multi-user.target


Any idea?

BTW, the gap between systemd-kmsg-syslogd.service stopped and rsyslog
not ready will make it lost some log messages.



Regards,
--- Chen Jie
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] Question about Job dev-tty1.device/start timed out ?

2011-10-20 Thread Chen Jie
Hi all,

I occasionally got no getty on tty[1-6]. When I went through the dmesg and
found the following:

[   40.679687] 28systemd[1]: Job dev-tty1.device/start timed out.
[   40.683593] 31systemd[1]: Job dev-tty1.device/start finished,
result=timeout
[   40.683593] 31systemd[1]: Job getty@tty1.service/start finished,
result=dependency
[   40.859375] 29systemd[1]: Job getty@tty1.service/start failed with
result 'dependency'.
[   40.925781] 29systemd[1]: Job dev-tty1.device/start failed with result
'timeout'.
[   40.988281] 31systemd[1]: getty.target changed dead - active
[   41.046875] 31systemd[1]: Job getty.target/start finished, result=done


What may cause dev-ttyN.device timed out?




Regards,
Chen Jie
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] udev took too long to start

2011-05-22 Thread Chen Jie
2011/5/20 Lennart Poettering lenn...@poettering.net:
 On Fri, 20.05.11 17:40, Chen Jie (ch...@lemote.com) wrote:

 2011/5/19 Chen Jie ch...@lemote.com:
  2011/5/18 Kay Sievers kay.siev...@vrfy.org:
 
  Completely different! That's an Intel Core Duo 1.4 GHz laptop:
   systemd-analyze blame | grep udev
   87ms udev-trigger.service
   13ms udev.service
  I updated systemd(to v26) and udev(to 168), still got ~1s startup time
  of udev.service plus udev-trigger.service.
 
  May be some distribution added udev rules cause this? I'll do a
  bootchart to see the detail.
 See the attachment.

 There are a ton of things udev seems to be calling. A lot of it doesn't
 look right. i.e. systemctl being called from udev doesn't look
 right. And ps? grep? hdparm?? sync??? alsa-utils looks wrong too? mount?
Did some investigation, here is the result:
1. grep is invoked by 80-networking.rules, which is added to help to
parse debian's network setting.
-- dpkg -S /lib/udev/net.agent - udev: /lib/udev/net.agent

2. hdparm is called by z60_hdparm.rules, imported by deb package hdparm.
-- z60_hdparm.rules is a symbol link to /etc/udev/hdparm.rules

3. alsa-utils is called by 80-alsa.rules, imported by deb package alsa-utils.
-- dpkg -S /lib/udev/alsa-utils - alsa-utils: /lib/udev/alsa-utils

4. mount is called by 50-udev-default.rules, I guess it was shipped
with vanilla udev, since the same rule exists on my gentoo box...
-- 50-udev-default.rules:102:  RUN+=/bin/mount -t fusectl fusectl
/sys/fs/fuse/connections


 The exim script looks really borked too. And the mkdir/rm sprinkled
 around dbus/rsyslog is suspicous too.

 I think the distro you are using is a bit too hack-happy... Your
 downstream udev maintainers really should spend some time on cleaning up
 those udev rules. Upstream udev doesn't ship that nonsense!

 Lennart

 --
 Lennart Poettering - Red Hat, Inc.
 ___
 systemd-devel mailing list
 systemd-devel@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/systemd-devel

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] udev took too long to start

2011-05-19 Thread Chen Jie
2011/5/18 Kay Sievers kay.siev...@vrfy.org:

 Completely different! That's an Intel Core Duo 1.4 GHz laptop:
  systemd-analyze blame | grep udev
  87ms udev-trigger.service
  13ms udev.service
I updated systemd(to v26) and udev(to 168), still got ~1s startup time
of udev.service plus udev-trigger.service.

May be some distribution added udev rules cause this? I'll do a
bootchart to see the detail.


 Care to provide the full output of:
  time udevadm test /class/block/sda1
 ?
 real    0m0.146s
 user    0m0.084s
 sys     0m0.048s
 See udevadm_sda2.txt in detail.

 It seems not slow. But it looks a bit weird. You are sure there is
 nothing missing in the file?
It was gathered by redirecting the output, shouldn't miss anything...
What's wrong with the log? No scsi_id invocation message present?

 We never call scsi_id, ata_id in your output, but get the links from it?
  udev_rules_new: rules use 153420 bytes tokens (12785 * 12 bytes),
 29178 bytes buffer
  ...
  udev_rules_apply_to_event: LINK
 'disk/by-id/scsi-SATA_SAMSUNG_HM121HCS12SJD0S208504-part2'
 /lib/udev/rules.d/60-persistent-storage.rules:99
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [HEADSUP] systemd-analyze in git

2011-04-14 Thread Chen Jie
Good utility, thanks.

2011/3/31 Lennart Poettering lenn...@poettering.net

 Heya,

 I just added systemd-analyze to git. It's a little Python tool which
 can help you figuring out why your boot is not as fast as it could be.

 If you run systemd-analyze blame it will print a list of all units
 that have been started since boot, ordered by the time they took to
 start up. If your service appears right at the top of the list, then
 there's probably something to fix. But don't misunderstand this tool. It
 will simply look how long each service took to start, it doesn't give
 you any hint on why it did take so long. For example it could be because
 the service was waiting for another service or for user input (for
 example in the cryptsetup case). So, take the output with a grain of
 salt and then it is quite interesting. Let the blame game begin!

 If you run systemd-analyze plot  plot.svg  eog plot.svg then you'll
 look at a graphical plot of the boot processes showing how long we had
 to had to wait for which unit to start up. The startup time will be
 visible as a bright red bar. The runtime of a unit will be shown in
 light pink. The output is a bit like bootchart, but on a very different
 level, as this tells you something about the ordering and the units in
 systemd. To optimize your boot process you probably want to look at both
 outputs.

 For this to work properly you need to run systemd from git. Older
 versions of systemd do not keep track of timestamps properly, and this
 is fixed in git.

 On an older X300 with SSD and a reasonably complete Linux installation
 systemd needs less than 1s to spawn all services at boot and enter idle
 mode. If your installation takes longer, then you probably have a lot of
 room for optimization. (1s to idle does not mean the bootup was
 complete. It just means systemd started all processes it needed to start
 and the machine might still be busy with those. But early boot/fsck and
 stuff is all finished as are all SysV scripts, and graphical.target is
 reached).

 Lennart

 --
 Lennart Poettering - Red Hat, Inc.
 ___
 systemd-devel mailing list
 systemd-devel@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/systemd-devel

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Improve boot-time of systemd

2011-03-19 Thread Chen Jie
2011/3/18 Kay Sievers kay.siev...@vrfy.org:
 On Fri, Mar 18, 2011 at 14:40, Gustavo Sverzut Barbieri
 barbi...@profusion.mobi wrote:
 On Fri, Mar 18, 2011 at 10:52 AM, Andrey Borzenkov arvidj...@mail.ru wrote:
 On Fri, Mar 18, 2011 at 12:35 PM, fykc...@gmail.com fykc...@gmail.com 
 wrote:

 1. What can readahead affect boot-time?
 Sadly observed negative affect -- boot-time increases at least 1s.

 From subjective feelings (no real measurements) I confirm it.

 I noticed it as well, even with my Macbook Pro i7 with 8Gb RAM and
 128SSD using btrfs.

 It's ~0.5 sec faster here with readahead on a SSD.
Each time runs readahead-replay may cause readahead-collect to record
more blocks to read ahead -- size of /.readahead never reduces.

Also I found /.readahead recorded files like .xession-errors which
is written only, thus should not be read ahead.

Maybe adjust readahead-collect-done.timer to 5s will help.
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel