On Mon, 18.02.13 19:52, Holger Hans Peter Freyther (hol...@freyther.de) wrote:
> > On Mon, Feb 18, 2013 at 08:47:44AM -0800, David Strauss wrote: > > Dear David, all, > > > > It's possible that a lot of it could also allocate on the stack or use > > stack-style management of a pool in heap. This is pretty > > performance-critical code, and we're seeing similar CPU overhead. > > I think upstream doesn't care about the CPU overhead of the > journald. I can assure you that that's definitely not true. We do care about performance. That said, the journal is relatively new and obviously not optimized in all details for all machines, and we knowingly and frequently pick the more readable codepath rather than the faster one because readable and manageable code is important to us. You need to understand that on x86 journald (and systemd) are virtually "free" CPU-wise, these days. We develop on x86. And hence optimizing things was never the most important issue for us, because the codepaths never showed up badly anywhere anyway. That we develop on x86 doesn't mean we wouldn't care for embedded CPUs. We do, and we have optimized some things in this area already, but nobody yet gave this love to the journald component of systemd. I mean, udev's entire history is more or less one of optimizations. And in systemd you find quite a few changes that are about optimization and nothing else. So, if you found something that isn't as fast as it should be and shows up in profiling, then we really should optimize that, and am happy to look into this. However, as I don't even possess any machine with a CPU where these issues show up I am not sure I will be able to do this all by myself... For many string operations that might end up in inner loop we try to use alloca() these days, however for some that's really hard to get right, for example for sprintf() related loops... > I browsed through journald-server.c:dispatch_message_real and there are > several things one could do... there are certainly a lot more items. We should really base this all on profiling data. Without that premature optimizations will just make the code unreadable. > 1.) Several things are done over and over again... e.g. calls to the > have_effective_cap.. it is unlikely that it will change after the journald > has started. Does this really show up negatively in profiling runs? > 2.) Avoid allocation part one... > > r = get_process_cmdline(ucred->pid, 0, false, &t); > if (r >= 0) { > cmdline = strappend("_CMDLINE=", t); > free(t); > > if (cmdline) > IOVEC_SET_STRING(iovec[n++], cmdline); > } > > So instead of the strappend (which will do strdup..) one could make sure > that 't' has enough space to prepend the _CMDLINE= or the _EXE, or the > other strings. If this is indeed something that shows up negative in profiling we could just extend get_process_cmdline() with a prefix char* arg, that if non-NULL is prepended implicitly to the returned string. > 3.) Looking at code like... get_process_cmdline > > asprintf(&p, "/proc/%lu/cmdline", (unsigned long) pid) > > So first of all /proc/%lu/ is created over and over again for exe, comm, > cmdline, sessionid, cgroup... With a different API one could easily avoid > this. It would probably be best to just use strings on the stack for this anyway... > 4.) Still at the same method.. > > if (max_length == 0) { > size_t len = 1; > while ((c = getc(f)) != EOF) { > k = realloc(r, len+1); > if (k == NULL) { > > is most likely not the most effective way to handle the allocation. Or > maybe call get_process_cmdline with a max length? Actually, if you look in the glibc sources you will find that it does some exponentional size increasing anyway internally of realloc(). After realizing this we recently droppped a couple of our own exponential realloc() loops in favour of the simpler to read linear realloc(foo, len+1) loops. I'd be really surprised if this would show up in any profiling. If it did glibc's own exponential increas code would not work... > 5.) I wonder if some of the information could be sent from the systemd > to avoid the work in the journald... So it's a bit of a security thing too: the journal distuingishes "trusted" fields from "untrusted" ones. The former are collected by journald and read directly from the kernel and can thus not be faked. The latter are passed in from the clients, and can thus be faked data. The "trusted" fields are prefixed with an "_", and clients can never send fields prefixed with "_". > I will end up commenting out most of dispatch_message_real and check if > the CPU down drops to a low single digit. I would assume it will. This would certainly be interesting. Lennart -- Lennart Poettering - Red Hat, Inc. _______________________________________________ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel