On 11-07-11 03:29 PM, [email protected] wrote:
> > we do have some logic in the imfile to allow it to combine multiple line
> logs into one log entry,

Interesting.

> it has a problem right now that it doesn't
> sanitize the newlines so when you forward it things will probably get
> broken up again.

OK.

One thing I did notice when I sent a multiline message to rsyslogd just
using logger is that it converted the newline to a literal "#012".  Is
that what rsyslogd does with a multiline message?  Is it configurable?
Do most people really just have long syslog lines with a bunch of
"#012"s in them?

most people don't put multi-line messages in syslog :-) technically the syslog spec says that a new line starts a new message.

rsyslog defaults to escaping all control characters, so in a case where you do manage to get newlines inside a message, they should be escaped to #012 so that everything you send the message to will not break up the message.

> similar logic could probably be added to the kmesg input, but it is
> going to be tricky.

Yeah.  I don't think I imagined it would always be perfect.

> the problem that you have is how to clearly identify the beginning of a
> new message.

Indeed.  I was thinking of using time as a heuristic.  Perhaps since for
a given spate of messages, like an OOM or other stack trace, the
messages all come in so tightly packed (in terms of time) I wonder if
can be reasonable to allow an amount of time to expire to mark the end
of a multiline message.

the problem is that you can get _lots_ of messages in a short amount of time, and they may not all be related. also, it's actually pretty expensive to lookup the time when you are receiving a message.

> in imfile it can do three things
> > 1. each line is a separate log > > 2. if a line starts with whitespace, it's part of a earlier log entry

An interesting heuristic.  I wonder if we can apply this to the kernel,
or have the kernel fixed to abide by some such indicator.

> 3. log entries are separated by a blank line

Another interesting mechanism.  Although that does mean banning empty
lines elsewhere, not that I know that there are any used anywhere else.

> now, #2 and #3 both mean that the log entry cannot be sent until the
> next log entry arrives

Indeed.  With time however you only have to wait long enough.

> with messages from the kernel, if they have timestamps on the front of
> them added by the kernel, it is going to be hard to tell if it is a
> continuation or not, but without those timestampes (or if you teach
> kmesg input to skip them), I think that approach #2 will work.

Yeah, I need to look more deeply into how the kernel is pushing messages
up to userspace and what they look like in that channel.

> I would suggest dumping dmesg output from a few systems and try running
> them through the imfile mode #2 and see how the results look.

Yeah, that's not going to work just given the first sample I looked at
(sorry for the wrapping -- stupid thunderbird):

[ 7680.300076] INFO: task dbus-daemon:3613 blocked for more than 120
seconds.
[ 7680.300082] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 7680.300085] dbus-daemon     D f48035b0     0  3613      1 0x00000000
[ 7680.300091]  eff19dc0 00000086 f4803584 f48035b0 d9ea3af8 000006cf
f081daec c183a8c0
[ 7680.300098]  d9e2bc15 000006cf f081dae8 c183a8c0 c183a8c0 f48068c0
f081d860 f0b80ca0
[ 7680.300104]  c105685c f8eaa4e0 eff19d80 c1509dad eff19dbc f8e99444
0000b4c0 00000028
[ 7680.300111] Call Trace:
[ 7680.300122]  [<c105685c>] ? irq_exit+0x3c/0x80
[ 7680.300139]  [<c1509dad>] ? _raw_spin_lock+0xd/0x10
[ 7680.300166]  [<f8e99444>] ? rpcauth_lookup_credcache+0x134/0x1f0 [sunrpc]
[ 7680.300190]  [<f8f2921d>] ? nfs_access_get_cached+0x5d/0x160 [nfs]
[ 7680.300195]  [<c1508dc6>] __mutex_lock_slowpath+0xd6/0x140
[ 7680.300207]  [<f8f29342>] ? nfs_do_access+0x22/0xc0 [nfs]
[ 7680.300211]  [<c1508935>] mutex_lock+0x25/0x40
[ 7680.300216]  [<c1131ee7>] do_lookup+0x157/0x260
[ 7680.300219]  [<c11327bb>] link_path_walk+0x13b/0xb10
[ 7680.300224]  [<c1249ec3>] ? apparmor_file_alloc_security+0x23/0x50
[ 7680.300227]  [<c1249ec3>] ? apparmor_file_alloc_security+0x23/0x50
[ 7680.300231]  [<c1133404>] do_path_lookup+0x44/0x120
[ 7680.300234]  [<c113414f>] do_filp_open+0x18f/0x6e0
[ 7680.300238]  [<c113384a>] ? user_path_at+0x4a/0x80
[ 7680.300242]  [<c127bc12>] ? copy_to_user+0x42/0x60
[ 7680.300246]  [<c1136d50>] ? filldir+0x0/0xe0
[ 7680.300250]  [<c113fee9>] ? vfsmount_lock_local_unlock+0x19/0x20
[ 7680.300255]  [<c1126736>] do_sys_open+0x56/0x120
[ 7680.300258]  [<c112853d>] ? fput+0x1d/0x30
[ 7680.300261]  [<c112682e>] sys_open+0x2e/0x40
[ 7680.300265]  [<c150a1d4>] syscall_call+0x7/0xb

> I think that the kernel is pretty consistant about having every new
> message start at the beginning of a line, but I'm not sure that it
> indents everything (especially oops and stack trace messages)

Right.  It seems to return to position "0" for the Call Trace: line
before the stack trace as well as other places in what I would consider
a single message.

even with this being 3-4 messages, it would still be far more coherant than all of the lines arriving seprately (especially since separate lines may get reordered in transit)


> even if it doesn't indent them, you could add logic to the kmesg input
> module to watch for,

Indeed, I had wondered about teaching the kmesg input what various
multiline messages look like too, combined with time, it might be fairly
effective.

> but that
> could become very messy (and change from kernel version to kernel
> version),

Agreed.

> if the simple indentation approach does not work, I would
> suggest asking on the kernel mailing list before starting any other work.

Yes, good idea.  I'm not holding my breath, but worth a query.  :-)

do you want to make this query, or do you want me to?

David Lang

_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com

Reply via email to