Re: [PATCHSET] printk, netconsole: implement reliable netconsole

2015-04-21 Thread Stephen Hemminger
On Fri, 17 Apr 2015 13:17:12 -0400 (EDT)
David Miller da...@davemloft.net wrote:

 From: Tejun Heo t...@kernel.org
 Date: Fri, 17 Apr 2015 12:28:26 -0400
 
  On Sat, Apr 18, 2015 at 12:35:06AM +0900, Tetsuo Handa wrote:
  If the sender side can wait for retransmission, why can't we use
  userspace programs (e.g. rsyslogd)?
  
  Because the system may be oopsing, ooming or thrashing excessively
  rendering the userland inoperable and that's exactly when we want
  those log messages to be transmitted out of the system.
 
 If userland cannot run properly, it is almost certain that neither will
 your complex reliability layer logic.
 
 I tend to agree with Tetsuo, that in-kernel netconsole should remain
 as simple as possible and once it starts to have any smarts and less
 trivial logic the job belongs in userspace.

Keep existing netconsole as simple as possible. It is not meant as
reliable, secure logging.

Those who do not understand TCP are doomed to reinvent it
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCHSET] printk, netconsole: implement reliable netconsole

2015-04-20 Thread David Laight
From: Rob Landley
 Sent: 19 April 2015 08:25
 On Thu, Apr 16, 2015 at 6:03 PM, Tejun Heo t...@kernel.org wrote:
  In a lot of configurations, netconsole is a useful way to collect
  system logs; however, all netconsole does is simply emitting UDP
  packets for the raw messages and there's no way for the receiver to
  find out whether the packets were lost and/or reordered in flight.
 
 Except a modern nonsaturated LAN shouldn't do that.
 
 If you have two machines plugged into a hub, and that's _all_ that's
 plugged in, packets should never get dropped. The original use case
 of netconsole was that the sender and the receiver were plugged into
 the same router.
 
 However, even on a quite active LAN the days of ethernet doing CSMA/CD
 requiring retransmits are long gone, even 100baseT routers have been
 caching and retransmitting data internally so each connection can go
 at the full 11 megabytes/second with the backplane running fast enough
 to keep them all active at the same time. (That's why it's so hard to
 find a _hub_ anymore, it's all routers
...

Most machines are plugged into switches (not routers), many of which
will send 'pause' frames that the host MAC may act on.
In that case packet loss is not expected (unless you have broadcast
storms, when all bets are off).

Additionally, within a local network you shouldn't really get any packet
loss since no segments should be 100% loaded.
So for testing it is not unreasonable to expect no lost packets in netconsole
traffic.

David




Re: [PATCHSET] printk, netconsole: implement reliable netconsole

2015-04-20 Thread Tejun Heo
Hello, Rob.

On Sun, Apr 19, 2015 at 02:25:09AM -0500, Rob Landley wrote:
 If you have two machines plugged into a hub, and that's _all_ that's
 plugged in, packets should never get dropped. The original use case
 of netconsole was that the sender and the receiver were plugged into
 the same router.

Development aid on local network hasn't been the only use case for a
very long time now.  I haven't seen too many large scale setups and
two of them were using netconsole as a way to collect kernel messages
cluster-wide and having issues with lost messages.  One was running it
over a separate lower speed network from the main one which they used
for most managerial tasks including deployment and packet losses
weren't that unusual.

The other is running on the same network but the log collector isn't
per-rack so the packets end up getting routed through congested parts
of the network, again experiencing message losses.

 So are you trying to program around a problem you've actually _seen_,
 or are you attempting to reinvent TCP/IP yet again based on top of UDP
 (Drink!) because of a purely theoretical issue?

At larger scale, the problem is very real.  Let's forget about the
reliability part.  The main thing is being able to identify message
sequences so that the receiver can put the message streams back
together.

That said, once that's there, whether the reliability part is done
with TCP doesn't make that much of a difference as it'd still need to
put back the two message streams together, but again this doesn't
matter.  Let's just ignore this part.

  printk already keeps log metadata which contains enough information to
  make netconsole reliable.  This patchset does the following.
 
 Adds a giant amount of complexity without quite explaining why.

The only significant complexity is on the receiver side and it doesn't
even have to be in the kernel.  CON_EXTENDED and emitting extended
messages are pretty straight-forward changes.

Thanks.

-- 
tejun


Re: [PATCHSET] printk, netconsole: implement reliable netconsole

2015-04-19 Thread Rob Landley
On Thu, Apr 16, 2015 at 6:03 PM, Tejun Heo t...@kernel.org wrote:
 In a lot of configurations, netconsole is a useful way to collect
 system logs; however, all netconsole does is simply emitting UDP
 packets for the raw messages and there's no way for the receiver to
 find out whether the packets were lost and/or reordered in flight.

Except a modern nonsaturated LAN shouldn't do that.

If you have two machines plugged into a hub, and that's _all_ that's
plugged in, packets should never get dropped. The original use case
of netconsole was that the sender and the receiver were plugged into
the same router.

However, even on a quite active LAN the days of ethernet doing CSMA/CD
requiring retransmits are long gone, even 100baseT routers have been
caching and retransmitting data internally so each connection can go
at the full 11 megabytes/second with the backplane running fast enough
to keep them all active at the same time. (That's why it's so hard to
find a _hub_ anymore, it's all routers. The point of routers is they
cache internally and send the packet only out the connection they
should go, based on an internal routing table because they listened to
incoming packets to figure out who lives where and do arp-like
things.)

And of course gigabit is a point to point protocol that has nothing to
do with conventional ethernet at _all_ other than the name, as far as
I know it can't _not_ do this.

So are you trying to program around a problem you've actually _seen_,
or are you attempting to reinvent TCP/IP yet again based on top of UDP
(Drink!) because of a purely theoretical issue?

Or are you trying to route netconsole, unencapsulated, across the
internet? Of course the internet itself refusing to drop packets but
instead buffering and retransmitting them even when doing so turns out
to be a really bad idea is sort of where this whole bufferbloat
thing came from. So again, even in that context, is this a problem
you've actually _seen_?

 printk already keeps log metadata which contains enough information to
 make netconsole reliable.  This patchset does the following.

Adds a giant amount of complexity without quite explaining why.

Rob


Re: [PATCHSET] printk, netconsole: implement reliable netconsole

2015-04-18 Thread Tetsuo Handa
Tejun Heo wrote:
  If we can assume that scheduler is working, adding a kernel thread that
  does
  
while (1) {
read messages with metadata from /dev/kmsg
send them using UDP network
}
  
  might be easier than modifying netconsole module.
 
 But, I mean, if we are gonna do that in kernel, we better do it
 properly where it belongs.  What's up with easier than modifying
 netconsole module?  Why is netconsole special?  And how would the
 above be any less complex than a single timer function?  What am I
 missing?

User space daemon is sometimes disturbed unexpectedly due to

  (a) SIGKILL by OOM-killer
  (b) spurious ptrace() by somebody
  (c) spurious signals such as SIGSTOP / SIGINT
  (d) stalls triggered by page faults under OOM condition
  (e) other problems such as the scheduler not working

We have built-in protection against (a), namely /proc/$pid/oom_score_adj,
but we need to configure access control modules to protect against (b)
and (c), and we have no protection against (d). Judging from the OOM
stall discussion, (d) is fatal when trying to obtain kernel messages
under problematic conditions. I thought that a kernel thread that does

  while (1) {
  read messages with metadata from /dev/kmsg
  send them using UDP network
  }

is automatically protected from (a), (b), (c) and (d), and it could be
implemented outside of netconsole module.
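
As a minimal sketch of the protection mentioned for (a): a userspace
logger can ask the OOM killer to skip it by writing -1000 to its own
oom_score_adj file. The helper name set_oom_score_adj and its proc_root
parameter are made up for illustration; lowering the value normally
requires privilege (CAP_SYS_RESOURCE), and this does nothing about (b),
(c) or (d).

```python
def set_oom_score_adj(value: int, pid: str = "self",
                      proc_root: str = "/proc") -> None:
    """Write an OOM-killer adjustment for a process (-1000 exempts it)."""
    if not -1000 <= value <= 1000:
        raise ValueError("oom_score_adj must be between -1000 and 1000")
    # proc_root is a parameter only so the helper can be tried against a
    # scratch directory (an assumption of this sketch); normally it is /proc.
    with open(f"{proc_root}/{pid}/oom_score_adj", "w") as f:
        f.write(str(value))


# Typical use from a privileged logging daemon at startup:
#     set_oom_score_adj(-1000)
```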


Re: [PATCHSET] printk, netconsole: implement reliable netconsole

2015-04-17 Thread Tetsuo Handa
Tejun Heo wrote:
 On Sat, Apr 18, 2015 at 03:03:46AM +0900, Tetsuo Handa wrote:
  packet will be sufficient for finding out whether the packets were lost 
  and/or
  reordered in flight.
  
printk(Hello);
 = netconsole sends  Hello using UDP
printk(netconsole);
 = netconsole sends 0001 netconsole using UDP
printk(world\n);
 = netconsole sends 0002 world\n using UDP
  
  It might be nice to allow administrator to prefix a sequence number
  to netconsole messages for those who are using special receiver
  program (e.g. ncrx) which checks that sequence number.
 
 That said, this is pretty much what the first 12 patches do (except
 for the last printk patch, which can be taken out).  We already have
 sequencing and established format to expose them to userland - try cat
 /dev/kmsg, which btw is what local loggers on modern systems use
 anyway.  Why introduce netconsole's own version of metadata?

I didn't mean to introduce netconsole's own version of metadata.
I meant we don't need to implement in-kernel retry logic.

If we can assume that scheduler is working, adding a kernel thread that
does

  while (1) {
  read messages with metadata from /dev/kmsg
  send them using UDP network
  }

might be easier than modifying netconsole module.
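
For comparison, a rough userspace analogue of that loop might look like
the sketch below. It is not the kernel thread itself and is not taken
from any patch; the function names parse_kmsg_record and forward are
hypothetical. It assumes the documented /dev/kmsg record layout of
"prio,seq,timestamp_us,flags;text", where each read() returns one record.

```python
import socket


def parse_kmsg_record(record: str) -> dict:
    """Split one /dev/kmsg record into its header fields and message text."""
    header, _, body = record.partition(";")
    # Later kernels may append more comma-separated fields; keep the first 4.
    prio, seq, timestamp_us, flags = header.split(",")[:4]
    return {
        "facility": int(prio) >> 3,   # syslog prefix = facility * 8 + level
        "level": int(prio) & 7,
        "seq": int(seq),
        "timestamp_us": int(timestamp_us),
        "flags": flags,
        "text": body.split("\n", 1)[0],  # drop dictionary continuation lines
    }


def forward(kmsg_path: str, host: str, port: int) -> None:
    """Read records forever and send each one as a single UDP datagram."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    with open(kmsg_path, "rb", buffering=0) as kmsg:
        while True:
            try:
                record = kmsg.read(8192)  # unbuffered: one read, one record
            except BrokenPipeError:
                continue  # records were overwritten before we got to them
            sock.sendto(record, (host, port))
```

Unlike a kernel thread, this process is still exposed to the OOM killer,
signals and page-fault stalls.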

 
 Thanks.
 
 -- 
 tejun
 


Re: [PATCHSET] printk, netconsole: implement reliable netconsole

2015-04-17 Thread Tejun Heo
Hello,

On Sat, Apr 18, 2015 at 03:20:41AM +0900, Tetsuo Handa wrote:
 I didn't mean to introduce netconsole's own version of metadata.
 I meant we don't need to implement in-kernel retry logic.

Hmmm?  I'm not really following where this discussion is headed.  No,
we don't have to put it in the kernel.  We can punt the retry part to
userland as I wrote in another message at some cost to robustness.

 If we can assume that scheduler is working, adding a kernel thread that
 does
 
   while (1) {
   read messages with metadata from /dev/kmsg
   send them using UDP network
   }
 
 might be easier than modifying netconsole module.

But, I mean, if we are gonna do that in kernel, we better do it
properly where it belongs.  What's up with easier than modifying
netconsole module?  Why is netconsole special?  And how would the
above be any less complex than a single timer function?  What am I
missing?

Thanks.

-- 
tejun


Re: [PATCHSET] printk, netconsole: implement reliable netconsole

2015-04-17 Thread Tejun Heo
Just a bit of addition.

On Fri, Apr 17, 2015 at 01:37:54PM -0400, Tejun Heo wrote:
 Up to patch 12, it's just the same mechanism transferring extended
 messages.  It doesn't add any smartness to netconsole per-se except
 that it can now emit messages with metadata headers.  What do you
 think about them?

So, as long as netconsole can send messages with metadata header,
moving the reliability part to userland is trivial.  All that's
necessary is a program which follows /dev/kmsg, keeps the unacked
sequences and implements the same retransmission mechanism.  It'd be
less robust in certain failure scenarios and a bit more cumbersome to
set up but nothing major and if we do that there'd be no reason to
keep the userland part in the kernel tree.

If the retransmission and timer parts are the concern, moving those to
userland sounds like an acceptable compromise.
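
A minimal sketch of the bookkeeping such a userland program would need,
assuming cumulative acks and a fixed-size buffer. The class name
UnackedBuffer, its methods and the ack semantics are illustrative
assumptions, not the patchset's actual protocol.

```python
from collections import OrderedDict


class UnackedBuffer:
    """Hold sent messages until the receiver acknowledges them."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.pending = OrderedDict()  # seq -> message, oldest first

    def sent(self, seq: int, message: str) -> None:
        """Record a message that has just been transmitted."""
        self.pending[seq] = message
        while len(self.pending) > self.capacity:
            # Like the kernel log buffer, drop the oldest entry on overflow.
            self.pending.popitem(last=False)

    def ack(self, upto_seq: int) -> None:
        """Forget every message with a sequence number <= upto_seq."""
        for seq in [s for s in self.pending if s <= upto_seq]:
            del self.pending[seq]

    def to_retransmit(self, requested):
        """Return (seq, message) pairs still held for the requested numbers."""
        return [(s, self.pending[s]) for s in requested if s in self.pending]
```

Messages that fall off the front of the buffer before being acked are
simply lost, the same as when the log buffer itself overflows.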

Thanks.

-- 
tejun


Re: [PATCHSET] printk, netconsole: implement reliable netconsole

2015-04-17 Thread Tejun Heo
Hello, David.

On Fri, Apr 17, 2015 at 01:17:12PM -0400, David Miller wrote:
 If userland cannot run properly, it is almost certain that neither will
 your complex reliability layer logic.

* The bulk of patches are to pipe extended log messages to console
  drivers and let netconsole relay them to the receiver (and quite a
  bit of refactoring in the process), which, regardless of the
  reliability logic, is beneficial as we're currently losing
  structured logging (dictionary) and other metadata over consoles and
  regardless of where the reliability logic is implemented, it's a lot
  easier to have message IDs.

* The only things necessary for reliable transmission are timer and
  netpoll.  There sure are cases where they go down too but there's a
  pretty big gap between those two going down and userland getting
  hosed, but where to put the retransmission and reliability logic
  definitely is debatable.

* That said, the reliability part of the patch series is just two
  patches - 13 and 14, both of which are actually pretty simple.

 I tend to agree with Tetsuo, that in-kernel netconsole should remain
 as simple as possible and once it starts to have any smarts and less
 trivial logic the job belongs in userspace.

Up to patch 12, it's just the same mechanism transferring extended
messages.  It doesn't add any smartness to netconsole per-se except
that it can now emit messages with metadata headers.  What do you
think about them?

Thanks.

-- 
tejun


Re: [PATCHSET] printk, netconsole: implement reliable netconsole

2015-04-17 Thread Tetsuo Handa
Tejun Heo wrote:
 Hello, David.
 
 On Fri, Apr 17, 2015 at 01:17:12PM -0400, David Miller wrote:
  If userland cannot run properly, it is almost certain that neither will
  your complex reliability layer logic.
 
 * The bulk of patches are to pipe extended log messages to console
   drivers and let netconsole relay them to the receiver (and quite a
   bit of refactoring in the process), which, regardless of the
   reliability logic, is beneficial as we're currently losing
   structured logging (dictionary) and other metadata over consoles and
   regardless of where the reliability logic is implemented, it's a lot
   easier to have messages IDs.
 
 * The only thing necessary for reliable transmission are timer and
   netpoll.  There sure are cases where they go down too but there's a
   pretty big gap between those two going down and userland getting
   hosed, but where to put the retransmission and reliability logic
   definitely is debatable.
 
 * That said, the reliability part of the patch series are just two
   patches - 13 and 14, both of which are actually pretty simple.
 
  I tend to agree with Tetsuo, that in-kernel netconsole should remain
  as simple as possible and once it starts to have any smarts and less
  trivial logic the job belongs in userspace.
 
 Up to patch 12, it's just the same mechanism transferring extended
 messages.  It doesn't add any smartness to netconsole per-se except
 that it can now emit messages with metadata headers.  What do you
 think about them?

So, this patchset aims at obtaining kernel messages under problematic
conditions. You have to hold messages until the ack is delivered. This
means that the printk buffer can become full before burst messages
(e.g. SysRq-t) are acked, due to packet loss in the network.

printk() cannot wait for ack. Trying to wait for ack would break something.
How can you transmit subsequent kernel messages which failed to enqueue
due to waiting for ack for previous kernel messages?

 
 Thanks.
 
 -- 
 tejun


Re: [PATCHSET] printk, netconsole: implement reliable netconsole

2015-04-17 Thread Tejun Heo
On Sat, Apr 18, 2015 at 03:03:46AM +0900, Tetsuo Handa wrote:
 If you tolerate loss of kernel messages, adding sequence number to each UDP

Well, there's a difference between accepting loss when log buffer
overflows and when any packets get lost.

 packet will be sufficient for finding out whether the packets were lost and/or
 reordered in flight.
 
   printk(Hello);
= netconsole sends  Hello using UDP
   printk(netconsole);
= netconsole sends 0001 netconsole using UDP
   printk(world\n);
= netconsole sends 0002 world\n using UDP
 
 It might be nice to allow administrator to prefix a sequence number
 to netconsole messages for those who are using special receiver
 program (e.g. ncrx) which checks that sequence number.

That said, this is pretty much what the first 12 patches do (except
for the last printk patch, which can be taken out).  We already have
sequencing and established format to expose them to userland - try cat
/dev/kmsg, which btw is what local loggers on modern systems use
anyway.  Why introduce netconsole's own version of metadata?

Thanks.

-- 
tejun


Re: [PATCHSET] printk, netconsole: implement reliable netconsole

2015-04-17 Thread David Miller
From: Tejun Heo t...@kernel.org
Date: Fri, 17 Apr 2015 12:28:26 -0400

 On Sat, Apr 18, 2015 at 12:35:06AM +0900, Tetsuo Handa wrote:
 If the sender side can wait for retransmission, why can't we use
 userspace programs (e.g. rsyslogd)?
 
 Because the system may be oopsing, ooming or thrashing excessively
 rendering the userland inoperable and that's exactly when we want
 those log messages to be transmitted out of the system.

If userland cannot run properly, it is almost certain that neither will
your complex reliability layer logic.

I tend to agree with Tetsuo, that in-kernel netconsole should remain
as simple as possible and once it starts to have any smarts and less
trivial logic the job belongs in userspace.


Re: [PATCHSET] printk, netconsole: implement reliable netconsole

2015-04-17 Thread Tejun Heo
On Sat, Apr 18, 2015 at 02:43:30AM +0900, Tetsuo Handa wrote:
  Up to patch 12, it's just the same mechanism transferring extended
  messages.  It doesn't add any smartness to netconsole per-se except
  that it can now emit messages with metadata headers.  What do you
  think about them?
 
 So, this patchset aims for obtaining kernel messages under problematic
 condition. You have to hold messages until ack is delivered. This means
 that printk buffer can become full before burst messages (e.g. SysRq-t)
 are acked due to packet loss in the network.
 
 printk() cannot wait for ack. Trying to wait for ack would break something.
 How can you transmit subsequent kernel messages which failed to enqueue
 due to waiting for ack for previous kernel messages?

Well, if log buffer overflows and the messages aren't at the logging
target yet, they're lost.  It's the same as doing dmesg on localhost,
isn't it?  This doesn't have much to do with where the reliability
logic is implemented and is exactly the same with local logging too.

Thanks.

-- 
tejun


Re: [PATCHSET] printk, netconsole: implement reliable netconsole

2015-04-17 Thread Tetsuo Handa
Tejun Heo wrote:
  printk() cannot wait for ack. Trying to wait for ack would break something.
  How can you transmit subsequent kernel messages which failed to enqueue
  due to waiting for ack for previous kernel messages?
 
 Well, if log buffer overflows and the messages aren't at the logging
 target yet, they're lost.  It's the same as doing dmesg on localhost,
 isn't it?  This doesn't have much to do with where the reliability
 logic is implemented and is exactly the same with local logging too.

If you tolerate loss of kernel messages, adding a sequence number to each UDP
packet will be sufficient for finding out whether the packets were lost and/or
reordered in flight.

  printk("Hello");
   => netconsole sends  Hello using UDP
  printk("netconsole");
   => netconsole sends 0001 netconsole using UDP
  printk("world\n");
   => netconsole sends 0002 world\n using UDP

It might be nice to allow the administrator to prefix a sequence number
to netconsole messages for those who are using a special receiver
program (e.g. ncrx) which checks that sequence number.
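
On the receiving side, checking such a prefix could look roughly like
the sketch below. It assumes each payload is "<decimal sequence> <text>"
with numbering starting at 0 and no wraparound; the class name
SeqTracker and the result labels are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SeqTracker:
    """Track sequence-numbered messages and report loss or reordering."""

    next_expected: int = 0
    missing: set = field(default_factory=set)

    def receive(self, payload: str) -> str:
        """Classify one payload of the form '<seq> <text>'."""
        seq_text, _, _text = payload.partition(" ")
        seq = int(seq_text)
        if seq == self.next_expected:
            self.next_expected += 1
            return "in-order"
        if seq > self.next_expected:
            # Everything between the expected number and this one is
            # missing for now; it may still arrive later (reordering).
            self.missing.update(range(self.next_expected, seq))
            self.next_expected = seq + 1
            return "gap"
        if seq in self.missing:
            self.missing.discard(seq)
            return "late"
        return "duplicate"
```

Whatever is left in missing after a grace period can be reported as
lost; nothing here asks the sender to retransmit.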

 
 Thanks.
 
 -- 
 tejun
 


Re: [PATCHSET] printk, netconsole: implement reliable netconsole

2015-04-17 Thread Tetsuo Handa
Tejun Heo wrote:
 * Implement netconsole retransmission support.  Matching rx socket on
   the source port is automatically created for extended targets and
   the log receiver can request retransmission by sending response
   packets.  This is completely decoupled from the main write path and
   doesn't make netconsole less robust when things start to go south.

If the sender side can wait for retransmission, why can't we use
userspace programs (e.g. rsyslogd)?

For me, netconsole is mainly for saving kernel messages which cannot
wait even a few deciseconds (e.g. system reset by softdog's timeout) and
for saving kernel messages (e.g. SysRq-t) during a disk I/O hang.
I have a logger for receiving netconsole messages at
http://sourceforge.jp/projects/akari/scm/svn/tree/head/branches/udplogger/
and things I expect for netconsole are shown below.

  (a) spool the message to send up to 1 line or admin-configured size
  so that total number of small UDP packets can be reduced

  (b) don't hesitate to send the spooled message immediately if either
  kernel panic or system reset is in progress

  (c) allow different console log level for different console drivers
  so that I can send kernel messages via netconsole without making
  local console noisy
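
Items (a) and (b) can be sketched as follows. This is only an
illustration of the desired behaviour, not netconsole code; the class
name LineSpooler, the emergency flag and the default size limit are
assumptions made for the example.

```python
class LineSpooler:
    """Batch console output into whole lines or size-limited chunks."""

    def __init__(self, send, max_bytes: int = 1000):
        self.send = send          # callable taking one bytes payload
        self.max_bytes = max_bytes
        self.buf = b""
        self.emergency = False    # set when a panic or reset is in progress

    def write(self, data: bytes) -> None:
        self.buf += data
        if self.emergency:
            # (b): do not wait for a full line, send what we have now.
            self.flush()
            return
        # (a): send every complete line, and anything over the size limit.
        while b"\n" in self.buf or len(self.buf) >= self.max_bytes:
            cut = self.buf.find(b"\n") + 1
            if cut == 0 or cut > self.max_bytes:
                cut = self.max_bytes
            self.send(self.buf[:cut])
            self.buf = self.buf[cut:]

    def flush(self) -> None:
        if self.buf:
            self.send(self.buf)
            self.buf = b""
```

Here send would wrap a UDP sendto() call, so three short printk()
fragments forming one line go out as one packet instead of three.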


Re: [PATCHSET] printk, netconsole: implement reliable netconsole

2015-04-17 Thread Tejun Heo
Hello,

On Sat, Apr 18, 2015 at 12:35:06AM +0900, Tetsuo Handa wrote:
 If the sender side can wait for retransmission, why can't we use
 userspace programs (e.g. rsyslogd)?

Because the system may be oopsing, ooming or thrashing excessively
rendering the userland inoperable and that's exactly when we want
those log messages to be transmitted out of the system.  This will get
log out as long as timer and netpoll are running which is the case
under a lot of circumstances.

Thanks.

-- 
tejun


Re: [PATCHSET] printk, netconsole: implement reliable netconsole

2015-04-17 Thread David Miller
From: Tejun Heo t...@kernel.org
Date: Fri, 17 Apr 2015 15:52:38 -0400

 Hello,
 
 On Fri, Apr 17, 2015 at 02:55:37PM -0400, David Miller wrote:
  * The bulk of patches are to pipe extended log messages to console
drivers and let netconsole relay them to the receiver (and quite a
bit of refactoring in the process), which, regardless of the
reliability logic, is beneficial as we're currently losing
structured logging (dictionary) and other metadata over consoles and
regardless of where the reliability logic is implemented, it's a lot
easier to have messages IDs.
 
 I do not argue against cleanups and good restructuring of the existing
 code.  But you have decided to mix that up with something that is not
 exactly non-controversial.
 
 Is the controversial part referring to sending extended messages or
 the reliability part or both?

Anything outside of the non-side-effect cleanups.

 Hmmm... yeah, probably would have been a better idea.  FWIW, the
 patches are stacked roughly in the order of escalating
 controversiness.  Will split the series up.

Thanks.

 Sure, if irq handling is hosed, this won't work but I think there are
 enough other failure modes like oopsing while holding a mutex or
 falling into infinite loop while holding task_list lock (IIRC we had
 something similar a while ago due to iterator bug).

If you oops while holding a mutex, unless it's the console mutex the
logging process can schedule and likely get the message transmitted.

What we're going to keep discussing is the fact that in return for all
of your unnecessary added complexity, we get something that only applies
in an extremely narrow scope of situations.

That is a very poor value proposition.

It took nearly two decades to get rid of all of the races and locking
problems with current netpoll/netconsole, and it's as simple as can
possibly be.  I do not want to even think about having to worry about
a reliability layer on top of it, that's just too much.


Re: [PATCHSET] printk, netconsole: implement reliable netconsole

2015-04-17 Thread David Miller
From: Tejun Heo t...@kernel.org
Date: Fri, 17 Apr 2015 13:37:54 -0400

 Hello, David.
 
 On Fri, Apr 17, 2015 at 01:17:12PM -0400, David Miller wrote:
 If userland cannot run properly, it is almost certain that neither will
 your complex reliability layer logic.
 
 * The bulk of patches are to pipe extended log messages to console
   drivers and let netconsole relay them to the receiver (and quite a
   bit of refactoring in the process), which, regardless of the
   reliability logic, is beneficial as we're currently losing
   structured logging (dictionary) and other metadata over consoles and
   regardless of where the reliability logic is implemented, it's a lot
   easier to have messages IDs.

I do not argue against cleanups and good restructuring of the existing
code.  But you have decided to mix that up with something that is not
exactly non-controversial.

You'd do well to separate the cleanups from the fundamental changes,
so they can be handled separately.

 * The only thing necessary for reliable transmission are timer and
   netpoll.  There sure are cases where they go down too but there's a
   pretty big gap between those two going down and userland getting
   hosed, but where to put the retransmission and reliability logic
   definitely is debatable.

I fundamentally disagree, exactly on this point.

If you take an OOPS in a software interrupt handler (basically, all of
the networking receive paths and part of the transmit paths, for
example) you're not going to be taking timer interrupts.

And that's the value of netconsole, the chance (albeit not 100%) of
getting messages in those scenarios.


Re: [PATCHSET] printk, netconsole: implement reliable netconsole

2015-04-17 Thread Tejun Heo
Hello,

On Fri, Apr 17, 2015 at 02:55:37PM -0400, David Miller wrote:
  * The bulk of patches are to pipe extended log messages to console
drivers and let netconsole relay them to the receiver (and quite a
bit of refactoring in the process), which, regardless of the
reliability logic, is beneficial as we're currently losing
structured logging (dictionary) and other metadata over consoles and
regardless of where the reliability logic is implemented, it's a lot
easier to have messages IDs.
 
 I do not argue against cleanups and good restructuring of the existing
 code.  But you have decided to mix that up with something that is not
 exactly non-controversial.

Is the controversial part referring to sending extended messages or
the reliability part or both?

 You'd do well to separate the cleanups from the fundamental changes,
 so they can be handled separately.

Hmmm... yeah, probably would have been a better idea.  FWIW, the
patches are stacked roughly in the order of escalating
controversiness.  Will split the series up.

  * The only thing necessary for reliable transmission are timer and
netpoll.  There sure are cases where they go down too but there's a
pretty big gap between those two going down and userland getting
hosed, but where to put the retransmission and reliability logic
definitely is debatable.
 
 I fundamentally disagree, exactly on this point.
 
 If you take an OOPS in a software interrupt handler (basically, all of
 the networking receive paths and part of the transmit paths, for
 example) you're not going to be taking timer interrupts.

Sure, if irq handling is hosed, this won't work but I think there are
enough other failure modes like oopsing while holding a mutex or
falling into infinite loop while holding task_list lock (IIRC we had
something similar a while ago due to iterator bug).  Whether being
more robust in those cases is worthwhile is definitely debatable.  I
thought the added complexity was small enough but the judgement can
easily fall on the other side.

 And that's the value of netconsole, the chance (albeit not 100%) of
 getting messages in those scenarios.

None of the changes harm that in any way.  Anyways, I'll split up the
extended message and the rest.

Thanks.

-- 
tejun