Re: [RFC][PATCH] audit: Simplify by assuming the callers socket buffer is large enough
On Mon, 2014-03-10 at 15:30 -0400, David Miller wrote: > From: Eric Paris > Date: Fri, 07 Mar 2014 17:52:02 -0500 > > > The second user Eric patched, audit_send_list(), can grow without bound. > > The number of skb's is going to be the size of the number of audit rules > > that root loaded. We run the list of rules, generate an skb per rule, > > and add all of them to an skb_buff_head. We then pass the skb_buff_head > > to a kthread so that current will be able to read/drain the socket. > > There really is no limit to how big the skb_buff_head could possibly > > grow. This doesn't necessarily absolutely have to be lossless but it > > can actually quite reasonably be a whole lot of data that needs to get > > sent. I know of no way to deliver unbounded lengths of data to the > > current task via netlink without blocking on more space in the socket. > > Even if the socket rmem was MAX_INT, how can we deliver more? The rule > > size is unbounded. How do I get an unbounded amount of data onto this > > side of the socket when I have to generate it all during the request... > > This is what netlink dumps are for. It is how we are able to dump > routing tables with millions of routes to userspace. > > By using normal netlink requests and netlink_unicast() for this, you > are ignoring an entire mechanism in netlink designed specifically to > handle this kind of situation. > > Netlink dumps track state and build one or more SKBs (as necessary), > one by one, to form the reply. It implements flow control, state > tracking for iteration, optimized SKB sizing and allocation, etc. Awesome. I'll see what I can find! -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] audit: Simplify by assuming the callers socket buffer is large enough
From: Eric Paris Date: Fri, 07 Mar 2014 17:52:02 -0500 > The second user Eric patched, audit_send_list(), can grow without bound. > The number of skb's is going to be the size of the number of audit rules > that root loaded. We run the list of rules, generate an skb per rule, > and add all of them to an skb_buff_head. We then pass the skb_buff_head > to a kthread so that current will be able to read/drain the socket. > There really is no limit to how big the skb_buff_head could possibly > grow. This doesn't necessarily absolutely have to be lossless but it > can actually quite reasonably be a whole lot of data that needs to get > sent. I know of no way to deliver unbounded lengths of data to the > current task via netlink without blocking on more space in the socket. > Even if the socket rmem was MAX_INT, how can we deliver more? The rule > size is unbounded. How do I get an unbounded amount of data onto this > side of the socket when I have to generate it all during the request... This is what netlink dumps are for. It is how we are able to dump routing tables with millions of routes to userspace. By using normal netlink requests and netlink_unicast() for this, you are ignoring an entire mechanism in netlink designed specifically to handle this kind of situation. Netlink dumps track state and build one or more SKBs (as necessary), one by one, to form the reply. It implements flow control, state tracking for iteration, optimized SKB sizing and allocation, etc. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] audit: Simplify by assuming the callers socket buffer is large enough
From: Eric Paris epa...@redhat.com Date: Fri, 07 Mar 2014 17:52:02 -0500 The second user Eric patched, audit_send_list(), can grow without bound. The number of skb's is going to be the size of the number of audit rules that root loaded. We run the list of rules, generate an skb per rule, and add all of them to an skb_buff_head. We then pass the skb_buff_head to a kthread so that current will be able to read/drain the socket. There really is no limit to how big the skb_buff_head could possibly grow. This doesn't necessarily absolutely have to be lossless but it can actually quite reasonably be a whole lot of data that needs to get sent. I know of no way to deliver unbounded lengths of data to the current task via netlink without blocking on more space in the socket. Even if the socket rmem was MAX_INT, how can we deliver more? The rule size is unbounded. How do I get an unbounded amount of data onto this side of the socket when I have to generate it all during the request... This is what netlink dumps are for. It is how we are able to dump routing tables with millions of routes to userspace. By using normal netlink requests and netlink_unicast() for this, you are ignoring an entire mechanism in netlink designed specifically to handle this kind of situation. Netlink dumps track state and build one or more SKBs (as necessary), one by one, to form the reply. It implements flow control, state tracking for iteration, optimized SKB sizing and allocation, etc. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] audit: Simplify by assuming the callers socket buffer is large enough
On Mon, 2014-03-10 at 15:30 -0400, David Miller wrote: From: Eric Paris epa...@redhat.com Date: Fri, 07 Mar 2014 17:52:02 -0500 The second user Eric patched, audit_send_list(), can grow without bound. The number of skb's is going to be the size of the number of audit rules that root loaded. We run the list of rules, generate an skb per rule, and add all of them to an skb_buff_head. We then pass the skb_buff_head to a kthread so that current will be able to read/drain the socket. There really is no limit to how big the skb_buff_head could possibly grow. This doesn't necessarily absolutely have to be lossless but it can actually quite reasonably be a whole lot of data that needs to get sent. I know of no way to deliver unbounded lengths of data to the current task via netlink without blocking on more space in the socket. Even if the socket rmem was MAX_INT, how can we deliver more? The rule size is unbounded. How do I get an unbounded amount of data onto this side of the socket when I have to generate it all during the request... This is what netlink dumps are for. It is how we are able to dump routing tables with millions of routes to userspace. By using normal netlink requests and netlink_unicast() for this, you are ignoring an entire mechanism in netlink designed specifically to handle this kind of situation. Netlink dumps track state and build one or more SKBs (as necessary), one by one, to form the reply. It implements flow control, state tracking for iteration, optimized SKB sizing and allocation, etc. Awesome. I'll see what I can find! -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] audit: Simplify by assuming the callers socket buffer is large enough
From: Steve Grubb Date: Fri, 07 Mar 2014 22:27:28 -0500 > On Friday, March 07, 2014 07:48:01 PM David Miller wrote: >> From: Eric Paris >> Date: Fri, 07 Mar 2014 17:52:02 -0500 >> >> > Audit is non-tolerant to failure and loss. >> >> Netlink is not a loss-less transport. > > Perhaps. But in all our testing over the years its been very good. What I really meant by that was that there is flow control. You can push as much data reliably over it as you want, but you have to block when the socket limits are hit. And I'd say you might as well make the creator of the event do the blocking rather than making other threads do this. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] audit: Simplify by assuming the callers socket buffer is large enough
On Fri, 2014-03-07 at 19:48 -0500, David Miller wrote: > From: Eric Paris > Date: Fri, 07 Mar 2014 17:52:02 -0500 > > > Audit is non-tolerant to failure and loss. > > Netlink is not a loss-less transport. I'm happy to accept that (and know it to be true). How can I better architect things? It seems Eric is complaining that when we get a request for info, we queue that info up, and then use a kthread to make it available when the task next calls recv. By using blocking sockets in the kthread we have no problem with the size of the socket read buf. If we switch to non-blocking sockets how can we possibly queue up more than rmem size of data? (honestly, if userspace used INT_MAX it is almost certainly overkill for even the largest rulesets, but theoretically, it's not...) Is our design somehow wrong? Flawed? Mind you it's pretty dumb that we do basically the same thing in 3 different audit code path, but the architecture doesn't seem crazy to me. Then again, I'm not brilliant by any stretch! +--+ | | | auditctl (audit tool run by root) | | netlink send netlink receive | +--+ +^ || v+ ++++ | kernel audit generate skbs || send skbs to userspace | ++++ +^ |++ | +--->| send skbs to a kthread |+-+ ++ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] audit: Simplify by assuming the callers socket buffer is large enough
On Friday, March 07, 2014 07:48:01 PM David Miller wrote: > From: Eric Paris > Date: Fri, 07 Mar 2014 17:52:02 -0500 > > > Audit is non-tolerant to failure and loss. > > Netlink is not a loss-less transport. Perhaps. But in all our testing over the years its been very good. -Steve -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] audit: Simplify by assuming the callers socket buffer is large enough
From: Eric Paris Date: Fri, 07 Mar 2014 17:52:02 -0500 > Audit is non-tolerant to failure and loss. Netlink is not a loss-less transport. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] audit: Simplify by assuming the callers socket buffer is large enough
As usual Eric, your commentary is anything but useful. However your technical thoughts are not off the mark. Can we stick to those? On Wed, 2014-03-05 at 10:06 -0800, Eric W. Biederman wrote: > Steve Grubb writes: > > > On Tuesday, March 04, 2014 07:21:52 PM David Miller wrote: > >> From: ebied...@xmission.com (Eric W. Biederman) > >> Date: Tue, 04 Mar 2014 14:41:16 -0800 > >> > >> > If we really want the ability to always appened to the queue of skb's > >> > is to just have a version of netlink_send_skb that ignores the queued > >> > limits. Of course an evil program then could force the generation of > >> > enough audit records to DOS the kernel, but we seem to be in that > >> > situation now. Shrug. > >> > >> There is never a valid reason to bypass the socket limits. Audit does have some pretty crazy/wacky/dumb ways of doing things. And they've worked really well. I'm the first to agree that doesn't make them right. But also that I don't know how to do them better. I'm happy to try if I know how. Audit is non-tolerant to failure and loss. Many users of audit would prefer the system panic() than lose a message. If someone shows me how to do it better I'll happily admit there are likely places where what we do is just a 'little' too strong/foolish/crazy. Note that ALL users of these functions must have at least 1 capability (CAP_AUDIT_CONTROL). So if there is a malicious app, it is a root malicious app... The kernel audit has 3 different 'things' that send skbs to userspace. All of them work a little crazy, but similar crazy. The current task calls into the kernel via netlink, and the kernel then builds one or more skb(s) and passes those skb(s) (via differing mechanisms) to a kthread which in turn calls netlink_unicast(,,,0) sending the response back to the current task in 2 of the three cases. In all cases, since the timeout is infinite, we assume that the only possible reason this call to netlink_unicast() will fail is because the other end of the socket went away. Simple drawing of 2 of the 3 cases. +--+ | | | auditctl (audit tool run by root) | | netlink send netlink receive | +--+ +^ || v+ ++++ | kernel audit generate skbs || send skbs to userspace | ++++ +^ |++ | +--->| send skbs to a kthread |+-+ ++ The most important of the 3 cases and the one that people care the absolute most about 'things cannot be lost' is the actual audit messages. Messages like 'process A just did action B to object C'. These are handled by means of the current process generating an skb and passing those to an audit specific queue. This audit internal queue depth is controllable by userspace. If we overflow this queue we may call panic() (admin choice, obviously non-default). Again, the kthread on the other end of that queue assumes that all calls to netlink_unicast(,,,0) will eventually succeed (unless the receiving task died). It is actually imperative that the current process be blocked until the message is on track to userspace. Even Eric isn't trying to change this one case in his patch. This is the one case where the task receiving the skb is not (likely) the current task (but could be) The other two, the ones Eric patched, are much more flexible. In both cases a userspace task ask the kernel for a specific piece of information (by sending a netlink message). current is going to be the task draining the netlink queue. This is the reason the send is being punted to a kthread. So current can read from the netlink socket. In one case audit_send_reply_thread() the response is small and can't really grow without bound. Converting to a nonblocking socket might well make sense here. The second user Eric patched, audit_send_list(), can grow without bound. The number of skb's is going to be the size of the number of audit rules that root loaded. We run the list of rules, generate an skb per rule, and add all of them to an skb_buff_head. We then pass the skb_buff_head to a kthread so that current will be able to read/drain the socket. There really is no limit to how big the skb_buff_head could possibly grow. This doesn't necessarily absolutely have to be lossless but it can actually quite
Re: [RFC][PATCH] audit: Simplify by assuming the callers socket buffer is large enough
As usual Eric, your commentary is anything but useful. However your technical thoughts are not off the mark. Can we stick to those? On Wed, 2014-03-05 at 10:06 -0800, Eric W. Biederman wrote: Steve Grubb sgr...@redhat.com writes: On Tuesday, March 04, 2014 07:21:52 PM David Miller wrote: From: ebied...@xmission.com (Eric W. Biederman) Date: Tue, 04 Mar 2014 14:41:16 -0800 If we really want the ability to always appened to the queue of skb's is to just have a version of netlink_send_skb that ignores the queued limits. Of course an evil program then could force the generation of enough audit records to DOS the kernel, but we seem to be in that situation now. Shrug. There is never a valid reason to bypass the socket limits. Audit does have some pretty crazy/wacky/dumb ways of doing things. And they've worked really well. I'm the first to agree that doesn't make them right. But also that I don't know how to do them better. I'm happy to try if I know how. Audit is non-tolerant to failure and loss. Many users of audit would prefer the system panic() than lose a message. If someone shows me how to do it better I'll happily admit there are likely places where what we do is just a 'little' too strong/foolish/crazy. Note that ALL users of these functions must have at least 1 capability (CAP_AUDIT_CONTROL). So if there is a malicious app, it is a root malicious app... The kernel audit has 3 different 'things' that send skbs to userspace. All of them work a little crazy, but similar crazy. The current task calls into the kernel via netlink, and the kernel then builds one or more skb(s) and passes those skb(s) (via differing mechanisms) to a kthread which in turn calls netlink_unicast(,,,0) sending the response back to the current task in 2 of the three cases. In all cases, since the timeout is infinite, we assume that the only possible reason this call to netlink_unicast() will fail is because the other end of the socket went away. Simple drawing of 2 of the 3 cases. +--+ | | | auditctl (audit tool run by root) | | netlink send netlink receive | +--+ +^ || v+ ++++ | kernel audit generate skbs || send skbs to userspace | ++++ +^ |++ | +---| send skbs to a kthread |+-+ ++ The most important of the 3 cases and the one that people care the absolute most about 'things cannot be lost' is the actual audit messages. Messages like 'process A just did action B to object C'. These are handled by means of the current process generating an skb and passing those to an audit specific queue. This audit internal queue depth is controllable by userspace. If we overflow this queue we may call panic() (admin choice, obviously non-default). Again, the kthread on the other end of that queue assumes that all calls to netlink_unicast(,,,0) will eventually succeed (unless the receiving task died). It is actually imperative that the current process be blocked until the message is on track to userspace. Even Eric isn't trying to change this one case in his patch. This is the one case where the task receiving the skb is not (likely) the current task (but could be) The other two, the ones Eric patched, are much more flexible. In both cases a userspace task ask the kernel for a specific piece of information (by sending a netlink message). current is going to be the task draining the netlink queue. This is the reason the send is being punted to a kthread. So current can read from the netlink socket. In one case audit_send_reply_thread() the response is small and can't really grow without bound. Converting to a nonblocking socket might well make sense here. The second user Eric patched, audit_send_list(), can grow without bound. The number of skb's is going to be the size of the number of audit rules that root loaded. We run the list of rules, generate an skb per rule, and add all of them to an skb_buff_head. We then pass the skb_buff_head to a kthread so that current will be able to read/drain the socket. There really is no limit to how big the skb_buff_head could possibly grow. This doesn't necessarily absolutely have to be lossless but it can actually quite reasonably be a whole
Re: [RFC][PATCH] audit: Simplify by assuming the callers socket buffer is large enough
From: Eric Paris epa...@redhat.com Date: Fri, 07 Mar 2014 17:52:02 -0500 Audit is non-tolerant to failure and loss. Netlink is not a loss-less transport. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] audit: Simplify by assuming the callers socket buffer is large enough
On Friday, March 07, 2014 07:48:01 PM David Miller wrote: From: Eric Paris epa...@redhat.com Date: Fri, 07 Mar 2014 17:52:02 -0500 Audit is non-tolerant to failure and loss. Netlink is not a loss-less transport. Perhaps. But in all our testing over the years its been very good. -Steve -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] audit: Simplify by assuming the callers socket buffer is large enough
On Fri, 2014-03-07 at 19:48 -0500, David Miller wrote: From: Eric Paris epa...@redhat.com Date: Fri, 07 Mar 2014 17:52:02 -0500 Audit is non-tolerant to failure and loss. Netlink is not a loss-less transport. I'm happy to accept that (and know it to be true). How can I better architect things? It seems Eric is complaining that when we get a request for info, we queue that info up, and then use a kthread to make it available when the task next calls recv. By using blocking sockets in the kthread we have no problem with the size of the socket read buf. If we switch to non-blocking sockets how can we possibly queue up more than rmem size of data? (honestly, if userspace used INT_MAX it is almost certainly overkill for even the largest rulesets, but theoretically, it's not...) Is our design somehow wrong? Flawed? Mind you it's pretty dumb that we do basically the same thing in 3 different audit code path, but the architecture doesn't seem crazy to me. Then again, I'm not brilliant by any stretch! +--+ | | | auditctl (audit tool run by root) | | netlink send netlink receive | +--+ +^ || v+ ++++ | kernel audit generate skbs || send skbs to userspace | ++++ +^ |++ | +---| send skbs to a kthread |+-+ ++ -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] audit: Simplify by assuming the callers socket buffer is large enough
From: Steve Grubb sgr...@redhat.com Date: Fri, 07 Mar 2014 22:27:28 -0500 On Friday, March 07, 2014 07:48:01 PM David Miller wrote: From: Eric Paris epa...@redhat.com Date: Fri, 07 Mar 2014 17:52:02 -0500 Audit is non-tolerant to failure and loss. Netlink is not a loss-less transport. Perhaps. But in all our testing over the years its been very good. What I really meant by that was that there is flow control. You can push as much data reliably over it as you want, but you have to block when the socket limits are hit. And I'd say you might as well make the creator of the event do the blocking rather than making other threads do this. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] audit: Simplify by assuming the callers socket buffer is large enough
Steve Grubb writes: > On Tuesday, March 04, 2014 07:21:52 PM David Miller wrote: >> From: ebied...@xmission.com (Eric W. Biederman) >> Date: Tue, 04 Mar 2014 14:41:16 -0800 >> >> > If we really want the ability to always appened to the queue of skb's >> > is to just have a version of netlink_send_skb that ignores the queued >> > limits. Of course an evil program then could force the generation of >> > enough audit records to DOS the kernel, but we seem to be in that >> > situation now. Shrug. >> >> There is never a valid reason to bypass the socket limits. >> >> It protects the system from things going out of control. >> >> Netlink packet sends can fail, and audit should cope with that >> event instead of trying to bludgeon it into not happening. > > I am not sure what exactly is the problem with the code that was being > patched. The audit code has been stable and working pretty well for everyone > for the last 5-6 years. But lately there has been a lot of kernel code churn > and I am concerned that the changes are being made without a complete > understanding of the design goals. The problem is that the audit code in the kernel is not using netlink correctly and which leads to mistakes when the code is updated because the code in the kernel is unnecessarily complex, and inefficient. > The audit system has to be very reliable. It can't lose any event or record. > The people that really depend on it would rather have access denied to the > system than lose any event. This is the reason it goes to such > lengths. But the designers of the code didn't really go to any lengths at all. There are enormous and cumbersome hacks to avoid fixing the kernel interfaces to do what audit wants to use the kernel interfaces for. That is the audit code for transmitting replies is stupid and is causing serious maintenance issues, by growing hacks upon hacks instead of dealing with the core issues. The first approximation of what you want should have been be to set the netlink socket recvbuf to as large as the can be: setsockopt(sk, SOL_SOCKET, SO_RCVBUFFORCE, INTMAX); If/when 2GiB is shown to be not enough people should should have worked on the netlink layer to allow even larger piles of audit messages to be in flight, assuming you really do want to bring down the system when the audit client is just too slow. > If I understand the patch, it looks like its affecting code that adds, > deletes, > or lists audit rules. This code is quite old and working well. It didn't at > first. We found a lot of problems by writing scripts like: Working well does not mean implemented well, and for a security feature not implemented well means broken. And frankly since I can't see a single setsockopt to set the received buffer size for audit netlink sockets I have to stronlgy suspect that people were working around userpsace bugs with stupid kernel code. > #!/bin/bash > while [ 1 ] ; > do > echo "Inserting syscalls..." > sys=1 > while [ $sys -lt 100 ] > do > sys=`expr $sys + 1` > echo "$sys" > auditctl -a entry,always -S $sys > done > > echo "Listing..." > auditctl -l > echo "Deleting..." > auditctl -D > done > > with another shell running: > > #!/bin/bash > while [ 1 ] ; > do > echo "Listing..." > auditctl -l > done > > and another shell running: > > #!/bin/bash > while [ 1 ] ; > do > echo "Disabling..." > auditctl -e 0 > echo "Enabling..." > auditctl -e 1 > done > > You can probably imagine more stress tests. But the proposed code should be > well tested similar to this. Honestly since no one cares enough to maintain the kernel code properly I really think we should just rip audit out instead trying to present userspace with the delusion that the code works, and will continue to work properly. My apologies for being grumpy. I have no intention of breaking userspace (which is why my last patch was an rfc) but at the same time I care absolutely nothing about audit. It solves none of my problems and works at all of the wrong layers to be interesting for me. I just happened to notice that the code has been broken since 3.14-rc1 and no one understands or cares enough about audit to even accept trivial kernel patches to fix the bugs. No offense but I am not interested in testing code I care nothing about, especially when there is no one to pick up the patches. My RFC patch was supposed to be a word to the wise. Eric -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] audit: Simplify by assuming the callers socket buffer is large enough
On Tuesday, March 04, 2014 07:21:52 PM David Miller wrote: > From: ebied...@xmission.com (Eric W. Biederman) > Date: Tue, 04 Mar 2014 14:41:16 -0800 > > > If we really want the ability to always appened to the queue of skb's > > is to just have a version of netlink_send_skb that ignores the queued > > limits. Of course an evil program then could force the generation of > > enough audit records to DOS the kernel, but we seem to be in that > > situation now. Shrug. > > There is never a valid reason to bypass the socket limits. > > It protects the system from things going out of control. > > Netlink packet sends can fail, and audit should cope with that > event instead of trying to bludgeon it into not happening. I am not sure what exactly is the problem with the code that was being patched. The audit code has been stable and working pretty well for everyone for the last 5-6 years. But lately there has been a lot of kernel code churn and I am concerned that the changes are being made without a complete understanding of the design goals. The audit system has to be very reliable. It can't lose any event or record. The people that really depend on it would rather have access denied to the system than lose any event. This is the reason it goes to such lengths. If I understand the patch, it looks like its affecting code that adds, deletes, or lists audit rules. This code is quite old and working well. It didn't at first. We found a lot of problems by writing scripts like: #!/bin/bash while [ 1 ] ; do echo "Inserting syscalls..." sys=1 while [ $sys -lt 100 ] do sys=`expr $sys + 1` echo "$sys" auditctl -a entry,always -S $sys done echo "Listing..." auditctl -l echo "Deleting..." auditctl -D done with another shell running: #!/bin/bash while [ 1 ] ; do echo "Listing..." auditctl -l done and another shell running: #!/bin/bash while [ 1 ] ; do echo "Disabling..." auditctl -e 0 echo "Enabling..." auditctl -e 1 done You can probably imagine more stress tests. But the proposed code should be well tested similar to this. Thanks, -Steve -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] audit: Simplify by assuming the callers socket buffer is large enough
On Tuesday, March 04, 2014 07:21:52 PM David Miller wrote: From: ebied...@xmission.com (Eric W. Biederman) Date: Tue, 04 Mar 2014 14:41:16 -0800 If we really want the ability to always appened to the queue of skb's is to just have a version of netlink_send_skb that ignores the queued limits. Of course an evil program then could force the generation of enough audit records to DOS the kernel, but we seem to be in that situation now. Shrug. There is never a valid reason to bypass the socket limits. It protects the system from things going out of control. Netlink packet sends can fail, and audit should cope with that event instead of trying to bludgeon it into not happening. I am not sure what exactly is the problem with the code that was being patched. The audit code has been stable and working pretty well for everyone for the last 5-6 years. But lately there has been a lot of kernel code churn and I am concerned that the changes are being made without a complete understanding of the design goals. The audit system has to be very reliable. It can't lose any event or record. The people that really depend on it would rather have access denied to the system than lose any event. This is the reason it goes to such lengths. If I understand the patch, it looks like its affecting code that adds, deletes, or lists audit rules. This code is quite old and working well. It didn't at first. We found a lot of problems by writing scripts like: #!/bin/bash while [ 1 ] ; do echo Inserting syscalls... sys=1 while [ $sys -lt 100 ] do sys=`expr $sys + 1` echo $sys auditctl -a entry,always -S $sys done echo Listing... auditctl -l echo Deleting... auditctl -D done with another shell running: #!/bin/bash while [ 1 ] ; do echo Listing... auditctl -l done and another shell running: #!/bin/bash while [ 1 ] ; do echo Disabling... auditctl -e 0 echo Enabling... auditctl -e 1 done You can probably imagine more stress tests. But the proposed code should be well tested similar to this. Thanks, -Steve -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] audit: Simplify by assuming the callers socket buffer is large enough
Steve Grubb sgr...@redhat.com writes: On Tuesday, March 04, 2014 07:21:52 PM David Miller wrote: From: ebied...@xmission.com (Eric W. Biederman) Date: Tue, 04 Mar 2014 14:41:16 -0800 If we really want the ability to always appened to the queue of skb's is to just have a version of netlink_send_skb that ignores the queued limits. Of course an evil program then could force the generation of enough audit records to DOS the kernel, but we seem to be in that situation now. Shrug. There is never a valid reason to bypass the socket limits. It protects the system from things going out of control. Netlink packet sends can fail, and audit should cope with that event instead of trying to bludgeon it into not happening. I am not sure what exactly is the problem with the code that was being patched. The audit code has been stable and working pretty well for everyone for the last 5-6 years. But lately there has been a lot of kernel code churn and I am concerned that the changes are being made without a complete understanding of the design goals. The problem is that the audit code in the kernel is not using netlink correctly and which leads to mistakes when the code is updated because the code in the kernel is unnecessarily complex, and inefficient. The audit system has to be very reliable. It can't lose any event or record. The people that really depend on it would rather have access denied to the system than lose any event. This is the reason it goes to such lengths. But the designers of the code didn't really go to any lengths at all. There are enormous and cumbersome hacks to avoid fixing the kernel interfaces to do what audit wants to use the kernel interfaces for. That is the audit code for transmitting replies is stupid and is causing serious maintenance issues, by growing hacks upon hacks instead of dealing with the core issues. The first approximation of what you want should have been be to set the netlink socket recvbuf to as large as the can be: setsockopt(sk, SOL_SOCKET, SO_RCVBUFFORCE, INTMAX); If/when 2GiB is shown to be not enough people should should have worked on the netlink layer to allow even larger piles of audit messages to be in flight, assuming you really do want to bring down the system when the audit client is just too slow. If I understand the patch, it looks like its affecting code that adds, deletes, or lists audit rules. This code is quite old and working well. It didn't at first. We found a lot of problems by writing scripts like: Working well does not mean implemented well, and for a security feature not implemented well means broken. And frankly since I can't see a single setsockopt to set the received buffer size for audit netlink sockets I have to stronlgy suspect that people were working around userpsace bugs with stupid kernel code. #!/bin/bash while [ 1 ] ; do echo Inserting syscalls... sys=1 while [ $sys -lt 100 ] do sys=`expr $sys + 1` echo $sys auditctl -a entry,always -S $sys done echo Listing... auditctl -l echo Deleting... auditctl -D done with another shell running: #!/bin/bash while [ 1 ] ; do echo Listing... auditctl -l done and another shell running: #!/bin/bash while [ 1 ] ; do echo Disabling... auditctl -e 0 echo Enabling... auditctl -e 1 done You can probably imagine more stress tests. But the proposed code should be well tested similar to this. Honestly since no one cares enough to maintain the kernel code properly I really think we should just rip audit out instead trying to present userspace with the delusion that the code works, and will continue to work properly. My apologies for being grumpy. I have no intention of breaking userspace (which is why my last patch was an rfc) but at the same time I care absolutely nothing about audit. It solves none of my problems and works at all of the wrong layers to be interesting for me. I just happened to notice that the code has been broken since 3.14-rc1 and no one understands or cares enough about audit to even accept trivial kernel patches to fix the bugs. No offense but I am not interested in testing code I care nothing about, especially when there is no one to pick up the patches. My RFC patch was supposed to be a word to the wise. Eric -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] audit: Simplify by assuming the callers socket buffer is large enough
From: ebied...@xmission.com (Eric W. Biederman) Date: Tue, 04 Mar 2014 14:41:16 -0800 > If we really want the ability to always appened to the queue of skb's > is to just have a version of netlink_send_skb that ignores the queued > limits. Of course an evil program then could force the generation of > enough audit records to DOS the kernel, but we seem to be in that > situation now. Shrug. There is never a valid reason to bypass the socket limits. It protects the system from things going out of control. Netlink packet sends can fail, and audit should cope with that event instead of trying to bludgeon it into not happening. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] audit: Simplify by assuming the callers socket buffer is large enough
On Tue, 04 Mar 2014 14:41:16 -0800 ebied...@xmission.com (Eric W. Biederman) wrote: > David Miller writes: > > > From: Andrew Morton > > Date: Tue, 4 Mar 2014 13:30:04 -0800 > > > >> On Fri, 28 Feb 2014 20:50:19 -0800 ebied...@xmission.com (Eric W. > >> Biederman) wrote: > >> > >>> > >>> Modify audit_send_reply to directly use a non-blocking send and > >>> to return an error on failure (if anyone cares). > >>> > >>> Modify audit_list_rules_send to use audit_send_reply and give up > >>> if we can not send a packet. > >>> > >>> Merge audit_list_rules into iaudit_list_rules_send as the code > >>> is now sufficiently simple to not justify to callers. > >>> > >>> Kill audit_send_list, audit_send_reply_thread because using > >>> a separate thread for replies is not needed when sending > >>> packets syncrhonously. > >> > >> Nothing much seems to be happening here? > > Well you picked up the patch that fixes the worst of the bugs that I was > complaining about. Beyond that I don't know what makes sense. Oh, so I did. I wasn't planning on merging it myself, hoping that someone who hasaclue will step in. Help. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] audit: Simplify by assuming the callers socket buffer is large enough
David Miller writes: > From: Andrew Morton > Date: Tue, 4 Mar 2014 13:30:04 -0800 > >> On Fri, 28 Feb 2014 20:50:19 -0800 ebied...@xmission.com (Eric W. Biederman) >> wrote: >> >>> >>> Modify audit_send_reply to directly use a non-blocking send and >>> to return an error on failure (if anyone cares). >>> >>> Modify audit_list_rules_send to use audit_send_reply and give up >>> if we can not send a packet. >>> >>> Merge audit_list_rules into iaudit_list_rules_send as the code >>> is now sufficiently simple to not justify to callers. >>> >>> Kill audit_send_list, audit_send_reply_thread because using >>> a separate thread for replies is not needed when sending >>> packets syncrhonously. >> >> Nothing much seems to be happening here? Well you picked up the patch that fixes the worst of the bugs that I was complaining about. Beyond that I don't know what makes sense. >> In an earlier email you said "While reading through 3.14-rc1 I found a >> pretty siginficant mishandling of network namespaces in the recent >> audit changes." Were those recent audit changes a post-3.14 thing? And >> what is the user-visible effect of the mishandling? > > I took a quick look at this patch yesterday, and my only suspicion is > that threads are created currently by this code purely to cons up a > blockable context for the netlink code. > > Perhaps it wants to avoid any netlink packet drops from being possible. > > I'm not so sure. Looking at the history (see below) the stated reason from David Woodhouse is to prevent the removal of packets from the packet queues when we have reached our rcvbuf limit. The reasoning doesn't make a lick of sense to me and I was hoping the folks dealing with audit would comment. If we really want the ability to always appened to the queue of skb's is to just have a version of netlink_send_skb that ignores the queued limits. Of course an evil program then could force the generation of enough audit records to DOS the kernel, but we seem to be in that situation now. Shrug. > Anyways, some investigation into perhaps figuring out the intentions of > the original implementation would be nice. I had looked briefly and missed a few things. Here is the complete history in all of it's confusion. In brief. The code came in using non-blocking netlink writes. David Woodhouse made the writes blocking. Al Viro, and then Eric Paris patch the code to deal with deadlocks cause by the blocking writes. commit 4527a30f157fca45c102ea49a2fb34a4502264eb Author: akpm Date: Mon Apr 12 20:29:12 2004 + [PATCH] Light-weight Auditing Framework From: Rik Faith This patch provides a low-overhead system-call auditing framework for Linux that is usable by LSM components (e.g., SELinux). This is an update of the patch discussed in this thread: http://marc.theaimsgroup.com/?t=10781588811=1=2 In brief, it provides for netlink-based logging of audit records that have been generated in other parts of the kernel (e.g., SELinux) as well as the ability to audit system calls, either independently (using simple filtering) or as a compliment to the audit record that another part of the kernel generated. The main goals were to provide system call auditing with 1) as low overhead as possible, and 2) without duplicating functionality that is already provided by SELinux (and/or other security infrastructures). This framework will work "stand-alone", but is not designed to provide, e.g., CAPP functionality without another security component in place. This updated patch includes changes from feedback I have received, including the ability to compile without CONFIG_NET (and better use of tabs, so use -w if you diff against the older patch). Please see http://people.redhat.com/faith/audit/ for an early example user-space client (auditd-0.4.tar.gz) and instructions on how to try it. My future intentions at the kernel level include improving filtering (e.g., syscall personality/exit codes) and syscall support for more architectures. First, though, I'm going to work on documentation, a (real) audit daemon, and patches for other user-space tools so that people can play with the framework and understand how it can be used with and without SELinux. Update: Light-weight Auditing Framework receive filter fixes From: Rik Faith Since audit_receive_filter() is only called with audit_netlink_sem held, it cannot race with either audit_del_rule() or audit_add_rule(), so the list_for_each_entry_rcu()s may be replaced by list_for_each_entry()s, and the rcu_read_{un,}lock()s removed. A fix for this is part of the attached patch. Other features of the attached patch are: 1) generalized the ability to test for inequality 2) added syscall exit status reporting and testing 3) added ability to
Re: [RFC][PATCH] audit: Simplify by assuming the callers socket buffer is large enough
From: Andrew Morton Date: Tue, 4 Mar 2014 13:30:04 -0800 > On Fri, 28 Feb 2014 20:50:19 -0800 ebied...@xmission.com (Eric W. Biederman) > wrote: > >> >> Modify audit_send_reply to directly use a non-blocking send and >> to return an error on failure (if anyone cares). >> >> Modify audit_list_rules_send to use audit_send_reply and give up >> if we can not send a packet. >> >> Merge audit_list_rules into iaudit_list_rules_send as the code >> is now sufficiently simple to not justify to callers. >> >> Kill audit_send_list, audit_send_reply_thread because using >> a separate thread for replies is not needed when sending >> packets syncrhonously. > > Nothing much seems to be happening here? > > In an earlier email you said "While reading through 3.14-rc1 I found a > pretty siginficant mishandling of network namespaces in the recent > audit changes." Were those recent audit changes a post-3.14 thing? And > what is the user-visible effect of the mishandling? I took a quick look at this patch yesterday, and my only suspicion is that threads are created currently by this code purely to cons up a blockable context for the netlink code. Perhaps it wants to avoid any netlink packet drops from being possible. I'm not so sure. Anyways, some investigation into perhaps figuring out the intentions of the original implementation would be nice. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] audit: Simplify by assuming the callers socket buffer is large enough
On Fri, 28 Feb 2014 20:50:19 -0800 ebied...@xmission.com (Eric W. Biederman) wrote: > > Modify audit_send_reply to directly use a non-blocking send and > to return an error on failure (if anyone cares). > > Modify audit_list_rules_send to use audit_send_reply and give up > if we can not send a packet. > > Merge audit_list_rules into iaudit_list_rules_send as the code > is now sufficiently simple to not justify to callers. > > Kill audit_send_list, audit_send_reply_thread because using > a separate thread for replies is not needed when sending > packets syncrhonously. Nothing much seems to be happening here? In an earlier email you said "While reading through 3.14-rc1 I found a pretty siginficant mishandling of network namespaces in the recent audit changes." Were those recent audit changes a post-3.14 thing? And what is the user-visible effect of the mishandling? Thanks. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] audit: Simplify by assuming the callers socket buffer is large enough
On Fri, 28 Feb 2014 20:50:19 -0800 ebied...@xmission.com (Eric W. Biederman) wrote: Modify audit_send_reply to directly use a non-blocking send and to return an error on failure (if anyone cares). Modify audit_list_rules_send to use audit_send_reply and give up if we can not send a packet. Merge audit_list_rules into iaudit_list_rules_send as the code is now sufficiently simple to not justify to callers. Kill audit_send_list, audit_send_reply_thread because using a separate thread for replies is not needed when sending packets syncrhonously. Nothing much seems to be happening here? In an earlier email you said While reading through 3.14-rc1 I found a pretty siginficant mishandling of network namespaces in the recent audit changes. Were those recent audit changes a post-3.14 thing? And what is the user-visible effect of the mishandling? Thanks. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] audit: Simplify by assuming the callers socket buffer is large enough
From: Andrew Morton a...@linux-foundation.org Date: Tue, 4 Mar 2014 13:30:04 -0800 On Fri, 28 Feb 2014 20:50:19 -0800 ebied...@xmission.com (Eric W. Biederman) wrote: Modify audit_send_reply to directly use a non-blocking send and to return an error on failure (if anyone cares). Modify audit_list_rules_send to use audit_send_reply and give up if we can not send a packet. Merge audit_list_rules into iaudit_list_rules_send as the code is now sufficiently simple to not justify to callers. Kill audit_send_list, audit_send_reply_thread because using a separate thread for replies is not needed when sending packets syncrhonously. Nothing much seems to be happening here? In an earlier email you said While reading through 3.14-rc1 I found a pretty siginficant mishandling of network namespaces in the recent audit changes. Were those recent audit changes a post-3.14 thing? And what is the user-visible effect of the mishandling? I took a quick look at this patch yesterday, and my only suspicion is that threads are created currently by this code purely to cons up a blockable context for the netlink code. Perhaps it wants to avoid any netlink packet drops from being possible. I'm not so sure. Anyways, some investigation into perhaps figuring out the intentions of the original implementation would be nice. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] audit: Simplify by assuming the callers socket buffer is large enough
David Miller da...@davemloft.net writes: From: Andrew Morton a...@linux-foundation.org Date: Tue, 4 Mar 2014 13:30:04 -0800 On Fri, 28 Feb 2014 20:50:19 -0800 ebied...@xmission.com (Eric W. Biederman) wrote: Modify audit_send_reply to directly use a non-blocking send and to return an error on failure (if anyone cares). Modify audit_list_rules_send to use audit_send_reply and give up if we can not send a packet. Merge audit_list_rules into iaudit_list_rules_send as the code is now sufficiently simple to not justify to callers. Kill audit_send_list, audit_send_reply_thread because using a separate thread for replies is not needed when sending packets syncrhonously. Nothing much seems to be happening here? Well you picked up the patch that fixes the worst of the bugs that I was complaining about. Beyond that I don't know what makes sense. In an earlier email you said While reading through 3.14-rc1 I found a pretty siginficant mishandling of network namespaces in the recent audit changes. Were those recent audit changes a post-3.14 thing? And what is the user-visible effect of the mishandling? I took a quick look at this patch yesterday, and my only suspicion is that threads are created currently by this code purely to cons up a blockable context for the netlink code. Perhaps it wants to avoid any netlink packet drops from being possible. I'm not so sure. Looking at the history (see below) the stated reason from David Woodhouse is to prevent the removal of packets from the packet queues when we have reached our rcvbuf limit. The reasoning doesn't make a lick of sense to me and I was hoping the folks dealing with audit would comment. If we really want the ability to always appened to the queue of skb's is to just have a version of netlink_send_skb that ignores the queued limits. Of course an evil program then could force the generation of enough audit records to DOS the kernel, but we seem to be in that situation now. Shrug. Anyways, some investigation into perhaps figuring out the intentions of the original implementation would be nice. I had looked briefly and missed a few things. Here is the complete history in all of it's confusion. In brief. The code came in using non-blocking netlink writes. David Woodhouse made the writes blocking. Al Viro, and then Eric Paris patch the code to deal with deadlocks cause by the blocking writes. commit 4527a30f157fca45c102ea49a2fb34a4502264eb Author: akpm akpm Date: Mon Apr 12 20:29:12 2004 + [PATCH] Light-weight Auditing Framework From: Rik Faith fa...@redhat.com This patch provides a low-overhead system-call auditing framework for Linux that is usable by LSM components (e.g., SELinux). This is an update of the patch discussed in this thread: http://marc.theaimsgroup.com/?t=10781588811r=1w=2 In brief, it provides for netlink-based logging of audit records that have been generated in other parts of the kernel (e.g., SELinux) as well as the ability to audit system calls, either independently (using simple filtering) or as a compliment to the audit record that another part of the kernel generated. The main goals were to provide system call auditing with 1) as low overhead as possible, and 2) without duplicating functionality that is already provided by SELinux (and/or other security infrastructures). This framework will work stand-alone, but is not designed to provide, e.g., CAPP functionality without another security component in place. This updated patch includes changes from feedback I have received, including the ability to compile without CONFIG_NET (and better use of tabs, so use -w if you diff against the older patch). Please see http://people.redhat.com/faith/audit/ for an early example user-space client (auditd-0.4.tar.gz) and instructions on how to try it. My future intentions at the kernel level include improving filtering (e.g., syscall personality/exit codes) and syscall support for more architectures. First, though, I'm going to work on documentation, a (real) audit daemon, and patches for other user-space tools so that people can play with the framework and understand how it can be used with and without SELinux. Update: Light-weight Auditing Framework receive filter fixes From: Rik Faith fa...@redhat.com Since audit_receive_filter() is only called with audit_netlink_sem held, it cannot race with either audit_del_rule() or audit_add_rule(), so the list_for_each_entry_rcu()s may be replaced by list_for_each_entry()s, and the rcu_read_{un,}lock()s removed. A fix for this is part of the attached patch. Other features of the attached patch are: 1) generalized the ability to test for inequality 2) added syscall exit status reporting and testing 3) added
Re: [RFC][PATCH] audit: Simplify by assuming the callers socket buffer is large enough
On Tue, 04 Mar 2014 14:41:16 -0800 ebied...@xmission.com (Eric W. Biederman) wrote: David Miller da...@davemloft.net writes: From: Andrew Morton a...@linux-foundation.org Date: Tue, 4 Mar 2014 13:30:04 -0800 On Fri, 28 Feb 2014 20:50:19 -0800 ebied...@xmission.com (Eric W. Biederman) wrote: Modify audit_send_reply to directly use a non-blocking send and to return an error on failure (if anyone cares). Modify audit_list_rules_send to use audit_send_reply and give up if we can not send a packet. Merge audit_list_rules into iaudit_list_rules_send as the code is now sufficiently simple to not justify to callers. Kill audit_send_list, audit_send_reply_thread because using a separate thread for replies is not needed when sending packets syncrhonously. Nothing much seems to be happening here? Well you picked up the patch that fixes the worst of the bugs that I was complaining about. Beyond that I don't know what makes sense. Oh, so I did. I wasn't planning on merging it myself, hoping that someone who hasaclue will step in. Help. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] audit: Simplify by assuming the callers socket buffer is large enough
From: ebied...@xmission.com (Eric W. Biederman) Date: Tue, 04 Mar 2014 14:41:16 -0800 If we really want the ability to always appened to the queue of skb's is to just have a version of netlink_send_skb that ignores the queued limits. Of course an evil program then could force the generation of enough audit records to DOS the kernel, but we seem to be in that situation now. Shrug. There is never a valid reason to bypass the socket limits. It protects the system from things going out of control. Netlink packet sends can fail, and audit should cope with that event instead of trying to bludgeon it into not happening. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/