Re: aio poll, io_pgetevents and a new in-kernel poll API V3
On 01/18/2018 07:51 PM, Avi Kivity wrote: On 01/18/2018 05:46 PM, Jeff Moyer wrote: FYI, this kernel has issues. It will boot up, but I don't have networking, and even rebooting doesn't succeed. I'm looking into it. FWIW, I'm running an older version of this patchset on my desktop (Fedora 27) with no problems so far.
Re: aio poll, io_pgetevents and a new in-kernel poll API V3
On 01/18/2018 05:46 PM, Jeff Moyer wrote: FYI, this kernel has issues. It will boot up, but I don't have networking, and even rebooting doesn't succeed. I'm looking into it. FWIW, I'm running an older version of this patchset on my desktop with no problems so far. -Jeff Christoph Hellwig writes: Hi all, this series adds support for the IOCB_CMD_POLL operation to poll for the readiness of file descriptors using the aio subsystem. The API is based on patches that existed in RHAS2.1 and RHEL3, which means it is already supported by libaio. To implement the poll support efficiently, new methods to poll are introduced in struct file_operations: get_poll_head and poll_mask. The first one returns a wait_queue_head to wait on (lifetime is bound by the file), and the second does a non-blocking check for the POLL* events. This allows aio poll to work without any additional context switches, unlike epoll. To make the interface fully useful, a new io_pgetevents system call is added, which atomically saves and restores the signal mask over the wait for events. It is to io_getevents what pselect and ppoll are to select and poll. The corresponding libaio changes for io_pgetevents support and documentation, as well as a test case, will be posted in a separate series. The changes were sponsored by ScyllaDB, and improve performance of the Seastar framework by up to 10%, while also removing the need for a privileged SCHED_FIFO epoll listener thread.
The patches are on top of Al's __poll_t annotations, so I've also prepared a git branch on top of those here: git://git.infradead.org/users/hch/vfs.git aio-poll.3 Gitweb: http://git.infradead.org/users/hch/vfs.git/shortlog/refs/heads/aio-poll.3 Libaio changes: https://pagure.io/libaio.git io-poll Seastar changes (not updated for the new io_pgetevents ABI yet): https://github.com/avikivity/seastar/commits/aio

Changes since V2:
- removed a double initialization
- new vfs_get_poll_head helper
- document that ->get_poll_head can return NULL
- call ->poll_mask before sleeping
- various ACKs
- add conversion of random to ->poll_mask
- add conversion of af_alg to ->poll_mask
- lacking ->poll_mask support now returns -EINVAL for IOCB_CMD_POLL
- reshuffled the series so that prep patches and everything not requiring the new in-kernel poll API is in the beginning

Changes since V1:
- handle the NULL ->poll case in vfs_poll
- dropped the file argument to the ->poll_mask socket operation
- replace the ->pre_poll socket operation with ->get_poll_head as in the file operations

-- To unsubscribe, send a message with 'unsubscribe linux-aio' in the body to majord...@kvack.org. For more info on Linux AIO, see: http://www.kvack.org/aio/
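The atomic signal-mask handling that io_pgetevents adds is the same trick pselect and ppoll use for their underlying syscalls. A minimal user-space sketch of the pattern using ppoll() (the function name and mask policy here are illustrative, not part of the proposed aio API):

```c
#define _GNU_SOURCE
#include <poll.h>
#include <signal.h>
#include <time.h>

/* Wait for fd readability with a temporary signal mask applied
 * atomically around the wait -- the same semantics io_pgetevents
 * provides for io_getevents.  The non-atomic alternative
 * (sigprocmask + poll + sigprocmask) leaves a window where a signal
 * can slip in between the mask change and the wait. */
int wait_readable_with_mask(int fd, int timeout_ms)
{
    sigset_t mask;
    sigfillset(&mask);                 /* block all signals while waiting */
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    struct timespec ts = { .tv_sec = timeout_ms / 1000,
                           .tv_nsec = (timeout_ms % 1000) * 1000000L };
    /* ppoll() installs the mask, waits, and restores it in one step */
    return ppoll(&pfd, 1, &ts, &mask);
}
```

The return value is the number of ready descriptors, as with poll().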
Re: [kvm-devel] [virtio-net][PATCH] Don't arm tx hrtimer with a constant 500us each transmit
Rusty Russell wrote: On Wednesday 12 December 2007 23:54:00 Dor Laor wrote: commit 763769621d271d92204ed27552d75448587c1ac0 Author: Dor Laor [EMAIL PROTECTED] Date: Wed Dec 12 14:52:00 2007 +0200 [virtio-net][PATCH] Don't arm tx hrtimer with a constant 500us each transmit The current start_xmit sets a 500us hrtimer to kick the host. The problem is that if another xmit happens before the timer fires, then the first xmit will have to wait an additional 500us. This patch does not re-arm the timer if there is an existing one. This will shorten the latency for tx. Hi Dor! Yes, I pondered this when I wrote the code. On the one hand, it's a low-probability pathological corner case; on the other, your patch reduces the number of timer reprograms in the normal case. One thing that came up in our discussions is to let the host do the timer processing instead of the guest. When tx exit mitigation is enabled, the guest bumps the queue pointer, but carefully refrains from kicking the host. The host polls the tx pointer using a timer, kicking itself periodically; if polling yields no packets it disables tx exit mitigation. This saves the guest the bother of programming the timer, which presumably requires an exit if the timer is the closest one to expiration. [btw, this can be implemented in virtqueue rather than virtio-net, no?] -- Any sufficiently difficult bug is indistinguishable from a feature. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
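The fix Dor describes, not re-arming a timer that is already pending, can be sketched in a few lines. This is a hypothetical user-space analogue (struct and function names invented for illustration), not the actual virtio-net patch:

```c
#include <stdbool.h>

/* User-space analogue of the patch: arm the kick timer only when none
 * is pending, so a second xmit does not push the deadline out by
 * another 500us. */
struct tx_state {
    bool timer_armed;
    int arm_count;      /* how many times we actually programmed the timer */
};

static void arm_timer(struct tx_state *s)
{
    s->arm_count++;     /* stands in for hrtimer_start(..., 500us) */
    s->timer_armed = true;
}

void start_xmit(struct tx_state *s)
{
    /* queue the packet ... then: */
    if (!s->timer_armed)    /* the fix: never re-arm a pending timer */
        arm_timer(s);
}

void timer_fired(struct tx_state *s)
{
    s->timer_armed = false; /* kick the host; a fresh arm is now allowed */
}
```

Three back-to-back transmits program the timer once instead of three times, so the first packet's deadline is never pushed out.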
Re: [kvm-devel] QEMU PIC indirection patch for in-kernel APIC work
Rusty Russell wrote: On Thu, 2007-04-12 at 06:32 +0300, Avi Kivity wrote: I hadn't considered an always-blocking (or unbuffered) networking API. It's very counter to current APIs, but does make sense with things like syslets. Without syslets, I don't think it's very useful as you need some artificial threads to keep things humming along. (How would userspace specify it? O_DIRECT when opening the tap?) TBH, I hadn't thought that far. Tap already has those IFF_NO_PI etc flags, but it might make sense to just be the default. From userspace's POV it's not a semantic change. OK, just tested: I can get 230,000 packets (28 byte UDP) through the tun device in a second (130,000 actually out the 100-base-T NIC, 100,000 dropped). If the tun driver's write() blocks until the skb is destroyed, it's 4,000 packets. So your intuition was right: skb_free latency on xmit (at least for this e1000) is far too large for anything but an async solution. Will ponder further. I think aio_write (but done copylessly) is the way to go. Not only is the infrastructure there, but the API already allows for multiple packet submission and for batching completions. Fitting into that framework ought to be easier than starting yet another one. It still misses scatter/gather and integration with fd-based notification, but there are patches around for that. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: [kvm-devel] QEMU PIC indirection patch for in-kernel APIC work
Rusty Russell wrote: On Wed, 2007-04-11 at 07:26 +0300, Avi Kivity wrote: Nope. Being async is critical for copyless networking: - in the transmit path, we need to stop the sender (guest) from touching the memory until it's on the wire. This means 100% of packets sent will be blocked. Hi Avi, You keep saying stuff like this, and I keep ignoring it. OK, I'll bite: Why would we try to prevent the sender from altering the packets? To avoid data corruption. The guest wants to send a packet. It calls write(), which causes an skb to be allocated, data to be copied into it, the entire networking stack gets into gear, and the guest-side driver instructs the device to send the packet. With async operations, the saga continues like this: the host-side driver allocates an skb, get_page()s and attaches the data to the new skb, this skb crosses the bridge, trickles into the real ethernet device, gets queued there, sent, interrupts fire, triggering async completion. On this completion, we send a virtual interrupt to the guest, which tells it to destroy the skb and reclaim the pages attached to it. Without async operations, we don't have a hook to notify the guest when to reclaim the skb. If we do it too soon, the skb can be reclaimed and the memory reused before the real device gets to see it, so we end up sending data that we did not intend. The only way to avoid it is to copy the data somewhere safe, but that is exactly what we don't want to do. - multiple packets per operation (for interrupt mitigation) (like lio_listio) The benefits for interrupt mitigation are less clear to me in a virtual environment (scheduling tends to make it happen anyway); I'd want to benchmark it. Yes, the guest will probably submit multiple packets in one hypercall. It would be nice for the userspace driver to be able to submit them to the host kernel in one syscall. Some kind of batching to reduce syscall overhead, perhaps, but TSO would go a fair way towards that anyway (probably not enough). 
For some workloads, sure. - scatter/gather packets (iovecs) Yes, and this is already present in the tap device. Anthony suggested a slightly nasty hack for multiple sg packets in one writev()/readv(), which could also give us batching. No need for hacks if we get list aio support one day. - configurable wakeup (by packet count/timeout) for queue management I'm not convinced that this is a showstopper, though. It probably isn't. It's free with aio though. - hacks (tso) I'd usually go for a batch interface over TSO, but if the card we're sending to actually does TSO then TSO will probably win. Sure, if tso helps a regular host then it should help one that happens to be running a virtual machine. Most of these can be provided by a combination of the pending aio work, the pending aio/fd integration, and the not-so-pending tap aio work. As the first two are available as patches and the third is limited to the tap device, it is not unreasonable to try it out. Maybe it will turn out not to be as difficult as I predicted just a few lines above. Indeed, I don't think we're asking for a revolution a-la VJ-style channels. But I'm still itching to get back to that, and this might yet provide an excuse 8) I'll be happy if this can be made to work. It will make the paravirt guest-side driver work in kvm-less setups, which are useful for testing, and of course reduction in kernel code is beneficial. It will be slower than in-kernel, but if we get the batching right, perhaps not significantly slower. I'm mostly concerned that this depends on code that has eluded merging for such a long time. -- error compiling committee.c: too many arguments to function
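The destructor-based completion hook discussed above (the host attaches guest pages to an skb and tells the guest to reclaim them only once the real device has consumed the data) can be sketched as follows. All names here are hypothetical stand-ins for the skb destructor mechanism, not real kernel interfaces:

```c
#include <stddef.h>

/* Guest-owned transmit buffer.  'reclaimable' stands in for the virtual
 * interrupt telling the guest it may reuse the pages. */
struct guest_buf {
    const void *data;
    size_t len;
    int reclaimable;
};

/* A stand-in for sk_buff: it references the guest memory (no copy) and
 * carries a destructor fired when the device is done with it. */
struct fake_skb {
    struct guest_buf *owner;
    void (*destructor)(struct guest_buf *);
};

static void tx_complete(struct guest_buf *b)
{
    b->reclaimable = 1;      /* would raise a virtual interrupt here */
}

struct fake_skb make_skb(struct guest_buf *b)
{
    b->reclaimable = 0;      /* pages are now pinned by the host */
    struct fake_skb skb = { .owner = b, .destructor = tx_complete };
    return skb;
}

void device_done(struct fake_skb *skb)   /* real NIC freed the skb */
{
    skb->destructor(skb->owner);
}
```

The point of the sketch is the ordering: the guest's memory is off-limits from make_skb() until device_done(), which is exactly the window the synchronous API has no way to express.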
Re: [kvm-devel] QEMU PIC indirection patch for in-kernel APIC work
Evgeniy Polyakov wrote: On Mon, Apr 09, 2007 at 04:38:18PM +0300, Avi Kivity ([EMAIL PROTECTED]) wrote: But I don't get this we can enhance the kernel but not userspace vibe 8( I've been waiting for network aio since ~2003. If it arrives in the next few days, I'm all for it; much more than kvm can use it profitably. But I'm not going to write that interface myself. Hmm, you missed at least two implementations of network aio in the previous year, and now with syslets we can have third one. I meant, network aio in the mainline kernel. I am aware of the various out-of-tree implementations. But it looks from this discussion, that it will not prevent from changing in-kernel driver - place a hook into skb allocation path and allocate data from opposing memory - get pages from another side and put them into fragments, then copy headers into skb-data. I don't understand this (opposing memory, another side?). Can you elaborate? -- Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: [kvm-devel] QEMU PIC indirection patch for in-kernel APIC work
Evgeniy Polyakov wrote: But it looks from this discussion, that it will not prevent from changing in-kernel driver - place a hook into skb allocation path and allocate data from opposing memory - get pages from another side and put them into fragments, then copy headers into skb-data. I don't understand this (opposing memory, another side?). Can you elaborate? You want to implement zero-copy network device between host and guest, if I understood this thread correctly? So, for sending part, device allocates pages from receiver's memory (or from shared memory), receiver gets an 'interrupt' and got pages from own memory, which are attached to new skb and transferred up to the network stack. It can be extended to use shared ring of pages. This is what Xen does. It is actually less performant than copying, IIRC. The problem with flipping pages around is that physical addresses are cached both in the kvm mmu and in the on-chip tlbs, necessitating expensive page table walks and tlb invalidation IPIs. Note that for sending from the guest an external host can be done copylessly, and for the receive side using a dma engine (like I/OAT) can reduce the cost of the copy. -- error compiling committee.c: too many arguments to function
Re: [kvm-devel] QEMU PIC indirection patch for in-kernel APIC work
Evgeniy Polyakov wrote: This is what Xen does. It is actually less performant than copying, IIRC. The problem with flipping pages around is that physical addresses are cached both in the kvm mmu and in the on-chip tlbs, necessitating expensive page table walks and tlb invalidation IPIs. Hmm, I'm not familiar with the Xen driver, but a similar technique was used with a zero-copy network sniffer some time ago; substituting userspace pages with pages containing skb data was about 25-50% faster than copying 1500 bytes in general, and on the order of 10 times faster in some cases. Check a link please in case we are talking about different ideas: http://marc.info/?l=linux-netdev&m=112262743505711&w=2 I don't really understand what you're testing there. In particular, how can the copying time change so dramatically depending on whether you've just rebooted or not? -- error compiling committee.c: too many arguments to function
Re: [kvm-devel] QEMU PIC indirection patch for in-kernel APIC work
Evgeniy Polyakov wrote: On Tue, Apr 10, 2007 at 03:17:45PM +0300, Avi Kivity ([EMAIL PROTECTED]) wrote: Check a link please in case we are talking about different ideas: http://marc.info/?l=linux-netdev&m=112262743505711&w=2 I don't really understand what you're testing there. In particular, how can the copying time change so dramatically depending on whether you've just rebooted or not? I tested page remapping time - i.e. time to replace a page in two different mappings - the same should be performed in host and guest kernels if such design is going to be used for communication. I can only explain after-reboot slow copy with empty caches - arbitrary kernel pages were copied into buffer (not the same data as in posted code). Doing this in kvm would be significantly more complex, as we'd need to use full reverse mapping to locate all guest mappings (we already reverse map writable pages for other reasons), so the 25-50% difference might be nullified or even turn into overhead. Here are the Xen numbers for reference. Xen probably has more overhead than kvm for such things, though, as it needs to do hypercalls from dom0 which is in-kernel for kvm. http://lists.xensource.com/archives/html/xen-devel/2007-03/msg01218.html -- error compiling committee.c: too many arguments to function
Re: [kvm-devel] QEMU PIC indirection patch for in-kernel APIC work
Rusty Russell wrote: On Mon, 2007-04-09 at 16:38 +0300, Avi Kivity wrote: Moreover, some things just don't lend themselves to a userspace abstraction. If we want to expose tso (tcp segmentation offload), we can easily do so with a kernel driver since the kernel interfaces are all tso aware. Tacking on tso awareness to tun/tap is doable, but at the very least weird. It is kinda weird, yes, but it certainly makes sense. All the arguments for tso apply in triplicate to userspace packet sends... Well, write() with a large buffer is a sort of tso device. The problem is tso breaks through several layers (like I'm advocating in the other thread :), pushing tcp functionality into ethernet. Well, we've seen worse. We're dealing with the tun/tap device here, not a socket. Hmm. tun actually has aio_write implemented, but it seems synchronous. So does the read path. If these are made truly asynchronous, and the write path is made in addition copyless, then we might have something workable. I still cringe at having a pagetable walk in order to deliver a 1500-byte packet. Right, now we're talking! However, it's not clear to me why creating an skb which references a kvm guest's memory doesn't need a pagetable walk, but a packet in (other) userspace memory does? Currently guest pages are stashed in a kernel array, as well as being mmap()ed into user space. That's not a very strong argument though, as I'd like to be able to map userspace memory into the guest, or map address_spaces to the guest, or something, so accessing guest physical memory will become more expensive in time. My conviction which started this discussion is that if we can offer an efficient interface for kvm, we should be able to offer an efficient interface for any (other) userspace. Fully agreed. It's mostly a question of who and when. Designing and implementing this interface is going to be difficult, require deep knowledge of Linux networking, and consume a lot of time. 
As to async, I'm not *so* worried about that for the moment, although it would probably be nicer to fail than to block. Otherwise we could simply set an skb destructor to wake us up. Nope. Being async is critical for copyless networking: - in the transmit path, we need to stop the sender (guest) from touching the memory until it's on the wire. This means 100% of packets sent will be blocked. - in the receive path, you could separate receive notification from the single copy that must be done (like poll() + read()), but to make use of dma engines you need to provide the end address beforehand. I think the first step is to see how much worse a decent userspace net driver is compared with the current in-kernel one. A userspace net interface needs to provide the following:
- true async operations
- multiple packets per operation (for interrupt mitigation) (like lio_listio)
- scatter/gather packets (iovecs)
- configurable wakeup (by packet count/timeout) for queue management
- hacks (tso)
Most of these can be provided by a combination of the pending aio work, the pending aio/fd integration, and the not-so-pending tap aio work. As the first two are available as patches and the third is limited to the tap device, it is not unreasonable to try it out. Maybe it will turn out not to be as difficult as I predicted just a few lines above. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic.
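Of the requirements listed above, the scatter/gather and per-operation batching items can at least be illustrated with plain writev(): several packet fragments reach the fd in one system call. This is only a sketch of the iovec shape; a true async interface would use lio_listio() or a kernel AIO ring instead:

```c
#include <sys/uio.h>
#include <unistd.h>

/* Send a header fragment and a payload fragment to the fd in a single
 * syscall.  writev() gathers the two iovecs in order, so the receiver
 * sees one contiguous stream -- the "scatter/gather packets (iovecs)"
 * item from the requirements list, without the async part. */
ssize_t send_batch(int fd, const char *hdr, size_t hlen,
                   const char *payload, size_t plen)
{
    struct iovec iov[2] = {
        { .iov_base = (void *)hdr,     .iov_len = hlen },
        { .iov_base = (void *)payload, .iov_len = plen },
    };
    return writev(fd, iov, 2);   /* one syscall, two fragments */
}
```

Note this still copies into the kernel; as the thread argues, only a completion-notified async submission can drop that copy.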
Re: [kvm-devel] QEMU PIC indirection patch for in-kernel APIC work
Rusty Russell wrote: On Sun, 2007-04-08 at 08:36 +0300, Avi Kivity wrote: Rusty Russell wrote: Hi Avi, I don't think you've thought about this very hard. The receive copy is completely independent with whether the packet is going to the guest via a kernel driver or via userspace, so not relevant. A packet received in the kernel cannot be made available to userspace in a safe manner without a copy, as it will not be aligned with page boundaries, so userspace cannot examine the packet until after one copy has occurred. Hi Avi! I'm a little puzzled by your response. Hmm... lguest's userspace network frontend does exactly as many copies as Ingo's in-host-kernel code. One from the Guest, one to the Guest. kvm pvnet is suboptimal now. The number of copies could be reduced by two (to zero), by constructing an skb that points to guest memory. Right now, this can only be done in-kernel. With current userspace networking interfaces, one cannot build a network device that has less than one copy on transmit, because sendmsg() *must* copy the data (as there is no completion notification). sendfilev(), even if it existed, cannot be used: it is copyless, but lacks completion notification. It is useful only on unchanging data like read-only files. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: [kvm-devel] QEMU PIC indirection patch for in-kernel APIC work
Rusty Russell wrote: On Mon, 2007-04-09 at 10:10 +0300, Avi Kivity wrote: Rusty Russell wrote: I'm a little puzzled by your response. Hmm... lguest's userspace network frontend does exactly as many copies as Ingo's in-host-kernel code. One from the Guest, one to the Guest. kvm pvnet is suboptimal now. The number of copies could be reduced by two (to zero), by constructing an skb that points to guest memory. Right now, this can only be done in-kernel. Sorry, you lost me here. You mean both input and output copies can be eliminated? Or are you talking about another two copies somewhere? On the transmit path, current kvm pvnet has two copies: 1. on the guest side, the driver copies the skb data into the shared ring 2. on the host side, the device copies the data from the ring into a newly allocated skb Both of these copies can be eliminated with a host-side kernel. With current userspace interfaces, only one copy can be eliminated. Similar logic applies to receive, except that one copy must remain. But I don't get this we can enhance the kernel but not userspace vibe 8( I've been waiting for network aio since ~2003. If it arrives in the next few days, I'm all for it; much more than kvm can use it profitably. But I'm not going to write that interface myself. Moreover, some things just don't lend themselves to a userspace abstraction. If we want to expose tso (tcp segmentation offload), we can easily do so with a kernel driver since the kernel interfaces are all tso aware. Tacking on tso awareness to tun/tap is doable, but at the very least weird. With current userspace networking interfaces, one cannot build a network device that has less than one copy on transmit, because sendmsg() *must* copy the data (as there is no completion notification). Why are you talking about sendmsg()? Perhaps this is where we're getting tangled up. We're dealing with the tun/tap device here, not a socket. Hmm. tun actually has aio_write implemented, but it seems synchronous. 
So does the read path. If these are made truly asynchronous, and the write path is made in addition copyless, then we might have something workable. I still cringe at having a pagetable walk in order to deliver a 1500-byte packet. sendfilev(), even if it existed, cannot be used: it is copyless, but lacks completion notification. It is useful only on unchanging data like read-only files. Again, sendfile is a *much* harder problem than sending a single packet once, which is the question here. sendfile() is a *different* problem. It doesn't need completion because the data is assumed not to change under it. Consider that the guest may be issuing a megabyte-sized sendfile() which is broken into 17 tso frames. We need to preserve the large structures as much as possible or we end up repeating the simple single packet once path 700 times. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: [kvm-devel] QEMU PIC indirection patch for in-kernel APIC work
Rusty Russell wrote: On Thu, 2007-04-05 at 10:17 +0300, Avi Kivity wrote: Rusty Russell wrote: You didn't quote Anthony's point about it's more about there not being good enough userspace interfaces to do network IO. It's easier to write a kernel-space network driver, but it's not obviously the right thing to do until we can show that an efficient packet-level userspace interface isn't possible. I don't think that's been done, and it would be interesting to try. In the case of networking, the copyful interfaces on receive are driven by the hardware not knowing how to split the header from the data. On transmit I agree, it could be made copyless from userspace (something like sendfilev, only not file oriented). Hi Avi, I don't think you've thought about this very hard. The receive copy is completely independent of whether the packet is going to the guest via a kernel driver or via userspace, so not relevant. A packet received in the kernel cannot be made available to userspace in a safe manner without a copy, as it will not be aligned with page boundaries, so userspace cannot examine the packet until after one copy has occurred. After userspace has determined what to do with the packet, another copy must take place to get it there. There's a counterexample, mmapped sockets, but that works only when all packets arriving on a card are exposed to the same process. This is useful for tcpdump or for what you outline below but is hardly generic. And if all packets from the card are going to the guest, you can deliver directly. Userspace or kernel, no difference. That is not the common case. Nor is it true when there is a mismatch between the card's capabilities and guest expectations and constraints. For example, guest memory is not physically contiguous, so a NIC that won't do scatter/gather will require bouncing (or an iommu, but that's not here yet). And we have a sendfilev not file oriented: it's called writev 8) writev() cannot be made copyless for networking. 
One needs an async interface so the kernel can complete the write after the NIC acks the dma transfer, or a kernel driver. An in-kernel driver can avoid system call overhead and page references. But a better tap device helps more than just KVM. I'll believe it when I see it. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: [kvm-devel] QEMU PIC indirection patch for in-kernel APIC work
Ingo Molnar wrote: so right now the only option for a clean codebase is the KVM in-kernel code. I strongly disagree with this. Bad code in userspace is not an excuse for shoving stuff into the kernel, where maintaining it is much more expensive, and the cause of a mistake can be system crashes and data loss, affecting unrelated processes. If we move something into the kernel, we'd better have a really good reason for it. Qemu code _is_ crufty. We can do one of three things: 1. live with it 2. fork it and clean it up 3. clean it up incrementally and merge it upstream Currently we're doing (1). You're suggesting a variant of (2), fork plus move into the kernel. The right thing to do IMO is (3), but I don't see anybody volunteering. Qemu picked up additional committers recently and I believe they would be receptive to cleanups. [In the *pic/pit case, we have other reasons to push things into the kernel. But this code is crap, let's rewrite it in the kernel is not a justification I'll accept. I'd be much happier if we could quantify these other reasons.] -- error compiling committee.c: too many arguments to function
Re: [kvm-devel] QEMU PIC indirection patch for in-kernel APIC work
Ingo Molnar wrote: * Avi Kivity [EMAIL PROTECTED] wrote: so right now the only option for a clean codebase is the KVM in-kernel code. I strongly disagree with this. are you disagreeing with my statement that the KVM kernel-side code is the only clean codebase here? To me this is a clear fact :) No, I agree with that. I just disagree with choosing to put the *pic code (or other code) into the kernel on *that* basis. The selection should be on design/performance issues alone, *not* the state of existing code. I only pointed out that the only clean codebase at the moment is the KVM in-kernel code - i did not make the argument (at all) that every new piece of KVM code should be done in the kernel. That would be stupid - do you think i'd advocate for example moving command line argument parsing into the kernel? No. But the difference in cruftiness between kvm and qemu code should not enter into the discussion of where to do things. and as i said in the mail: the kernel _is_ the best place to do this particular stuff. I agree with this, maybe for different reasons. -- error compiling committee.c: too many arguments to function
Re: [kvm-devel] QEMU PIC indirection patch for in-kernel APIC work
Rusty Russell wrote: You didn't quote Anthony's point about it's more about there not being good enough userspace interfaces to do network IO. It's easier to write a kernel-space network driver, but it's not obviously the right thing to do until we can show that an efficient packet-level userspace interface isn't possible. I don't think that's been done, and it would be interesting to try. In the case of networking, the copyful interfaces on receive are driven by the hardware not knowing how to split the header from the data. On transmit I agree, it could be made copyless from userspace (something like sendfilev, only not file oriented). -- Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: [3/4] kevent: AIO, aio_sendfile() implementation.
David Miller wrote: From: Christoph Hellwig [EMAIL PROTECTED] Date: Wed, 26 Jul 2006 11:04:31 +0100 And to be honest, I don't think adding all this code is acceptable if it can't replace the existing aio code while keeping the interface. So while your interface looks pretty sane, the implementation needs a lot of work still :) Networking and disk AIO have significantly different needs. Surely, there needs to be a unified polling interface to support single-threaded designs. -- error compiling committee.c: too many arguments to function
Re: possible recursive locking in ATM layer
Arjan van de Ven wrote: From: Arjan van de Ven [EMAIL PROTECTED] Linux version 2.6.17-git22 ([EMAIL PROTECTED]) (gcc version 4.0.3 (Ubuntu 4.0.3-1ubuntu5)) #20 PREEMPT Tue Jul 4 10:35:04 CEST 2006 [ 2381.598609] = [ 2381.619314] [ INFO: possible recursive locking detected ] [ 2381.635497] - [ 2381.651706] atmarpd/2696 is trying to acquire lock: [ 2381.666354] (skb_queue_lock_key){-+..}, at: [c028c540] skb_migrate+0x24/0x6c [ 2381.688848] ok this is a real potential deadlock in a way, it takes two locks of 2 skbuffs without doing any kind of lock ordering; I think the following patch should fix it. Just sort the lock taking order by address of the skb.. it's not pretty but it's the best this can do in a minimally invasive way. Isn't it a deadlock only if skb_migrate(a, b) and skb_migrate(b, a) can be called concurrently? -- error compiling committee.c: too many arguments to function
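Arjan's sort-the-locks-by-address trick generalizes to any pair of same-class locks. A user-space sketch with pthread mutexes (function names invented for illustration): because both callers take the lower address first, skb_migrate(a, b) and skb_migrate(b, a) running concurrently can no longer deadlock.

```c
#include <pthread.h>
#include <stdint.h>

/* Take two locks of the same class in a globally consistent order:
 * lowest address first.  Symmetric callers then agree on the order
 * and cannot hold each other's first lock while waiting for the second. */
void lock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    if ((uintptr_t)a > (uintptr_t)b) {
        pthread_mutex_t *t = a;
        a = b;
        b = t;
    }
    pthread_mutex_lock(a);    /* always the lower address first */
    pthread_mutex_lock(b);
}

void unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    /* unlock order does not matter for deadlock avoidance */
    pthread_mutex_unlock(a);
    pthread_mutex_unlock(b);
}
```

As the thread notes, the ordering only matters if the symmetric calls can actually race; if they cannot, the original code was merely lockdep-unfriendly, not deadlock-prone.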
Re: Van Jacobson's net channels and real-time
Ingo Oeser wrote: Hi Jörn, On Saturday, 22. April 2006 13:48, Jörn Engel wrote: Unless I completely misunderstand something, one of the main points of the netchannels is to have *zero* fields written to by both producer and consumer. Hmm, for me the main point was to keep the complete processing of a single packet within one CPU/Core where this is a non-issue. But the interrupt for a packet can be received by cpu 0 whereas the rest of processing proceeds on cpu 1; so it still helps to keep the producer index and consumer index on separate cachelines. -- error compiling committee.c: too many arguments to function
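The separate-cacheline layout Avi argues for can be sketched like this (a 64-byte line size is assumed; the struct is illustrative, not the netchannel layout itself). Even if each packet stays on one core, the producer index (written by the interrupt-handling CPU) and the consumer index (written by the processing CPU) should not share a line, or every update by one side evicts the other's cached copy:

```c
#include <stddef.h>

#define CACHELINE 64

/* Each index gets its own cache line so writes by the producer CPU do
 * not cause false sharing with reads/writes by the consumer CPU. */
struct ring_indices {
    unsigned long producer __attribute__((aligned(CACHELINE)));
    unsigned long consumer __attribute__((aligned(CACHELINE)));
};
```

The alignment attribute pads each member out to its own 64-byte line, which offsetof() makes easy to verify.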
Re: [RFC] [PATCH 0/3] ioat: DMA engine support
Andi Kleen wrote: Don't forget that there are benefits of not polluting the cache with the traffic for the incoming skbs. Is that a general benefit outside benchmarks? I would expect most real programs to actually do something with the data - and that usually involves needing it in cache. As an example, an NFS server reads some data pages using iSCSI and sends them using NFS/TCP (or vice versa). In the I/O AT case it might make sense to do a few prefetch()es of the userland data on the return-to-userspace code path. Some prefetches for user space might be a good idea, yes. As long as they can be turned off. Not all userspace applications want to touch the data immediately.
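A togglable prefetch like the one suggested could look roughly like this; `want_prefetch` is a hypothetical per-application flag, and the GCC builtin is only a hint, so correctness is unaffected whether it runs or not:

```c
#include <stddef.h>

/* Prefetch the landing buffer one cache line (64 bytes) at a time,
 * but only when the application opted in -- applications that won't
 * touch the data immediately skip the cache pollution entirely.
 * Returns the number of prefetch hints issued (for illustration). */
int maybe_prefetch(const char *buf, size_t len, int want_prefetch)
{
    int issued = 0;
    if (!want_prefetch)
        return 0;
    for (size_t off = 0; off < len; off += 64) {
        /* args: address, 0 = read, 3 = high temporal locality */
        __builtin_prefetch(buf + off, 0, 3);
        issued++;
    }
    return issued;
}
```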
Re: [RFC] [PATCH 0/3] ioat: DMA engine support
Andi Kleen wrote: As an example, an NFS server reads some data pages using iSCSI and sends them using NFS/TCP (or vice versa). For TX this can be done zero-copy using a sendfile-like setup. Yes, or with aio send for anonymous memory. For RX it may help - but my point was that most applications are not structured in this simple way. Agreed. But those that do care, care very much. The data mover applications, simply because they don't touch the data, expect very high bandwidth. As long as they can be turned off. Not all userspace applications want to touch the data immediately. Perhaps. And lots of others might. Of course the simple network benchmarks don't, so the numbers on them look good. There are very real non-benchmark applications that want this. Just pointing out that it's not clear it will always be a big help. Agree it should default to in-cache.