Re: Infrastructure for zerocopy I/O
On Tue, 8 Dec 2015, Steinar H. Gunderson wrote: > Do you know what the status is for the LPM blacklisting patch we discussed > earlier? Last I heard, Baolu Lu was working on a more general solution. In its absence, maybe I should go ahead and submit the patch. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Infrastructure for zerocopy I/O
On Mon, 7 Dec 2015, Steinar H. Gunderson wrote: > On Mon, Dec 07, 2015 at 10:45:55AM -0500, Alan Stern wrote: > >> Here we are. Lightly tested, I believe all comments should be addressed. > > This looks quite good. I have only a couple of comments. > > Excellent. I'm sorry we've needed so many rounds; this is what happens when > you pick up someone else's patch against a code base you've never touched > before. :-) Of course. And this was not a particularly simple patch. > +static void dec_usb_memory_use_count(struct usb_memory *usbm, int *count) > +{ > + struct usb_dev_state *ps = usbm->ps; > + unsigned long flags; > + > + spin_lock_irqsave(>lock, flags); > + --*count; > + if (usbm->urb_use_count == 0 && usbm->vma_use_count == 0) { > + list_del(>memlist); > + spin_unlock_irqrestore(>lock, flags); > + > + usb_free_coherent(ps->dev, usbm->size, usbm->mem, > + usbm->dma_handle); > + usbfs_decrease_memory_usage( > + usbm->size + sizeof(struct usb_memory)); > + kfree(usbm); > + } else { > + spin_unlock_irqrestore(>lock, flags); > + } > +} > +static int usbdev_mmap(struct file *file, struct vm_area_struct *vma) > +{ > + struct usb_memory *usbm = NULL; > + struct usb_dev_state *ps = file->private_data; > + size_t size = vma->vm_end - vma->vm_start; > + void *mem; > + unsigned long flags; > + dma_addr_t dma_handle; > + int ret; > + > + ret = usbfs_increase_memory_usage(size + sizeof(struct usb_memory)); > + if (ret) { > + return ret; > + } > + > + usbm = kzalloc(sizeof(struct usb_memory), GFP_KERNEL); > + if (!usbm) { > + usbfs_decrease_memory_usage( > + size + sizeof(struct usb_memory)); > + return -ENOMEM; > + } > + > + mem = usb_alloc_coherent(ps->dev, size, GFP_KERNEL, _handle); > + if (!mem) { > + usbfs_decrease_memory_usage( > + size + sizeof(struct usb_memory)); > + kfree(usbm); > + return -ENOMEM; > + } > + > + memset(mem, 0, size); > + > + usbm->mem = mem; > + usbm->dma_handle = dma_handle; > + usbm->size = size; > + usbm->ps = ps; > + usbm->vm_start = vma->vm_start; > + usbm->vma_use_count = 1; > + > + if (remap_pfn_range(vma, vma->vm_start, > + virt_to_phys(usbm->mem) >> PAGE_SHIFT, > + size, vma->vm_page_prot) < 0) { > + dec_usb_memory_use_count(usbm, >vma_use_count); > + return -EAGAIN; > + } I just noticed this -- usbm->memlist needs to be initialized before dec_usb_memory_use_count() can be called. INIT_LIST_HEAD before the "if" statement will fix the problem. The patch could use one more ingredient. In include/uapi/linux/usbdevicefs_h, add a #define for USBDEVFS_CAP_MMAP and fix up proc_get_capabilities() to report the new capability. In theory, user programs could just try an mmap() call and see if it works, but this seems cleaner. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Infrastructure for zerocopy I/O
On Tue, Dec 08, 2015 at 11:07:55PM +0100, Steinar H. Gunderson wrote: > On Tue, Dec 08, 2015 at 05:01:46PM -0500, Alan Stern wrote: > > I don't see anything else to change. You can submit this to Greg KH > > and add: > > > > Acked-by: Alan Stern> > Great! > > How do I submit? Just do git send-email to some address? Yes, use scripts/get_maintainer.pl to get my email and this mailing list's address to use. thanks, greg k-h -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Infrastructure for zerocopy I/O
On Tue, 8 Dec 2015, Steinar H. Gunderson wrote: > On Tue, Dec 08, 2015 at 03:26:28PM -0500, Alan Stern wrote: > > I just noticed this -- usbm->memlist needs to be initialized before > > dec_usb_memory_use_count() can be called. INIT_LIST_HEAD before the > > "if" statement will fix the problem. > > Done. > > > The patch could use one more ingredient. In > > include/uapi/linux/usbdevicefs_h, add a #define for USBDEVFS_CAP_MMAP > > and fix up proc_get_capabilities() to report the new capability. In > > theory, user programs could just try an mmap() call and see if it > > works, but this seems cleaner. > > Done. > > /* Steinar */ > > From 03318604cc6ad9def451784da407e6fcd6af4705 Mon Sep 17 00:00:00 2001 > From: "Steinar H. Gunderson"> Date: Thu, 26 Nov 2015 01:19:13 +0100 > Subject: [PATCH] Add support for usbfs zerocopy. > > Add a new interface for userspace to preallocate memory that can be > used with usbfs. This gives two primary benefits: > > - Zerocopy; data no longer needs to be copied between the userspace >and the kernel, but can instead be read directly by the driver from >userspace's buffers. This works for all kinds of transfers (even if >nonsensical for control and interrupt transfers); isochronous also >no longer need to memset() the buffer to zero to avoid leaking kernel data. > > - Once the buffers are allocated, USB transfers can no longer fail due to >memory fragmentation; previously, long-running programs could run into >problems finding a large enough contiguous memory chunk, especially on >embedded systems or at high rates. > > Memory is allocated by using mmap() against the usbfs file descriptor, > and similarly deallocated by munmap(). Once memory has been allocated, > using it as pointers to a bulk or isochronous operation means you will > automatically get zerocopy behavior. Note that this also means you cannot > modify outgoing data until the transfer is complete. The same holds for > data on the same cache lines as incoming data; DMA modifying them at the > same time could lead to your changes being overwritten. > > There's a new capability USBDEVFS_CAP_MMAP that userspace can query to see > if the running kernel supports this functionality, if just trying mmap() is > not acceptable. > > Largely based on a patch by Markus Rechberger with some updates. The original > patch can be found at: > > http://sundtek.de/support/devio_mmap_v0.4.diff > > Signed-off-by: Steinar H. Gunderson > Signed-off-by: Markus Rechberger I don't see anything else to change. You can submit this to Greg KH and add: Acked-by: Alan Stern Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Infrastructure for zerocopy I/O
On Tue, Dec 08, 2015 at 05:01:46PM -0500, Alan Stern wrote: > I don't see anything else to change. You can submit this to Greg KH > and add: > > Acked-by: Alan SternGreat! How do I submit? Just do git send-email to some address? Do you know what the status is for the LPM blacklisting patch we discussed earlier? /* Steinar */ -- Homepage: https://www.sesse.net/ -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Infrastructure for zerocopy I/O
On Tue, Dec 08, 2015 at 03:26:28PM -0500, Alan Stern wrote: > I just noticed this -- usbm->memlist needs to be initialized before > dec_usb_memory_use_count() can be called. INIT_LIST_HEAD before the > "if" statement will fix the problem. Done. > The patch could use one more ingredient. In > include/uapi/linux/usbdevicefs_h, add a #define for USBDEVFS_CAP_MMAP > and fix up proc_get_capabilities() to report the new capability. In > theory, user programs could just try an mmap() call and see if it > works, but this seems cleaner. Done. /* Steinar */ >From 03318604cc6ad9def451784da407e6fcd6af4705 Mon Sep 17 00:00:00 2001 From: "Steinar H. Gunderson"Date: Thu, 26 Nov 2015 01:19:13 +0100 Subject: [PATCH] Add support for usbfs zerocopy. Add a new interface for userspace to preallocate memory that can be used with usbfs. This gives two primary benefits: - Zerocopy; data no longer needs to be copied between the userspace and the kernel, but can instead be read directly by the driver from userspace's buffers. This works for all kinds of transfers (even if nonsensical for control and interrupt transfers); isochronous also no longer need to memset() the buffer to zero to avoid leaking kernel data. - Once the buffers are allocated, USB transfers can no longer fail due to memory fragmentation; previously, long-running programs could run into problems finding a large enough contiguous memory chunk, especially on embedded systems or at high rates. Memory is allocated by using mmap() against the usbfs file descriptor, and similarly deallocated by munmap(). Once memory has been allocated, using it as pointers to a bulk or isochronous operation means you will automatically get zerocopy behavior. Note that this also means you cannot modify outgoing data until the transfer is complete. The same holds for data on the same cache lines as incoming data; DMA modifying them at the same time could lead to your changes being overwritten. There's a new capability USBDEVFS_CAP_MMAP that userspace can query to see if the running kernel supports this functionality, if just trying mmap() is not acceptable. Largely based on a patch by Markus Rechberger with some updates. The original patch can be found at: http://sundtek.de/support/devio_mmap_v0.4.diff Signed-off-by: Steinar H. Gunderson Signed-off-by: Markus Rechberger --- drivers/usb/core/devio.c | 230 +- include/uapi/linux/usbdevice_fs.h | 1 + 2 files changed, 206 insertions(+), 25 deletions(-) diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c index 38ae877c..3349222 100644 --- a/drivers/usb/core/devio.c +++ b/drivers/usb/core/devio.c @@ -50,6 +50,7 @@ #include #include #include +#include #include #include @@ -69,6 +70,7 @@ struct usb_dev_state { spinlock_t lock;/* protects the async urb lists */ struct list_head async_pending; struct list_head async_completed; + struct list_head memory_list; wait_queue_head_t wait; /* wake up if a request completed */ unsigned int discsignr; struct pid *disc_pid; @@ -79,6 +81,17 @@ struct usb_dev_state { u32 disabled_bulk_eps; }; +struct usb_memory { + struct list_head memlist; + int vma_use_count; + int urb_use_count; + u32 size; + void *mem; + dma_addr_t dma_handle; + unsigned long vm_start; + struct usb_dev_state *ps; +}; + struct async { struct list_head asynclist; struct usb_dev_state *ps; @@ -89,6 +102,7 @@ struct async { void __user *userbuffer; void __user *userurb; struct urb *urb; + struct usb_memory *usbm; unsigned int mem_usage; int status; u32 secid; @@ -157,6 +171,108 @@ static int connected(struct usb_dev_state *ps) ps->dev->state != USB_STATE_NOTATTACHED); } +static void dec_usb_memory_use_count(struct usb_memory *usbm, int *count) +{ + struct usb_dev_state *ps = usbm->ps; + unsigned long flags; + + spin_lock_irqsave(>lock, flags); + --*count; + if (usbm->urb_use_count == 0 && usbm->vma_use_count == 0) { + list_del(>memlist); + spin_unlock_irqrestore(>lock, flags); + + usb_free_coherent(ps->dev, usbm->size, usbm->mem, + usbm->dma_handle); + usbfs_decrease_memory_usage( + usbm->size + sizeof(struct usb_memory)); + kfree(usbm); + } else { + spin_unlock_irqrestore(>lock, flags); + } +} + +static void usbdev_vm_open(struct vm_area_struct *vma) +{ + struct usb_memory *usbm = vma->vm_private_data; + unsigned long flags; + + spin_lock_irqsave(>ps->lock, flags); + ++usbm->vma_use_count; + spin_unlock_irqrestore(>ps->lock, flags); +} +
Re: Infrastructure for zerocopy I/O
On Sun, 6 Dec 2015, Steinar H. Gunderson wrote: > On Sun, Dec 06, 2015 at 01:06:08AM +0100, Steinar H. Gunderson wrote: > > I'll try to update your patch with the other suggestions tomorrow. Thanks! > > Here we are. Lightly tested, I believe all comments should be addressed. This looks quite good. I have only a couple of comments. > /* Steinar */ > > From 73833276a4f359c35edffc2a741dba57f2e82a12 Mon Sep 17 00:00:00 2001 > From: "Steinar H. Gunderson"> Date: Thu, 26 Nov 2015 01:19:13 +0100 > Subject: [PATCH] Add support for usbfs zerocopy. > > Add a new interface for userspace to preallocate memory that can be > used with usbfs. This gives two primary benefits: > > - Zerocopy; data no longer needs to be copied between the userspace >and the kernel, but can instead be read directly by the driver from >userspace's buffers. This works for both bulk and isochronous transfers; >isochronous also no longer need to memset() the buffer to zero to avoid >leaking kernel data. Actually it now works for control and interrupt too, although there's not much point using it for them. > - Once the buffers are allocated, USB transfers can no longer fail due to >memory fragmentation; previously, long-running programs could run into >problems finding a large enough contiguous memory chunk, especially on >embedded systems or at high rates. > @@ -1439,6 +1592,19 @@ static int proc_do_submiturb(struct usb_dev_state *ps, > struct usbdevfs_urb *uurb > goto error; > } > > + as->usbm = find_memory_area(ps, uurb); > + if (IS_ERR(as->usbm)) { > + ret = PTR_ERR(as->usbm); > + goto error; Hmmm, and then what will happen when the error-handling code does this: + if (as && as->usbm) + dec_usb_memory_use_count(as->usbm, >usbm->urb_use_count); ? > + } > + > + /* do not use SG buffers when memory mapped segments > + * are in use > + */ > + if (!as->usbm) { You mean "if (as->usbm)". > + num_sgs = 0; > + } Braces aren't needed. > + > u += sizeof(struct async) + sizeof(struct urb) + uurb->buffer_length + >num_sgs * sizeof(struct scatterlist); > ret = usbfs_increase_memory_usage(u); > @@ -1476,14 +1642,23 @@ static int proc_do_submiturb(struct usb_dev_state > *ps, struct usbdevfs_urb *uurb > totlen -= u; > } > } else if (uurb->buffer_length > 0) { > - as->urb->transfer_buffer = kmalloc(uurb->buffer_length, > - GFP_KERNEL); > - if (!as->urb->transfer_buffer) { > - ret = -ENOMEM; > - goto error; > + if (as->usbm) { > + unsigned long uurb_start = (unsigned long)uurb->buffer; > + > + as->urb->transfer_buffer = as->usbm->mem + > + (uurb_start - as->usbm->vm_start); > + } else { > + as->urb->transfer_buffer = kmalloc(uurb->buffer_length, > + GFP_KERNEL); > + if (!as->urb->transfer_buffer) { > + ret = -ENOMEM; > + goto error; > + } > } > > - if (!is_in) { > + if (as->usbm) { > + ; /* Transfer buffer is okay as it is */ > + } else if (!is_in) { > if (copy_from_user(as->urb->transfer_buffer, > uurb->buffer, > uurb->buffer_length)) { It looks odd repeating the "if (as->usbm)" test like this. You can merge the stuff here into the "else" clause of the preceding code. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Infrastructure for zerocopy I/O
On Mon, Dec 07, 2015 at 10:45:55AM -0500, Alan Stern wrote: >> Here we are. Lightly tested, I believe all comments should be addressed. > This looks quite good. I have only a couple of comments. Excellent. I'm sorry we've needed so many rounds; this is what happens when you pick up someone else's patch against a code base you've never touched before. :-) >> - Zerocopy; data no longer needs to be copied between the userspace >>and the kernel, but can instead be read directly by the driver from >>userspace's buffers. This works for both bulk and isochronous transfers; >>isochronous also no longer need to memset() the buffer to zero to avoid >>leaking kernel data. > Actually it now works for control and interrupt too, although there's > not much point using it for them. Updated description. >> +as->usbm = find_memory_area(ps, uurb); >> +if (IS_ERR(as->usbm)) { >> +ret = PTR_ERR(as->usbm); >> +goto error; > Hmmm, and then what will happen when the error-handling code does this: Urg, you pointed this out earlier and I didn't fix it. I'm setting it to NULL in the error path now. >> +} >> + >> +/* do not use SG buffers when memory mapped segments >> + * are in use >> + */ >> +if (!as->usbm) { > You mean "if (as->usbm)". Yes. Test now inverted. >> +num_sgs = 0; >> +} > Braces aren't needed. Went to the dentist to take them off. > It looks odd repeating the "if (as->usbm)" test like this. You can merge > the stuff here into the "else" clause of the preceding code. Merged. Updated patch below. /* Steinar */ >From ec536f8abc16d9a920a0891db3bc9b6c05c2c43f Mon Sep 17 00:00:00 2001 From: "Steinar H. Gunderson"Date: Thu, 26 Nov 2015 01:19:13 +0100 Subject: [PATCH] Add support for usbfs zerocopy. Add a new interface for userspace to preallocate memory that can be used with usbfs. This gives two primary benefits: - Zerocopy; data no longer needs to be copied between the userspace and the kernel, but can instead be read directly by the driver from userspace's buffers. This works for all kinds of transfers (even if nonsensical for control and interrupt transfers); isochronous also no longer need to memset() the buffer to zero to avoid leaking kernel data. - Once the buffers are allocated, USB transfers can no longer fail due to memory fragmentation; previously, long-running programs could run into problems finding a large enough contiguous memory chunk, especially on embedded systems or at high rates. Memory is allocated by using mmap() against the usbfs file descriptor, and similarly deallocated by munmap(). Once memory has been allocated, using it as pointers to a bulk or isochronous operation means you will automatically get zerocopy behavior. Note that this also means you cannot modify outgoing data until the transfer is complete. The same holds for data on the same cache lines as incoming data; DMA modifying them at the same time could lead to your changes being overwritten. Largely based on a patch by Markus Rechberger with some updates. The original patch can be found at: http://sundtek.de/support/devio_mmap_v0.4.diff Signed-off-by: Steinar H. Gunderson Signed-off-by: Markus Rechberger --- drivers/usb/core/devio.c | 227 ++- 1 file changed, 203 insertions(+), 24 deletions(-) diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c index 38ae877c..5faf1a0 100644 --- a/drivers/usb/core/devio.c +++ b/drivers/usb/core/devio.c @@ -50,6 +50,7 @@ #include #include #include +#include #include #include @@ -69,6 +70,7 @@ struct usb_dev_state { spinlock_t lock;/* protects the async urb lists */ struct list_head async_pending; struct list_head async_completed; + struct list_head memory_list; wait_queue_head_t wait; /* wake up if a request completed */ unsigned int discsignr; struct pid *disc_pid; @@ -79,6 +81,17 @@ struct usb_dev_state { u32 disabled_bulk_eps; }; +struct usb_memory { + struct list_head memlist; + int vma_use_count; + int urb_use_count; + u32 size; + void *mem; + dma_addr_t dma_handle; + unsigned long vm_start; + struct usb_dev_state *ps; +}; + struct async { struct list_head asynclist; struct usb_dev_state *ps; @@ -89,6 +102,7 @@ struct async { void __user *userbuffer; void __user *userurb; struct urb *urb; + struct usb_memory *usbm; unsigned int mem_usage; int status; u32 secid; @@ -157,6 +171,107 @@ static int connected(struct usb_dev_state *ps) ps->dev->state != USB_STATE_NOTATTACHED); } +static void dec_usb_memory_use_count(struct usb_memory *usbm, int *count) +{ + struct usb_dev_state *ps = usbm->ps; + unsigned long
Re: Infrastructure for zerocopy I/O
On Sun, 6 Dec 2015, Steinar H. Gunderson wrote: > >>> You don't really need to do it earlier. An unnecessary calculation of > >>> num_sgs won't hurt. > >> Is it unnecessary? I thought the test was really to force num_sgs==0 for > >> the > >> DMA case, not to save a little arithmetic. > > Well, if you calculate num_sgs and then force it to 0 anyway, the > > calculation was unnecessary, right? > > But the logic never calculates num_sgs if usbm == NULL? The if terminates > early and num_sgs stays 0. At this point you don't know whether as->usbm will end up being NULL. > >> - num_sgs = DIV_ROUND_UP(uurb->buffer_length, USB_SG_SIZE); > >> - if (num_sgs == 1 || num_sgs > ps->dev->bus->sg_tablesize) > >> - num_sgs = 0; > >> + /* do not use SG buffers when memory mapped segments > >> + * are in use > >> + */ > >> + if (as->usbm) { > >> + num_sgs = DIV_ROUND_UP(uurb->buffer_length, > >> + USB_SG_SIZE); > >> + if (num_sgs == 1 || > >> + num_sgs > ps->dev->bus->sg_tablesize) { > >> + num_sgs = 0; > >> + } > >> + } > > No, no. Leave this part of the code unchanged. as hasn't even been > > allocated yet. > > Good point. I did warn you about the lack of testing... :-) > > By “unchanged”, do you mean unchanged from previous patch or from the > original? Because this was here all along from Markus' version of the patch. Unchanged from the original. After all, your patch gets applied to the original code, not on top of Markus's patch or any of your earlier patches. Therefore anything your patch doesn't touch will remain the same as the original code, not the same as in Markus's patch. > > As pointed out repeatedly by the kbuild test robot, this should be > > IS_ERR, not IS_ERR_VALUE. Also, you need to set as->usbm back to NULL > > before jumping. > > > > At this point, set num_sgs to 0 if as->usbm is non-NULL. > > Didn't you just point out that this would be redundant calculation? No, I pointed out that the calculation of num_sgs earlier would be unnecessary. But it's also unavoidable, since you don't know at the time whether it will turn out to be necessary or not. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Infrastructure for zerocopy I/O
On Sun, Dec 06, 2015 at 01:06:08AM +0100, Steinar H. Gunderson wrote: > I'll try to update your patch with the other suggestions tomorrow. Thanks! Here we are. Lightly tested, I believe all comments should be addressed. /* Steinar */ >From 73833276a4f359c35edffc2a741dba57f2e82a12 Mon Sep 17 00:00:00 2001 From: "Steinar H. Gunderson"Date: Thu, 26 Nov 2015 01:19:13 +0100 Subject: [PATCH] Add support for usbfs zerocopy. Add a new interface for userspace to preallocate memory that can be used with usbfs. This gives two primary benefits: - Zerocopy; data no longer needs to be copied between the userspace and the kernel, but can instead be read directly by the driver from userspace's buffers. This works for both bulk and isochronous transfers; isochronous also no longer need to memset() the buffer to zero to avoid leaking kernel data. - Once the buffers are allocated, USB transfers can no longer fail due to memory fragmentation; previously, long-running programs could run into problems finding a large enough contiguous memory chunk, especially on embedded systems or at high rates. Memory is allocated by using mmap() against the usbfs file descriptor, and similarly deallocated by munmap(). Once memory has been allocated, using it as pointers to a bulk or isochronous operation means you will automatically get zerocopy behavior. Note that this also means you cannot modify outgoing data until the transfer is complete. The same holds for data on the same cache lines as incoming data; DMA modifying them at the same time could lead to your changes being overwritten. Largely based on a patch by Markus Rechberger with some updates. The original patch can be found at: http://sundtek.de/support/devio_mmap_v0.4.diff Signed-off-by: Steinar H. Gunderson Signed-off-by: Markus Rechberger --- drivers/usb/core/devio.c | 202 --- 1 file changed, 192 insertions(+), 10 deletions(-) diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c index 38ae877c..40f988e 100644 --- a/drivers/usb/core/devio.c +++ b/drivers/usb/core/devio.c @@ -50,6 +50,7 @@ #include #include #include +#include #include #include @@ -69,6 +70,7 @@ struct usb_dev_state { spinlock_t lock;/* protects the async urb lists */ struct list_head async_pending; struct list_head async_completed; + struct list_head memory_list; wait_queue_head_t wait; /* wake up if a request completed */ unsigned int discsignr; struct pid *disc_pid; @@ -79,6 +81,17 @@ struct usb_dev_state { u32 disabled_bulk_eps; }; +struct usb_memory { + struct list_head memlist; + int vma_use_count; + int urb_use_count; + u32 size; + void *mem; + dma_addr_t dma_handle; + unsigned long vm_start; + struct usb_dev_state *ps; +}; + struct async { struct list_head asynclist; struct usb_dev_state *ps; @@ -89,6 +102,7 @@ struct async { void __user *userbuffer; void __user *userurb; struct urb *urb; + struct usb_memory *usbm; unsigned int mem_usage; int status; u32 secid; @@ -157,6 +171,107 @@ static int connected(struct usb_dev_state *ps) ps->dev->state != USB_STATE_NOTATTACHED); } +static void dec_usb_memory_use_count(struct usb_memory *usbm, int *count) +{ + struct usb_dev_state *ps = usbm->ps; + unsigned long flags; + + spin_lock_irqsave(>lock, flags); + --*count; + if (usbm->urb_use_count == 0 && usbm->vma_use_count == 0) { + list_del(>memlist); + spin_unlock_irqrestore(>lock, flags); + + usb_free_coherent(ps->dev, usbm->size, usbm->mem, + usbm->dma_handle); + usbfs_decrease_memory_usage( + usbm->size + sizeof(struct usb_memory)); + kfree(usbm); + } else { + spin_unlock_irqrestore(>lock, flags); + } +} + +static void usbdev_vm_open(struct vm_area_struct *vma) +{ + struct usb_memory *usbm = vma->vm_private_data; + unsigned long flags; + + spin_lock_irqsave(>ps->lock, flags); + ++usbm->vma_use_count; + spin_unlock_irqrestore(>ps->lock, flags); +} + +static void usbdev_vm_close(struct vm_area_struct *vma) +{ + struct usb_memory *usbm = vma->vm_private_data; + + dec_usb_memory_use_count(usbm, >vma_use_count); +} + +struct vm_operations_struct usbdev_vm_ops = { + .open = usbdev_vm_open, + .close = usbdev_vm_close +}; + +static int usbdev_mmap(struct file *file, struct vm_area_struct *vma) +{ + struct usb_memory *usbm = NULL; + struct usb_dev_state *ps = file->private_data; + size_t size = vma->vm_end - vma->vm_start; + void *mem; + unsigned long flags; + dma_addr_t
Re: Infrastructure for zerocopy I/O
On Sat, Dec 05, 2015 at 12:36:51PM -0500, Alan Stern wrote: > I meant that this "if" statement should test only uurb_start. If the > test succeeds, a nested "if" statement should test buffer_length and > return an error if it is too big. > > That's as opposed to the way you have it now, where if buffer_length is > too big you return NULL. The system would then try to treat the > coherent memory as a normal memory buffer, which may not be a good > idea. OK. Why is it a problem to treat it as a normal memory buffer? (I understand it would be slower, of course.) >> I can't see >> offhand another way to call it only once during the function, save for a bool >> that says if we called it or not. > Simply call it at the spot where you initialize as->usbm to NULL. Then > it will always run exactly once. OK, done. > You don't really need to do it earlier. An unnecessary calculation of > num_sgs won't hurt. Is it unnecessary? I thought the test was really to force num_sgs==0 for the DMA case, not to save a little arithmetic. > Comments below. I'll skip much of the patch because it looks pretty > good. Good, I guess we're finally converting on something acceptable to all parties :-) >> +ret = usbfs_increase_memory_usage(size + sizeof(struct usb_memory)); >> +if (ret) { >> +kfree(usbm); >> +return ret; >> +} > This part should come first. If allocating usbm would put us over the > memory limit, we want to know before we do the allocation. Done. >> +usbm->vm_start = vma->vm_start; >> +usbm->vma_use_count = 1; > I would move these two lines before the remap_pfn_range() call. Then > you can use >vma_use_count as the second argument to > dec_usb_memory_use_count() and avoid the awkward "if (count)" test at > the start of that routine. Done. > So this would be: > > if (uurb_start >= iter->vm_start && > uurb_start < iter->vm_start + iter->size) { > if (uurb->buffer_length > iter->vm_start + iter->size - > uurb_start) { > usbm = ERR_PTR(-EINVAL); > } else { > usbm = iter; > usbm->urb_use_count++; > } > break; > } > > (That's with the file's convention for continuation lines.) Done. > Much of this function is still subject to change, so I won't look at it > now. OK. /* Steinar */ -- Homepage: https://www.sesse.net/ >From 53f132d6fcd9117f50647e604487593adfe85990 Mon Sep 17 00:00:00 2001 From: "Steinar H. Gunderson"Date: Thu, 26 Nov 2015 01:19:13 +0100 Subject: [PATCH] Add support for usbfs zerocopy. Add a new interface for userspace to preallocate memory that can be used with usbfs. This gives two primary benefits: - Zerocopy; data no longer needs to be copied between the userspace and the kernel, but can instead be read directly by the driver from userspace's buffers. This works for both bulk and isochronous transfers; isochronous also no longer need to memset() the buffer to zero to avoid leaking kernel data. - Once the buffers are allocated, USB transfers can no longer fail due to memory fragmentation; previously, long-running programs could run into problems finding a large enough contiguous memory chunk, especially on embedded systems or at high rates. Memory is allocated by using mmap() against the usbfs file descriptor, and similarly deallocated by munmap(). Once memory has been allocated, using it as pointers to a bulk or isochronous operation means you will automatically get zerocopy behavior. Note that this also means you cannot modify outgoing data until the transfer is complete. The same holds for data on the same cache lines as incoming data; DMA modifying them at the same time could lead to your changes being overwritten. Largely based on a patch by Markus Rechberger with some updates. The original patch can be found at: http://sundtek.de/support/devio_mmap_v0.4.diff Signed-off-by: Steinar H. Gunderson Signed-off-by: Markus Rechberger --- drivers/usb/core/devio.c | 202 --- 1 file changed, 192 insertions(+), 10 deletions(-) diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c index 38ae877c..6b253e8 100644 --- a/drivers/usb/core/devio.c +++ b/drivers/usb/core/devio.c @@ -50,6 +50,7 @@ #include #include #include +#include #include #include @@ -69,6 +70,7 @@ struct usb_dev_state { spinlock_t lock;/* protects the async urb lists */ struct list_head async_pending; struct list_head async_completed; + struct list_head memory_list; wait_queue_head_t wait; /* wake up if a request completed */ unsigned int discsignr; struct pid *disc_pid; @@ -79,6 +81,17 @@ struct usb_dev_state {
Re: Infrastructure for zerocopy I/O
On Wed, Dec 02, 2015 at 10:54:52AM -0500, Alan Stern wrote: > Well, it's partly a matter of taste. But there's also a race: This > code adds usbm to ps->memory_list (making it available to URB > submissions running on other CPUs) before usbm has been completely > initialized (vm_start isn't set yet). OK. Joined the functions. > find_memory_area() is used in only one place. You can put everything > together if you move usbdev_mmap() and associated routines up near the > start of the file, after usbdev_read(). Done. I hope I got it the way you wanted. > (Yes, I did.) On further thought, testing uurb_start is sufficient. > If uurb->buffer_length then turns out to be too big, the function > should return an error (or rather, an ERR_PTR) and the URB submission > should fail. I don't understand “should” here. I read it as “you should change your function for this”, but this doesn't rhyme with that uurb_start should be sufficient. Do you want me to just remove the check? Is there something else that would guard against it? >> Done. (I saw you used a plural; is there more than this one?) > Well, alloc_urb_memory() also can fail. I can't find this function. What do you mean? >> New patch attached. However, it's not working; I'm getting several of these >> messages: >> >> [ 28.796244] DMAR: Allocating domain for 2-2 failed > I don't know what the reason is for that. It may be that your kernel > isn't configured to allocate as much coherent memory as you are asking > for. We'll have to investigate further, maybe ask somebody who knows > more about how the DMA subsystem works. I found out why; I had messed up the transfer_dma address calculation, subtracting the kernel memory address instead of the VM address. After that, the patch works again. >> +struct usb_memory { >> +struct list_head memlist; >> +int vma_use_count; >> +int usb_use_count; > This name has disturbed me. It seems like urb_use_count would be more > descriptive ("urb" rather than "usb"). Changed. > The GFP_DMA32 isn't necessary now. dma_zalloc_coherent() is smart > enough to allocate memory that is compatible with the device's DMA > mask. If the hardware can do 64-bit DMA then the buffer doesn't need > to be located in the first 4 GB. Done. > By the way, the convention in this file is to indent continuation > lines by two tab stops beyond the first line, not to align things with > an open paren or anything else. (Applies elsewhere too.) Done. (That's an unusual convention.) > I would allocate these two in the opposite order. Just because the > DMA routines involve more work than kzalloc/kfree. Done. My thought, by the way, was that the DMA routines were much more likely to fail. >> +usbm->mem = mem; >> +usbm->dma_handle = dma_handle; >> +usbm->size = size; >> +usbm->ps = ps; >> +spin_lock_irqsave(>lock, flags); >> +list_add_tail(>memlist, >memory_list); >> +spin_unlock_irqrestore(>lock, flags); > Like I mentioned earlier, it shouldn't be added to the list until it > is ready to be used. Done. >> static void free_async(struct async *as) >> { >> +struct usb_memory *usbm = NULL; > Not needed any more. Done. >> +if (find_memory_area(ps, uurb) == NULL) { > Don't call find_memory_area() twice! Save this result and use it > later. Also, move this call up before the "switch" statement so that > it will apply to all transfer types, not just bulk. > > Suitable adjustments will be needed in the rest of this routine, but > nothing that requires additional review comments. This part I don't understand. You want to populate usbm (by calling find_memory_area()) unconditionally, also for control transfers? I can't see offhand another way to call it only once during the function, save for a bool that says if we called it or not. New patch attached. Note that I haven't had the chance to retest after some of the latest changes. /* Steinar */ -- Homepage: https://www.sesse.net/ >From d03fc262072183887825f889996244e22ff74438 Mon Sep 17 00:00:00 2001 From: "Steinar H. Gunderson"Date: Thu, 26 Nov 2015 01:19:13 +0100 Subject: [PATCH] Add support for usbfs zerocopy. Add a new interface for userspace to preallocate memory that can be used with usbfs. This gives two primary benefits: - Zerocopy; data no longer needs to be copied between the userspace and the kernel, but can instead be read directly by the driver from userspace's buffers. This works for both bulk and isochronous transfers; isochronous also no longer need to memset() the buffer to zero to avoid leaking kernel data. - Once the buffers are allocated, USB transfers can no longer fail due to memory fragmentation; previously, long-running programs could run into problems finding a large enough contiguous memory chunk, especially on embedded systems or at high rates. Memory is allocated by using mmap() against the usbfs file descriptor, and similarly deallocated by
Re: Infrastructure for zerocopy I/O
On Sat, 5 Dec 2015, Steinar H. Gunderson wrote: > > (Yes, I did.) On further thought, testing uurb_start is sufficient. > > If uurb->buffer_length then turns out to be too big, the function > > should return an error (or rather, an ERR_PTR) and the URB submission > > should fail. > > I don't understand “should” here. I read it as “you should change your > function for this”, but this doesn't rhyme with that uurb_start should be > sufficient. Do you want me to just remove the check? Is there something else > that would guard against it? I meant that this "if" statement should test only uurb_start. If the test succeeds, a nested "if" statement should test buffer_length and return an error if it is too big. That's as opposed to the way you have it now, where if buffer_length is too big you return NULL. The system would then try to treat the coherent memory as a normal memory buffer, which may not be a good idea. > >> Done. (I saw you used a plural; is there more than this one?) > > Well, alloc_urb_memory() also can fail. > > I can't find this function. What do you mean? Doesn't matter; it's not present in the current version of the patch. > > I would allocate these two in the opposite order. Just because the > > DMA routines involve more work than kzalloc/kfree. > > Done. My thought, by the way, was that the DMA routines were much more likely > to fail. No doubt that's true. The order doesn't really matter very much. But there is a similar issue in the same spot; see below. > >> + if (find_memory_area(ps, uurb) == NULL) { > > Don't call find_memory_area() twice! Save this result and use it > > later. Also, move this call up before the "switch" statement so that > > it will apply to all transfer types, not just bulk. > > > > Suitable adjustments will be needed in the rest of this routine, but > > nothing that requires additional review comments. > > This part I don't understand. You want to populate usbm (by calling > find_memory_area()) unconditionally, also for control transfers? I agree there's not much point in using this API for control or interrupt transfers. Still, we have to allow for the possibility that someone might do it, in order to avoid treating coherent memory like normal memory as mentioned above. > I can't see > offhand another way to call it only once during the function, save for a bool > that says if we called it or not. Simply call it at the spot where you initialize as->usbm to NULL. Then it will always run exactly once. You don't really need to do it earlier. An unnecessary calculation of num_sgs won't hurt. > New patch attached. Note that I haven't had the chance to retest after some > of the latest changes. Comments below. I'll skip much of the patch because it looks pretty good. > +static int usbdev_mmap(struct file *file, struct vm_area_struct *vma) > +{ > + struct usb_memory *usbm = NULL; > + struct usb_dev_state *ps = file->private_data; > + size_t size = vma->vm_end - vma->vm_start; > + void *mem; > + unsigned long flags; > + dma_addr_t dma_handle; > + int ret; > + > + usbm = kzalloc(sizeof(struct usb_memory), GFP_KERNEL); > + if (!usbm) > + return -ENOMEM; > + > + ret = usbfs_increase_memory_usage(size + sizeof(struct usb_memory)); > + if (ret) { > + kfree(usbm); > + return ret; > + } This part should come first. If allocating usbm would put us over the memory limit, we want to know before we do the allocation. > + > + mem = usb_alloc_coherent(ps->dev, size, GFP_KERNEL, _handle); > + if (!mem) { > + usbfs_decrease_memory_usage( > + usbm->size + sizeof(struct usb_memory)); > + kfree(usbm); > + return -ENOMEM; > + } > + > + memset(mem, 0, size); > + > + usbm->mem = mem; > + usbm->dma_handle = dma_handle; > + usbm->size = size; > + usbm->ps = ps; > + > + if (remap_pfn_range(vma, vma->vm_start, > + virt_to_phys(usbm->mem) >> PAGE_SHIFT, > + size, vma->vm_page_prot) < 0) { > + dec_usb_memory_use_count(usbm, NULL); > + return -EAGAIN; > + } > + > + usbm->vm_start = vma->vm_start; > + usbm->vma_use_count = 1; I would move these two lines before the remap_pfn_range() call. Then you can use >vma_use_count as the second argument to dec_usb_memory_use_count() and avoid the awkward "if (count)" test at the start of that routine. > +static struct usb_memory * > +find_memory_area(struct usb_dev_state *ps, const struct usbdevfs_urb *uurb) > +{ > + struct usb_memory *usbm = NULL, *iter; > + unsigned long flags; > + unsigned long uurb_start = (unsigned long)uurb->buffer; > + > + spin_lock_irqsave(>lock, flags); > + list_for_each_entry(iter, >memory_list, memlist) { > + if (uurb_start >= iter->vm_start && > + uurb_start < iter->vm_start +
Re: Infrastructure for zerocopy I/O
On Sat, Dec 05, 2015 at 05:12:03PM -0500, Alan Stern wrote: > To tell the truth, I'm not sure whether it would be a problem or not. > That's why I said it "may" not be a good idea. Fair enough. >>> You don't really need to do it earlier. An unnecessary calculation of >>> num_sgs won't hurt. >> Is it unnecessary? I thought the test was really to force num_sgs==0 for the >> DMA case, not to save a little arithmetic. > Well, if you calculate num_sgs and then force it to 0 anyway, the > calculation was unnecessary, right? But the logic never calculates num_sgs if usbm == NULL? The if terminates early and num_sgs stays 0. Granted, the last version of the patch botched this and inverted the condition. I'll fix that. > BTW, in the future, could you put your patches in the body of the > emails instead of making them attachments? That's how we normally do > things on this mailing list; it makes replying and commenting on > patches easier. Will do. >> -num_sgs = DIV_ROUND_UP(uurb->buffer_length, USB_SG_SIZE); >> -if (num_sgs == 1 || num_sgs > ps->dev->bus->sg_tablesize) >> -num_sgs = 0; >> +/* do not use SG buffers when memory mapped segments >> + * are in use >> + */ >> +if (as->usbm) { >> +num_sgs = DIV_ROUND_UP(uurb->buffer_length, >> +USB_SG_SIZE); >> +if (num_sgs == 1 || >> +num_sgs > ps->dev->bus->sg_tablesize) { >> +num_sgs = 0; >> +} >> +} > No, no. Leave this part of the code unchanged. as hasn't even been > allocated yet. Good point. I did warn you about the lack of testing... :-) By “unchanged”, do you mean unchanged from previous patch or from the original? Because this was here all along from Markus' version of the patch. > As pointed out repeatedly by the kbuild test robot, this should be > IS_ERR, not IS_ERR_VALUE. Also, you need to set as->usbm back to NULL > before jumping. > > At this point, set num_sgs to 0 if as->usbm is non-NULL. Didn't you just point out that this would be redundant calculation? I'll try to update your patch with the other suggestions tomorrow. Thanks! /* Steinar */ -- Homepage: https://www.sesse.net/ -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Infrastructure for zerocopy I/O
On Sat, 5 Dec 2015, Steinar H. Gunderson wrote: > On Sat, Dec 05, 2015 at 12:36:51PM -0500, Alan Stern wrote: > > I meant that this "if" statement should test only uurb_start. If the > > test succeeds, a nested "if" statement should test buffer_length and > > return an error if it is too big. > > > > That's as opposed to the way you have it now, where if buffer_length is > > too big you return NULL. The system would then try to treat the > > coherent memory as a normal memory buffer, which may not be a good > > idea. > > OK. Why is it a problem to treat it as a normal memory buffer? (I understand > it would be slower, of course.) To tell the truth, I'm not sure whether it would be a problem or not. That's why I said it "may" not be a good idea. Mixing the usages would mean mapping the memory region for normal streaming DMA when it might already have been mapped for coherent DMA. Maybe that's okay; I don't know. It seems safest to avoid the issue. Besides, if the user program went to the trouble of mmap'ing a memory region and then ...: ... tried to present a USB buffer that overflowed the end of the region, we should point out the error. ... created a control or interrupt transfer with its USB buffer in the region, we should honor the program's request and perform a zerocopy I/O operation. > > You don't really need to do it earlier. An unnecessary calculation of > > num_sgs won't hurt. > > Is it unnecessary? I thought the test was really to force num_sgs==0 for the > DMA case, not to save a little arithmetic. Well, if you calculate num_sgs and then force it to 0 anyway, the calculation was unnecessary, right? > > Comments below. I'll skip much of the patch because it looks pretty > > good. > > Good, I guess we're finally converting on something acceptable to all parties > :-) Definitely. BTW, in the future, could you put your patches in the body of the emails instead of making them attachments? That's how we normally do things on this mailing list; it makes replying and commenting on patches easier. > +static void dec_usb_memory_use_count(struct usb_memory *usbm, int *count) > +{ > + struct usb_dev_state *ps = usbm->ps; > + unsigned long flags; > + > + spin_lock_irqsave(>lock, flags); > + --*count; > + if (usbm->urb_use_count == 0 && usbm->vma_use_count == 0) { > + list_del_init(>memlist); This can be list_del(), since you're about to deallocate usbm. > + spin_unlock_irqrestore(>lock, flags); > + > + usb_free_coherent(ps->dev, usbm->size, usbm->mem, > + usbm->dma_handle); > + usbfs_decrease_memory_usage( > + usbm->size + sizeof(struct usb_memory)); > + kfree(usbm); > + } else { > + spin_unlock_irqrestore(>lock, flags); > + } > +} ... > +static struct usb_memory * > +find_memory_area(struct usb_dev_state *ps, const struct usbdevfs_urb *uurb) > +{ > + struct usb_memory *usbm = NULL, *iter; > + unsigned long flags; > + unsigned long uurb_start = (unsigned long)uurb->buffer; > + > + spin_lock_irqsave(>lock, flags); > + list_for_each_entry(iter, >memory_list, memlist) { > + if (uurb_start >= iter->vm_start && > + uurb_start < iter->vm_start + iter->size) { > + if (uurb->buffer_length > iter->vm_start + iter->size - > + uurb_start) { > + usbm = ERR_PTR(-EINVAL); > + } else { > + usbm = iter; > + usbm->urb_use_count++; > + break; > + } The "break" belongs here, not where it is. > + } > + } > + spin_unlock_irqrestore(>lock, flags); > + return usbm; > +} > + > static int proc_do_submiturb(struct usb_dev_state *ps, struct usbdevfs_urb > *uurb, > struct usbdevfs_iso_packet_desc __user *iso_frame_desc, > void __user *arg) > @@ -1372,9 +1525,17 @@ static int proc_do_submiturb(struct usb_dev_state *ps, > struct usbdevfs_urb *uurb > uurb->type = USBDEVFS_URB_TYPE_INTERRUPT; > goto interrupt_urb; > } > - num_sgs = DIV_ROUND_UP(uurb->buffer_length, USB_SG_SIZE); > - if (num_sgs == 1 || num_sgs > ps->dev->bus->sg_tablesize) > - num_sgs = 0; > + /* do not use SG buffers when memory mapped segments > + * are in use > + */ > + if (as->usbm) { > + num_sgs = DIV_ROUND_UP(uurb->buffer_length, > + USB_SG_SIZE); > + if (num_sgs == 1 || > + num_sgs > ps->dev->bus->sg_tablesize) { > + num_sgs = 0;
Re: Infrastructure for zerocopy I/O
On Thu, Dec 03, 2015 at 04:31:35PM -0500, Alan Stern wrote: >> Seemingly controller is already a pointer, so the & is redundant. No idea why >> it didn't crash plain out. Fixing that, I can allocate, although I have some >> other bug causing an oops somewhere else. > Ah, an easy mistake to miss. But you should have gotten a warning from > the compiler about incompatible pointer types. I did. And I missed it, because a) things were scrolling fast, and b) I'm used to coding with -Werror, where you simply cannot ignore this kind of thing by accident :-) (Not to mention that I usually code C++, where this is plain out an error, not a warning.) /* Steinar */ -- Homepage: https://www.sesse.net/ -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Infrastructure for zerocopy I/O
On Fri, Dec 04, 2015 at 01:29:05AM +0100, Steinar H. Gunderson wrote: > I must be doing something wrong, because I don't seem to get any frames > from my video capture, and when I close the application, I get a slowpath > warning: OK, the slowpath warning is a red herring -- it's just because I do the free under a spinlock, and I can fix that. There's still something that causes me to not get any frames, though. /* Steinar */ -- Homepage: https://www.sesse.net/ -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Infrastructure for zerocopy I/O
On Fri, 4 Dec 2015, Steinar H. Gunderson wrote: > On Fri, Dec 04, 2015 at 01:29:05AM +0100, Steinar H. Gunderson wrote: > > I must be doing something wrong, because I don't seem to get any frames > > from my video capture, and when I close the application, I get a slowpath > > warning: > > OK, the slowpath warning is a red herring -- it's just because I do the free > under a spinlock, and I can fix that. > > There's still something that causes me to not get any frames, though. What does usbmon show? (See Documentation/usb/usbmon.txt.) Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Infrastructure for zerocopy I/O
On Thu, 3 Dec 2015, Steinar H. Gunderson wrote: > On Wed, Dec 02, 2015 at 10:54:52AM -0500, Alan Stern wrote: > >> [ 28.796244] DMAR: Allocating domain for 2-2 failed > > I don't know what the reason is for that. It may be that your kernel > > isn't configured to allocate as much coherent memory as you are asking > > for. We'll have to investigate further, maybe ask somebody who knows > > more about how the DMA subsystem works. > > I'm thinking; maybe should the memory not be allocated against the USB > device, but against the controller it hangs on? (Note that it complains about > allocating the _domain_, not the memory itself.) Do you know if that's > possible from this point in the code? Ah, I see the problem. In usb_alloc_dev(), we set the new USB device's dma_mask equal to the controller's dma_mask, but we don't set the other DMA parameters -- in particular, we don't set the coherent_dma_mask. In fact, it looks like the best approach is to use usb_alloc_coherent() and usb_free_coherent(). Unfortunately this routine doesn't zero out the memory buffer, so you'll have to keep the memset(). > FWIW, I tried against >dev->bus->controller, and it didn't give me any > error messages, but still returns NULL. I'm at a loss. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Infrastructure for zerocopy I/O
On Thu, 3 Dec 2015, Steinar H. Gunderson wrote: > On Thu, Dec 03, 2015 at 10:37:27AM -0500, Alan Stern wrote: > >> FWIW, I tried against >dev->bus->controller, and it didn't give me any > >> error messages, but still returns NULL. > > I'm at a loss. > > Seemingly controller is already a pointer, so the & is redundant. No idea why > it didn't crash plain out. Fixing that, I can allocate, although I have some > other bug causing an oops somewhere else. Ah, an easy mistake to miss. But you should have gotten a warning from the compiler about incompatible pointer types. > But you still think usb_alloc_coherent() is the better way to go? Should be > easy enough to change, just let me know. Yes, definitely you should use usb_alloc_coherent(). Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Infrastructure for zerocopy I/O
On Thu, Dec 03, 2015 at 10:37:27AM -0500, Alan Stern wrote: >> FWIW, I tried against >dev->bus->controller, and it didn't give me any >> error messages, but still returns NULL. > I'm at a loss. Seemingly controller is already a pointer, so the & is redundant. No idea why it didn't crash plain out. Fixing that, I can allocate, although I have some other bug causing an oops somewhere else. But you still think usb_alloc_coherent() is the better way to go? Should be easy enough to change, just let me know. /* Steinar */ -- Homepage: https://www.sesse.net/ -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Infrastructure for zerocopy I/O
On Thu, Dec 03, 2015 at 10:18:20PM +0100, Steinar H. Gunderson wrote: > But you still think usb_alloc_coherent() is the better way to go? Should be > easy enough to change, just let me know. I must be doing something wrong, because I don't seem to get any frames from my video capture, and when I close the application, I get a slowpath warning: [ 188.781392] [ cut here ] [ 188.781409] WARNING: CPU: 0 PID: 1747 at include/asm-generic/dma-mapping-common.h:274 hcd_buffer_free+0xe9/0x130 [usbcore]() [ 188.781422] Modules linked in: ctr ccm joydev nls_utf8 nls_cp437 vfat fat ext2 arc4 intel_rapl iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_realtek snd_hda_codec_generic kvm_intel snd_hda_codec_hdmi iTCO_wdt iTCO_vendor_support kvm crct10dif_pclmul crc32_pclmul crc32c_intel jitterentropy_rng sha256_generic hmac drbg evdev aesni_intel psmouse aes_x86_64 iwlmvm glue_helper lrw serio_raw gf128mul ablk_helper pcspkr cryptd mac80211 efivars i2c_i801 sg thinkpad_acpi lpc_ich iwlwifi nvram snd_hda_intel rtsx_pci_ms memstick snd_hda_codec snd_seq cfg80211 snd_hwdep mei_me snd_seq_device snd_hda_core i915 snd_pcm mei snd_timer wmi i2c_algo_bit drm_kms_helper snd ac drm rfkill tpm_tis tpm battery i2c_core soundcore video button processor autofs4 ext4 crc16 mbcache jbd2 dm_mod sd_mod [ 188.781526] rtsx_pci_sdmmc mmc_core ahci libahci libata scsi_mod ehci_pci ehci_hcd rtsx_pci mfd_core xhci_pci xhci_hcd e1000e ptp usbcore pps_core usb_common thermal [ 188.781550] CPU: 0 PID: 1747 Comm: QXcbEventReader Tainted: GW 4.3.0-usb+ #34 [ 188.781554] Hardware name: LENOVO 20ALCT01WW/20ALCT01WW, BIOS GIET83WW (2.33 ) 08/25/2015 [ 188.781557] c003bb40 812a5113 81068a2c [ 188.781563] 4000 8802340c1098 8189c400 8802345d8400 [ 188.781570] 0282 c002e279 88020c968000 ffeb [ 188.781578] Call Trace: [ 188.781584] [] ? dump_stack+0x40/0x5d [ 188.781596] [] ? warn_slowpath_common+0x7c/0xb0 [ 188.781613] [] ? hcd_buffer_free+0xe9/0x130 [usbcore] [ 188.781630] [] ? dec_usb_memory_use_count+0x56/0x80 [usbcore] [ 188.781647] [] ? free_async+0xa2/0xf0 [usbcore] [ 188.781666] [] ? usbdev_release+0xde/0x130 [usbcore] [ 188.781680] [] ? __fput+0xc5/0x1d0 [ 188.781694] [] ? task_work_run+0x6f/0x90 [ 188.781711] [] ? do_exit+0x36e/0xa80 [ 188.781724] [] ? check_preempt_curr+0x74/0x80 [ 188.781737] [] ? ttwu_do_wakeup+0xf/0xc0 [ 188.781753] [] ? try_to_wake_up+0x3e/0x330 [ 188.781767] [] ? do_group_exit+0x34/0xb0 [ 188.781783] [] ? get_signal+0x297/0x680 [ 188.781799] [] ? do_signal+0x1e/0x670 [ 188.781812] [] ? wake_up_q+0x60/0x60 [ 188.781828] [] ? vfs_write+0x13e/0x180 [ 188.781834] [] ? prepare_exit_to_usermode+0xa3/0xf0 [ 188.781840] [] ? int_ret_from_sys_call+0x25/0x8f [ 188.781844] ---[ end trace 8fd705c87c4a8887 ]--- [ 188.781850] [ cut here ] I'm attaching the current patch for reference. I know I have comments of yours to attend to, but I'd like to try to figure out what's wrong before I embark on polish. /* Steinar */ -- Homepage: https://www.sesse.net/ >From e00443ca99bf5bb4871a7714ce244b63b35e8f83 Mon Sep 17 00:00:00 2001 From: "Steinar H. Gunderson"Date: Thu, 26 Nov 2015 01:19:13 +0100 Subject: [PATCH] Add support for usbfs zerocopy. Add a new interface for userspace to preallocate memory that can be used with usbfs. This gives two primary benefits: - Zerocopy; data no longer needs to be copied between the userspace and the kernel, but can instead be read directly by the driver from userspace's buffers. This works for both bulk and isochronous transfers; isochronous also no longer need to memset() the buffer to zero to avoid leaking kernel data. - Once the buffers are allocated, USB transfers can no longer fail due to memory fragmentation; previously, long-running programs could run into problems finding a large enough contiguous memory chunk, especially on embedded systems or at high rates. Memory is allocated by using mmap() against the usbfs file descriptor, and similarly deallocated by munmap(). Once memory has been allocated, using it as pointers to a bulk or isochronous operation means you will automatically get zerocopy behavior. Note that this also means you cannot modify outgoing data until the transfer is complete. The same holds for data on the same cache lines as incoming data; DMA modifying them at the same time could lead to your changes being overwritten. Largely based on a patch by Markus Rechberger with some updates. The original patch can be found at: http://sundtek.de/support/devio_mmap_v0.4.diff Signed-off-by: Steinar H. Gunderson Signed-off-by: Markus Rechberger --- drivers/usb/core/devio.c | 201 --- 1 file changed,
Re: Infrastructure for zerocopy I/O
On Tue, 1 Dec 2015, Steinar H. Gunderson wrote: > >> + usbm->mem = mem; > >> + usbm->offset = virt_to_phys(mem); > >> + usbm->size = size; > >> + usbm->ps = ps; > >> + spin_lock_irqsave(>lock, flags); > >> + list_add_tail(>memlist, >memory_list); > >> + spin_unlock_irqrestore(>lock, flags); > >> + > >> + return usbm; > >> +} > > This subroutine should be merged into usbdev_mmap(). > > Hm. Can you elaborate a bit on why? I thought it was nice to have it as a > counterpoint to dec_use_count(), the deallocation function, and reasonably > short. Well, it's partly a matter of taste. But there's also a race: This code adds usbm to ps->memory_list (making it available to URB submissions running on other CPUs) before usbm has been completely initialized (vm_start isn't set yet). > > In fact, all the memory-oriented routines should be located together > > in the source file. It's confusing to put some of them near the start > > and others near the end. > > Is this about usbdev_mmap, or about usbdev_vm_{open,close}? I looked at the > former as a syscall function and thus somehow better suited closer to e.g. > the ioctl function. find_memory_area() can't be moved too far down since it's > used from other places, unless you want me to do a forward declaration? find_memory_area() is used in only one place. You can put everything together if you move usbdev_mmap() and associated routines up near the start of the file, after usbdev_read(). > >> + uurb_start >= iter->vm_start && > >> + uurb->buffer_length <= iter->vm_start - uurb_start + > >> + (PAGE_SIZE << get_order(iter->size))) { > > Shouldn't this be: > > > > uurb->start >= iter->vm_start && > > uurb->start < iter->vm_start + iter->size && > > uurb->buffer_length <= iter->vm_start + iter->size - > > uurb->start) { > > I think both will work (modulo maybe issues with the PAGE_SIZE part?), > but yours is clearer. Changed. (I assume you meant uurb_start, not > uurb->start.) (Yes, I did.) On further thought, testing uurb_start is sufficient. If uurb->buffer_length then turns out to be too big, the function should return an error (or rather, an ERR_PTR) and the URB submission should fail. > >>as->userurb = arg; > >> - if (is_in && uurb->buffer_length > 0) > >> + if (is_in && uurb->buffer_length > 0 && usbm == NULL) > > I think you should keep this even when usbm is not NULL. No? > > No, then processcompl() will try to copy data there, which defeats the point > of zerocopy. What other use would the userbuffer member have? My mistake; for some reason I was thinking of as->urb->transfer_buffer when I read as->userbuffer. > >>as->userbuffer = uurb->buffer; > >>else > >>as->userbuffer = NULL; > > When you use dma_zalloc_coherent(), you have to tell the USB core that > > the buffer is already mapped for DMA by setting the URB_NO_TRANSFER_DMA_MAP > > flag in urb->transfer_flags and by storing the DMA address in > > urb->transfer_dma. > > Done. I have one question, though: Should I offset the transfer_dma address > to match with the offset in the buffer, or is this just a handle? (I've > assumed the latter, but I'm not at all sure it's correct.) No, you do have to offset transfer_dma_address to match. > >> + if (remap_pfn_range(vma, vma->vm_start, > >> + virt_to_phys(usbm->mem) >> PAGE_SHIFT, > >> + size, vma->vm_page_prot) < 0) > >> + return -EAGAIN; > > As Oliver pointed out, the memory needs to be reclaimed and accounted > > for on the failure pathways. > > Done. (I saw you used a plural; is there more than this one?) Well, alloc_urb_memory() also can fail. > New patch attached. However, it's not working; I'm getting several of these > messages: > > [ 28.796244] DMAR: Allocating domain for 2-2 failed I don't know what the reason is for that. It may be that your kernel isn't configured to allocate as much coherent memory as you are asking for. We'll have to investigate further, maybe ask somebody who knows more about how the DMA subsystem works. --- > This is essentially a patch by Markus Rechberger with some updates. > The original can be found at > > http://sundtek.de/support/devio_mmap_v0.4.diff > > This version has the following changes: > > - Rebased against a newer kernel (with some conflicts fixed). > - Fixed all checkpatch violations and some style issues. > - Fixes an issue where isochronous transfers would not really be > zero-copy, but go through a pointless memcpy from one area to > itself. > - Ask for cached memory instead of uncached. Actually, with dma_zalloc_coherent() you really are asking for uncached memory (on systems where it makes a difference). > - Drop the custom ioctl; use mmap only for allocation. > > Signed-off-by: Steinar H.
Re: Infrastructure for zerocopy I/O
On Wed, Dec 02, 2015 at 10:54:52AM -0500, Alan Stern wrote: >> [ 28.796244] DMAR: Allocating domain for 2-2 failed > I don't know what the reason is for that. It may be that your kernel > isn't configured to allocate as much coherent memory as you are asking > for. We'll have to investigate further, maybe ask somebody who knows > more about how the DMA subsystem works. I'm thinking; maybe should the memory not be allocated against the USB device, but against the controller it hangs on? (Note that it complains about allocating the _domain_, not the memory itself.) Do you know if that's possible from this point in the code? /* Steinar */ -- Homepage: https://www.sesse.net/ -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Infrastructure for zerocopy I/O
On Thu, Dec 03, 2015 at 12:51:14AM +0100, Steinar H. Gunderson wrote: > I'm thinking; maybe should the memory not be allocated against the USB > device, but against the controller it hangs on? (Note that it complains about > allocating the _domain_, not the memory itself.) Do you know if that's > possible from this point in the code? FWIW, I tried against >dev->bus->controller, and it didn't give me any error messages, but still returns NULL. /* Steinar */ -- Homepage: https://www.sesse.net/ -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Infrastructure for zerocopy I/O
On Mon, Nov 30, 2015 at 12:14:59PM -0500, Alan Stern wrote: > As I recall, Markus's original patch took care of this by checking to > see whether the transfer buffer was in one of the mmap'ed areas. If it > was then the transfer would be zerocopy; otherwise it would be normal. > That seems like a reasonable approach. OK. So that's basically preserved in this version. >> Clarification appreciated ;-) > Detailed notes below. Thanks for the review. >> +struct usb_memory { >> +struct list_head memlist; >> +int vma_use_count; >> +int usb_use_count; >> +u32 offset; > What's the reason for the "offset" member? It doesn't seem to do > anything. No longer needed; removed. >> +u32 size; >> +void *mem; > You'll need to store the DMA address as well. Done. >> +mem = alloc_pages_exact(size, GFP_KERNEL | GFP_DMA32); > dma_zalloc_coherent(), not alloc_pages_exact(). Done. >> +memset(mem, 0x0, PAGE_SIZE << get_order(size)); > Then this line won't be needed. Removed. >> +usbm->mem = mem; >> +usbm->offset = virt_to_phys(mem); >> +usbm->size = size; >> +usbm->ps = ps; >> +spin_lock_irqsave(>lock, flags); >> +list_add_tail(>memlist, >memory_list); >> +spin_unlock_irqrestore(>lock, flags); >> + >> +return usbm; >> +} > This subroutine should be merged into usbdev_mmap(). Hm. Can you elaborate a bit on why? I thought it was nice to have it as a counterpoint to dec_use_count(), the deallocation function, and reasonably short. > In fact, all the memory-oriented routines should be located together > in the source file. It's confusing to put some of them near the start > and others near the end. Is this about usbdev_mmap, or about usbdev_vm_{open,close}? I looked at the former as a syscall function and thus somehow better suited closer to e.g. the ioctl function. find_memory_area() can't be moved too far down since it's used from other places, unless you want me to do a forward declaration? > Forget about the WARN_ON; you know all the callers because this patch > introduces them. If you prefer, instead of a pointer pass an > enumeration value: USB_MEMORY_URB_COUNT or USB_MEMORY_VMA_COUNT. Removed. (It was mostly meant as documentation to the programmer.) > Also, you might change the name to make it a little less generic. For > example, dec_usb_memory_use_count(). Done. >> +spin_lock_irqsave(>lock, flags); >> +list_for_each_entry(iter, >memory_list, memlist) { >> +if (iter->usb_use_count == 0 && > I don't think this is necessary. It should be legal to keep the data > for two URBs in the same memory region, so long as they don't overlap. Removed. I haven't added an overlap check, but I guess this should be the caller's problem? >> +uurb_start >= iter->vm_start && >> +uurb->buffer_length <= iter->vm_start - uurb_start + >> +(PAGE_SIZE << get_order(iter->size))) { > Shouldn't this be: > > uurb->start >= iter->vm_start && > uurb->start < iter->vm_start + iter->size && > uurb->buffer_length <= iter->vm_start + iter->size - > uurb->start) { I think both will work (modulo maybe issues with the PAGE_SIZE part?), but yours is clearer. Changed. (I assume you meant uurb_start, not uurb->start.) >> +spin_lock_irqsave(>lock, flags); >> +list_for_each_entry(usbm_iter, >memory_list, memlist) { >> +if (usbm_iter->mem == as->urb->transfer_buffer) { >> +usbm = usbm_iter; >> +break; >> +} >> +} >> +spin_unlock_irqrestore(>lock, flags); > There's no need to do this. Just store the usbm pointer in the as > structure. Done. >> +list_for_each_safe(p, q, >memory_list) { >> +tmp = list_entry(p, struct usb_memory, memlist); >> +list_del(p); >> +if (tmp->mem) >> +free_pages_exact(tmp->mem, tmp->size); >> +kfree(tmp); > No, you can't do this here. The memory might still be in use by the > VMA. You have to do: > > list_del(>memory_list); > > perhaps with a comment explaining when the usb_memory structures and > the memory buffers get freed (i.e., when the use counters go to 0). Done. >> +/* do not use SG buffers when memory mapped segments >> + * are allocated >> + */ >> +if (list_empty(>memory_list)) { > No, first call find_memory_area(). If the result is NULL then do this. Done. >> +as->ps = ps; >> + > Why move this? Isn't it fine where it is? Reverted. (It was part of some other cleanup that I tried, I think.) >> as->userurb = arg; >> -if (is_in && uurb->buffer_length > 0) >> +if (is_in && uurb->buffer_length > 0 && usbm == NULL) > I think you should keep this even when usbm is not NULL. No? No, then processcompl() will try to copy data there,
Re: Infrastructure for zerocopy I/O
On Mon, 2015-11-30 at 12:14 -0500, Alan Stern wrote: > > +static struct usb_memory * > > +find_memory_area(struct usb_dev_state *ps, const struct > usbdevfs_urb *uurb) > > +{ > > + struct usb_memory *usbm = NULL, *iter; > > + unsigned long flags; > > + unsigned long uurb_start = (unsigned long)uurb->buffer; > > + > > + spin_lock_irqsave(>lock, flags); > > + list_for_each_entry(iter, >memory_list, memlist) { > > + if (iter->usb_use_count == 0 && > > I don't think this is necessary. It should be legal to keep the data > for two URBs in the same memory region, so long as they don't overlap. Hi, they also must not share cache lines. How do you guarantee that in this case? Regards Oliver -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Infrastructure for zerocopy I/O
On Mon, 30 Nov 2015, Oliver Neukum wrote: > On Mon, 2015-11-30 at 12:14 -0500, Alan Stern wrote: > > > +static struct usb_memory * > > > +find_memory_area(struct usb_dev_state *ps, const struct > > usbdevfs_urb *uurb) > > > +{ > > > + struct usb_memory *usbm = NULL, *iter; > > > + unsigned long flags; > > > + unsigned long uurb_start = (unsigned long)uurb->buffer; > > > + > > > + spin_lock_irqsave(>lock, flags); > > > + list_for_each_entry(iter, >memory_list, memlist) { > > > + if (iter->usb_use_count == 0 && > > > > I don't think this is necessary. It should be legal to keep the data > > for two URBs in the same memory region, so long as they don't overlap. > > Hi, > > they also must not share cache lines. How do you guarantee that in this > case? I can't guarantee it -- that's the responsibility of the user program. If it wants to put two transfer buffers in the same memory region, it should know about the potential problems. Besides, if we use dma_zalloc_coherent() to allocate the memory then the problem doesn't arise. Coherent memory can safely be accessed by both the device and the CPU at any time, by definition, without regard for caching. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Infrastructure for zerocopy I/O
On Fri, 2015-11-27 at 00:04 +0100, Steinar H. Gunderson wrote: > On Thu, Nov 26, 2015 at 01:26:32AM +0100, Steinar H. Gunderson wrote: > >> There are numerous smaller issues: The allocated memory needs to be > >> charged against usbfs_memory_usage > > I'll fix this. > > Fixed in updated patch (attached). > > > I'll fix this, too. > > Also fixed. > > Now also checkpatch clean. > > /* Steinar */ +static int usbdev_mmap(struct file *file, struct vm_area_struct *vma) +{ + struct usb_memory *usbm = NULL; + struct usb_dev_state *ps = file->private_data; + size_t size = vma->vm_end - vma->vm_start; + int ret; + + ret = usbfs_increase_memory_usage(size + sizeof(struct usb_memory)); + if (ret) + return ret; + + usbm = alloc_usb_memory(ps, size); + if (!usbm) + return -ENOMEM; Looks like an accounting leak in the error case. + + if (remap_pfn_range(vma, vma->vm_start, + virt_to_phys(usbm->mem) >> PAGE_SHIFT, + size, vma->vm_page_prot) < 0) + return -EAGAIN; The same leak and a memory leak. HTH Oliver -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Infrastructure for zerocopy I/O
On Thu, 26 Nov 2015, Steinar H. Gunderson wrote: > On Wed, Nov 25, 2015 at 10:29:53AM -0500, Alan Stern wrote: > > I want to see a modified version of your patch. Several things need to > > be changed or fixed, but the major change needs to be the way memory is > > allocated. It should be done as part of the mmap system call, not as a > > separate ioctl. > > I took a stab at this. The attached patch doesn't fix any of your other > issues, but could you take a look at whether the mmap part looks sane? > I've verified that it works in practice with my own code, and see no > performance regressions. It is still the case that you can't mix mmap > and non-mmap transfers on the same device; I suppose that should be governed > by a zerocopy flag instead of “are there any mmap-ed areas”. As I recall, Markus's original patch took care of this by checking to see whether the transfer buffer was in one of the mmap'ed areas. If it was then the transfer would be zerocopy; otherwise it would be normal. That seems like a reasonable approach. > Let me go through your other issues: > > > There are numerous smaller issues: The allocated memory needs to be > > charged against usbfs_memory_usage > > I'll fix this. > > > the memory should be allocated using dma_zalloc_coherent instead of > > kmalloc, > > dma_zalloc_coherent returns a dma_handle in addition to the memory itself; > should we just throw this away? No, the handle has to be stored for use when an URB is submitted and when the memory is deallocated -- it is a required argument for dma_free_coherent(). > > proc_do_submiturb should > > check that the buffer starts anywhere within the allocated memory > > rather than just at the beginning > > I'll fix this, too. > > > , and so on. > > Clarification appreciated ;-) Detailed notes below. > diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c > index 38ae877c..9a1a7d6 100644 > --- a/drivers/usb/core/devio.c > +++ b/drivers/usb/core/devio.c > @@ -69,6 +69,7 @@ struct usb_dev_state { > spinlock_t lock;/* protects the async urb lists */ > struct list_head async_pending; > struct list_head async_completed; > + struct list_head memory_list; > wait_queue_head_t wait; /* wake up if a request completed */ > unsigned int discsignr; > struct pid *disc_pid; > @@ -96,6 +97,17 @@ struct async { > u8 bulk_status; > }; > > +struct usb_memory { > + struct list_head memlist; > + int vma_use_count; > + int usb_use_count; > + u32 offset; What's the reason for the "offset" member? It doesn't seem to do anything. > + u32 size; > + void *mem; You'll need to store the DMA address as well. > + unsigned long vm_start; > + struct usb_dev_state *ps; > +}; > + > static bool usbfs_snoop; > module_param(usbfs_snoop, bool, S_IRUGO | S_IWUSR); > MODULE_PARM_DESC(usbfs_snoop, "true to log all usbfs traffic"); > @@ -157,6 +169,74 @@ static int connected(struct usb_dev_state *ps) > ps->dev->state != USB_STATE_NOTATTACHED); > } > > +static struct usb_memory * > +alloc_usb_memory(struct usb_dev_state *ps, size_t size) > +{ > + struct usb_memory *usbm; > + void *mem; > + unsigned long flags; > + > + mem = alloc_pages_exact(size, GFP_KERNEL | GFP_DMA32); dma_zalloc_coherent(), not alloc_pages_exact(). > + if (!mem) > + return NULL; > + > + usbm = kzalloc(sizeof(struct usb_memory), GFP_KERNEL); > + if (!usbm) { > + free_pages_exact(mem, size); > + return NULL; > + } > + memset(mem, 0x0, PAGE_SIZE << get_order(size)); Then this line won't be needed. > + usbm->mem = mem; > + usbm->offset = virt_to_phys(mem); > + usbm->size = size; > + usbm->ps = ps; > + spin_lock_irqsave(>lock, flags); > + list_add_tail(>memlist, >memory_list); > + spin_unlock_irqrestore(>lock, flags); > + > + return usbm; > +} This subroutine should be merged into usbdev_mmap(). In fact, all the memory-oriented routines should be located together in the source file. It's confusing to put some of them near the start and others near the end. > + > +static void dec_use_count(struct usb_memory *usbm, int *count) > +{ > + struct usb_dev_state *ps = usbm->ps; > + unsigned long flags; > + > + spin_lock_irqsave(>lock, flags); > + WARN_ON(count != >usb_use_count && count != >vma_use_count); > + --*count; > + if (usbm->usb_use_count == 0 && usbm->vma_use_count == 0) { Forget about the WARN_ON; you know all the callers because this patch introduces them. If you prefer, instead of a pointer pass an enumeration value: USB_MEMORY_URB_COUNT or USB_MEMORY_VMA_COUNT. Also, you might change the name to make it a little less generic. For example, dec_usb_memory_use_count(). > + list_del_init(>memlist); > + free_pages_exact(usbm->mem, usbm->size); > +
Re: Infrastructure for zerocopy I/O
On Thu, Nov 26, 2015 at 01:26:32AM +0100, Steinar H. Gunderson wrote: >> There are numerous smaller issues: The allocated memory needs to be >> charged against usbfs_memory_usage > I'll fix this. Fixed in updated patch (attached). > I'll fix this, too. Also fixed. Now also checkpatch clean. /* Steinar */ -- Homepage: https://www.sesse.net/ >From 558466c34b1cbb8f12f3c56042558fdce2e7f87e Mon Sep 17 00:00:00 2001 From: "Steinar H. Gunderson"Date: Thu, 26 Nov 2015 01:19:13 +0100 Subject: [PATCH] Add support for usbfs zerocopy. This is essentially a patch by Markus Rechberger with some updates. The original can be found at http://sundtek.de/support/devio_mmap_v0.4.diff This version has the following changes: - Rebased against a newer kernel (with some conflicts fixed). - Fixed all checkpatch violations and some style issues. - Fixes an issue where isochronous transfers would not really be zero-copy, but go through a pointless memcpy from one area to itself. - Ask for cached memory instead of uncached. - Drop the custom ioctl; use mmap only for allocation. Signed-off-by: Steinar H. Gunderson Signed-off-by: Markus Rechberger --- drivers/usb/core/devio.c | 203 --- 1 file changed, 193 insertions(+), 10 deletions(-) diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c index 38ae877c..9a1a7d6 100644 --- a/drivers/usb/core/devio.c +++ b/drivers/usb/core/devio.c @@ -69,6 +69,7 @@ struct usb_dev_state { spinlock_t lock;/* protects the async urb lists */ struct list_head async_pending; struct list_head async_completed; + struct list_head memory_list; wait_queue_head_t wait; /* wake up if a request completed */ unsigned int discsignr; struct pid *disc_pid; @@ -96,6 +97,17 @@ struct async { u8 bulk_status; }; +struct usb_memory { + struct list_head memlist; + int vma_use_count; + int usb_use_count; + u32 offset; + u32 size; + void *mem; + unsigned long vm_start; + struct usb_dev_state *ps; +}; + static bool usbfs_snoop; module_param(usbfs_snoop, bool, S_IRUGO | S_IWUSR); MODULE_PARM_DESC(usbfs_snoop, "true to log all usbfs traffic"); @@ -157,6 +169,74 @@ static int connected(struct usb_dev_state *ps) ps->dev->state != USB_STATE_NOTATTACHED); } +static struct usb_memory * +alloc_usb_memory(struct usb_dev_state *ps, size_t size) +{ + struct usb_memory *usbm; + void *mem; + unsigned long flags; + + mem = alloc_pages_exact(size, GFP_KERNEL | GFP_DMA32); + if (!mem) + return NULL; + + usbm = kzalloc(sizeof(struct usb_memory), GFP_KERNEL); + if (!usbm) { + free_pages_exact(mem, size); + return NULL; + } + memset(mem, 0x0, PAGE_SIZE << get_order(size)); + usbm->mem = mem; + usbm->offset = virt_to_phys(mem); + usbm->size = size; + usbm->ps = ps; + spin_lock_irqsave(>lock, flags); + list_add_tail(>memlist, >memory_list); + spin_unlock_irqrestore(>lock, flags); + + return usbm; +} + +static void dec_use_count(struct usb_memory *usbm, int *count) +{ + struct usb_dev_state *ps = usbm->ps; + unsigned long flags; + + spin_lock_irqsave(>lock, flags); + WARN_ON(count != >usb_use_count && count != >vma_use_count); + --*count; + if (usbm->usb_use_count == 0 && usbm->vma_use_count == 0) { + list_del_init(>memlist); + free_pages_exact(usbm->mem, usbm->size); + usbfs_decrease_memory_usage( + usbm->size + sizeof(struct usb_memory)); + kfree(usbm); + } + spin_unlock_irqrestore(>lock, flags); +} + +static struct usb_memory * +find_memory_area(struct usb_dev_state *ps, const struct usbdevfs_urb *uurb) +{ + struct usb_memory *usbm = NULL, *iter; + unsigned long flags; + unsigned long uurb_start = (unsigned long)uurb->buffer; + + spin_lock_irqsave(>lock, flags); + list_for_each_entry(iter, >memory_list, memlist) { + if (iter->usb_use_count == 0 && + uurb_start >= iter->vm_start && + uurb->buffer_length <= iter->vm_start - uurb_start + +(PAGE_SIZE << get_order(iter->size))) { + usbm = iter; + usbm->usb_use_count++; + break; + } + } + spin_unlock_irqrestore(>lock, flags); + return usbm; +} + static loff_t usbdev_lseek(struct file *file, loff_t offset, int orig) { loff_t ret; @@ -288,6 +368,9 @@ static struct async *alloc_async(unsigned int numisoframes) static void free_async(struct async *as) { + struct usb_memory *usbm = NULL, *usbm_iter; + unsigned long flags; + struct usb_dev_state *ps = as->ps; int i; put_pid(as->pid); @@ -297,8 +380,22 @@ static void free_async(struct async *as) if (sg_page(>urb->sg[i])) kfree(sg_virt(>urb->sg[i])); } + + spin_lock_irqsave(>lock, flags); + list_for_each_entry(usbm_iter, >memory_list, memlist) { + if (usbm_iter->mem == as->urb->transfer_buffer) { + usbm = usbm_iter; + break; + } + } + spin_unlock_irqrestore(>lock, flags); + kfree(as->urb->sg); - kfree(as->urb->transfer_buffer); + if (usbm == NULL) + kfree(as->urb->transfer_buffer); + else +
Re: Infrastructure for zerocopy I/O
On Wed, 25 Nov 2015, Markus Rechberger wrote: > Seems like it's getting stuck again, since we're entering devices with > a higher bandwidth need now it seems to be necessary to really fix it > in the kernel. > > I propose add it as it is, it's proven that it works and there are 2 > testcases out there right now and it decreases the CPU load on intel > Atoms as well. > Otherwise it might take ages until things are getting forward here. > Apple is just running fine for ages and they're not struggling on > mailinglists about it. Aside of that the Mac USB 2.0 through USB 3.0 > support also works rock solid while linux is just a shame in that area > .. it never worked and even totally crashes some systems. > While that linux issue is known for a long time it just seems like > it's accepted already. > Either come up with a solution or just add this patch so we can make use of > it! I want to see a modified version of your patch. Several things need to be changed or fixed, but the major change needs to be the way memory is allocated. It should be done as part of the mmap system call, not as a separate ioctl. There are numerous smaller issues: The allocated memory needs to be charged against usbfs_memory_usage, the memory should be allocated using dma_zalloc_coherent instead of kmalloc, proc_do_submiturb should check that the buffer starts anywhere within the allocated memory rather than just at the beginning, and so on. I'd make these changes myself, but I just don't have enough free time now. If anybody else would like to work on them, please feel free. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Infrastructure for zerocopy I/O
On Wed, Nov 25, 2015 at 10:29:53AM -0500, Alan Stern wrote: > I want to see a modified version of your patch. Several things need to > be changed or fixed, but the major change needs to be the way memory is > allocated. It should be done as part of the mmap system call, not as a > separate ioctl. > > There are numerous smaller issues: The allocated memory needs to be > charged against usbfs_memory_usage, the memory should be allocated > using dma_zalloc_coherent instead of kmalloc, proc_do_submiturb should > check that the buffer starts anywhere within the allocated memory > rather than just at the beginning, and so on. > > I'd make these changes myself, but I just don't have enough free time > now. If anybody else would like to work on them, please feel free. I can take a stab at fixing some of these. Perhaps not all of them, but maybe it will be a useful starting point. /* Steinar */ -- Homepage: https://www.sesse.net/ -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Infrastructure for zerocopy I/O
On Wed, Nov 25, 2015 at 10:29:53AM -0500, Alan Stern wrote: > I want to see a modified version of your patch. Several things need to > be changed or fixed, but the major change needs to be the way memory is > allocated. It should be done as part of the mmap system call, not as a > separate ioctl. I took a stab at this. The attached patch doesn't fix any of your other issues, but could you take a look at whether the mmap part looks sane? I've verified that it works in practice with my own code, and see no performance regressions. It is still the case that you can't mix mmap and non-mmap transfers on the same device; I suppose that should be governed by a zerocopy flag instead of “are there any mmap-ed areas”. Let me go through your other issues: > There are numerous smaller issues: The allocated memory needs to be > charged against usbfs_memory_usage I'll fix this. > the memory should be allocated using dma_zalloc_coherent instead of > kmalloc, dma_zalloc_coherent returns a dma_handle in addition to the memory itself; should we just throw this away? > proc_do_submiturb should > check that the buffer starts anywhere within the allocated memory > rather than just at the beginning I'll fix this, too. > , and so on. Clarification appreciated ;-) /* Steinar */ -- Homepage: https://www.sesse.net/ >From 2bb0f7b91d41de0cbd1e8984612252d101fbb2ca Mon Sep 17 00:00:00 2001 From: "Steinar H. Gunderson"Date: Thu, 26 Nov 2015 01:19:13 +0100 Subject: [PATCH] Add support for usbfs zerocopy. This is essentially a patch by Markus Rechberger with some updates. The original can be found at http://sundtek.de/support/devio_mmap_v0.4.diff This version has the following changes: - Rebased against a newer kernel (with some conflicts fixed). - Fixed most checkpatch violations (some remain) and some style issues. - Fixes an issue where isochronous transfers would not really be zero-copy, but go through a pointless memcpy from one area to itself. - Ask for cached memory instead of uncached. - Drop the custom ioctl; use mmap only for allocation. Signed-off-by: Steinar H. Gunderson Signed-off-by: Markus Rechberger --- drivers/usb/core/devio.c | 188 +++ 1 file changed, 175 insertions(+), 13 deletions(-) diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c index 38ae877c..a17263d56 100644 --- a/drivers/usb/core/devio.c +++ b/drivers/usb/core/devio.c @@ -69,6 +69,7 @@ struct usb_dev_state { spinlock_t lock;/* protects the async urb lists */ struct list_head async_pending; struct list_head async_completed; + struct list_head memory_list; wait_queue_head_t wait; /* wake up if a request completed */ unsigned int discsignr; struct pid *disc_pid; @@ -96,6 +97,17 @@ struct async { u8 bulk_status; }; +struct usb_memory { + struct list_head memlist; + int vma_use_count; + int usb_use_count; + u32 offset; + u32 size; + void *mem; + unsigned long vm_start; + struct usb_dev_state *ps; +}; + static bool usbfs_snoop; module_param(usbfs_snoop, bool, S_IRUGO | S_IWUSR); MODULE_PARM_DESC(usbfs_snoop, "true to log all usbfs traffic"); @@ -157,6 +169,22 @@ static int connected(struct usb_dev_state *ps) ps->dev->state != USB_STATE_NOTATTACHED); } +static void dec_use_count(struct usb_memory *usbm, int *count) +{ + struct usb_dev_state *ps = usbm->ps; + unsigned long flags; + + spin_lock_irqsave(>lock, flags); + WARN_ON(count != >usb_use_count && count != >vma_use_count); + --*count; + if (usbm->usb_use_count == 0 && usbm->vma_use_count == 0) { + list_del_init(>memlist); + free_pages_exact(usbm->mem, usbm->size); + kfree(usbm); + } + spin_unlock_irqrestore(>lock, flags); +} + static loff_t usbdev_lseek(struct file *file, loff_t offset, int orig) { loff_t ret; @@ -288,6 +316,9 @@ static struct async *alloc_async(unsigned int numisoframes) static void free_async(struct async *as) { + struct usb_memory *usbm = NULL, *usbm_iter; + unsigned long flags; + struct usb_dev_state *ps = as->ps; int i; put_pid(as->pid); @@ -297,8 +328,22 @@ static void free_async(struct async *as) if (sg_page(>urb->sg[i])) kfree(sg_virt(>urb->sg[i])); } + + spin_lock_irqsave(>lock, flags); + list_for_each_entry(usbm_iter, >memory_list, memlist) { + if (usbm_iter->mem == as->urb->transfer_buffer) { + usbm = usbm_iter; + break; + } + } + spin_unlock_irqrestore(>lock, flags); + kfree(as->urb->sg); - kfree(as->urb->transfer_buffer); + if (usbm == NULL) + kfree(as->urb->transfer_buffer); + else + dec_use_count(usbm, >usb_use_count); + kfree(as->urb->setup_packet); usb_free_urb(as->urb); usbfs_decrease_memory_usage(as->mem_usage); @@ -910,6 +955,7 @@ static int usbdev_open(struct inode *inode, struct file *file) INIT_LIST_HEAD(>list); INIT_LIST_HEAD(>async_pending); INIT_LIST_HEAD(>async_completed); + INIT_LIST_HEAD(>memory_list);
Re: Infrastructure for zerocopy I/O
Seems like it's getting stuck again, since we're entering devices with a higher bandwidth need now it seems to be necessary to really fix it in the kernel. I propose add it as it is, it's proven that it works and there are 2 testcases out there right now and it decreases the CPU load on intel Atoms as well. Otherwise it might take ages until things are getting forward here. Apple is just running fine for ages and they're not struggling on mailinglists about it. Aside of that the Mac USB 2.0 through USB 3.0 support also works rock solid while linux is just a shame in that area .. it never worked and even totally crashes some systems. While that linux issue is known for a long time it just seems like it's accepted already. Either come up with a solution or just add this patch so we can make use of it! Best Regards Markus On Wed, Nov 18, 2015 at 11:55 PM, Markus Rechbergerwrote: > By the way QNAP NAS systems are shipped with a 64bit kernel but a > 32bit system environment. > Those systems support USB 2.0 and USB 3.0. > > You can expect any kind of combination out there. > > > On Wed, Nov 18, 2015 at 7:23 PM, Bjørn Mork wrote: >> "Steinar H. Gunderson" writes: >>> On Tue, Nov 17, 2015 at 03:16:55PM -0500, Alan Stern wrote: xHCI always uses 64-bit addresses. But many EHCI controllers don't, and only a few of the EHCI platform drivers support 64-bit DMA. >>> >>> OK, sure. But are systems with USB2 only and more than 4GB of RAM common? >> >> Hmpff. They are common in my house at least :) >> >> bjorn@nemi:~$ lspci -nn|grep USB >> 00:1a.0 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB >> UHCI Controller #4 [8086:2937] (rev 03) >> 00:1a.1 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB >> UHCI Controller #5 [8086:2938] (rev 03) >> 00:1a.2 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB >> UHCI Controller #6 [8086:2939] (rev 03) >> 00:1a.7 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB2 >> EHCI Controller #2 [8086:293c] (rev 03) >> 00:1d.0 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB >> UHCI Controller #1 [8086:2934] (rev 03) >> 00:1d.1 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB >> UHCI Controller #2 [8086:2935] (rev 03) >> 00:1d.2 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB >> UHCI Controller #3 [8086:2936] (rev 03) >> 00:1d.7 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB2 >> EHCI Controller #1 [8086:293a] (rev 03) >> bjorn@nemi:~$ grep MemTotal /proc/meminfo >> MemTotal:8051536 kB >> >> bjorn@canardo:~$ lspci -nn|grep USB >> 00:1a.0 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB >> UHCI Controller #4 [8086:2937] (rev 02) >> 00:1a.1 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB >> UHCI Controller #5 [8086:2938] (rev 02) >> 00:1a.2 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB >> UHCI Controller #6 [8086:2939] (rev 02) >> 00:1a.7 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB2 >> EHCI Controller #2 [8086:293c] (rev 02) >> 00:1d.0 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB >> UHCI Controller #1 [8086:2934] (rev 02) >> 00:1d.1 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB >> UHCI Controller #2 [8086:2935] (rev 02) >> 00:1d.2 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB >> UHCI Controller #3 [8086:2936] (rev 02) >> 00:1d.7 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB2 >> EHCI Controller #1 [8086:293a] (rev 02) >> bjorn@canardo:~$ grep MemTotal /proc/meminfo >> MemTotal:8195224 kB >> >> Most systems of that generation can take 8GB RAM, and there isn't really >> any reason not to max that out, is there? >> >>> (And will they not increasingly die out, if nothing else as USB3 becomes >>> commonplace?) >> >> Can you wait 10 years for that to happen, or do you want a solution >> earlier? >> >> >> Bjørn -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Infrastructure for zerocopy I/O
On Wed, 18 Nov 2015, Peter Stuge wrote: > Alan Stern wrote: > > if we can directly map the user-provided buffer for DMA. > > Given a memory buffer either kernel or userspace has to apply the > hardware constraints to find out what part if any is usable for DMA. > > I think it's fine to push this task to userspace - as long as > userspace has a way to take care of it. That means userspace must be > able to perform an allocation according to the constraints. I don't > think this is currently possible. Is that correct? As far as I know, it's not possible. In fact, the situation is worse than I thought at first. Memory allocated by userspace generally will not be located in physically contiguous pages. This means that using the memory as an I/O buffer will require scatter-gather. But the USB host controller drivers support scatter-gather only for bulk transfers, not isochronous. Of course, isochronous transfers already involve a weak sort of scatter-gather in the form of usbdevfs_iso_packet_desc lists. But at the driver level, the packets described by these lists must all sit in a single contiguous buffer. In theory a user program can work around this by submitting lots of small isochronous transfers instead of one big transfer. But that wastes resources in the form of excessive system calls. > The kernel could certainly do the allocation and hand the buffer to > userspace. Wiping the buffer once upon allocation is a reasonable > cost, because this allocation should be long-lived. For isochronous especially, this seems like the only way to go. > > On the other hand, adding an API that allows users to allocate low > > memory and then hiding it in the USB subsystem doesn't seem like a > > good idea. > > Unless another generic API already supports the neccessary constraints > then what other options are there? Adding a new generic API instead of hiding it in the USB subsystem. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Infrastructure for zerocopy I/O
"Steinar H. Gunderson"writes: > On Tue, Nov 17, 2015 at 03:16:55PM -0500, Alan Stern wrote: >> xHCI always uses 64-bit addresses. But many EHCI controllers don't, >> and only a few of the EHCI platform drivers support 64-bit DMA. > > OK, sure. But are systems with USB2 only and more than 4GB of RAM common? Hmpff. They are common in my house at least :) bjorn@nemi:~$ lspci -nn|grep USB 00:1a.0 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #4 [8086:2937] (rev 03) 00:1a.1 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #5 [8086:2938] (rev 03) 00:1a.2 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #6 [8086:2939] (rev 03) 00:1a.7 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #2 [8086:293c] (rev 03) 00:1d.0 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 [8086:2934] (rev 03) 00:1d.1 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 [8086:2935] (rev 03) 00:1d.2 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 [8086:2936] (rev 03) 00:1d.7 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 [8086:293a] (rev 03) bjorn@nemi:~$ grep MemTotal /proc/meminfo MemTotal:8051536 kB bjorn@canardo:~$ lspci -nn|grep USB 00:1a.0 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #4 [8086:2937] (rev 02) 00:1a.1 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #5 [8086:2938] (rev 02) 00:1a.2 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #6 [8086:2939] (rev 02) 00:1a.7 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #2 [8086:293c] (rev 02) 00:1d.0 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 [8086:2934] (rev 02) 00:1d.1 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 [8086:2935] (rev 02) 00:1d.2 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 [8086:2936] (rev 02) 00:1d.7 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 [8086:293a] (rev 02) bjorn@canardo:~$ grep MemTotal /proc/meminfo MemTotal:8195224 kB Most systems of that generation can take 8GB RAM, and there isn't really any reason not to max that out, is there? > (And will they not increasingly die out, if nothing else as USB3 becomes > commonplace?) Can you wait 10 years for that to happen, or do you want a solution earlier? Bjørn -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Infrastructure for zerocopy I/O
On Wed, Nov 18, 2015 at 3:02 PM, Christoph Hellwigwrote: > On Tue, Nov 17, 2015 at 02:07:58PM -0500, Alan Stern wrote: >> If we really want to do zerocopy I/O then we should not use a bounce >> buffer. But the only way to avoid bounce buffers may be to give user >> programs a way of allocating memory pages below 4 GB, because lots of >> USB hardware can only do 32-bit DMA. > > But any system worth it's money these days has an IOMMU. > you're definitely wrong with that routers, smartphones, settopboxes, NAS systems, etc. seems like you're sticking with X86 platform. USB is the most versatile bus system out there you'll find it in the range from embedded microprocessors up to high end systems. You will always have to expect that some system will not or some architecture will never support IOMMU due cut down design limitations out there. >> Is there an API for allocating user memory below 4 GB? Would a new >> MMAP flag be acceptable? > > You'll have to ask the MM folks. But I doubt they will be excited about it. Markus -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Infrastructure for zerocopy I/O
On Tue, Nov 17, 2015 at 02:07:58PM -0500, Alan Stern wrote: > If we really want to do zerocopy I/O then we should not use a bounce > buffer. But the only way to avoid bounce buffers may be to give user > programs a way of allocating memory pages below 4 GB, because lots of > USB hardware can only do 32-bit DMA. But any system worth it's money these days has an IOMMU. > Is there an API for allocating user memory below 4 GB? Would a new > MMAP flag be acceptable? You'll have to ask the MM folks. But I doubt they will be excited about it. -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: Infrastructure for zerocopy I/O
From: Christoph Hellwig > Sent: 18 November 2015 14:02 > On Tue, Nov 17, 2015 at 02:07:58PM -0500, Alan Stern wrote: > > If we really want to do zerocopy I/O then we should not use a bounce > > buffer. But the only way to avoid bounce buffers may be to give user > > programs a way of allocating memory pages below 4 GB, because lots of > > USB hardware can only do 32-bit DMA. > > But any system worth it's money these days has an IOMMU. If a system has an iommu, then the cost of updating the iommu itself becomes significant. This is on top of any TLB and cache operations that are required for typical page loaning schemes. One scheme that might work (if you have an open fd into the usb driver) is to do an mmap() request on the usb driver that will make it allocate DMA-able memory and map it into the applications address space. The USB transfer buffers can then be arranged to be in this area. David -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Infrastructure for zerocopy I/O
By the way QNAP NAS systems are shipped with a 64bit kernel but a 32bit system environment. Those systems support USB 2.0 and USB 3.0. You can expect any kind of combination out there. On Wed, Nov 18, 2015 at 7:23 PM, Bjørn Morkwrote: > "Steinar H. Gunderson" writes: >> On Tue, Nov 17, 2015 at 03:16:55PM -0500, Alan Stern wrote: >>> xHCI always uses 64-bit addresses. But many EHCI controllers don't, >>> and only a few of the EHCI platform drivers support 64-bit DMA. >> >> OK, sure. But are systems with USB2 only and more than 4GB of RAM common? > > Hmpff. They are common in my house at least :) > > bjorn@nemi:~$ lspci -nn|grep USB > 00:1a.0 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB > UHCI Controller #4 [8086:2937] (rev 03) > 00:1a.1 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB > UHCI Controller #5 [8086:2938] (rev 03) > 00:1a.2 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB > UHCI Controller #6 [8086:2939] (rev 03) > 00:1a.7 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB2 > EHCI Controller #2 [8086:293c] (rev 03) > 00:1d.0 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB > UHCI Controller #1 [8086:2934] (rev 03) > 00:1d.1 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB > UHCI Controller #2 [8086:2935] (rev 03) > 00:1d.2 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB > UHCI Controller #3 [8086:2936] (rev 03) > 00:1d.7 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB2 > EHCI Controller #1 [8086:293a] (rev 03) > bjorn@nemi:~$ grep MemTotal /proc/meminfo > MemTotal:8051536 kB > > bjorn@canardo:~$ lspci -nn|grep USB > 00:1a.0 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB > UHCI Controller #4 [8086:2937] (rev 02) > 00:1a.1 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB > UHCI Controller #5 [8086:2938] (rev 02) > 00:1a.2 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB > UHCI Controller #6 [8086:2939] (rev 02) > 00:1a.7 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB2 > EHCI Controller #2 [8086:293c] (rev 02) > 00:1d.0 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB > UHCI Controller #1 [8086:2934] (rev 02) > 00:1d.1 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB > UHCI Controller #2 [8086:2935] (rev 02) > 00:1d.2 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB > UHCI Controller #3 [8086:2936] (rev 02) > 00:1d.7 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB2 > EHCI Controller #1 [8086:293a] (rev 02) > bjorn@canardo:~$ grep MemTotal /proc/meminfo > MemTotal:8195224 kB > > Most systems of that generation can take 8GB RAM, and there isn't really > any reason not to max that out, is there? > >> (And will they not increasingly die out, if nothing else as USB3 becomes >> commonplace?) > > Can you wait 10 years for that to happen, or do you want a solution > earlier? > > > Bjørn -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Infrastructure for zerocopy I/O
On Mon, Nov 16, 2015 at 03:22:06PM -0500, Alan Stern wrote: > In other words, you're suggesting we do this: > > Check that userspace requested zerocopy (otherwise the user > program might try to access other data stored in the same cache > lines as the buffer while the I/O is in progres); > > Call get_user_pages (or get_user_pages_fast? -- it's not clear > which should be used) for this buffer; > > Use the array of pages returned by that routine to populate > a scatter-gather list (sg_alloc_table_from_pages); > > Pass that list to dma_map_sg. > > Is that right? Yes. > Does dma_map_sg check the page addresses against the DMA mask and > automatically create a bounce buffer, or do we have to do that > manually? Documentation/DMA-API-HOWTO.txt doesn't discuss this. You need to do this manually. -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Infrastructure for zerocopy I/O
On Tue, 2015-11-17 at 13:07 +0100, Steinar H. Gunderson wrote: > On Mon, Nov 16, 2015 at 03:22:06PM -0500, Alan Stern wrote: > > Check that userspace requested zerocopy (otherwise the user > > program might try to access other data stored in the same cache > > lines as the buffer while the I/O is in progres); > > Needed for send only, right? For receive, I guess you're not allowed to > modify the buffer while the transfer is going on anyway. No, send is the harmless case. You just make sure the cache is flushed. For receive the issue is that touching memory next to the memory used as buffer can alter the buffer. The buffer itself is a clear issue. Regards Oliver -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Infrastructure for zerocopy I/O
On Tue, 17 Nov 2015, Christoph Hellwig wrote: > On Mon, Nov 16, 2015 at 03:22:06PM -0500, Alan Stern wrote: > > In other words, you're suggesting we do this: > > > > Check that userspace requested zerocopy (otherwise the user > > program might try to access other data stored in the same cache > > lines as the buffer while the I/O is in progres); > > > > Call get_user_pages (or get_user_pages_fast? -- it's not clear > > which should be used) for this buffer; > > > > Use the array of pages returned by that routine to populate > > a scatter-gather list (sg_alloc_table_from_pages); > > > > Pass that list to dma_map_sg. > > > > Is that right? > > Yes. > > > Does dma_map_sg check the page addresses against the DMA mask and > > automatically create a bounce buffer, or do we have to do that > > manually? Documentation/DMA-API-HOWTO.txt doesn't discuss this. > > You need to do this manually. I looked through the code. Christoph was wrong about this, at least on systems that support CONFIG_SWIOTLB. Of course, using a bounce buffer kind of defeats the purpose of zerocopy I/O, but I guess sometimes there's no choice. AFAICT this leaves two questions. First, should we worry about systems that don't support SWIOTLB? My feeling is probably not. In fact, the existing DMA mapping code used for ordinary USB communications doesn't try to handle mapping errors by setting up bounce buffers; it assumes that dma_map_sg() takes care of all that. Second, how shall we ask user programs to indicate that they won't access the buffer's memory pages during I/O? My suggestion is that we add a USBDEVFS_URB_ZEROCOPY flag (and a corresponding capability bit). Any objections? Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Infrastructure for zerocopy I/O
On Mon, Nov 16, 2015 at 03:22:06PM -0500, Alan Stern wrote: > Check that userspace requested zerocopy (otherwise the user > program might try to access other data stored in the same cache > lines as the buffer while the I/O is in progres); Needed for send only, right? For receive, I guess you're not allowed to modify the buffer while the transfer is going on anyway. /* Steinar */ -- Homepage: https://www.sesse.net/ -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Infrastructure for zerocopy I/O
There are 2 problems with the current implementation: 1. the memset on isochronous transfers to empty the buffers in order to avoid leaking raw memory to userspace (this costs a lot on intel Atoms and is also noticeable on other systems). 2. the memory fragmentation. Seems like recent systems have a better performance here since we did not get that report for several months now, or maybe the user behavior changed. Some older Linux systems (maybe 2-3 years old) triggered this issue way more often. The CPU usage decreases 1-2% on my 1.3ghz U7300 notebook The CPU usage decreases 6-8% on an Intel Atom n270 when transferring 20mbyte/sec (isochronous), it should be more interesting to see those statistics on embedded systems (eg. some older MIPS systems) where copying data is more expensive. I would not count on IOMMU in that case because several systems which should take benefit of a change in that area simply do not support IOMMU. You can search support.sundtek.com for reports about allocation issues you'll find quite a few. Best Regards, Markus On Tue, Nov 17, 2015 at 6:02 PM, Alan Sternwrote: > On Tue, 17 Nov 2015, Christoph Hellwig wrote: > >> On Mon, Nov 16, 2015 at 03:22:06PM -0500, Alan Stern wrote: >> > In other words, you're suggesting we do this: >> > >> > Check that userspace requested zerocopy (otherwise the user >> > program might try to access other data stored in the same cache >> > lines as the buffer while the I/O is in progres); >> > >> > Call get_user_pages (or get_user_pages_fast? -- it's not clear >> > which should be used) for this buffer; >> > >> > Use the array of pages returned by that routine to populate >> > a scatter-gather list (sg_alloc_table_from_pages); >> > >> > Pass that list to dma_map_sg. >> > >> > Is that right? >> >> Yes. >> >> > Does dma_map_sg check the page addresses against the DMA mask and >> > automatically create a bounce buffer, or do we have to do that >> > manually? Documentation/DMA-API-HOWTO.txt doesn't discuss this. >> >> You need to do this manually. > > I looked through the code. Christoph was wrong about this, at least on > systems that support CONFIG_SWIOTLB. Of course, using a bounce buffer > kind of defeats the purpose of zerocopy I/O, but I guess sometimes > there's no choice. > > AFAICT this leaves two questions. First, should we worry about systems > that don't support SWIOTLB? My feeling is probably not. In fact, the > existing DMA mapping code used for ordinary USB communications doesn't > try to handle mapping errors by setting up bounce buffers; it assumes > that dma_map_sg() takes care of all that. > > Second, how shall we ask user programs to indicate that they won't > access the buffer's memory pages during I/O? My suggestion is that we > add a USBDEVFS_URB_ZEROCOPY flag (and a corresponding capability bit). > Any objections? > > Alan Stern > -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Infrastructure for zerocopy I/O
On Tue, Nov 17, 2015 at 07:17:31PM +0100, Markus Rechberger wrote: > 1. the memset on isochronous transfers to empty the buffers in order > to avoid leaking raw memory to userspace (this costs a lot on intel > Atoms and is also noticeable on other systems). > > 2. the memory fragmentation. Seems like recent systems have a better > performance here since we did not get that report for several months > now, or maybe the user behavior changed. > Some older Linux systems (maybe 2-3 years old) triggered this issue > way more often. I guess if we get transparent zerocopy, both of these are going away just like with your patch, right? The only difference is really who sets up the memory area (the kernel or not). Alan, could we perhaps let the zerocopy flag make the request fail (instead of going through a bounce buffer) if direct DMA is not possible? That way, it would be quite obvious that you need to allocate the memory some other way instead of silently hitting the issues Markus mention. /* Steinar */ -- Homepage: https://www.sesse.net/ -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Infrastructure for zerocopy I/O
On Tue, 17 Nov 2015, Markus Rechberger wrote: > There are 2 problems with the current implementation: > > 1. the memset on isochronous transfers to empty the buffers in order > to avoid leaking raw memory to userspace (this costs a lot on intel > Atoms and is also noticeable on other systems). It won't be necessary if we can directly map the user-provided buffer for DMA. But if we have to use a bounce buffer, we have to do something about it. In theory we could avoid the memset by only copying back to userspace the memory that was written to. > 2. the memory fragmentation. Seems like recent systems have a better > performance here since we did not get that report for several months > now, or maybe the user behavior changed. > Some older Linux systems (maybe 2-3 years old) triggered this issue > way more often. Again, this is a problem only if a bounce buffer is needed. On the other hand, bounce buffers might be common on systems with more than 4 GB of memory, if the host controller can only do 32-bit DMA. On the other hand, adding an API that allows users to allocate low memory and then hiding it in the USB subsystem doesn't seem like a good idea. > The CPU usage decreases 1-2% on my 1.3ghz U7300 notebook > The CPU usage decreases 6-8% on an Intel Atom n270 when > transferring 20mbyte/sec (isochronous), it should be more interesting to > see those > statistics on embedded systems (eg. some older MIPS systems) where > copying data is more expensive. > > I would not count on IOMMU in that case because several systems which > should take benefit of a change in that area simply do not support > IOMMU. If those systems don't have more than 4 GB of memory then it won't matter. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Infrastructure for zerocopy I/O
Just a side note, for older videodevices it was also common to allocate the DMA memory during bootup because it might have failed during runtime because of memory fragmentation. The reason for the memset on the isochronous transfer is that USB might not fill up the entire buffer leaving half of the isochronous buffer with raw data. On bulk transfers the memset is not needed. So only mapping the urbs to userspace is no solution either. Such kind of security is a little bit odd for a stack that doesn't work nearly as well as Mac or Windows. On userspace side many distributions grant normal users raw access to USB devices as well nowadays. On Tue, Nov 17, 2015 at 7:42 PM, Steinar H. Gundersonwrote: > On Tue, Nov 17, 2015 at 07:17:31PM +0100, Markus Rechberger wrote: >> 1. the memset on isochronous transfers to empty the buffers in order >> to avoid leaking raw memory to userspace (this costs a lot on intel >> Atoms and is also noticeable on other systems). >> >> 2. the memory fragmentation. Seems like recent systems have a better >> performance here since we did not get that report for several months >> now, or maybe the user behavior changed. >> Some older Linux systems (maybe 2-3 years old) triggered this issue >> way more often. > > I guess if we get transparent zerocopy, both of these are going away > just like with your patch, right? The only difference is really who sets up > the memory area (the kernel or not). > > Alan, could we perhaps let the zerocopy flag make the request fail (instead > of going through a bounce buffer) if direct DMA is not possible? That way, > it would be quite obvious that you need to allocate the memory some other way > instead of silently hitting the issues Markus mention. > > /* Steinar */ > -- > Homepage: https://www.sesse.net/ -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Infrastructure for zerocopy I/O
Hi, Markus Rechbergerwrites: > Such kind of security is a little bit odd for a stack that doesn't > work nearly as well as Mac or Windows. And you say this based on what, exactly ? Linux supports many more devices, it's usually faster with most devices (actually Mac's is on par with Linux, Windows is much slower) and we supported USB3 before any major OS release had it (I'm not talking about the likes of MCCI who provide their own stack at a premium). What is your concept of better in this case ? -- balbi signature.asc Description: PGP signature
Re: Infrastructure for zerocopy I/O
On Tue, Nov 17, 2015 at 02:07:58PM -0500, Alan Stern wrote: > Is there an API for allocating user memory below 4 GB? Would a new > MMAP flag be acceptable? MAP_32BIT (to mmap) can seemingly do this, but from the man page, it sounds more like a kludge than anything else. /* Steinar */ -- Homepage: https://www.sesse.net/ -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Infrastructure for zerocopy I/O
On Tue, 17 Nov 2015, Steinar H. Gunderson wrote: > On Tue, Nov 17, 2015 at 02:13:49PM -0500, Alan Stern wrote: > > But what other way of allocating memory is there? > > For my part, GPU memory versus malloc(). (You can ask OpenGL to permanently > map up a chunk of GPU memory for you into userspace, but you have no > guarantees as of if it's DMA-able. But typical memory from malloc() might.) I don't think there's any reason to expect malloc to provide memory where you need it. Also, since the memory it provides isn't locked, the kernel can move it to any physical address. > It might be overengineering things, though. I'd be fairly happy if I only had > zerocopy in most common situations. (Does xHCI have this 32-bit limitation?) xHCI always uses 64-bit addresses. But many EHCI controllers don't, and only a few of the EHCI platform drivers support 64-bit DMA. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Infrastructure for zerocopy I/O
By the way on MacOSX we also pre-allocate some buffer in userspace first for everything. Well it's up to you anyway I haven't seen a move into a good direction in that area for years anyway on Linux so I'm used to deal with what is currently available. On Tue, Nov 17, 2015 at 8:13 PM, Alan Sternwrote: > On Tue, 17 Nov 2015, Steinar H. Gunderson wrote: > >> On Tue, Nov 17, 2015 at 07:17:31PM +0100, Markus Rechberger wrote: >> > 1. the memset on isochronous transfers to empty the buffers in order >> > to avoid leaking raw memory to userspace (this costs a lot on intel >> > Atoms and is also noticeable on other systems). >> > >> > 2. the memory fragmentation. Seems like recent systems have a better >> > performance here since we did not get that report for several months >> > now, or maybe the user behavior changed. >> > Some older Linux systems (maybe 2-3 years old) triggered this issue >> > way more often. >> >> I guess if we get transparent zerocopy, both of these are going away >> just like with your patch, right? The only difference is really who sets up >> the memory area (the kernel or not). >> >> Alan, could we perhaps let the zerocopy flag make the request fail (instead >> of going through a bounce buffer) if direct DMA is not possible? That way, >> it would be quite obvious that you need to allocate the memory some other way >> instead of silently hitting the issues Markus mention. > > But what other way of allocating memory is there? > > With scatter-gather lists, fragmentation isn't an issue. But bounce > buffers are unavoidable if the memory isn't accessible to the USB > hardware. > > Alan Stern > -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Infrastructure for zerocopy I/O
On Tue, Nov 17, 2015 at 02:13:49PM -0500, Alan Stern wrote: > But what other way of allocating memory is there? For my part, GPU memory versus malloc(). (You can ask OpenGL to permanently map up a chunk of GPU memory for you into userspace, but you have no guarantees as of if it's DMA-able. But typical memory from malloc() might.) It might be overengineering things, though. I'd be fairly happy if I only had zerocopy in most common situations. (Does xHCI have this 32-bit limitation?) /* Steinar */ -- Homepage: https://www.sesse.net/ -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Infrastructure for zerocopy I/O
Alan Stern wrote: > if we can directly map the user-provided buffer for DMA. Given a memory buffer either kernel or userspace has to apply the hardware constraints to find out what part if any is usable for DMA. I think it's fine to push this task to userspace - as long as userspace has a way to take care of it. That means userspace must be able to perform an allocation according to the constraints. I don't think this is currently possible. Is that correct? The kernel could certainly do the allocation and hand the buffer to userspace. Wiping the buffer once upon allocation is a reasonable cost, because this allocation should be long-lived. > On the other hand, adding an API that allows users to allocate low > memory and then hiding it in the USB subsystem doesn't seem like a > good idea. Unless another generic API already supports the neccessary constraints then what other options are there? //Peter -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Infrastructure for zerocopy I/O
On Tue, 2015-11-17 at 14:13 -0500, Alan Stern wrote: > On Tue, 17 Nov 2015, Steinar H. Gunderson wrote: > > > Alan, could we perhaps let the zerocopy flag make the request fail (instead > > of going through a bounce buffer) if direct DMA is not possible? That way, > > it would be quite obvious that you need to allocate the memory some other > > way > > instead of silently hitting the issues Markus mention. > > But what other way of allocating memory is there? mmap() if you mean in user space. In kernel space there is GFP_DMA32. Thus in the kernel the problem is solved if you disregard controllers that can do only less than 32 bit (We could fall back to GFP_DMA, but the buffers would have to be tiny) So the issue is getting the buffer to the user. mmap() can do that. > With scatter-gather lists, fragmentation isn't an issue. But bounce > buffers are unavoidable if the memory isn't accessible to the USB > hardware. True and therefore therefore you cannot do the allocation before you tell the kernel what you need the memory for. Hence it seems to me that if you want to do it without an IOMMU, mmap() is the only generic option. It implicitly tells the kernel what the memory is for by being tied to a device. Regards Oliver -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Infrastructure for zerocopy I/O
On Tue, 17 Nov 2015, Steinar H. Gunderson wrote: > On Tue, Nov 17, 2015 at 07:17:31PM +0100, Markus Rechberger wrote: > > 1. the memset on isochronous transfers to empty the buffers in order > > to avoid leaking raw memory to userspace (this costs a lot on intel > > Atoms and is also noticeable on other systems). > > > > 2. the memory fragmentation. Seems like recent systems have a better > > performance here since we did not get that report for several months > > now, or maybe the user behavior changed. > > Some older Linux systems (maybe 2-3 years old) triggered this issue > > way more often. > > I guess if we get transparent zerocopy, both of these are going away > just like with your patch, right? The only difference is really who sets up > the memory area (the kernel or not). > > Alan, could we perhaps let the zerocopy flag make the request fail (instead > of going through a bounce buffer) if direct DMA is not possible? That way, > it would be quite obvious that you need to allocate the memory some other way > instead of silently hitting the issues Markus mention. But what other way of allocating memory is there? With scatter-gather lists, fragmentation isn't an issue. But bounce buffers are unavoidable if the memory isn't accessible to the USB hardware. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Infrastructure for zerocopy I/O
On Tue, 17 Nov 2015, Steinar H. Gunderson wrote: > On Tue, Nov 17, 2015 at 02:07:58PM -0500, Alan Stern wrote: > > Is there an API for allocating user memory below 4 GB? Would a new > > MMAP flag be acceptable? > > MAP_32BIT (to mmap) can seemingly do this, but from the man page, it sounds > more like a kludge than anything else. No, that's not what I'm talking about. MAP_32BIT means that the new memory's _virtual_ address should be below 2 GB. I'm talking about the _physical_ address. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Infrastructure for zerocopy I/O
On Tue, 17 Nov 2015, Steinar H. Gunderson wrote: > > xHCI always uses 64-bit addresses. But many EHCI controllers don't, > > and only a few of the EHCI platform drivers support 64-bit DMA. > > OK, sure. But are systems with USB2 only and more than 4GB of RAM common? > (And will they not increasingly die out, if nothing else as USB3 becomes > commonplace?) I think they're still reasonably common. For example, the computer I'm writing this email on has an xHCI controller, but that controller is connected only to the SuperSpeed bus. The High/Full/Low speed bus is connected to an EHCI controller. And the computer has 8 GB of memory. Over time such things will die out in the desktop world. I'm not so sure about the mobile or embedded worlds, though. Anyway, right now we would like to support as much as possible. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Infrastructure for zerocopy I/O
On Mon, 16 Nov 2015, Christoph Hellwig wrote: > On Mon, Nov 16, 2015 at 08:03:12PM +0100, Steinar H. Gunderson wrote: > > The original use case: DVB capture on embedded devices. > > > > My use case: USB3 uncompressed HD video capture (from multiple cards). > > > > Both of these hit (beyond the speed boost from zerocopy) the problem that > > as time goes by, memory gets more fragmented and usbfs fails allocation. > > Allocating memory up-front solves that. > > As said I think you should just use get_user_pages() on user memory, > and bounce buffer if it doesn't fit the DMA mask. > > Thіs also allows the user to use hugetlbs if they need large contiguous > allocations for performance reasons. If we really want to do zerocopy I/O then we should not use a bounce buffer. But the only way to avoid bounce buffers may be to give user programs a way of allocating memory pages below 4 GB, because lots of USB hardware can only do 32-bit DMA. Is there an API for allocating user memory below 4 GB? Would a new MMAP flag be acceptable? Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Infrastructure for zerocopy I/O
On Tue, Nov 17, 2015 at 03:01:39PM -0500, Alan Stern wrote: >> On Tue, Nov 17, 2015 at 02:07:58PM -0500, Alan Stern wrote: >>> Is there an API for allocating user memory below 4 GB? Would a new >>> MMAP flag be acceptable? >> MAP_32BIT (to mmap) can seemingly do this, but from the man page, it sounds >> more like a kludge than anything else. > No, that's not what I'm talking about. MAP_32BIT means that the new > memory's _virtual_ address should be below 2 GB. I'm talking about the > _physical_ address. Sorry, I misunderstood the documentation. /* Steinar */ -- Homepage: https://www.sesse.net/ -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Infrastructure for zerocopy I/O
On Tue, Nov 17, 2015 at 03:16:55PM -0500, Alan Stern wrote: >>> But what other way of allocating memory is there? >> For my part, GPU memory versus malloc(). (You can ask OpenGL to permanently >> map up a chunk of GPU memory for you into userspace, but you have no >> guarantees as of if it's DMA-able. But typical memory from malloc() might.) > I don't think there's any reason to expect malloc to provide memory > where you need it. Also, since the memory it provides isn't locked, > the kernel can move it to any physical address. Well, fair enough, given that it can be moved. But for a system with <4GB RAM, I would be pretty sure. (In my hypothetical situation, my priority list would be 1. Zerocopy to GPU memory and process using compute shaders, 2. Zerocopy to CPU memory and process using AVX2, 3. Non-zerocopy to GPU memory. But in reality, I'm probably too lazy to maintain two code paths.) > xHCI always uses 64-bit addresses. But many EHCI controllers don't, > and only a few of the EHCI platform drivers support 64-bit DMA. OK, sure. But are systems with USB2 only and more than 4GB of RAM common? (And will they not increasingly die out, if nothing else as USB3 becomes commonplace?) /* Steinar */ -- Homepage: https://www.sesse.net/ -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Infrastructure for zerocopy I/O
Christoph: Can you help answer some questions about implementing zerocopy I/O? We would like to add zerocopy support for USB devices. Basically this means: Allocating memory pages that match the device's DMA mask, for use as I/O buffers, and locking them so their physical addresses don't change (and they don't get paged out); Mapping those pages into a user process; Constructing scatter-gather lists to describe the I/O if the pages aren't contiguous. I don't have much experience in this area. Can you point to good examples where these things are done or documents describing what is involved? For example, are the allocating and mapping steps normally done separately or are they normally combined in an mmap(2) system call? A proposed patch has been posted (http://marc.info/?l=linux-usb=144763502829452=2), but I'm not convinced that it is the best approach. For instance, it always tries to get contiguous pages and so is vulnerable to memory fragmentation. Thanks, Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Infrastructure for zerocopy I/O
On Mon, Nov 16, 2015 at 01:29:58PM -0500, Alan Stern wrote: > A proposed patch has been posted > (http://marc.info/?l=linux-usb=144763502829452=2), but I'm not > convinced that it is the best approach. For instance, it always tries > to get contiguous pages and so is vulnerable to memory fragmentation. What Alan says is right, but there's one additional piece of information: The patch also makes allocation an up-front issue (the memory block allocated can be reused across transfers), so if you can allocate at the start of the program, you will not be affected by later fragmentation. /* Steinar */ -- Homepage: https://www.sesse.net/ -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Infrastructure for zerocopy I/O
On Mon, Nov 16, 2015 at 07:55:45PM +0100, Christoph Hellwig wrote: >> A proposed patch has been posted >> (http://marc.info/?l=linux-usb=144763502829452=2), but I'm not >> convinced that it is the best approach. For instance, it always tries >> to get contiguous pages and so is vulnerable to memory fragmentation. > This looks totally crazy to me. What's the use case for it? The original use case: DVB capture on embedded devices. My use case: USB3 uncompressed HD video capture (from multiple cards). Both of these hit (beyond the speed boost from zerocopy) the problem that as time goes by, memory gets more fragmented and usbfs fails allocation. Allocating memory up-front solves that. /* Steinar */ -- Homepage: https://www.sesse.net/ -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Infrastructure for zerocopy I/O
On Mon, Nov 16, 2015 at 01:29:58PM -0500, Alan Stern wrote: > Allocating memory pages that match the device's DMA mask, for > use as I/O buffers, and locking them so their physical > addresses don't change (and they don't get paged out); > > Mapping those pages into a user process; > > Constructing scatter-gather lists to describe the I/O if the > pages aren't contiguous. > > I don't have much experience in this area. Can you point to good > examples where these things are done or documents describing what is > involved? For example, are the allocating and mapping steps normally > done separately or are they normally combined in an mmap(2) system > call? The answer is probably not what you want to here, but the recommendation is to not bother. Just use the iommu for this one reasonable hardware, or swiotlb for bounce buffering if your systems has more addressable memory than what the hardware can address. This is what we've been doing for block I/O since day 1. > A proposed patch has been posted > (http://marc.info/?l=linux-usb=144763502829452=2), but I'm not > convinced that it is the best approach. For instance, it always tries > to get contiguous pages and so is vulnerable to memory fragmentation. This looks totally crazy to me. What's the use case for it? -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Infrastructure for zerocopy I/O
On Mon, Nov 16, 2015 at 08:03:12PM +0100, Steinar H. Gunderson wrote: > The original use case: DVB capture on embedded devices. > > My use case: USB3 uncompressed HD video capture (from multiple cards). > > Both of these hit (beyond the speed boost from zerocopy) the problem that > as time goes by, memory gets more fragmented and usbfs fails allocation. > Allocating memory up-front solves that. As said I think you should just use get_user_pages() on user memory, and bounce buffer if it doesn't fit the DMA mask. Thіs also allows the user to use hugetlbs if they need large contiguous allocations for performance reasons. -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Infrastructure for zerocopy I/O
On Mon, 16 Nov 2015, Christoph Hellwig wrote: > On Mon, Nov 16, 2015 at 08:03:12PM +0100, Steinar H. Gunderson wrote: > > The original use case: DVB capture on embedded devices. > > > > My use case: USB3 uncompressed HD video capture (from multiple cards). > > > > Both of these hit (beyond the speed boost from zerocopy) the problem that > > as time goes by, memory gets more fragmented and usbfs fails allocation. > > Allocating memory up-front solves that. > > As said I think you should just use get_user_pages() on user memory, > and bounce buffer if it doesn't fit the DMA mask. > > Thіs also allows the user to use hugetlbs if they need large contiguous > allocations for performance reasons. In other words, you're suggesting we do this: Check that userspace requested zerocopy (otherwise the user program might try to access other data stored in the same cache lines as the buffer while the I/O is in progres); Call get_user_pages (or get_user_pages_fast? -- it's not clear which should be used) for this buffer; Use the array of pages returned by that routine to populate a scatter-gather list (sg_alloc_table_from_pages); Pass that list to dma_map_sg. Is that right? Does dma_map_sg check the page addresses against the DMA mask and automatically create a bounce buffer, or do we have to do that manually? Documentation/DMA-API-HOWTO.txt doesn't discuss this. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html