Re: [PATCH] memory mapping for usbfs (v0.4)
On Sat, Oct 5, 2013 at 4:34 AM, Alan Stern st...@rowland.harvard.edu wrote:
> On Fri, 4 Oct 2013, Markus Rechberger wrote:
>> I was only testing reading the data, so I didn't see any caching effects,
>> since I don't have a device or driver with which I can send a lot of data
>> out. As far as I understand, pgprot_noncached would only be required when
>> sending data (writing data to the DMA'able buffer).

No, the application may still read obsolete data from the mapped buffer,
even after the transfer from the device is complete, without any
synchronization.

>> This question is still open.
>
> The buffer should be cached. The userspace program will have to make sure
> that it doesn't try to access the buffer while DMA is in progress. As long
> as that restriction is obeyed, the USB core will take care of mapping the
> buffer for DMA (which flushes the cache).

No, the HCD's mapping/unmapping can't deal with the problem, since it uses
the kernel's direct-mapped virtual address of the buffer to flush the cache,
while applications use the mmapped virtual address. The CPU can cache data
through both virtual addresses, so the transfer buffer may well be cached in
more than one location by the CPU on a VIVT or VIPT-aliasing cache.

So Markus took the simplest way, which is to use a noncached mapping, but
that may degrade performance on some architectures: accessing non-cached
memory is reported to be extremely slow on some ARMv7 SoCs.

> As I understand it, you wrote this in order to solve problems where memory
> couldn't be allocated for USB transfers. If the system is running short on
> memory, won't your USBDEVFS_ALLOC_MEMORY ioctl also fail?
>
>> No, the buffers will be kept in a list and re-used (aside from that, they
>> are also kept in a list in devio.c and re-used). When the buffers are
>> allocated initially, there's no chance for this process to fail during
>> run-time.
>
> Why not? What if the memory is already almost full? The call to
> alloc_pages_exact() could fail.
> I agree this is less likely than a failure with the current system, but it
> is still possible.

With CONFIG_COMPACTION enabled, it is hard to trigger the allocation
failure; and in fact, on some architectures, the kernel stack or the page
directory table allocated for each process is already 16 KB.

Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] memory mapping for usbfs (v0.4)
On Sat, 5 Oct 2013, Ming Lei wrote:
>> The buffer should be cached. The userspace program will have to make sure
>> that it doesn't try to access the buffer while DMA is in progress. As
>> long as that restriction is obeyed, the USB core will take care of
>> mapping the buffer for DMA (which flushes the cache).
>
> No, the HCD's mapping/unmapping can't deal with the problem, since it uses
> the kernel's direct-mapped virtual address of the buffer to flush the
> cache, while applications use the mmapped virtual address. The CPU can
> cache data through both virtual addresses, so the transfer buffer may well
> be cached in more than one location by the CPU on a VIVT or VIPT-aliasing
> cache.
>
> So Markus took the simplest way, which is to use a noncached mapping, but
> that may degrade performance on some architectures: accessing non-cached
> memory is reported to be extremely slow on some ARMv7 SoCs.

Then how do you suggest the cache be flushed?

Alan Stern
Re: [PATCH] memory mapping for usbfs (v0.4)
On Sat, Oct 5, 2013 at 11:10 PM, Alan Stern st...@rowland.harvard.edu wrote:
> On Sat, 5 Oct 2013, Ming Lei wrote:
>>> The buffer should be cached. The userspace program will have to make
>>> sure that it doesn't try to access the buffer while DMA is in progress.
>>> As long as that restriction is obeyed, the USB core will take care of
>>> mapping the buffer for DMA (which flushes the cache).
>>
>> No, the HCD's mapping/unmapping can't deal with the problem, since it
>> uses the kernel's direct-mapped virtual address of the buffer to flush
>> the cache, while applications use the mmapped virtual address. The CPU
>> can cache data through both virtual addresses, so the transfer buffer
>> may well be cached in more than one location by the CPU on a VIVT or
>> VIPT-aliasing cache.
>>
>> So Markus took the simplest way, which is to use a noncached mapping,
>> but that may degrade performance on some architectures: accessing
>> non-cached memory is reported to be extremely slow on some ARMv7 SoCs.
>
> Then how do you suggest the cache be flushed?

flush_cache_range() may be OK; I suggest referring to
Documentation/cachetlb.txt first.

Thanks,
--
Ming Lei
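As a rough illustration of Ming's suggestion, a kernel-side sketch (not
compiled here, and not part of the patch): usbfs would have to remember the
vma from its mmap() handler and flush the user-visible alias itself, which
is exactly what the HCD's DMA mapping of the kernel direct mapping cannot
cover on VIVT/VIPT-aliasing caches. The helper name is hypothetical.

```c
/*
 * Sketch only: flush a user mapping of the transfer buffer around DMA,
 * using the vma recorded in the usbfs mmap() handler.
 * See Documentation/cachetlb.txt for the exact rules.
 */
static void usbm_flush_user_mapping(struct vm_area_struct *vma,
				    unsigned long uaddr, size_t len)
{
	/* flush_cache_range() operates on user virtual addresses,
	 * i.e. the aliased mapping the application actually touches. */
	flush_cache_range(vma, uaddr, uaddr + len);
}
```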
Re: [PATCH] memory mapping for usbfs (v0.4)
On Mon, Sep 30, 2013 at 5:12 PM, Markus Rechberger mrechber...@gmail.com wrote:
> On Mon, Sep 30, 2013 at 4:59 PM, Alan Stern st...@rowland.harvard.edu wrote:
>> On Mon, 30 Sep 2013, Markus Rechberger wrote:
>>> to explain why Isochronous makes such a difference: the kernel driver
>>> doesn't do the memset anymore for each URB packet. However, that patch
>>> addresses multiple issues:
>>> * Isochronous improvement by removing the memset for each packet
>>> * Pre-allocation to prevent allocation failures during runtime
>>> * Allowing to restore the original behavior for 15k buffers (I'd
>>
>> Why do you use 15 KB as the boundary line? Wouldn't it make more sense
>> to use 16 KB (that is, four pages)?
>
> it is 16k, yes.
>
>>> recommend making further decisions about forcing SG transfers over bulk
>>> transfers once manufacturers deliver low-end devices which include both
>>> possibilities). Here we have seen the latency issue with certain
>>> chipsets when buffers are too small. I don't know the behavior/latency
>>> of SG transfers, unfortunately, and there's no way to test it on the
>>> particular target system - other systems do not trigger that problem.
>>
>> In general, an SG transfer has slightly higher time and memory
>> requirements than a non-SG transfer. It's not a big difference. But with
>> non-SG, the memory is allocated as a single big buffer, whereas with SG
>> the memory is allocated as a bunch of smaller buffers. Therefore, when
>> memory is fragmented (which will happen on a system with a small amount
>> of memory), SG transfers are more likely to succeed than non-SG. For
>> example, you are more likely to be able to allocate eight 16-KB memory
>> regions (SG) than a single 128-KB region (non-SG).
>
> I'm just worried about that single case where the timing of the USB
> frames seems to matter; it worked with 50k buffers. The 16k transfer-size
> buffer seems to stop the transfer randomly after some time, needing a
> register toggle to re-enable the data again (the transfer length is
> reported as 0; the packets themselves still arrive on the host).
> I'm not against the SG transfers here, but I'd rather keep both
> possibilities (the old one, which was available with Linux 3.5, where
> there was no 16k limitation) and the SG one; I'd certainly like to test
> both of them. To make SG transfers foolproof, I recommend using
> pre-allocated buffers anyway; you saw the bug reports that permanently
> re-allocating 16k buffers over some time is not reliable.
>
>>> As far as I can see, there's still some space for improvements (e.g.
>>> pre-allocate large buffers (15k) for USB 3.x, otherwise it will end up
>>> with allocation failures during runtime on low-end systems - as seen by
>>> our customers with USB 2.0 and 15k buffers on low-end systems). With SG
>>> and USB 3.0 this problem should happen even quicker than with USB 2.0.
>>
>> Why would low-end systems be using USB-3?
>
> Routers for sharing hard disks / NAS systems. There are already some
> available on the market.

is this going to be fixed?
http://support.sundtek.com/index.php/topic,350.0.html
http://support.sundtek.com/index.php/topic,1097.msg7780.html#msg7780
http://support.sundtek.com/index.php/topic,411.msg2153.html#msg2153
http://support.sundtek.com/index.php/topic,531.msg2956.html#msg2956
http://support.sundtek.com/index.php/topic,364.msg1829.html#msg1829

or do you have another solution for that issue on systems with lower
performance?

https://developer.apple.com/library/mac/documentation/IOKit/Reference/IOUSBInterfaceInterface192_reference/translated_content/IOUSBInterfaceInterface192.html#//apple_ref/doc/uid/TP40011544-CLSCHIOUSBInterfaceInterface192-DontLinkElementID_2

Apple obviously did it the same way, though that doesn't mean that Linux
has to do it that way, of course.

Markus
Re: [PATCH] memory mapping for usbfs (v0.4)
On Sun, 29 Sep 2013, Markus Rechberger wrote:
> This patch adds memory mapping support to USBFS for isochronous and bulk
> data transfers; it allows pre-allocating USB transfer buffers.
>
> The CPU usage decreases 1-2% on my 1.3 GHz U7300 notebook.
> The CPU usage decreases 6-8% on an Intel Atom N270 when transferring
> 20 MB/sec (isochronous); it should be more interesting to see those
> statistics on embedded systems where copying data is more expensive.
>
> Usage from userspace:
>
> allocation:
>         rv = ioctl(priv->usbfd, USBDEVFS_ALLOC_MEMORY, &mem);
>         if (rv == 0)
>                 buffer = mmap(NULL, size, PROT_READ|PROT_WRITE,
>                               MAP_SHARED, priv->usbfd, mem.offset);
>         use the mapped buffer with urb->buffer.
>
> release:
>         rv = munmap(buffer, size);
>         memset(&mem, 0x0, sizeof(struct usbdevfs_memory));
>         mem.buffer = buffer;
>         rv = ioctl(priv->usbfd, USBDEVFS_RELEASE_MEMORY, &mem);
>
> Non-freed memory buffers are collected and will be released when closing
> the node.
>
> I tested this patch with bulk and isochronous, with and without memory
> mapping (applications which don't use mmap will just fall back to the
> legacy mechanism).

On the whole this seems reasonable. There are a few stylistic things that
could be cleaned up (missing blank lines after variable declarations, for
example, and other checkpatch issues), but they are minor.

Why do you constantly compute (PAGE_SIZE << get_order(usbm->size)) instead
of just using usbm->size directly? (And why is the variable sometimes
called usbm and sometimes called usbmem?)

The biggest problem is that your proc_alloc_memory() routine doesn't call
usbfs_increase_memory_usage(). Without that, there's nothing to prevent a
user from allocating all the available kernel memory.
> Version 0.3:
> * Removed comment
> * Renamed USBDEVFS_FREE_MEMORY to USBDEVFS_RELEASE_MEMORY
> * Clearing allocated memory
>
> Version 0.4:
> * Overriding SG transfers once memory-mapped buffers are allocated for BULK
> * Adding pgprot_noncached to ensure that the I/O memory is not cached
>   (thanks to Ming Lei for mentioning that)

I don't understand this. Why shouldn't the buffer memory be cached? Won't
uncached buffers be a lot slower?

As I understand it, you wrote this in order to solve problems where memory
couldn't be allocated for USB transfers. If the system is running short on
memory, won't your USBDEVFS_ALLOC_MEMORY ioctl also fail?

Alan Stern
Re: [PATCH] memory mapping for usbfs (v0.4)
On Fri, Oct 4, 2013 at 8:41 PM, Alan Stern st...@rowland.harvard.edu wrote:
> On the whole this seems reasonable. There are a few stylistic things that
> could be cleaned up (missing blank lines after variable declarations, for
> example, and other checkpatch issues), but they are minor.
>
> Why do you constantly compute (PAGE_SIZE << get_order(usbm->size)) instead
> of just using usbm->size directly? (And why is the variable sometimes
> called usbm and sometimes called usbmem?)

ok.

> The biggest problem is that your proc_alloc_memory() routine doesn't call
> usbfs_increase_memory_usage(). Without that, there's nothing to prevent a
> user from allocating all the available kernel memory.

only root is supposed to have raw USB access; which limit would you
suggest? root could also delete the entire USB hard disk with appropriate
commands.

>> Version 0.3:
>> * Removed comment
>> * Renamed USBDEVFS_FREE_MEMORY to USBDEVFS_RELEASE_MEMORY
>> * Clearing allocated memory
>>
>> Version 0.4:
>> * Overriding SG transfers once memory-mapped buffers are allocated for
>>   BULK
>> * Adding pgprot_noncached to ensure that the I/O memory is not cached
>>   (thanks to Ming Lei for mentioning that)
>
> I don't understand this. Why shouldn't the buffer memory be cached? Won't
> uncached buffers be a lot slower?

I was only testing reading the data, so I didn't see any caching effects,
since I don't have a device or driver with which I can send a lot of data
out. As far as I understand, pgprot_noncached would only be required when
sending data (writing data to the DMA'able buffer). This question is still
open.

> As I understand it, you wrote this in order to solve problems where memory
> couldn't be allocated for USB transfers. If the system is running short on
> memory, won't your USBDEVFS_ALLOC_MEMORY ioctl also fail?

No, the buffers will be kept in a list and re-used (aside from that, they
are also kept in a list in devio.c and re-used). When the buffers are
allocated initially, there's no chance for this process to fail during
run-time.
Please note that with the old mechanism the transfer buffers are allocated
and freed constantly during run-time; that mechanism usually fails during
run-time on low-end systems.

Markus
Re: [PATCH] memory mapping for usbfs (v0.4)
On Fri, 4 Oct 2013, Markus Rechberger wrote:
>> The biggest problem is that your proc_alloc_memory() routine doesn't
>> call usbfs_increase_memory_usage(). Without that, there's nothing to
>> prevent a user from allocating all the available kernel memory.
>
> only root is supposed to have raw USB access,

Not true at all. Any user may be allowed to access a USB device, depending
on the permissions for the usbfs device file.

> which limit would you suggest?

The limit is already in place. All you have to do is call the routine to
make sure that you don't exceed the limit. Check the other places where
usbfs_increase_memory_usage() is used and you'll see how it works.

>>> Version 0.4:
>>> * Overriding SG transfers once memory-mapped buffers are allocated for
>>>   BULK
>>> * Adding pgprot_noncached to ensure that the I/O memory is not cached
>>>   (thanks to Ming Lei for mentioning that)
>>
>> I don't understand this. Why shouldn't the buffer memory be cached?
>> Won't uncached buffers be a lot slower?
>
> I was only testing reading the data, so I didn't see any caching effects,
> since I don't have a device or driver with which I can send a lot of data
> out. As far as I understand, pgprot_noncached would only be required when
> sending data (writing data to the DMA'able buffer). This question is
> still open.

The buffer should be cached. The userspace program will have to make sure
that it doesn't try to access the buffer while DMA is in progress. As long
as that restriction is obeyed, the USB core will take care of mapping the
buffer for DMA (which flushes the cache).

>> As I understand it, you wrote this in order to solve problems where
>> memory couldn't be allocated for USB transfers. If the system is running
>> short on memory, won't your USBDEVFS_ALLOC_MEMORY ioctl also fail?
>
> No, the buffers will be kept in a list and re-used (aside from that, they
> are also kept in a list in devio.c and re-used). When the buffers are
> allocated initially, there's no chance for this process to fail during
> run-time.

Why not? What if the memory is already almost full?
The call to alloc_pages_exact() could fail. I agree this is less likely
than a failure with the current system, but it is still possible.

> please note that with the old mechanism the transfer buffers are
> allocated and freed constantly during run-time; that mechanism usually
> fails during run-time on low-end systems.

It's true that the existing system allocates and deallocates the buffer for
each transfer. With your patch, the buffer is allocated only once. In
addition, the same buffer is used in userspace and in the kernel, removing
the need to allocate two buffers and copy everything over. This will be a
significant saving of both time and space.

By the way, have you considered that the user program might want the I/O
transfer to start at some position other than the beginning of the buffer?

Alan Stern
Re: [PATCH] memory mapping for usbfs (v0.4)
On Fri, Oct 4, 2013 at 10:34 PM, Alan Stern st...@rowland.harvard.edu wrote:
> On Fri, 4 Oct 2013, Markus Rechberger wrote:
>>> The biggest problem is that your proc_alloc_memory() routine doesn't
>>> call usbfs_increase_memory_usage(). Without that, there's nothing to
>>> prevent a user from allocating all the available kernel memory.
>>
>> only root is supposed to have raw USB access,
>
> Not true at all. Any user may be allowed to access a USB device,
> depending on the permissions for the usbfs device file.

well, I mean by default, if you connect an unknown device, only root will
usually have access to it. I know there are udev rules for some special
devices.

>> which limit would you suggest?
>
> The limit is already in place. All you have to do is call the routine to
> make sure that you don't exceed the limit. Check the other places where
> usbfs_increase_memory_usage() is used and you'll see how it works.

ok

>>>> Version 0.4:
>>>> * Overriding SG transfers once memory-mapped buffers are allocated
>>>>   for BULK
>>>> * Adding pgprot_noncached to ensure that the I/O memory is not cached
>>>>   (thanks to Ming Lei for mentioning that)
>>>
>>> I don't understand this. Why shouldn't the buffer memory be cached?
>>> Won't uncached buffers be a lot slower?
>>
>> I was only testing reading the data, so I didn't see any caching
>> effects, since I don't have a device or driver with which I can send a
>> lot of data out. As far as I understand, pgprot_noncached would only be
>> required when sending data (writing data to the DMA'able buffer). This
>> question is still open.
>
> The buffer should be cached. The userspace program will have to make sure
> that it doesn't try to access the buffer while DMA is in progress. As
> long as that restriction is obeyed, the USB core will take care of
> mapping the buffer for DMA (which flushes the cache).

ok, so it can be removed.

> As I understand it, you wrote this in order to solve problems where
> memory couldn't be allocated for USB transfers. If the system is running
> short on memory, won't your USBDEVFS_ALLOC_MEMORY ioctl also fail?
>> No, the buffers will be kept in a list and re-used (aside from that,
>> they are also kept in a list in devio.c and re-used). When the buffers
>> are allocated initially, there's no chance for this process to fail
>> during run-time.
>
> Why not? What if the memory is already almost full? The call to
> alloc_pages_exact() could fail. I agree this is less likely than a
> failure with the current system, but it is still possible.

if it happens, the allocation ioctl returns an error, and the application
knows from the beginning that this is not going to work. But once the
memory is allocated, it should be solid. Our driver starts up early as
well; there has never been a failure at driver startup. I mentioned in
another email that we also have a kernel stub driver which does the
pre-allocation within the module; no issues have been reported when the
pre-allocation module was in place. There are discussions on several
mailing lists about pre-allocating DMA'able memory when kernel drivers are
loaded; that approach is not so uncommon.

>> please note that with the old mechanism the transfer buffers are
>> allocated and freed constantly during run-time; that mechanism usually
>> fails during run-time on low-end systems.
>
> It's true that the existing system allocates and deallocates the buffer
> for each transfer. With your patch, the buffer is allocated only once.
> In addition, the same buffer is used in userspace and in the kernel,
> removing the need to allocate two buffers and copy everything over. This
> will be a significant saving of both time and space.
>
> By the way, have you considered that the user program might want the I/O
> transfer to start at some position other than the beginning of the
> buffer?

I thought about that, but I couldn't find any useful use case for it. I
guess it would be very much device-dependent. The URB buffers themselves
aren't that big either. Comparing with the Mac OS X API, it wouldn't
support that either. If someone needs it, he could still add it.
Markus
Re: [PATCH] memory mapping for usbfs (v0.4)
On Mon, Sep 30, 2013 at 5:51 AM, Marcel Holtmann mar...@holtmann.org wrote:
> Hi Markus,
>
>>> Do you have a userspace test program that we can use to verify that
>>> this does work, and that others can use to run on some different
>>> platforms to verify that this is actually faster?
>>
>> You will need one of our devices for testing, I guess. Some scanners
>> (which use USBFS) or other low-speed devices won't really utilize usbfs
>> too much. I think I could provide some grabber device for testing if
>> you want to.
>
> So no test userspace program you can knock up for us? I really hate
> adding new core functionality to the kernel that I have no way of testing
> at all; that's a recipe for it quickly breaking...

Well, do you have any device which has a userspace driver? Without a device
you can barely test it. There's a set-top box company which added the
backported patch to Linux 3.2; they use USB card readers and even tested
the devices with and without mmap support. I doubt that the SG support has
had any good testing on low-end systems; I'm worried that it will introduce
those latency issues again which we saw with 15k buffers.

> There are lots of userspace drivers based on libusb or libusbx. If you
> post an enhancement for libusbx to make use of your buffer mappings,
> those are free to adapt their libraries to use that feature (see first
> post how to use the ioctls).

You need to find a reasonable application for those patches. I do not know
anyone else using such a heavy bulk or isochronous application on low-end
systems, not even on high-end systems (otherwise you'd very likely be able
to find more such kernel allocation messages on the internet, and not only
from us; obviously they all link to our forum or devices). VMware, Qemu or
Xen might be a good target for testing (when passing through Windows
requests and using a camera in Windows).

> I think the same can be done with any standard Bluetooth controller and
> using SCO audio data for a headset connection.
> These also use ISOC transfers and, as of today, do not work inside KVM.
> So if anybody wants to see if these changes actually make a difference,
> then a Qemu/KVM setup using them might be a good test case.

you might need some other optimizations for that regarding Bluetooth.

Here are some videos from the Intel N270 which deal with USB (our USB
capture device connected to a DVD player, transferring raw video 720x576 -
20 MB/sec):

http://sundtek.de/support/devio_legacy_n270.mts (legacy mode, just not
using the allocation ioctl and mmap function) ~25% CPU
http://sundtek.de/support/devio_mmap_n270.mts (new mmap'ed mode) ~18% CPU

5% of the driver is used by SIMD instructions to do some video filtering,
the rest for the USB transfer (video, audio, VBI) and parsing the data.

Markus
Re: [PATCH] memory mapping for usbfs (v0.4)
On Mon, Sep 30, 2013 at 1:26 PM, Markus Rechberger mrechber...@gmail.com wrote:
> On Mon, Sep 30, 2013 at 5:51 AM, Marcel Holtmann mar...@holtmann.org wrote:
>> Hi Markus,
>>
>>>> Do you have a userspace test program that we can use to verify that
>>>> this does work, and that others can use to run on some different
>>>> platforms to verify that this is actually faster?
>>>
>>> You will need one of our devices for testing, I guess. Some scanners
>>> (which use USBFS) or other low-speed devices won't really utilize
>>> usbfs too much. I think I could provide some grabber device for
>>> testing if you want to.
>>
>> So no test userspace program you can knock up for us? I really hate
>> adding new core functionality to the kernel that I have no way of
>> testing at all; that's a recipe for it quickly breaking...
>
> Well, do you have any device which has a userspace driver? Without a
> device you can barely test it. There's a set-top box company which added
> the backported patch to Linux 3.2; they use USB card readers and even
> tested the devices with and without mmap support. I doubt that the SG
> support has had any good testing on low-end systems; I'm worried that it
> will introduce those latency issues again which we saw with 15k buffers.
>
>> There are lots of userspace drivers based on libusb or libusbx. If you
>> post an enhancement for libusbx to make use of your buffer mappings,
>> those are free to adapt their libraries to use that feature (see first
>> post how to use the ioctls).
>
> You need to find a reasonable application for those patches. I do not
> know anyone else using such a heavy bulk or isochronous application on
> low-end systems, not even on high-end systems (otherwise you'd very
> likely be able to find more such kernel allocation messages on the
> internet, and not only from us; obviously they all link to our forum or
> devices). VMware, Qemu or Xen might be a good target for testing (when
> passing through Windows requests and using a camera in Windows).
>> I think the same can be done with any standard Bluetooth controller and
>> using SCO audio data for a headset connection. These also use ISOC
>> transfers and, as of today, do not work inside KVM. So if anybody wants
>> to see if these changes actually make a difference, then a Qemu/KVM
>> setup using them might be a good test case.
>
> you might need some other optimizations for that regarding Bluetooth.
>
> Here are some videos from the Intel N270 which deal with USB (our USB
> capture device connected to a DVD player, transferring raw video
> 720x576 - 20 MB/sec):
>
> http://sundtek.de/support/devio_legacy_n270.mts (legacy mode, just not
> using the allocation ioctl and mmap function) ~25% CPU
> http://sundtek.de/support/devio_mmap_n270.mts (new mmap'ed mode) ~18% CPU
>
> 5% of the driver is used by SIMD instructions to do some video
> filtering, the rest for the USB transfer (video, audio, VBI) and parsing
> the data.

to explain why Isochronous makes such a difference: the kernel driver
doesn't do the memset anymore for each URB packet. However, that patch
addresses multiple issues:
* Isochronous improvement by removing the memset for each packet
* Pre-allocation to prevent allocation failures during runtime
* Allowing to restore the original behavior for 15k buffers (I'd recommend
  making further decisions about forcing SG transfers over bulk transfers
  once manufacturers deliver low-end devices which include both
  possibilities). Here we have seen the latency issue with certain
  chipsets when buffers are too small. I don't know the behavior/latency
  of SG transfers, unfortunately, and there's no way to test it on the
  particular target system - other systems do not trigger that problem.

As far as I can see, there's still some space for improvements (e.g.
pre-allocate large buffers (15k) for USB 3.x, otherwise it will end up with
allocation failures during runtime on low-end systems - as seen by our
customers with USB 2.0 and 15k buffers on low-end systems).
With SG and USB 3.0 this problem should happen even quicker than with USB
2.0.

Markus
Re: [PATCH] memory mapping for usbfs (v0.4)
On Mon, 30 Sep 2013, Markus Rechberger wrote:
> to explain why Isochronous makes such a difference: the kernel driver
> doesn't do the memset anymore for each URB packet. However, that patch
> addresses multiple issues:
> * Isochronous improvement by removing the memset for each packet
> * Pre-allocation to prevent allocation failures during runtime
> * Allowing to restore the original behavior for 15k buffers (I'd

Why do you use 15 KB as the boundary line? Wouldn't it make more sense to
use 16 KB (that is, four pages)?

> recommend making further decisions about forcing SG transfers over bulk
> transfers once manufacturers deliver low-end devices which include both
> possibilities). Here we have seen the latency issue with certain chipsets
> when buffers are too small. I don't know the behavior/latency of SG
> transfers, unfortunately, and there's no way to test it on the particular
> target system - other systems do not trigger that problem.

In general, an SG transfer has slightly higher time and memory requirements
than a non-SG transfer. It's not a big difference. But with non-SG, the
memory is allocated as a single big buffer, whereas with SG the memory is
allocated as a bunch of smaller buffers. Therefore, when memory is
fragmented (which will happen on a system with a small amount of memory),
SG transfers are more likely to succeed than non-SG. For example, you are
more likely to be able to allocate eight 16-KB memory regions (SG) than a
single 128-KB region (non-SG).

> As far as I can see, there's still some space for improvements (e.g.
> pre-allocate large buffers (15k) for USB 3.x, otherwise it will end up
> with allocation failures during runtime on low-end systems - as seen by
> our customers with USB 2.0 and 15k buffers on low-end systems). With SG
> and USB 3.0 this problem should happen even quicker than with USB 2.0.

Why would low-end systems be using USB-3?
Alan Stern
Re: [PATCH] memory mapping for usbfs (v0.4)
On Mon, Sep 30, 2013 at 4:59 PM, Alan Stern st...@rowland.harvard.edu wrote:
> On Mon, 30 Sep 2013, Markus Rechberger wrote:
>> to explain why Isochronous makes such a difference: the kernel driver
>> doesn't do the memset anymore for each URB packet. However, that patch
>> addresses multiple issues:
>> * Isochronous improvement by removing the memset for each packet
>> * Pre-allocation to prevent allocation failures during runtime
>> * Allowing to restore the original behavior for 15k buffers (I'd
>
> Why do you use 15 KB as the boundary line? Wouldn't it make more sense to
> use 16 KB (that is, four pages)?

it is 16k, yes.

>> recommend making further decisions about forcing SG transfers over bulk
>> transfers once manufacturers deliver low-end devices which include both
>> possibilities). Here we have seen the latency issue with certain
>> chipsets when buffers are too small. I don't know the behavior/latency
>> of SG transfers, unfortunately, and there's no way to test it on the
>> particular target system - other systems do not trigger that problem.
>
> In general, an SG transfer has slightly higher time and memory
> requirements than a non-SG transfer. It's not a big difference. But with
> non-SG, the memory is allocated as a single big buffer, whereas with SG
> the memory is allocated as a bunch of smaller buffers. Therefore, when
> memory is fragmented (which will happen on a system with a small amount
> of memory), SG transfers are more likely to succeed than non-SG. For
> example, you are more likely to be able to allocate eight 16-KB memory
> regions (SG) than a single 128-KB region (non-SG).

I'm just worried about that single case where the timing of the USB frames
seems to matter; it worked with 50k buffers. The 16k transfer-size buffer
seems to stop the transfer randomly after some time, needing a register
toggle to re-enable the data again (the transfer length is reported as 0;
the packets themselves still arrive on the host).
I'm not against the SG transfers here but I'd rather like to keep both possibilities (the old one which was available with Linux 3.5, where there was no 16k limitation) and the SG one; I'd certainly like to test both of them. To make SG transfers foolproof I recommend using pre-allocated buffers anyway; you saw the bug reports that repeatedly re-allocating 16k buffers over some time is not reliable. As far as I can see there's still some space for improvements (e.g. pre-allocate large buffers (15k) for USB 3.x, otherwise it will end up with allocation failures during runtime on low-end systems - as seen by our customers with USB 2.0 and 15k buffers on low-end systems). With SG and USB 3.0 this problem should happen even quicker than with USB 2.0. Why would low-end systems be using USB-3? Routers for sharing hard disks/NAS systems. There are already some available on the market. Markus
[PATCH] memory mapping for usbfs (v0.4)
This patch adds memory mapping support to USBFS for isochronous and bulk data transfers; it allows pre-allocating USB transfer buffers. The CPU usage decreases 1-2% on my 1.3GHz U7300 notebook. The CPU usage decreases 6-8% on an Intel Atom N270 when transferring 20mbyte/sec (isochronous); it should be more interesting to see those statistics on embedded systems where copying data is more expensive.

Usage from userspace:

allocation:
rv = ioctl(priv->usbfd, USBDEVFS_ALLOC_MEMORY, &mem);
if (rv == 0)
        buffer = mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_SHARED, priv->usbfd, mem.offset);
use the mapped buffer with urb->buffer.

release:
rv = munmap(buffer, size);
memset(&mem, 0x0, sizeof(struct usbdevfs_memory));
mem.buffer = buffer;
rv = ioctl(priv->usbfd, USBDEVFS_RELEASE_MEMORY, &mem);

Non-freed memory buffers are collected and will be released when closing the node. I tested this patch with Bulk and Isochronous, with and without memory mapping (applications which don't use mmap will just fall back to the legacy mechanism).
Version 0.3:
* Removed comment
* Renamed USBDEVFS_FREE_MEMORY to USBDEVFS_RELEASE_MEMORY
* Clearing allocated memory

Version 0.4:
* overriding SG transfers once memory mapped buffers are allocated for BULK
* adding pgprot_noncached to ensure that the IO memory is not cached (thanks to Ming Lei for mentioning that)

http://sundtek.de/support/devio_mmap_v0.4.diff

Signed-off-by: Markus Rechberger patc...@sundtek.com

diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c
index 737e3c1..a32fb70 100644
--- a/drivers/usb/core/devio.c
+++ b/drivers/usb/core/devio.c
@@ -69,6 +69,7 @@ struct dev_state {
 	spinlock_t lock;		/* protects the async urb lists */
 	struct list_head async_pending;
 	struct list_head async_completed;
+	struct list_head memory_list;
 	wait_queue_head_t wait;		/* wake up if a request completed */
 	unsigned int discsignr;
 	struct pid *disc_pid;
@@ -96,6 +97,16 @@ struct async {
 	u8 bulk_status;
 };

+struct usb_memory {
+	struct list_head memlist;
+	int vma_use_count;
+	int usb_use_count;
+	u32 offset;
+	u32 size;
+	void *mem;
+	unsigned long vm_start;
+};
+
 static bool usbfs_snoop;
 module_param(usbfs_snoop, bool, S_IRUGO | S_IWUSR);
 MODULE_PARM_DESC(usbfs_snoop, "true to log all usbfs traffic");
@@ -288,6 +299,9 @@ static struct async *alloc_async(unsigned int numisoframes)

 static void free_async(struct async *as)
 {
+	struct usb_memory *usbm = NULL, *usbm_iter;
+	unsigned long flags;
+	struct dev_state *ps = as->ps;
 	int i;

 	put_pid(as->pid);
@@ -297,8 +311,22 @@ static void free_async(struct async *as)
 		if (sg_page(as->urb->sg[i]))
 			kfree(sg_virt(as->urb->sg[i]));
 	}
+
+	spin_lock_irqsave(&ps->lock, flags);
+	list_for_each_entry(usbm_iter, &ps->memory_list, memlist) {
+		if (usbm_iter->mem == as->urb->transfer_buffer) {
+			usbm = usbm_iter;
+			break;
+		}
+	}
+	spin_unlock_irqrestore(&ps->lock, flags);
+
 	kfree(as->urb->sg);
-	kfree(as->urb->transfer_buffer);
+	if (usbm == NULL)
+		kfree(as->urb->transfer_buffer);
+	else
+		usbm->usb_use_count--;
+
 	kfree(as->urb->setup_packet);
 	usb_free_urb(as->urb);
 	usbfs_decrease_memory_usage(as->mem_usage);
@@ -811,6 +839,7 @@ static int usbdev_open(struct inode *inode, struct file *file)
 	INIT_LIST_HEAD(&ps->list);
 	INIT_LIST_HEAD(&ps->async_pending);
 	INIT_LIST_HEAD(&ps->async_completed);
+	INIT_LIST_HEAD(&ps->memory_list);
 	init_waitqueue_head(&ps->wait);
 	ps->discsignr = 0;
 	ps->disc_pid = get_pid(task_pid(current));
@@ -839,6 +868,8 @@ static int usbdev_release(struct inode *inode, struct file *file)
 	struct dev_state *ps = file->private_data;
 	struct usb_device *dev = ps->dev;
 	unsigned int ifnum;
+	struct list_head *p, *q;
+	struct usb_memory *tmp;
 	struct async *as;

 	usb_lock_device(dev);
@@ -863,6 +894,14 @@ static int usbdev_release(struct inode *inode, struct file *file)
 		free_async(as);
 		as = async_getcompleted(ps);
 	}
+
+	list_for_each_safe(p, q, &ps->memory_list) {
+		tmp = list_entry(p, struct usb_memory, memlist);
+		list_del(p);
+		if (tmp->mem)
+			free_pages_exact(tmp->mem, tmp->size);
+		kfree(tmp);
+	}
 	kfree(ps);
 	return 0;
 }
@@ -1173,6 +1212,7 @@ static int proc_do_submiturb(struct dev_state *ps, struct usbdevfs_urb *uurb,
 	struct usb_host_endpoint *ep;
 	struct async *as = NULL;
 	struct usb_ctrlrequest *dr = NULL;
+	struct usb_memory *usbm
Re: [PATCH] memory mapping for usbfs (v0.4)
On Sun, Sep 29, 2013 at 10:02 PM, Markus Rechberger mrechber...@gmail.com wrote: This patch adds memory mapping support to USBFS for isochronous and bulk data transfers, it allows to pre-allocate usb transfer buffers. The CPU usage decreases 1-2% on my 1.3ghz U7300 notebook The CPU usage decreases 6-8% on an Intel Atom n270 when transferring 20mbyte/sec (isochronous), it should be more interesting to see those statistics on embedded systems where copying data is more expensive. The X86 platform should be memory coherent, which means an uncached mapping might not affect memory access performance from the CPU, so you might need to provide performance data on other, non-coherent ARCHs. Usage from userspace: allocation: rv = ioctl(priv->usbfd, USBDEVFS_ALLOC_MEMORY, &mem); if (rv == 0) buffer = mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_SHARED, priv->usbfd, mem.offset); use the mapped buffer with urb->buffer. release: rv = munmap(buffer, size); memset(&mem, 0x0, sizeof(struct usbdevfs_memory)); mem.buffer = buffer; rv = ioctl(priv->usbfd, USBDEVFS_RELEASE_MEMORY, &mem); non freed memory buffers are collected and will be released when closing the node. I tested this patch with Bulk and Isochronous, with and without memory mapping (applications which don't use mmap will just fall back to the legacy mechanism). Version 0.3: * Removed comment * Renamed USBDEVFS_FREE_MEMORY to USBDEVFS_RELEASE_MEMORY * Clearing allocated memory Version 0.4: * overriding SG transfers once memory mapped buffers are allocated for BULK Why do you want to override SG transfers? * adding pgprot_noncached to ensure that the IO memory is not cached Accessing a non-cached mapping on some ARCHs is extremely slow, so I am wondering if it is a good way to map the area as non-cached. Thanks, -- Ming Lei
Re: [PATCH] memory mapping for usbfs (v0.4)
On Sun, Sep 29, 2013 at 04:02:56PM +0200, Markus Rechberger wrote: This patch adds memory mapping support to USBFS for isochronous and bulk data transfers, it allows to pre-allocate usb transfer buffers. The CPU usage decreases 1-2% on my 1.3ghz U7300 notebook The CPU usage decreases 6-8% on an Intel Atom n270 when transferring 20mbyte/sec (isochronous), it should be more interesting to see those statistics on embedded systems where copying data is more expensive. Do you have a userspace test program that we can use to verify that this does work, and that others can use to run on some different platforms to verify that this is actually faster? thanks, greg k-h
Re: [PATCH] memory mapping for usbfs (v0.4)
On Sun, Sep 29, 2013 at 5:14 PM, Greg KH gre...@linuxfoundation.org wrote: On Sun, Sep 29, 2013 at 04:02:56PM +0200, Markus Rechberger wrote: This patch adds memory mapping support to USBFS for isochronous and bulk data transfers, it allows to pre-allocate usb transfer buffers. The CPU usage decreases 1-2% on my 1.3ghz U7300 notebook The CPU usage decreases 6-8% on an Intel Atom n270 when transferring 20mbyte/sec (isochronous), it should be more interesting to see those statistics on embedded systems where copying data is more expensive. Do you have a userspace test program that we can use to verify that this does work, and that others can use to run on some different platforms to verify that this is actually faster? You will need one of our devices for testing I guess. Some scanners (which use USBFS) or other low speed devices won't really utilize usbfs too much. I think I could provide some grabber device for testing if you want to. I don't even know another device which is transferring 20mb/sec other than our ones with USBFS. I figured out another company is doing grabber drivers in userspace for industrial cameras but that's also proprietary. Searching the kernel messages with google usually links back to our customers which are using our high speed devices. Markus thanks, greg k-h
Re: [PATCH] memory mapping for usbfs (v0.4)
On Sun, Sep 29, 2013 at 4:24 PM, Ming Lei tom.leim...@gmail.com wrote: On Sun, Sep 29, 2013 at 10:02 PM, Markus Rechberger mrechber...@gmail.com wrote: This patch adds memory mapping support to USBFS for isochronous and bulk data transfers, it allows to pre-allocate usb transfer buffers. The CPU usage decreases 1-2% on my 1.3ghz U7300 notebook The CPU usage decreases 6-8% on an Intel Atom n270 when transferring 20mbyte/sec (isochronous), it should be more interesting to see those statistics on embedded systems where copying data is more expensive. X86 platform should be memory coherent, which means uncached mapping might not affect memory access performance from CPU, so you might need to provide performance data on other non-coherent ARCHs. Usage from userspace: allocation: rv = ioctl(priv->usbfd, USBDEVFS_ALLOC_MEMORY, &mem); if (rv == 0) buffer = mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_SHARED, priv->usbfd, mem.offset); use the mapped buffer with urb->buffer. release: rv = munmap(buffer, size); memset(&mem, 0x0, sizeof(struct usbdevfs_memory)); mem.buffer = buffer; rv = ioctl(priv->usbfd, USBDEVFS_RELEASE_MEMORY, &mem); non freed memory buffers are collected and will be released when closing the node. I tested this patch with Bulk and Isochronous, with and without memory mapping (applications which don't use mmap will just fall back to the legacy mechanism). Version 0.3: * Removed comment * Renamed USBDEVFS_FREE_MEMORY to USBDEVFS_RELEASE_MEMORY * Clearing allocated memory Version 0.4: * overriding SG transfers once memory mapped buffers are allocated for BULK Why do you want to override SG transfers? please read the previous emails to the mailing list, I provided a bunch of links with bug reports. * adding pgprot_noncached to ensure that the IO memory is not cached Accessing a non-cached mapping on some ARCHs is extremely slow, so I am wondering if it is a good way to map the area as non-cached.
Thanks, -- Ming Lei
Re: [PATCH] memory mapping for usbfs (v0.4)
On Sun, Sep 29, 2013 at 07:32:56PM +0200, Markus Rechberger wrote: On Sun, Sep 29, 2013 at 5:14 PM, Greg KH gre...@linuxfoundation.org wrote: On Sun, Sep 29, 2013 at 04:02:56PM +0200, Markus Rechberger wrote: This patch adds memory mapping support to USBFS for isochronous and bulk data transfers, it allows to pre-allocate usb transfer buffers. The CPU usage decreases 1-2% on my 1.3ghz U7300 notebook The CPU usage decreases 6-8% on an Intel Atom n270 when transferring 20mbyte/sec (isochronous), it should be more interesting to see those statistics on embedded systems where copying data is more expensive. Do you have a userspace test program that we can use to verify that this does work, and that others can use to run on some different platforms to verify that this is actually faster? You will need one of our devices for testing I guess. Some scanners (which use USBFS) or other low speed devices won't really utilize usbfs too much. I think I could provide some grabber device for testing if you want to. So no test userspace program you can knock up for us? I really hate adding new core functionality to the kernel that I have no way of testing at all, that's a recipe for it quickly breaking... greg k-h
Re: [PATCH] memory mapping for usbfs (v0.4)
On Sun, Sep 29, 2013 at 9:01 PM, Greg KH gre...@linuxfoundation.org wrote: On Sun, Sep 29, 2013 at 07:32:56PM +0200, Markus Rechberger wrote: On Sun, Sep 29, 2013 at 5:14 PM, Greg KH gre...@linuxfoundation.org wrote: On Sun, Sep 29, 2013 at 04:02:56PM +0200, Markus Rechberger wrote: This patch adds memory mapping support to USBFS for isochronous and bulk data transfers, it allows to pre-allocate usb transfer buffers. The CPU usage decreases 1-2% on my 1.3ghz U7300 notebook The CPU usage decreases 6-8% on an Intel Atom n270 when transferring 20mbyte/sec (isochronous), it should be more interesting to see those statistics on embedded systems where copying data is more expensive. Do you have a userspace test program that we can use to verify that this does work, and that others can use to run on some different platforms to verify that this is actually faster? You will need one of our devices for testing I guess. Some scanners (which use USBFS) or other low speed devices won't really utilize usbfs too much. I think I could provide some grabber device for testing if you want to. So no test userspace program you can knock up for us? I really hate adding new core functionality to the kernel that I have no way of testing at all, that's a recipe for it quickly breaking... Well, do you have any device which has a userspace driver? Without a device you can barely test it. There's a settopbox company which added the backported patch to Linux 3.2; they use USB card readers and even tested the devices with and without mmap support. I doubt that the SG support has any good testing on low end systems, I'm worried that it will introduce those latency issues again which we saw with 15k buffers. Markus
Re: [PATCH] memory mapping for usbfs (v0.4)
On Sun, 29 Sep 2013, Markus Rechberger wrote: Do you have a userspace test program that we can use to verify that this does work, and that others can use to run on some different platforms to verify that this is actually faster? You will need one of our devices for testing I guess. Some scanners (which use USBFS) or other low speed devices won't really utilize usbfs too much. I think I could provide some grabber device for testing if you want to. So no test userspace program you can knock up for us? I really hate adding new core functionality to the kernel that I have no way of testing at all, that's a recipe for it quickly breaking... Well do you have any device which has a userspace driver? Without a device you can barely test it. There's a settopbox company which added the backported patch to Linux 3.2 they use USB cardreaders and even tested the devices with and without mmap support. I doubt that the SG support has any good testing on low end systems, I'm worried that it will introduce those latency issues again which we saw with 15k buffers. There are lots of userspace drivers based on libusb or libusbx. If you post an enhancement for libusbx to make use of your buffer mappings, many people will be able to test it. Alan Stern
Re: [PATCH] memory mapping for usbfs (v0.4)
On Sun, Sep 29, 2013 at 9:58 PM, Alan Stern st...@rowland.harvard.edu wrote: On Sun, 29 Sep 2013, Markus Rechberger wrote: Do you have a userspace test program that we can use to verify that this does work, and that others can use to run on some different platforms to verify that this is actually faster? You will need one of our devices for testing I guess. Some scanners (which use USBFS) or other low speed devices won't really utilize usbfs too much. I think I could provide some grabber device for testing if you want to. So no test userspace program you can knock up for us? I really hate adding new core functionality to the kernel that I have no way of testing at all, that's a recipe for it quickly breaking... Well do you have any device which has a userspace driver? Without a device you can barely test it. There's a settopbox company which added the backported patch to Linux 3.2 they use USB cardreaders and even tested the devices with and without mmap support. I doubt that the SG support has any good testing on low end systems, I'm worried that it will introduce those latency issues again which we saw with 15k buffers. There are lots of userspace drivers based on libusb or libusbx. If you post an enhancement for libusbx to make use of your buffer mappings, those are free to adapt their libraries to use that feature (see the first post for how to use the ioctls). You need to find a reasonable application for those patches; I do not know anyone else using such a heavy bulk or isochronous application on low end systems, not even on high end systems (otherwise you'd very likely be able to find more such kernel allocation messages on the internet and not only from us, obviously they all link to our forum or devices). VMware, Qemu or Xen might be a good target for testing (when passing through windows requests and using a camera in windows).
I'll try to post some videos from the Intel Atom using our driver with the legacy mode (basically the normal way without mmap) and the enhanced one with mmap (isochronous and bulk). Markus
Re: [PATCH] memory mapping for usbfs (v0.4)
Hi Markus, Do you have a userspace test program that we can use to verify that this does work, and that others can use to run on some different platforms to verify that this is actually faster? You will need one of our devices for testing I guess. Some scanners (which use USBFS) or other low speed devices won't really utilize usbfs too much. I think I could provide some grabber device for testing if you want to. So no test userspace program you can knock up for us? I really hate adding new core functionality to the kernel that I have no way of testing at all, that's a recipe for it quickly breaking... Well do you have any device which has a userspace driver? Without a device you can barely test it. There's a settopbox company which added the backported patch to Linux 3.2 they use USB cardreaders and even tested the devices with and without mmap support. I doubt that the SG support has any good testing on low end systems, I'm worried that it will introduce those latency issues again which we saw with 15k buffers. There are lots of userspace drivers based on libusb or libusbx. If you post an enhancement for libusbx to make use of your buffer mappings, those are free to adapt their libraries to use that feature (see first post how to use the ioctls). You need to find a reasonable application for those patches, I do not know anyone else using such a heavy bulk or isochronous application on low end systems not even on high end systems (otherwise you'd very likely be able to find more such kernel allocation messages on the internet and not only from us, obviously they all link to our forum or devices). VMWare, Qemu or Xen might be a good target for testing (when passing through windows requests and using a camera in windows). I think the same can be done with any standard Bluetooth controller and using SCO audio data for a headset connection. These also use ISOC transfers and as of today, do not work inside KVM. 
So if anybody wants to see if these changes actually make a difference, then a Qemu/KVM guest using them might be a good test case. Regards Marcel