MAP_POPULATE does not work with XIP on ramdisk
Hi,

I'm testing mmap() performance on a ramdisk. The kernel is 3.19-rc3, the device driver is brd, and the file system is ext2. Normal mmap() does not make sense on a ramdisk because it adds an extra memory copy, so XIP is enabled to map the pages directly into the application's address space.

With XIP, the MAP_POPULATE flag does not work, i.e. the prefault fails. It fails in vm_normal_page(), which is supposed to find the struct page from the pfn; the vma has the VM_MIXEDMAP flag set and the function returns NULL. As I understand it, VM_MIXEDMAP means the memory may not be backed by a struct page, so the logic is reasonable. However, the brd driver does provide a struct page for each memory page. If I modify __get_user_pages() to let the prefault run for all the pages, MAP_POPULATE works as expected.

My question is: is there an elegant way to work around this? I do want to make MAP_POPULATE work with XIP. Because the device is memory and access latency is very low, page faults and the mode switch account for a significant part of the software overhead. In my experiment, MAP_POPULATE gives a 3x latency improvement when accessing a big file for the first time.

Thanks,
Andiry
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
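For readers unfamiliar with the flag, here is a minimal user-space sketch of how a file is mapped with and without MAP_POPULATE. The helper name map_file() and its parameters are hypothetical and are not taken from the original test program; only open(), mmap(), and the MAP_POPULATE flag are real Linux interfaces. Whether the prefault actually succeeds on an XIP mapping is exactly the problem reported above, so this only shows the calling side.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map the first `len` bytes of `path` read/write and
 * shared. When `populate` is nonzero, MAP_POPULATE asks the kernel to
 * prefault the page tables at mmap() time instead of on first access.
 * Returns MAP_FAILED on any error.
 */
void *map_file(const char *path, size_t len, int populate)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return MAP_FAILED;

    int flags = MAP_SHARED | (populate ? MAP_POPULATE : 0);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, flags, fd, 0);

    /* The mapping stays valid after the descriptor is closed. */
    close(fd);
    return p;
}
```

A caller would compare the first-access latency of map_file(path, len, 0) against map_file(path, len, 1) on the same file, releasing each mapping with munmap().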
Re: [BUG] Description for memmap in kernel-parameters.txt is wrong
On Thu, Jan 30, 2014 at 2:54 PM, Randy Dunlap wrote:
> On 01/30/2014 02:17 PM, David Rientjes wrote:
>> On Thu, 30 Jan 2014, Randy Dunlap wrote:
>>
>> Hi,
>>
>> In kernel-parameters.txt, there is following description:
>>
>> memmap=nn[KMG]$ss[KMG]
>> [KNL,ACPI] Mark specific memory as reserved.
>> Region of memory to be used, from ss to ss+nn.
>
> Should be:
> Region of memory to be reserved, from ss to ss+nn.
>
> but that doesn't help with the problem that you describe, does it?
>

Actually it should be:
Region of memory to be reserved, from nn to nn+ss.
That is, exchange nn and ss.

>>>
>>> Yes, I understand that that's what you are reporting. I just haven't yet
>>> worked out how the code manages to exchange those 2 values.
>>>
>>
>> It doesn't, the documentation is correct as written and could be improved
>> by your suggestion of "Region of memory to be reserved, from ss to ss+nn."
>> I think Andiry probably is having a problem with his bootloader
>> interpreting the '$' incorrectly (or variable expansion if coming from the
>> shell) or interpreting the resulting user-defined e820 map incorrectly.
>> --
>
> Yeah, I certainly don't see a problem with the code and I would want to
> see/understand that before I exchanged the 2 values in the documentation.
>
> I'll submit a patch to make the wording a bit better.
>

I'm using Ubuntu 13.04 with GRUB2. If it's a bootloader issue, what should I do?

Thanks,
Andiry
Re: [BUG] Description for memmap in kernel-parameters.txt is wrong
On Thu, Jan 30, 2014 at 11:25 AM, Randy Dunlap wrote:
> [adding linux-mm mailing list]
>
> On 01/30/2014 08:52 AM, Andiry Xu wrote:
>> Hi,
>>
>> In kernel-parameters.txt, there is following description:
>>
>> memmap=nn[KMG]$ss[KMG]
>> [KNL,ACPI] Mark specific memory as reserved.
>> Region of memory to be used, from ss to ss+nn.
>
> Should be:
> Region of memory to be reserved, from ss to ss+nn.
>
> but that doesn't help with the problem that you describe, does it?
>

Actually it should be:
Region of memory to be reserved, from nn to nn+ss.
That is, exchange nn and ss.

>
>> Unfortunately this is incorrect. The meaning of nn and ss is reversed.
>> For example:
>>
>> Command          Expected            Result
>> memmap 2G$6G     6G - 8G reserved    2G - 8G reserved
>> memmap 6G$2G     2G - 8G reserved    6G - 8G reserved
>
> Are you testing on x86?
> The code in arch/x86/kernel/e820.c always parses mem_size followed by start
> address.
> I don't (yet) see where it goes wrong...
>

Yes, it's a x86 machine.

>
>> Test kernel version 3.13, but I believe the issue has been there long ago.
>>
>> I'm not sure whether the description or implementation should be
>> fixed, but apparently they do not match.
>
> I prefer to change the documentation and leave the implementation as is.
>

That's fine. memmap itself works OK, it's just the description is wrong and people like me get confused.

Thanks,
Andiry
[BUG] Description for memmap in kernel-parameters.txt is wrong
Hi,

In kernel-parameters.txt, there is the following description:

memmap=nn[KMG]$ss[KMG]
        [KNL,ACPI] Mark specific memory as reserved.
        Region of memory to be used, from ss to ss+nn.

Unfortunately this is incorrect. The meaning of nn and ss is reversed. For example:

Command          Expected            Result
memmap 2G$6G     6G - 8G reserved    2G - 8G reserved
memmap 6G$2G     2G - 8G reserved    6G - 8G reserved

Tested on kernel version 3.13, but I believe the issue has been there for a long time.

I'm not sure whether the description or the implementation should be fixed, but apparently they do not match.

Thanks,
Andiry
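To make the documented meaning concrete, the following user-space sketch parses a memmap=nn[KMG]$ss[KMG] value the way the description reads: nn is the size and ss is the start, so the reserved region is ss to ss+nn. This is not the kernel's code (which, per the replies in this thread, lives in arch/x86/kernel/e820.c); the function names parse_size() and parse_memmap_reserve() are hypothetical and exist only for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

/* Parse a number with an optional K/M/G suffix; *end is set past the suffix. */
static uint64_t parse_size(const char *s, const char **end)
{
    char *e;
    uint64_t v = strtoull(s, &e, 0);

    switch (*e) {
    case 'G': case 'g': v <<= 10; /* fall through */
    case 'M': case 'm': v <<= 10; /* fall through */
    case 'K': case 'k': v <<= 10; e++; break;
    default: break;
    }
    *end = e;
    return v;
}

/*
 * Parse "nn[KMG]$ss[KMG]" as documented: size first, then start address.
 * On success stores the region [start, end_excl) and returns 0;
 * returns -1 if the string is malformed.
 */
int parse_memmap_reserve(const char *arg, uint64_t *start, uint64_t *end_excl)
{
    const char *p, *q;
    uint64_t size = parse_size(arg, &p);

    if (p == arg || *p != '$')
        return -1;

    uint64_t s = parse_size(p + 1, &q);
    if (q == p + 1 || *q != '\0')
        return -1;

    *start = s;
    *end_excl = s + size;
    return 0;
}
```

Under this reading, "2G$6G" yields the region 6G to 8G and "6G$2G" yields 2G to 8G, matching the "Expected" column of the table. If the '$' is consumed before the string reaches the kernel, as suggested later in the thread, the kernel never sees this form at all.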
Re: [BUG][ext2] XIP does not work on ext2
On Thu, Nov 7, 2013 at 2:45 PM, Andiry Xu wrote:
> On Thu, Nov 7, 2013 at 2:20 PM, Jan Kara wrote:
>> On Thu 07-11-13 13:50:09, Andiry Xu wrote:
>>> On Thu, Nov 7, 2013 at 1:07 PM, Jan Kara wrote:
>>> > On Thu 07-11-13 12:14:13, Andiry Xu wrote:
>>> >> On Wed, Nov 6, 2013 at 1:18 PM, Jan Kara wrote:
>>> >> > On Tue 05-11-13 17:28:35, Andiry Xu wrote:
>>> >> >> >> Do you know the reason why write() outperforms mmap() in some
>>> >> >> >> cases? I know it's not related the thread but I really
>>> >> >> >> appreciate if you can answer my question.
>>> >> >> > Well, I'm not completely sure. mmap()ed memory always works on
>>> >> >> > page-by-page basis - you first access the page, it gets faulted
>>> >> >> > in and you can further access it. So for small (sub page size)
>>> >> >> > accesses this is a win because you don't have an overhead of
>>> >> >> > syscall and fs write path. For accesses larger than page size
>>> >> >> > the overhead of syscall and some initial checks is well hidden
>>> >> >> > by other things. I guess write() ends up being more efficient
>>> >> >> > because write path taken for each page is somewhat lighter than
>>> >> >> > full page fault. But you'd need to look into perf data to get
>>> >> >> > some hard numbers on where the time is spent.
>>> >> >> >
>>> >> >>
>>> >> >> Thanks for the reply. However I have filled up the whole RAM disk
>>> >> >> before doing the test, i.e. asked the brd driver to allocate all the
>>> >> >> pages initially.
>>> >> > Well, pages in ramdisk are always present, that's not an issue. But
>>> >> > you will get a page fault to map a particular physical page in
>>> >> > process' virtual address space when you first access that virtual
>>> >> > address in the mapping from the process. The cost of setting up
>>> >> > this virtual->physical mapping is what I'm talking about.
>>> >> >
>>> >>
>>> >> Yes, you are right, there are page faults observed with perf. I
>>> >> misunderstood page fault as copying pages between backing store and
>>> >> physical memory.
>>> >>
>>> >> > If you had a process which first mmaps the file and writes to all
>>> >> > pages in the mapping and *then* measure the cost of another round of
>>> >> > writing to the mapping, I would expect you should see speeds close
>>> >> > to those of memory bus.
>>> >> >
>>> >>
>>> >> I've tried this as well. mmap() performance improves but still not as
>>> >> good as write().
>>> >> I used the perf report to compare write() and mmap() applications. For
>>> >> write() version, top of perf report shows as:
>>> >> 33.33% __copy_user_nocache
>>> >> 4.72% ext2_get_blocks
>>> >> 4.42% mutex_unlock
>>> >> 3.59% __find_get_block
>>> >>
>>> >> which looks reasonable.
>>> >>
>>> >> However, for mmap() version, the perf report looks strange:
>>> >> 94.98% libc-2.15.so [.] 0x0014698d
>>> >> 2.25% page_fault
>>> >> 0.18% handle_mm_fault
>>> >>
>>> >> I don't know what the first item is but it took the majority of cycles.
>>> > The first item means that it's some userspace code in libc. My guess
>>> > would be that it's libc's memcpy() function (or whatever you use to write
>>> > to mmap). How do you access the mmap?
>>> >
>>>
>>> Like this:
>>>
>>> fd = open(file_name, O_CREAT | O_RDWR | O_DIRECT, 0755);
>>> dest = (char *)mmap(NULL, FILE_SIZE, PROT_WRITE, MAP_SHARED, fd, 0);
>>> for (i = 0; i < count; i++)
>>> {
>>>     memcpy(dest, src, request_size);
>>>     dest += request_size;
>>> }
>> OK, maybe libc memcpy isn't very well optimized for you cpu? Not sure how
>> to tune that though...
>>
>
> Hmm, I will try some different kinds of memcpy to see if there is a
> difference. Just want to make sure I do not make some stupid mistakes
> before trying that.
> Thanks a lot for your help!
>

Your advice does make a difference. I used an optimized version of memcpy and it does improve the mmap application's performance: on a ramdisk with ext2 XIP, the mmap() version now achieves 11GB/s of bandwidth, compared to 7GB/s for the POSIX write() version.

Now I wonder whether there is a plan to update memcpy() in libc.

Thanks,
Andiry
Re: [BUG][ext2] XIP does not work on ext2
On Thu, Nov 7, 2013 at 2:20 PM, Jan Kara wrote:
> On Thu 07-11-13 13:50:09, Andiry Xu wrote:
>> On Thu, Nov 7, 2013 at 1:07 PM, Jan Kara wrote:
>> > On Thu 07-11-13 12:14:13, Andiry Xu wrote:
>> >> On Wed, Nov 6, 2013 at 1:18 PM, Jan Kara wrote:
>> >> > On Tue 05-11-13 17:28:35, Andiry Xu wrote:
>> >> >> >> Do you know the reason why write() outperforms mmap() in some
>> >> >> >> cases? I know it's not related the thread but I really
>> >> >> >> appreciate if you can answer my question.
>> >> >> > Well, I'm not completely sure. mmap()ed memory always works on
>> >> >> > page-by-page basis - you first access the page, it gets faulted
>> >> >> > in and you can further access it. So for small (sub page size)
>> >> >> > accesses this is a win because you don't have an overhead of
>> >> >> > syscall and fs write path. For accesses larger than page size
>> >> >> > the overhead of syscall and some initial checks is well hidden
>> >> >> > by other things. I guess write() ends up being more efficient
>> >> >> > because write path taken for each page is somewhat lighter than
>> >> >> > full page fault. But you'd need to look into perf data to get
>> >> >> > some hard numbers on where the time is spent.
>> >> >> >
>> >> >>
>> >> >> Thanks for the reply. However I have filled up the whole RAM disk
>> >> >> before doing the test, i.e. asked the brd driver to allocate all the
>> >> >> pages initially.
>> >> > Well, pages in ramdisk are always present, that's not an issue. But
>> >> > you will get a page fault to map a particular physical page in
>> >> > process' virtual address space when you first access that virtual
>> >> > address in the mapping from the process. The cost of setting up
>> >> > this virtual->physical mapping is what I'm talking about.
>> >> >
>> >>
>> >> Yes, you are right, there are page faults observed with perf. I
>> >> misunderstood page fault as copying pages between backing store and
>> >> physical memory.
>> >>
>> >> > If you had a process which first mmaps the file and writes to all
>> >> > pages in the mapping and *then* measure the cost of another round of
>> >> > writing to the mapping, I would expect you should see speeds close
>> >> > to those of memory bus.
>> >> >
>> >>
>> >> I've tried this as well. mmap() performance improves but still not as
>> >> good as write().
>> >> I used the perf report to compare write() and mmap() applications. For
>> >> write() version, top of perf report shows as:
>> >> 33.33% __copy_user_nocache
>> >> 4.72% ext2_get_blocks
>> >> 4.42% mutex_unlock
>> >> 3.59% __find_get_block
>> >>
>> >> which looks reasonable.
>> >>
>> >> However, for mmap() version, the perf report looks strange:
>> >> 94.98% libc-2.15.so [.] 0x0014698d
>> >> 2.25% page_fault
>> >> 0.18% handle_mm_fault
>> >>
>> >> I don't know what the first item is but it took the majority of cycles.
>> > The first item means that it's some userspace code in libc. My guess
>> > would be that it's libc's memcpy() function (or whatever you use to write
>> > to mmap). How do you access the mmap?
>> >
>>
>> Like this:
>>
>> fd = open(file_name, O_CREAT | O_RDWR | O_DIRECT, 0755);
>> dest = (char *)mmap(NULL, FILE_SIZE, PROT_WRITE, MAP_SHARED, fd, 0);
>> for (i = 0; i < count; i++)
>> {
>>     memcpy(dest, src, request_size);
>>     dest += request_size;
>> }
> OK, maybe libc memcpy isn't very well optimized for you cpu? Not sure how
> to tune that though...
>

Hmm, I will try some different kinds of memcpy to see if there is a difference. Just want to make sure I do not make some stupid mistakes before trying that.

Thanks a lot for your help!

Thanks,
Andiry
Re: [BUG][ext2] XIP does not work on ext2
On Thu, Nov 7, 2013 at 1:07 PM, Jan Kara wrote:
> On Thu 07-11-13 12:14:13, Andiry Xu wrote:
>> On Wed, Nov 6, 2013 at 1:18 PM, Jan Kara wrote:
>> > On Tue 05-11-13 17:28:35, Andiry Xu wrote:
>> >> >> Do you know the reason why write() outperforms mmap() in some cases? I
>> >> >> know it's not related the thread but I really appreciate if you can
>> >> >> answer my question.
>> >> > Well, I'm not completely sure. mmap()ed memory always works on
>> >> > page-by-page basis - you first access the page, it gets faulted in
>> >> > and you can further access it. So for small (sub page size) accesses
>> >> > this is a win because you don't have an overhead of syscall and fs
>> >> > write path. For accesses larger than page size the overhead of
>> >> > syscall and some initial checks is well hidden by other things. I
>> >> > guess write() ends up being more efficient because write path taken
>> >> > for each page is somewhat lighter than full page fault. But you'd
>> >> > need to look into perf data to get some hard numbers on where the
>> >> > time is spent.
>> >> >
>> >>
>> >> Thanks for the reply. However I have filled up the whole RAM disk
>> >> before doing the test, i.e. asked the brd driver to allocate all the
>> >> pages initially.
>> > Well, pages in ramdisk are always present, that's not an issue. But you
>> > will get a page fault to map a particular physical page in process'
>> > virtual address space when you first access that virtual address in the
>> > mapping from the process. The cost of setting up this virtual->physical
>> > mapping is what I'm talking about.
>> >
>>
>> Yes, you are right, there are page faults observed with perf. I
>> misunderstood page fault as copying pages between backing store and
>> physical memory.
>>
>> > If you had a process which first mmaps the file and writes to all pages in
>> > the mapping and *then* measure the cost of another round of writing to the
>> > mapping, I would expect you should see speeds close to those of memory bus.
>> >
>>
>> I've tried this as well. mmap() performance improves but still not as
>> good as write().
>> I used the perf report to compare write() and mmap() applications. For
>> write() version, top of perf report shows as:
>> 33.33% __copy_user_nocache
>> 4.72% ext2_get_blocks
>> 4.42% mutex_unlock
>> 3.59% __find_get_block
>>
>> which looks reasonable.
>>
>> However, for mmap() version, the perf report looks strange:
>> 94.98% libc-2.15.so [.] 0x0014698d
>> 2.25% page_fault
>> 0.18% handle_mm_fault
>>
>> I don't know what the first item is but it took the majority of cycles.
> The first item means that it's some userspace code in libc. My guess
> would be that it's libc's memcpy() function (or whatever you use to write
> to mmap). How do you access the mmap?
>

Like this:

fd = open(file_name, O_CREAT | O_RDWR | O_DIRECT, 0755);
dest = (char *)mmap(NULL, FILE_SIZE, PROT_WRITE, MAP_SHARED, fd, 0);
for (i = 0; i < count; i++)
{
    memcpy(dest, src, request_size);
    dest += request_size;
}

Thanks,
Andiry
Re: [BUG][ext2] XIP does not work on ext2
On Wed, Nov 6, 2013 at 1:18 PM, Jan Kara wrote:
> On Tue 05-11-13 17:28:35, Andiry Xu wrote:
>> Hi,
>>
>> On Tue, Nov 5, 2013 at 6:32 AM, Jan Kara wrote:
>> > Hello,
>> >
>> > On Mon 04-11-13 18:37:40, Andiry Xu wrote:
>> >> On Mon, Nov 4, 2013 at 4:37 PM, Jan Kara wrote:
>> >> > Hello,
>> >> >
>> >> > On Mon 04-11-13 14:31:34, Andiry Xu wrote:
>> >> >> When I'm trying XIP on ext2, I find that xip does not work on ext2
>> >> >> with latest kernel.
>> >> >>
>> >> >> Reproduce steps:
>> >> >> Compile kernel with following configs:
>> >> >> CONFIG_BLK_DEV_XIP=y
>> >> >> CONFIG_EXT2_FS_XIP=y
>> >> >>
>> >> >> And run following commands:
>> >> >> # mke2fs -b 4096 /dev/ram0
>> >> >> # mount -t ext2 -o xip /dev/ram0 /mnt/ramdisk/
>> >> >> # dd if=/dev/zero of=/mnt/ramdisk/test1 bs=1M count=16
>> >> >>
>> >> >> And it shows:
>> >> >> dd: writing `/mnt/ramdisk/test1': No space left on device
>> >> >>
>> >> >> df also shows /mnt/ramdisk is 100% full. Its default size is 64MB so a
>> >> >> 16MB write should only occupy 1/4 capacity.
>> >> >>
>> >> >> Criminal commit:
>> >> >> After git bisect, it points to the following commit:
>> >> >> 8e3dffc651cb668e1ff4d8b89cc1c3dde7540d3b
>> >> >> Ext2: mark inode dirty after the function dquot_free_block_nodirty is
>> >> >> called
>> >> > Thanks for report and the bisection!
>> >> >
>> >> >> Particularly, the following code:
>> >> >> @@ -1412,9 +1415,11 @@ allocated:
>> >> >>      *errp = 0;
>> >> >>      brelse(bitmap_bh);
>> >> >> -    dquot_free_block_nodirty(inode, *count-num);
>> >> >> -    mark_inode_dirty(inode);
>> >> >> -    *count = num;
>> >> >> +    if (num < *count) {
>> >> >> +        dquot_free_block_nodirty(inode, *count-num);
>> >> >> +        mark_inode_dirty(inode);
>> >> >> +        *count = num;
>> >> >> +    }
>> >> >>      return ret_block;
>> >> >>
>> >> >> Not mark_inode_dirty() is called only when num is less than *count.
>> >> >> However, I've seen
>> >> >> with the dd command, there is case where num >= *count.
>> >> >>
>> >> >> Fix:
>> >> >> I've verified that the following patch fixes the issue:
>> >> >> diff --git a/fs/ext2/balloc.c b/fs/ext2/balloc.c
>> >> >> index 9f9992b..5446a52 100644
>> >> >> --- a/fs/ext2/balloc.c
>> >> >> +++ b/fs/ext2/balloc.c
>> >> >> @@ -1406,11 +1406,10 @@ allocated:
>> >> >>
>> >> >>      *errp = 0;
>> >> >>      brelse(bitmap_bh);
>> >> >> -    if (num < *count) {
>> >> >> +    if (num <= *count)
>> >> >>          dquot_free_block_nodirty(inode, *count-num);
>> >> >> -        mark_inode_dirty(inode);
>> >> >> -        *count = num;
>> >> >> -    }
>> >> >> +    mark_inode_dirty(inode);
>> >> >> +    *count = num;
>> >> >>      return ret_block;
>> >> >>
>> >> >> io_error:
>> >> >>
>> >> >> However, I'm not familiar with ext2 source code and cannot tell if
>> >> >> this is the correct fix. At least it fixes my issue.
>> >> > With this, you have essentially reverted a hunk from commit
>> >> > 8e3dffc651cb668e1ff4d8b89cc1c3dde7540d3b. But I don't see a reason why
>> >> > it should be reverted. num should never ever be greater than *count
>> >> > and when num == count, we the code inside if doesn't do anything
>> >> > useful.
>> >> >
>> >> > I've looked into the code and I think I see the problem. It is a long
>> >> > standing bug in __ext2_get_block() in fs/ext2/xip.c. It calls
>> >> > ext2_get_block() asking for 0 blocks to map (while we really want 1
>> >> > block).
>> >> > ext2_get_block() just passes that request and ext2_get_blocks()
>> >> > actually allocates 1 block. And that's were the commit you have
>> >> > identified makes a difference because previously we returned that 1
>> >> > block was allocated while now we return that 0 blocks were allocated
>> >> > and thus allocation is repeated until all free blocks are exhaused.
>> >> >
>> >> > Attached patch should fix the problem.
>> >>
>> >> Thanks for the reply. I've verified that your patch fixes my issue.
>> >> And it's absolutely better than my solution.
>> >>
>> >> Tested-by: Andiry Xu <andiry...@gmail.com>
>> > Thanks for testing!
>> >
>>
>> I have another question about ext2 XIP performance, although it's not
>> quite related to this thread.
>>
>> I'm testing xip with ext2 on a ram disk drive, the driver is brd.c.
>> The RAM disk size is 2GB and I pre-fill it to guarantee that all pages
>> reside in main memory.
>> Then I use two different applications to write to the ram disk. One is
>> open() with O_DIRECT flag, and writing with Posix write(). Another is
>> open() with O_DIRECT, mmap() it to user space, then use memcpy() to
>> write data. I use different request size to write data, from 512 bytes
>> to 64MB.
>>
>> In my understanding, the mmap version bypasses the file system and
>> does not go to kernel space, hence it should have better performance
>> than the Posix-write version. However, my test result shows it's not
>> always true: when the request size is between 8KB and 1MB, the
>> Posix-write() version has bandwidth about 7GB/s and mmap version only
>> has 5GB/s. The test is performed on a i7-3770K machine with 8GB
>> memory, kernel 3.12. Also I have tested on kernel 3.2, in which mmap
>> has really bad performance, only 2GB/s for all request sizes.
>>
>> Do you know the reason why write() outperforms mmap() in some cases? I
>> know it's not related the thread but I really appreciate if you can
>> answer my question.
> Well, I'm not completely sure. mmap()ed memory always works on
> page-by-page basis - you
Re: [BUG][ext2] XIP does not work on ext2
On Thu, Nov 7, 2013 at 1:07 PM, Jan Kara j...@suse.cz wrote: On Thu 07-11-13 12:14:13, Andiry Xu wrote: On Wed, Nov 6, 2013 at 1:18 PM, Jan Kara j...@suse.cz wrote: On Tue 05-11-13 17:28:35, Andiry Xu wrote: Do you know the reason why write() outperforms mmap() in some cases? I know it's not related the thread but I really appreciate if you can answer my question. Well, I'm not completely sure. mmap()ed memory always works on page-by-page basis - you first access the page, it gets faulted in and you can further access it. So for small (sub page size) accesses this is a win because you don't have an overhead of syscall and fs write path. For accesses larger than page size the overhead of syscall and some initial checks is well hidden by other things. I guess write() ends up being more efficient because write path taken for each page is somewhat lighter than full page fault. But you'd need to look into perf data to get some hard numbers on where the time is spent. Thanks for the reply. However I have filled up the whole RAM disk before doing the test, i.e. asked the brd driver to allocate all the pages initially. Well, pages in ramdisk are always present, that's not an issue. But you will get a page fault to map a particular physical page in process' virtual address space when you first access that virtual address in the mapping from the process. The cost of setting up this virtual-physical mapping is what I'm talking about. Yes, you are right, there are page faults observed with perf. I misunderstood page fault as copying pages between backing store and physical memory. If you had a process which first mmaps the file and writes to all pages in the mapping and *then* measure the cost of another round of writing to the mapping, I would expect you should see speeds close to those of memory bus. I've tried this as well. mmap() performance improves but still not as good as write(). I used the perf report to compare write() and mmap() applications. 
For write() version, top of perf report shows as:

33.33%  __copy_user_nocache
 4.72%  ext2_get_blocks
 4.42%  mutex_unlock
 3.59%  __find_get_block

which looks reasonable. However, for mmap() version, the perf report looks strange:

94.98%  libc-2.15.so  [.] 0x0014698d
 2.25%  page_fault
 0.18%  handle_mm_fault

I don't know what the first item is but it took the majority of cycles.

The first item means that it's some userspace code in libc. My guess would be that it's libc's memcpy() function (or whatever you use to write to mmap). How do you access the mmap?

Like this:

fd = open(file_name, O_CREAT | O_RDWR | O_DIRECT, 0755);
dest = (char *)mmap(NULL, FILE_SIZE, PROT_WRITE, MAP_SHARED, fd, 0);
for (i = 0; i < count; i++) {
        memcpy(dest, src, request_size);
        dest += request_size;
}

Thanks,
Andiry
Re: [BUG][ext2] XIP does not work on ext2
On Thu, Nov 7, 2013 at 2:20 PM, Jan Kara j...@suse.cz wrote: On Thu 07-11-13 13:50:09, Andiry Xu wrote: On Thu, Nov 7, 2013 at 1:07 PM, Jan Kara j...@suse.cz wrote: On Thu 07-11-13 12:14:13, Andiry Xu wrote: On Wed, Nov 6, 2013 at 1:18 PM, Jan Kara j...@suse.cz wrote: On Tue 05-11-13 17:28:35, Andiry Xu wrote: Do you know the reason why write() outperforms mmap() in some cases? I know it's not related the thread but I really appreciate if you can answer my question. Well, I'm not completely sure. mmap()ed memory always works on page-by-page basis - you first access the page, it gets faulted in and you can further access it. So for small (sub page size) accesses this is a win because you don't have an overhead of syscall and fs write path. For accesses larger than page size the overhead of syscall and some initial checks is well hidden by other things. I guess write() ends up being more efficient because write path taken for each page is somewhat lighter than full page fault. But you'd need to look into perf data to get some hard numbers on where the time is spent. Thanks for the reply. However I have filled up the whole RAM disk before doing the test, i.e. asked the brd driver to allocate all the pages initially. Well, pages in ramdisk are always present, that's not an issue. But you will get a page fault to map a particular physical page in process' virtual address space when you first access that virtual address in the mapping from the process. The cost of setting up this virtual-physical mapping is what I'm talking about. Yes, you are right, there are page faults observed with perf. I misunderstood page fault as copying pages between backing store and physical memory. If you had a process which first mmaps the file and writes to all pages in the mapping and *then* measure the cost of another round of writing to the mapping, I would expect you should see speeds close to those of memory bus. I've tried this as well. 
mmap() performance improves but still not as good as write(). I used the perf report to compare write() and mmap() applications. For write() version, top of perf report shows as:

33.33%  __copy_user_nocache
 4.72%  ext2_get_blocks
 4.42%  mutex_unlock
 3.59%  __find_get_block

which looks reasonable. However, for mmap() version, the perf report looks strange:

94.98%  libc-2.15.so  [.] 0x0014698d
 2.25%  page_fault
 0.18%  handle_mm_fault

I don't know what the first item is but it took the majority of cycles.

The first item means that it's some userspace code in libc. My guess would be that it's libc's memcpy() function (or whatever you use to write to mmap). How do you access the mmap?

Like this:

fd = open(file_name, O_CREAT | O_RDWR | O_DIRECT, 0755);
dest = (char *)mmap(NULL, FILE_SIZE, PROT_WRITE, MAP_SHARED, fd, 0);
for (i = 0; i < count; i++) {
        memcpy(dest, src, request_size);
        dest += request_size;
}

OK, maybe libc memcpy isn't very well optimized for your cpu? Not sure how to tune that though...

Hmm, I will try some different kinds of memcpy to see if there is a difference. Just want to make sure I do not make some stupid mistakes before trying that.

Thanks a lot for your help!

Thanks,
Andiry
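[Editor's sketch] Jan's suggestion earlier in this message (write to every page of the mapping once, then time a second round) could be tried with something like the following. This is only a minimal sketch under stated assumptions: the helper names `timed_pass` and `mmap_write_two_passes` are hypothetical, O_DIRECT from the original snippet is left out so the code also runs on file systems that reject it, and requests are capped at 64 KiB to keep the source buffer small.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

/* Copy src over the whole mapping in request_size chunks; return seconds. */
static double timed_pass(char *map, const char *src,
                         size_t file_size, size_t request_size)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t off = 0; off + request_size <= file_size; off += request_size)
        memcpy(map + off, src, request_size);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0.tv_sec) +
           (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}

/*
 * Hypothetical helper: map `path`, write it twice, report both timings.
 * secs[0] includes the first-touch page faults, secs[1] reuses the
 * mappings set up by the first pass. Returns 0 on success, -1 on error.
 */
int mmap_write_two_passes(const char *path, size_t file_size,
                          size_t request_size, double secs[2])
{
    static char src[1 << 16];   /* source buffer: requests up to 64 KiB */

    assert(secs != NULL);
    if (request_size == 0 || request_size > sizeof(src) ||
        request_size > file_size)
        return -1;
    memset(src, 0xab, sizeof(src));

    int fd = open(path, O_CREAT | O_RDWR, 0644);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)file_size) != 0) {
        close(fd);
        return -1;
    }

    char *map = mmap(NULL, file_size, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping stays valid after close */
    if (map == MAP_FAILED)
        return -1;

    secs[0] = timed_pass(map, src, file_size, request_size);
    secs[1] = timed_pass(map, src, file_size, request_size);

    int ok = ((unsigned char)map[0] == 0xab);
    munmap(map, file_size);
    return ok ? 0 : -1;
}
```

Comparing secs[0] with secs[1] for a file on the XIP mount would separate the cost of setting up the virtual-to-physical mappings from the cost of the memcpy() itself.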
Re: [BUG][ext2] XIP does not work on ext2
Hi, On Tue, Nov 5, 2013 at 6:32 AM, Jan Kara wrote: > Hello, > > On Mon 04-11-13 18:37:40, Andiry Xu wrote: >> On Mon, Nov 4, 2013 at 4:37 PM, Jan Kara wrote: >> > Hello, >> > >> > On Mon 04-11-13 14:31:34, Andiry Xu wrote: >> >> When I'm trying XIP on ext2, I find that xip does not work on ext2 >> >> with latest kernel. >> >> >> >> Reproduce steps: >> >> Compile kernel with following configs: >> >> CONFIG_BLK_DEV_XIP=y >> >> CONFIG_EXT2_FS_XIP=y >> >> >> >> And run following commands: >> >> # mke2fs -b 4096 /dev/ram0 >> >> # mount -t ext2 -o xip /dev/ram0 /mnt/ramdisk/ >> >> # dd if=/dev/zero of=/mnt/ramdisk/test1 bs=1M count=16 >> >> >> >> And it shows: >> >> dd: writing `/mnt/ramdisk/test1': No space left on device >> >> >> >> df also shows /mnt/ramdisk is 100% full. Its default size is 64MB so a >> >> 16MB write should only occupy 1/4 capacity. >> >> >> >> Criminal commit: >> >> After git bisect, it points to the following commit: >> >> 8e3dffc651cb668e1ff4d8b89cc1c3dde7540d3b >> >> Ext2: mark inode dirty after the function dquot_free_block_nodirty is >> >> called >> > Thanks for report and the bisection! >> > >> >> Particularly, the following code: >> >> @@ -1412,9 +1415,11 @@ allocated: >> >> *errp = 0; >> >> brelse(bitmap_bh); >> >> -dquot_free_block_nodirty(inode, *count-num); >> >> -mark_inode_dirty(inode); >> >> -*count = num; >> >> +if (num < *count) { >> >> +dquot_free_block_nodirty(inode, *count-num); >> >> +mark_inode_dirty(inode); >> >> +*count = num; >> >> +} >> >> return ret_block; >> >> >> >> Not mark_inode_dirty() is called only when num is less than *count. >> >> However, I've seen >> >> with the dd command, there is case where num >= *count. 
>> >> >> >> Fix: >> >> I've verified that the following patch fixes the issue: >> >> diff --git a/fs/ext2/balloc.c b/fs/ext2/balloc.c >> >> index 9f9992b..5446a52 100644 >> >> --- a/fs/ext2/balloc.c >> >> +++ b/fs/ext2/balloc.c >> >> @@ -1406,11 +1406,10 @@ allocated: >> >> >> >> *errp = 0; >> >> brelse(bitmap_bh); >> >> - if (num < *count) { >> >> + if (num <= *count) >> >> dquot_free_block_nodirty(inode, *count-num); >> >> - mark_inode_dirty(inode); >> >> - *count = num; >> >> - } >> >> + mark_inode_dirty(inode); >> >> + *count = num; >> >> return ret_block; >> >> >> >> io_error: >> >> >> >> However, I'm not familiar with ext2 source code and cannot tell if >> >> this is the correct fix. At least it fixes my issue. >> > With this, you have essentially reverted a hunk from commit >> > 8e3dffc651cb668e1ff4d8b89cc1c3dde7540d3b. But I don't see a reason why it >> > should be reverted. num should never ever be greater than *count and when >> > num == count, we the code inside if doesn't do anything useful. >> > >> > I've looked into the code and I think I see the problem. It is a long >> > standing bug in __ext2_get_block() in fs/ext2/xip.c. It calls >> > ext2_get_block() asking for 0 blocks to map (while we really want 1 block). >> > ext2_get_block() just passes that request and ext2_get_blocks() actually >> > allocates 1 block. And that's were the commit you have identified makes a >> > difference because previously we returned that 1 block was allocated while >> > now we return that 0 blocks were allocated and thus allocation is repeated >> > until all free blocks are exhaused. >> > >> > Attached patch should fix the problem. >> > >> >> Thanks for the reply. I've verified that your patch fixes my issue. >> And it's absolutely better than my solution. >> >> Tested-by: Andiry Xu > Thanks for testing! > >> I have another question about ext2 XIP performance, although it's not >> quite related to this thread. 
>> >> I'm testing xip with ext2 on a ram disk drive, the driver is brd.c. >> The RAM disk size is 2GB and I pre-fill it to g
Re: [BUG][ext2] XIP does not work on ext2
Hi Jan, On Mon, Nov 4, 2013 at 4:37 PM, Jan Kara wrote: > Hello, > > On Mon 04-11-13 14:31:34, Andiry Xu wrote: >> When I'm trying XIP on ext2, I find that xip does not work on ext2 >> with latest kernel. >> >> Reproduce steps: >> Compile kernel with following configs: >> CONFIG_BLK_DEV_XIP=y >> CONFIG_EXT2_FS_XIP=y >> >> And run following commands: >> # mke2fs -b 4096 /dev/ram0 >> # mount -t ext2 -o xip /dev/ram0 /mnt/ramdisk/ >> # dd if=/dev/zero of=/mnt/ramdisk/test1 bs=1M count=16 >> >> And it shows: >> dd: writing `/mnt/ramdisk/test1': No space left on device >> >> df also shows /mnt/ramdisk is 100% full. Its default size is 64MB so a >> 16MB write should only occupy 1/4 capacity. >> >> Criminal commit: >> After git bisect, it points to the following commit: >> 8e3dffc651cb668e1ff4d8b89cc1c3dde7540d3b >> Ext2: mark inode dirty after the function dquot_free_block_nodirty is called > Thanks for report and the bisection! > >> Particularly, the following code: >> @@ -1412,9 +1415,11 @@ allocated: >> *errp = 0; >> brelse(bitmap_bh); >> -dquot_free_block_nodirty(inode, *count-num); >> -mark_inode_dirty(inode); >> -*count = num; >> +if (num < *count) { >> +dquot_free_block_nodirty(inode, *count-num); >> +mark_inode_dirty(inode); >> +*count = num; >> +} >> return ret_block; >> >> Not mark_inode_dirty() is called only when num is less than *count. >> However, I've seen >> with the dd command, there is case where num >= *count. 
>> >> Fix: >> I've verified that the following patch fixes the issue: >> diff --git a/fs/ext2/balloc.c b/fs/ext2/balloc.c >> index 9f9992b..5446a52 100644 >> --- a/fs/ext2/balloc.c >> +++ b/fs/ext2/balloc.c >> @@ -1406,11 +1406,10 @@ allocated: >> >> *errp = 0; >> brelse(bitmap_bh); >> - if (num < *count) { >> + if (num <= *count) >> dquot_free_block_nodirty(inode, *count-num); >> - mark_inode_dirty(inode); >> - *count = num; >> - } >> + mark_inode_dirty(inode); >> + *count = num; >> return ret_block; >> >> io_error: >> >> However, I'm not familiar with ext2 source code and cannot tell if >> this is the correct fix. At least it fixes my issue. > With this, you have essentially reverted a hunk from commit > 8e3dffc651cb668e1ff4d8b89cc1c3dde7540d3b. But I don't see a reason why it > should be reverted. num should never ever be greater than *count and when > num == count, we the code inside if doesn't do anything useful. > > I've looked into the code and I think I see the problem. It is a long > standing bug in __ext2_get_block() in fs/ext2/xip.c. It calls > ext2_get_block() asking for 0 blocks to map (while we really want 1 block). > ext2_get_block() just passes that request and ext2_get_blocks() actually > allocates 1 block. And that's were the commit you have identified makes a > difference because previously we returned that 1 block was allocated while > now we return that 0 blocks were allocated and thus allocation is repeated > until all free blocks are exhaused. > > Attached patch should fix the problem. > Thanks for the reply. I've verified that your patch fixes my issue. And it's absolutely better than my solution. Tested-by: Andiry Xu I have another question about ext2 XIP performance, although it's not quite related to this thread. I'm testing xip with ext2 on a ram disk drive, the driver is brd.c. The RAM disk size is 2GB and I pre-fill it to guarantee that all pages reside in main memory. Then I use two different applications to write to the ram disk. 
One is open() with O_DIRECT flag, and writing with POSIX write(). Another is open() with O_DIRECT, mmap() it to user space, then use memcpy() to write data. I use different request sizes to write data, from 512 bytes to 64MB.

In my understanding, the mmap version bypasses the file system and does not go to kernel space, hence it should have better performance than the POSIX write() version. However, my test result shows it's not always true: when the request size is between 8KB and 1MB, the POSIX write() version has bandwidth about 7GB/s and mmap version only has 5GB/s. The test is performed on an i7-3770K machine with 8GB memory, kernel 3.12. Also I have tested on kernel 3.2, in which mmap has really bad performance, only 2GB/s for all request sizes.

Do you know the reason why write() outperforms mmap() in some cases? I know it's not related to the thread but I would really appreciate it if you can answer my question.

Thanks,
Andiry
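[Editor's sketch] The failure mode Jan describes in the quoted text (the XIP path asks for 0 blocks, the allocator hands out 1, and after the bisected commit the caller is told 0 were allocated) can be modelled in a few lines. This is a toy model, not kernel code: `report_old`, `report_new` and `blocks_consumed` are hypothetical names, and the retry loop is an assumption based on the statement that allocation is repeated until all free blocks are exhausted.

```c
#include <assert.h>

/* Tail of the allocator before the commit: always report num. */
static void report_old(unsigned long *count, unsigned long num)
{
    *count = num;
}

/* Tail of the allocator after the commit: only ever shrink *count. */
static void report_new(unsigned long *count, unsigned long num)
{
    if (num < *count)
        *count = num;
}

/*
 * Simulate a caller that needs one block but passes `requested` as the
 * block count, retrying until it is told that at least one block was
 * allocated. Returns how many of `free_blocks` were consumed.
 */
unsigned long blocks_consumed(unsigned long free_blocks,
                              unsigned long requested, int use_new)
{
    unsigned long used = 0;

    while (used < free_blocks) {
        unsigned long count = requested;
        unsigned long num = 1;      /* allocator really hands out 1 block */

        used += num;
        if (use_new)
            report_new(&count, num);
        else
            report_old(&count, num);
        if (count >= 1)
            break;                  /* caller believes it got its block */
    }
    return used;
}
```

With a 64MB ramdisk and 4096-byte blocks (16384 blocks), `blocks_consumed(16384, 0, 1)` uses up every block while `blocks_consumed(16384, 1, 1)` uses exactly one, which matches the idea of fixing the caller to ask for 1 block rather than reverting the hunk in balloc.c.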
[BUG][ext2] XIP does not work on ext2
Hi,

When I'm trying XIP on ext2, I find that xip does not work on ext2 with latest kernel.

Reproduce steps:
Compile kernel with following configs:
CONFIG_BLK_DEV_XIP=y
CONFIG_EXT2_FS_XIP=y

And run following commands:
# mke2fs -b 4096 /dev/ram0
# mount -t ext2 -o xip /dev/ram0 /mnt/ramdisk/
# dd if=/dev/zero of=/mnt/ramdisk/test1 bs=1M count=16

And it shows:
dd: writing `/mnt/ramdisk/test1': No space left on device

df also shows /mnt/ramdisk is 100% full. Its default size is 64MB so a 16MB write should only occupy 1/4 capacity.

Criminal commit:
After git bisect, it points to the following commit:
8e3dffc651cb668e1ff4d8b89cc1c3dde7540d3b
Ext2: mark inode dirty after the function dquot_free_block_nodirty is called

Particularly, the following code:
@@ -1412,9 +1415,11 @@ allocated:
        *errp = 0;
        brelse(bitmap_bh);
-       dquot_free_block_nodirty(inode, *count-num);
-       mark_inode_dirty(inode);
-       *count = num;
+       if (num < *count) {
+               dquot_free_block_nodirty(inode, *count-num);
+               mark_inode_dirty(inode);
+               *count = num;
+       }
        return ret_block;

Now mark_inode_dirty() is called only when num is less than *count. However, I've seen with the dd command, there is case where num >= *count.

Fix:
I've verified that the following patch fixes the issue:
diff --git a/fs/ext2/balloc.c b/fs/ext2/balloc.c
index 9f9992b..5446a52 100644
--- a/fs/ext2/balloc.c
+++ b/fs/ext2/balloc.c
@@ -1406,11 +1406,10 @@ allocated:

        *errp = 0;
        brelse(bitmap_bh);
-       if (num < *count) {
+       if (num <= *count)
                dquot_free_block_nodirty(inode, *count-num);
-               mark_inode_dirty(inode);
-               *count = num;
-       }
+       mark_inode_dirty(inode);
+       *count = num;
        return ret_block;

 io_error:

However, I'm not familiar with ext2 source code and cannot tell if this is the correct fix. At least it fixes my issue.

Thanks,
Andiry
Re: [PATCH 1/1] xhci: Recognize USB 3.0 devices as superspeed at powerup
On Thu, Aug 23, 2012 at 12:53 AM, manoj.i...@canonical.com wrote:
> From: Manoj Iyer manoj.i...@canonical.com
>
> On Intel Panther Point chipset USB 3.0 devices show up as
> high-speed devices on powerup, but after an s3 cycle they are
> correctly recognized as SuperSpeed. At powerup switch the port
> to xHCI so that USB 3.0 devices are correctly recognized.
>
> BugLink: http://bugs.launchpad.net/bugs/1000424
>
> Signed-off-by: Manoj Iyer manoj.i...@canonical.com

This one looks OK to me.

Thanks,
Andiry

> ---
>  drivers/usb/host/pci-quirks.c |    3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/usb/host/pci-quirks.c b/drivers/usb/host/pci-quirks.c
> index c5e9e4a..486e812 100644
> --- a/drivers/usb/host/pci-quirks.c
> +++ b/drivers/usb/host/pci-quirks.c
> @@ -870,9 +870,10 @@ static void __devinit quirk_usb_handoff_xhci(struct pci_dev *pdev)
>         /* Disable any BIOS SMIs and clear all SMI events*/
>         writel(val, base + ext_cap_offset + XHCI_LEGACY_CONTROL_OFFSET);
>
> +hc_init:
>         if (usb_is_intel_switchable_xhci(pdev))
>                 usb_enable_xhci_ports(pdev);
> -hc_init:
> +
>         op_reg_base = base + XHCI_HC_LENGTH(readl(base));
>
>         /* Wait for the host controller to be ready before writing any
> --
> 1.7.9.5
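[Editor's sketch] The effect of moving the hc_init label in the patch above can be shown with a stripped-down control-flow model. This is an illustration only, under stated assumptions: the function names are hypothetical, the extended-capability walk is reduced to a single flag, and the Intel-specific check is omitted.

```c
#include <assert.h>

static int switch_calls;            /* counts simulated port switches */

static void enable_xhci_ports(void)
{
    switch_calls++;
}

/* Label placed after the switch: the early goto skips the switch. */
int handoff_before_patch(int has_legacy_cap)
{
    switch_calls = 0;
    if (!has_legacy_cap)
        goto hc_init;               /* end of capability list reached */
    /* ... BIOS handoff would happen here ... */
    enable_xhci_ports();
hc_init:
    return switch_calls;
}

/* Label placed before the switch: both paths perform the switch. */
int handoff_after_patch(int has_legacy_cap)
{
    switch_calls = 0;
    if (!has_legacy_cap)
        goto hc_init;
    /* ... BIOS handoff would happen here ... */
hc_init:
    enable_xhci_ports();
    return switch_calls;
}
```

In this model `handoff_before_patch(0)` never switches the ports, while `handoff_after_patch()` switches them once regardless of which path reaches the label.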
Re: [PATCH 1/1] xhci: Unconditionally switch ports to xHCI on powerup
On Wed, Aug 22, 2012 at 12:16 AM, Manoj Iyer wrote:
>
> Looks like in pci-quirks.c, we enter the do() while() loop, reach the end of
> extended capabilities and goto hc_init: label, skipping the switch. Probably
> moving the switch under the hc_init label might work? Currently we switch
> unconditionally on resume, so we could do the same at powerup as well.
>

If this is a must-do for the Intel Panther Point platform, then we
need to make sure it's called on both power up and resume. Yes, I think
moving the code below the hc_init label should work, and I think it's a
better solution than your original patch.

Thanks,
Andiry

>
> On Tue, 21 Aug 2012, Andiry Xu wrote:
>
>> On Tue, Aug 21, 2012 at 12:06 PM, wrote:
>>>
>>> From: Manoj Iyer
>>>
>>> USB 3.0 devices show up as high-speed devices on powerup, after an
>>> s3 cycle they are correctly recognized as SuperSpeed. At powerup
>>> unconditionally switch the port to xHCI like we do when we resume
>>> from suspend.
>>>
>>> BugLink: http://bugs.launchpad.net/bugs/1000424
>>>
>>> Signed-off-by: Manoj Iyer
>>> ---
>>>  drivers/usb/host/xhci-pci.c |    8 ++++++++
>>>  1 file changed, 8 insertions(+)
>>>
>>> diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c
>>> index 9bfd4ca11..5c8dbea 100644
>>> --- a/drivers/usb/host/xhci-pci.c
>>> +++ b/drivers/usb/host/xhci-pci.c
>>> @@ -48,6 +48,14 @@ static int xhci_pci_reinit(struct xhci_hcd *xhci, struct pci_dev *pdev)
>>>         if (!pci_set_mwi(pdev))
>>>                 xhci_dbg(xhci, "MWI active\n");
>>>
>>> +       /*
>>> +        * USB SuperSpeed ports are recognized as HighSpeed ports on powerup
>>> +        * unconditionally switch the ports to xHCI like we do when resume
>>> +        * from suspend.
>>> +        */
>>> +       if (usb_is_intel_switchable_xhci(pdev))
>>> +               usb_enable_xhci_ports(pdev);
>>> +
>>
>>
>> Strange. This should have been called during system power up, in
>> quirk_usb_handoff_xhci() of pci-quirks.c. Do you see that routine get
>> called during power up?
>>
>> Thanks,
>> Andiry
>>
>>>         xhci_dbg(xhci, "Finished xhci_pci_reinit\n");
>>>         return 0;
>>>  }
>>> --
>>> 1.7.9.5
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-usb" in
>>> the body of a message to majord...@vger.kernel.org
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>>
>>
>
> --
>
> Manoj Iyer
> Ubuntu/Canonical
> Hardware Enablement
>
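For readers following along, the control-flow problem discussed in this message can be shown with a small user-space sketch. This is not the kernel code: both function names and the has_legacy_cap flag are made up for illustration, and the assignment to "switched" merely stands in for the usb_enable_xhci_ports() call. It only assumes, as described above, that reaching the end of the extended capabilities results in a jump to the hc_init label.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the original ordering: the port switch sits
 * above the hc_init label, so the goto jumps over it.
 */
static bool handoff_switch_before_label(bool has_legacy_cap)
{
	bool switched = false;

	if (!has_legacy_cap)
		goto hc_init;	/* end of extended capabilities reached */

	/* BIOS handoff and SMI cleanup would happen here. */

	switched = true;	/* stands in for usb_enable_xhci_ports() */
hc_init:
	return switched;
}

/*
 * Hypothetical model of the revised ordering: the port switch sits
 * below the hc_init label, so both paths execute it.
 */
static bool handoff_switch_after_label(bool has_legacy_cap)
{
	bool switched = false;

	if (!has_legacy_cap)
		goto hc_init;

	/* BIOS handoff and SMI cleanup would happen here. */

hc_init:
	switched = true;	/* stands in for usb_enable_xhci_ports() */
	return switched;
}
```

Calling each function with has_legacy_cap set to false shows the difference: the first variant returns false because the switch is skipped, while the second returns true. With has_legacy_cap set to true, both variants perform the switch.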
Re: [PATCH 1/1] xhci: Unconditionally switch ports to xHCI on powerup
On Tue, Aug 21, 2012 at 12:06 PM, manoj.i...@canonical.com wrote:
> From: Manoj Iyer <manoj.i...@canonical.com>
>
> USB 3.0 devices show up as high-speed devices on powerup, after an
> s3 cycle they are correctly recognized as SuperSpeed. At powerup
> unconditionally switch the port to xHCI like we do when we resume
> from suspend.
>
> BugLink: http://bugs.launchpad.net/bugs/1000424
>
> Signed-off-by: Manoj Iyer <manoj.i...@canonical.com>
> ---
>  drivers/usb/host/xhci-pci.c |    8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c
> index 9bfd4ca11..5c8dbea 100644
> --- a/drivers/usb/host/xhci-pci.c
> +++ b/drivers/usb/host/xhci-pci.c
> @@ -48,6 +48,14 @@ static int xhci_pci_reinit(struct xhci_hcd *xhci, struct pci_dev *pdev)
>         if (!pci_set_mwi(pdev))
>                 xhci_dbg(xhci, "MWI active\n");
>
> +       /*
> +        * USB SuperSpeed ports are recognized as HighSpeed ports on powerup
> +        * unconditionally switch the ports to xHCI like we do when resume
> +        * from suspend.
> +        */
> +       if (usb_is_intel_switchable_xhci(pdev))
> +               usb_enable_xhci_ports(pdev);
> +

Strange. This should already have been called during system power up, in
quirk_usb_handoff_xhci() of pci-quirks.c. Do you see that routine get
called during power up?

Thanks,
Andiry

>         xhci_dbg(xhci, "Finished xhci_pci_reinit\n");
>         return 0;
>  }
> --
> 1.7.9.5
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-usb" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html