Re: [dpdk-dev] [PATCH] eal/ppc: fix secondary process to map hugepages in correct order
16/02/2017 08:22, Chao Zhu:
> Thomas,
>
> We have several different internal fixes and didn't get a conclusion.
> Let me summarize them and give a final patch set.

Thanks for your reminder! This patch is now classified as rejected.
Re: [dpdk-dev] [PATCH] eal/ppc: fix secondary process to map hugepages in correct order
Thomas,

We have several different internal fixes and didn't get a conclusion.
Let me summarize them and give a final patch set.
Thanks for your reminder!

-----Original Message-----
From: Thomas Monjalon [mailto:thomas.monja...@6wind.com]
Sent: 2017-02-15 16:52
To: Sergio Gonzalez Monroy; Chao Zhu; 'Gowrishankar'
Cc: dev@dpdk.org; 'Bruce Richardson'; 'David Marchand'
Subject: Re: [dpdk-dev] [PATCH] eal/ppc: fix secondary process to map hugepages in correct order

There was no follow-up on this discussion.
Please, what is the conclusion?

2016-05-20 11:25, Sergio Gonzalez Monroy:
> On 20/05/2016 09:41, Chao Zhu wrote:
> > Sergio,
> >
> > Step 4 will not fail, because each hugepage finally gets a virtual
> > address, though a different one. If you take a look at
> > rte_eal_hugepage_init(), the last loop uses both the physical address
> > and the virtual address to determine a new memory segment; that step
> > makes sure the initialization is correct. What I want to say is that
> > this bug also affects the secondary process in
> > rte_eal_hugepage_attach(): it can make the secondary process fail to
> > initialize. I'm trying to figure out how to make it work.
>
> You are right, I misread the code.
>
> So basically, because mmap ignores the hint to map at the requested
> address, by default we get VA maps in decreasing address order.
>
> Knowing that, PPC orders pages by decreasing physical address, so when
> this happens we actually get hugepages in order in the "new" final_va.
>
> Not sure if that makes sense, but I think I understand where you are
> coming from.
>
> I think we need to document this as a known issue and/or add comments
> regarding this behavior, basically calling out that all this
> "reverse ordering" is required because mmap fails to map at the
> requested VA.
>
> Thanks,
> Sergio
>
> > -----Original Message-----
> > From: Sergio Gonzalez Monroy [mailto:sergio.gonzalez.mon...@intel.com]
> > Sent: 2016-05-20 16:01
> > To: Chao Zhu; 'Bruce Richardson'
> > Cc: 'Gowrishankar'; dev@dpdk.org; 'David Marchand'
> > Subject: Re: [dpdk-dev] [PATCH] eal/ppc: fix secondary process to map
> > hugepages in correct order
> >
> > On 20/05/2016 04:03, Chao Zhu wrote:
> >> Bruce,
> >>
> >> Recently, we found some bugs with mmap on PowerLinux: mmap doesn't
> >> respect the address hints. In get_virtual_area() in eal_memory.c,
> >> mmap gets a free virtual address range to use as the address hint.
> >> However, when mapping the real memory in rte_eal_hugepage_init(),
> >> mmap doesn't return the same address as the requested one, even
> >> though /proc//maps shows the requested address range is free.
> >> With this bug, pre-allocating some free space doesn't work.
> > Hi Chao,
> >
> > If I understand you correctly, the issue you are describing would
> > cause DPDK to fail initialization even with the reverse reordering
> > that you are doing for PPC.
> >
> > Basically (just showing the relevant initialization steps):
> > 1. map_all_hugepages(..., orig=1)
> >    - map all hugepages
> > 2. find the physical address of each hugepage
> > 3. sort by physical address
> > 4. map_all_hugepages(..., orig=0)
> >    - Now we try to get a big chunk of virtual address space for a
> >      block of contiguous hugepages, so we know we have that virtual
> >      address chunk available.
> >    - Then we try to remap each page of that block of contiguous pages
> >      into that virtual address chunk.
> >
> > So the issue you are describing would make step 4 fail regardless of
> > the different ordering that PPC does.
> > I'm probably missing something, would you care to elaborate?
> >
> > Sergio
> >
> >> We're trying to create a test case and report it as a bug to the
> >> kernel community.
> >>
> >> Here are some logs:
> >> ===
> >> EAL: Ask a virtual area of 0x1000 bytes
> >> EAL: Virtual area found at 0x3fffa700 (size = 0x1000)
> >> EAL: map_all_hugepages, /mnt/huge/rtemap_52, paddr 0x3ca600, requested addr: 0x3fffa700, mmaped addr: 0x3efff000
> >> EAL: map_all_hugepages, /mnt/huge/rtemap_53, paddr 0x3ca500, requested addr: 0x3fffa800, mmaped addr:
Re: [dpdk-dev] [PATCH] eal/ppc: fix secondary process to map hugepages in correct order
There was no follow-up on this discussion.
Please, what is the conclusion?

2016-05-20 11:25, Sergio Gonzalez Monroy:
> On 20/05/2016 09:41, Chao Zhu wrote:
> > Sergio,
> >
> > Step 4 will not fail, because each hugepage finally gets a virtual
> > address, though a different one. If you take a look at
> > rte_eal_hugepage_init(), the last loop uses both the physical address
> > and the virtual address to determine a new memory segment; that step
> > makes sure the initialization is correct. What I want to say is that
> > this bug also affects the secondary process in
> > rte_eal_hugepage_attach(): it can make the secondary process fail to
> > initialize. I'm trying to figure out how to make it work.
>
> You are right, I misread the code.
>
> So basically, because mmap ignores the hint to map at the requested
> address, by default we get VA maps in decreasing address order.
>
> Knowing that, PPC orders pages by decreasing physical address, so when
> this happens we actually get hugepages in order in the "new" final_va.
>
> Not sure if that makes sense, but I think I understand where you are
> coming from.
>
> I think we need to document this as a known issue and/or add comments
> regarding this behavior, basically calling out that all this
> "reverse ordering" is required because mmap fails to map at the
> requested VA.
>
> Thanks,
> Sergio
>
> > -----Original Message-----
> > From: Sergio Gonzalez Monroy [mailto:sergio.gonzalez.mon...@intel.com]
> > Sent: 2016-05-20 16:01
> > To: Chao Zhu; 'Bruce Richardson'
> > Cc: 'Gowrishankar'; dev@dpdk.org; 'David Marchand'
> > Subject: Re: [dpdk-dev] [PATCH] eal/ppc: fix secondary process to map
> > hugepages in correct order
> >
> > On 20/05/2016 04:03, Chao Zhu wrote:
> >> Bruce,
> >>
> >> Recently, we found some bugs with mmap on PowerLinux: mmap doesn't
> >> respect the address hints. In get_virtual_area() in eal_memory.c,
> >> mmap gets a free virtual address range to use as the address hint.
> >> However, when mapping the real memory in rte_eal_hugepage_init(),
> >> mmap doesn't return the same address as the requested one, even
> >> though /proc//maps shows the requested address range is free.
> >> With this bug, pre-allocating some free space doesn't work.
> > Hi Chao,
> >
> > If I understand you correctly, the issue you are describing would
> > cause DPDK to fail initialization even with the reverse reordering
> > that you are doing for PPC.
> >
> > Basically (just showing the relevant initialization steps):
> > 1. map_all_hugepages(..., orig=1)
> >    - map all hugepages
> > 2. find the physical address of each hugepage
> > 3. sort by physical address
> > 4. map_all_hugepages(..., orig=0)
> >    - Now we try to get a big chunk of virtual address space for a
> >      block of contiguous hugepages, so we know we have that virtual
> >      address chunk available.
> >    - Then we try to remap each page of that block of contiguous pages
> >      into that virtual address chunk.
> >
> > So the issue you are describing would make step 4 fail regardless of
> > the different ordering that PPC does.
> > I'm probably missing something, would you care to elaborate?
> >
> > Sergio
> >
> >> We're trying to create a test case and report it as a bug to the
> >> kernel community.
> >>
> >> Here are some logs:
> >> ===
> >> EAL: Ask a virtual area of 0x1000 bytes
> >> EAL: Virtual area found at 0x3fffa700 (size = 0x1000)
> >> EAL: map_all_hugepages, /mnt/huge/rtemap_52, paddr 0x3ca600, requested addr: 0x3fffa700, mmaped addr: 0x3efff000
> >> EAL: map_all_hugepages, /mnt/huge/rtemap_53, paddr 0x3ca500, requested addr: 0x3fffa800, mmaped addr: 0x3effef00
> >> EAL: map_all_hugepages, /mnt/huge/rtemap_54, paddr 0x3ca400, requested addr: 0x3fffa900, mmaped addr: 0x3effee00
> >> EAL: map_all_hugepages, /mnt/huge/rtemap_55, paddr 0x3ca300, requested addr: 0x3fffaa00, mmaped addr: 0x3effed00
> >> EAL: map_all_hugepages, /mnt/huge/rtemap_56, paddr 0x3ca200, requested addr: 0x3fffab00, mmaped addr: 0x3effec00
> >> EAL: map_all_h
[dpdk-dev] [PATCH] eal/ppc: fix secondary process to map hugepages in correct order
Sergio,

Step 4 will not fail, because each hugepage finally gets a virtual address,
though a different one. If you take a look at rte_eal_hugepage_init(), the
last loop uses both the physical address and the virtual address to determine
a new memory segment; that step makes sure the initialization is correct.
What I want to say is that this bug also affects the secondary process in
rte_eal_hugepage_attach(): it can make the secondary process fail to
initialize. I'm trying to figure out how to make it work.

-----Original Message-----
From: Sergio Gonzalez Monroy [mailto:sergio.gonzalez.mon...@intel.com]
Sent: 2016-05-20 16:01
To: Chao Zhu; 'Bruce Richardson'
Cc: 'Gowrishankar'; dev at dpdk.org; 'David Marchand'
Subject: Re: [dpdk-dev] [PATCH] eal/ppc: fix secondary process to map hugepages in correct order

On 20/05/2016 04:03, Chao Zhu wrote:
> Bruce,
>
> Recently, we found some bugs with mmap on PowerLinux: mmap doesn't respect
> the address hints. In get_virtual_area() in eal_memory.c, mmap gets a free
> virtual address range to use as the address hint. However, when mapping the
> real memory in rte_eal_hugepage_init(), mmap doesn't return the same
> address as the requested one, even though /proc//maps shows the requested
> address range is free. With this bug, pre-allocating some free space
> doesn't work.
Hi Chao,

If I understand you correctly, the issue you are describing would cause DPDK
to fail initialization even with the reverse reordering that you are doing
for PPC.

Basically (just showing the relevant initialization steps):
1. map_all_hugepages(..., orig=1)
   - map all hugepages
2. find the physical address of each hugepage
3. sort by physical address
4. map_all_hugepages(..., orig=0)
   - Now we try to get a big chunk of virtual address space for a block of
     contiguous hugepages, so we know we have that virtual address chunk
     available.
   - Then we try to remap each page of that block of contiguous pages into
     that virtual address chunk.

So the issue you are describing would make step 4 fail regardless of the
different ordering that PPC does.
I'm probably missing something, would you care to elaborate?

Sergio

> We're trying to create a test case and report it as a bug to the kernel
> community.
>
> Here are some logs:
> ===
> EAL: Ask a virtual area of 0x1000 bytes
> EAL: Virtual area found at 0x3fffa700 (size = 0x1000)
> EAL: map_all_hugepages, /mnt/huge/rtemap_52, paddr 0x3ca600, requested addr: 0x3fffa700, mmaped addr: 0x3efff000
> EAL: map_all_hugepages, /mnt/huge/rtemap_53, paddr 0x3ca500, requested addr: 0x3fffa800, mmaped addr: 0x3effef00
> EAL: map_all_hugepages, /mnt/huge/rtemap_54, paddr 0x3ca400, requested addr: 0x3fffa900, mmaped addr: 0x3effee00
> EAL: map_all_hugepages, /mnt/huge/rtemap_55, paddr 0x3ca300, requested addr: 0x3fffaa00, mmaped addr: 0x3effed00
> EAL: map_all_hugepages, /mnt/huge/rtemap_56, paddr 0x3ca200, requested addr: 0x3fffab00, mmaped addr: 0x3effec00
> EAL: map_all_hugepages, /mnt/huge/rtemap_57, paddr 0x3ca100, requested addr: 0x3fffac00, mmaped addr: 0x3effeb00
> EAL: map_all_hugepages, /mnt/huge/rtemap_58, paddr 0x3ca000, requested addr: 0x3fffad00, mmaped addr: 0x3effea00
> EAL: map_all_hugepages, /mnt/huge/rtemap_59, paddr 0x3c9f00, requested addr: 0x3fffae00, mmaped addr: 0x3effe900
> EAL: map_all_hugepages, /mnt/huge/rtemap_60, paddr 0x3c9e00, requested addr: 0x3fffaf00, mmaped addr: 0x3effe800
> EAL: map_all_hugepages, /mnt/huge/rtemap_61, paddr 0x3c9d00, requested addr: 0x3fffb000, mmaped addr: 0x3effe700
> EAL: map_all_hugepages, /mnt/huge/rtemap_62, paddr 0x3c9c00, requested addr: 0x3fffb100, mmaped addr: 0x3effe600
> EAL: map_all_hugepages, /mnt/huge/rtemap_63, paddr 0x3c9b00, requested addr: 0x3fffb200, mmaped addr: 0x3effe500
> EAL: map_all_hugepages, /mnt/huge/rtemap_51, paddr 0x3c9a00, requested addr: 0x3fffb300, mmaped addr: 0x3effe400
> EAL: map_all_hugepages, /mnt/huge/rtemap_50, paddr 0x3c9900, requested addr: 0x3fffb400, mmaped addr: 0x3effe300
> EAL: map_all_hugepages, /mnt/huge/rtemap_49, paddr 0x3c9800, requested addr: 0x3fffb500, mmaped addr: 0x3effe200
> EAL: map_all_hugepages, /mnt/huge/rtemap_48, paddr 0x3c9700, requested addr: 0x3fffb600, mmaped addr: 0x3effe100
>
> # cat /proc/143765/maps
> 0100-0200 rw-s 00:27 61162550 /mnt/huge/rtemap_14
[dpdk-dev] [PATCH] eal/ppc: fix secondary process to map hugepages in correct order
On 20/05/2016 09:41, Chao Zhu wrote:
> Sergio,
>
> Step 4 will not fail, because each hugepage finally gets a virtual address,
> though a different one. If you take a look at rte_eal_hugepage_init(), the
> last loop uses both the physical address and the virtual address to
> determine a new memory segment; that step makes sure the initialization is
> correct. What I want to say is that this bug also affects the secondary
> process in rte_eal_hugepage_attach(): it can make the secondary process
> fail to initialize. I'm trying to figure out how to make it work.

You are right, I misread the code.

So basically, because mmap ignores the hint to map at the requested address,
by default we get VA maps in decreasing address order.

Knowing that, PPC orders pages by decreasing physical address, so when this
happens we actually get hugepages in order in the "new" final_va.

Not sure if that makes sense, but I think I understand where you are coming
from.

I think we need to document this as a known issue and/or add comments
regarding this behavior, basically calling out that all this
"reverse ordering" is required because mmap fails to map at the requested VA.

Thanks,
Sergio

> -----Original Message-----
> From: Sergio Gonzalez Monroy [mailto:sergio.gonzalez.monroy at intel.com]
> Sent: 2016-05-20 16:01
> To: Chao Zhu; 'Bruce Richardson'
> Cc: 'Gowrishankar'; dev at dpdk.org; 'David Marchand'
> Subject: Re: [dpdk-dev] [PATCH] eal/ppc: fix secondary process to map
> hugepages in correct order
>
> On 20/05/2016 04:03, Chao Zhu wrote:
>> Bruce,
>>
>> Recently, we found some bugs with mmap on PowerLinux: mmap doesn't respect
>> the address hints. In get_virtual_area() in eal_memory.c, mmap gets a free
>> virtual address range to use as the address hint. However, when mapping
>> the real memory in rte_eal_hugepage_init(), mmap doesn't return the same
>> address as the requested one, even though /proc//maps shows the requested
>> address range is free. With this bug, pre-allocating some free space
>> doesn't work.
> Hi Chao,
>
> If I understand you correctly, the issue you are describing would cause
> DPDK to fail initialization even with the reverse reordering that you are
> doing for PPC.
>
> Basically (just showing the relevant initialization steps):
> 1. map_all_hugepages(..., orig=1)
>    - map all hugepages
> 2. find the physical address of each hugepage
> 3. sort by physical address
> 4. map_all_hugepages(..., orig=0)
>    - Now we try to get a big chunk of virtual address space for a block of
>      contiguous hugepages, so we know we have that virtual address chunk
>      available.
>    - Then we try to remap each page of that block of contiguous pages into
>      that virtual address chunk.
>
> So the issue you are describing would make step 4 fail regardless of the
> different ordering that PPC does.
> I'm probably missing something, would you care to elaborate?
>
> Sergio
>
>> We're trying to create a test case and report it as a bug to the kernel
>> community.
>>
>> Here are some logs:
>> ===
>> EAL: Ask a virtual area of 0x1000 bytes
>> EAL: Virtual area found at 0x3fffa700 (size = 0x1000)
>> EAL: map_all_hugepages, /mnt/huge/rtemap_52, paddr 0x3ca600, requested addr: 0x3fffa700, mmaped addr: 0x3efff000
>> EAL: map_all_hugepages, /mnt/huge/rtemap_53, paddr 0x3ca500, requested addr: 0x3fffa800, mmaped addr: 0x3effef00
>> EAL: map_all_hugepages, /mnt/huge/rtemap_54, paddr 0x3ca400, requested addr: 0x3fffa900, mmaped addr: 0x3effee00
>> EAL: map_all_hugepages, /mnt/huge/rtemap_55, paddr 0x3ca300, requested addr: 0x3fffaa00, mmaped addr: 0x3effed00
>> EAL: map_all_hugepages, /mnt/huge/rtemap_56, paddr 0x3ca200, requested addr: 0x3fffab00, mmaped addr: 0x3effec00
>> EAL: map_all_hugepages, /mnt/huge/rtemap_57, paddr 0x3ca100, requested addr: 0x3fffac00, mmaped addr: 0x3effeb00
>> EAL: map_all_hugepages, /mnt/huge/rtemap_58, paddr 0x3ca000, requested addr: 0x3fffad00, mmaped addr: 0x3effea00
>> EAL: map_all_hugepages, /mnt/huge/rtemap_59, paddr 0x3c9f00, requested addr: 0x3fffae00, mmaped addr: 0x3effe900
>> EAL: map_all_hugepages, /mnt/huge/rtemap_60, paddr 0x3c9e00, requested addr: 0x3fffaf00, mmaped addr: 0x3effe800
>> EAL: map_all_hugepages, /mnt/huge/r
[dpdk-dev] [PATCH] eal/ppc: fix secondary process to map hugepages in correct order
s 00:27 61162586 /mnt/huge/rtemap_50
3effe400-3effe500 rw-s 00:27 61162587 /mnt/huge/rtemap_51
3effe500-3effe600 rw-s 00:27 61162599 /mnt/huge/rtemap_63
3effe600-3effe700 rw-s 00:27 61162598 /mnt/huge/rtemap_62
3effe700-3effe800 rw-s 00:27 61162597 /mnt/huge/rtemap_61
3effe800-3effe900 rw-s 00:27 61162596 /mnt/huge/rtemap_60
3effe900-3effea00 rw-s 00:27 61162595 /mnt/huge/rtemap_59
3effea00-3effeb00 rw-s 00:27 61162594 /mnt/huge/rtemap_58
3effeb00-3effec00 rw-s 00:27 61162593 /mnt/huge/rtemap_57
3effec00-3effed00 rw-s 00:27 61162592 /mnt/huge/rtemap_56
3effed00-3effee00 rw-s 00:27 61162591 /mnt/huge/rtemap_55
3effee00-3effef00 rw-s 00:27 61162590 /mnt/huge/rtemap_54
3effef00-3efff000 rw-s 00:27 61162589 /mnt/huge/rtemap_53
3efff000-3efff100 rw-s 00:27 61162588 /mnt/huge/rtemap_52
3efff100-3efff200 rw-s 00:27 61162565 /mnt/huge/rtemap_29
3efff200-3efff300 rw-s 00:27 61162564 /mnt/huge/rtemap_28
3efff300-3efff400 rw-s 00:27 61162563 /mnt/huge/rtemap_27
3efff400-3efff500 rw-s 00:27 61162562 /mnt/huge/rtemap_26
3efff500-3efff600 rw-s 00:27 61162561 /mnt/huge/rtemap_25
3efff600-3efff700 rw-s 00:27 61162560 /mnt/huge/rtemap_24
3efff700-3efff800 rw-s 00:27 61162559 /mnt/huge/rtemap_23
3efff800-3efff900 rw-s 00:27 61162558 /mnt/huge/rtemap_22
3efff900-3efffa00 rw-s 00:27 61162557 /mnt/huge/rtemap_21
3efffa00-3efffb00 rw-s 00:27 61162556 /mnt/huge/rtemap_20
3efffb00-3efffc00 rw-s 00:27 61162555 /mnt/huge/rtemap_19
3efffc00-3efffd00 rw-s 00:27 61162554 /mnt/huge/rtemap_18
3efffd00-3efffe00 rw-s 00:27 61162553 /mnt/huge/rtemap_17
3efffe00-3e00 rw-s 00:27 61162552 /mnt/huge/rtemap_16
3e00-3f00 rw-s 00:27 61162551 /mnt/huge/rtemap_15
3fffb7bc-3fffb7c1 rw-p 00:00 0
3fffb7c1-3fffb7c5 rw-s 00:12 3926240 /run/.rte_config
3fffb7c5-3fffb7c7 rw-p 00:00 0
3fffb7c7-3fffb7e2 r-xp 08:32 7090531 /opt/at7.1/lib64/power8/libc-2.19.so
3fffb7e2-3fffb7e3 rw-p 001a 08:32 7090531 /opt/at7.1/lib64/power8/libc-2.19.so
3fffb7e3-3fffb7e5 rw-p 00:00 0
3fffb7e5-3fffb7e7 r-xp 08:32 7090563 /opt/at7.1/lib64/power8/libpthread-2.19.so
3fffb7e7-3fffb7e8 rw-p 0001 08:32 7090563 /opt/at7.1/lib64/power8/libpthread-2.19.so
3fffb7e8-3fffb7e9 r-xp 08:32 7090210 /opt/at7.1/lib64/libdl-2.19.so
3fffb7e9-3fffb7ea rw-p 08:32 7090210 /opt/at7.1/lib64/libdl-2.19.so
3fffb7ea-3fffb7ec r-xp 08:32 7090533 /opt/at7.1/lib64/power8/libz.so.1.2.6
3fffb7ec-3fffb7ed rw-p 0001 08:32 7090533 /opt/at7.1/lib64/power8/libz.so.1.2.6
3fffb7ed-3fffb7f9 r-xp 08:32 7090568 /opt/at7.1/lib64/power8/libm-2.19.so
3fffb7f9-3fffb7fa rw-p 000b 08:32 7090568 /opt/at7.1/lib64/power8/libm-2.19.so
3fffb7fa-3fffb7fc r-xp 00:00 0 [vdso]
3fffb7fc-3fffb7ff r-xp 08:32 7090048 /opt/at7.1/lib64/ld-2.19.so
3fffb7ff-3fffb800 rw-p 0002 08:32 7090048 /opt/at7.1/lib64/ld-2.19.so
3ffd-4000 rw-p 00:00 0 [stack]

-----Original Message-----
From: Bruce Richardson [mailto:bruce.richard...@intel.com]
Sent: 2016-03-23 1:11
To: Sergio Gonzalez Monroy
Cc: Gowrishankar; dev at dpdk.org; chaozhu at linux.vnet.ibm.com; David Marchand
Subject: Re: [dpdk-dev] [PATCH] eal/ppc: fix secondary process to map hugepages in correct order

On Tue, Mar 22, 2016 at 04:35:32PM +0000, Sergio Gonzalez Monroy wrote:
> First of all, forgive my ignorance regarding ppc64 and if the questions
> are naive, but after having a look at the already existing code for ppc64
> and this patch now, why are we doing this reverse mapping at all?
>
> I guess the question revolves around the comment in eal_memory.c:
> 1316         /* On PPC64 architecture, the mmap always start from higher
> 1317          * virtual address to lower address. Here, both the physical
> 1318          * address and virtual address are in descending order */
>
> From looking at the code, for ppc64 we do qsort in reverse order and
> thereafter everything looks to be done to account for that reverse
> sorting.
>
> CC: Chao Zhu and David Marchand as original author and reviewer of the
> code.
>
> Sergio
>
Just to add my 2c here.
At one point, with I believe some i686 installs - don't remember the
specific OS/kernel - we found that the mmap calls were returning the
highest free address first and then working downwards, much like what
seems to be described here.
[dpdk-dev] [PATCH] eal/ppc: fix secondary process to map hugepages in correct order
0 rw-p 00:00 0
> 3fffb7e5-3fffb7e7 r-xp 08:32 7090563 /opt/at7.1/lib64/power8/libpthread-2.19.so
> 3fffb7e7-3fffb7e8 rw-p 0001 08:32 7090563 /opt/at7.1/lib64/power8/libpthread-2.19.so
> 3fffb7e8-3fffb7e9 r-xp 08:32 7090210 /opt/at7.1/lib64/libdl-2.19.so
> 3fffb7e90000-3fffb7ea0000 rw-p 08:32 7090210 /opt/at7.1/lib64/libdl-2.19.so
> 3fffb7ea-3fffb7ec r-xp 08:32 7090533 /opt/at7.1/lib64/power8/libz.so.1.2.6
> 3fffb7ec-3fffb7ed rw-p 0001 08:32 7090533 /opt/at7.1/lib64/power8/libz.so.1.2.6
> 3fffb7ed-3fffb7f9 r-xp 08:32 7090568 /opt/at7.1/lib64/power8/libm-2.19.so
> 3fffb7f9-3fffb7fa rw-p 000b 08:32 7090568 /opt/at7.1/lib64/power8/libm-2.19.so
> 3fffb7fa-3fffb7fc r-xp 00:00 0 [vdso]
> 3fffb7fc-3fffb7ff r-xp 08:32 7090048 /opt/at7.1/lib64/ld-2.19.so
> 3fffb7ff-3fffb800 rw-p 0002 08:32 7090048 /opt/at7.1/lib64/ld-2.19.so
> 3ffd-4000 rw-p 00:00 0 [stack]
>
> -----Original Message-----
> From: Bruce Richardson [mailto:bruce.richardson at intel.com]
> Sent: 2016-03-23 1:11
> To: Sergio Gonzalez Monroy
> Cc: Gowrishankar; dev at dpdk.org; chaozhu at linux.vnet.ibm.com;
> David Marchand
> Subject: Re: [dpdk-dev] [PATCH] eal/ppc: fix secondary process to map
> hugepages in correct order
>
> On Tue, Mar 22, 2016 at 04:35:32PM +0000, Sergio Gonzalez Monroy wrote:
>> First of all, forgive my ignorance regarding ppc64 and if the questions
>> are naive, but after having a look at the already existing code for
>> ppc64 and this patch now, why are we doing this reverse mapping at all?
>>
>> I guess the question revolves around the comment in eal_memory.c:
>> 1316         /* On PPC64 architecture, the mmap always start from higher
>> 1317          * virtual address to lower address. Here, both the physical
>> 1318          * address and virtual address are in descending order */
>>
>> From looking at the code, for ppc64 we do qsort in reverse order and
>> thereafter everything looks to be done to account for that reverse
>> sorting.
>>
>> CC: Chao Zhu and David Marchand as original author and reviewer of the
>> code.
>>
>> Sergio
>>
> Just to add my 2c here. At one point, with I believe some i686 installs -
> don't remember the specific OS/kernel - we found that the mmap calls were
> returning the highest free address first and then working downwards, much
> like what seems to be described here. To fix this we changed the mmap code
> from assuming that addresses are mapped upwards, to instead explicitly
> requesting a large free block of memory (mmap of /dev/zero) to find a free
> address space range of the correct size, and then explicitly mmapping each
> individual page to the appropriate place in that free range. With this
> scheme it didn't matter whether the OS tried to mmap the pages from the
> highest or lowest address, because we always told the OS where to put the
> page (and we knew the slot was free from the earlier block mmap).
> Would this scheme not also work for PPC in a similar way? (Again, forgive
> unfamiliarity with PPC! :-) )
>
> /Bruce
>
>> On 07/03/2016 14:13, Gowrishankar wrote:
>>> From: Gowri Shankar
>>>
>>> For a secondary process address space to map hugepages from every
>>> segment of the primary process, hugepage_file entries have to be mapped
>>> in reverse from the list that the primary process updated for every
>>> segment. The reason is that, on ppc64, hugepages are sorted by
>>> decreasing address.
>>>
>>> Signed-off-by: Gowrishankar
>>> ---
[dpdk-dev] [PATCH] eal/ppc: fix secondary process to map hugepages in correct order
On Tue, Mar 22, 2016 at 04:35:32PM +0000, Sergio Gonzalez Monroy wrote:
> First of all, forgive my ignorance regarding ppc64 and if the questions
> are naive, but after having a look at the already existing code for ppc64
> and this patch now, why are we doing this reverse mapping at all?
>
> I guess the question revolves around the comment in eal_memory.c:
> 1316         /* On PPC64 architecture, the mmap always start from higher
> 1317          * virtual address to lower address. Here, both the physical
> 1318          * address and virtual address are in descending order */
>
> From looking at the code, for ppc64 we do qsort in reverse order and
> thereafter everything looks to be done to account for that reverse
> sorting.
>
> CC: Chao Zhu and David Marchand as original author and reviewer of the
> code.
>
> Sergio
>
Just to add my 2c here.
At one point, with I believe some i686 installs - don't remember the
specific OS/kernel - we found that the mmap calls were returning the
highest free address first and then working downwards, much like what
seems to be described here. To fix this we changed the mmap code from
assuming that addresses are mapped upwards, to instead explicitly
requesting a large free block of memory (mmap of /dev/zero) to find a free
address space range of the correct size, and then explicitly mmapping each
individual page to the appropriate place in that free range. With this
scheme it didn't matter whether the OS tried to mmap the pages from the
highest or lowest address, because we always told the OS where to put the
page (and we knew the slot was free from the earlier block mmap).
Would this scheme not also work for PPC in a similar way? (Again, forgive
unfamiliarity with PPC! :-) )

/Bruce

> On 07/03/2016 14:13, Gowrishankar wrote:
> >From: Gowri Shankar
> >
> >For a secondary process address space to map hugepages from every
> >segment of the primary process, hugepage_file entries have to be mapped
> >in reverse from the list that the primary process updated for every
> >segment. The reason is that, on ppc64, hugepages are sorted by
> >decreasing address.
> >
> >Signed-off-by: Gowrishankar
> >---
> > lib/librte_eal/linuxapp/eal/eal_memory.c | 26 --
> > 1 file changed, 16 insertions(+), 10 deletions(-)
> >
> >diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
> >index 5b9132c..6aea5d0 100644
> >--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
> >+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
> >@@ -1400,7 +1400,7 @@ rte_eal_hugepage_attach(void)
> > {
> > 	const struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
> > 	const struct hugepage_file *hp = NULL;
> >-	unsigned num_hp = 0;
> >+	unsigned num_hp = 0, mapped_hp = 0;
> > 	unsigned i, s = 0; /* s used to track the segment number */
> > 	off_t size;
> > 	int fd, fd_zero = -1, fd_hugepage = -1;
> >@@ -1486,14 +1486,12 @@ rte_eal_hugepage_attach(void)
> > 		goto error;
> > 	}
> >-	num_hp = size / sizeof(struct hugepage_file);
> >-	RTE_LOG(DEBUG, EAL, "Analysing %u files\n", num_hp);
> >-
> > 	s = 0;
> > 	while (s < RTE_MAX_MEMSEG && mcfg->memseg[s].len > 0){
> > 		void *addr, *base_addr;
> > 		uintptr_t offset = 0;
> > 		size_t mapping_size;
> >+		unsigned int index;
> > #ifdef RTE_LIBRTE_IVSHMEM
> > 		/*
> > 		 * if segment has ioremap address set, it's an IVSHMEM segment and
> >@@ -1504,6 +1502,8 @@ rte_eal_hugepage_attach(void)
> > 			continue;
> > 		}
> > #endif
> >+		num_hp = mcfg->memseg[s].len / mcfg->memseg[s].hugepage_sz;
> >+		RTE_LOG(DEBUG, EAL, "Analysing %u files in segment %u\n", num_hp, s);
> > 		/*
> > 		 * free previously mapped memory so we can map the
> > 		 * hugepages into the space
> >@@ -1514,18 +1514,23 @@ rte_eal_hugepage_attach(void)
> > 		/* find the hugepages for this segment and map them
> > 		 * we don't need to worry about order, as the server sorted the
> > 		 * entries before it did the second mmap of them */
> >+#ifdef RTE_ARCH_PPC_64
> >+		for (i = num_hp-1; i < num_hp && offset < mcfg->memseg[s].len; i--){
> >+#else
> > 		for (i = 0; i < num_hp && offset < mcfg->memseg[s].len; i++){
> >-			if (hp[i].memseg_id == (int)s){
> >-				fd = open(hp[i].filepath, O_RDWR);
> >+#endif
> >+			index = i + mapped_hp;
> >+			if (hp[index].memseg_id == (int)s){
> >+				fd = open(hp[index].filepath, O_RDWR);
> > 				if (fd < 0) {
> > 					RTE_LOG(ERR, EAL, "Could not open %s\n",
> >-						hp[i].filepath);
[dpdk-dev] [PATCH] eal/ppc: fix secondary process to map hugepages in correct order
First of all, forgive my ignorance regarding ppc64 and if the questions are
naive, but after having a look at the already existing code for ppc64 and
this patch now, why are we doing this reverse mapping at all?

I guess the question revolves around the comment in eal_memory.c:
1316         /* On PPC64 architecture, the mmap always start from higher
1317          * virtual address to lower address. Here, both the physical
1318          * address and virtual address are in descending order */

From looking at the code, for ppc64 we do qsort in reverse order and
thereafter everything looks to be done to account for that reverse sorting.

CC: Chao Zhu and David Marchand as original author and reviewer of the code.

Sergio

On 07/03/2016 14:13, Gowrishankar wrote:
> From: Gowri Shankar
>
> For a secondary process address space to map hugepages from every segment
> of the primary process, hugepage_file entries have to be mapped in reverse
> from the list that the primary process updated for every segment. The
> reason is that, on ppc64, hugepages are sorted by decreasing address.
>
> Signed-off-by: Gowrishankar
> ---
>  lib/librte_eal/linuxapp/eal/eal_memory.c | 26 --
>  1 file changed, 16 insertions(+), 10 deletions(-)
>
> diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
> index 5b9132c..6aea5d0 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
> @@ -1400,7 +1400,7 @@ rte_eal_hugepage_attach(void)
>  {
>  	const struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
>  	const struct hugepage_file *hp = NULL;
> -	unsigned num_hp = 0;
> +	unsigned num_hp = 0, mapped_hp = 0;
>  	unsigned i, s = 0; /* s used to track the segment number */
>  	off_t size;
>  	int fd, fd_zero = -1, fd_hugepage = -1;
> @@ -1486,14 +1486,12 @@ rte_eal_hugepage_attach(void)
>  		goto error;
>  	}
>
> -	num_hp = size / sizeof(struct hugepage_file);
> -	RTE_LOG(DEBUG, EAL, "Analysing %u files\n", num_hp);
> -
>  	s = 0;
>  	while (s < RTE_MAX_MEMSEG && mcfg->memseg[s].len > 0){
>  		void *addr, *base_addr;
>  		uintptr_t offset = 0;
>  		size_t mapping_size;
> +		unsigned int index;
>  #ifdef RTE_LIBRTE_IVSHMEM
>  		/*
>  		 * if segment has ioremap address set, it's an IVSHMEM segment and
> @@ -1504,6 +1502,8 @@ rte_eal_hugepage_attach(void)
>  			continue;
>  		}
>  #endif
> +		num_hp = mcfg->memseg[s].len / mcfg->memseg[s].hugepage_sz;
> +		RTE_LOG(DEBUG, EAL, "Analysing %u files in segment %u\n", num_hp, s);
>  		/*
>  		 * free previously mapped memory so we can map the
>  		 * hugepages into the space
> @@ -1514,18 +1514,23 @@ rte_eal_hugepage_attach(void)
>  		/* find the hugepages for this segment and map them
>  		 * we don't need to worry about order, as the server sorted the
>  		 * entries before it did the second mmap of them */
> +#ifdef RTE_ARCH_PPC_64
> +		for (i = num_hp-1; i < num_hp && offset < mcfg->memseg[s].len; i--){
> +#else
>  		for (i = 0; i < num_hp && offset < mcfg->memseg[s].len; i++){
> -			if (hp[i].memseg_id == (int)s){
> -				fd = open(hp[i].filepath, O_RDWR);
> +#endif
> +			index = i + mapped_hp;
> +			if (hp[index].memseg_id == (int)s){
> +				fd = open(hp[index].filepath, O_RDWR);
>  				if (fd < 0) {
>  					RTE_LOG(ERR, EAL, "Could not open %s\n",
> -						hp[i].filepath);
> +						hp[index].filepath);
>  					goto error;
>  				}
>  #ifdef RTE_EAL_SINGLE_FILE_SEGMENTS
> -				mapping_size = hp[i].size * hp[i].repeated;
> +				mapping_size = hp[index].size * hp[index].repeated;
>  #else
> -				mapping_size = hp[i].size;
> +				mapping_size = hp[index].size;
>  #endif
>  				addr = mmap(RTE_PTR_ADD(base_addr, offset),
>  						mapping_size, PROT_READ | PROT_WRITE,
> @@ -1534,7 +1539,7 @@ rte_eal_hugepage_attach(void)
>  				if (addr == MAP_FAILED ||
>  						addr != RTE_PTR_ADD(base_addr, offset)) {
>  					RTE_LOG(ERR, EAL, "Could not mmap %s\n",
> -						hp[i].filepath);
[dpdk-dev] [PATCH] eal/ppc: fix secondary process to map hugepages in correct order
Sergio, your help is required here.
Thanks

2016-03-17 10:35, gowrishankar:
> Could this patch be reviewed, please?
>
> Thanks,
> Gowrishankar
>
> On Monday 07 March 2016 07:43 PM, Gowrishankar wrote:
> > From: Gowri Shankar
> >
> > For a secondary process address space to map hugepages from every
> > segment of the primary process, hugepage_file entries have to be
> > mapped in reverse from the list that the primary process updated for
> > every segment. The reason is that, in ppc64, hugepages are sorted by
> > decreasing address.
> >
> > Signed-off-by: Gowrishankar
> > ---
> >  lib/librte_eal/linuxapp/eal/eal_memory.c | 26 ++++++++++++++++----------
> >  1 file changed, 16 insertions(+), 10 deletions(-)
[dpdk-dev] [PATCH] eal/ppc: fix secondary process to map hugepages in correct order
On 22/03/2016 11:36, Thomas Monjalon wrote:
> Sergio, your help is required here.

I missed it with the /ppc tag. I'll get to it.

Sergio

> Thanks
>
> 2016-03-17 10:35, gowrishankar:
>> Could this patch be reviewed, please?
>>
>> Thanks,
>> Gowrishankar
>>
>> On Monday 07 March 2016 07:43 PM, Gowrishankar wrote:
>>> From: Gowri Shankar
>>>
>>> For a secondary process address space to map hugepages from every
>>> segment of the primary process, hugepage_file entries have to be
>>> mapped in reverse from the list that the primary process updated for
>>> every segment. The reason is that, in ppc64, hugepages are sorted by
>>> decreasing address.
>>>
>>> Signed-off-by: Gowrishankar
>>> ---
>>>  lib/librte_eal/linuxapp/eal/eal_memory.c | 26 ++++++++++++++++----------
>>>  1 file changed, 16 insertions(+), 10 deletions(-)
[dpdk-dev] [PATCH] eal/ppc: fix secondary process to map hugepages in correct order
Could this patch be reviewed, please?

Thanks,
Gowrishankar

On Monday 07 March 2016 07:43 PM, Gowrishankar wrote:
> From: Gowri Shankar
>
> For a secondary process address space to map hugepages from every segment
> of the primary process, hugepage_file entries have to be mapped in reverse
> from the list that the primary process updated for every segment. The
> reason is that, in ppc64, hugepages are sorted by decreasing address.
>
> Signed-off-by: Gowrishankar
> ---
>  lib/librte_eal/linuxapp/eal/eal_memory.c | 26 ++++++++++++++++----------
>  1 file changed, 16 insertions(+), 10 deletions(-)
> [...]
[dpdk-dev] [PATCH] eal/ppc: fix secondary process to map hugepages in correct order
From: Gowri Shankar

For a secondary process address space to map hugepages from every segment
of the primary process, hugepage_file entries have to be mapped in reverse
from the list that the primary process updated for every segment. The
reason is that, in ppc64, hugepages are sorted by decreasing address.

Signed-off-by: Gowrishankar
---
 lib/librte_eal/linuxapp/eal/eal_memory.c | 26 ++++++++++++++++----------
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index 5b9132c..6aea5d0 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -1400,7 +1400,7 @@ rte_eal_hugepage_attach(void)
 {
 	const struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	const struct hugepage_file *hp = NULL;
-	unsigned num_hp = 0;
+	unsigned num_hp = 0, mapped_hp = 0;
 	unsigned i, s = 0; /* s used to track the segment number */
 	off_t size;
 	int fd, fd_zero = -1, fd_hugepage = -1;
@@ -1486,14 +1486,12 @@ rte_eal_hugepage_attach(void)
 		goto error;
 	}
 
-	num_hp = size / sizeof(struct hugepage_file);
-	RTE_LOG(DEBUG, EAL, "Analysing %u files\n", num_hp);
-
 	s = 0;
 	while (s < RTE_MAX_MEMSEG && mcfg->memseg[s].len > 0){
 		void *addr, *base_addr;
 		uintptr_t offset = 0;
 		size_t mapping_size;
+		unsigned int index;
 #ifdef RTE_LIBRTE_IVSHMEM
 		/*
 		 * if segment has ioremap address set, it's an IVSHMEM segment and
@@ -1504,6 +1502,8 @@ rte_eal_hugepage_attach(void)
 			continue;
 		}
 #endif
+		num_hp = mcfg->memseg[s].len / mcfg->memseg[s].hugepage_sz;
+		RTE_LOG(DEBUG, EAL, "Analysing %u files in segment %u\n", num_hp, s);
 		/*
 		 * free previously mapped memory so we can map the
 		 * hugepages into the space
@@ -1514,18 +1514,23 @@ rte_eal_hugepage_attach(void)
 		/* find the hugepages for this segment and map them
 		 * we don't need to worry about order, as the server sorted the
 		 * entries before it did the second mmap of them */
+#ifdef RTE_ARCH_PPC_64
+		for (i = num_hp-1; i < num_hp && offset < mcfg->memseg[s].len; i--){
+#else
 		for (i = 0; i < num_hp && offset < mcfg->memseg[s].len; i++){
-			if (hp[i].memseg_id == (int)s){
-				fd = open(hp[i].filepath, O_RDWR);
+#endif
+			index = i + mapped_hp;
+			if (hp[index].memseg_id == (int)s){
+				fd = open(hp[index].filepath, O_RDWR);
 				if (fd < 0) {
 					RTE_LOG(ERR, EAL, "Could not open %s\n",
-						hp[i].filepath);
+						hp[index].filepath);
 					goto error;
 				}
 #ifdef RTE_EAL_SINGLE_FILE_SEGMENTS
-				mapping_size = hp[i].size * hp[i].repeated;
+				mapping_size = hp[index].size * hp[index].repeated;
 #else
-				mapping_size = hp[i].size;
+				mapping_size = hp[index].size;
 #endif
 				addr = mmap(RTE_PTR_ADD(base_addr, offset),
 						mapping_size, PROT_READ | PROT_WRITE,
@@ -1534,7 +1539,7 @@ rte_eal_hugepage_attach(void)
 				if (addr == MAP_FAILED ||
 						addr != RTE_PTR_ADD(base_addr, offset)) {
 					RTE_LOG(ERR, EAL, "Could not mmap %s\n",
-						hp[i].filepath);
+						hp[index].filepath);
 					goto error;
 				}
 				offset+=mapping_size;
@@ -1543,6 +1548,7 @@ rte_eal_hugepage_attach(void)
 		RTE_LOG(DEBUG, EAL, "Mapped segment %u of size 0x%llx\n", s,
 			(unsigned long long)mcfg->memseg[s].len);
 		s++;
+		mapped_hp += num_hp;
 	}
 	/* unmap the hugepage config file, since we are done using it */
 	munmap((void *)(uintptr_t)hp, size);
-- 
1.7.10.4