Re: [PATCH 0/4] powernv: kvm: numa fault improvement
Sorry for the late update. It took a long time to get a test machine, and then I hit a series of other bugs which I could not resolve easily. For now I have some higher-priority tasks, and will come back to this topic when time is available.

Besides this, I did some basic testing of numa-fault versus no numa-fault for an HV guest; it shows a 10% drop in performance when numa-fault is on. (Tested with "pg_access_random 60 4 200", and the guest has 10GB of mlocked pages.) I think this is caused by the following factors: cache misses, TLB misses, guest-host exits, and the hw-threads cooperating to exit from guest state. I hope my patches are helpful in reducing the cost of guest-host exits and of hw-thread cooperation to exit.

My test case launches 4 threads in the guest (as 4 hw-threads), and each of them does random accesses over a PAGE_ALIGNed area. I hope for some suggestions about the test case, so that when I have time, I can improve and finish the test.

Thanks, Fan

---
test case:
usage: pg_random_access secs fork_num mem_size
---

#include <ctype.h>
#include <errno.h>
#include <libgen.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <signal.h>
#include <time.h>
#include <unistd.h>
#include <sys/wait.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/timerfd.h>
#include <stdint.h>        /* definition of uint64_t */
#include <poll.h>

#define CMD_STOP        0x1234
#define SHM_FNAME       "/numafault_shm"
#define PAGE_SIZE       (1 << 12)

/* the protocol defined on the shm */
#define SHM_CMD_OFF     0x0
#define SHM_CNT_OFF     0x1
#define SHM_MESSAGE_OFF 0x2

#define handle_error(msg) \
        do { perror(msg); exit(EXIT_FAILURE); } while (0)

static inline void random_access(void *region_start, int len)
{
        int *p;
        int num;

        num = random();
        num &= ~(PAGE_SIZE - 1);  /* page-align the offset */
        num &= (len - 1);         /* note: bounds the offset correctly only if len is a power of two */
        p = region_start + num;
        *p = 0x654321;
}

static int numafault_body(int size_MB)
{
        /* MB is always a multiple of PAGE_SIZE, so it is ok to test faults per page */
        int size = size_MB * 1024 * 1024;
        void *region_start = malloc(size);
        unsigned long *pmap;
        int shm_fid;
        unsigned long cnt = 0;
        pid_t pid = getpid();
        char *dst;
        char buf[128];

        shm_fid = shm_open(SHM_FNAME, O_RDWR, S_IRUSR | S_IWUSR);
        ftruncate(shm_fid, 2 * sizeof(long));
        pmap = mmap(NULL, 2 * sizeof(long), PROT_WRITE | PROT_READ,
                    MAP_SHARED, shm_fid, 0);
        if (pmap == MAP_FAILED) {
                printf("child fails to set up mmap of shm\n");
                return -1;
        }
        while (*(pmap + SHM_CMD_OFF) != CMD_STOP) {
                random_access(region_start, size);
                cnt++;
        }
        __atomic_fetch_add(pmap + SHM_CNT_OFF, cnt, __ATOMIC_SEQ_CST);
        dst = (char *)(pmap + SHM_MESSAGE_OFF);
        /* tofix: needs a lock */
        sprintf(buf, "child [%i] cnt=%lu\n", pid, cnt);
        strcat(dst, buf);
        munmap(pmap, 2 * sizeof(long));
        shm_unlink(SHM_FNAME);
        fprintf(stdout, "[%i] cnt=%lu\n", pid, cnt);
        fflush(stdout);
        exit(0);
}

int main(int argc, char **argv)
{
        int i;
        pid_t pid;
        int shm_fid;
        unsigned long *pmap;
        int fork_num;
        int size;
        char *dst_info;
        struct itimerspec new_value;
        int fd;
        struct timespec now;
        uint64_t exp, tot_exp;
        ssize_t s;
        struct pollfd pfd;
        int elapsed;

        if (argc != 4) {
                fprintf(stderr, "%s wait-secs [secs elapsed before parent asks the children to exit]\n"
                        "  fork-num [child num]\n"
                        "  size [memory region covered by each child in MB]\n",
                        argv[0]);
                exit(EXIT_FAILURE);
        }
        elapsed = atoi(argv[1]);
        fork_num = atoi(argv[2]);
        size = atoi(argv[3]);
        printf("fork %i child processes to test mem %i MB for a period: %i sec\n",
               fork_num, size, elapsed);

        fd = timerfd_create(CLOCK_REALTIME, 0);
        if (fd == -1)
                handle_error("timerfd_create");

        shm_fid = shm_open(SHM_FNAME, O_CREAT | O_RDWR, S_IRUSR | S_IWUSR);
        ftruncate(shm_fid, PAGE_SIZE);
        pmap = mmap(NULL, PAGE_SIZE, PROT_WRITE | PROT_READ,
                    MAP_SHARED, shm_fid, 0);
        if (pmap == MAP_FAILED) {
                printf("fail to set up mmap of shm\n");
                return -1;
        }
        memset(pmap, 0, 2 * sizeof(long));
        /* wmb(); */
        for (i = 0; i < fork_num; i++) {
                switch (pid = fork()) {
                case 0:         /* child */
                        numafault_body(size);
                        exit(0);
                case -1:        /* error */
                        fprintf(stderr, "fork failed: %s\n", strerror(errno));
                        break;
                default:        /* parent */
                        printf("fork child [%i]\n", pid);
                }
        }
        /* ... the rest of main() (the timerfd wait, posting CMD_STOP, and
         * collecting the counters) was truncated in the original mail ... */
Re: [PATCH 0/4] powernv: kvm: numa fault improvement
On Wed, Jan 22, 2014 at 1:18 PM, Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com wrote:

Paul Mackerras pau...@samba.org writes:

On Mon, Jan 20, 2014 at 03:48:36PM +0100, Alexander Graf wrote:

On 15.01.2014, at 07:36, Liu ping fan kernelf...@gmail.com wrote:

On Thu, Jan 9, 2014 at 8:08 PM, Alexander Graf ag...@suse.de wrote:

On 11.12.2013, at 09:47, Liu Ping Fan kernelf...@gmail.com wrote:

This series is based on Aneesh's series "[PATCH -V2 0/5] powerpc: mm: Numa faults support for ppc64". For this series, I apply the same idea from the previous thread "[PATCH 0/3] optimize for powerpc _PAGE_NUMA" (for which I am still trying to get a machine to show numbers). But for this series, I think that I have a good justification -- the well-known fact of the heavy cost of switching context between guest and host.

This cover letter isn't really telling me anything. Please put a proper description of what you're trying to achieve, why you're trying to achieve it, and convince your readers that it's a good idea to do it the way you do it.

Sorry for the unclear message. After introducing _PAGE_NUMA, kvmppc_do_h_enter() cannot fill in the hpte for the guest. Instead, it has to rely on the host's kvmppc_book3s_hv_page_fault() calling do_numa_page() to do the NUMA fault check. This incurs overhead when exiting from rmode to vmode. My idea is that in kvmppc_do_h_enter(), we do a quick check: if the page is placed on the right node, there is no need to exit to vmode (i.e. saving the htab, slb switching).

If my assumption is correct, I will CC k...@vger.kernel.org from the next version.

This translates to me as "This is an RFC"?

Yes, I am not quite sure about it. I have no bare metal to verify it on. So I hope that, at least in theory, it is correct.

Paul, could you please give this some thought and maybe benchmark it?

OK, once I get Aneesh to tell me how I get to have ptes with _PAGE_NUMA set in the first place. :)

I guess we want patch 2, which Liu has sent separately and I have reviewed:

http://article.gmane.org/gmane.comp.emulators.kvm.powerpc.devel/8619

I am not sure about the rest of the patches in the series. We definitely don't want to NUMA-migrate on h_enter. We may want to do that on fault. But even there, IMHO, we should let the host take the fault and do the NUMA migration instead of doing this in guest context.

My patch does NOT do the NUMA migration in guest context (h_enter). Instead it just does a pre-check to see whether the NUMA migration is needed. If it is needed, the host takes the fault and does the NUMA migration as it currently does. Otherwise, h_enter can directly set up the hpte without HPTE_V_ABSENT. And since pte_mknuma() is called system-wide periodically, it is quite likely that the guest will suffer from HPTE_V_ABSENT. (As in my previous reply, I think we should also place the quick check in kvmppc_hpte_hv_fault().)

Thx, Fan

-aneesh

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 0/4] powernv: kvm: numa fault improvement
On Tue, Jan 21, 2014 at 11:40 AM, Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com wrote:

Liu ping fan kernelf...@gmail.com writes:

On Mon, Jan 20, 2014 at 11:45 PM, Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com wrote:

Liu ping fan kernelf...@gmail.com writes:

On Thu, Jan 9, 2014 at 8:08 PM, Alexander Graf ag...@suse.de wrote:

On 11.12.2013, at 09:47, Liu Ping Fan kernelf...@gmail.com wrote:

This series is based on Aneesh's series "[PATCH -V2 0/5] powerpc: mm: Numa faults support for ppc64". For this series, I apply the same idea from the previous thread "[PATCH 0/3] optimize for powerpc _PAGE_NUMA" (for which I am still trying to get a machine to show numbers). But for this series, I think that I have a good justification -- the well-known fact of the heavy cost of switching context between guest and host.

This cover letter isn't really telling me anything. Please put a proper description of what you're trying to achieve, why you're trying to achieve it, and convince your readers that it's a good idea to do it the way you do it.

Sorry for the unclear message. After introducing _PAGE_NUMA, kvmppc_do_h_enter() cannot fill in the hpte for the guest. Instead, it has to rely on the host's kvmppc_book3s_hv_page_fault() calling do_numa_page() to do the NUMA fault check. This incurs overhead when exiting from rmode to vmode. My idea is that in kvmppc_do_h_enter(), we do a quick check: if the page is placed on the right node, there is no need to exit to vmode (i.e. saving the htab, slb switching).

Can you explain more? Are we looking at hcalls from the guest being handled by the hypervisor in real mode? If so, why would the guest issue an hcall on a pte entry that has _PAGE_NUMA set? Or is this about the hypervisor handling a missing hpte because the host swapped the page out? In that case, how do we end up in h_enter? IIUC, for that case we should get to kvmppc_hpte_hv_fault.

After setting _PAGE_NUMA, we should flush out all hptes, both in the host's htab and the guest's. So when the guest tries to access memory, the host finds that there is no hpte ready for the guest in the guest's htab, and the host raises a dsi to the guest. Now the guest receives that fault, removes the _PAGE_NUMA bit, and does an hpte_insert. So before we do an hpte_insert (or H_ENTER) we should have cleared the _PAGE_NUMA bit. This is how the guest ends up in h_enter. And as you can see in the current code, we also try this quick path first; only if it fails do we resort to the slow path -- kvmppc_hpte_hv_fault.

hmm? hpte_hv_fault is the hypervisor handling the fault.

After our discussion on IRC, I think we should also do the fast check in kvmppc_hpte_hv_fault() for the HPTE_V_ABSENT case, and let H_ENTER take care of the remaining case, i.e. no hpte at pte_mknuma() time. Right?

Thanks and regards, Fan

-aneesh
Re: [PATCH 0/4] powernv: kvm: numa fault improvement
On Tue, Jan 21, 2014 at 5:07 PM, Liu ping fan kernelf...@gmail.com wrote:

On Tue, Jan 21, 2014 at 11:40 AM, Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com wrote:

Liu ping fan kernelf...@gmail.com writes:

On Mon, Jan 20, 2014 at 11:45 PM, Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com wrote:

Liu ping fan kernelf...@gmail.com writes:

On Thu, Jan 9, 2014 at 8:08 PM, Alexander Graf ag...@suse.de wrote:

On 11.12.2013, at 09:47, Liu Ping Fan kernelf...@gmail.com wrote:

This series is based on Aneesh's series "[PATCH -V2 0/5] powerpc: mm: Numa faults support for ppc64". For this series, I apply the same idea from the previous thread "[PATCH 0/3] optimize for powerpc _PAGE_NUMA" (for which I am still trying to get a machine to show numbers). But for this series, I think that I have a good justification -- the well-known fact of the heavy cost of switching context between guest and host.

This cover letter isn't really telling me anything. Please put a proper description of what you're trying to achieve, why you're trying to achieve it, and convince your readers that it's a good idea to do it the way you do it.

Sorry for the unclear message. After introducing _PAGE_NUMA, kvmppc_do_h_enter() cannot fill in the hpte for the guest. Instead, it has to rely on the host's kvmppc_book3s_hv_page_fault() calling do_numa_page() to do the NUMA fault check. This incurs overhead when exiting from rmode to vmode. My idea is that in kvmppc_do_h_enter(), we do a quick check: if the page is placed on the right node, there is no need to exit to vmode (i.e. saving the htab, slb switching).

Can you explain more? Are we looking at hcalls from the guest being handled by the hypervisor in real mode? If so, why would the guest issue an hcall on a pte entry that has _PAGE_NUMA set? Or is this about the hypervisor handling a missing hpte because the host swapped the page out? In that case, how do we end up in h_enter? IIUC, for that case we should get to kvmppc_hpte_hv_fault.

After setting _PAGE_NUMA, we should flush out all hptes, both in the host's htab and the guest's. So when the guest tries to access memory, the host finds that there is no hpte ready for the guest in the guest's htab, and the host raises a dsi to the guest. Now the guest receives that fault, removes the _PAGE_NUMA bit, and does an hpte_insert. So before we do an hpte_insert (or H_ENTER) we should have cleared the _PAGE_NUMA bit. This is how the guest ends up in h_enter. And as you can see in the current code, we also try this quick path first; only if it fails do we resort to the slow path -- kvmppc_hpte_hv_fault.

hmm? hpte_hv_fault is the hypervisor handling the fault.

After our discussion on IRC, I think we should also do the fast check in kvmppc_hpte_hv_fault() for the HPTE_V_ABSENT case, and let H_ENTER take care of the remaining case, i.e. no hpte at pte_mknuma() time. Right? Or we can delay the quick fix past H_ENTER, let the host fault again, and do the fix in kvmppc_hpte_hv_fault().

Thanks and regards, Fan

-aneesh
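To make the control flow under discussion concrete: the proposal boils down to a node-placement pre-check on the real-mode path. The sketch below is an illustration only, not the actual patch. pte_numa(), HPTE_V_ABSENT, and H_TOO_HARD are existing kernel symbols of that era; page_on_right_node() and install_valid_hpte() are hypothetical stand-ins for whatever real-mode-safe checks and hpte setup the series implements:

```c
/*
 * Illustrative pseudocode only -- not the actual patch. Sketches the
 * fast path proposed for kvmppc_do_h_enter() (and, per the discussion
 * above, possibly for the HPTE_V_ABSENT case of kvmppc_hpte_hv_fault()).
 */
if (pte_numa(pte)) {
	if (page_on_right_node(kvm, pte)) {
		/*
		 * Page is correctly placed: install a valid hpte directly
		 * in real mode, without HPTE_V_ABSENT, avoiding the
		 * rmode->vmode exit (htab save, slb switch).
		 */
		install_valid_hpte();
	} else {
		/*
		 * Page is mis-placed: keep the slow path. Returning
		 * H_TOO_HARD lets the host take the fault so that
		 * do_numa_page() can migrate the page.
		 */
		return H_TOO_HARD;
	}
}
```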
Re: [PATCH 0/4] powernv: kvm: numa fault improvement
On Mon, Jan 20, 2014 at 03:48:36PM +0100, Alexander Graf wrote:

On 15.01.2014, at 07:36, Liu ping fan kernelf...@gmail.com wrote:

On Thu, Jan 9, 2014 at 8:08 PM, Alexander Graf ag...@suse.de wrote:

On 11.12.2013, at 09:47, Liu Ping Fan kernelf...@gmail.com wrote:

This series is based on Aneesh's series "[PATCH -V2 0/5] powerpc: mm: Numa faults support for ppc64". For this series, I apply the same idea from the previous thread "[PATCH 0/3] optimize for powerpc _PAGE_NUMA" (for which I am still trying to get a machine to show numbers). But for this series, I think that I have a good justification -- the well-known fact of the heavy cost of switching context between guest and host.

This cover letter isn't really telling me anything. Please put a proper description of what you're trying to achieve, why you're trying to achieve it, and convince your readers that it's a good idea to do it the way you do it.

Sorry for the unclear message. After introducing _PAGE_NUMA, kvmppc_do_h_enter() cannot fill in the hpte for the guest. Instead, it has to rely on the host's kvmppc_book3s_hv_page_fault() calling do_numa_page() to do the NUMA fault check. This incurs overhead when exiting from rmode to vmode. My idea is that in kvmppc_do_h_enter(), we do a quick check: if the page is placed on the right node, there is no need to exit to vmode (i.e. saving the htab, slb switching).

If my assumption is correct, I will CC k...@vger.kernel.org from the next version.

This translates to me as "This is an RFC"?

Yes, I am not quite sure about it. I have no bare metal to verify it on. So I hope that, at least in theory, it is correct.

Paul, could you please give this some thought and maybe benchmark it?

OK, once I get Aneesh to tell me how I get to have ptes with _PAGE_NUMA set in the first place. :)

Paul.
Re: [PATCH 0/4] powernv: kvm: numa fault improvement
Paul Mackerras pau...@samba.org writes:

On Mon, Jan 20, 2014 at 03:48:36PM +0100, Alexander Graf wrote:

On 15.01.2014, at 07:36, Liu ping fan kernelf...@gmail.com wrote:

On Thu, Jan 9, 2014 at 8:08 PM, Alexander Graf ag...@suse.de wrote:

On 11.12.2013, at 09:47, Liu Ping Fan kernelf...@gmail.com wrote:

This series is based on Aneesh's series "[PATCH -V2 0/5] powerpc: mm: Numa faults support for ppc64". For this series, I apply the same idea from the previous thread "[PATCH 0/3] optimize for powerpc _PAGE_NUMA" (for which I am still trying to get a machine to show numbers). But for this series, I think that I have a good justification -- the well-known fact of the heavy cost of switching context between guest and host.

This cover letter isn't really telling me anything. Please put a proper description of what you're trying to achieve, why you're trying to achieve it, and convince your readers that it's a good idea to do it the way you do it.

Sorry for the unclear message. After introducing _PAGE_NUMA, kvmppc_do_h_enter() cannot fill in the hpte for the guest. Instead, it has to rely on the host's kvmppc_book3s_hv_page_fault() calling do_numa_page() to do the NUMA fault check. This incurs overhead when exiting from rmode to vmode. My idea is that in kvmppc_do_h_enter(), we do a quick check: if the page is placed on the right node, there is no need to exit to vmode (i.e. saving the htab, slb switching).

If my assumption is correct, I will CC k...@vger.kernel.org from the next version.

This translates to me as "This is an RFC"?

Yes, I am not quite sure about it. I have no bare metal to verify it on. So I hope that, at least in theory, it is correct.

Paul, could you please give this some thought and maybe benchmark it?

OK, once I get Aneesh to tell me how I get to have ptes with _PAGE_NUMA set in the first place. :)

I guess we want patch 2, which Liu has sent separately and I have reviewed:

http://article.gmane.org/gmane.comp.emulators.kvm.powerpc.devel/8619

I am not sure about the rest of the patches in the series. We definitely don't want to NUMA-migrate on h_enter. We may want to do that on fault. But even there, IMHO, we should let the host take the fault and do the NUMA migration instead of doing this in guest context.

-aneesh
Re: [PATCH 0/4] powernv: kvm: numa fault improvement
On 15.01.2014, at 07:36, Liu ping fan kernelf...@gmail.com wrote:

On Thu, Jan 9, 2014 at 8:08 PM, Alexander Graf ag...@suse.de wrote:

On 11.12.2013, at 09:47, Liu Ping Fan kernelf...@gmail.com wrote:

This series is based on Aneesh's series "[PATCH -V2 0/5] powerpc: mm: Numa faults support for ppc64". For this series, I apply the same idea from the previous thread "[PATCH 0/3] optimize for powerpc _PAGE_NUMA" (for which I am still trying to get a machine to show numbers). But for this series, I think that I have a good justification -- the well-known fact of the heavy cost of switching context between guest and host.

This cover letter isn't really telling me anything. Please put a proper description of what you're trying to achieve, why you're trying to achieve it, and convince your readers that it's a good idea to do it the way you do it.

Sorry for the unclear message. After introducing _PAGE_NUMA, kvmppc_do_h_enter() cannot fill in the hpte for the guest. Instead, it has to rely on the host's kvmppc_book3s_hv_page_fault() calling do_numa_page() to do the NUMA fault check. This incurs overhead when exiting from rmode to vmode. My idea is that in kvmppc_do_h_enter(), we do a quick check: if the page is placed on the right node, there is no need to exit to vmode (i.e. saving the htab, slb switching).

If my assumption is correct, I will CC k...@vger.kernel.org from the next version.

This translates to me as "This is an RFC"?

Yes, I am not quite sure about it. I have no bare metal to verify it on. So I hope that, at least in theory, it is correct.

Paul, could you please give this some thought and maybe benchmark it?

Alex
Re: [PATCH 0/4] powernv: kvm: numa fault improvement
Liu ping fan kernelf...@gmail.com writes:

On Thu, Jan 9, 2014 at 8:08 PM, Alexander Graf ag...@suse.de wrote:

On 11.12.2013, at 09:47, Liu Ping Fan kernelf...@gmail.com wrote:

This series is based on Aneesh's series "[PATCH -V2 0/5] powerpc: mm: Numa faults support for ppc64". For this series, I apply the same idea from the previous thread "[PATCH 0/3] optimize for powerpc _PAGE_NUMA" (for which I am still trying to get a machine to show numbers). But for this series, I think that I have a good justification -- the well-known fact of the heavy cost of switching context between guest and host.

This cover letter isn't really telling me anything. Please put a proper description of what you're trying to achieve, why you're trying to achieve it, and convince your readers that it's a good idea to do it the way you do it.

Sorry for the unclear message. After introducing _PAGE_NUMA, kvmppc_do_h_enter() cannot fill in the hpte for the guest. Instead, it has to rely on the host's kvmppc_book3s_hv_page_fault() calling do_numa_page() to do the NUMA fault check. This incurs overhead when exiting from rmode to vmode. My idea is that in kvmppc_do_h_enter(), we do a quick check: if the page is placed on the right node, there is no need to exit to vmode (i.e. saving the htab, slb switching).

Can you explain more? Are we looking at hcalls from the guest being handled by the hypervisor in real mode? If so, why would the guest issue an hcall on a pte entry that has _PAGE_NUMA set? Or is this about the hypervisor handling a missing hpte because the host swapped the page out? In that case, how do we end up in h_enter? IIUC, for that case we should get to kvmppc_hpte_hv_fault.

If my assumption is correct, I will CC k...@vger.kernel.org from the next version.

This translates to me as "This is an RFC"?

Yes, I am not quite sure about it. I have no bare metal to verify it on. So I hope that, at least in theory, it is correct.

-aneesh
Re: [PATCH 0/4] powernv: kvm: numa fault improvement
On Mon, Jan 20, 2014 at 11:45 PM, Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com wrote:

Liu ping fan kernelf...@gmail.com writes:

On Thu, Jan 9, 2014 at 8:08 PM, Alexander Graf ag...@suse.de wrote:

On 11.12.2013, at 09:47, Liu Ping Fan kernelf...@gmail.com wrote:

This series is based on Aneesh's series "[PATCH -V2 0/5] powerpc: mm: Numa faults support for ppc64". For this series, I apply the same idea from the previous thread "[PATCH 0/3] optimize for powerpc _PAGE_NUMA" (for which I am still trying to get a machine to show numbers). But for this series, I think that I have a good justification -- the well-known fact of the heavy cost of switching context between guest and host.

This cover letter isn't really telling me anything. Please put a proper description of what you're trying to achieve, why you're trying to achieve it, and convince your readers that it's a good idea to do it the way you do it.

Sorry for the unclear message. After introducing _PAGE_NUMA, kvmppc_do_h_enter() cannot fill in the hpte for the guest. Instead, it has to rely on the host's kvmppc_book3s_hv_page_fault() calling do_numa_page() to do the NUMA fault check. This incurs overhead when exiting from rmode to vmode. My idea is that in kvmppc_do_h_enter(), we do a quick check: if the page is placed on the right node, there is no need to exit to vmode (i.e. saving the htab, slb switching).

Can you explain more? Are we looking at hcalls from the guest being handled by the hypervisor in real mode? If so, why would the guest issue an hcall on a pte entry that has _PAGE_NUMA set? Or is this about the hypervisor handling a missing hpte because the host swapped the page out? In that case, how do we end up in h_enter? IIUC, for that case we should get to kvmppc_hpte_hv_fault.

After setting _PAGE_NUMA, we should flush out all hptes, both in the host's htab and the guest's. So when the guest tries to access memory, the host finds that there is no hpte ready for the guest in the guest's htab, and the host raises a dsi to the guest. This is how the guest ends up in h_enter. And as you can see in the current code, we also try this quick path first; only if it fails do we resort to the slow path -- kvmppc_hpte_hv_fault.

Thanks and regards, Fan

If my assumption is correct, I will CC k...@vger.kernel.org from the next version.

This translates to me as "This is an RFC"?

Yes, I am not quite sure about it. I have no bare metal to verify it on. So I hope that, at least in theory, it is correct.

-aneesh
Re: [PATCH 0/4] powernv: kvm: numa fault improvement
Liu ping fan kernelf...@gmail.com writes:

On Mon, Jan 20, 2014 at 11:45 PM, Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com wrote:

Liu ping fan kernelf...@gmail.com writes:

On Thu, Jan 9, 2014 at 8:08 PM, Alexander Graf ag...@suse.de wrote:

On 11.12.2013, at 09:47, Liu Ping Fan kernelf...@gmail.com wrote:

This series is based on Aneesh's series "[PATCH -V2 0/5] powerpc: mm: Numa faults support for ppc64". For this series, I apply the same idea from the previous thread "[PATCH 0/3] optimize for powerpc _PAGE_NUMA" (for which I am still trying to get a machine to show numbers). But for this series, I think that I have a good justification -- the well-known fact of the heavy cost of switching context between guest and host.

This cover letter isn't really telling me anything. Please put a proper description of what you're trying to achieve, why you're trying to achieve it, and convince your readers that it's a good idea to do it the way you do it.

Sorry for the unclear message. After introducing _PAGE_NUMA, kvmppc_do_h_enter() cannot fill in the hpte for the guest. Instead, it has to rely on the host's kvmppc_book3s_hv_page_fault() calling do_numa_page() to do the NUMA fault check. This incurs overhead when exiting from rmode to vmode. My idea is that in kvmppc_do_h_enter(), we do a quick check: if the page is placed on the right node, there is no need to exit to vmode (i.e. saving the htab, slb switching).

Can you explain more? Are we looking at hcalls from the guest being handled by the hypervisor in real mode? If so, why would the guest issue an hcall on a pte entry that has _PAGE_NUMA set? Or is this about the hypervisor handling a missing hpte because the host swapped the page out? In that case, how do we end up in h_enter? IIUC, for that case we should get to kvmppc_hpte_hv_fault.

After setting _PAGE_NUMA, we should flush out all hptes, both in the host's htab and the guest's. So when the guest tries to access memory, the host finds that there is no hpte ready for the guest in the guest's htab, and the host raises a dsi to the guest. Now the guest receives that fault, removes the _PAGE_NUMA bit, and does an hpte_insert. So before we do an hpte_insert (or H_ENTER) we should have cleared the _PAGE_NUMA bit. This is how the guest ends up in h_enter. And as you can see in the current code, we also try this quick path first; only if it fails do we resort to the slow path -- kvmppc_hpte_hv_fault.

hmm? hpte_hv_fault is the hypervisor handling the fault.

-aneesh
Re: [PATCH 0/4] powernv: kvm: numa fault improvement
On Thu, Jan 9, 2014 at 8:08 PM, Alexander Graf ag...@suse.de wrote:

On 11.12.2013, at 09:47, Liu Ping Fan kernelf...@gmail.com wrote:

This series is based on Aneesh's series "[PATCH -V2 0/5] powerpc: mm: Numa faults support for ppc64". For this series, I apply the same idea from the previous thread "[PATCH 0/3] optimize for powerpc _PAGE_NUMA" (for which I am still trying to get a machine to show numbers). But for this series, I think that I have a good justification -- the well-known fact of the heavy cost of switching context between guest and host.

This cover letter isn't really telling me anything. Please put a proper description of what you're trying to achieve, why you're trying to achieve it, and convince your readers that it's a good idea to do it the way you do it.

Sorry for the unclear message. After introducing _PAGE_NUMA, kvmppc_do_h_enter() cannot fill in the hpte for the guest. Instead, it has to rely on the host's kvmppc_book3s_hv_page_fault() calling do_numa_page() to do the NUMA fault check. This incurs overhead when exiting from rmode to vmode. My idea is that in kvmppc_do_h_enter(), we do a quick check: if the page is placed on the right node, there is no need to exit to vmode (i.e. saving the htab, slb switching).

If my assumption is correct, I will CC k...@vger.kernel.org from the next version.

This translates to me as "This is an RFC"?

Yes, I am not quite sure about it. I have no bare metal to verify it on. So I hope that, at least in theory, it is correct.

Thanks and regards, Ping Fan

Alex