Re: [PATCH 0/4] powernv: kvm: numa fault improvement

2014-02-25 Thread Liu ping fan
Sorry for the late update. It took a long time to get a test machine,
and then I hit a series of other bugs which I could not resolve
easily. For now I have some higher-priority tasks, and will come
back to this topic when time is available.
Besides this, I have done some basic tests of numa-fault versus no
numa-fault for an HV guest; they show a 10% drop in performance
when numa-fault is on. (Tested with $pg_access_random 60 4 200, and
the guest has 10GB of mlocked pages.)
I think this is caused by the following factors: cache misses,
tlb misses, guest-host exits, and the hw-threads cooperating to exit from
guest state. I hope my patches help to reduce the cost of
guest-host exits and the hw-threads' cooperative exit.

My test case launches 4 threads in the guest (as 4 hw-threads), and each
of them randomly accesses a PAGE_ALIGNed area.
I hope for some suggestions about the test case, so that when I have
time, I can improve and finish the test.

Thanks,
Fan

--- test case: usage: pg_random_access  secs  fork_num  mem_size---
#include <ctype.h>
#include <errno.h>
#include <libgen.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <signal.h>
#include <time.h>
#include <unistd.h>
#include <sys/wait.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/timerfd.h>
#include <stdint.h>	/* Definition of uint64_t */
#include <poll.h>


#define CMD_STOP 0x1234
#define SHM_FNAME "/numafault_shm"
#define PAGE_SIZE (1 << 12)	/* assume 4 KiB pages */

/* the protocol defined on the shm */
#define SHM_CMD_OFF 0x0
#define SHM_CNT_OFF 0x1
#define SHM_MESSAGE_OFF 0x2

#define handle_error(msg) \
do { perror(msg); exit(EXIT_FAILURE); } while (0)


static __inline__ void random_access(void *region_start, int len)
{
	int *p;
	long num;

	num = random();
	num &= ~(PAGE_SIZE - 1);	/* round down to a page boundary */
	num &= (len - 1);		/* stay inside the region */
	p = region_start + num;
	*p = 0x654321;
}

static int numafault_body(int size_MB)
{
	/* an MB is always a multiple of PAGE_SIZE, so it is fine to test
	   faults at page granularity */
	int size = size_MB*1024*1024;
	void *region_start = malloc(size);
	unsigned long *pmap;
	int shm_fid;
	unsigned long cnt = 0;
	pid_t pid = getpid();
	char *dst;
	char buf[128];

	shm_fid = shm_open(SHM_FNAME, O_RDWR, S_IRUSR | S_IWUSR);
	/* the parent has already sized the object with ftruncate() */
	pmap = mmap(NULL, PAGE_SIZE, PROT_WRITE | PROT_READ,
		MAP_SHARED, shm_fid, 0);
	if (pmap == MAP_FAILED) {
		printf("child fails to set up mmap of shm\n");
		return -1;
	}

	while (*(volatile unsigned long *)(pmap+SHM_CMD_OFF) != CMD_STOP){
		random_access(region_start, size);
		cnt++;
	}

	__atomic_fetch_add((pmap+SHM_CNT_OFF), cnt, __ATOMIC_SEQ_CST);
	dst = (char *)(pmap+SHM_MESSAGE_OFF);
	//tofix, need lock
	sprintf(buf, "child [%i] cnt=%lu\n", pid, cnt);
	strcat(dst, buf);

	/* unlinking the shm object is left to the parent */
	munmap(pmap, PAGE_SIZE);
	fprintf(stdout, "[%i] cnt=%lu\n", pid, cnt);
	fflush(stdout);
	exit(0);
}

int main(int argc, char **argv)
{
int i;
pid_t pid;
int shm_fid;
unsigned long *pmap;
int fork_num;
int size;
char *dst_info;

struct itimerspec new_value;
int fd;
struct timespec now;
uint64_t exp, tot_exp;
ssize_t s;
struct pollfd pfd;
int elapsed;

	if (argc != 4){
		fprintf(stderr, "%s wait-secs [secs elapsed before parent"
			" asks the children to exit]\n"
			" fork-num [child num]\n"
			" size [memory region covered by each child in MB]\n",
			argv[0]);
		exit(EXIT_FAILURE);
	}
	elapsed = atoi(argv[1]);
	fork_num = atoi(argv[2]);
	size = atoi(argv[3]);
	printf("fork %i child processes to test mem %i MB for a period: %i sec\n",
		fork_num, size, elapsed);

	fd = timerfd_create(CLOCK_REALTIME, 0);
	if (fd == -1)
		handle_error("timerfd_create");

	shm_fid = shm_open(SHM_FNAME, O_CREAT | O_RDWR, S_IRUSR | S_IWUSR);
	ftruncate(shm_fid, PAGE_SIZE);
	pmap = mmap(NULL, PAGE_SIZE, PROT_WRITE | PROT_READ,
		MAP_SHARED, shm_fid, 0);
	if (pmap == MAP_FAILED) {
		printf("fail to set up mmap of shm\n");
		return -1;
	}
	memset(pmap, 0, PAGE_SIZE);	/* also clears the message area */
	//wmb();

	for (i = 0; i < fork_num; i++){
		switch (pid = fork())
		{
		case 0:		/* child */
			numafault_body(size);
			exit(0);
		case -1:	/* error */
			fprintf(stderr, "fork failed: %s\n", strerror(errno));
			break;
		default:	/* parent */
			printf("fork child [%i]\n", pid);
  

Re: [PATCH 0/4] powernv: kvm: numa fault improvement

2014-01-22 Thread Liu ping fan
On Wed, Jan 22, 2014 at 1:18 PM, Aneesh Kumar K.V
aneesh.ku...@linux.vnet.ibm.com wrote:
 Paul Mackerras pau...@samba.org writes:

 On Mon, Jan 20, 2014 at 03:48:36PM +0100, Alexander Graf wrote:

 On 15.01.2014, at 07:36, Liu ping fan kernelf...@gmail.com wrote:

  On Thu, Jan 9, 2014 at 8:08 PM, Alexander Graf ag...@suse.de wrote:
 
  On 11.12.2013, at 09:47, Liu Ping Fan kernelf...@gmail.com wrote:
 
  This series is based on Aneesh's series  [PATCH -V2 0/5] powerpc: mm: 
  Numa faults support for ppc64
 
  For this series, I apply the same idea from the previous thread [PATCH 
  0/3] optimize for powerpc _PAGE_NUMA
  (for which, I still try to get a machine to show nums)
 
  But for this series, I think that I have a good justification -- the 
  fact of heavy cost when switching context between guest and host,
  which is  well known.
 
  This cover letter isn't really telling me anything. Please put a proper 
  description of what you're trying to achieve, why you're trying to 
  achieve what you're trying and convince your readers that it's a good 
  idea to do it the way you do it.
 
  Sorry for the unclear message. After introducing the _PAGE_NUMA,
  kvmppc_do_h_enter() can not fill up the hpte for guest. Instead, it
  should rely on host's kvmppc_book3s_hv_page_fault() to call
  do_numa_page() to do the numa fault check. This incurs the overhead
  when exiting from rmode to vmode.  My idea is that in
  kvmppc_do_h_enter(), we do a quick check, if the page is right placed,
  there is no need to exit to vmode (i.e saving htab, slab switching)
 
  If my suppose is correct, will CCing k...@vger.kernel.org from next 
  version.
 
  This translates to me as This is an RFC?
 
  Yes, I am not quite sure about it. I have no bare-metal to verify it.
  So I hope at least, from the theory, it is correct.

 Paul, could you please give this some thought and maybe benchmark it?

 OK, once I get Aneesh to tell me how I get to have ptes with
 _PAGE_NUMA set in the first place. :)


 I guess we want patch 2, Which Liu has sent separately and I have
 reviewed. http://article.gmane.org/gmane.comp.emulators.kvm.powerpc.devel/8619
 I am not sure about the rest of the patches in the series.
 We definitely don't want to numa migrate on henter. We may want to do
 that on fault. But even there, IMHO, we should let the host take the
 fault and do the numa migration instead of doing this in guest context.

My patch does NOT do the numa migration in guest context (h_enter).
Instead it just does a pre-check to see whether numa migration is
needed. If needed, the host will take the fault and do the numa
migration as it currently does. Otherwise, h_enter can directly set up
the hpte without HPTE_V_ABSENT.
And since pte_mknuma() is called system-wide periodically, the guest
is more likely to suffer from HPTE_V_ABSENT. (As in my previous
reply, I think we should also place the quick check in
kvmppc_hpte_hv_fault.)

Thx,
Fan

 -aneesh

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 0/4] powernv: kvm: numa fault improvement

2014-01-21 Thread Liu ping fan
On Tue, Jan 21, 2014 at 11:40 AM, Aneesh Kumar K.V
aneesh.ku...@linux.vnet.ibm.com wrote:
 Liu ping fan kernelf...@gmail.com writes:

 On Mon, Jan 20, 2014 at 11:45 PM, Aneesh Kumar K.V
 aneesh.ku...@linux.vnet.ibm.com wrote:
 Liu ping fan kernelf...@gmail.com writes:

 On Thu, Jan 9, 2014 at 8:08 PM, Alexander Graf ag...@suse.de wrote:

 On 11.12.2013, at 09:47, Liu Ping Fan kernelf...@gmail.com wrote:

 This series is based on Aneesh's series  [PATCH -V2 0/5] powerpc: mm: 
 Numa faults support for ppc64

 For this series, I apply the same idea from the previous thread [PATCH 
 0/3] optimize for powerpc _PAGE_NUMA
 (for which, I still try to get a machine to show nums)

 But for this series, I think that I have a good justification -- the 
 fact of heavy cost when switching context between guest and host,
 which is  well known.

 This cover letter isn't really telling me anything. Please put a proper 
 description of what you're trying to achieve, why you're trying to 
 achieve what you're trying and convince your readers that it's a good 
 idea to do it the way you do it.

 Sorry for the unclear message. After introducing the _PAGE_NUMA,
 kvmppc_do_h_enter() can not fill up the hpte for guest. Instead, it
 should rely on host's kvmppc_book3s_hv_page_fault() to call
 do_numa_page() to do the numa fault check. This incurs the overhead
 when exiting from rmode to vmode.  My idea is that in
 kvmppc_do_h_enter(), we do a quick check, if the page is right placed,
 there is no need to exit to vmode (i.e saving htab, slab switching)

 Can you explain more. Are we looking at hcall from guest  and
 hypervisor handling them in real mode ? If so why would guest issue a
 hcall on a pte entry that have PAGE_NUMA set. Or is this about
 hypervisor handling a missing hpte, because of host swapping this page
 out ? In that case how we end up in h_enter ? IIUC for that case we
 should get to kvmppc_hpte_hv_fault.

 After setting _PAGE_NUMA, we should flush out all hptes both in host's
 htab and guest's. So when guest tries to access memory, host finds
 that there is not hpte ready for guest in guest's htab. And host
 should raise dsi to guest.

 Now guest receive that fault, removes the PAGE_NUMA bit and do an
 hpte_insert. So before we do an hpte_insert (or H_ENTER) we should have
 cleared PAGE_NUMA bit.

This incurs that guest ends up in h_enter.
 And you can see in current code, we also try this quick path firstly.
 Only if fail, we will resort to slow path --  kvmppc_hpte_hv_fault.

 hmm ? hpte_hv_fault is the hypervisor handling the fault.

After our discussion on irc, I think we should also do the fast check in
kvmppc_hpte_hv_fault() for the case of HPTE_V_ABSENT,
and let H_ENTER take care of the remaining case, i.e. no hpte when pte_mknuma. Right?

Thanks and regards,
Fan
 -aneesh



Re: [PATCH 0/4] powernv: kvm: numa fault improvement

2014-01-21 Thread Liu ping fan
On Tue, Jan 21, 2014 at 5:07 PM, Liu ping fan kernelf...@gmail.com wrote:
 On Tue, Jan 21, 2014 at 11:40 AM, Aneesh Kumar K.V
 aneesh.ku...@linux.vnet.ibm.com wrote:
 Liu ping fan kernelf...@gmail.com writes:

 On Mon, Jan 20, 2014 at 11:45 PM, Aneesh Kumar K.V
 aneesh.ku...@linux.vnet.ibm.com wrote:
 Liu ping fan kernelf...@gmail.com writes:

 On Thu, Jan 9, 2014 at 8:08 PM, Alexander Graf ag...@suse.de wrote:

 On 11.12.2013, at 09:47, Liu Ping Fan kernelf...@gmail.com wrote:

 This series is based on Aneesh's series  [PATCH -V2 0/5] powerpc: mm: 
 Numa faults support for ppc64

 For this series, I apply the same idea from the previous thread [PATCH 
 0/3] optimize for powerpc _PAGE_NUMA
 (for which, I still try to get a machine to show nums)

 But for this series, I think that I have a good justification -- the 
 fact of heavy cost when switching context between guest and host,
 which is  well known.

 This cover letter isn't really telling me anything. Please put a proper 
 description of what you're trying to achieve, why you're trying to 
 achieve what you're trying and convince your readers that it's a good 
 idea to do it the way you do it.

 Sorry for the unclear message. After introducing the _PAGE_NUMA,
 kvmppc_do_h_enter() can not fill up the hpte for guest. Instead, it
 should rely on host's kvmppc_book3s_hv_page_fault() to call
 do_numa_page() to do the numa fault check. This incurs the overhead
 when exiting from rmode to vmode.  My idea is that in
 kvmppc_do_h_enter(), we do a quick check, if the page is right placed,
 there is no need to exit to vmode (i.e saving htab, slab switching)

 Can you explain more. Are we looking at hcall from guest  and
 hypervisor handling them in real mode ? If so why would guest issue a
 hcall on a pte entry that have PAGE_NUMA set. Or is this about
 hypervisor handling a missing hpte, because of host swapping this page
 out ? In that case how we end up in h_enter ? IIUC for that case we
 should get to kvmppc_hpte_hv_fault.

 After setting _PAGE_NUMA, we should flush out all hptes both in host's
 htab and guest's. So when guest tries to access memory, host finds
 that there is not hpte ready for guest in guest's htab. And host
 should raise dsi to guest.

 Now guest receive that fault, removes the PAGE_NUMA bit and do an
 hpte_insert. So before we do an hpte_insert (or H_ENTER) we should have
 cleared PAGE_NUMA bit.

This incurs that guest ends up in h_enter.
 And you can see in current code, we also try this quick path firstly.
 Only if fail, we will resort to slow path --  kvmppc_hpte_hv_fault.

 hmm ? hpte_hv_fault is the hypervisor handling the fault.

 After we discuss in irc. I think we should also do the fast check in
 kvmppc_hpte_hv_fault() for the case of HPTE_V_ABSENT,
 and let H_ENTER take care of the rest case i.e. no hpte when pte_mknuma. 
 Right?

Or we can skip the quick fix in H_ENTER, let the host fault
again, and do the fix in kvmppc_hpte_hv_fault().

 Thanks and regards,
 Fan
 -aneesh



Re: [PATCH 0/4] powernv: kvm: numa fault improvement

2014-01-21 Thread Paul Mackerras
On Mon, Jan 20, 2014 at 03:48:36PM +0100, Alexander Graf wrote:
 
 On 15.01.2014, at 07:36, Liu ping fan kernelf...@gmail.com wrote:
 
  On Thu, Jan 9, 2014 at 8:08 PM, Alexander Graf ag...@suse.de wrote:
  
  On 11.12.2013, at 09:47, Liu Ping Fan kernelf...@gmail.com wrote:
  
  This series is based on Aneesh's series  [PATCH -V2 0/5] powerpc: mm: 
  Numa faults support for ppc64
  
  For this series, I apply the same idea from the previous thread [PATCH 
  0/3] optimize for powerpc _PAGE_NUMA
  (for which, I still try to get a machine to show nums)
  
  But for this series, I think that I have a good justification -- the fact 
  of heavy cost when switching context between guest and host,
  which is  well known.
  
  This cover letter isn't really telling me anything. Please put a proper 
  description of what you're trying to achieve, why you're trying to achieve 
  what you're trying and convince your readers that it's a good idea to do 
  it the way you do it.
  
  Sorry for the unclear message. After introducing the _PAGE_NUMA,
  kvmppc_do_h_enter() can not fill up the hpte for guest. Instead, it
  should rely on host's kvmppc_book3s_hv_page_fault() to call
  do_numa_page() to do the numa fault check. This incurs the overhead
  when exiting from rmode to vmode.  My idea is that in
  kvmppc_do_h_enter(), we do a quick check, if the page is right placed,
  there is no need to exit to vmode (i.e saving htab, slab switching)
  
  If my suppose is correct, will CCing k...@vger.kernel.org from next 
  version.
  
  This translates to me as This is an RFC?
  
  Yes, I am not quite sure about it. I have no bare-metal to verify it.
  So I hope at least, from the theory, it is correct.
 
 Paul, could you please give this some thought and maybe benchmark it?

OK, once I get Aneesh to tell me how I get to have ptes with
_PAGE_NUMA set in the first place. :)

Paul.


Re: [PATCH 0/4] powernv: kvm: numa fault improvement

2014-01-21 Thread Aneesh Kumar K.V
Paul Mackerras pau...@samba.org writes:

 On Mon, Jan 20, 2014 at 03:48:36PM +0100, Alexander Graf wrote:
 
 On 15.01.2014, at 07:36, Liu ping fan kernelf...@gmail.com wrote:
 
  On Thu, Jan 9, 2014 at 8:08 PM, Alexander Graf ag...@suse.de wrote:
  
  On 11.12.2013, at 09:47, Liu Ping Fan kernelf...@gmail.com wrote:
  
  This series is based on Aneesh's series  [PATCH -V2 0/5] powerpc: mm: 
  Numa faults support for ppc64
  
  For this series, I apply the same idea from the previous thread [PATCH 
  0/3] optimize for powerpc _PAGE_NUMA
  (for which, I still try to get a machine to show nums)
  
  But for this series, I think that I have a good justification -- the 
  fact of heavy cost when switching context between guest and host,
  which is  well known.
  
  This cover letter isn't really telling me anything. Please put a proper 
  description of what you're trying to achieve, why you're trying to 
  achieve what you're trying and convince your readers that it's a good 
  idea to do it the way you do it.
  
  Sorry for the unclear message. After introducing the _PAGE_NUMA,
  kvmppc_do_h_enter() can not fill up the hpte for guest. Instead, it
  should rely on host's kvmppc_book3s_hv_page_fault() to call
  do_numa_page() to do the numa fault check. This incurs the overhead
  when exiting from rmode to vmode.  My idea is that in
  kvmppc_do_h_enter(), we do a quick check, if the page is right placed,
  there is no need to exit to vmode (i.e saving htab, slab switching)
  
  If my suppose is correct, will CCing k...@vger.kernel.org from next 
  version.
  
  This translates to me as This is an RFC?
  
  Yes, I am not quite sure about it. I have no bare-metal to verify it.
  So I hope at least, from the theory, it is correct.
 
 Paul, could you please give this some thought and maybe benchmark it?

 OK, once I get Aneesh to tell me how I get to have ptes with
 _PAGE_NUMA set in the first place. :)


I guess we want patch 2, Which Liu has sent separately and I have
reviewed. http://article.gmane.org/gmane.comp.emulators.kvm.powerpc.devel/8619
I am not sure about the rest of the patches in the series.
We definitely don't want to numa migrate on henter. We may want to do
that on fault. But even there, IMHO, we should let the host take the
fault and do the numa migration instead of doing this in guest context.

-aneesh



Re: [PATCH 0/4] powernv: kvm: numa fault improvement

2014-01-20 Thread Alexander Graf

On 15.01.2014, at 07:36, Liu ping fan kernelf...@gmail.com wrote:

 On Thu, Jan 9, 2014 at 8:08 PM, Alexander Graf ag...@suse.de wrote:
 
 On 11.12.2013, at 09:47, Liu Ping Fan kernelf...@gmail.com wrote:
 
 This series is based on Aneesh's series  [PATCH -V2 0/5] powerpc: mm: Numa 
 faults support for ppc64
 
 For this series, I apply the same idea from the previous thread [PATCH 
 0/3] optimize for powerpc _PAGE_NUMA
 (for which, I still try to get a machine to show nums)
 
 But for this series, I think that I have a good justification -- the fact 
 of heavy cost when switching context between guest and host,
 which is  well known.
 
 This cover letter isn't really telling me anything. Please put a proper 
 description of what you're trying to achieve, why you're trying to achieve 
 what you're trying and convince your readers that it's a good idea to do it 
 the way you do it.
 
 Sorry for the unclear message. After introducing the _PAGE_NUMA,
 kvmppc_do_h_enter() can not fill up the hpte for guest. Instead, it
 should rely on host's kvmppc_book3s_hv_page_fault() to call
 do_numa_page() to do the numa fault check. This incurs the overhead
 when exiting from rmode to vmode.  My idea is that in
 kvmppc_do_h_enter(), we do a quick check, if the page is right placed,
 there is no need to exit to vmode (i.e saving htab, slab switching)
 
 If my suppose is correct, will CCing k...@vger.kernel.org from next version.
 
 This translates to me as This is an RFC?
 
 Yes, I am not quite sure about it. I have no bare-metal to verify it.
 So I hope at least, from the theory, it is correct.

Paul, could you please give this some thought and maybe benchmark it?


Alex



Re: [PATCH 0/4] powernv: kvm: numa fault improvement

2014-01-20 Thread Aneesh Kumar K.V
Liu ping fan kernelf...@gmail.com writes:

 On Thu, Jan 9, 2014 at 8:08 PM, Alexander Graf ag...@suse.de wrote:

 On 11.12.2013, at 09:47, Liu Ping Fan kernelf...@gmail.com wrote:

 This series is based on Aneesh's series  [PATCH -V2 0/5] powerpc: mm: Numa 
 faults support for ppc64

 For this series, I apply the same idea from the previous thread [PATCH 
 0/3] optimize for powerpc _PAGE_NUMA
 (for which, I still try to get a machine to show nums)

 But for this series, I think that I have a good justification -- the fact 
 of heavy cost when switching context between guest and host,
 which is  well known.

 This cover letter isn't really telling me anything. Please put a proper 
 description of what you're trying to achieve, why you're trying to achieve 
 what you're trying and convince your readers that it's a good idea to do it 
 the way you do it.

 Sorry for the unclear message. After introducing the _PAGE_NUMA,
 kvmppc_do_h_enter() can not fill up the hpte for guest. Instead, it
 should rely on host's kvmppc_book3s_hv_page_fault() to call
 do_numa_page() to do the numa fault check. This incurs the overhead
 when exiting from rmode to vmode.  My idea is that in
 kvmppc_do_h_enter(), we do a quick check, if the page is right placed,
 there is no need to exit to vmode (i.e saving htab, slab switching)

Can you explain more. Are we looking at hcall from guest  and
hypervisor handling them in real mode ? If so why would guest issue a
hcall on a pte entry that have PAGE_NUMA set. Or is this about
hypervisor handling a missing hpte, because of host swapping this page
out ? In that case how we end up in h_enter ? IIUC for that case we
should get to kvmppc_hpte_hv_fault. 



 If my suppose is correct, will CCing k...@vger.kernel.org from next version.

 This translates to me as This is an RFC?

 Yes, I am not quite sure about it. I have no bare-metal to verify it.
 So I hope at least, from the theory, it is correct.


-aneesh



Re: [PATCH 0/4] powernv: kvm: numa fault improvement

2014-01-20 Thread Liu ping fan
On Mon, Jan 20, 2014 at 11:45 PM, Aneesh Kumar K.V
aneesh.ku...@linux.vnet.ibm.com wrote:
 Liu ping fan kernelf...@gmail.com writes:

 On Thu, Jan 9, 2014 at 8:08 PM, Alexander Graf ag...@suse.de wrote:

 On 11.12.2013, at 09:47, Liu Ping Fan kernelf...@gmail.com wrote:

 This series is based on Aneesh's series  [PATCH -V2 0/5] powerpc: mm: 
 Numa faults support for ppc64

 For this series, I apply the same idea from the previous thread [PATCH 
 0/3] optimize for powerpc _PAGE_NUMA
 (for which, I still try to get a machine to show nums)

 But for this series, I think that I have a good justification -- the fact 
 of heavy cost when switching context between guest and host,
 which is  well known.

 This cover letter isn't really telling me anything. Please put a proper 
 description of what you're trying to achieve, why you're trying to achieve 
 what you're trying and convince your readers that it's a good idea to do it 
 the way you do it.

 Sorry for the unclear message. After introducing the _PAGE_NUMA,
 kvmppc_do_h_enter() can not fill up the hpte for guest. Instead, it
 should rely on host's kvmppc_book3s_hv_page_fault() to call
 do_numa_page() to do the numa fault check. This incurs the overhead
 when exiting from rmode to vmode.  My idea is that in
 kvmppc_do_h_enter(), we do a quick check, if the page is right placed,
 there is no need to exit to vmode (i.e saving htab, slab switching)

 Can you explain more. Are we looking at hcall from guest  and
 hypervisor handling them in real mode ? If so why would guest issue a
 hcall on a pte entry that have PAGE_NUMA set. Or is this about
 hypervisor handling a missing hpte, because of host swapping this page
 out ? In that case how we end up in h_enter ? IIUC for that case we
 should get to kvmppc_hpte_hv_fault.

After setting _PAGE_NUMA, we should flush out all hptes both in host's
htab and guest's. So when guest tries to access memory, host finds
that there is not hpte ready for guest in guest's htab. And host
should raise dsi to guest. This incurs that guest ends up in h_enter.
And you can see in current code, we also try this quick path firstly.
Only if fail, we will resort to slow path --  kvmppc_hpte_hv_fault.

Thanks and regards,
Fan


 If my suppose is correct, will CCing k...@vger.kernel.org from next 
 version.

 This translates to me as This is an RFC?

 Yes, I am not quite sure about it. I have no bare-metal to verify it.
 So I hope at least, from the theory, it is correct.


 -aneesh



Re: [PATCH 0/4] powernv: kvm: numa fault improvement

2014-01-20 Thread Aneesh Kumar K.V
Liu ping fan kernelf...@gmail.com writes:

 On Mon, Jan 20, 2014 at 11:45 PM, Aneesh Kumar K.V
 aneesh.ku...@linux.vnet.ibm.com wrote:
 Liu ping fan kernelf...@gmail.com writes:

 On Thu, Jan 9, 2014 at 8:08 PM, Alexander Graf ag...@suse.de wrote:

 On 11.12.2013, at 09:47, Liu Ping Fan kernelf...@gmail.com wrote:

 This series is based on Aneesh's series  [PATCH -V2 0/5] powerpc: mm: 
 Numa faults support for ppc64

 For this series, I apply the same idea from the previous thread [PATCH 
 0/3] optimize for powerpc _PAGE_NUMA
 (for which, I still try to get a machine to show nums)

 But for this series, I think that I have a good justification -- the fact 
 of heavy cost when switching context between guest and host,
 which is  well known.

 This cover letter isn't really telling me anything. Please put a proper 
 description of what you're trying to achieve, why you're trying to achieve 
 what you're trying and convince your readers that it's a good idea to do 
 it the way you do it.

 Sorry for the unclear message. After introducing the _PAGE_NUMA,
 kvmppc_do_h_enter() can not fill up the hpte for guest. Instead, it
 should rely on host's kvmppc_book3s_hv_page_fault() to call
 do_numa_page() to do the numa fault check. This incurs the overhead
 when exiting from rmode to vmode.  My idea is that in
 kvmppc_do_h_enter(), we do a quick check, if the page is right placed,
 there is no need to exit to vmode (i.e saving htab, slab switching)

 Can you explain more. Are we looking at hcall from guest  and
 hypervisor handling them in real mode ? If so why would guest issue a
 hcall on a pte entry that have PAGE_NUMA set. Or is this about
 hypervisor handling a missing hpte, because of host swapping this page
 out ? In that case how we end up in h_enter ? IIUC for that case we
 should get to kvmppc_hpte_hv_fault.

 After setting _PAGE_NUMA, we should flush out all hptes both in host's
 htab and guest's. So when guest tries to access memory, host finds
 that there is not hpte ready for guest in guest's htab. And host
 should raise dsi to guest.

Now guest receive that fault, removes the PAGE_NUMA bit and do an
hpte_insert. So before we do an hpte_insert (or H_ENTER) we should have
cleared PAGE_NUMA bit.

This incurs that guest ends up in h_enter.
 And you can see in current code, we also try this quick path firstly.
 Only if fail, we will resort to slow path --  kvmppc_hpte_hv_fault.

hmm ? hpte_hv_fault is the hypervisor handling the fault.

-aneesh



Re: [PATCH 0/4] powernv: kvm: numa fault improvement

2014-01-14 Thread Liu ping fan
On Thu, Jan 9, 2014 at 8:08 PM, Alexander Graf ag...@suse.de wrote:

 On 11.12.2013, at 09:47, Liu Ping Fan kernelf...@gmail.com wrote:

 This series is based on Aneesh's series  [PATCH -V2 0/5] powerpc: mm: Numa 
 faults support for ppc64

 For this series, I apply the same idea from the previous thread [PATCH 0/3] 
 optimize for powerpc _PAGE_NUMA
 (for which, I still try to get a machine to show nums)

 But for this series, I think that I have a good justification -- the fact of 
 heavy cost when switching context between guest and host,
 which is  well known.

 This cover letter isn't really telling me anything. Please put a proper 
 description of what you're trying to achieve, why you're trying to achieve 
 what you're trying and convince your readers that it's a good idea to do it 
 the way you do it.

Sorry for the unclear message. After introducing the _PAGE_NUMA,
kvmppc_do_h_enter() can not fill up the hpte for guest. Instead, it
should rely on host's kvmppc_book3s_hv_page_fault() to call
do_numa_page() to do the numa fault check. This incurs the overhead
when exiting from rmode to vmode.  My idea is that in
kvmppc_do_h_enter(), we do a quick check, if the page is right placed,
there is no need to exit to vmode (i.e saving htab, slab switching)

 If my suppose is correct, will CCing k...@vger.kernel.org from next version.

 This translates to me as This is an RFC?

Yes, I am not quite sure about it. I have no bare-metal to verify it.
So I hope at least, from the theory, it is correct.

Thanks and regards,
Ping Fan

 Alex
