Re: [PATCH] remove hugetlb_instantiation_mutex

2007-08-05 Thread Zhang, Yanmin
On Fri, 2007-08-03 at 09:53 -0700, Nish Aravamudan wrote:
> On 8/3/07, Adam Litke <[EMAIL PROTECTED]> wrote:
> > On Mon, 2007-07-30 at 15:15 +0800, Zhang, Yanmin wrote:
> > > On Fri, 2007-07-27 at 11:37 -0500, Adam Litke wrote:
> > > > Hey... I am amazed at how quickly you came back with a patch for this :)
> > > > Thanks for looking at it.  Unfortunately there is one show-stopper and I
> > > > have some reservations (pun definitely intended) with your approach:
> > > Thanks for your great comments.
> >
> > Sorry for such a long delay in responding.  I have been pretty busy
> > lately.
> >
> > > > First, your patch does not pass the libhugetlbfs test
> > > > 'alloc-instantiate-race' which was written to tickle the race which
> > > > the mutex was introduced to solve.  Your patch works for shared
> > > > mappings, but not for the private case.
> > > My testing of the private case might not have been thorough. Function
> > > hugetlb_cow has a race when multiple threads fault on the same private
> > > page index. But even after I fixed it, alloc-instantiate-race still failed.
> > >
> > > I tried to find the source tarball of the libhugetlbfs test suite via
> > > Google, but couldn't. Would you send me a copy of the test sources?
> >
> > http://libhugetlbfs.ozlabs.org/releases/libhugetlbfs-1.2-pre1.tar.gz
> >
> > The tarball will contain a test called alloc-instantiate-race.  Make
> > sure to run it in private and shared mode.  Let me know what you find
> > out.
> 
> Actually, please use
> http://libhugetlbfs.ozlabs.org/snapshots/libhugetlbfs-dev-20070718.tar.gz.
> 1.2-pre1 had a build error that is fixed in the development snapshot.
Sorry for the late reply. I had a fever last week.

The test case is very nice. I located the root cause.

In function hugetlb_no_page, if a thread couldn't get a quota/huge page while
no page is in flight, it returns VM_FAULT_SIGBUS or VM_FAULT_OOM. If the
mapping is private, there is a narrow race when multiple threads fault on the
same private mapping index.

I added a check "if (!pte_none(*ptep))" for the case where a thread couldn't
get a quota/huge page while no page is in flight.

Both the private and shared modes of alloc-instantiate-race pass now.

Thanks to all of you for the good pointers!

--Yanmin

The 3rd version of the patch:

--- linux-2.6.22/fs/hugetlbfs/inode.c   2007-07-09 07:32:17.000000000 +0800
+++ linux-2.6.22_hugetlb/fs/hugetlbfs/inode.c   2007-07-26 14:52:04.000000000 +0800
@@ -662,6 +662,7 @@ hugetlbfs_fill_super(struct super_block 
         spin_lock_init(&sbinfo->stat_lock);
         sbinfo->max_blocks = config.nr_blocks;
         sbinfo->free_blocks = config.nr_blocks;
+        sbinfo->flight_blocks = 0;
         sbinfo->max_inodes = config.nr_inodes;
         sbinfo->free_inodes = config.nr_inodes;
         sb->s_maxbytes = MAX_LFS_FILESIZE;
@@ -694,8 +695,11 @@ int hugetlb_get_quota(struct address_spa
 
         if (sbinfo->free_blocks > -1) {
                 spin_lock(&sbinfo->stat_lock);
-                if (sbinfo->free_blocks > 0)
+                if (sbinfo->free_blocks > 0) {
                         sbinfo->free_blocks--;
+                        sbinfo->flight_blocks ++;
+                } else if (sbinfo->flight_blocks)
+                        ret = -EAGAIN;
                 else
                         ret = -ENOMEM;
                 spin_unlock(&sbinfo->stat_lock);
@@ -710,7 +714,30 @@ void hugetlb_put_quota(struct address_sp
 
         if (sbinfo->free_blocks > -1) {
                 spin_lock(&sbinfo->stat_lock);
-                sbinfo->free_blocks++;
+                sbinfo->free_blocks ++;
+                spin_unlock(&sbinfo->stat_lock);
+        }
+}
+
+void hugetlb_commit_quota(struct address_space *mapping)
+{
+        struct hugetlbfs_sb_info *sbinfo = HUGETLBFS_SB(mapping->host->i_sb);
+
+        if (sbinfo->free_blocks > -1) {
+                spin_lock(&sbinfo->stat_lock);
+                sbinfo->flight_blocks --;
+                spin_unlock(&sbinfo->stat_lock);
+        }
+}
+
+void hugetlb_rollback_quota(struct address_space *mapping)
+{
+        struct hugetlbfs_sb_info *sbinfo = HUGETLBFS_SB(mapping->host->i_sb);
+
+        if (sbinfo->free_blocks > -1) {
+                spin_lock(&sbinfo->stat_lock);
+                sbinfo->free_blocks ++;
+                sbinfo->flight_blocks --;
                 spin_unlock(&sbinfo->stat_lock);
         }
 }
--- linux-2.6.22/include/linux/hugetlb.h   2007-07-09 07:32:17.000000000 +0800
+++ linux-2.6.22_hugetlb/include/linux/hugetlb.h   2007-07-24 16:54:39.000000000 +0800
@@ -140,6 +140,7 @@ struct hugetlbfs_config {
 struct hugetlbfs_sb_info {
         long    max_blocks;   /* blocks allowed */
         long    free_blocks;  /* blocks free */
+        long    flight_blocks;/* blocks allocated but not yet used */
         long    max_inodes;   /* inodes allowed */
         long    free_inodes;  /* inodes free */
         spinlock_t      stat_lock;
@@ -166,6 +167,8 @@ extern struct vm_operations_struct huget
 struct file *hugetlb_file_setup(const char *name, size_t);
 int hugetlb_get_quota(struct address_space *mapping);

Re: [PATCH] remove hugetlb_instantiation_mutex

2007-08-03 Thread Nish Aravamudan
On 8/3/07, Adam Litke <[EMAIL PROTECTED]> wrote:
> On Mon, 2007-07-30 at 15:15 +0800, Zhang, Yanmin wrote:
> > On Fri, 2007-07-27 at 11:37 -0500, Adam Litke wrote:
> > > Hey... I am amazed at how quickly you came back with a patch for this :)
> > > Thanks for looking at it.  Unfortunately there is one show-stopper and I
> > > have some reservations (pun definitely intended) with your approach:
> > Thanks for your great comments.
>
> Sorry for such a long delay in responding.  I have been pretty busy
> lately.
>
> > > First, your patch does not pass the libhugetlbfs test
> > > 'alloc-instantiate-race' which was written to tickle the race which
> > > the mutex was introduced to solve.  Your patch works for shared
> > > mappings, but not for the private case.
> > My testing of the private case might not have been thorough. Function
> > hugetlb_cow has a race when multiple threads fault on the same private
> > page index. But even after I fixed it, alloc-instantiate-race still failed.
> >
> > I tried to find the source tarball of the libhugetlbfs test suite via
> > Google, but couldn't. Would you send me a copy of the test sources?
>
> http://libhugetlbfs.ozlabs.org/releases/libhugetlbfs-1.2-pre1.tar.gz
>
> The tarball will contain a test called alloc-instantiate-race.  Make
> sure to run it in private and shared mode.  Let me know what you find
> out.

Actually, please use
http://libhugetlbfs.ozlabs.org/snapshots/libhugetlbfs-dev-20070718.tar.gz.
1.2-pre1 had a build error that is fixed in the development snapshot.

Thanks,
Nish


Re: [PATCH] remove hugetlb_instantiation_mutex

2007-08-03 Thread Adam Litke
On Mon, 2007-07-30 at 15:15 +0800, Zhang, Yanmin wrote:
> On Fri, 2007-07-27 at 11:37 -0500, Adam Litke wrote:
> > Hey... I am amazed at how quickly you came back with a patch for this :)
> > Thanks for looking at it.  Unfortunately there is one show-stopper and I
> > have some reservations (pun definitely intended) with your approach:
> Thanks for your great comments.

Sorry for such a long delay in responding.  I have been pretty busy
lately.

> > First, your patch does not pass the libhugetlbfs test
> > 'alloc-instantiate-race' which was written to tickle the race which
> > the mutex was introduced to solve.  Your patch works for shared
> > mappings, but not for the private case.
> My testing of the private case might not have been thorough. Function
> hugetlb_cow has a race when multiple threads fault on the same private
> page index. But even after I fixed it, alloc-instantiate-race still failed.
> 
> I tried to find the source tarball of the libhugetlbfs test suite via
> Google, but couldn't. Would you send me a copy of the test sources?

http://libhugetlbfs.ozlabs.org/releases/libhugetlbfs-1.2-pre1.tar.gz

The tarball will contain a test called alloc-instantiate-race.  Make
sure to run it in private and shared mode.  Let me know what you find
out.

-- 
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center



Re: [PATCH] remove hugetlb_instantiation_mutex

2007-07-30 Thread Zhang, Yanmin
On Fri, 2007-07-27 at 11:37 -0500, Adam Litke wrote:
> Hey... I am amazed at how quickly you came back with a patch for this :)
> Thanks for looking at it.  Unfortunately there is one show-stopper and I
> have some reservations (pun definitely intended) with your approach:
Thanks for your great comments.

> 
> First, your patch does not pass the libhugetlbfs test
> 'alloc-instantiate-race' which was written to tickle the race which
> the mutex was introduced to solve.  Your patch works for shared
> mappings, but not for the private case.
My testing of the private case might not have been thorough. Function
hugetlb_cow has a race when multiple threads fault on the same private page
index. But even after I fixed it, alloc-instantiate-race still failed.

I tried to find the source tarball of the libhugetlbfs test suite via Google,
but couldn't. Would you send me a copy of the test sources?

> 
> Second, the introduction of another pair of global counters triggers my
> internal warning system... These global counters are known to cause
> problems with NUMA and cpusets.  Have you considered these interactions?
flight_blocks is a per-superblock variable. As you know, quota is not used
frequently, so flight_blocks mostly isn't a cache-miss concern.

flight_huge_pages might be a potential cache-miss issue with NUMA/cpusets.

Currently, the global variables free_huge_pages and resv_huge_pages have the
same cache issue under NUMA, as does hugetlb_instantiation_mutex. I assume the
new variable flight_huge_pages shares a cache line with hugetlb_lock,
free_huge_pages, and resv_huge_pages.

A possible improvement is to add a simple lock-free check before calling
alloc_huge_page in function hugetlb_no_page. The check could be:

        if ((!free_huge_pages && flight_huge_pages) ||
            (!(vma->vm_flags & VM_MAYSHARE) &&
             free_huge_pages <= resv_huge_pages &&
             flight_huge_pages)) {
                hugetlb_rollback_quota(mapping);
                cond_resched();
                goto retry;
        }

Then the common path only reads the shared cache line instead of writing it
before changing the variables.

> Additionally, the commit/rollback logic you are using closely parallels
> what we already do with the huge page reservation mechanism.  Is there
> any way you could integrate your stuff into the reservation system to
> avoid all the duplicated logic?
I added a new parameter flight_delta to hugetlb_acct_memory and used
hugetlb_acct_memory to replace both hugetlb_commit_page and
hugetlb_rollback_page. As for quota, it's hard to integrate it into the
current reservation mechanism.
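
Roughly (a sketch against a simplified hugetlb_acct_memory(); flight_huge_pages
and the flight_delta parameter are the new pieces):

static int hugetlb_acct_memory(long delta, long flight_delta)
{
        int ret = -ENOMEM;

        spin_lock(&hugetlb_lock);
        if ((delta + resv_huge_pages) <= free_huge_pages) {
                resv_huge_pages += delta;
                ret = 0;
        }
        /* Account pages handed out but not yet in the page cache/page table. */
        flight_huge_pages += flight_delta;
        spin_unlock(&hugetlb_lock);
        return ret;
}

A commit might then be hugetlb_acct_memory(0, -1); the exact deltas for commit
and rollback depend on how the fault path accounts reservations.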

Below is the new patch.

Yanmin

---

--- linux-2.6.22/fs/hugetlbfs/inode.c   2007-07-09 07:32:17.000000000 +0800
+++ linux-2.6.22_hugetlb/fs/hugetlbfs/inode.c   2007-07-26 14:52:04.000000000 +0800
@@ -662,6 +662,7 @@ hugetlbfs_fill_super(struct super_block 
         spin_lock_init(&sbinfo->stat_lock);
         sbinfo->max_blocks = config.nr_blocks;
         sbinfo->free_blocks = config.nr_blocks;
+        sbinfo->flight_blocks = 0;
         sbinfo->max_inodes = config.nr_inodes;
         sbinfo->free_inodes = config.nr_inodes;
         sb->s_maxbytes = MAX_LFS_FILESIZE;
@@ -694,8 +695,11 @@ int hugetlb_get_quota(struct address_spa
 
         if (sbinfo->free_blocks > -1) {
                 spin_lock(&sbinfo->stat_lock);
-                if (sbinfo->free_blocks > 0)
+                if (sbinfo->free_blocks > 0) {
                         sbinfo->free_blocks--;
+                        sbinfo->flight_blocks ++;
+                } else if (sbinfo->flight_blocks)
+                        ret = -EAGAIN;
                 else
                         ret = -ENOMEM;
                 spin_unlock(&sbinfo->stat_lock);
@@ -710,7 +714,30 @@ void hugetlb_put_quota(struct address_sp
 
         if (sbinfo->free_blocks > -1) {
                 spin_lock(&sbinfo->stat_lock);
-                sbinfo->free_blocks++;
+                sbinfo->free_blocks ++;
+                spin_unlock(&sbinfo->stat_lock);
+        }
+}
+
+void hugetlb_commit_quota(struct address_space *mapping)
+{
+        struct hugetlbfs_sb_info *sbinfo = HUGETLBFS_SB(mapping->host->i_sb);
+
+        if (sbinfo->free_blocks > -1) {
+                spin_lock(&sbinfo->stat_lock);
+                sbinfo->flight_blocks --;
+                spin_unlock(&sbinfo->stat_lock);
+        }
+}
+
+void hugetlb_rollback_quota(struct address_space *mapping)
+{
+        struct hugetlbfs_sb_info *sbinfo = HUGETLBFS_SB(mapping->host->i_sb);
+
+        if (sbinfo->free_blocks > -1) {
+                spin_lock(&sbinfo->stat_lock);
+                sbinfo->free_blocks ++;
+                sbinfo->flight_blocks --;
                 spin_unlock(&sbinfo->stat_lock);
         }
 }
--- linux-2.6.22/include/linux/hugetlb.h   2007-07-09 07:32:17.000000000 +0800
+++ linux-2.6.22_hugetlb/include/linux/hugetlb.h   2007-07-24 16:54:39.000000000 +0800
@@ -140,6 +140,7 @@ struct hugetlbfs_config {
 struct hugetlbfs_sb_info {
         long    max_blocks;   /* blocks allowed */
         long    free_blocks;  /* blocks free */
+        long    flight_blocks;/* blocks allocated but not yet used */
         long    max_inodes;   /* inodes allowed */
         long    free_inodes;  /* inodes free */
         spinlock_t      stat_lock;
@@ -166,6 +167,8 @@ extern struct vm_operations_struct huget
 struct file *hugetlb_file_setup(const char *name, size_t);
 int hugetlb_get_quota(struct address_space *mapping);

Re: [PATCH] remove hugetlb_instantiation_mutex

2007-07-27 Thread Adam Litke
Hey... I am amazed at how quickly you came back with a patch for this :)
Thanks for looking at it.  Unfortunately there is one show-stopper and I
have some reservations (pun definitely intended) with your approach:

First, your patch does not pass the libhugetlbfs test
'alloc-instantiate-race' which was written to tickle the race which
the mutex was introduced to solve.  Your patch works for shared
mappings, but not for the private case.

Second, the introduction of another pair of global counters triggers my
internal warning system... These global counters are known to cause
problems with NUMA and cpusets.  Have you considered these interactions?
Additionally, the commit/rollback logic you are using closely parallels
what we already do with the huge page reservation mechanism.  Is there
any way you could integrate your stuff into the reservation system to
avoid all the duplicated logic?

-- 
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center



[PATCH] remove hugetlb_instantiation_mutex

2007-07-27 Thread Zhang, Yanmin
hugetlb_instantiation_mutex is a big lock in function hugetlb_fault. It
serializes all hugetlb page faults, no matter whether the faults happen in the
same address space, and no matter whether they relate to the same inode.

Why is there such a big lock? Huge pages are a limited resource. When few free
huge pages remain (for example, only one is left) and many threads/processes
trigger page faults on the same logical huge page of the same inode at the
same time, they compete for a free huge page at the same index, so some of
them can't get a huge page and are killed by the kernel. The root cause: when
a huge page has been dequeued from the free list but not yet added to the page
cache or mapped into a page table, other threads/processes may also apply for
a huge page. Because that page is in flight (neither in the page cache nor
mapped in a page table), the late threads/processes can't get the last free
huge page and are killed. hugetlb_instantiation_mutex prevents this.

To remove the big lock, I worked out this patch against kernel 2.6.22.
1) Add flight_blocks to struct hugetlbfs_sb_info to record the quota blocks in flight;
2) Add flight_huge_pages to record the huge pages in flight;
3) Add functions hugetlb_commit_quota and hugetlb_rollback_quota. When a quota
block goes from in-flight to settled, call hugetlb_commit_quota or
hugetlb_rollback_quota, depending on whether the fault succeeded;
4) Add functions hugetlb_commit_page and hugetlb_rollback_page. When a huge
page goes from in-flight to settled, call hugetlb_commit_page or
hugetlb_rollback_page, depending on whether the fault succeeded;
5) If a thread couldn't get a free page/quota but one is in flight, the thread
goes back and retries (a sketch of this retry loop follows the list).
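
Roughly, the retry in 5) looks like this in hugetlb_no_page() (a sketch
assuming the hugetlb_get_quota() return values from the patch below, not a
literal hunk):

retry:
        ret = hugetlb_get_quota(mapping);
        if (ret == -EAGAIN) {
                /* Quota blocks exist but all are in flight: another fault
                 * will commit or roll its block back shortly, so retry. */
                cond_resched();
                goto retry;
        }
        if (ret)                        /* -ENOMEM: truly out of quota */
                return VM_FAULT_SIGBUS;

        page = alloc_huge_page(vma, address);
        if (!page) {
                /* Undo free_blocks--/flight_blocks++ from hugetlb_get_quota(). */
                hugetlb_rollback_quota(mapping);
                return VM_FAULT_OOM;
        }
        /* ... add to the page cache / map into the page table ... */
        hugetlb_commit_quota(mapping);  /* block is no longer in flight */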

My patch also fixes a quota bug in function hugetlb_no_page: if a huge page is
already in the page cache but two threads of a process fault on it at the same
time, the later one incorrectly calls hugetlb_rollback_quota on its backout
path.
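
Roughly (a sketch with a hypothetical local flag; the actual patch may
structure the backout differently):

        int took_quota = 0;     /* hypothetical: did this fault take the quota? */

        page = find_lock_page(mapping, idx);
        if (!page) {
                if (hugetlb_get_quota(mapping))
                        goto out;
                took_quota = 1;
                /* ... allocate the page and add it to the page cache ... */
        }
        /* ... map the page under mm->page_table_lock ... */
backout:
        spin_unlock(&mm->page_table_lock);
        if (took_quota)         /* only roll back quota this fault acquired */
                hugetlb_rollback_quota(mapping);
        unlock_page(page);
        put_page(page);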

If there is no contention, the patch won't add much overhead. Under heavy
contention, performance should mostly improve.

I have a hugetlb test suite. I ran it in a loop for a couple of hours and didn't
find anything bad.

Signed-off-by: Zhang Yanmin <[EMAIL PROTECTED]>

---

--- linux-2.6.22/fs/hugetlbfs/inode.c   2007-07-09 07:32:17.000000000 +0800
+++ linux-2.6.22_hugetlb/fs/hugetlbfs/inode.c   2007-07-26 14:52:04.000000000 +0800
@@ -662,6 +662,7 @@ hugetlbfs_fill_super(struct super_block 
         spin_lock_init(&sbinfo->stat_lock);
         sbinfo->max_blocks = config.nr_blocks;
         sbinfo->free_blocks = config.nr_blocks;
+        sbinfo->flight_blocks = 0;
         sbinfo->max_inodes = config.nr_inodes;
         sbinfo->free_inodes = config.nr_inodes;
         sb->s_maxbytes = MAX_LFS_FILESIZE;
@@ -694,8 +695,11 @@ int hugetlb_get_quota(struct address_spa
 
         if (sbinfo->free_blocks > -1) {
                 spin_lock(&sbinfo->stat_lock);
-                if (sbinfo->free_blocks > 0)
+                if (sbinfo->free_blocks > 0) {
                         sbinfo->free_blocks--;
+                        sbinfo->flight_blocks ++;
+                } else if (sbinfo->flight_blocks)
+                        ret = -EAGAIN;
                 else
                         ret = -ENOMEM;
                 spin_unlock(&sbinfo->stat_lock);
@@ -710,7 +714,30 @@ void hugetlb_put_quota(struct address_sp
 
         if (sbinfo->free_blocks > -1) {
                 spin_lock(&sbinfo->stat_lock);
-                sbinfo->free_blocks++;
+                sbinfo->free_blocks ++;
+                spin_unlock(&sbinfo->stat_lock);
+        }
+}
+
+void hugetlb_commit_quota(struct address_space *mapping)
+{
+        struct hugetlbfs_sb_info *sbinfo = HUGETLBFS_SB(mapping->host->i_sb);
+
+        if (sbinfo->free_blocks > -1) {
+                spin_lock(&sbinfo->stat_lock);
+                sbinfo->flight_blocks --;
+                spin_unlock(&sbinfo->stat_lock);
+        }
+}
+
+void hugetlb_rollback_quota(struct address_space *mapping)
+{
+        struct hugetlbfs_sb_info *sbinfo = HUGETLBFS_SB(mapping->host->i_sb);
+
+        if (sbinfo->free_blocks > -1) {
+                spin_lock(&sbinfo->stat_lock);
+                sbinfo->free_blocks ++;
+                sbinfo->flight_blocks --;
                 spin_unlock(&sbinfo->stat_lock);
         }
 }
--- linux-2.6.22/include/linux/hugetlb.h   2007-07-09 07:32:17.000000000 +0800
+++ linux-2.6.22_hugetlb/include/linux/hugetlb.h   2007-07-24 16:54:39.000000000 +0800
@@ -140,6 +140,7 @@ struct hugetlbfs_config {
 struct hugetlbfs_sb_info {
         long    max_blocks;   /* blocks allowed */
         long    free_blocks;  /* blocks free */
+        long    flight_blocks;/* blocks allocated but not yet used */
         long    max_inodes;   /* inodes allowed */
         long    free_inodes;  /* inodes free */
         spinlock_t      stat_lock;
@@ -166,6 +167,8 @@ extern struct vm_operations_struct huget
 struct file *hugetlb_file_setup(const char *name, size_t);
 int hugetlb_get_quota(struct address_space *mapping);
