Re: [PATCH] remove hugetlb_instantiation_mutex
On Fri, 2007-08-03 at 09:53 -0700, Nish Aravamudan wrote:
> On 8/3/07, Adam Litke <[EMAIL PROTECTED]> wrote:
> > On Mon, 2007-07-30 at 15:15 +0800, Zhang, Yanmin wrote:
> > > On Fri, 2007-07-27 at 11:37 -0500, Adam Litke wrote:
> > > > Hey... I am amazed at how quickly you came back with a patch for this :)
> > > > Thanks for looking at it. Unfortunately there is one show-stopper and I
> > > > have some reservations (pun definitely intended) with your approach:
> > > Thanks for your great comments.
> >
> > Sorry for such a long delay in responding. I have been pretty busy
> > lately.
> >
> > > > First, your patch does not pass the libhugetlbfs test
> > > > 'alloc-instantiate-race', which was written to tickle the race that
> > > > the mutex was introduced to solve. Your patch works for shared
> > > > mappings, but not for the private case.
> > > My testing of the private case might not be thorough. Function
> > > hugetlb_cow has a race when multiple threads fault on the same private
> > > page index. But even after I fixed it, alloc-instantiate-race still
> > > failed.
> > >
> > > I tried to google the source tarball of the libhugetlbfs test suite,
> > > but couldn't find it. Would you send me a copy of the test sources?
> >
> > http://libhugetlbfs.ozlabs.org/releases/libhugetlbfs-1.2-pre1.tar.gz
> >
> > The tarball contains a test called alloc-instantiate-race. Make sure to
> > run it in both private and shared mode. Let me know what you find out.
>
> Actually, please use
> http://libhugetlbfs.ozlabs.org/snapshots/libhugetlbfs-dev-20070718.tar.gz.
> 1.2-pre1 had a build error that is fixed in the development snapshot.

Sorry for replying late; I had a fever last week. The test case is very
nice, and I located the root cause. In function hugetlb_no_page, if a
thread can't get a quota/huge page while no page is in flight, it returns
VM_FAULT_SIGBUS or VM_FAULT_OOM.
If the mapping is private, there is a narrow race when multiple threads
fault on the same private mapping index. I added a check,
"if (!pte_none(*ptep))", for the case where a thread couldn't get a
quota/huge page while no page is in flight. Both the private and shared
modes of alloc-instantiate-race pass now. Thank you all for the good
pointer!

--Yanmin

The 3rd version of the patch:

--- linux-2.6.22/fs/hugetlbfs/inode.c	2007-07-09 07:32:17.000000000 +0800
+++ linux-2.6.22_hugetlb/fs/hugetlbfs/inode.c	2007-07-26 14:52:04.000000000 +0800
@@ -662,6 +662,7 @@ hugetlbfs_fill_super(struct super_block
 	spin_lock_init(&sbinfo->stat_lock);
 	sbinfo->max_blocks = config.nr_blocks;
 	sbinfo->free_blocks = config.nr_blocks;
+	sbinfo->flight_blocks = 0;
 	sbinfo->max_inodes = config.nr_inodes;
 	sbinfo->free_inodes = config.nr_inodes;
 	sb->s_maxbytes = MAX_LFS_FILESIZE;
@@ -694,8 +695,11 @@ int hugetlb_get_quota(struct address_spa
 	if (sbinfo->free_blocks > -1) {
 		spin_lock(&sbinfo->stat_lock);
-		if (sbinfo->free_blocks > 0)
+		if (sbinfo->free_blocks > 0) {
 			sbinfo->free_blocks--;
+			sbinfo->flight_blocks++;
+		} else if (sbinfo->flight_blocks)
+			ret = -EAGAIN;
 		else
 			ret = -ENOMEM;
 		spin_unlock(&sbinfo->stat_lock);
@@ -710,7 +714,30 @@ void hugetlb_put_quota(struct address_sp
 	if (sbinfo->free_blocks > -1) {
 		spin_lock(&sbinfo->stat_lock);
 		sbinfo->free_blocks++;
+		spin_unlock(&sbinfo->stat_lock);
+	}
+}
+
+void hugetlb_commit_quota(struct address_space *mapping)
+{
+	struct hugetlbfs_sb_info *sbinfo = HUGETLBFS_SB(mapping->host->i_sb);
+
+	if (sbinfo->free_blocks > -1) {
+		spin_lock(&sbinfo->stat_lock);
+		sbinfo->flight_blocks--;
+		spin_unlock(&sbinfo->stat_lock);
+	}
+}
+
+void hugetlb_rollback_quota(struct address_space *mapping)
+{
+	struct hugetlbfs_sb_info *sbinfo = HUGETLBFS_SB(mapping->host->i_sb);
+
+	if (sbinfo->free_blocks > -1) {
+		spin_lock(&sbinfo->stat_lock);
+		sbinfo->free_blocks++;
+		sbinfo->flight_blocks--;
 		spin_unlock(&sbinfo->stat_lock);
 	}
 }
--- linux-2.6.22/include/linux/hugetlb.h	2007-07-09 07:32:17.000000000 +0800
+++ linux-2.6.22_hugetlb/include/linux/hugetlb.h	2007-07-24 16:54:39.000000000 +0800
@@ -140,6 +140,7 @@ struct hugetlbfs_config {
 struct hugetlbfs_sb_info {
 	long	max_blocks;	/* blocks allowed */
 	long	free_blocks;	/* blocks free */
+	long	flight_blocks;	/* blocks allocated but not yet used */
 	long	max_inodes;	/* inodes allowed */
 	long	free_inodes;	/* inodes free */
 	spinlock_t	stat_lock;
@@ -166,6 +167,8 @@ extern struct vm_operations_struct huget
 struct file
Re: [PATCH] remove hugetlb_instantiation_mutex
On 8/3/07, Adam Litke <[EMAIL PROTECTED]> wrote:
> On Mon, 2007-07-30 at 15:15 +0800, Zhang, Yanmin wrote:
> > On Fri, 2007-07-27 at 11:37 -0500, Adam Litke wrote:
> > > Hey... I am amazed at how quickly you came back with a patch for this :)
> > > Thanks for looking at it. Unfortunately there is one show-stopper and I
> > > have some reservations (pun definitely intended) with your approach:
> > Thanks for your great comments.
>
> Sorry for such a long delay in responding. I have been pretty busy
> lately.
>
> > > First, your patch does not pass the libhugetlbfs test
> > > 'alloc-instantiate-race', which was written to tickle the race that
> > > the mutex was introduced to solve. Your patch works for shared
> > > mappings, but not for the private case.
> > My testing of the private case might not be thorough. Function
> > hugetlb_cow has a race when multiple threads fault on the same private
> > page index. But even after I fixed it, alloc-instantiate-race still
> > failed.
> >
> > I tried to google the source tarball of the libhugetlbfs test suite,
> > but couldn't find it. Would you send me a copy of the test sources?
>
> http://libhugetlbfs.ozlabs.org/releases/libhugetlbfs-1.2-pre1.tar.gz
>
> The tarball contains a test called alloc-instantiate-race. Make sure to
> run it in both private and shared mode. Let me know what you find out.

Actually, please use
http://libhugetlbfs.ozlabs.org/snapshots/libhugetlbfs-dev-20070718.tar.gz.
1.2-pre1 had a build error that is fixed in the development snapshot.

Thanks,
Nish
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] remove hugetlb_instantiation_mutex
On Mon, 2007-07-30 at 15:15 +0800, Zhang, Yanmin wrote:
> On Fri, 2007-07-27 at 11:37 -0500, Adam Litke wrote:
> > Hey... I am amazed at how quickly you came back with a patch for this :)
> > Thanks for looking at it. Unfortunately there is one show-stopper and I
> > have some reservations (pun definitely intended) with your approach:
> Thanks for your great comments.

Sorry for such a long delay in responding. I have been pretty busy
lately.

> > First, your patch does not pass the libhugetlbfs test
> > 'alloc-instantiate-race', which was written to tickle the race that
> > the mutex was introduced to solve. Your patch works for shared
> > mappings, but not for the private case.
> My testing of the private case might not be thorough. Function
> hugetlb_cow has a race when multiple threads fault on the same private
> page index. But even after I fixed it, alloc-instantiate-race still
> failed.
>
> I tried to google the source tarball of the libhugetlbfs test suite,
> but couldn't find it. Would you send me a copy of the test sources?

http://libhugetlbfs.ozlabs.org/releases/libhugetlbfs-1.2-pre1.tar.gz

The tarball contains a test called alloc-instantiate-race. Make sure to
run it in both private and shared mode. Let me know what you find out.

-- 
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center
Re: [PATCH] remove hugetlb_instantiation_mutex
On Fri, 2007-07-27 at 11:37 -0500, Adam Litke wrote:
> Hey... I am amazed at how quickly you came back with a patch for this :)
> Thanks for looking at it. Unfortunately there is one show-stopper and I
> have some reservations (pun definitely intended) with your approach:
Thanks for your great comments.

> First, your patch does not pass the libhugetlbfs test
> 'alloc-instantiate-race', which was written to tickle the race that
> the mutex was introduced to solve. Your patch works for shared
> mappings, but not for the private case.
My testing of the private case might not be thorough. Function hugetlb_cow
has a race when multiple threads fault on the same private page index. But
even after I fixed it, alloc-instantiate-race still failed.

I tried to google the source tarball of the libhugetlbfs test suite, but
couldn't find it. Would you send me a copy of the test sources?

> Second, the introduction of another pair of global counters triggers my
> internal warning system... These global counters are known to cause
> problems with NUMA and cpusets. Have you considered these interactions?
flight_blocks is a per-superblock variable. As you know, quota is not used
frequently, so flight_blocks is mostly not a cache-miss issue.
flight_huge_pages might be a potential cache-miss issue with NUMA/cpusets.
Currently, the global variables free_huge_pages and resv_huge_pages have
the same cache issue on NUMA, as does the hugetlb_instantiation_mutex. I
assume the new variable flight_huge_pages shares a cache line with
hugetlb_lock, free_huge_pages, and resv_huge_pages. A possible improvement
is to add a simple check, without taking the lock, before calling
alloc_huge_page in function hugetlb_no_page. The check could be:

	if ((!free_huge_pages && flight_huge_pages) ||
	    (!(vma->vm_flags & VM_MAYSHARE) &&
	     free_huge_pages <= resv_huge_pages && flight_huge_pages)) {
		hugetlb_rollback_quota(mapping);
		cond_resched();
		goto retry;
	}

Then it becomes a read instead of a write before changing the variables.

> Additionally, the commit/rollback logic you are using closely parallels
> what we already do with the huge page reservation mechanism. Is there
> any way you could integrate your stuff into the reservation system to
> avoid all the duplicated logic?
I added a new parameter, flight_delta, to function hugetlb_acct_memory and
use hugetlb_acct_memory to replace both hugetlb_commit_page and
hugetlb_rollback_page. As for quota, it's hard to integrate it into the
current reservation mechanism.

Below is the new patch.

Yanmin

---
--- linux-2.6.22/fs/hugetlbfs/inode.c	2007-07-09 07:32:17.000000000 +0800
+++ linux-2.6.22_hugetlb/fs/hugetlbfs/inode.c	2007-07-26 14:52:04.000000000 +0800
@@ -662,6 +662,7 @@ hugetlbfs_fill_super(struct super_block
 	spin_lock_init(&sbinfo->stat_lock);
 	sbinfo->max_blocks = config.nr_blocks;
 	sbinfo->free_blocks = config.nr_blocks;
+	sbinfo->flight_blocks = 0;
 	sbinfo->max_inodes = config.nr_inodes;
 	sbinfo->free_inodes = config.nr_inodes;
 	sb->s_maxbytes = MAX_LFS_FILESIZE;
@@ -694,8 +695,11 @@ int hugetlb_get_quota(struct address_spa
 	if (sbinfo->free_blocks > -1) {
 		spin_lock(&sbinfo->stat_lock);
-		if (sbinfo->free_blocks > 0)
+		if (sbinfo->free_blocks > 0) {
 			sbinfo->free_blocks--;
+			sbinfo->flight_blocks++;
+		} else if (sbinfo->flight_blocks)
+			ret = -EAGAIN;
 		else
 			ret = -ENOMEM;
 		spin_unlock(&sbinfo->stat_lock);
@@ -710,7 +714,30 @@ void hugetlb_put_quota(struct address_sp
 	if (sbinfo->free_blocks > -1) {
 		spin_lock(&sbinfo->stat_lock);
 		sbinfo->free_blocks++;
+		spin_unlock(&sbinfo->stat_lock);
+	}
+}
+
+void hugetlb_commit_quota(struct address_space *mapping)
+{
+	struct hugetlbfs_sb_info *sbinfo = HUGETLBFS_SB(mapping->host->i_sb);
+
+	if (sbinfo->free_blocks > -1) {
+		spin_lock(&sbinfo->stat_lock);
+		sbinfo->flight_blocks--;
+		spin_unlock(&sbinfo->stat_lock);
+	}
+}
+
+void hugetlb_rollback_quota(struct address_space *mapping)
+{
+	struct hugetlbfs_sb_info *sbinfo = HUGETLBFS_SB(mapping->host->i_sb);
+
+	if (sbinfo->free_blocks > -1) {
+		spin_lock(&sbinfo->stat_lock);
+		sbinfo->free_blocks++;
+		sbinfo->flight_blocks--;
 		spin_unlock(&sbinfo->stat_lock);
 	}
 }
--- linux-2.6.22/include/linux/hugetlb.h	2007-07-09 07:32:17.000000000 +0800
+++ linux-2.6.22_hugetlb/include/linux/hugetlb.h	2007-07-24 16:54:39.000000000 +0800
@@ -140,6 +140,7 @@ struct hugetlbfs_config {
 struct hugetlbfs_sb_info {
 	long	max_blocks;	/* blocks allowed */
 	long	free_blocks;	/* blocks
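The "goto retry" in the pre-check above implies a caller-side loop: when a
page or quota is in flight, the faulting thread yields and tries again
rather than failing. A minimal userspace sketch of that retry behavior
follows; `get_quota_once` is a hypothetical stand-in that reports a block in
flight a few times before one frees up, and `sched_yield()` plays the role
of `cond_resched()`.

```c
#include <errno.h>
#include <sched.h>

/*
 * Stand-in for hugetlb_get_quota(): the first 'pending' calls see the
 * last block in flight elsewhere; after that, one block is available.
 */
static int pending = 3;

static int get_quota_once(void)
{
	if (pending > 0) {
		pending--;
		return -EAGAIN;	/* block in flight: caller should retry */
	}
	return 0;		/* got the quota */
}

/* The retry loop implied by the patch: spin politely on -EAGAIN. */
static int fault_with_retry(void)
{
	int ret;

	while ((ret = get_quota_once()) == -EAGAIN)
		sched_yield();	/* userspace analogue of cond_resched() */
	return ret;
}
```

The key property: the thread only gives up (SIGBUS/OOM) when the resource is
genuinely exhausted, never while a racing thread still holds it in flight.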
Re: [PATCH] remove hugetlb_instantiation_mutex
On Fri, 2007-07-27 at 11:37 -0500, Adam Litke wrote: Hey... I am amazed at how quickly you came back with a patch for this :) Thanks for looking at it. Unfortunately there is one show-stopper and I have some reservations (pun definitely intended) with your approach: Thanks for your great comments. First, your patch does not pass the libhugetlbfs test 'alloc-instantiate-race' which was written to tickle the the race which the mutex was introduced to solve. Your patch works for shared mappings, but not for the private case. My testing about private might not be thorough. Function hugetlb_cow has a race for multi-thread to fault on the same private page index. But after I fixed it, alloc-instantiate-race still failed. I tried to google the source code tarball of libhugetlbfs test suite, but couldn't find it. Would you like to send me a copy of the test source codes? Second, the introduction of another pair of global counters triggers my internal warning system... These global counters are known to cause problems with NUMA and cpusets. Have you considered these interactions? flight_blocks is a variable per superblock. As you know, quota is not used frequently, so flight_blocks mostly isn't an issue of cache-miss. flight_huge_pages might be a potential issue of cache miss with NUMA/cpusets. Currently, global variable free_huge_pages and resv_huge_pages have the same cache issue with NUMA as well as the hugetlb_instantiation_mutex. I assume the new variable flight_huge_pages shares the same cache line with hugetlb_lock, free_huge_pages, and resv_huge_pages. A possible improvement about it is to add a simple check without lock before calling alloc_huge_page in function hugetlb_no_page. The check could be: if ( (!free_huge_pages flight_huge_pages) || (!(vma-vm_flags VM_MAYSHARE) free_huge_pages = resv_huge_pages flight_huge_pages) ) { hugetlb_rollback_quota(mapping); cond_resched(); goto retry; } Then, it becomes read instead of write before changing the variables. 
Additionally, the commit/rollback logic you are using closely parallels what we already do with the huge page reservation mechanism. Is there any way you could integrate your stuff into the reservation system to avoid all the duplicated logic? I add a new parameter flight_delta to function hugetlb_acct_memory and use hugetlb_acct_memory to replace both hugetlb_commit_page and hugetlb_rollback_page. As for quota, it's hard to integrate it into current reservation mechanism. Below is the new patch. Yanmin --- --- linux-2.6.22/fs/hugetlbfs/inode.c 2007-07-09 07:32:17.0 +0800 +++ linux-2.6.22_hugetlb/fs/hugetlbfs/inode.c 2007-07-26 14:52:04.0 +0800 @@ -662,6 +662,7 @@ hugetlbfs_fill_super(struct super_block spin_lock_init(sbinfo-stat_lock); sbinfo-max_blocks = config.nr_blocks; sbinfo-free_blocks = config.nr_blocks; + sbinfo-flight_blocks = 0; sbinfo-max_inodes = config.nr_inodes; sbinfo-free_inodes = config.nr_inodes; sb-s_maxbytes = MAX_LFS_FILESIZE; @@ -694,8 +695,11 @@ int hugetlb_get_quota(struct address_spa if (sbinfo-free_blocks -1) { spin_lock(sbinfo-stat_lock); - if (sbinfo-free_blocks 0) + if (sbinfo-free_blocks 0) { sbinfo-free_blocks--; + sbinfo-flight_blocks ++; + } else if (sbinfo-flight_blocks) + ret = -EAGAIN; else ret = -ENOMEM; spin_unlock(sbinfo-stat_lock); @@ -710,7 +714,30 @@ void hugetlb_put_quota(struct address_sp if (sbinfo-free_blocks -1) { spin_lock(sbinfo-stat_lock); - sbinfo-free_blocks++; + sbinfo-free_blocks ++; + spin_unlock(sbinfo-stat_lock); + } +} + +void hugetlb_commit_quota(struct address_space *mapping) +{ + struct hugetlbfs_sb_info *sbinfo = HUGETLBFS_SB(mapping-host-i_sb); + + if (sbinfo-free_blocks -1) { + spin_lock(sbinfo-stat_lock); + sbinfo-flight_blocks --; + spin_unlock(sbinfo-stat_lock); + } +} + +void hugetlb_rollback_quota(struct address_space *mapping) +{ + struct hugetlbfs_sb_info *sbinfo = HUGETLBFS_SB(mapping-host-i_sb); + + if (sbinfo-free_blocks -1) { + spin_lock(sbinfo-stat_lock); + sbinfo-free_blocks ++; + 
sbinfo-flight_blocks --; spin_unlock(sbinfo-stat_lock); } } --- linux-2.6.22/include/linux/hugetlb.h2007-07-09 07:32:17.0 +0800 +++ linux-2.6.22_hugetlb/include/linux/hugetlb.h2007-07-24 16:54:39.0 +0800 @@ -140,6 +140,7 @@ struct hugetlbfs_config { struct hugetlbfs_sb_info { longmax_blocks; /* blocks allowed */ longfree_blocks; /* blocks
Re: [PATCH] remove hugetlb_instantiation_mutex
Hey... I am amazed at how quickly you came back with a patch for this :)
Thanks for looking at it. Unfortunately there is one show-stopper and I
have some reservations (pun definitely intended) with your approach:

First, your patch does not pass the libhugetlbfs test
'alloc-instantiate-race', which was written to tickle the race that the
mutex was introduced to solve. Your patch works for shared mappings, but
not for the private case.

Second, the introduction of another pair of global counters triggers my
internal warning system... These global counters are known to cause
problems with NUMA and cpusets. Have you considered these interactions?

Additionally, the commit/rollback logic you are using closely parallels
what we already do with the huge page reservation mechanism. Is there any
way you could integrate your stuff into the reservation system to avoid
all the duplicated logic?

-- 
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center
[PATCH] remove hugetlb_instantiation_mutex
hugetlb_instantiation_mutex is a big lock in function hugetlb_fault. It
serializes all hugetlb page faults, regardless of whether the faults happen
in the same address space, and regardless of whether they are related to
the same inode.

Why is there such a big lock? Huge pages are a limited resource. When few
free huge pages remain, for example only one, and many threads/processes
trigger page faults on the same logical huge page of the same inode at the
same time, they compete for a free huge page for the same index, so some of
them can't get the huge page and are killed by the kernel.

The root cause: when a huge page has been dequeued from the free list but
not yet added to the page cache tree or mapped into a page table, other
threads/processes may also try to allocate a huge page. Because the huge
page is in flight (neither in the page cache tree nor mapped in a page
table), the later threads/processes can't get the last free huge page and
are killed. hugetlb_instantiation_mutex prevents this behavior.

To remove the big lock, I worked out this patch against kernel 2.6.22:

1) Add flight_blocks to struct hugetlbfs_sb_info to record the quota in
   flight;
2) Add flight_huge_pages to record the huge pages in flight;
3) Add functions hugetlb_commit_quota and hugetlb_rollback_quota. When a
   quota goes from in-flight to non-flight, call hugetlb_commit_quota or
   hugetlb_rollback_quota, depending on whether the fault succeeds;
4) Add functions hugetlb_commit_page and hugetlb_rollback_page. When a
   huge page goes from in-flight to non-flight, call hugetlb_commit_page
   or hugetlb_rollback_page, depending on whether the fault succeeds;
5) If a thread couldn't get a free page/quota but there is a page/quota in
   flight, the thread goes back and retries.

My patch also fixes a bug in function hugetlb_no_page regarding quota.
If a huge page is in the page cache but two threads of a process fault on
it at the same time, the later one will incorrectly call
hugetlb_rollback_quota on the backout path.

If there is no contention, the patch won't add much overhead. If there is
heavy contention, performance should mostly improve. I have a hugetlb test
suite; I ran it in a loop for a couple of hours and didn't find anything
bad.

Signed-off-by: Zhang Yanmin <[EMAIL PROTECTED]>
---
--- linux-2.6.22/fs/hugetlbfs/inode.c	2007-07-09 07:32:17.000000000 +0800
+++ linux-2.6.22_hugetlb/fs/hugetlbfs/inode.c	2007-07-26 14:52:04.000000000 +0800
@@ -662,6 +662,7 @@ hugetlbfs_fill_super(struct super_block
 	spin_lock_init(&sbinfo->stat_lock);
 	sbinfo->max_blocks = config.nr_blocks;
 	sbinfo->free_blocks = config.nr_blocks;
+	sbinfo->flight_blocks = 0;
 	sbinfo->max_inodes = config.nr_inodes;
 	sbinfo->free_inodes = config.nr_inodes;
 	sb->s_maxbytes = MAX_LFS_FILESIZE;
@@ -694,8 +695,11 @@ int hugetlb_get_quota(struct address_spa
 	if (sbinfo->free_blocks > -1) {
 		spin_lock(&sbinfo->stat_lock);
-		if (sbinfo->free_blocks > 0)
+		if (sbinfo->free_blocks > 0) {
 			sbinfo->free_blocks--;
+			sbinfo->flight_blocks++;
+		} else if (sbinfo->flight_blocks)
+			ret = -EAGAIN;
 		else
 			ret = -ENOMEM;
 		spin_unlock(&sbinfo->stat_lock);
@@ -710,7 +714,30 @@ void hugetlb_put_quota(struct address_sp
 	if (sbinfo->free_blocks > -1) {
 		spin_lock(&sbinfo->stat_lock);
 		sbinfo->free_blocks++;
+		spin_unlock(&sbinfo->stat_lock);
+	}
+}
+
+void hugetlb_commit_quota(struct address_space *mapping)
+{
+	struct hugetlbfs_sb_info *sbinfo = HUGETLBFS_SB(mapping->host->i_sb);
+
+	if (sbinfo->free_blocks > -1) {
+		spin_lock(&sbinfo->stat_lock);
+		sbinfo->flight_blocks--;
+		spin_unlock(&sbinfo->stat_lock);
+	}
+}
+
+void hugetlb_rollback_quota(struct address_space *mapping)
+{
+	struct hugetlbfs_sb_info *sbinfo = HUGETLBFS_SB(mapping->host->i_sb);
+
+	if (sbinfo->free_blocks > -1) {
+		spin_lock(&sbinfo->stat_lock);
+		sbinfo->free_blocks++;
+		sbinfo->flight_blocks--;
 		spin_unlock(&sbinfo->stat_lock);
 	}
 }
--- linux-2.6.22/include/linux/hugetlb.h	2007-07-09 07:32:17.000000000 +0800
+++ linux-2.6.22_hugetlb/include/linux/hugetlb.h	2007-07-24 16:54:39.000000000 +0800
@@ -140,6 +140,7 @@ struct hugetlbfs_config {
 struct hugetlbfs_sb_info {
 	long	max_blocks;	/* blocks allowed */
 	long	free_blocks;	/* blocks free */
+	long	flight_blocks;	/* blocks allocated but not yet used */
 	long	max_inodes;	/* inodes allowed */
 	long	free_inodes;	/* inodes free */
 	spinlock_t	stat_lock;
@@ -166,6 +167,8 @@ extern struct vm_operations_struct huget
 struct file *hugetlb_file_setup(const char
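The flight-quota accounting that the patch adds to hugetlb_get_quota can be
modeled in userspace to see the three outcomes it distinguishes. This is a
sketch only: `struct sb_quota`, `get_quota`, `commit_quota`, and
`rollback_quota` are simplified stand-ins (a pthread mutex instead of the
kernel spinlock, no address_space), not the patch itself.

```c
#include <errno.h>
#include <pthread.h>

struct sb_quota {
	long free_blocks;	/* -1 means "no limit", as in hugetlbfs */
	long flight_blocks;	/* taken from free_blocks, not yet committed */
	pthread_mutex_t stat_lock;
};

/*
 * Mirrors the patched hugetlb_get_quota(): 0 on success, -EAGAIN when a
 * block is merely in flight (caller should retry), -ENOMEM when the
 * quota is truly exhausted.
 */
static int get_quota(struct sb_quota *q)
{
	int ret = 0;

	if (q->free_blocks > -1) {
		pthread_mutex_lock(&q->stat_lock);
		if (q->free_blocks > 0) {
			q->free_blocks--;
			q->flight_blocks++;
		} else if (q->flight_blocks)
			ret = -EAGAIN;
		else
			ret = -ENOMEM;
		pthread_mutex_unlock(&q->stat_lock);
	}
	return ret;
}

/* Fault succeeded: the block is now in use, only drop the flight mark. */
static void commit_quota(struct sb_quota *q)
{
	if (q->free_blocks > -1) {
		pthread_mutex_lock(&q->stat_lock);
		q->flight_blocks--;
		pthread_mutex_unlock(&q->stat_lock);
	}
}

/* Fault failed: return the block to the free pool as well. */
static void rollback_quota(struct sb_quota *q)
{
	if (q->free_blocks > -1) {
		pthread_mutex_lock(&q->stat_lock);
		q->free_blocks++;
		q->flight_blocks--;
		pthread_mutex_unlock(&q->stat_lock);
	}
}
```

The asymmetry between commit and rollback is the whole mechanism: commit
keeps the block consumed, rollback restores it, and a contender that sees
flight_blocks > 0 gets -EAGAIN (retry) instead of -ENOMEM (SIGBUS/OOM).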