Re: [PATCH] MM: Support more pagesizes for MAP_HUGETLB/SHM_HUGETLB v3
> Alas, include/asm-generic/mman.h doesn't exist now.

git resolved it automagically

> Does this change touch all the hugetlb-capable architectures?

I took a look at this again. So not every hugetlb capable architecture
needs it, only architectures with multiple hugetlb page sizes.
This is only x86, tile, powerpc.

I looked at tile and powerpc and they both have configurable hugetlb
page sizes. So it's somewhat awkward to add defines for them.

One disadvantage of this is also the user programs would need to know
the page sizes that are configured. That is definitely awkward, but I
don't know of any way around that. Luckily there's a way in /sys to
query this.

-Andi

> z:/usr/src/linux-3.6> grep -rl MAP_HUGETLB arch
> arch/alpha/include/asm/mman.h
> arch/xtensa/include/asm/mman.h
> arch/parisc/include/asm/mman.h
> arch/tile/include/asm/mman.h
> arch/sparc/include/asm/mman.h
> arch/powerpc/include/asm/mman.h
> arch/mips/include/asm/mman.h

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
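The /sys query mentioned above can be sketched from user space. This is a minimal sketch, not something taken from the thread: it assumes the sizes are exposed as directories named hugepages-<size>kB under /sys/kernel/mm/hugepages (the thread only says "/sys"), and the helper names hugepage_dir_to_kb and list_hugepage_sizes_kb are made up for illustration.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse a sysfs directory name such as "hugepages-2048kB".
 * Returns the page size in kB, or 0 if the name does not match.
 * (Hypothetical helper, not part of any kernel or libc API.)
 */
unsigned long hugepage_dir_to_kb(const char *name)
{
	unsigned long kb = 0;
	char suffix[3] = "";

	assert(name != NULL);
	if (sscanf(name, "hugepages-%lu%2s", &kb, suffix) != 2)
		return 0;
	if (strcmp(suffix, "kB") != 0)
		return 0;
	return kb;
}

/*
 * Collect the configured huge page sizes (in kB) into sizes_kb[].
 * Returns the number of sizes found, or -1 if the directory cannot
 * be opened. The path is an assumption, see the note above.
 */
int list_hugepage_sizes_kb(unsigned long *sizes_kb, int max)
{
	DIR *dir = opendir("/sys/kernel/mm/hugepages");
	struct dirent *de;
	int n = 0;

	if (!dir)
		return -1;
	while (n < max && (de = readdir(dir)) != NULL) {
		unsigned long kb = hugepage_dir_to_kb(de->d_name);

		if (kb)
			sizes_kb[n++] = kb;
	}
	closedir(dir);
	return n;
}
```

A program would call list_hugepage_sizes_kb() once at startup and only request sizes that appear in the result, since asking for an unconfigured size is expected to fail.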
Re: [PATCH] MM: Support more pagesizes for MAP_HUGETLB/SHM_HUGETLB v3
Thanks for the review.

> > I also exported the new flags to the user headers
> > (they were previously under __KERNEL__). Right now only symbols
> > for x86 and some other architecture for 1GB and 2MB are defined.
> > The interface should already work for all other architectures
> > though.
>
> So some manpages need updating. I'm not sure which - mmap(2) surely,
> but which for the IPC change?

mmap and shmget. Was already planned.

> > v2: Port to new tree. Fix unmount.
> > v3: Ported to latest tree.
> > Acked-by: Rik van Riel <r...@redhat.com>
> > Acked-by: KAMEZAWA Hiroyuki <kamezawa.hir...@jp.fujitsu.com>
> > Signed-off-by: Andi Kleen <a...@linux.intel.com>
> > ---
> >  arch/x86/include/asm/mman.h |    3 ++
> >  fs/hugetlbfs/inode.c        |   63 ++-
> >  include/asm-generic/mman.h  |   13 +
> >  include/linux/hugetlb.h     |   12 +++-
> >  include/linux/shm.h         |   19 +
> >  ipc/shm.c                   |    3 +-
> >  mm/mmap.c                   |    5 ++-
>
> Alas, include/asm-generic/mman.h doesn't exist now.
>
> Does this change touch all the hugetlb-capable architectures?

Right now only symbols for x86 and some other architecture for 1GB and
2MB are defined. The interface should already work for all other
architectures though. So they can add new symbols for their page sizes
at their leisure.

> >  	return capable(CAP_IPC_LOCK) || in_group_p(shm_group);
> >  }
> >
> > +static int get_hstate_idx(int page_size_log)
>
> nitlet: "page_size_order" would be more kernely. Or just "page_order".

It's not really an order, just the index. I think I would prefer the
current name, order would be misleading. For x86 it's only 0 and 1.

> > +		if (IS_ERR(hugetlbfs_vfsmount[i])) {
> > +			pr_err(
> > +		"hugetlb: Cannot mount internal hugetlbfs for page size %uK",
> > +			       ps_kb);
> > +			error = PTR_ERR(hugetlbfs_vfsmount[i]);
> > +		}
> > +		i++;
> > +	}
> > +	/* Non default hstates are optional */
> > +	if (hugetlbfs_vfsmount[default_hstate_idx])
> > +		return 0;
>
> hm, so if I'm understanding this, the patch mounts hugetlbfs N times,
> once for each page size. And presumably the shm code somehow selects
> one of these mounts, based on incoming flags. And presumably if those
> flags are all-zero, the behaviour is unaltered.

Yes.

> Please update the changelog to describe all this - the overview of how
> the patch actually operates.

Ok.

> Also, all this affects the /proc/mounts contents, yes? Let's changelog
> that very-slightly-non-back-compatible user-visible change as well.

AFAIK not. The internal mounts are not visible. At least my laptop
doesn't show them.

> There's some overhead to doing all those additional mounts. Can we
> quantify it?

On x86 it's one more mount (1GB). AFAIK it's just the sb structure,
there's nothing else preallocated. Maybe a couple hundred bytes per
page size. The number of huge page sizes is normally small, I don't
think any architecture has a large number.

-Andi

--
a...@linux.intel.com -- Speaking for myself only
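The lookup discussed above (a log2 page size in, an index into the per-size mount array out, with 0 meaning "default") can be modelled in user space. This is only a sketch under stated assumptions: the two-entry table, the 4 kB base page and every name ending in _model are hypothetical stand-ins for the kernel's hstates array, and the linear search replaces the kernel's size_to_hstate().

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT_MODEL 12	/* assumption: 4 kB base pages, as on x86 */

struct hstate_model {
	unsigned int order;	/* huge page size = base page size << order */
};

/* Hypothetical table: index 0 = 2 MB (default), index 1 = 1 GB. */
static const struct hstate_model hstates_model[] = { { 9 }, { 18 } };
static const int default_hstate_idx_model = 0;

/*
 * Model of get_hstate_idx(): 0 selects the default size, otherwise the
 * log2 of the requested size must match a configured size exactly.
 * Returns the table index, or -1 if no such size is configured.
 */
int get_hstate_idx_model(int page_size_log)
{
	size_t i;
	size_t n = sizeof(hstates_model) / sizeof(hstates_model[0]);

	if (!page_size_log)
		return default_hstate_idx_model;
	for (i = 0; i < n; i++) {
		int log = (int)hstates_model[i].order + PAGE_SHIFT_MODEL;

		if (log == page_size_log)
			return (int)i;
	}
	return -1;
}

/* Page size in kB, as used to build the "pagesize=%uK" mount option. */
unsigned int hstate_size_kb_model(unsigned int order)
{
	assert(order + PAGE_SHIFT_MODEL >= 10);
	return 1U << (order + PAGE_SHIFT_MODEL - 10);
}
```

With this table a request for log2 = 21 or for 0 both resolve to index 0, log2 = 30 resolves to index 1, and anything else is rejected, which matches the "for x86 it's only 0 and 1" remark about the returned index.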
Re: [PATCH] MM: Support more pagesizes for MAP_HUGETLB/SHM_HUGETLB v3
On Wed, 3 Oct 2012 15:24:23 -0700 Andi Kleen <a...@firstfloor.org> wrote:

> From: Andi Kleen <a...@linux.intel.com>
>
> There was some desire in large applications using MAP_HUGETLB/SHM_HUGETLB
> to use 1GB huge pages on some mappings, and stay with 2MB on others. This
> is useful together with NUMA policy: use 2MB interleaving on some mappings,
> but 1GB on local mappings.
>
> This patch extends the IPC/SHM syscall interfaces slightly to allow specifying
> the page size.
>
> It borrows some upper bits in the existing flag arguments and allows encoding
> the log of the desired page size in addition to the *_HUGETLB flag.
> When 0 is specified the default size is used, this makes the change fully
> compatible.
>
> Extending the internal hugetlb code to handle this is straight forward. Instead
> of a single mount it just keeps an array of them and selects the right
> mount based on the specified page size.
>
> I also exported the new flags to the user headers
> (they were previously under __KERNEL__). Right now only symbols
> for x86 and some other architecture for 1GB and 2MB are defined.
> The interface should already work for all other architectures
> though.

So some manpages need updating. I'm not sure which - mmap(2) surely,
but which for the IPC change?

> v2: Port to new tree. Fix unmount.
> v3: Ported to latest tree.
> Acked-by: Rik van Riel <r...@redhat.com>
> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hir...@jp.fujitsu.com>
> Signed-off-by: Andi Kleen <a...@linux.intel.com>
> ---
>  arch/x86/include/asm/mman.h |    3 ++
>  fs/hugetlbfs/inode.c        |   63 ++-
>  include/asm-generic/mman.h  |   13 +
>  include/linux/hugetlb.h     |   12 +++-
>  include/linux/shm.h         |   19 +
>  ipc/shm.c                   |    3 +-
>  mm/mmap.c                   |    5 ++-

Alas, include/asm-generic/mman.h doesn't exist now.

Does this change touch all the hugetlb-capable architectures?

z:/usr/src/linux-3.6> grep -rl MAP_HUGETLB arch
arch/alpha/include/asm/mman.h
arch/xtensa/include/asm/mman.h
arch/parisc/include/asm/mman.h
arch/tile/include/asm/mman.h
arch/sparc/include/asm/mman.h
arch/powerpc/include/asm/mman.h
arch/mips/include/asm/mman.h

>
> ...
>
> @@ -933,9 +933,22 @@ static int can_do_hugetlb_shm(void)
>  	return capable(CAP_IPC_LOCK) || in_group_p(shm_group);
>  }
>
> +static int get_hstate_idx(int page_size_log)

nitlet: "page_size_order" would be more kernely. Or just "page_order".

> +{
> +	struct hstate *h;
> +
> +	if (!page_size_log)
> +		return default_hstate_idx;
> +	h = size_to_hstate(1 << page_size_log);
> +	if (!h)
> +		return -1;
> +	return h - hstates;
> +}
>
> ...
>
>  static int __init init_hugetlbfs_fs(void)
>  {
> +	struct hstate *h;
>  	int error;
> -	struct vfsmount *vfsmount;
> +	int i;
>
>  	error = bdi_init(&hugetlbfs_backing_dev_info);
>  	if (error)
> @@ -1030,14 +1049,26 @@ static int __init init_hugetlbfs_fs(void)
>  	if (error)
>  		goto out;
>
> -	vfsmount = kern_mount(&hugetlbfs_fs_type);
> +	i = 0;
> +	for_each_hstate (h) {
> +		char buf[50];
> +		unsigned ps_kb = 1U << (h->order + PAGE_SHIFT - 10);
>
> -	if (!IS_ERR(vfsmount)) {
> -		hugetlbfs_vfsmount = vfsmount;
> -		return 0;
> -	}
> +		snprintf(buf, sizeof buf, "pagesize=%uK", ps_kb);
> +		hugetlbfs_vfsmount[i] = kern_mount_data(&hugetlbfs_fs_type,
> +							buf);
>
> -	error = PTR_ERR(vfsmount);
> +		if (IS_ERR(hugetlbfs_vfsmount[i])) {
> +			pr_err(
> +		"hugetlb: Cannot mount internal hugetlbfs for page size %uK",
> +			       ps_kb);
> +			error = PTR_ERR(hugetlbfs_vfsmount[i]);
> +		}
> +		i++;
> +	}
> +	/* Non default hstates are optional */
> +	if (hugetlbfs_vfsmount[default_hstate_idx])
> +		return 0;

hm, so if I'm understanding this, the patch mounts hugetlbfs N times,
once for each page size. And presumably the shm code somehow selects
one of these mounts, based on incoming flags. And presumably if those
flags are all-zero, the behaviour is unaltered.

Please update the changelog to describe all this - the overview of how
the patch actually operates.

Also, all this affects the /proc/mounts contents, yes? Let's changelog
that very-slightly-non-back-compatible user-visible change as well.

There's some overhead to doing all those additional mounts. Can we
quantify it?

>
> ...
>
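The flag encoding described in the quoted changelog (the log2 of the page size stored in upper bits of the flags, 0 meaning "default size") can be sketched as follows. The values MAP_HUGE_SHIFT = 26 and MAP_HUGE_MASK = 0x3f are assumptions taken from the mainline kernel headers; the hunk that defines them is not shown in this thread. The function names are hypothetical, and map_huge() is only an illustration of how a caller might combine the bits with MAP_HUGETLB.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for MAP_ANONYMOUS and MAP_HUGETLB on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Assumed values (mainline headers); not visible in the quoted patch. */
#ifndef MAP_HUGE_SHIFT
#define MAP_HUGE_SHIFT 26
#endif
#ifndef MAP_HUGE_MASK
#define MAP_HUGE_MASK 0x3f
#endif

/* Encode log2(page size) into the upper flag bits; 0 means default. */
unsigned int huge_flag_encode(unsigned int page_size_log)
{
	assert(page_size_log <= MAP_HUGE_MASK);
	return page_size_log << MAP_HUGE_SHIFT;
}

/* Recover log2(page size) from a flags word. */
unsigned int huge_flag_decode(unsigned int flags)
{
	return (flags >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK;
}

/*
 * Anonymous huge page mapping with an explicit page size.
 * Returns MAP_FAILED if the size is not configured or no pages are free.
 */
void *map_huge(size_t len, unsigned int page_size_log)
{
	int flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB |
		    (int)huge_flag_encode(page_size_log);

	return mmap(NULL, len, PROT_READ | PROT_WRITE, flags, -1, 0);
}
```

Under these assumptions map_huge(len, 30) asks for 1GB pages and map_huge(len, 0) keeps the pre-patch behaviour, which is the compatibility property the changelog relies on.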
[PATCH] MM: Support more pagesizes for MAP_HUGETLB/SHM_HUGETLB v3
From: Andi Kleen <a...@linux.intel.com>

There was some desire in large applications using MAP_HUGETLB/SHM_HUGETLB
to use 1GB huge pages on some mappings, and stay with 2MB on others. This
is useful together with NUMA policy: use 2MB interleaving on some mappings,
but 1GB on local mappings.

This patch extends the IPC/SHM syscall interfaces slightly to allow specifying
the page size.

It borrows some upper bits in the existing flag arguments and allows encoding
the log of the desired page size in addition to the *_HUGETLB flag.
When 0 is specified the default size is used, this makes the change fully
compatible.

Extending the internal hugetlb code to handle this is straight forward. Instead
of a single mount it just keeps an array of them and selects the right
mount based on the specified page size.

I also exported the new flags to the user headers
(they were previously under __KERNEL__). Right now only symbols
for x86 and some other architecture for 1GB and 2MB are defined.
The interface should already work for all other architectures
though.

v2: Port to new tree. Fix unmount.
v3: Ported to latest tree.
Acked-by: Rik van Riel <r...@redhat.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hir...@jp.fujitsu.com>
Signed-off-by: Andi Kleen <a...@linux.intel.com>
---
 arch/x86/include/asm/mman.h |    3 ++
 fs/hugetlbfs/inode.c        |   63 ++-
 include/asm-generic/mman.h  |   13 +
 include/linux/hugetlb.h     |   12 +++-
 include/linux/shm.h         |   19 +
 ipc/shm.c                   |    3 +-
 mm/mmap.c                   |    5 ++-
 7 files changed, 100 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/mman.h b/arch/x86/include/asm/mman.h
index 593e51d..513b05f 100644
--- a/arch/x86/include/asm/mman.h
+++ b/arch/x86/include/asm/mman.h
@@ -3,6 +3,9 @@

 #define MAP_32BIT	0x40		/* only give out 32bit addresses */

+#define MAP_HUGE_2MB    (21 << MAP_HUGE_SHIFT)
+#define MAP_HUGE_1GB    (30 << MAP_HUGE_SHIFT)
+
 #include <asm-generic/mman.h>

 #endif /* _ASM_X86_MMAN_H */
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 9460120..f6fb699 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -924,7 +924,7 @@ static struct file_system_type hugetlbfs_fs_type = {
 	.kill_sb	= kill_litter_super,
 };

-static struct vfsmount *hugetlbfs_vfsmount;
+static struct vfsmount *hugetlbfs_vfsmount[HUGE_MAX_HSTATE];

 static int can_do_hugetlb_shm(void)
 {
@@ -933,9 +933,22 @@ static int can_do_hugetlb_shm(void)
 	return capable(CAP_IPC_LOCK) || in_group_p(shm_group);
 }

+static int get_hstate_idx(int page_size_log)
+{
+	struct hstate *h;
+
+	if (!page_size_log)
+		return default_hstate_idx;
+	h = size_to_hstate(1 << page_size_log);
+	if (!h)
+		return -1;
+	return h - hstates;
+}
+
 struct file *hugetlb_file_setup(const char *name, unsigned long addr,
 				size_t size, vm_flags_t acctflag,
-				struct user_struct **user, int creat_flags)
+				struct user_struct **user,
+				int creat_flags, int page_size_log)
 {
 	int error = -ENOMEM;
 	struct file *file;
@@ -945,9 +958,14 @@ struct file *hugetlb_file_setup(const char *name, unsigned long addr,
 	struct qstr quick_string;
 	struct hstate *hstate;
 	unsigned long num_pages;
+	int hstate_idx;
+
+	hstate_idx = get_hstate_idx(page_size_log);
+	if (hstate_idx < 0)
+		return ERR_PTR(-ENODEV);

 	*user = NULL;
-	if (!hugetlbfs_vfsmount)
+	if (!hugetlbfs_vfsmount[hstate_idx])
 		return ERR_PTR(-ENOENT);

 	if (creat_flags == HUGETLB_SHMFS_INODE && !can_do_hugetlb_shm()) {
@@ -964,7 +982,7 @@ struct file *hugetlb_file_setup(const char *name, unsigned long addr,
 		}
 	}

-	root = hugetlbfs_vfsmount->mnt_root;
+	root = hugetlbfs_vfsmount[hstate_idx]->mnt_root;
 	quick_string.name = name;
 	quick_string.len = strlen(quick_string.name);
 	quick_string.hash = 0;
@@ -972,7 +990,7 @@ struct file *hugetlb_file_setup(const char *name, unsigned long addr,
 	if (!path.dentry)
 		goto out_shm_unlock;

-	path.mnt = mntget(hugetlbfs_vfsmount);
+	path.mnt = mntget(hugetlbfs_vfsmount[hstate_idx]);
 	error = -ENOSPC;
 	inode = hugetlbfs_get_inode(root->d_sb, NULL, S_IFREG | S_IRWXUGO, 0);
 	if (!inode)
@@ -1012,8 +1030,9 @@ out_shm_unlock:

 static int __init init_hugetlbfs_fs(void)
 {
+	struct hstate *h;
 	int error;
-	struct vfsmount *vfsmount;
+	int i;

 	error = bdi_init(&hugetlbfs_backing_dev_info);
 	if (error)
@@ -1030,14 +1049,26 @@ static int __init init_hugetlbfs_fs(void)
 	if (error)
 		goto out;

-	vfsmount = kern_mount(&hugetlbfs_fs_type);
+	i = 0;
+	for_each_hstate (h) {
+		char buf[50];
+		unsigned ps_kb = 1U <<