Re: [PATCH 3/6] Squashfs: add multi-threaded decompression using percpu variables
Hello Phillip, Sorry for late response. On Thu, Nov 14, 2013 at 05:04:36PM +, Phillip Lougher wrote: > CCing Junjiro Okijima and Stephen Hemminger > > On 08/11/13 02:42, Minchan Kim wrote: > > > >Hello Phillip, > > > >On Thu, Nov 07, 2013 at 08:24:22PM +, Phillip Lougher wrote: > >>Add a multi-threaded decompression implementation which uses > >>percpu variables. > >> > >>Using percpu variables has advantages and disadvantages over > >>implementations which do not use percpu variables. > >> > >>Advantages: the nature of percpu variables ensures decompression is > >>load-balanced across the multiple cores. > >> > >>Disadvantages: it limits decompression to one thread per core. > > > >At a glance, I understand your concern but I don't see benefit to > >make this feature as separate new config because we can modify the > >number of decompressor per core in the future. > >I don't want to create new config SQUASHFS_DECOMP_MULTI_3, > >SQUASHFS_DECOMP_MULTI_4 and so on. :) > > You misunderstand > > I have been sent two multi-threaded implementations in the > past which use percpu variables: > > 1. First patch set: > >http://www.spinics.net/lists/linux-fsdevel/msg34365.html > >Later in early 2011, I explained why I'd not merged the >patches, and promised to do so when I got time > >http://www.spinics.net/lists/linux-fsdevel/msg42392.html > > > 2. Second patch set sent in 2011 > >http://www.spinics.net/lists/linux-fsdevel/msg44111.html > > So, these patches have been in my inbox, waiting until I got > time to refactor Squashfs so that they could be merged... and > I finally got to do this last month, which is why I'm merging > a combined version of both patches now. > > As to why have *two* implementations, I previously explained these > two approaches are complementary, and merging both allows the > user to decide which method of parallelising Squashfs they want > to do. What I see for benefit decompressor_multi_percpu is only limit memory overhead but it could be solved by dynamic shrinking as I mentioned earlier. About CPU usage, I'm not sure how decompressor_multi is horrible. If it's really concern, we can fix it to limit number of decomp stream by admin via sysfs or something. Decompression load balance? Acutally, I don't understand your claim. Could you elaborate it a bit? I couldn't understand why decompressor_multi_percpu is better than decompressor_multi. > > The percpu implementation is a good approach to parallelising > Squashfs. It is extremely simple, both in code and overhead. I agree about code but not sure about overhead. I admit decompressor_multi is bigger but it consists of simple opeartions. Sometime, kmalloc's cost could be higher if system memory pressure is severe so that file system's performance would fluctuate. If it's your concern, we could make threshold min/high so that it could guarantee some speed at least. > The decompression hotpath simply consists of taking a percpu > variable, doing the decompression, and then a release. When decomp buffer is created dynamically, it could be rather overhead compared to percpu approach but once it did, the overhead of decompress would be marginal. > Looking at code sizes: > > fs/squashfs/decompressor_multi.c| 199 +++ > fs/squashfs/decompressor_multi_percpu.c | 104 > fs/squashfs/decompressor_single.c | 85 + > > The simplicity of the percpu approach is readily apparent, at 104 > lines it is only slightly larger than the single threaded > implementation. > > Personally I like both approaches, and I have no reason not to > merge both implementations I have been sent. Okay. I'm not a maintainer so I'm not strong against your thougt. I just wanted to unify them if either of them isn't sigificantly win because I thought it makes maintainer and contributors happy due to avoid more CONFIG options which should be considered when some feature is added. > > But what does the community think here? Do you want the percpu > implementation? Do you see value in having two implementations? > Feedback is appreciated. As I mentioned, my opinion is that let's unify them if either of them is significantly win. It would be better to see some benchmark result. > > > > >How about this? > > > >1. Let's make CONFIG_DECOMPRESSOR_MAX which could be tuned by admin > > in Kconfig. default is CPU *2 or CPU, Otherwise, we can put it > > to sysfs so user can tune it in rumtime. > > > >2. put decompressor shrink logic by slab shrinker so if system has > > memory pressure, we could catch the event and free some of decompressor > > but memory pressure is not severe again in the future, we can create > > new decompressor until reaching threadhold user define. > > We could know system memory is enough by GFP_NOWAIT, not GFP_KERNEL > > in get_decomp_stream's allocation indirectly. > > This adds extra complexity to an implementation already
Re: [PATCH 3/6] Squashfs: add multi-threaded decompression using percpu variables
Phillip Lougher: > CCing Junjiro Okijima and Stephen Hemminger Thank you for CCing, and sorry for my slow responce. > >> Using percpu variables has advantages and disadvantages over > >> implementations which do not use percpu variables. > >> > >> Advantages: the nature of percpu variables ensures decompression is > >> load-balanced across the multiple cores. > >> > >> Disadvantages: it limits decompression to one thread per core. Honestly speaking, I don't remember the details of squashfs. It was a long long time ago when I read and modified squashfs. Anyway I will try replying. Percpu is a good approach. Obviously, as you mentioned as disadvantage, it depends the balance between these two things. - How many I/Os in parallel? - How much does the decompression cost? My current guess is the latter is heavier (for the performance), so I guess percpu is good. Is it guranteed that the decompressor never require any new resources? Under heavy I/O and memory pressure, if the decompressor wants some memory between get_cpu_ptr() and put_cpu_ptr(), and if the decompressor is running on all other cores at the same time, then does squashfs simply return ENOMEM because the memory shrinker cannot run on any core? If it is true, we may need a rule "no new resources for decompressing" since users may prefer the "slow but successful decompression" than getting ENOMEM. If this mail is totaly pointless, please ignore. J. R. Okajima -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3/6] Squashfs: add multi-threaded decompression using percpu variables
Phillip Lougher: CCing Junjiro Okijima and Stephen Hemminger Thank you for CCing, and sorry for my slow responce. Using percpu variables has advantages and disadvantages over implementations which do not use percpu variables. Advantages: the nature of percpu variables ensures decompression is load-balanced across the multiple cores. Disadvantages: it limits decompression to one thread per core. Honestly speaking, I don't remember the details of squashfs. It was a long long time ago when I read and modified squashfs. Anyway I will try replying. Percpu is a good approach. Obviously, as you mentioned as disadvantage, it depends the balance between these two things. - How many I/Os in parallel? - How much does the decompression cost? My current guess is the latter is heavier (for the performance), so I guess percpu is good. Is it guranteed that the decompressor never require any new resources? Under heavy I/O and memory pressure, if the decompressor wants some memory between get_cpu_ptr() and put_cpu_ptr(), and if the decompressor is running on all other cores at the same time, then does squashfs simply return ENOMEM because the memory shrinker cannot run on any core? If it is true, we may need a rule no new resources for decompressing since users may prefer the slow but successful decompression than getting ENOMEM. If this mail is totaly pointless, please ignore. J. R. Okajima -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3/6] Squashfs: add multi-threaded decompression using percpu variables
Hello Phillip, Sorry for late response. On Thu, Nov 14, 2013 at 05:04:36PM +, Phillip Lougher wrote: CCing Junjiro Okijima and Stephen Hemminger On 08/11/13 02:42, Minchan Kim wrote: Hello Phillip, On Thu, Nov 07, 2013 at 08:24:22PM +, Phillip Lougher wrote: Add a multi-threaded decompression implementation which uses percpu variables. Using percpu variables has advantages and disadvantages over implementations which do not use percpu variables. Advantages: the nature of percpu variables ensures decompression is load-balanced across the multiple cores. Disadvantages: it limits decompression to one thread per core. At a glance, I understand your concern but I don't see benefit to make this feature as separate new config because we can modify the number of decompressor per core in the future. I don't want to create new config SQUASHFS_DECOMP_MULTI_3, SQUASHFS_DECOMP_MULTI_4 and so on. :) You misunderstand I have been sent two multi-threaded implementations in the past which use percpu variables: 1. First patch set: http://www.spinics.net/lists/linux-fsdevel/msg34365.html Later in early 2011, I explained why I'd not merged the patches, and promised to do so when I got time http://www.spinics.net/lists/linux-fsdevel/msg42392.html 2. Second patch set sent in 2011 http://www.spinics.net/lists/linux-fsdevel/msg44111.html So, these patches have been in my inbox, waiting until I got time to refactor Squashfs so that they could be merged... and I finally got to do this last month, which is why I'm merging a combined version of both patches now. As to why have *two* implementations, I previously explained these two approaches are complementary, and merging both allows the user to decide which method of parallelising Squashfs they want to do. What I see for benefit decompressor_multi_percpu is only limit memory overhead but it could be solved by dynamic shrinking as I mentioned earlier. About CPU usage, I'm not sure how decompressor_multi is horrible. If it's really concern, we can fix it to limit number of decomp stream by admin via sysfs or something. Decompression load balance? Acutally, I don't understand your claim. Could you elaborate it a bit? I couldn't understand why decompressor_multi_percpu is better than decompressor_multi. The percpu implementation is a good approach to parallelising Squashfs. It is extremely simple, both in code and overhead. I agree about code but not sure about overhead. I admit decompressor_multi is bigger but it consists of simple opeartions. Sometime, kmalloc's cost could be higher if system memory pressure is severe so that file system's performance would fluctuate. If it's your concern, we could make threshold min/high so that it could guarantee some speed at least. The decompression hotpath simply consists of taking a percpu variable, doing the decompression, and then a release. When decomp buffer is created dynamically, it could be rather overhead compared to percpu approach but once it did, the overhead of decompress would be marginal. Looking at code sizes: fs/squashfs/decompressor_multi.c| 199 +++ fs/squashfs/decompressor_multi_percpu.c | 104 fs/squashfs/decompressor_single.c | 85 + The simplicity of the percpu approach is readily apparent, at 104 lines it is only slightly larger than the single threaded implementation. Personally I like both approaches, and I have no reason not to merge both implementations I have been sent. Okay. I'm not a maintainer so I'm not strong against your thougt. I just wanted to unify them if either of them isn't sigificantly win because I thought it makes maintainer and contributors happy due to avoid more CONFIG options which should be considered when some feature is added. But what does the community think here? Do you want the percpu implementation? Do you see value in having two implementations? Feedback is appreciated. As I mentioned, my opinion is that let's unify them if either of them is significantly win. It would be better to see some benchmark result. How about this? 1. Let's make CONFIG_DECOMPRESSOR_MAX which could be tuned by admin in Kconfig. default is CPU *2 or CPU, Otherwise, we can put it to sysfs so user can tune it in rumtime. 2. put decompressor shrink logic by slab shrinker so if system has memory pressure, we could catch the event and free some of decompressor but memory pressure is not severe again in the future, we can create new decompressor until reaching threadhold user define. We could know system memory is enough by GFP_NOWAIT, not GFP_KERNEL in get_decomp_stream's allocation indirectly. This adds extra complexity to an implementation already 199 lines long (as opposed to 104 for the percpu implementation). The whole point of the percpu implementation is to add a
Re: [PATCH 3/6] Squashfs: add multi-threaded decompression using percpu variables
CCing Junjiro Okijima and Stephen Hemminger On 08/11/13 02:42, Minchan Kim wrote: Hello Phillip, On Thu, Nov 07, 2013 at 08:24:22PM +, Phillip Lougher wrote: Add a multi-threaded decompression implementation which uses percpu variables. Using percpu variables has advantages and disadvantages over implementations which do not use percpu variables. Advantages: the nature of percpu variables ensures decompression is load-balanced across the multiple cores. Disadvantages: it limits decompression to one thread per core. At a glance, I understand your concern but I don't see benefit to make this feature as separate new config because we can modify the number of decompressor per core in the future. I don't want to create new config SQUASHFS_DECOMP_MULTI_3, SQUASHFS_DECOMP_MULTI_4 and so on. :) You misunderstand I have been sent two multi-threaded implementations in the past which use percpu variables: 1. First patch set: http://www.spinics.net/lists/linux-fsdevel/msg34365.html Later in early 2011, I explained why I'd not merged the patches, and promised to do so when I got time http://www.spinics.net/lists/linux-fsdevel/msg42392.html 2. Second patch set sent in 2011 http://www.spinics.net/lists/linux-fsdevel/msg44111.html So, these patches have been in my inbox, waiting until I got time to refactor Squashfs so that they could be merged... and I finally got to do this last month, which is why I'm merging a combined version of both patches now. As to why have *two* implementations, I previously explained these two approaches are complementary, and merging both allows the user to decide which method of parallelising Squashfs they want to do. The percpu implementation is a good approach to parallelising Squashfs. It is extremely simple, both in code and overhead. The decompression hotpath simply consists of taking a percpu variable, doing the decompression, and then a release. Looking at code sizes: fs/squashfs/decompressor_multi.c| 199 +++ fs/squashfs/decompressor_multi_percpu.c | 104 fs/squashfs/decompressor_single.c | 85 + The simplicity of the percpu approach is readily apparent, at 104 lines it is only slightly larger than the single threaded implementation. Personally I like both approaches, and I have no reason not to merge both implementations I have been sent. But what does the community think here? Do you want the percpu implementation? Do you see value in having two implementations? Feedback is appreciated. How about this? 1. Let's make CONFIG_DECOMPRESSOR_MAX which could be tuned by admin in Kconfig. default is CPU *2 or CPU, Otherwise, we can put it to sysfs so user can tune it in rumtime. 2. put decompressor shrink logic by slab shrinker so if system has memory pressure, we could catch the event and free some of decompressor but memory pressure is not severe again in the future, we can create new decompressor until reaching threadhold user define. We could know system memory is enough by GFP_NOWAIT, not GFP_KERNEL in get_decomp_stream's allocation indirectly. This adds extra complexity to an implementation already 199 lines long (as opposed to 104 for the percpu implementation). The whole point of the percpu implementation is to add a simple implementation that may suit many systems. Phillip In short, let's make decompressor_multi as dynamically tuned system and user can limit the max. Signed-off-by: Phillip Lougher --- fs/squashfs/Kconfig | 57 + fs/squashfs/Makefile| 10 +-- fs/squashfs/decompressor_multi_percpu.c | 105 +++ 3 files changed, 152 insertions(+), 20 deletions(-) create mode 100644 fs/squashfs/decompressor_multi_percpu.c diff --git a/fs/squashfs/Kconfig b/fs/squashfs/Kconfig index 1c6d340..c92c75f 100644 --- a/fs/squashfs/Kconfig +++ b/fs/squashfs/Kconfig @@ -25,6 +25,50 @@ config SQUASHFS If unsure, say N. +choice + prompt "Decompressor parallelisation options" + depends on SQUASHFS + help + Squashfs now supports three parallelisation options for + decompression. Each one exhibits various trade-offs between + decompression performance and CPU and memory usage. + + If in doubt, select "Single threaded compression" + +config SQUASHFS_DECOMP_SINGLE + bool "Single threaded compression" + help + Traditionally Squashfs has used single-threaded decompression. + Only one block (data or metadata) can be decompressed at any + one time. This limits CPU and memory usage to a minimum. + +config SQUASHFS_DECOMP_MULTI + bool "Use multiple decompressors for parallel I/O" + help + By default Squashfs uses a single decompressor but it gives + poor performance on parallel I/O workloads when using multiple CPU + machines due
Re: [PATCH 3/6] Squashfs: add multi-threaded decompression using percpu variables
CCing Junjiro Okijima and Stephen Hemminger On 08/11/13 02:42, Minchan Kim wrote: Hello Phillip, On Thu, Nov 07, 2013 at 08:24:22PM +, Phillip Lougher wrote: Add a multi-threaded decompression implementation which uses percpu variables. Using percpu variables has advantages and disadvantages over implementations which do not use percpu variables. Advantages: the nature of percpu variables ensures decompression is load-balanced across the multiple cores. Disadvantages: it limits decompression to one thread per core. At a glance, I understand your concern but I don't see benefit to make this feature as separate new config because we can modify the number of decompressor per core in the future. I don't want to create new config SQUASHFS_DECOMP_MULTI_3, SQUASHFS_DECOMP_MULTI_4 and so on. :) You misunderstand I have been sent two multi-threaded implementations in the past which use percpu variables: 1. First patch set: http://www.spinics.net/lists/linux-fsdevel/msg34365.html Later in early 2011, I explained why I'd not merged the patches, and promised to do so when I got time http://www.spinics.net/lists/linux-fsdevel/msg42392.html 2. Second patch set sent in 2011 http://www.spinics.net/lists/linux-fsdevel/msg44111.html So, these patches have been in my inbox, waiting until I got time to refactor Squashfs so that they could be merged... and I finally got to do this last month, which is why I'm merging a combined version of both patches now. As to why have *two* implementations, I previously explained these two approaches are complementary, and merging both allows the user to decide which method of parallelising Squashfs they want to do. The percpu implementation is a good approach to parallelising Squashfs. It is extremely simple, both in code and overhead. The decompression hotpath simply consists of taking a percpu variable, doing the decompression, and then a release. Looking at code sizes: fs/squashfs/decompressor_multi.c| 199 +++ fs/squashfs/decompressor_multi_percpu.c | 104 fs/squashfs/decompressor_single.c | 85 + The simplicity of the percpu approach is readily apparent, at 104 lines it is only slightly larger than the single threaded implementation. Personally I like both approaches, and I have no reason not to merge both implementations I have been sent. But what does the community think here? Do you want the percpu implementation? Do you see value in having two implementations? Feedback is appreciated. How about this? 1. Let's make CONFIG_DECOMPRESSOR_MAX which could be tuned by admin in Kconfig. default is CPU *2 or CPU, Otherwise, we can put it to sysfs so user can tune it in rumtime. 2. put decompressor shrink logic by slab shrinker so if system has memory pressure, we could catch the event and free some of decompressor but memory pressure is not severe again in the future, we can create new decompressor until reaching threadhold user define. We could know system memory is enough by GFP_NOWAIT, not GFP_KERNEL in get_decomp_stream's allocation indirectly. This adds extra complexity to an implementation already 199 lines long (as opposed to 104 for the percpu implementation). The whole point of the percpu implementation is to add a simple implementation that may suit many systems. Phillip In short, let's make decompressor_multi as dynamically tuned system and user can limit the max. Signed-off-by: Phillip Lougher phil...@squashfs.org.uk --- fs/squashfs/Kconfig | 57 + fs/squashfs/Makefile| 10 +-- fs/squashfs/decompressor_multi_percpu.c | 105 +++ 3 files changed, 152 insertions(+), 20 deletions(-) create mode 100644 fs/squashfs/decompressor_multi_percpu.c diff --git a/fs/squashfs/Kconfig b/fs/squashfs/Kconfig index 1c6d340..c92c75f 100644 --- a/fs/squashfs/Kconfig +++ b/fs/squashfs/Kconfig @@ -25,6 +25,50 @@ config SQUASHFS If unsure, say N. +choice + prompt Decompressor parallelisation options + depends on SQUASHFS + help + Squashfs now supports three parallelisation options for + decompression. Each one exhibits various trade-offs between + decompression performance and CPU and memory usage. + + If in doubt, select Single threaded compression + +config SQUASHFS_DECOMP_SINGLE + bool Single threaded compression + help + Traditionally Squashfs has used single-threaded decompression. + Only one block (data or metadata) can be decompressed at any + one time. This limits CPU and memory usage to a minimum. + +config SQUASHFS_DECOMP_MULTI + bool Use multiple decompressors for parallel I/O + help + By default Squashfs uses a single decompressor but it gives + poor performance on parallel I/O workloads when using multiple CPU +
Re: [PATCH 3/6] Squashfs: add multi-threaded decompression using percpu variables
Hello Phillip, On Thu, Nov 07, 2013 at 08:24:22PM +, Phillip Lougher wrote: > Add a multi-threaded decompression implementation which uses > percpu variables. > > Using percpu variables has advantages and disadvantages over > implementations which do not use percpu variables. > > Advantages: the nature of percpu variables ensures decompression is > load-balanced across the multiple cores. > > Disadvantages: it limits decompression to one thread per core. At a glance, I understand your concern but I don't see benefit to make this feature as separate new config because we can modify the number of decompressor per core in the future. I don't want to create new config SQUASHFS_DECOMP_MULTI_3, SQUASHFS_DECOMP_MULTI_4 and so on. :) How about this? 1. Let's make CONFIG_DECOMPRESSOR_MAX which could be tuned by admin in Kconfig. default is CPU *2 or CPU, Otherwise, we can put it to sysfs so user can tune it in rumtime. 2. put decompressor shrink logic by slab shrinker so if system has memory pressure, we could catch the event and free some of decompressor but memory pressure is not severe again in the future, we can create new decompressor until reaching threadhold user define. We could know system memory is enough by GFP_NOWAIT, not GFP_KERNEL in get_decomp_stream's allocation indirectly. In short, let's make decompressor_multi as dynamically tuned system and user can limit the max. > > Signed-off-by: Phillip Lougher > --- > fs/squashfs/Kconfig | 57 + > fs/squashfs/Makefile| 10 +-- > fs/squashfs/decompressor_multi_percpu.c | 105 > +++ > 3 files changed, 152 insertions(+), 20 deletions(-) > create mode 100644 fs/squashfs/decompressor_multi_percpu.c > > diff --git a/fs/squashfs/Kconfig b/fs/squashfs/Kconfig > index 1c6d340..c92c75f 100644 > --- a/fs/squashfs/Kconfig > +++ b/fs/squashfs/Kconfig > @@ -25,6 +25,50 @@ config SQUASHFS > > If unsure, say N. > > +choice > + prompt "Decompressor parallelisation options" > + depends on SQUASHFS > + help > + Squashfs now supports three parallelisation options for > + decompression. Each one exhibits various trade-offs between > + decompression performance and CPU and memory usage. > + > + If in doubt, select "Single threaded compression" > + > +config SQUASHFS_DECOMP_SINGLE > + bool "Single threaded compression" > + help > + Traditionally Squashfs has used single-threaded decompression. > + Only one block (data or metadata) can be decompressed at any > + one time. This limits CPU and memory usage to a minimum. > + > +config SQUASHFS_DECOMP_MULTI > + bool "Use multiple decompressors for parallel I/O" > + help > + By default Squashfs uses a single decompressor but it gives > + poor performance on parallel I/O workloads when using multiple CPU > + machines due to waiting on decompressor availability. > + > + If you have a parallel I/O workload and your system has enough memory, > + using this option may improve overall I/O performance. > + > + This decompressor implementation uses up to two parallel > + decompressors per core. It dynamically allocates decompressors > + on a demand basis. I'm not sure it's good idea to write how many of parallel decompressor per core. IMO, It's implementation stuff and user don't need to know internal. And we could modify it in the future. > + > +config SQUASHFS_DECOMP_MULTI_PERCPU > + bool "Use percpu multiple decompressors for parallel I/O" > + help Indentation. > + By default Squashfs uses a single decompressor but it gives > + poor performance on parallel I/O workloads when using multiple CPU > + machines due to waiting on decompressor availability. > + > + This decompressor implementation uses a maximum of one > + decompressor per core. It uses percpu variables to ensure > + decompression is load-balanced across the cores. > + > +endchoice > + > config SQUASHFS_XATTR > bool "Squashfs XATTR support" > depends on SQUASHFS > @@ -63,19 +107,6 @@ config SQUASHFS_LZO > > If unsure, say N. > > -config SQUASHFS_MULTI_DECOMPRESSOR > - bool "Use multiple decompressors for handling parallel I/O" > - depends on SQUASHFS > - help > - By default Squashfs uses a single decompressor but it gives > - poor performance on parallel I/O workloads when using multiple CPU > - machines due to waiting on decompressor availability. > - > - If you have a parallel I/O workload and your system has enough memory, > - using this option may improve overall I/O performance. > - > - If unsure, say N. > - > config SQUASHFS_XZ > bool "Include support for XZ compressed file systems" > depends on SQUASHFS > diff --git a/fs/squashfs/Makefile b/fs/squashfs/Makefile > index
Re: [PATCH 3/6] Squashfs: add multi-threaded decompression using percpu variables
Hello Phillip, On Thu, Nov 07, 2013 at 08:24:22PM +, Phillip Lougher wrote: Add a multi-threaded decompression implementation which uses percpu variables. Using percpu variables has advantages and disadvantages over implementations which do not use percpu variables. Advantages: the nature of percpu variables ensures decompression is load-balanced across the multiple cores. Disadvantages: it limits decompression to one thread per core. At a glance, I understand your concern but I don't see benefit to make this feature as separate new config because we can modify the number of decompressor per core in the future. I don't want to create new config SQUASHFS_DECOMP_MULTI_3, SQUASHFS_DECOMP_MULTI_4 and so on. :) How about this? 1. Let's make CONFIG_DECOMPRESSOR_MAX which could be tuned by admin in Kconfig. default is CPU *2 or CPU, Otherwise, we can put it to sysfs so user can tune it in rumtime. 2. put decompressor shrink logic by slab shrinker so if system has memory pressure, we could catch the event and free some of decompressor but memory pressure is not severe again in the future, we can create new decompressor until reaching threadhold user define. We could know system memory is enough by GFP_NOWAIT, not GFP_KERNEL in get_decomp_stream's allocation indirectly. In short, let's make decompressor_multi as dynamically tuned system and user can limit the max. Signed-off-by: Phillip Lougher phil...@squashfs.org.uk --- fs/squashfs/Kconfig | 57 + fs/squashfs/Makefile| 10 +-- fs/squashfs/decompressor_multi_percpu.c | 105 +++ 3 files changed, 152 insertions(+), 20 deletions(-) create mode 100644 fs/squashfs/decompressor_multi_percpu.c diff --git a/fs/squashfs/Kconfig b/fs/squashfs/Kconfig index 1c6d340..c92c75f 100644 --- a/fs/squashfs/Kconfig +++ b/fs/squashfs/Kconfig @@ -25,6 +25,50 @@ config SQUASHFS If unsure, say N. +choice + prompt Decompressor parallelisation options + depends on SQUASHFS + help + Squashfs now supports three parallelisation options for + decompression. Each one exhibits various trade-offs between + decompression performance and CPU and memory usage. + + If in doubt, select Single threaded compression + +config SQUASHFS_DECOMP_SINGLE + bool Single threaded compression + help + Traditionally Squashfs has used single-threaded decompression. + Only one block (data or metadata) can be decompressed at any + one time. This limits CPU and memory usage to a minimum. + +config SQUASHFS_DECOMP_MULTI + bool Use multiple decompressors for parallel I/O + help + By default Squashfs uses a single decompressor but it gives + poor performance on parallel I/O workloads when using multiple CPU + machines due to waiting on decompressor availability. + + If you have a parallel I/O workload and your system has enough memory, + using this option may improve overall I/O performance. + + This decompressor implementation uses up to two parallel + decompressors per core. It dynamically allocates decompressors + on a demand basis. I'm not sure it's good idea to write how many of parallel decompressor per core. IMO, It's implementation stuff and user don't need to know internal. And we could modify it in the future. + +config SQUASHFS_DECOMP_MULTI_PERCPU + bool Use percpu multiple decompressors for parallel I/O + help Indentation. + By default Squashfs uses a single decompressor but it gives + poor performance on parallel I/O workloads when using multiple CPU + machines due to waiting on decompressor availability. + + This decompressor implementation uses a maximum of one + decompressor per core. It uses percpu variables to ensure + decompression is load-balanced across the cores. + +endchoice + config SQUASHFS_XATTR bool Squashfs XATTR support depends on SQUASHFS @@ -63,19 +107,6 @@ config SQUASHFS_LZO If unsure, say N. -config SQUASHFS_MULTI_DECOMPRESSOR - bool Use multiple decompressors for handling parallel I/O - depends on SQUASHFS - help - By default Squashfs uses a single decompressor but it gives - poor performance on parallel I/O workloads when using multiple CPU - machines due to waiting on decompressor availability. - - If you have a parallel I/O workload and your system has enough memory, - using this option may improve overall I/O performance. - - If unsure, say N. - config SQUASHFS_XZ bool Include support for XZ compressed file systems depends on SQUASHFS diff --git a/fs/squashfs/Makefile b/fs/squashfs/Makefile index dfebc3b..5833b96 100644 --- a/fs/squashfs/Makefile +++ b/fs/squashfs/Makefile @@ -5,14 +5,10 @@