Re: [PATCH 3/6] Squashfs: add multi-threaded decompression using percpu variables

2013-11-18 Thread Minchan Kim
Hello Phillip,

Sorry for late response.

On Thu, Nov 14, 2013 at 05:04:36PM +, Phillip Lougher wrote:
> CCing Junjiro Okijima and Stephen Hemminger
> 
> On 08/11/13 02:42, Minchan Kim wrote:
> >
> >Hello Phillip,
> >
> >On Thu, Nov 07, 2013 at 08:24:22PM +, Phillip Lougher wrote:
> >>Add a multi-threaded decompression implementation which uses
> >>percpu variables.
> >>
> >>Using percpu variables has advantages and disadvantages over
> >>implementations which do not use percpu variables.
> >>
> >>Advantages: the nature of percpu variables ensures decompression is
> >>load-balanced across the multiple cores.
> >>
> >>Disadvantages: it limits decompression to one thread per core.
> >
> >At a glance, I understand your concern but I don't see benefit to
> >make this feature as separate new config because we can modify the
> >number of decompressor per core in the future.
> >I don't want to create new config SQUASHFS_DECOMP_MULTI_3,
> >SQUASHFS_DECOMP_MULTI_4 and so on. :)
> 
> You misunderstand
> 
> I have been sent two multi-threaded implementations in the
> past which use percpu variables:
> 
> 1.  First patch set:
> 
>http://www.spinics.net/lists/linux-fsdevel/msg34365.html
> 
>Later in early 2011, I explained why I'd not merged the
>patches, and promised to do so when I got time
> 
>http://www.spinics.net/lists/linux-fsdevel/msg42392.html
> 
> 
> 2.  Second patch set sent in 2011
> 
>http://www.spinics.net/lists/linux-fsdevel/msg44111.html
> 
> So, these patches have been in my inbox, waiting until I got
> time to refactor Squashfs so that they could be merged... and
> I finally got to do this last month, which is why I'm merging
> a combined version of both patches now.
> 
> As to why have *two* implementations, I previously explained these
> two approaches are complementary, and merging both allows the
> user to decide which method of parallelising Squashfs they want
> to do.

What I see for benefit decompressor_multi_percpu is only limit
memory overhead but it could be solved by dynamic shrinking as
I mentioned earlier.

About CPU usage, I'm not sure how decompressor_multi is horrible.
If it's really concern, we can fix it to limit number of decomp
stream by admin via sysfs or something.

Decompression load balance? Acutally, I don't understand your claim.
Could you elaborate it a bit?
I couldn't understand why decompressor_multi_percpu is better than
decompressor_multi.

> 
> The percpu implementation is a good approach to parallelising
> Squashfs.  It is extremely simple, both in code and overhead.

I agree about code but not sure about overhead.
I admit decompressor_multi is bigger but it consists of simple opeartions.
Sometime, kmalloc's cost could be higher if system memory pressure is
severe so that file system's performance would fluctuate. If it's
your concern, we could make threshold min/high so that it could guarantee
some speed at least.


> The decompression hotpath simply consists of taking a percpu
> variable, doing the decompression, and then a release.

When decomp buffer is created dynamically, it could be rather overhead
compared to percpu approach but once it did, the overhead of decompress
would be marginal.

> Looking at code sizes:
> 
> fs/squashfs/decompressor_multi.c|  199 +++
> fs/squashfs/decompressor_multi_percpu.c |  104 
> fs/squashfs/decompressor_single.c   |   85 +
> 
> The simplicity of the percpu approach is readily apparent, at 104
> lines it is only slightly larger than the single threaded
> implementation.
> 
> Personally I like both approaches, and I have no reason not to
> merge both implementations I have been sent.

Okay. I'm not a maintainer so I'm not strong against your thougt.
I just wanted to unify them if either of them isn't sigificantly win
because I thought it makes maintainer and contributors happy due to
avoid more CONFIG options which should be considered when some feature
is added.

> 
> But what does the community think here?  Do you want the percpu
> implementation?  Do you see value in having two implementations?
> Feedback is appreciated.

As I mentioned, my opinion is that let's unify them if either of them
is significantly win. It would be better to see some benchmark result.

> 
> >
> >How about this?
> >
> >1. Let's make CONFIG_DECOMPRESSOR_MAX which could be tuned by admin
> >  in Kconfig. default is CPU *2 or CPU, Otherwise, we can put it
> >  to sysfs so user can tune it in rumtime.
> >
> >2. put decompressor shrink logic by slab shrinker so if system has
> >  memory pressure, we could catch the event and free some of decompressor
> >  but memory pressure is not severe again in the future, we can create
> >  new decompressor until reaching threadhold user define.
> >  We could know system memory is enough by GFP_NOWAIT, not GFP_KERNEL
> >  in get_decomp_stream's allocation indirectly.
> 
> This adds extra complexity to an implementation already 

Re: [PATCH 3/6] Squashfs: add multi-threaded decompression using percpu variables

2013-11-18 Thread J. R. Okajima

Phillip Lougher:
> CCing Junjiro Okijima and Stephen Hemminger

Thank you for CCing, and sorry for my slow responce.


> >> Using percpu variables has advantages and disadvantages over
> >> implementations which do not use percpu variables.
> >>
> >> Advantages: the nature of percpu variables ensures decompression is
> >> load-balanced across the multiple cores.
> >>
> >> Disadvantages: it limits decompression to one thread per core.

Honestly speaking, I don't remember the details of squashfs. It was a
long long time ago when I read and modified squashfs.
Anyway I will try replying.

Percpu is a good approach. Obviously, as you mentioned as
disadvantage, it depends the balance between these two things.
- How many I/Os in parallel?
- How much does the decompression cost?
My current guess is the latter is heavier (for the performance), so I
guess percpu is good.

Is it guranteed that the decompressor never require any new resources?
Under heavy I/O and memory pressure,
if the decompressor wants some memory between get_cpu_ptr() and
put_cpu_ptr(),
and if the decompressor is running on all other cores at the same time,
then does squashfs simply return ENOMEM because the memory shrinker
cannot run on any core?
If it is true, we may need a rule "no new resources for decompressing"
since users may prefer the "slow but successful decompression" than
getting ENOMEM.

If this mail is totaly pointless, please ignore.


J. R. Okajima
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/6] Squashfs: add multi-threaded decompression using percpu variables

2013-11-18 Thread J. R. Okajima

Phillip Lougher:
 CCing Junjiro Okijima and Stephen Hemminger

Thank you for CCing, and sorry for my slow responce.


  Using percpu variables has advantages and disadvantages over
  implementations which do not use percpu variables.
 
  Advantages: the nature of percpu variables ensures decompression is
  load-balanced across the multiple cores.
 
  Disadvantages: it limits decompression to one thread per core.

Honestly speaking, I don't remember the details of squashfs. It was a
long long time ago when I read and modified squashfs.
Anyway I will try replying.

Percpu is a good approach. Obviously, as you mentioned as
disadvantage, it depends the balance between these two things.
- How many I/Os in parallel?
- How much does the decompression cost?
My current guess is the latter is heavier (for the performance), so I
guess percpu is good.

Is it guranteed that the decompressor never require any new resources?
Under heavy I/O and memory pressure,
if the decompressor wants some memory between get_cpu_ptr() and
put_cpu_ptr(),
and if the decompressor is running on all other cores at the same time,
then does squashfs simply return ENOMEM because the memory shrinker
cannot run on any core?
If it is true, we may need a rule no new resources for decompressing
since users may prefer the slow but successful decompression than
getting ENOMEM.

If this mail is totaly pointless, please ignore.


J. R. Okajima
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/6] Squashfs: add multi-threaded decompression using percpu variables

2013-11-18 Thread Minchan Kim
Hello Phillip,

Sorry for late response.

On Thu, Nov 14, 2013 at 05:04:36PM +, Phillip Lougher wrote:
 CCing Junjiro Okijima and Stephen Hemminger
 
 On 08/11/13 02:42, Minchan Kim wrote:
 
 Hello Phillip,
 
 On Thu, Nov 07, 2013 at 08:24:22PM +, Phillip Lougher wrote:
 Add a multi-threaded decompression implementation which uses
 percpu variables.
 
 Using percpu variables has advantages and disadvantages over
 implementations which do not use percpu variables.
 
 Advantages: the nature of percpu variables ensures decompression is
 load-balanced across the multiple cores.
 
 Disadvantages: it limits decompression to one thread per core.
 
 At a glance, I understand your concern but I don't see benefit to
 make this feature as separate new config because we can modify the
 number of decompressor per core in the future.
 I don't want to create new config SQUASHFS_DECOMP_MULTI_3,
 SQUASHFS_DECOMP_MULTI_4 and so on. :)
 
 You misunderstand
 
 I have been sent two multi-threaded implementations in the
 past which use percpu variables:
 
 1.  First patch set:
 
http://www.spinics.net/lists/linux-fsdevel/msg34365.html
 
Later in early 2011, I explained why I'd not merged the
patches, and promised to do so when I got time
 
http://www.spinics.net/lists/linux-fsdevel/msg42392.html
 
 
 2.  Second patch set sent in 2011
 
http://www.spinics.net/lists/linux-fsdevel/msg44111.html
 
 So, these patches have been in my inbox, waiting until I got
 time to refactor Squashfs so that they could be merged... and
 I finally got to do this last month, which is why I'm merging
 a combined version of both patches now.
 
 As to why have *two* implementations, I previously explained these
 two approaches are complementary, and merging both allows the
 user to decide which method of parallelising Squashfs they want
 to do.

What I see for benefit decompressor_multi_percpu is only limit
memory overhead but it could be solved by dynamic shrinking as
I mentioned earlier.

About CPU usage, I'm not sure how decompressor_multi is horrible.
If it's really concern, we can fix it to limit number of decomp
stream by admin via sysfs or something.

Decompression load balance? Acutally, I don't understand your claim.
Could you elaborate it a bit?
I couldn't understand why decompressor_multi_percpu is better than
decompressor_multi.

 
 The percpu implementation is a good approach to parallelising
 Squashfs.  It is extremely simple, both in code and overhead.

I agree about code but not sure about overhead.
I admit decompressor_multi is bigger but it consists of simple opeartions.
Sometime, kmalloc's cost could be higher if system memory pressure is
severe so that file system's performance would fluctuate. If it's
your concern, we could make threshold min/high so that it could guarantee
some speed at least.


 The decompression hotpath simply consists of taking a percpu
 variable, doing the decompression, and then a release.

When decomp buffer is created dynamically, it could be rather overhead
compared to percpu approach but once it did, the overhead of decompress
would be marginal.

 Looking at code sizes:
 
 fs/squashfs/decompressor_multi.c|  199 +++
 fs/squashfs/decompressor_multi_percpu.c |  104 
 fs/squashfs/decompressor_single.c   |   85 +
 
 The simplicity of the percpu approach is readily apparent, at 104
 lines it is only slightly larger than the single threaded
 implementation.
 
 Personally I like both approaches, and I have no reason not to
 merge both implementations I have been sent.

Okay. I'm not a maintainer so I'm not strong against your thougt.
I just wanted to unify them if either of them isn't sigificantly win
because I thought it makes maintainer and contributors happy due to
avoid more CONFIG options which should be considered when some feature
is added.

 
 But what does the community think here?  Do you want the percpu
 implementation?  Do you see value in having two implementations?
 Feedback is appreciated.

As I mentioned, my opinion is that let's unify them if either of them
is significantly win. It would be better to see some benchmark result.

 
 
 How about this?
 
 1. Let's make CONFIG_DECOMPRESSOR_MAX which could be tuned by admin
   in Kconfig. default is CPU *2 or CPU, Otherwise, we can put it
   to sysfs so user can tune it in rumtime.
 
 2. put decompressor shrink logic by slab shrinker so if system has
   memory pressure, we could catch the event and free some of decompressor
   but memory pressure is not severe again in the future, we can create
   new decompressor until reaching threadhold user define.
   We could know system memory is enough by GFP_NOWAIT, not GFP_KERNEL
   in get_decomp_stream's allocation indirectly.
 
 This adds extra complexity to an implementation already 199 lines long
 (as opposed to 104 for the percpu implementation).   The whole point of
 the percpu implementation is to add a 

Re: [PATCH 3/6] Squashfs: add multi-threaded decompression using percpu variables

2013-11-14 Thread Phillip Lougher

CCing Junjiro Okijima and Stephen Hemminger

On 08/11/13 02:42, Minchan Kim wrote:


Hello Phillip,

On Thu, Nov 07, 2013 at 08:24:22PM +, Phillip Lougher wrote:

Add a multi-threaded decompression implementation which uses
percpu variables.

Using percpu variables has advantages and disadvantages over
implementations which do not use percpu variables.

Advantages: the nature of percpu variables ensures decompression is
load-balanced across the multiple cores.

Disadvantages: it limits decompression to one thread per core.


At a glance, I understand your concern but I don't see benefit to
make this feature as separate new config because we can modify the
number of decompressor per core in the future.
I don't want to create new config SQUASHFS_DECOMP_MULTI_3,
SQUASHFS_DECOMP_MULTI_4 and so on. :)


You misunderstand

I have been sent two multi-threaded implementations in the
past which use percpu variables:

1.  First patch set:

   http://www.spinics.net/lists/linux-fsdevel/msg34365.html

   Later in early 2011, I explained why I'd not merged the
   patches, and promised to do so when I got time

   http://www.spinics.net/lists/linux-fsdevel/msg42392.html


2.  Second patch set sent in 2011

   http://www.spinics.net/lists/linux-fsdevel/msg44111.html

So, these patches have been in my inbox, waiting until I got
time to refactor Squashfs so that they could be merged... and
I finally got to do this last month, which is why I'm merging
a combined version of both patches now.

As to why have *two* implementations, I previously explained these
two approaches are complementary, and merging both allows the
user to decide which method of parallelising Squashfs they want
to do.

The percpu implementation is a good approach to parallelising
Squashfs.  It is extremely simple, both in code and overhead.
The decompression hotpath simply consists of taking a percpu
variable, doing the decompression, and then a release.

Looking at code sizes:

fs/squashfs/decompressor_multi.c|  199 +++
fs/squashfs/decompressor_multi_percpu.c |  104 
fs/squashfs/decompressor_single.c   |   85 +

The simplicity of the percpu approach is readily apparent, at 104
lines it is only slightly larger than the single threaded
implementation.

Personally I like both approaches, and I have no reason not to
merge both implementations I have been sent.

But what does the community think here?  Do you want the percpu
implementation?  Do you see value in having two implementations?
Feedback is appreciated.



How about this?

1. Let's make CONFIG_DECOMPRESSOR_MAX which could be tuned by admin
  in Kconfig. default is CPU *2 or CPU, Otherwise, we can put it
  to sysfs so user can tune it in rumtime.

2. put decompressor shrink logic by slab shrinker so if system has
  memory pressure, we could catch the event and free some of decompressor
  but memory pressure is not severe again in the future, we can create
  new decompressor until reaching threadhold user define.
  We could know system memory is enough by GFP_NOWAIT, not GFP_KERNEL
  in get_decomp_stream's allocation indirectly.


This adds extra complexity to an implementation already 199 lines long
(as opposed to 104 for the percpu implementation).   The whole point of
the percpu implementation is to add a simple implementation that
may suit many systems.

Phillip



In short, let's make decompressor_multi as dynamically tuned system
and user can limit the max.




Signed-off-by: Phillip Lougher 
---
 fs/squashfs/Kconfig |   57 +
 fs/squashfs/Makefile|   10 +--
 fs/squashfs/decompressor_multi_percpu.c |  105 +++
 3 files changed, 152 insertions(+), 20 deletions(-)
 create mode 100644 fs/squashfs/decompressor_multi_percpu.c

diff --git a/fs/squashfs/Kconfig b/fs/squashfs/Kconfig
index 1c6d340..c92c75f 100644
--- a/fs/squashfs/Kconfig
+++ b/fs/squashfs/Kconfig
@@ -25,6 +25,50 @@ config SQUASHFS

  If unsure, say N.

+choice
+   prompt "Decompressor parallelisation options"
+   depends on SQUASHFS
+   help
+ Squashfs now supports three parallelisation options for
+ decompression.  Each one exhibits various trade-offs between
+ decompression performance and CPU and memory usage.
+
+ If in doubt, select "Single threaded compression"
+
+config SQUASHFS_DECOMP_SINGLE
+   bool "Single threaded compression"
+   help
+ Traditionally Squashfs has used single-threaded decompression.
+ Only one block (data or metadata) can be decompressed at any
+ one time.  This limits CPU and memory usage to a minimum.
+
+config SQUASHFS_DECOMP_MULTI
+   bool "Use multiple decompressors for parallel I/O"
+   help
+ By default Squashfs uses a single decompressor but it gives
+ poor performance on parallel I/O workloads when using multiple CPU
+ machines due 

Re: [PATCH 3/6] Squashfs: add multi-threaded decompression using percpu variables

2013-11-14 Thread Phillip Lougher

CCing Junjiro Okijima and Stephen Hemminger

On 08/11/13 02:42, Minchan Kim wrote:


Hello Phillip,

On Thu, Nov 07, 2013 at 08:24:22PM +, Phillip Lougher wrote:

Add a multi-threaded decompression implementation which uses
percpu variables.

Using percpu variables has advantages and disadvantages over
implementations which do not use percpu variables.

Advantages: the nature of percpu variables ensures decompression is
load-balanced across the multiple cores.

Disadvantages: it limits decompression to one thread per core.


At a glance, I understand your concern but I don't see benefit to
make this feature as separate new config because we can modify the
number of decompressor per core in the future.
I don't want to create new config SQUASHFS_DECOMP_MULTI_3,
SQUASHFS_DECOMP_MULTI_4 and so on. :)


You misunderstand

I have been sent two multi-threaded implementations in the
past which use percpu variables:

1.  First patch set:

   http://www.spinics.net/lists/linux-fsdevel/msg34365.html

   Later in early 2011, I explained why I'd not merged the
   patches, and promised to do so when I got time

   http://www.spinics.net/lists/linux-fsdevel/msg42392.html


2.  Second patch set sent in 2011

   http://www.spinics.net/lists/linux-fsdevel/msg44111.html

So, these patches have been in my inbox, waiting until I got
time to refactor Squashfs so that they could be merged... and
I finally got to do this last month, which is why I'm merging
a combined version of both patches now.

As to why have *two* implementations, I previously explained these
two approaches are complementary, and merging both allows the
user to decide which method of parallelising Squashfs they want
to do.

The percpu implementation is a good approach to parallelising
Squashfs.  It is extremely simple, both in code and overhead.
The decompression hotpath simply consists of taking a percpu
variable, doing the decompression, and then a release.

Looking at code sizes:

fs/squashfs/decompressor_multi.c|  199 +++
fs/squashfs/decompressor_multi_percpu.c |  104 
fs/squashfs/decompressor_single.c   |   85 +

The simplicity of the percpu approach is readily apparent, at 104
lines it is only slightly larger than the single threaded
implementation.

Personally I like both approaches, and I have no reason not to
merge both implementations I have been sent.

But what does the community think here?  Do you want the percpu
implementation?  Do you see value in having two implementations?
Feedback is appreciated.



How about this?

1. Let's make CONFIG_DECOMPRESSOR_MAX which could be tuned by admin
  in Kconfig. default is CPU *2 or CPU, Otherwise, we can put it
  to sysfs so user can tune it in rumtime.

2. put decompressor shrink logic by slab shrinker so if system has
  memory pressure, we could catch the event and free some of decompressor
  but memory pressure is not severe again in the future, we can create
  new decompressor until reaching threadhold user define.
  We could know system memory is enough by GFP_NOWAIT, not GFP_KERNEL
  in get_decomp_stream's allocation indirectly.


This adds extra complexity to an implementation already 199 lines long
(as opposed to 104 for the percpu implementation).   The whole point of
the percpu implementation is to add a simple implementation that
may suit many systems.

Phillip



In short, let's make decompressor_multi as dynamically tuned system
and user can limit the max.




Signed-off-by: Phillip Lougher phil...@squashfs.org.uk
---
 fs/squashfs/Kconfig |   57 +
 fs/squashfs/Makefile|   10 +--
 fs/squashfs/decompressor_multi_percpu.c |  105 +++
 3 files changed, 152 insertions(+), 20 deletions(-)
 create mode 100644 fs/squashfs/decompressor_multi_percpu.c

diff --git a/fs/squashfs/Kconfig b/fs/squashfs/Kconfig
index 1c6d340..c92c75f 100644
--- a/fs/squashfs/Kconfig
+++ b/fs/squashfs/Kconfig
@@ -25,6 +25,50 @@ config SQUASHFS

  If unsure, say N.

+choice
+   prompt Decompressor parallelisation options
+   depends on SQUASHFS
+   help
+ Squashfs now supports three parallelisation options for
+ decompression.  Each one exhibits various trade-offs between
+ decompression performance and CPU and memory usage.
+
+ If in doubt, select Single threaded compression
+
+config SQUASHFS_DECOMP_SINGLE
+   bool Single threaded compression
+   help
+ Traditionally Squashfs has used single-threaded decompression.
+ Only one block (data or metadata) can be decompressed at any
+ one time.  This limits CPU and memory usage to a minimum.
+
+config SQUASHFS_DECOMP_MULTI
+   bool Use multiple decompressors for parallel I/O
+   help
+ By default Squashfs uses a single decompressor but it gives
+ poor performance on parallel I/O workloads when using multiple CPU
+

Re: [PATCH 3/6] Squashfs: add multi-threaded decompression using percpu variables

2013-11-07 Thread Minchan Kim
Hello Phillip,

On Thu, Nov 07, 2013 at 08:24:22PM +, Phillip Lougher wrote:
> Add a multi-threaded decompression implementation which uses
> percpu variables.
> 
> Using percpu variables has advantages and disadvantages over
> implementations which do not use percpu variables.
> 
> Advantages: the nature of percpu variables ensures decompression is
> load-balanced across the multiple cores.
> 
> Disadvantages: it limits decompression to one thread per core.

At a glance, I understand your concern but I don't see benefit to
make this feature as separate new config because we can modify the
number of decompressor per core in the future.
I don't want to create new config SQUASHFS_DECOMP_MULTI_3,
SQUASHFS_DECOMP_MULTI_4 and so on. :)

How about this?

1. Let's make CONFIG_DECOMPRESSOR_MAX which could be tuned by admin
   in Kconfig. default is CPU *2 or CPU, Otherwise, we can put it
   to sysfs so user can tune it in rumtime.

2. put decompressor shrink logic by slab shrinker so if system has
   memory pressure, we could catch the event and free some of decompressor
   but memory pressure is not severe again in the future, we can create
   new decompressor until reaching threadhold user define.
   We could know system memory is enough by GFP_NOWAIT, not GFP_KERNEL
   in get_decomp_stream's allocation indirectly.

In short, let's make decompressor_multi as dynamically tuned system
and user can limit the max.


> 
> Signed-off-by: Phillip Lougher 
> ---
>  fs/squashfs/Kconfig |   57 +
>  fs/squashfs/Makefile|   10 +--
>  fs/squashfs/decompressor_multi_percpu.c |  105 
> +++
>  3 files changed, 152 insertions(+), 20 deletions(-)
>  create mode 100644 fs/squashfs/decompressor_multi_percpu.c
> 
> diff --git a/fs/squashfs/Kconfig b/fs/squashfs/Kconfig
> index 1c6d340..c92c75f 100644
> --- a/fs/squashfs/Kconfig
> +++ b/fs/squashfs/Kconfig
> @@ -25,6 +25,50 @@ config SQUASHFS
>  
> If unsure, say N.
>  
> +choice
> + prompt "Decompressor parallelisation options"
> + depends on SQUASHFS
> + help
> +   Squashfs now supports three parallelisation options for
> +   decompression.  Each one exhibits various trade-offs between
> +   decompression performance and CPU and memory usage.
> +
> +   If in doubt, select "Single threaded compression"
> +
> +config SQUASHFS_DECOMP_SINGLE
> + bool "Single threaded compression"
> + help
> +   Traditionally Squashfs has used single-threaded decompression.
> +   Only one block (data or metadata) can be decompressed at any
> +   one time.  This limits CPU and memory usage to a minimum.
> +
> +config SQUASHFS_DECOMP_MULTI
> + bool "Use multiple decompressors for parallel I/O"
> + help
> +   By default Squashfs uses a single decompressor but it gives
> +   poor performance on parallel I/O workloads when using multiple CPU
> +   machines due to waiting on decompressor availability.
> +
> +   If you have a parallel I/O workload and your system has enough memory,
> +   using this option may improve overall I/O performance.
> +
> +   This decompressor implementation uses up to two parallel
> +   decompressors per core.  It dynamically allocates decompressors
> +   on a demand basis.

I'm not sure it's good idea to write how many of parallel decompressor
per core. IMO, It's implementation stuff and user don't need to know internal.
And we could modify it in the future.

> +
> +config SQUASHFS_DECOMP_MULTI_PERCPU
> +   bool "Use percpu multiple decompressors for parallel I/O"
> + help

Indentation.

> +   By default Squashfs uses a single decompressor but it gives
> +   poor performance on parallel I/O workloads when using multiple CPU
> +   machines due to waiting on decompressor availability.
> +
> +   This decompressor implementation uses a maximum of one
> +   decompressor per core.  It uses percpu variables to ensure
> +   decompression is load-balanced across the cores.
> +
> +endchoice
> +
>  config SQUASHFS_XATTR
>   bool "Squashfs XATTR support"
>   depends on SQUASHFS
> @@ -63,19 +107,6 @@ config SQUASHFS_LZO
>  
> If unsure, say N.
>  
> -config SQUASHFS_MULTI_DECOMPRESSOR
> - bool "Use multiple decompressors for handling parallel I/O"
> - depends on SQUASHFS
> - help
> -   By default Squashfs uses a single decompressor but it gives
> -   poor performance on parallel I/O workloads when using multiple CPU
> -   machines due to waiting on decompressor availability.
> -
> -   If you have a parallel I/O workload and your system has enough memory,
> -   using this option may improve overall I/O performance.
> -
> -   If unsure, say N.
> -
>  config SQUASHFS_XZ
>   bool "Include support for XZ compressed file systems"
>   depends on SQUASHFS
> diff --git a/fs/squashfs/Makefile b/fs/squashfs/Makefile
> index 

Re: [PATCH 3/6] Squashfs: add multi-threaded decompression using percpu variables

2013-11-07 Thread Minchan Kim
Hello Phillip,

On Thu, Nov 07, 2013 at 08:24:22PM +, Phillip Lougher wrote:
 Add a multi-threaded decompression implementation which uses
 percpu variables.
 
 Using percpu variables has advantages and disadvantages over
 implementations which do not use percpu variables.
 
 Advantages: the nature of percpu variables ensures decompression is
 load-balanced across the multiple cores.
 
 Disadvantages: it limits decompression to one thread per core.

At a glance, I understand your concern but I don't see benefit to
make this feature as separate new config because we can modify the
number of decompressor per core in the future.
I don't want to create new config SQUASHFS_DECOMP_MULTI_3,
SQUASHFS_DECOMP_MULTI_4 and so on. :)

How about this?

1. Let's make CONFIG_DECOMPRESSOR_MAX which could be tuned by admin
   in Kconfig. default is CPU *2 or CPU, Otherwise, we can put it
   to sysfs so user can tune it in rumtime.

2. put decompressor shrink logic by slab shrinker so if system has
   memory pressure, we could catch the event and free some of decompressor
   but memory pressure is not severe again in the future, we can create
   new decompressor until reaching threadhold user define.
   We could know system memory is enough by GFP_NOWAIT, not GFP_KERNEL
   in get_decomp_stream's allocation indirectly.

In short, let's make decompressor_multi as dynamically tuned system
and user can limit the max.


 
 Signed-off-by: Phillip Lougher phil...@squashfs.org.uk
 ---
  fs/squashfs/Kconfig |   57 +
  fs/squashfs/Makefile|   10 +--
  fs/squashfs/decompressor_multi_percpu.c |  105 
 +++
  3 files changed, 152 insertions(+), 20 deletions(-)
  create mode 100644 fs/squashfs/decompressor_multi_percpu.c
 
 diff --git a/fs/squashfs/Kconfig b/fs/squashfs/Kconfig
 index 1c6d340..c92c75f 100644
 --- a/fs/squashfs/Kconfig
 +++ b/fs/squashfs/Kconfig
 @@ -25,6 +25,50 @@ config SQUASHFS
  
 If unsure, say N.
  
 +choice
 + prompt Decompressor parallelisation options
 + depends on SQUASHFS
 + help
 +   Squashfs now supports three parallelisation options for
 +   decompression.  Each one exhibits various trade-offs between
 +   decompression performance and CPU and memory usage.
 +
 +   If in doubt, select Single threaded compression
 +
 +config SQUASHFS_DECOMP_SINGLE
 + bool Single threaded compression
 + help
 +   Traditionally Squashfs has used single-threaded decompression.
 +   Only one block (data or metadata) can be decompressed at any
 +   one time.  This limits CPU and memory usage to a minimum.
 +
 +config SQUASHFS_DECOMP_MULTI
 + bool Use multiple decompressors for parallel I/O
 + help
 +   By default Squashfs uses a single decompressor but it gives
 +   poor performance on parallel I/O workloads when using multiple CPU
 +   machines due to waiting on decompressor availability.
 +
 +   If you have a parallel I/O workload and your system has enough memory,
 +   using this option may improve overall I/O performance.
 +
 +   This decompressor implementation uses up to two parallel
 +   decompressors per core.  It dynamically allocates decompressors
 +   on a demand basis.

I'm not sure it's good idea to write how many of parallel decompressor
per core. IMO, It's implementation stuff and user don't need to know internal.
And we could modify it in the future.

 +
 +config SQUASHFS_DECOMP_MULTI_PERCPU
 +   bool Use percpu multiple decompressors for parallel I/O
 + help

Indentation.

 +   By default Squashfs uses a single decompressor but it gives
 +   poor performance on parallel I/O workloads when using multiple CPU
 +   machines due to waiting on decompressor availability.
 +
 +   This decompressor implementation uses a maximum of one
 +   decompressor per core.  It uses percpu variables to ensure
 +   decompression is load-balanced across the cores.
 +
 +endchoice
 +
  config SQUASHFS_XATTR
   bool Squashfs XATTR support
   depends on SQUASHFS
 @@ -63,19 +107,6 @@ config SQUASHFS_LZO
  
 If unsure, say N.
  
 -config SQUASHFS_MULTI_DECOMPRESSOR
 - bool Use multiple decompressors for handling parallel I/O
 - depends on SQUASHFS
 - help
 -   By default Squashfs uses a single decompressor but it gives
 -   poor performance on parallel I/O workloads when using multiple CPU
 -   machines due to waiting on decompressor availability.
 -
 -   If you have a parallel I/O workload and your system has enough memory,
 -   using this option may improve overall I/O performance.
 -
 -   If unsure, say N.
 -
  config SQUASHFS_XZ
   bool Include support for XZ compressed file systems
   depends on SQUASHFS
 diff --git a/fs/squashfs/Makefile b/fs/squashfs/Makefile
 index dfebc3b..5833b96 100644
 --- a/fs/squashfs/Makefile
 +++ b/fs/squashfs/Makefile
 @@ -5,14 +5,10 @@