Re: [PATCH v4 0/3] ext4: increase mbcache scalability

2014-02-12 Thread Andreas Dilger

On Feb 11, 2014, at 12:58 PM, Thavatchai Makphaibulchoke wrote:
> On 01/24/2014 11:09 PM, Andreas Dilger wrote:
>> I think the ext4 block groups are locked with the blockgroup_lock that has 
>> about the same number of locks as the number of cores, with a max of 128, 
>> IIRC.  See blockgroup_lock.h. 
>> 
>> While there is some chance of contention, it is also unlikely that all of 
>> the cores are locking this area at the same time.  
>> 
>> Cheers, Andreas
>> 
> 
> Andreas, looks like your assumption is correct.  On all 3 systems (80, 60, and 
> 20 cores), I got almost identical aim7 results using either a smaller 
> dedicated lock array or the block group lock.  I'm inclined to go with the 
> block group lock, as it does not incur any extra space.
> 
> One problem is that, with the current implementation, mbcache has no knowledge 
> of the filesystem's superblock, including its block group lock.  In my 
> implementation I have to change the first argument of mb_cache_create() from 
> char * to struct super_block * to be able to access the superblock's block 
> group lock.

Note that you don't have to use the ext4_sb_info->s_blockgroup_lock.
You can allocate and use a separate struct blockgroup_lock for mbcache
instead of allocating a spinlock array (and essentially reimplementing
the bgl_lock_*() code).  While it isn't a huge amount of duplication,
that code is already tuned for different SMP core configurations and
there is no reason NOT to use struct blockgroup_lock.
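
As a rough illustration of that reuse (the mb_cache field and helper names
below are assumptions for the sketch, not taken from the posted patch):

#include <linux/blockgroup_lock.h>	/* struct blockgroup_lock, bgl_lock_init(), bgl_lock_ptr() */

/* Sketch only: embed a blockgroup_lock in the cache instead of
 * hand-rolling a spinlock array. */
struct mb_cache {
	/* ... existing fields ... */
	struct blockgroup_lock	c_entry_locks;
};

static void example_entry_locks_init(struct mb_cache *cache)
{
	/* Initializes all NR_BG_LOCKS spinlocks in the array. */
	bgl_lock_init(&cache->c_entry_locks);
}

/* Map an entry's index-hash bucket to one of the NR_BG_LOCKS spinlocks. */
static spinlock_t *example_entry_lock(struct mb_cache *cache, unsigned int bucket)
{
	return bgl_lock_ptr(&cache->c_entry_locks, bucket);
}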

> This works with my proposed change to allocate an mb_cache for each mounted 
> ext4 filesystem.  It would also require the same change, allocating an 
> mb_cache for each mounted filesystem, in both ext2 and ext3, which would 
> increase the scope of the patch.  The other alternative, allocating a new 
> smaller spinlock array, would not require any change to either ext2 or ext3.
> 
> I'm working on resubmitting my patches using the block group locks and 
> extending the changes to also cover ext2 and ext3.  With this approach, 
> not only is no additional space required for a dedicated new spinlock array, 
> but the e_bdev member of struct mb_cache_entry can also be removed, 
> reducing the space required for each mb_cache_entry.
> 
> Please let me know if you have any concerns or suggestions.

I'm not against re-using the s_blockgroup_lock in the superblock, since
the chance of contention on this lock between threads is very small, as
there are normally 4x the number of spinlocks as cores.

You might consider starting with a dedicated struct blockgroup_lock in
the mbcache code, then move to use the in-superblock struct in a later
patch.  That would allow you to push and land the mbcache and ext4 patches
independently of the ext2 and ext3 patches (if they are big).  If the
ext2 and ext3 patches are relatively small then this extra complexity
in the patches may not be needed.

Cheers, Andreas


Re: [PATCH v4 0/3] ext4: increase mbcache scalability

2014-02-11 Thread Thavatchai Makphaibulchoke
On 01/24/2014 11:09 PM, Andreas Dilger wrote:
> I think the ext4 block groups are locked with the blockgroup_lock that has 
> about the same number of locks as the number of cores, with a max of 128, 
> IIRC.  See blockgroup_lock.h. 
> 
> While there is some chance of contention, it is also unlikely that all of the 
> cores are locking this area at the same time.  
> 
> Cheers, Andreas
> 

Andreas, looks like your assumption is correct.  On all 3 systems (80, 60, and 
20 cores), I got almost identical aim7 results using either a smaller dedicated 
lock array or the block group lock.  I'm inclined to go with the block group 
lock, as it does not incur any extra space.

One problem is that, with the current implementation, mbcache has no knowledge 
of the filesystem's superblock, including its block group lock.  In my 
implementation I have to change the first argument of mb_cache_create() from 
char * to struct super_block * to be able to access the superblock's block 
group lock.  This works with my proposed change to allocate an mb_cache for 
each mounted ext4 filesystem.  It would also require the same change, 
allocating an mb_cache for each mounted filesystem, in both ext2 and ext3, 
which would increase the scope of the patch.  The other alternative, allocating 
a new smaller spinlock array, would not require any change to either ext2 or ext3.
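
As a rough sketch of that interface change (the exact prototype in the patch
may differ; EXT4_XATTR_CACHE_BITS and s_mb_cache below are hypothetical names
used only for illustration):

/* Current interface: the cache is identified only by a name string. */
struct mb_cache *mb_cache_create(const char *name, int bucket_bits);

/* Proposed direction: pass the superblock so that mbcache can reach the
 * filesystem's block group lock (e.g. via sb->s_fs_info). */
struct mb_cache *mb_cache_create(struct super_block *sb, int bucket_bits);

/* Hypothetical ext4 call site at mount time: */
EXT4_SB(sb)->s_mb_cache = mb_cache_create(sb, EXT4_XATTR_CACHE_BITS);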

I'm working on resubmitting my patches using the block group locks and 
extending the changes to also cover ext2 and ext3.  With this approach, not 
only is no additional space required for a dedicated new spinlock array, but 
the e_bdev member of struct mb_cache_entry can also be removed, reducing the 
space required for each mb_cache_entry.

Please let me know if you have any concerns or suggestions.

Thanks,
Mak.



Re: [PATCH v4 0/3] ext4: increase mbcache scalability

2014-02-11 Thread Thavatchai Makphaibulchoke
On 01/28/2014 02:09 PM, Andreas Dilger wrote:
> On Jan 28, 2014, at 5:26 AM, George Spelvin  wrote:
>>> The third part of the patch further increases the scalability of an ext4
>>> filesystem by having each ext4 filesystem allocate and use its own private
>>> mbcache structure, instead of sharing a single mbcache structure across all
>>> ext4 filesystems, and increases the size of its mbcache hash tables.
>>
>> Are you sure this helps?  The idea behind having one large mbcache is
>> that one large hash table will always be at least as well balanced as
>> multiple separate tables, if the total size is the same.
>>
>> If you have two size-2^n hash tables, the chance of collision is equal to
>> that of one size-2^(n+1) table if they're equally busy; if they're unequally
>> busy, the latter is better.  The busier file system will take less time
>> per search, and since it's searched more often than the less-busy one,
>> that's a net win.
>>
>> How does it compare with just increasing the hash table size but leaving
>> them combined?
> 
> Except that having one mbcache per block device would avoid the need
> to store the e_bdev pointer in thousands/millions of entries.  Since
> the blocks are never shared between different block devices, there
> is no caching benefit even if the same block is on two block devices.
> 
> Cheers, Andreas
> 

On all 3 systems (80, 60, and 20 cores) on which I ran aim7, spreading test 
files across 4 ext4 filesystems, there seems to be no difference in performance 
between a single large hash table and a smaller one per filesystem.

Having said that, I still believe that having a separate hash table for each 
filesystem should scale better, as the size of a larger single hash table would 
be rather arbitrary.  As Andreas mentioned above, with an mbcache per filesystem 
we would be able to remove the e_bdev member from mb_cache_entry.  It would 
also result in less mb_cache_entry lock contention if we use the blockgroup 
locks, which are also per filesystem, to implement the mb_cache_entry lock as 
Andreas suggested.
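
A sketch of the per-filesystem lifecycle this implies, assuming the
super_block-based mb_cache_create() described earlier in the thread
(s_mb_cache and EXT4_XATTR_CACHE_BITS are hypothetical names, not the
patch's identifiers):

/* In ext4_fill_super(): give this filesystem its own cache. */
sbi->s_mb_cache = mb_cache_create(sb, EXT4_XATTR_CACHE_BITS);
if (!sbi->s_mb_cache) {
	ext4_msg(sb, KERN_ERR, "unable to create mb_cache");
	goto failed_mount;
}

/* In ext4_put_super(): tear the cache down at unmount. */
if (sbi->s_mb_cache) {
	mb_cache_destroy(sbi->s_mb_cache);
	sbi->s_mb_cache = NULL;
}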

Please let me know if you have any further comments or concerns.

Thanks,
Mak.



Re: [PATCH v4 0/3] ext4: increase mbcache scalability

2014-02-09 Thread Thavatchai Makphaibulchoke
On 01/24/2014 11:09 PM, Andreas Dilger wrote:
> I think the ext4 block groups are locked with the blockgroup_lock that has 
> about the same number of locks as the number of cores, with a max of 128, 
> IIRC.  See blockgroup_lock.h. 
> 
> While there is some chance of contention, it is also unlikely that all of the 
> cores are locking this area at the same time.  
> 
> Cheers, Andreas
> 


Re: [PATCH v4 0/3] ext4: increase mbcache scalability

2014-01-28 Thread Thavatchai Makphaibulchoke
On 01/28/2014 02:09 PM, Andreas Dilger wrote:
> On Jan 28, 2014, at 5:26 AM, George Spelvin  wrote:
>>> The third part of the patch further increases the scalability of an ext4
>>> filesystem by having each ext4 filesystem allocate and use its own private
>>> mbcache structure, instead of sharing a single mbcache structure across all
>>> ext4 filesystems, and increases the size of its mbcache hash tables.
>>
>> Are you sure this helps?  The idea behind having one large mbcache is
>> that one large hash table will always be at least as well balanced as
>> multiple separate tables, if the total size is the same.
>>
>> If you have two size-2^n hash tables, the chance of collision is equal to
>> that of one size-2^(n+1) table if they're equally busy; if they're unequally
>> busy, the latter is better.  The busier file system will take less time
>> per search, and since it's searched more often than the less-busy one,
>> that's a net win.
>>
>> How does it compare with just increasing the hash table size but leaving
>> them combined?
> 
> Except that having one mbcache per block device would avoid the need
> to store the e_bdev pointer in thousands/millions of entries.  Since
> the blocks are never shared between different block devices, there
> is no caching benefit even if the same block is on two block devices.
> 
> Cheers, Andreas
> 
> 
> 
> 
> 

Thanks George and Andreas for the comments.  Andreas, you mention a good point: 
the e_bdev pointer is not needed when there is one mb_cache for each block 
device.  I'll integrate that into my patch, removing the e_bdev pointer, and 
run some comparisons between one large hash table and multiple hash tables, as 
suggested by George.

Thanks,
Mak.



Re: [PATCH v4 0/3] ext4: increase mbcache scalability

2014-01-28 Thread Andreas Dilger
On Jan 28, 2014, at 5:26 AM, George Spelvin  wrote:
>> The third part of the patch further increases the scalability of an ext4
>> filesystem by having each ext4 filesystem allocate and use its own private
>> mbcache structure, instead of sharing a single mbcache structure across all
>> ext4 filesystems, and increases the size of its mbcache hash tables.
> 
> Are you sure this helps?  The idea behind having one large mbcache is
> that one large hash table will always be at least as well balanced as
> multiple separate tables, if the total size is the same.
> 
> If you have two size-2^n hash tables, the chance of collision is equal to
> that of one size-2^(n+1) table if they're equally busy; if they're unequally
> busy, the latter is better.  The busier file system will take less time
> per search, and since it's searched more often than the less-busy one,
> that's a net win.
> 
> How does it compare with just increasing the hash table size but leaving
> them combined?

Except that having one mbcache per block device would avoid the need
to store the e_bdev pointer in thousands/millions of entries.  Since
the blocks are never shared between different block devices, there
is no caching benefit even if the same block is on two block devices.

Cheers, Andreas


Re: [PATCH v4 0/3] ext4: increase mbcache scalability

2014-01-28 Thread George Spelvin
> The third part of the patch further increases the scalability of an ext4
> filesystem by having each ext4 filesystem allocate and use its own private
> mbcache structure, instead of sharing a single mbcache structure across all
> ext4 filesystems, and increases the size of its mbcache hash tables.

Are you sure this helps?  The idea behind having one large mbcache is
that one large hash table will always be at least as well balanced as
multiple separate tables, if the total size is the same.

If you have two size-2^n hash tables, the chance of collision is equal to
that of one size-2^(n+1) table if they're equally busy; if they're unequally
busy, the latter is better.  The busier file system will take less time
per search, and since it's searched more often than the less-busy one,
that's a net win.
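
A back-of-the-envelope way to quantify that, assuming each filesystem's share
of lookups is proportional to its share m_i of the m = m_1 + m_2 cached
entries, and that the expected chain length scanned per lookup is
entries/buckets:

\[
\underbrace{\frac{m_1}{m}\cdot\frac{m_1}{2^n}
          + \frac{m_2}{m}\cdot\frac{m_2}{2^n}}_{\text{two tables of size } 2^n}
 = \frac{m_1^2 + m_2^2}{m \cdot 2^n}
 \;\ge\; \frac{(m_1 + m_2)^2}{2m \cdot 2^n}
 = \underbrace{\frac{m}{2^{n+1}}}_{\text{one table of size } 2^{n+1}},
\]

with equality only when m_1 = m_2, so under these assumptions the combined
table is never worse.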

How does it compare with just increasing the hash table size but leaving
them combined?


Re: [PATCH v4 0/3] ext4: increase mbcache scalability

2014-01-27 Thread Thavatchai Makphaibulchoke
On 01/24/2014 11:09 PM, Andreas Dilger wrote:
> I think the ext4 block groups are locked with the blockgroup_lock that has 
> about the same number of locks as the number of cores, with a max of 128, 
> IIRC.  See blockgroup_lock.h. 
> 
> While there is some chance of contention, it is also unlikely that all of the 
> cores are locking this area at the same time.  
> 
> Cheers, Andreas
> 


Thanks Andreas for the suggestion. Will try that versus adding just a new 
private spinlock array in mbcache and compare the performance.

Thanks,
Mak.



Re: [PATCH v4 0/3] ext4: increase mbcache scalability

2014-01-24 Thread Thavatchai Makphaibulchoke
On 01/24/2014 02:38 PM, Andi Kleen wrote:
> T Makphaibulchoke  writes:
> 
>> The patch consists of three parts.
>>
>> The first part changes the implementation of both the block and hash chains of
>> an mb_cache from list_head to hlist_bl_head and also introduces new members,
>> including a spinlock, to mb_cache_entry, as required by the second part.
> 
> spinlock per entry is usually overkill for larger hash tables.
> 
> Can you use a second smaller lock table that just has locks and is 
> indexed by a subset of the hash key. Most likely a very small 
> table is good enough.
> 
> Also it would be good to have some data on the additional memory consumption.
> 
> -Andi
> 

Thanks Andi for the comments.  Will look into that.

Thanks,
Mak.



Re: [PATCH v4 0/3] ext4: increase mbcache scalability

2014-01-24 Thread Andreas Dilger
I think the ext4 block groups are locked with the blockgroup_lock that has 
about the same number of locks as the number of cores, with a max of 128, IIRC. 
 See blockgroup_lock.h. 

While there is some chance of contention, it is also unlikely that all of the 
cores are locking this area at the same time.  
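
Paraphrasing the shape of that header from memory (check blockgroup_lock.h for
the exact thresholds), the lock count scales with NR_CPUS and is capped at 128:

#if NR_CPUS >= 32
#define NR_BG_LOCKS	128
#elif NR_CPUS >= 16
#define NR_BG_LOCKS	64
/* ... smaller configurations get proportionally fewer locks ... */
#else
#define NR_BG_LOCKS	4
#endif

struct blockgroup_lock {
	struct bgl_lock locks[NR_BG_LOCKS];	/* each bgl_lock wraps one spinlock_t */
};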

Cheers, Andreas

> On Jan 24, 2014, at 14:38, Andi Kleen  wrote:
> 
> T Makphaibulchoke  writes:
> 
>> The patch consists of three parts.
>> 
>> The first part changes the implementation of both the block and hash chains of
>> an mb_cache from list_head to hlist_bl_head and also introduces new members,
>> including a spinlock, to mb_cache_entry, as required by the second part.
> 
> spinlock per entry is usually overkill for larger hash tables.
> 
> Can you use a second smaller lock table that just has locks and is 
> indexed by a subset of the hash key. Most likely a very small 
> table is good enough.
> 
> Also it would be good to have some data on the additional memory consumption.
> 
> -Andi
> 
> -- 
> a...@linux.intel.com -- Speaking for myself only


Re: [PATCH v4 0/3] ext4: increase mbcache scalability

2014-01-24 Thread Andi Kleen
T Makphaibulchoke  writes:

> The patch consists of three parts.
>
> The first part changes the implementation of both the block and hash chains of
> an mb_cache from list_head to hlist_bl_head and also introduces new members,
> including a spinlock, to mb_cache_entry, as required by the second part.

spinlock per entry is usually overkill for larger hash tables.

Can you use a second smaller lock table that just has locks and is 
indexed by a subset of the hash key. Most likely a very small 
table is good enough.
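
A minimal sketch of that kind of hashed lock table (sizes and names below are
purely illustrative, not a proposal for specific values):

#include <linux/hash.h>
#include <linux/spinlock.h>

#define ENTRY_LOCK_BITS	7	/* 128 locks; illustrative only */
#define NR_ENTRY_LOCKS	(1U << ENTRY_LOCK_BITS)

static spinlock_t entry_locks[NR_ENTRY_LOCKS];

static void entry_locks_init(void)
{
	int i;

	for (i = 0; i < NR_ENTRY_LOCKS; i++)
		spin_lock_init(&entry_locks[i]);
}

/* Share a small pool of locks instead of one spinlock per mb_cache_entry:
 * fold the entry's hash down to an index into the lock table. */
static inline spinlock_t *entry_lock(unsigned int hash)
{
	return &entry_locks[hash_32(hash, ENTRY_LOCK_BITS)];
}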

Also it would be good to have some data on the additional memory consumption.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only


[PATCH v4 0/3] ext4: increase mbcache scalability

2014-01-24 Thread T Makphaibulchoke
The patch consists of three parts.

The first part changes the implementation of both the block and hash chains of
an mb_cache from list_head to hlist_bl_head and also introduces new members,
including a spinlock, to mb_cache_entry, as required by the second part.
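
For readers who don't have the patch in front of them, a simplified sketch of
what that structure change looks like (field names here are illustrative; the
actual patch is the authority on the real layout):

#include <linux/list_bl.h>
#include <linux/spinlock.h>

/* hlist_bl_head is a bit-locked list head: bit 0 of the head pointer doubles
 * as a per-bucket lock, taken with hlist_bl_lock()/hlist_bl_unlock(). */
struct mb_cache_entry {
	struct hlist_bl_node	e_block_list;	/* was struct list_head */
	struct hlist_bl_node	e_index_list;	/* was struct list_head */
	spinlock_t		e_entry_lock;	/* new per-entry lock */
	/* ... block number, index key, refcount, etc. ... */
};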

The second part introduces a higher degree of parallelism to the use of the
mb_cache and mb_cache_entry structures and impacts all of the ext filesystems
(ext2, ext3, and ext4).

The third part of the patch further increases the scalability of an ext4
filesystem by having each ext4 filesystem allocate and use its own private
mbcache structure, instead of sharing a single mbcache structure across all
ext4 filesystems, and increases the size of its mbcache hash tables.

Here are some of the benchmark results with the changes.

Using a RAM disk, there seem to be no performance differences with aim7 for any
of the workloads on any of the platforms tested.

With regular disk filesystems using an inode size of 128 bytes, forcing the use
of external xattrs, there are some good performance increases with some of the
aim7 workloads on all platforms tested.

Here are some of the performance improvements on aim7 with 2000 users.

On a 60-core machine:

-----------------------------
| workload     | % increase |
-----------------------------
| alltests     |      30.97 |
| custom       |      41.18 |
| dbase        |      32.06 |
| disk         |     125.02 |
| fserver      |      10.23 |
| new_dbase    |      14.58 |
| new_fserver  |       9.62 |
| shared       |      52.56 |
-----------------------------

On a 40-core machine:

-----------------------------
| workload     | % increase |
-----------------------------
| custom       |      63.45 |
| disk         |     133.25 |
| fserver      |      78.29 |
| new_fserver  |      80.66 |
| shared       |      56.34 |
-----------------------------

The changes have been tested with ext4 xfstests to verify that no regression
has been introduced.

Changed in v4:
- New performance data
- New diff summary
- New patch architecture

Changed in v3:
- New diff summary

Changed in v2:
- New performance data
- New diff summary
T Makphaibulchoke (3):
  fs/mbcache.c change block and index hash chain to hlist_bl_node
  mbcache: decoupling the locking of local from global data
  ext4: each filesystem creates and uses its own mc_cache

 fs/ext4/ext4.h  |   1 +
 fs/ext4/super.c |  24 ++-
 fs/ext4/xattr.c |  51 ++---
 fs/ext4/xattr.h |   6 +-
 fs/mbcache.c| 491 ++--
 include/linux/mbcache.h |  11 +-
 6 files changed, 402 insertions(+), 182 deletions(-)

-- 
1.7.11.3
