Re: [rfc][patch] dynamic resizing dentry hash using RCU

2007-03-04 Thread Nick Piggin
On Mon, Mar 05, 2007 at 05:27:24AM +0100, Nick Piggin wrote:
> On Sun, Mar 04, 2007 at 08:11:42PM -0800, David Miller wrote:
> > One minor nit:
> > 
> > > +struct dentry_hash {
> > > + unsigned int shift;
> > > + unsigned long mask;
> > > + struct hlist_head *table;
> > > +};
> > 
> > I don't see any reason to make 'mask' an unsigned long
> > and this makes this structure take up 8 bytes more than
> > necessary on 64-bit.
> 
> You're right, the mask is currently just an int, so my patch should
> not be messing with that. Thanks.
> 
> The other thing you'll have to be careful of when looking at doing
> an implementation, is that I think I forgot to use the RCU accessors
> (rcu_assign_pointer, rcu_dereference) when assigning and loading
> the new/old hash table pointers.

Also, now that I think of it, if resize performance is far far less
important than read-side performance, then you should be able to get
away without the additional smp_rmb() that I add to the read-side
in the algorithm I described.

All you have to do is introduce an additional RCU grace period between
assigning the old pointer to the current table, and the current pointer
to the new table. This will still ensure that an (old_table == NULL &&
cur_table == new_table) condition is impossible (that condition would
mean that one of our active tables is invisible to the reader).

This should truly reduce read-side fastpath overhead to just a single,
predictable branch, an extra read_mostly load or two and the rcu_blah()
bits.
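A sketch of what that variant of the resize side could look like, in the same illustrative pseudocode style as the original posting quoted further down the thread (untested; helper names as used there):

resize_hash_slow()
{
	init(new_table, new_size);

	old_table = table;
	synchronize_rcu();	/* any reader that can observe the new current
				 * table must have started after this point, so
				 * it also sees old_table != NULL even without
				 * an smp_rmb() between its two loads */
	table = new_table;
	synchronize_rcu();	/* as before: new lookups and insertions now
				 * all go to the new table */

	for_each_entry_in(entry, old_table) {
		write_seqlock();
		rehash(entry, new_table);
		write_sequnlock();
	}

	tmp = old_table;
	old_table = NULL;
	synchronize_rcu();	/* no reader still holds a handle on it */
	free(tmp);
}

The read side is then lookup_hash() from that pseudocode with the smp_rmb() simply dropped (and the writer no longer needs its smp_wmb(), since the grace period orders the two pointer assignments).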



Re: [rfc][patch] dynamic resizing dentry hash using RCU

2007-03-04 Thread David Miller
From: Nick Piggin <[EMAIL PROTECTED]>
Date: Mon, 5 Mar 2007 05:27:24 +0100

> Sounds great, I would be happy to help review it. If we can create a
> bit of common infrastructure, the dcache conversion might become a bit
> more palatable and we could look at other things like the inode hash
> as well.

Absolutely, I'll run my patch by you once I have it ready.

> The other thing you'll have to be careful of when looking at doing
> an implementation, is that I think I forgot to use the RCU accessors
> (rcu_assign_pointer, rcu_dereference) when assigning and loading
> the new/old hash table pointers.

Thanks for the heads-up.



Re: [rfc][patch] dynamic resizing dentry hash using RCU

2007-03-04 Thread Nick Piggin
On Sun, Mar 04, 2007 at 08:11:42PM -0800, David Miller wrote:
> From: Nick Piggin <[EMAIL PROTECTED]>
> Date: Fri, 23 Feb 2007 16:37:43 +0100
> 
> > So I introduce a new method for resizing hash tables with RCU, and apply
> > that to the dentry hash.
> 
> Thanks for doing this work Nick.  I'm going to take your ideas
> and apply them to an ipv4 routing cache dynamic growth patch I
> worked on a while ago.

Sounds great, I would be happy to help review it. If we can create a
bit of common infrastructure, the dcache conversion might become a bit
more palatable and we could look at other things like the inode hash
as well.

> One minor nit:
> 
> > +struct dentry_hash {
> > +   unsigned int shift;
> > +   unsigned long mask;
> > +   struct hlist_head *table;
> > +};
> 
> I don't see any reason to make 'mask' an unsigned long
> and this makes this structure take up 8 bytes more than
> necessary on 64-bit.

You're right, the mask is currently just an int, so my patch should
not be messing with that. Thanks.

The other thing you'll have to be careful of when looking at doing
an implementation, is that I think I forgot to use the RCU accessors
(rcu_assign_pointer, rcu_dereference) when assigning and loading
the new/old hash table pointers.
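For reference, a minimal sketch of what using those accessors for the two table pointers could look like; the pointer names here are illustrative, not the ones from the patch:

	/* resize side: rcu_assign_pointer() embeds the write barrier, so the
	 * explicit smp_wmb() between the two stores becomes unnecessary */
	rcu_assign_pointer(old_hash, cur_hash);
	rcu_assign_pointer(cur_hash, new_hash);

	/* read side, inside rcu_read_lock()/rcu_read_unlock() */
	cur = rcu_dereference(cur_hash);
	smp_rmb();	/* still needed: rcu_dereference() does not order these
			 * two independent loads against each other */
	old = rcu_dereference(old_hash);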



Re: [rfc][patch] dynamic resizing dentry hash using RCU

2007-03-04 Thread David Miller
From: Nick Piggin <[EMAIL PROTECTED]>
Date: Fri, 23 Feb 2007 16:37:43 +0100

> So I introduce a new method for resizing hash tables with RCU, and apply
> that to the dentry hash.

Thanks for doing this work Nick.  I'm going to take your ideas
and apply them to an ipv4 routing cache dynamic growth patch I
worked on a while ago.

One minor nit:

> +struct dentry_hash {
> + unsigned int shift;
> + unsigned long mask;
> + struct hlist_head *table;
> +};

I don't see any reason to make 'mask' an unsigned long
and this makes this structure take up 8 bytes more than
necessary on 64-bit.
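To make the 8-byte figure concrete, here is a quick userspace check, with the hlist_head pointer stood in by a plain void pointer (the struct names are made up for the test):

#include <stdio.h>

struct hash_long_mask { unsigned int shift; unsigned long mask; void *table; };
struct hash_int_mask  { unsigned int shift; unsigned int  mask; void *table; };

int main(void)
{
	/* On an LP64 target the unsigned long mask is 8 bytes and also forces
	 * 4 bytes of padding after 'shift', so the struct grows from 16 to
	 * 24 bytes. */
	printf("long mask: %zu bytes\n", sizeof(struct hash_long_mask));
	printf("int mask:  %zu bytes\n", sizeof(struct hash_int_mask));
	return 0;
}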


Re: [rfc][patch] dynamic resizing dentry hash using RCU

2007-02-24 Thread Paul E. McKenney
On Fri, Feb 23, 2007 at 04:37:43PM +0100, Nick Piggin wrote:
> 
> The dentry hash, which uses up 8MB for 1 million entries on my 4GB system,
> is one of the biggest wasters of memory for me, because I rarely have more
> than one or two hundred thousand dentries. And that's with several kernel
> trees worth of entries. Most desktops and probably even many types of
> servers will only use a fraction of that.
> 
> So I introduce a new method for resizing hash tables with RCU, and apply
> that to the dentry hash.
> 
> The primitive heuristic is that the hash size is doubled when the number of
> entries reaches 150% the hash size, and halved when the number is 50%.
> It should also be able to shrink under memory pressure, and scale up as
> large as we go.
> 
> A pity it uses vmalloc memory for the moment.
> 
> The implementation is not highly stress tested, but it is running now. It
> could do a bit more RCU stuff asynchronously rather than with synchronize_rcu,
> but who cares, for now.
> 
> The hash is costing me about 256K now, which is a 32x reduction in memory.
> 
> I don't know if it's worthwhile to do this, rather than move things to other
> data structures, but something just tempted me to have a go!  I'd be interested
> to hear comments, and how many holes people can spot in my design ;)

The pseudocode looks sound to me.  The discussion on netdev has a few
alternatives that are interesting as well -- one of them gets rid of
the need for the sequence lock using a trick similar to one that Josh
Triplett came up with for atomically moving an element from one list
to another in face of RCU readers.  Another is Robert Ohlsson's "trash"
trie-hash implementation.

Some rather impressive algorithms all around!

Thanx, Paul

> Thanks,
> Nick
> 
> 
> RCU hash resizing algorithm as follows:
> 
> There is a 2-pointer system: the regular hash table pointer, and an "old" hash
> table pointer which is NULL except when resizing the hash.
> 
> When resizing the hash table, the old pointer is set to the current table, and
> the regular hash table pointer is switched from the current table to a newly
> initialised hash table (so the current table becomes the old table, and the
> new table becomes the current table). Memory barriers are used so a reader
> won't find a NULL old table AND a new, empty current table.
> 
> Then we wait for a grace period, so that all readers can see this new
> world order. Readers are anybody who wants to get a handle on the hash table,
> including to add entries to it (ie. the data we're protecting is the actual
> table -- hash entry locking is pretty well decoupled from this algorithm).
> So any new insertions should be going into the current table at this point.
> 
> Now we start moving hash entries from the old table into the current table.
> This is done under a seqlock, and we can do it bit by bit, as time and
> latency allow... After all entries are moved, then we set the "old" pointer
> to NULL, wait for a grace period (now no readers will have a handle on it),
> then free it.
> 
> On the read side, we load the current hash pointer and the old hash pointer
> first (remember this has to happen under rcu_read_lock). Then everything
> happens as normal when the old pointer is NULL, which is the common case.
> If the old pointer is not NULL, we have to be prepared to search both hash
> tables to find the entry we want. We also have to do this under the read
> seqlock in case we miss an entry when it is being moved from the old table
> to the current table as part of the resize process.
> 
> NOTE: you can be creative with the seqlock. It could be several seqlocks, if
> performance under resize is important. It doesn't even have to be a seqlock,
> just something to give you synchronisation. Say if you have a spinlock per
> bucket: just hold locks for both buckets while doing the move from one to the
> other, and have the read-side search the old table first.
> 
> struct hash_table *table, *old_table;
> 
> lookup_hash()
> {
>   struct hash_table *cur, *old;
> 
>   rcu_read_lock();
>   cur = table;
>   smp_rmb();
>   old = old_table;
> 
>   if (unlikely(old)) {
>   do {
>   read_seqbegin();
>   entry = search_table(cur);
>   if (!entry)
>   entry = search_table(old);
>   } while (read_seqretry());
>   } else {
>   entry = search_table(cur);
>   }
>   rcu_read_unlock();
> }
> 
> resize_hash()
> {
>   init(new_table, new_size);
> 
>   old_table = table;
>   smp_wmb();
>   table = new_table;
>   synchronize_rcu();
> 
>   for_each_entry_in(entry, old_table) {
>   write_seqlock();
>   rehash(entry, new_table);
>   write_sequnlock();
>   }
> 
>   tmp = old_table;
>   old_table = NULL;
>   synchronize_rcu();
>   free(tmp);
> }
> 
>  fs/dcache.c |  324 +---
>  1 file changed, 243 
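For concreteness, here is one way the pseudocode above could be fleshed out in kernel-style C. This is an untested, simplified sketch rather than the posted patch: the object type, the single global seqlock, and the assumption that insertions and resizes are serialized by an outer lock (as the dcache does with dcache_lock) are all illustrative.

#include <linux/kernel.h>
#include <linux/rcupdate.h>
#include <linux/seqlock.h>
#include <linux/list.h>
#include <linux/hash.h>
#include <linux/slab.h>

struct obj {
	unsigned long key;
	struct hlist_node hash;
};

struct hash_table {
	unsigned int shift;
	struct hlist_head buckets[0];
};

static struct hash_table *cur_table;
static struct hash_table *old_table;	/* non-NULL only while resizing */
static DEFINE_SEQLOCK(resize_seq);

static struct obj *search_one(struct hash_table *t, unsigned long key)
{
	struct hlist_head *head = &t->buckets[hash_long(key, t->shift)];
	struct hlist_node *n;

	for (n = rcu_dereference(head->first); n; n = rcu_dereference(n->next)) {
		struct obj *o = hlist_entry(n, struct obj, hash);
		if (o->key == key)
			return o;
	}
	return NULL;
}

static struct obj *lookup(unsigned long key)
{
	struct hash_table *cur, *old;
	struct obj *o;

	rcu_read_lock();
	cur = rcu_dereference(cur_table);
	smp_rmb();			/* pairs with the ordering of the two
					 * pointer assignments in resize() */
	old = rcu_dereference(old_table);

	if (likely(!old)) {		/* common case: no resize in flight */
		o = search_one(cur, key);
	} else {
		unsigned seq;
		do {			/* a hit is always valid; only a miss
					 * needs to be retried */
			seq = read_seqbegin(&resize_seq);
			o = search_one(cur, key);
			if (!o)
				o = search_one(old, key);
		} while (!o && read_seqretry(&resize_seq, seq));
	}
	rcu_read_unlock();
	return o;
}

static void resize(struct hash_table *new_table)
{
	struct hash_table *victim;
	unsigned int i;

	rcu_assign_pointer(old_table, cur_table);
	rcu_assign_pointer(cur_table, new_table);	/* the embedded barrier
							 * pairs with the
							 * reader's smp_rmb() */
	synchronize_rcu();	/* every later lookup/insert sees the new table */

	victim = old_table;
	for (i = 0; i < (1U << victim->shift); i++) {
		struct hlist_head *head = &victim->buckets[i];

		while (!hlist_empty(head)) {
			struct obj *o = hlist_entry(head->first,
						    struct obj, hash);
			struct hlist_head *dst =
				&new_table->buckets[hash_long(o->key,
							      new_table->shift)];

			write_seqlock(&resize_seq);	/* readers that raced
							 * with the move retry */
			hlist_del_rcu(&o->hash);
			hlist_add_head_rcu(&o->hash, dst);
			write_sequnlock(&resize_seq);
		}
	}

	rcu_assign_pointer(old_table, NULL);
	synchronize_rcu();	/* no reader still holds a handle on it */
	kfree(victim);		/* or vfree() for a vmalloc()ed table */
}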

Re: [rfc][patch] dynamic resizing dentry hash using RCU

2007-02-24 Thread William Lee Irwin III
> From: William Lee Irwin III <[EMAIL PROTECTED]>
> Date: Sat, 24 Feb 2007 14:56:31 -0800
>> just do it on a per-directory basis so you don't intermix children
>> of different parents in some boot-time-allocated global trainwreck
>> and you're home free.  Benchmarking is probably needed to gauge
>> which performs best.

On Sat, Feb 24, 2007 at 04:56:31PM -0800, David Miller wrote:
> The original dentry implementation in the kernel did things
> per-directory and it sucked.
> The problem you run into is that you end up with recursive algorithms
> all over the place, which chew up and overflow the kernel stack.
> So it would be a regression to go to a per-directory type
> lookup data structure for dentries.

Recursive code is clearly inadmissible in kernel environments. Any
implementations would clearly be required to be coded iteratively.
It's unclear to me where the choice of data structure would make a
demand for recursive code, but wherever it ostensibly does so, an
explicitly-allocated stack (or queue or worklist of whatever sort)
should be able to unravel it. There are furthermore rather stringent
bounds on balanced tree heights due to address space limits, so
whatever explicitly-allocated stacks may be required for e.g. tree
balancing algorithms are tightly bounded in depth.
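A small illustration of that point, not from the thread: a depth-first walk over a binary search tree done iteratively with a fixed explicit stack. Balanced trees are at most a small constant times log2(N) deep, so a statically sized stack comfortably covers anything that fits in the address space.

struct tnode {
	struct tnode *left, *right;
};

static void walk_inorder(struct tnode *root, void (*visit)(struct tnode *))
{
	struct tnode *stack[128];	/* explicit, tightly bounded stack */
	int top = 0;
	struct tnode *node = root;

	while (node || top > 0) {
		while (node) {			/* dive left, remembering the path */
			stack[top++] = node;
			node = node->left;
		}
		node = stack[--top];		/* visit, then take the right subtree */
		visit(node);
		node = node->right;
	}
}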

The usual objection is to lexicographically ordered balanced trees
such as B+ trees on the grounds that string comparison is cache-
intensive. Hash tries resolve that concern, albeit at the amortized
cost of expansion and contraction of what are effectively nodes in
a radix tree with a large radix as well as undoing path compression
where required (speaking of which, lib/radix-tree.c should really do
path compression instead of the ->depth hack). The quest, as I see it,
is for self-sizing data structures with as little restructuring and
direct key (string) comparison as possible. Should the obvious
candidates not incur significant overhead on account of where they
do such, maybe they're good enough.

So I'm not sure what else to say as far as the per-directory lookup
structure commentary goes. There's no way I'd propose putting
recursive code in the kernel. I don't believe any of what I'm talking
about requires such. I'm aware of various drawbacks the algorithms
have, and have in mind that they should pass benchmark criteria in
addition to reducing kernel memory consumption in order to prove that
they don't overwhelm the putative benefits. So I think I've got all the
bases covered.

Am I off-base on any of this? Is there anything I missed?


-- wli

(P.S.: The only time I ever proposed putting something "recursive"
in the kernel, it used a flag to limit the recursion depth to two
stack frames. It was for the purpose of simplifying the reservation
of metadata in an allocator, and so not strictly necessary. A 
simplified metadata allocator could've easily been devised, albeit at
the cost of some code duplication.)


Re: [rfc][patch] dynamic resizing dentry hash using RCU

2007-02-24 Thread David Miller
From: William Lee Irwin III <[EMAIL PROTECTED]>
Date: Sat, 24 Feb 2007 14:56:31 -0800

> just do it on a per-directory basis so you don't intermix children
> of different parents in some boot-time-allocated global trainwreck
> and you're home free.  Benchmarking is probably needed to gauge
> which performs best.

The original dentry implementation in the kernel did things
per-directory and it sucked.

The problem you run into is that you end up with recursive algorithms
all over the place, which chew up and overflow the kernel stack.

So it would be a regression to go to a per-directory type
lookup data structure for dentries.


Re: [rfc][patch] dynamic resizing dentry hash using RCU

2007-02-24 Thread William Lee Irwin III
On Fri, Feb 23, 2007 at 08:24:44PM -0800, William Lee Irwin III wrote:
>> You would be better served by a data structure different from a hashtable.

On Sat, Feb 24, 2007 at 06:09:37AM +0100, Nick Piggin wrote:
> Out of curiosity, what better data structure do you have in mind for
> the dentry hash?

Per-directory indices of ->d_subdirs. Hash tries (spelled correctly)
and B+ trees hung off dentries corresponding to directories are
plausible. There are plenty of other possibilities; just do it on a
per-directory basis so you don't intermix children of different parents
in some boot-time-allocated global trainwreck and you're home free.
Benchmarking is probably needed to gauge which performs best.


-- wli


Re: [rfc][patch] dynamic resizing dentry hash using RCU

2007-02-23 Thread Nick Piggin
On Sat, Feb 24, 2007 at 01:07:23PM +0900, KAMEZAWA Hiroyuki wrote:
> On Fri, 23 Feb 2007 16:37:43 +0100
> Nick Piggin <[EMAIL PROTECTED]> wrote:
> 
> > +static void dcache_hash_resize(unsigned int new_shift);
> > +static void mod_nr_dentry(int mod)
> > +{
> > +   unsigned long dentry_size;
> > +
> > +   dentry_stat.nr_dentry += mod;
> > +
> > +   dentry_size = 1 << dentry_hash->shift;
> > +   if (unlikely(dentry_stat.nr_dentry > dentry_size+(dentry_size>>1)))
> > +   dcache_hash_resize(dentry_hash->shift+1);
> > +   else if (unlikely(dentry_stat.nr_dentry < (dentry_size>>1)))
> > +   dcache_hash_resize(dentry_hash->shift-1);
> > +}
> > +
> 
> Do we need to do this kind of resizing in an implicit, automatic way?
> I think it would be good to expose the contention rate via /proc and add a
> sysctl to resize this.

Well, having the kernel automatically adjust to the workload is better
than a sysctl. But it could well be the case that these simple heuristics
are bad (though they're fine for my system).

But a manual override is a good idea as well.


Re: [rfc][patch] dynamic resizing dentry hash using RCU

2007-02-23 Thread Nick Piggin
On Fri, Feb 23, 2007 at 08:24:44PM -0800, William Lee Irwin III wrote:
> On Fri, Feb 23, 2007 at 04:37:43PM +0100, Nick Piggin wrote:
> > The dentry hash, which uses up 8MB for 1 million entries on my 4GB
> > system, is one of the biggest wasters of memory for me, because I
> > rarely have more than one or two hundred thousand dentries. And that's
> > with several kernel trees worth of entries. Most desktops and probably
> > even many types of servers will only use a fraction of that.
> > So I introduce a new method for resizing hash tables with RCU, and apply
> > that to the dentry hash.
> > The primitive heuristic is that the hash size is doubled when the number of
> > entries reaches 150% the hash size, and halved when the number is 50%.
> > It should also be able to shrink under memory pressure, and scale up as
> > large as we go.
> 
> You would be better served by a data structure different from a hashtable.

Possibly. Note that I wasn't intending to rewrite the dcache hash
specifically, but just demonstrate my lockless dynamic data structure
switch, and the dentry hash was the most obvious candidate.

But considering that it is lockless, and basically free in the fastpath
(ie. a single easily-predicted branch), I'm better served by a
dynamically sized hash table than by an inappropriately sized one.

Well, almost. The vmalloc issue is the downside, of course. But if I'm
on a NUMA machine that is doing vmalloc hashes anyway, then there is
little downside.

Out of curiosity, what better data structure do you have in mind for
the dentry hash?


Re: [rfc][patch] dynamic resizing dentry hash using RCU

2007-02-23 Thread William Lee Irwin III
On Fri, Feb 23, 2007 at 04:37:43PM +0100, Nick Piggin wrote:
> The dentry hash, which uses up 8MB for 1 million entries on my 4GB
> system, is one of the biggest wasters of memory for me, because I
> rarely have more than one or two hundred thousand dentries. And that's
> with several kernel trees worth of entries. Most desktops and probably
> even many types of servers will only use a fraction of that.
> So I introduce a new method for resizing hash tables with RCU, and apply
> that to the dentry hash.
> The primitive heuristic is that the hash size is doubled when the number of
> entries reaches 150% the hash size, and halved when the number is 50%.
> It should also be able to shrink under memory pressure, and scale up as
> large as we go.

You would be better served by a data structure different from a hashtable.


-- wli


Re: [rfc][patch] dynamic resizing dentry hash using RCU

2007-02-23 Thread KAMEZAWA Hiroyuki
On Fri, 23 Feb 2007 16:37:43 +0100
Nick Piggin <[EMAIL PROTECTED]> wrote:

> +static void dcache_hash_resize(unsigned int new_shift);
> +static void mod_nr_dentry(int mod)
> +{
> + unsigned long dentry_size;
> +
> + dentry_stat.nr_dentry += mod;
> +
> + dentry_size = 1 << dentry_hash->shift;
> + if (unlikely(dentry_stat.nr_dentry > dentry_size+(dentry_size>>1)))
> + dcache_hash_resize(dentry_hash->shift+1);
> + else if (unlikely(dentry_stat.nr_dentry < (dentry_size>>1)))
> + dcache_hash_resize(dentry_hash->shift-1);
> +}
> +

Do we need to do this kind of resizing in an implicit, automatic way?
I think it would be good to expose the contention rate via /proc and add a
sysctl to resize this.

-Kame



Re: [rfc][patch] dynamic resizing dentry hash using RCU

2007-02-23 Thread Nick Piggin
On Sat, Feb 24, 2007 at 02:26:02AM +0100, Nick Piggin wrote:
> On Fri, Feb 23, 2007 at 09:25:28AM -0800, Zach Brown wrote:
> > 
> > On Feb 23, 2007, at 7:37 AM, Nick Piggin wrote:
> > 
> > >
> > >The dentry hash uses up 8MB for 1 million entries on my 4GB system  
> > >is one
> > >of the biggest wasters of memory for me. Because I rarely have more  
> > >than one or
> > >two hundred thousand dentries. And that's with several kernel trees  
> > >worth of
> > >entries. Most desktop and probably even many types of servers will  
> > >only use a
> > >fraction of that.
> > >
> > >So I introduce a new method for resizing hash tables with RCU, and  
> > >apply
> > >that to the dentry hash.
> > 
> > Can you compare what you've done to the design that Paul and David  
> > talked about a year ago?
> > 
> >   http://lkml.org/lkml/2006/1/30/74
> 
> Thanks for the link, I wasn't aware of Paul's algorithm before.
> 
> I guess most variants are going to have a double pointer scheme,
> so they are similar in that regard.
> 
> I think Paul's is quite complex in moving entries to the new table,
> and it looks like it requires a lot of grace periods. I avoid all
> that by using the seqlock.
> 
> It wasn't clear to me how Paul handled the case where an item is
> present in the not_current table, but the lookup misses it when it
> gets moved between the tables. It is a little tricky to follow
> the find details because it is not in code or pseudo code format.


I guess the other thing is that Paul's seems quite hash-specific,
whereas mine does not have to be tied to any particular data structure.

All you need is to know how to perform a lookup on both your old and
new data structures, and the same algorithm is basically applicable
to switching between any two data structures.



Re: [rfc][patch] dynamic resizing dentry hash using RCU

2007-02-23 Thread Nick Piggin
On Fri, Feb 23, 2007 at 05:31:30PM -0800, Michael K. Edwards wrote:
> On 2/23/07, Zach Brown <[EMAIL PROTECTED]> wrote:
> >I'd love to see a generic implementation of RCU hashing that
> >subsystems can then take advantage of.  It's long been on the fun
> >side of my todo list.  The side I never get to :/.
> 
> There's an active thread on netdev about implementing an RCU hash.
> I'd suggest a 2-left (or possibly even k-left) hash for statistical
> reasons discussed briefly there, and in greater depth in a paper by
> Michael Mitzenmacher at
> www.eecs.harvard.edu/~michaelm/NEWWORK/postscripts/iproute.ps.
> Despite his paper's emphasis on hardware parallelism, there's a bigger
> win associated with Poisson statistics and decreasing occupation
> fraction (and therefore collision probability) in successive hashes.

Thanks for the link, I'll take a look.

I'm *pretty* sure that my algorithm will work with any kind of data
structure. It could even change from one to another completely different
data structure.

The important point is that, when a switch is in progress, the data
structure effectively becomes the union of the old and the new. Readers
insert items only into the new structure, but it is implicit that they
have checked for collisions in the union (in the case where collisions are
possible).

And finally, (you probably gathered this, but I'll just make it clear),
I haven't implemented a lockless hash lookup. The dcache already has this.
My algorithm is a lockless dynamic data structure switch, rather than
having anything specifically to do with a specific type of hash, or even
a hash at all. So yes, it should be applicable to resizing a 2-left hash or
whatever.

Thanks,
Nick


Re: [rfc][patch] dynamic resizing dentry hash using RCU

2007-02-23 Thread Michael K. Edwards

On 2/23/07, Zach Brown <[EMAIL PROTECTED]> wrote:

I'd love to see a generic implementation of RCU hashing that
subsystems can then take advantage of.  It's long been on the fun
side of my todo list.  The side I never get to :/.


There's an active thread on netdev about implementing an RCU hash.
I'd suggest a 2-left (or possibly even k-left) hash for statistical
reasons discussed briefly there, and in greater depth in a paper by
Michael Mitzenmacher at
www.eecs.harvard.edu/~michaelm/NEWWORK/postscripts/iproute.ps.
Despite his paper's emphasis on hardware parallelism, there's a bigger
win associated with Poisson statistics and decreasing occupation
fraction (and therefore collision probability) in successive hashes.
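For readers unfamiliar with the scheme, a minimal userspace illustration of 2-left hashing, not taken from the thread: two equal-sized sub-tables with independent hash functions; an item is inserted into whichever of its two candidate buckets currently holds fewer entries, with ties broken to the left, and a lookup probes both candidate buckets. The hash functions and bucket size here are arbitrary.

#define BUCKET_MAX 8

struct bucket {
	int n;
	unsigned long keys[BUCKET_MAX];
};

struct two_left {
	unsigned int nbuckets;			/* buckets per sub-table */
	struct bucket *left, *right;
};

static unsigned int h_left(unsigned long k, unsigned int n)
{
	return (unsigned int)(k * 2654435761u) % n;
}

static unsigned int h_right(unsigned long k, unsigned int n)
{
	return (unsigned int)(k ^ (k >> 16)) % n;
}

static int two_left_insert(struct two_left *t, unsigned long key)
{
	struct bucket *l = &t->left[h_left(key, t->nbuckets)];
	struct bucket *r = &t->right[h_right(key, t->nbuckets)];
	struct bucket *b = (r->n < l->n) ? r : l;	/* ties go left */

	if (b->n == BUCKET_MAX)
		return -1;				/* bucket overflow */
	b->keys[b->n++] = key;
	return 0;
}

static int two_left_contains(struct two_left *t, unsigned long key)
{
	struct bucket *l = &t->left[h_left(key, t->nbuckets)];
	struct bucket *r = &t->right[h_right(key, t->nbuckets)];
	int i;

	for (i = 0; i < l->n; i++)
		if (l->keys[i] == key)
			return 1;
	for (i = 0; i < r->n; i++)
		if (r->keys[i] == key)
			return 1;
	return 0;
}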

Cheers,
- Michael


Re: [rfc][patch] dynamic resizing dentry hash using RCU

2007-02-23 Thread Nick Piggin
On Fri, Feb 23, 2007 at 09:25:28AM -0800, Zach Brown wrote:
> 
> On Feb 23, 2007, at 7:37 AM, Nick Piggin wrote:
> 
> >
> >The dentry hash, which uses up 8MB for 1 million entries on my 4GB
> >system, is one of the biggest wasters of memory for me, because I
> >rarely have more than one or two hundred thousand dentries. And that's
> >with several kernel trees worth of entries. Most desktops and probably
> >even many types of servers will only use a fraction of that.
> >
> >So I introduce a new method for resizing hash tables with RCU, and
> >apply that to the dentry hash.
> 
> Can you compare what you've done to the design that Paul and David  
> talked about a year ago?
> 
>   http://lkml.org/lkml/2006/1/30/74

Thanks for the link, I wasn't aware of Paul's algorithm before.

I guess most variants are going to have a double pointer scheme,
so they are similar in that regard.

I think Paul's is quite complex in moving entries to the new table,
and it looks like it requires a lot of grace periods. I avoid all
that by using the seqlock.

It wasn't clear to me how Paul handled the case where an item is
present in the not_current table, but the lookup misses it when it
gets moved between the tables. It is a little tricky to follow
the find details because it is not in code or pseudo code format.


> I'd love to see a generic implementation of RCU hashing that  
> subsystems can then take advantage of.  It's long been on the fun  
> side of my todo list.  The side I never get to :/.

Yeah, if this is to be used anywhere else, I think it definitely
needs to be made into a generic library if possible. The main thing
I wanted was to get something working and see what people think
about it.

Thanks,
Nick


Re: [rfc][patch] dynamic resizing dentry hash using RCU

2007-02-23 Thread Nick Piggin
On Fri, Feb 23, 2007 at 05:31:17PM +0100, Eric Dumazet wrote:
> On Friday 23 February 2007 16:37, Nick Piggin wrote:
> > The dentry hash, which uses up 8MB for 1 million entries on my 4GB system,
> > is one of the biggest wasters of memory for me, because I rarely have more
> > than one or two hundred thousand dentries. And that's with several kernel
> > trees worth of entries. Most desktops and probably even many types of
> > servers will only use a fraction of that.
> >
> > So I introduce a new method for resizing hash tables with RCU, and apply
> > that to the dentry hash.
> >
> > The primitive heuristic is that the hash size is doubled when the number of
> > entries reaches 150% the hash size, and halved when the number is 50%.
> > It should also be able to shrink under memory pressure, and scale up as
> > large as we go.
> >
> > A pity it uses vmalloc memory for the moment.
> >
> > The implementation is not highly stress tested, but it is running now. It
> > could do a bit more RCU stuff asynchronously rather than with
> > synchronize_rcu, but who cares, for now.
> >
> > The hash is costing me about 256K now, which is a 32x reduction in memory.
> >
> > I don't know if it's worthwhile to do this, rather than move things to
> > other data structures, but something just tempted me to have a go!  I'd be
> > interested to hear comments, and how many holes people can spot in my
> > design ;)
> >
> > Thanks,
> > Nick
> 
> Hi Nick
> 
> That's a really good idea!
> 
> The vmalloc() thing could be a problem, so:
> 
> Could you bring back support for the 'dhash_entries=262144' boot param, so
> that an admin could set the initial size of the dhash table (and not shrink
> it below this size even if the number of dentries is low)?

Hi Eric,

Yeah, that's a good idea. I'll look at doing that.

> In case dhash_entries is set in the boot params, we could try to use
> alloc_large_system_hash() for the initial table (possibly using hugepages,
> not vmalloc), if we add a free_large_system_hash() function to be able to
> free the initial table.
> 
> Or else, it is time to add the possibility for vmalloc() to use hugepages
> itself...

That sounds like a nice idea to have a hugepage vmalloc for very large
allocations. The big NUMA guys already use vmalloc to allocate large hashes,
so hugepages would probably be a big win for them.


Re: [rfc][patch] dynamic resizing dentry hash using RCU

2007-02-23 Thread Zach Brown


On Feb 23, 2007, at 7:37 AM, Nick Piggin wrote:



> The dentry hash, which uses up 8MB for 1 million entries on my 4GB system,
> is one of the biggest wasters of memory for me, because I rarely have more
> than one or two hundred thousand dentries. And that's with several kernel
> trees worth of entries. Most desktops and probably even many types of
> servers will only use a fraction of that.
> 
> So I introduce a new method for resizing hash tables with RCU, and apply
> that to the dentry hash.


Can you compare what you've done to the design that Paul and David  
talked about a year ago?


  http://lkml.org/lkml/2006/1/30/74

I'd love to see a generic implementation of RCU hashing that  
subsystems can then take advantage of.  It's long been on the fun  
side of my todo list.  The side I never get to :/.


- z



Re: [rfc][patch] dynamic resizing dentry hash using RCU

2007-02-23 Thread Eric Dumazet
On Friday 23 February 2007 16:37, Nick Piggin wrote:
> The dentry hash, which uses up 8MB for 1 million entries on my 4GB system,
> is one of the biggest wasters of memory for me, because I rarely have more
> than one or two hundred thousand dentries. And that's with several kernel
> trees worth of entries. Most desktops and probably even many types of
> servers will only use a fraction of that.
>
> So I introduce a new method for resizing hash tables with RCU, and apply
> that to the dentry hash.
>
> The primitive heuristic is that the hash size is doubled when the number of
> entries reaches 150% the hash size, and halved when the number is 50%.
> It should also be able to shrink under memory pressure, and scale up as
> large as we go.
>
> A pity it uses vmalloc memory for the moment.
>
> The implementation is not highly stress tested, but it is running now. It
> could do a bit more RCU stuff asynchronously rather than with
> synchronize_rcu, but who cares, for now.
>
> The hash is costing me about 256K now, which is a 32x reduction in memory.
>
> I don't know if it's worthwhile to do this, rather than move things to
> other data structures, but something just tempted me to have a go!  I'd be
> interested to hear comments, and how many holes people can spot in my
> design ;)
>
> Thanks,
> Nick

Hi Nick

That's a really good idea!

The vmalloc() thing could be a problem, so:

Could you bring back support for the 'dhash_entries=262144' boot param, so that
an admin could set the initial size of the dhash table (and not shrink it below
this size even if the number of dentries is low)?

In case dhash_entries is set in the boot params, we could try to use
alloc_large_system_hash() for the initial table (possibly using hugepages,
not vmalloc), if we add a free_large_system_hash() function to be able to
free the initial table.

Or else, it is time to add the possibility for vmalloc() to use hugepages
itself...



Re: [rfc][patch] dynamic resizing dentry hash using RCU

2007-02-23 Thread Eric Dumazet
On Friday 23 February 2007 16:37, Nick Piggin wrote:
 The dentry hash uses up 8MB for 1 million entries on my 4GB system is one
 of the biggest wasters of memory for me. Because I rarely have more than
 one or two hundred thousand dentries. And that's with several kernel trees
 worth of entries. Most desktop and probably even many types of servers will
 only use a fraction of that.

 So I introduce a new method for resizing hash tables with RCU, and apply
 that to the dentry hash.

 The primitive heuristic is that the hash size is doubled when the number of
 entries reaches 150% the hash size, and halved when the number is 50%.
 It should also be able to shrink under memory pressure, and scale up as
 large as we go.

 A pity it uses vmalloc memory for the moment.

 The implementation is not highly stress tested, but it is running now. It
 could do a bit more RCU stuff asynchronously rather than with
 synchronize_rcu, but who cares, for now.

 The hash is costing me about 256K now, which is a 32x reduction in memory.

 I don't know if it's worthwhile to do this, rather than move things to
 other data structures, but something just tempted me to have a go!  I'd be
 interested to hear comments, and how many holes people can spot in my
 design ;)

 Thanks,
 Nick

Hi Nick

Thats a really good idea !

The vmalloc() thing could be a problem, so :

Could you bring back the support of 'dhash_entries=262144' boot param, so that 
an admin could set the initial size of dhash table, (and not shrink it under 
this size even if the number of dentries is low)

In case dhash_entries is set in boot params, we could try to use 
alloc_large_system_hash() for the initial table, (eventually using Hugepages 
(not vmalloc)), if we add a free_large_system_hash() function to be able to 
free the initial table.

Or else, time is to add the possibility for vmalloc() to use hugepages 
itself...

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [rfc][patch] dynamic resizing dentry hash using RCU

2007-02-23 Thread Zach Brown


On Feb 23, 2007, at 7:37 AM, Nick Piggin wrote:



The dentry hash uses up 8MB for 1 million entries on my 4GB system  
is one
of the biggest wasters of memory for me. Because I rarely have more  
than one or
two hundred thousand dentries. And that's with several kernel trees  
worth of
entries. Most desktop and probably even many types of servers will  
only use a

fraction of that.

So I introduce a new method for resizing hash tables with RCU, and  
apply

that to the dentry hash.


Can you compare what you've done to the design that Paul and David  
talked about a year ago?


  http://lkml.org/lkml/2006/1/30/74

I'd love to see a generic implementation of RCU hashing that  
subsystems can then take advantage of.  It's long been on the fun  
side of my todo list.  The side I never get to :/.


- z

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [rfc][patch] dynamic resizing dentry hash using RCU

2007-02-23 Thread Nick Piggin
On Fri, Feb 23, 2007 at 05:31:17PM +0100, Eric Dumazet wrote:
 On Friday 23 February 2007 16:37, Nick Piggin wrote:
  The dentry hash uses up 8MB for 1 million entries on my 4GB system is one
  of the biggest wasters of memory for me. Because I rarely have more than
  one or two hundred thousand dentries. And that's with several kernel trees
  worth of entries. Most desktop and probably even many types of servers will
  only use a fraction of that.
 
  So I introduce a new method for resizing hash tables with RCU, and apply
  that to the dentry hash.
 
  The primitive heuristic is that the hash size is doubled when the number of
  entries reaches 150% the hash size, and halved when the number is 50%.
  It should also be able to shrink under memory pressure, and scale up as
  large as we go.
 
  A pity it uses vmalloc memory for the moment.
 
  The implementation is not highly stress tested, but it is running now. It
  could do a bit more RCU stuff asynchronously rather than with
  synchronize_rcu, but who cares, for now.
 
  The hash is costing me about 256K now, which is a 32x reduction in memory.
 
  I don't know if it's worthwhile to do this, rather than move things to
  other data structures, but something just tempted me to have a go!  I'd be
  interested to hear comments, and how many holes people can spot in my
  design ;)
 
  Thanks,
  Nick
 
 Hi Nick
 
 Thats a really good idea !
 
 The vmalloc() thing could be a problem, so :
 
 Could you bring back the support of 'dhash_entries=262144' boot param, so 
 that 
 an admin could set the initial size of dhash table, (and not shrink it under 
 this size even if the number of dentries is low)

Hi Eric,

Yeah, that's a good idea. I'll look at doing that.

 In case dhash_entries is set in boot params, we could try to use 
 alloc_large_system_hash() for the initial table, (eventually using Hugepages 
 (not vmalloc)), if we add a free_large_system_hash() function to be able to 
 free the initial table.
 
 Or else, time is to add the possibility for vmalloc() to use hugepages 
 itself...

That sounds like a nice idea to have a hugepage vmalloc for very large
allocations. The big NUMA guys already use vmalloc to allocate large hashes,
so hugepages would probably be a big win for them.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [rfc][patch] dynamic resizing dentry hash using RCU

2007-02-23 Thread Nick Piggin
On Fri, Feb 23, 2007 at 09:25:28AM -0800, Zach Brown wrote:
 
 On Feb 23, 2007, at 7:37 AM, Nick Piggin wrote:
 
 
 The dentry hash uses up 8MB for 1 million entries on my 4GB system  
 is one
 of the biggest wasters of memory for me. Because I rarely have more  
 than one or
 two hundred thousand dentries. And that's with several kernel trees  
 worth of
 entries. Most desktop and probably even many types of servers will  
 only use a
 fraction of that.
 
 So I introduce a new method for resizing hash tables with RCU, and  
 apply
 that to the dentry hash.
 
 Can you compare what you've done to the design that Paul and David  
 talked about a year ago?
 
   http://lkml.org/lkml/2006/1/30/74

Thanks for the link, I wasn't aware of Paul's algorithm before.

I guess most variants are going to have a double pointer scheme,
so they are similar in that regard.

I think Paul's is quite complex in moving entries to the new table,
and it looks like it requires a lot of grace periods. I avoid all
that by using the seqlock.

It wasn't clear to me how Paul handled the case where an item is
present in the not_current table, but the lookup misses it when it
gets moved between the tables. It is a little tricky to follow
the find details because it is not in code or pseudo code format.


 I'd love to see a generic implementation of RCU hashing that  
 subsystems can then take advantage of.  It's long been on the fun  
 side of my todo list.  The side I never get to :/.

Yeah if this is to be used anywhere else, I think it definitely
needs to be made into a generic library if possible. Main thing
I wanted was to get something working and see what people think
about it.

Thanks,
Nick


Re: [rfc][patch] dynamic resizing dentry hash using RCU

2007-02-23 Thread Michael K. Edwards

On 2/23/07, Zach Brown [EMAIL PROTECTED] wrote:

I'd love to see a generic implementation of RCU hashing that
subsystems can then take advantage of.  It's long been on the fun
side of my todo list.  The side I never get to :/.


There's an active thread on netdev about implementing an RCU hash.
I'd suggest a 2-left (or possibly even k-left) hash for statistical
reasons discussed briefly there, and in greater depth in a paper by
Michael Mitzenmacher at
www.eecs.harvard.edu/~michaelm/NEWWORK/postscripts/iproute.ps.
Despite his paper's emphasis on hardware parallelism, there's a bigger
win associated with Poisson statistics and decreasing occupation
fraction (and therefore collision probability) in successive hashes.
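
Roughly: two sub-tables with independent hash functions, an insert goes into
the less loaded of its two candidate buckets, and a lookup probes both. A toy
sketch of the idea (sizes, names and hash functions are placeholders, nothing
taken from the netdev thread):

/* Illustrative sketch of 2-left hashing with fixed-size buckets. */
#define BUCKETS 1024    /* buckets per sub-table */
#define SLOTS   4       /* entries per bucket */

struct bucket {
    unsigned long key[SLOTS];
    int used;
};

static struct bucket left_tbl[BUCKETS], right_tbl[BUCKETS];

static unsigned int h_left(unsigned long k)
{
    return (k * 2654435761UL) % BUCKETS;
}

static unsigned int h_right(unsigned long k)
{
    return ((k ^ 0x9e3779b9UL) * 40503UL) % BUCKETS;
}

/* Insert into the less loaded of the two candidate buckets; ties go left. */
static int two_left_insert(unsigned long k)
{
    struct bucket *l = &left_tbl[h_left(k)];
    struct bucket *r = &right_tbl[h_right(k)];
    struct bucket *b = (r->used < l->used) ? r : l;

    if (b->used == SLOTS)
        return -1;      /* bucket overflow, caller must cope */
    b->key[b->used++] = k;
    return 0;
}

/* A lookup always has to probe both candidate buckets. */
static int two_left_lookup(unsigned long k)
{
    struct bucket *l = &left_tbl[h_left(k)];
    struct bucket *r = &right_tbl[h_right(k)];
    int i;

    for (i = 0; i < l->used; i++)
        if (l->key[i] == k)
            return 1;
    for (i = 0; i < r->used; i++)
        if (r->key[i] == k)
            return 1;
    return 0;
}

The win is that the most loaded bucket stays much shorter than with a single
hash, so worst-case chain walks shrink even at the same total size.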

Cheers,
- Michael


Re: [rfc][patch] dynamic resizing dentry hash using RCU

2007-02-23 Thread Nick Piggin
On Fri, Feb 23, 2007 at 05:31:30PM -0800, Michael K. Edwards wrote:
 On 2/23/07, Zach Brown [EMAIL PROTECTED] wrote:
 I'd love to see a generic implementation of RCU hashing that
 subsystems can then take advantage of.  It's long been on the fun
 side of my todo list.  The side I never get to :/.
 
 There's an active thread on netdev about implementing an RCU hash.
 I'd suggest a 2-left (or possibly even k-left) hash for statistical
 reasons discussed briefly there, and in greater depth in a paper by
 Michael Mitzenmacher at
 www.eecs.harvard.edu/~michaelm/NEWWORK/postscripts/iproute.ps.
 Despite his paper's emphasis on hardware parallelism, there's a bigger
 win associated with Poisson statistics and decreasing occupation
 fraction (and therefore collision probability) in successive hashes.

Thanks for the link, I'll take a look.

I'm *pretty* sure that my algorithm will work with any kind of data
structure. It could even change from one to another completely different
data structure.

The important point is that, when a switch is in progress, the data
structure effectively becomes the union of the old and the new. Readers
insert items only into the new structure, but it is implicit that they
have checked for collisions in the union (in the case where collisions are
possible).

And finally, (you probably gathered this, but I'll just make it clear),
I haven't implemented a lockless hash lookup. The dcache already has this.
My algorithm is a lockless dynamic data structure switch, rather than
having anything specifically to do with a specific type of hash, or even
a hash at all. So yes, it should be applicable to resizing a 2-left hash or
whatever.
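
To make the union idea concrete, the read side would look roughly like this.
This is a hand-waved sketch only: dentry_hash_cur, dentry_hash_old,
hash_resize_seq and lookup_in_table() are made-up names for illustration, not
the identifiers in my patch (struct dentry_hash is the one from the patch).

#include <linux/dcache.h>
#include <linux/rcupdate.h>
#include <linux/seqlock.h>

/* Made-up globals for the sketch; the seqlock is seqlock_init()ed elsewhere. */
static struct dentry_hash *dentry_hash_cur;  /* current table */
static struct dentry_hash *dentry_hash_old;  /* non-NULL only while switching */
static seqlock_t hash_resize_seq;

/* Made-up helper: an ordinary RCU hash-chain walk of one table. */
struct dentry *lookup_in_table(struct dentry_hash *h, const struct qstr *name);

struct dentry *union_lookup(const struct qstr *name)
{
    struct dentry_hash *cur, *old;
    struct dentry *found;
    unsigned seq;

    rcu_read_lock();
    do {
        seq = read_seqbegin(&hash_resize_seq);

        cur = rcu_dereference(dentry_hash_cur);
        found = lookup_in_table(cur, name);
        if (found)
            break;

        /*
         * A switch may be in flight, so the entry could still live in
         * the outgoing table: the effective structure is the union of
         * the two.
         */
        old = rcu_dereference(dentry_hash_old);
        if (old)
            found = lookup_in_table(old, name);

        /*
         * If we saw nothing but the tables changed underneath us, the
         * entry may have been moved past our probes; retry.
         */
    } while (!found && read_seqretry(&hash_resize_seq, seq));
    rcu_read_unlock();

    return found;
}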

Thanks,
Nick


Re: [rfc][patch] dynamic resizing dentry hash using RCU

2007-02-23 Thread Nick Piggin
On Sat, Feb 24, 2007 at 02:26:02AM +0100, Nick Piggin wrote:
 On Fri, Feb 23, 2007 at 09:25:28AM -0800, Zach Brown wrote:
  
  On Feb 23, 2007, at 7:37 AM, Nick Piggin wrote:
  
  
   The dentry hash uses up 8MB for 1 million entries on my 4GB system, and is
   one of the biggest wasters of memory for me. Because I rarely have more
   than one or two hundred thousand dentries. And that's with several kernel
   trees worth of entries. Most desktop and probably even many types of
   servers will only use a fraction of that.
  
   So I introduce a new method for resizing hash tables with RCU, and apply
   that to the dentry hash.
  
  Can you compare what you've done to the design that Paul and David talked
  about a year ago?
  
http://lkml.org/lkml/2006/1/30/74
 
 Thanks for the link, I wasn't aware of Paul's algorithm before.
 
 I guess most variants are going to have a double pointer scheme,
 so they are similar in that regard.
 
 I think Paul's is quite complex in moving entries to the new table,
 and it looks like it requires a lot of grace periods. I avoid all
 that by using the seqlock.
 
 It wasn't clear to me how Paul handled the case where an item is
 present in the not_current table, but the lookup misses it when it
 gets moved between the tables. It is a little tricky to follow
 the find details because it is not in code or pseudo code format.


I guess the other thing is that Paul's seems quite hash specific,
whereas mine does not have to be tied to any particular data structure.

All you need is to know how to perform a lookup on both your old
and new data structures, and the same algorithm is basically applicable
to switching between any two data structures.
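
For illustration, one plausible ordering of the switch itself might look like
the sketch below. The names (dentry_hash_cur, dentry_hash_old,
hash_resize_seq, migrate_entries(), free_table()) are invented, and the real
patch may well order things differently.

#include <linux/rcupdate.h>
#include <linux/seqlock.h>

/* Made-up globals and helpers; migration and freeing details are elided. */
extern struct dentry_hash *dentry_hash_cur, *dentry_hash_old;
extern seqlock_t hash_resize_seq;
void migrate_entries(struct dentry_hash *from, struct dentry_hash *to);
void free_table(struct dentry_hash *h);

/* Caller serializes resizes (e.g. under dcache_lock). */
static void switch_tables(struct dentry_hash *new)
{
    struct dentry_hash *cur = dentry_hash_cur;

    /* 1. Publish the outgoing table as "old" so lookups probe both. */
    rcu_assign_pointer(dentry_hash_old, cur);
    synchronize_rcu();

    /* 2. New lookups and insertions now go to the new table. */
    rcu_assign_pointer(dentry_hash_cur, new);
    synchronize_rcu();

    /*
     * 3. Move entries across; the seqlock write section makes any
     *    racing reader that missed an in-flight entry retry.
     */
    write_seqlock(&hash_resize_seq);
    migrate_entries(cur, new);
    write_sequnlock(&hash_resize_seq);

    /* 4. Retire the old table once no reader can still see it. */
    rcu_assign_pointer(dentry_hash_old, NULL);
    synchronize_rcu();
    free_table(cur);
}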




Re: [rfc][patch] dynamic resizing dentry hash using RCU

2007-02-23 Thread KAMEZAWA Hiroyuki
On Fri, 23 Feb 2007 16:37:43 +0100
Nick Piggin [EMAIL PROTECTED] wrote:

 +static void dcache_hash_resize(unsigned int new_shift);
 +static void mod_nr_dentry(int mod)
 +{
 +	unsigned long dentry_size;
 +
 +	dentry_stat.nr_dentry += mod;
 +
 +	dentry_size = 1 << dentry_hash->shift;
 +	if (unlikely(dentry_stat.nr_dentry > dentry_size + (dentry_size >> 1)))
 +		dcache_hash_resize(dentry_hash->shift + 1);
 +	else if (unlikely(dentry_stat.nr_dentry < (dentry_size >> 1)))
 +		dcache_hash_resize(dentry_hash->shift - 1);
 +}
 +

Do we need to do this kind of resizing in an implicit, automatic way?
I think it would be good to show the contention rate via /proc and add a
sysctl for resizing this.

-Kame



Re: [rfc][patch] dynamic resizing dentry hash using RCU

2007-02-23 Thread William Lee Irwin III
On Fri, Feb 23, 2007 at 04:37:43PM +0100, Nick Piggin wrote:
 The dentry hash uses up 8MB for 1 million entries on my 4GB system, and is
 one of the biggest wasters of memory for me. Because I rarely have
 more than one or two hundred thousand dentries. And that's with
 several kernel trees worth of entries. Most desktop and probably even
 many types of servers will only use a fraction of that.
 So I introduce a new method for resizing hash tables with RCU, and apply
 that to the dentry hash.
 The primitive heuristic is that the hash size is doubled when the number of
 entries reaches 150% the hash size, and halved when the number is 50%.
 It should also be able to shrink under memory pressure, and scale up as
 large as we go.

You would be better served by a data structure different from a hashtable.


-- wli


Re: [rfc][patch] dynamic resizing dentry hash using RCU

2007-02-23 Thread Nick Piggin
On Fri, Feb 23, 2007 at 08:24:44PM -0800, William Lee Irwin III wrote:
 On Fri, Feb 23, 2007 at 04:37:43PM +0100, Nick Piggin wrote:
  The dentry hash uses up 8MB for 1 million entries on my 4GB system, and is
  one of the biggest wasters of memory for me. Because I rarely have
  more than one or two hundred thousand dentries. And that's with
  several kernel trees worth of entries. Most desktop and probably even
  many types of servers will only use a fraction of that.
  So I introduce a new method for resizing hash tables with RCU, and apply
  that to the dentry hash.
  The primitive heuristic is that the hash size is doubled when the number of
  entries reaches 150% the hash size, and halved when the number is 50%.
  It should also be able to shrink under memory pressure, and scale up as
  large as we go.
 
 You would be better served by a data structure different from a hashtable.

Possibly. Note that I wasn't intending to rewrite the dcache hash
specifically, but just demonstrate my lockless dynamic data structure
switch, and the dentry hash was the most obvious candidate.

But considering that it is lockless, and basically free in the fastpath
(ie. a single easily-predicted branch), I'm better served by a
dynamically sized hash table than an inappropriately sized one.

Well, almost. The vmalloc issue is the downside, of course. But if I'm
on a NUMA machine that is doing vmalloc hashes anyway, then there is little
downside.

Out of curiosity, what better data structure do you have in mind for
the dentry hash?


Re: [rfc][patch] dynamic resizing dentry hash using RCU

2007-02-23 Thread Nick Piggin
On Sat, Feb 24, 2007 at 01:07:23PM +0900, KAMEZAWA Hiroyuki wrote:
 On Fri, 23 Feb 2007 16:37:43 +0100
 Nick Piggin [EMAIL PROTECTED] wrote:
 
  +static void dcache_hash_resize(unsigned int new_shift);
  +static void mod_nr_dentry(int mod)
  +{
  +	unsigned long dentry_size;
  +
  +	dentry_stat.nr_dentry += mod;
  +
  +	dentry_size = 1 << dentry_hash->shift;
  +	if (unlikely(dentry_stat.nr_dentry > dentry_size + (dentry_size >> 1)))
  +		dcache_hash_resize(dentry_hash->shift + 1);
  +	else if (unlikely(dentry_stat.nr_dentry < (dentry_size >> 1)))
  +		dcache_hash_resize(dentry_hash->shift - 1);
  +}
  +
 
 Do we need to do this kind of resizing in an implicit, automatic way?
 I think it would be good to show the contention rate via /proc and add a
 sysctl for resizing this.

Well, having the kernel automatically adjust to the workload is better
than a sysctl. But it could well be the case that these simple heuristics
are bad (though they're fine for my system).

But a manual override is a good idea as well.