Re: [rfc][patch] dynamic resizing dentry hash using RCU
On Mon, Mar 05, 2007 at 05:27:24AM +0100, Nick Piggin wrote:
> On Sun, Mar 04, 2007 at 08:11:42PM -0800, David Miller wrote:
> > One minor nit:
> >
> > > +struct dentry_hash {
> > > +	unsigned int shift;
> > > +	unsigned long mask;
> > > +	struct hlist_head *table;
> > > +};
> >
> > I don't see any reason to make 'mask' an unsigned long
> > and this makes this structure take up 8 bytes more than
> > necessary on 64-bit.
>
> You're right, the mask is currently just an int, so my patch should
> not be messing with that. Thanks.
>
> The other thing you'll have to be careful of when doing an
> implementation is that I think I forgot to use the RCU accessors
> (rcu_assign_pointer, rcu_dereference) when assigning and loading
> the new/old hash table pointers.

Also, now that I think of it, if resize performance is far far less
important than read-side performance, then you should be able to get away
without the additional smp_rmb() that I add to the read-side in the
algorithm I described.

All you have to do is introduce an additional RCU grace period between
assigning the old pointer to the current table, and the current pointer to
the new table. This will still ensure that an (old_table == NULL &&
cur_table == new_table) condition is impossible (that condition would mean
that one of our active tables is invisible to the reader).

This should truly reduce read-side fastpath overhead to just a single,
predictable branch, an extra read_mostly load or two, and the rcu_blah()
bits.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
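The publication order described above can be sketched in userspace C. This is only an illustration of the idea, not kernel code: synchronize_rcu() is stubbed out, the names are made up, and the invariant check is run single-threaded at the points where a concurrent reader could observe the pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch (not kernel code) of the resize-side ordering: publish the old
 * table pointer, wait a full grace period, and only then switch the
 * current table pointer.  The extra grace period is what lets readers
 * drop the smp_rmb(). */

struct hash_table { int dummy; };

static struct hash_table *table;     /* current table */
static struct hash_table *old_table; /* non-NULL only during a resize */

static void synchronize_rcu_stub(void)
{
	/* grace period elided in this single-threaded sketch */
}

/* The invariant a reader relies on: it must never see the new table as
 * current while old_table still reads NULL -- that would make one of
 * the active tables invisible to the reader. */
static void reader_invariant(struct hash_table *new_table)
{
	assert(!(old_table == NULL && table == new_table));
}

static void begin_resize(struct hash_table *new_table)
{
	old_table = table;		/* step 1: expose the old table */
	synchronize_rcu_stub();		/* extra grace period replaces smp_rmb() */
	reader_invariant(new_table);	/* readers now see old_table != NULL */
	table = new_table;		/* step 2: switch the current table */
	reader_invariant(new_table);
}
```

In the real kernel version, both pointer updates would additionally go through rcu_assign_pointer(), as noted earlier in the thread.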
Re: [rfc][patch] dynamic resizing dentry hash using RCU
From: Nick Piggin <[EMAIL PROTECTED]>
Date: Mon, 5 Mar 2007 05:27:24 +0100

> Sounds great, I would be happy to help review it. If we can create a
> bit of common infrastructure, the dcache conversion might become a bit
> more palatable and we could look at other things like the inode hash
> as well.

Absolutely, I'll run my patch by you once I have it ready.

> The other thing you'll have to be careful of when doing an
> implementation is that I think I forgot to use the RCU accessors
> (rcu_assign_pointer, rcu_dereference) when assigning and loading
> the new/old hash table pointers.

Thanks for the heads-up.
Re: [rfc][patch] dynamic resizing dentry hash using RCU
On Sun, Mar 04, 2007 at 08:11:42PM -0800, David Miller wrote:
> From: Nick Piggin <[EMAIL PROTECTED]>
> Date: Fri, 23 Feb 2007 16:37:43 +0100
>
> > So I introduce a new method for resizing hash tables with RCU, and apply
> > that to the dentry hash.
>
> Thanks for doing this work Nick. I'm going to take your ideas
> and apply them to an ipv4 routing cache dynamic growth patch I
> worked on a while ago.

Sounds great, I would be happy to help review it. If we can create a
bit of common infrastructure, the dcache conversion might become a bit
more palatable and we could look at other things like the inode hash
as well.

> One minor nit:
>
> > +struct dentry_hash {
> > +	unsigned int shift;
> > +	unsigned long mask;
> > +	struct hlist_head *table;
> > +};
>
> I don't see any reason to make 'mask' an unsigned long
> and this makes this structure take up 8 bytes more than
> necessary on 64-bit.

You're right, the mask is currently just an int, so my patch should
not be messing with that. Thanks.

The other thing you'll have to be careful of when doing an
implementation is that I think I forgot to use the RCU accessors
(rcu_assign_pointer, rcu_dereference) when assigning and loading
the new/old hash table pointers.
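The point about the accessors can be shown with a small userspace sketch. The macros here are stand-ins for the kernel's real rcu_assign_pointer()/rcu_dereference() (reduced to plain assignment and load, since this sketch is single-threaded); the struct and function names are illustrative, not from the patch.

```c
#include <stddef.h>

/* Stand-ins for the kernel accessors: in the kernel these carry the
 * memory-ordering semantics; here they are plain operations so the
 * sketch compiles standalone. */
#define rcu_assign_pointer(p, v)	((p) = (v))
#define rcu_dereference(p)		(p)

struct hash_table { int nbuckets; };

static struct hash_table *table;	/* current table */
static struct hash_table *old_table;	/* non-NULL only during resize */

/* Store side: every update of a shared table pointer goes through
 * rcu_assign_pointer(), so the table's initialisation is ordered
 * before the pointer becomes visible. */
static void switch_tables(struct hash_table *new_table)
{
	rcu_assign_pointer(old_table, table);
	rcu_assign_pointer(table, new_table);
}

/* Load side: both pointers are fetched through rcu_dereference()
 * inside the read-side critical section. */
static int lookup_nbuckets(void)
{
	struct hash_table *cur = rcu_dereference(table);
	struct hash_table *old = rcu_dereference(old_table);

	return cur->nbuckets + (old ? old->nbuckets : 0);
}
```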
Re: [rfc][patch] dynamic resizing dentry hash using RCU
From: Nick Piggin <[EMAIL PROTECTED]>
Date: Fri, 23 Feb 2007 16:37:43 +0100

> So I introduce a new method for resizing hash tables with RCU, and apply
> that to the dentry hash.

Thanks for doing this work Nick. I'm going to take your ideas
and apply them to an ipv4 routing cache dynamic growth patch I
worked on a while ago.

One minor nit:

> +struct dentry_hash {
> +	unsigned int shift;
> +	unsigned long mask;
> +	struct hlist_head *table;
> +};

I don't see any reason to make 'mask' an unsigned long
and this makes this structure take up 8 bytes more than
necessary on 64-bit.
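The 8-byte figure follows from alignment padding, and can be checked with a small sketch. The struct names here are illustrative, and `struct hlist_head` is left incomplete since only a pointer to it is needed; the exact savings assume a common LP64 target.

```c
#include <stddef.h>

struct hlist_head;	/* only a pointer is needed, so incomplete is fine */

/* As posted: on LP64, the 8-byte alignment of 'mask' inserts 4 bytes of
 * padding after 'shift', and the struct comes out at 24 bytes. */
struct dentry_hash_long {
	unsigned int shift;
	unsigned long mask;
	struct hlist_head *table;
};

/* With an unsigned int mask, 'shift' and 'mask' pack into one 8-byte
 * word and the struct comes out at 16 bytes on the same target. */
struct dentry_hash_int {
	unsigned int shift;
	unsigned int mask;
	struct hlist_head *table;
};
```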
Re: [rfc][patch] dynamic resizing dentry hash using RCU
On Fri, Feb 23, 2007 at 04:37:43PM +0100, Nick Piggin wrote:
>
> The dentry hash, which uses up 8MB for 1 million entries on my 4GB
> system, is one of the biggest wasters of memory for me, because I rarely
> have more than one or two hundred thousand dentries. And that's with
> several kernel trees worth of entries. Most desktops and probably even
> many types of servers will only use a fraction of that.
>
> So I introduce a new method for resizing hash tables with RCU, and apply
> that to the dentry hash.
>
> The primitive heuristic is that the hash size is doubled when the number
> of entries reaches 150% of the hash size, and halved when the number is
> 50%. It should also be able to shrink under memory pressure, and scale
> up as large as we go.
>
> A pity it uses vmalloc memory for the moment.
>
> The implementation is not highly stress tested, but it is running now. It
> could do a bit more RCU stuff asynchronously rather than with
> synchronize_rcu, but who cares, for now.
>
> The hash is costing me about 256K now, which is a 32x reduction in memory.
>
> I don't know if it's worthwhile to do this, rather than move things to
> other data structures, but something just tempted me to have a go! I'd be
> interested to hear comments, and how many holes people can spot in my
> design ;)

The pseudocode looks sound to me. The discussion on netdev has a few
alternatives that are interesting as well -- one of them gets rid of the
need for the sequence lock using a trick similar to one that Josh Triplett
came up with for atomically moving an element from one list to another in
the face of RCU readers. Another is Robert Olsson's "trash" trie-hash
implementation. Some rather impressive algorithms all around!

							Thanx, Paul

> Thanks,
> Nick
>
>
> RCU hash resizing algorithm as follows:
>
> There is a 2-pointer system: the regular hash table, and an "old" hash
> table pointer which is NULL except when resizing the hash.
>
> When resizing the hash table, the old pointer is set to the current
> table, and the regular hash table pointer is switched from the current
> table to a newly initialised hash table (so the current table becomes the
> old table, and the new table becomes the current table). Memory barriers
> are used so a reader won't find a NULL old table AND a new, empty current
> table.
>
> Then we wait for a grace period, so that all readers can see this new
> world order. Readers are anybody who wants to get a handle on the hash
> table, including to add entries to it (ie. the data we're protecting is
> the actual table -- hash entry locking is pretty well decoupled from this
> algorithm). So any new insertions should be going into the current table
> at this point.
>
> Now we start moving hash entries from the old table into the current
> table. This is done under a seqlock, and we can do it bit by bit, as time
> and latency allow... After all entries are moved, we set the "old"
> pointer to NULL, wait for a grace period (now no readers will have a
> handle on it), then free it.
>
> On the read side, we load the current hash pointer and the old hash
> pointer first (remember this has to happen under rcu_read_lock). Then
> everything happens as normal when the old pointer is NULL, which is the
> common case. If the old pointer is not NULL, we have to be prepared to
> search both hash tables to find the entry we want. We also have to do
> this under the read seqlock in case we miss an entry while it is being
> moved from the old table to the current table as part of the resize
> process.
>
> NOTE: you can be creative with the seqlock. It could be several seqlocks,
> if performance under resize is important. It doesn't even have to be a
> seqlock, just something to give you synchronisation. Say you have a
> spinlock per bucket: just hold the locks for both buckets while doing the
> move from one to the other, and have the read-side search the old table
> first.
>
> struct hash_table *table, *old_table;
>
> lookup_hash()
> {
> 	struct hash_table *cur, *old;
>
> 	rcu_read_lock();
> 	cur = table;
> 	smp_rmb();
> 	old = old_table;
>
> 	if (unlikely(old)) {
> 		do {
> 			read_seqbegin();
> 			entry = search_table(cur);
> 			if (!entry)
> 				entry = search_table(old);
> 		} while (read_seqretry());
> 	} else {
> 		entry = search_table(cur);
> 	}
> 	rcu_read_unlock();
> }
>
> resize_hash()
> {
> 	init(new_table, new_size);
>
> 	old_table = table;
> 	smp_wmb();
> 	table = new_table;
> 	synchronize_rcu();
>
> 	for_each_entry_in(entry, old_table) {
> 		write_seqlock();
> 		rehash(entry, new_table);
> 		write_sequnlock();
> 	}
>
> 	tmp = old_table;
> 	old_table = NULL;
> 	synchronize_rcu();
> 	free(tmp);
> }
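The two-table read side from the pseudocode above can be fleshed out as a single-threaded userspace sketch. All RCU, barriers, and seqlock retries are elided here; only the structural point survives: while old_table is non-NULL, a lookup searches the current table and falls back to the old one. Names and the trivial hash function are illustrative.

```c
#include <stddef.h>
#include <stdlib.h>

/* Userspace sketch of the two-table lookup.  Locking and RCU are
 * elided; in the real algorithm the two-table search also runs under
 * the read seqlock so a mid-move entry is not missed. */

struct entry {
	unsigned int key;
	struct entry *next;
};

struct hash_table {
	unsigned int nbuckets;
	struct entry **buckets;
};

static struct hash_table *table;	/* current */
static struct hash_table *old_table;	/* non-NULL only mid-resize */

static struct hash_table *alloc_table(unsigned int nbuckets)
{
	struct hash_table *t = malloc(sizeof(*t));

	t->nbuckets = nbuckets;
	t->buckets = calloc(nbuckets, sizeof(*t->buckets));
	return t;
}

static void insert(struct hash_table *t, struct entry *e)
{
	struct entry **head = &t->buckets[e->key % t->nbuckets];

	e->next = *head;
	*head = e;
}

static struct entry *search_table(struct hash_table *t, unsigned int key)
{
	struct entry *e = t->buckets[key % t->nbuckets];

	while (e && e->key != key)
		e = e->next;
	return e;
}

static struct entry *lookup_hash(unsigned int key)
{
	struct entry *e = search_table(table, key);

	if (!e && old_table)	/* mid-resize: entry may not have moved yet */
		e = search_table(old_table, key);
	return e;
}
```

A resize in this sketch is just: point old_table at the current table, point table at a freshly allocated larger one, move the entries across, then clear old_table.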
Re: [rfc][patch] dynamic resizing dentry hash using RCU
> From: William Lee Irwin III <[EMAIL PROTECTED]>
> Date: Sat, 24 Feb 2007 14:56:31 -0800
>> just do it on a per-directory basis so you don't intermix children
>> of different parents in some boot-time-allocated global trainwreck
>> and you're home free. Benchmarking is probably needed to gauge
>> which performs best.

On Sat, Feb 24, 2007 at 04:56:31PM -0800, David Miller wrote:
> The original dentry implementation in the kernel did things
> per-directory and it sucked.
> The problem you run into is that you end up with recursive algorithms
> all over the place, which chew up and overflow the kernel stack.
> So it would be a regression to go to a per-directory type
> lookup data structure for dentries.

Recursive code is clearly inadmissible in kernel environments. Any
implementation would clearly be required to be coded iteratively. It's
unclear to me where the choice of data structure would make a demand for
recursive code, but wherever it ostensibly does so, an explicitly-allocated
stack (or queue or worklist of whatever sort) should be able to unravel it.
There are furthermore rather stringent bounds on balanced tree heights due
to address space limits, so whatever explicitly-allocated stacks may be
required for e.g. tree balancing algorithms are tightly bounded in depth.

The usual objection is to lexicographically ordered balanced trees such as
B+ trees, on the grounds that string comparison is cache-intensive. Hash
tries resolve that concern, albeit at the amortized cost of expansion and
contraction of what are effectively nodes in a radix tree with a large
radix, as well as undoing path compression where required (speaking of
which, lib/radix-tree.c should really do path compression instead of the
->depth hack). The quest, as I see it, is for self-sizing data structures
with as little restructuring and direct key (string) comparison as
possible. Should the obvious candidates not incur significant overhead on
account of where they do such, maybe they're good enough.

So I'm not sure what else to say as far as the per-directory lookup
structure commentary goes. There's no way I'd propose putting recursive
code in the kernel, and I don't believe any of what I'm talking about
requires it. I'm aware of various drawbacks the algorithms have, and have
in mind that they should pass benchmark criteria in addition to reducing
kernel memory consumption, in order to prove that the drawbacks don't
overwhelm the putative benefits. So I think I've got all the bases
covered. Am I off-base on any of this? Is there anything I missed?

-- wli

(P.S.: The only time I ever proposed putting something "recursive" in the
kernel, it used a flag to limit the recursion depth to two stack frames.
It was for the purpose of simplifying the reservation of metadata in an
allocator, and so not strictly necessary. A simplified metadata allocator
could've easily been devised, albeit at the cost of some code duplication.)
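The explicit-stack point above can be made concrete with a small sketch: an in-order count of a binary search tree's nodes coded iteratively, with the "recursion" unrolled onto a fixed array. The names and the depth constant are illustrative; the bound assumes a reasonably balanced tree, where height stays within a small multiple of log2(n) and n is limited by the address space.

```c
#include <stddef.h>

/* Iterative in-order walk with an explicitly-allocated stack.  For a
 * balanced tree, height <= ~2*log2(n); with n bounded by a 64-bit
 * address space, 128 levels is a comfortable ceiling. */

struct node {
	struct node *left, *right;
};

#define MAX_DEPTH 128

static size_t count_nodes(struct node *root)
{
	struct node *stack[MAX_DEPTH];	/* the "recursion" lives here */
	int top = 0;
	size_t count = 0;

	while (root || top > 0) {
		while (root) {		/* slide down the left spine */
			stack[top++] = root;
			root = root->left;
		}
		root = stack[--top];	/* visit */
		count++;
		root = root->right;	/* then the right subtree */
	}
	return count;
}
```

The same shape (loop plus explicit stack or worklist) applies to rebalancing passes, which is why the choice of a tree-like per-directory index need not imply recursive kernel code.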
Re: [rfc][patch] dynamic resizing dentry hash using RCU
From: William Lee Irwin III <[EMAIL PROTECTED]>
Date: Sat, 24 Feb 2007 14:56:31 -0800

> just do it on a per-directory basis so you don't intermix children
> of different parents in some boot-time-allocated global trainwreck
> and you're home free. Benchmarking is probably needed to gauge
> which performs best.

The original dentry implementation in the kernel did things
per-directory and it sucked.

The problem you run into is that you end up with recursive algorithms
all over the place, which chew up and overflow the kernel stack.

So it would be a regression to go to a per-directory type
lookup data structure for dentries.
Re: [rfc][patch] dynamic resizing dentry hash using RCU
On Fri, Feb 23, 2007 at 08:24:44PM -0800, William Lee Irwin III wrote:
>> You would be better served by a data structure different from a
>> hashtable.

On Sat, Feb 24, 2007 at 06:09:37AM +0100, Nick Piggin wrote:
> Out of curiosity, what better data structure do you have in mind for
> the dentry hash?

Per-directory indices of ->d_subdirs. Hash tries (spelled correctly) and
B+ trees hung off dentries corresponding to directories are plausible.
There are plenty of other possibilities; just do it on a per-directory
basis so you don't intermix children of different parents in some
boot-time-allocated global trainwreck and you're home free. Benchmarking
is probably needed to gauge which performs best.

-- wli
Re: [rfc][patch] dynamic resizing dentry hash using RCU
On Sat, Feb 24, 2007 at 01:07:23PM +0900, KAMEZAWA Hiroyuki wrote:
> On Fri, 23 Feb 2007 16:37:43 +0100
> Nick Piggin <[EMAIL PROTECTED]> wrote:
>
> > +static void dcache_hash_resize(unsigned int new_shift);
> > +static void mod_nr_dentry(int mod)
> > +{
> > +	unsigned long dentry_size;
> > +
> > +	dentry_stat.nr_dentry += mod;
> > +
> > +	dentry_size = 1 << dentry_hash->shift;
> > +	if (unlikely(dentry_stat.nr_dentry > dentry_size+(dentry_size>>1)))
> > +		dcache_hash_resize(dentry_hash->shift+1);
> > +	else if (unlikely(dentry_stat.nr_dentry < (dentry_size>>1)))
> > +		dcache_hash_resize(dentry_hash->shift-1);
> > +}
> > +
>
> Do we need to do this kind of resizing in an implicit, automatic way?
> I think it would be good to show the contention rate via /proc and add
> a sysctl for resizing this.

Well, having the kernel automatically adjust to the workload is better
than a sysctl. It could well be the case that these simple heuristics are
bad (though they're fine for my system), so a manual override is a good
idea as well.
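The heuristic quoted above can be isolated into a small pure function, which makes the hysteresis easy to see: the grow threshold (150% of table size) and shrink threshold (50%) leave a wide band in between, so the table doesn't thrash between two sizes. The function and enum names here are illustrative, not from the patch.

```c
/* Userspace sketch of the mod_nr_dentry() heuristic: grow one shift
 * when the entry count passes 150% of the table size, shrink one shift
 * when it drops below 50%.  The gap between the thresholds provides
 * hysteresis. */

enum resize_action { KEEP, GROW, SHRINK };

static enum resize_action resize_decision(unsigned int shift,
					  unsigned long nr_entries)
{
	unsigned long size = 1UL << shift;

	if (nr_entries > size + (size >> 1))	/* > 150% of table size */
		return GROW;
	if (nr_entries < (size >> 1))		/* < 50% of table size */
		return SHRINK;
	return KEEP;
}
```

For example, at shift 10 (1024 buckets) the table grows above 1536 entries and shrinks below 512; everything in between leaves it alone.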
Re: [rfc][patch] dynamic resizing dentry hash using RCU
On Fri, Feb 23, 2007 at 08:24:44PM -0800, William Lee Irwin III wrote:
> On Fri, Feb 23, 2007 at 04:37:43PM +0100, Nick Piggin wrote:
> > The dentry hash, which uses up 8MB for 1 million entries on my 4GB
> > system, is one of the biggest wasters of memory for me, because I
> > rarely have more than one or two hundred thousand dentries. And that's
> > with several kernel trees worth of entries. Most desktops and probably
> > even many types of servers will only use a fraction of that.
> > So I introduce a new method for resizing hash tables with RCU, and
> > apply that to the dentry hash.
> > The primitive heuristic is that the hash size is doubled when the
> > number of entries reaches 150% of the hash size, and halved when the
> > number is 50%. It should also be able to shrink under memory pressure,
> > and scale up as large as we go.
>
> You would be better served by a data structure different from a
> hashtable.

Possibly. Note that I wasn't intending to rewrite the dcache hash
specifically, but just to demonstrate my lockless dynamic data structure
switch; the dentry hash was the most obvious candidate.

But considering that it is lockless, and basically free in the fastpath
(ie. a single easily-predicted branch), I'm better served by a dynamically
sized hash table than by an inappropriately sized one.

Well, almost. The vmalloc issue is the downside, of course. But if I'm on
a NUMA machine that is doing vmalloc hashes anyway, then there is little
downside.

Out of curiosity, what better data structure do you have in mind for
the dentry hash?
Re: [rfc][patch] dynamic resizing dentry hash using RCU
On Fri, Feb 23, 2007 at 04:37:43PM +0100, Nick Piggin wrote:
> The dentry hash, which uses up 8MB for 1 million entries on my 4GB system,
> is one of the biggest wasters of memory for me, because I rarely have more
> than one or two hundred thousand dentries. And that's with several kernel
> trees worth of entries. Most desktop systems and probably even many types
> of servers will only use a fraction of that.
> So I introduce a new method for resizing hash tables with RCU, and apply
> that to the dentry hash.
> The primitive heuristic is that the hash size is doubled when the number
> of entries reaches 150% of the hash size, and halved when the number is
> 50%. It should also be able to shrink under memory pressure, and scale up
> as large as we go.

You would be better served by a data structure different from a hashtable.

-- wli
Re: [rfc][patch] dynamic resizing dentry hash using RCU
On Fri, 23 Feb 2007 16:37:43 +0100
Nick Piggin <[EMAIL PROTECTED]> wrote:
> +static void dcache_hash_resize(unsigned int new_shift);
> +static void mod_nr_dentry(int mod)
> +{
> +	unsigned long dentry_size;
> +
> +	dentry_stat.nr_dentry += mod;
> +
> +	dentry_size = 1 << dentry_hash->shift;
> +	if (unlikely(dentry_stat.nr_dentry > dentry_size+(dentry_size>>1)))
> +		dcache_hash_resize(dentry_hash->shift+1);
> +	else if (unlikely(dentry_stat.nr_dentry < (dentry_size>>1)))
> +		dcache_hash_resize(dentry_hash->shift-1);
> +}
> +

Do we need to do this kind of resizing in an implicit, automatic way?
I think it would be good to show the contention rate via /proc and add a
sysctl to resize this.

-Kame
Re: [rfc][patch] dynamic resizing dentry hash using RCU
On Sat, Feb 24, 2007 at 02:26:02AM +0100, Nick Piggin wrote:
> On Fri, Feb 23, 2007 at 09:25:28AM -0800, Zach Brown wrote:
> > On Feb 23, 2007, at 7:37 AM, Nick Piggin wrote:
> > > The dentry hash, which uses up 8MB for 1 million entries on my 4GB
> > > system, is one of the biggest wasters of memory for me, because I
> > > rarely have more than one or two hundred thousand dentries. And that's
> > > with several kernel trees worth of entries. Most desktop systems and
> > > probably even many types of servers will only use a fraction of that.
> > > So I introduce a new method for resizing hash tables with RCU, and
> > > apply that to the dentry hash.
> > Can you compare what you've done to the design that Paul and David
> > talked about a year ago?
> > http://lkml.org/lkml/2006/1/30/74
> Thanks for the link, I wasn't aware of Paul's algorithm before.
> I guess most variants are going to have a double pointer scheme,
> so they are similar in that regard.
> I think Paul's is quite complex in moving entries to the new table,
> and it looks like it requires a lot of grace periods. I avoid all
> that by using the seqlock.
> It wasn't clear to me how Paul handled the case where an item is
> present in the not_current table, but the lookup misses it when it
> gets moved between the tables. It is a little tricky to follow
> the find details because it is not in code or pseudo code format.

I guess the other thing is that Paul's seems quite hash specific, whereas
mine does not have to be tied to any particular data structure. All you
need is to know how to perform a lookup on both your old and new data
structures, and the same algorithm is basically applicable to switching
between any two data structures.
Re: [rfc][patch] dynamic resizing dentry hash using RCU
On Fri, Feb 23, 2007 at 05:31:30PM -0800, Michael K. Edwards wrote:
> On 2/23/07, Zach Brown <[EMAIL PROTECTED]> wrote:
> > I'd love to see a generic implementation of RCU hashing that
> > subsystems can then take advantage of. It's long been on the fun
> > side of my todo list. The side I never get to :/.
> There's an active thread on netdev about implementing an RCU hash.
> I'd suggest a 2-left (or possibly even k-left) hash for statistical
> reasons discussed briefly there, and in greater depth in a paper by
> Michael Mitzenmacher at
> www.eecs.harvard.edu/~michaelm/NEWWORK/postscripts/iproute.ps.
> Despite his paper's emphasis on hardware parallelism, there's a bigger
> win associated with Poisson statistics and decreasing occupation
> fraction (and therefore collision probability) in successive hashes.

Thanks for the link, I'll take a look.

I'm *pretty* sure that my algorithm will work with any kind of data
structure. It could even change from one to another completely different
data structure. The important point is that, when a switch is in progress,
the data structure effectively becomes the union of the old and the new.
Readers insert items only into the new structure, but it is implicit that
they have checked for collisions in the union (in the case where collisions
are possible).

And finally (you probably gathered this, but I'll just make it clear), I
haven't implemented a lockless hash lookup; the dcache already has this. My
algorithm is a lockless dynamic data structure switch, rather than having
anything to do with a particular type of hash, or even a hash at all. So
yes, it should be applicable to resizing a 2-left hash or whatever.

Thanks,
Nick
Re: [rfc][patch] dynamic resizing dentry hash using RCU
On 2/23/07, Zach Brown <[EMAIL PROTECTED]> wrote:
> I'd love to see a generic implementation of RCU hashing that
> subsystems can then take advantage of. It's long been on the fun
> side of my todo list. The side I never get to :/.

There's an active thread on netdev about implementing an RCU hash. I'd
suggest a 2-left (or possibly even k-left) hash for statistical reasons
discussed briefly there, and in greater depth in a paper by Michael
Mitzenmacher at www.eecs.harvard.edu/~michaelm/NEWWORK/postscripts/iproute.ps.
Despite his paper's emphasis on hardware parallelism, there's a bigger win
associated with Poisson statistics and decreasing occupation fraction (and
therefore collision probability) in successive hashes.

Cheers,
- Michael
Re: [rfc][patch] dynamic resizing dentry hash using RCU
On Fri, Feb 23, 2007 at 09:25:28AM -0800, Zach Brown wrote:
> On Feb 23, 2007, at 7:37 AM, Nick Piggin wrote:
> > The dentry hash, which uses up 8MB for 1 million entries on my 4GB
> > system, is one of the biggest wasters of memory for me, because I
> > rarely have more than one or two hundred thousand dentries. And that's
> > with several kernel trees worth of entries. Most desktop systems and
> > probably even many types of servers will only use a fraction of that.
> > So I introduce a new method for resizing hash tables with RCU, and
> > apply that to the dentry hash.
> Can you compare what you've done to the design that Paul and David
> talked about a year ago?
> http://lkml.org/lkml/2006/1/30/74

Thanks for the link, I wasn't aware of Paul's algorithm before.

I guess most variants are going to have a double pointer scheme, so they
are similar in that regard.

I think Paul's is quite complex in moving entries to the new table, and it
looks like it requires a lot of grace periods. I avoid all that by using
the seqlock.

It wasn't clear to me how Paul handled the case where an item is present in
the not_current table, but the lookup misses it when it gets moved between
the tables. It is a little tricky to follow the find details because it is
not in code or pseudo code format.

> I'd love to see a generic implementation of RCU hashing that
> subsystems can then take advantage of. It's long been on the fun
> side of my todo list. The side I never get to :/.

Yeah, if this is to be used anywhere else, I think it definitely needs to
be made into a generic library if possible. The main thing I wanted was to
get something working and see what people think about it.

Thanks,
Nick
Re: [rfc][patch] dynamic resizing dentry hash using RCU
On Fri, Feb 23, 2007 at 05:31:17PM +0100, Eric Dumazet wrote:
> On Friday 23 February 2007 16:37, Nick Piggin wrote:
> > The dentry hash, which uses up 8MB for 1 million entries on my 4GB
> > system, is one of the biggest wasters of memory for me, because I
> > rarely have more than one or two hundred thousand dentries. And that's
> > with several kernel trees worth of entries. Most desktop systems and
> > probably even many types of servers will only use a fraction of that.
> > So I introduce a new method for resizing hash tables with RCU, and
> > apply that to the dentry hash.
> > The primitive heuristic is that the hash size is doubled when the
> > number of entries reaches 150% of the hash size, and halved when the
> > number is 50%. It should also be able to shrink under memory pressure,
> > and scale up as large as we go.
> > A pity it uses vmalloc memory for the moment.
> > The implementation is not highly stress tested, but it is running now.
> > It could do a bit more RCU stuff asynchronously rather than with
> > synchronize_rcu, but who cares, for now.
> > The hash is costing me about 256K now, which is a 32x reduction in
> > memory.
> > I don't know if it's worthwhile to do this, rather than move things to
> > other data structures, but something just tempted me to have a go! I'd
> > be interested to hear comments, and how many holes people can spot in
> > my design ;)
> > Thanks,
> > Nick
> Hi Nick
> That's a really good idea!
> The vmalloc() thing could be a problem, so:
> Could you bring back support for the 'dhash_entries=262144' boot param,
> so that an admin could set the initial size of the dhash table (and not
> shrink it under this size even if the number of dentries is low)?

Hi Eric,

Yeah, that's a good idea. I'll look at doing that.

> In case dhash_entries is set in boot params, we could try to use
> alloc_large_system_hash() for the initial table (eventually using
> Hugepages, not vmalloc), if we add a free_large_system_hash() function
> to be able to free the initial table.
> Or else, it is time to add the possibility for vmalloc() to use
> hugepages itself...

That sounds like a nice idea, to have a hugepage vmalloc for very large
allocations. The big NUMA guys already use vmalloc to allocate large
hashes, so hugepages would probably be a big win for them.
Re: [rfc][patch] dynamic resizing dentry hash using RCU
On Feb 23, 2007, at 7:37 AM, Nick Piggin wrote:
> The dentry hash, which uses up 8MB for 1 million entries on my 4GB system,
> is one of the biggest wasters of memory for me, because I rarely have more
> than one or two hundred thousand dentries. And that's with several kernel
> trees worth of entries. Most desktop systems and probably even many types
> of servers will only use a fraction of that.
> So I introduce a new method for resizing hash tables with RCU, and apply
> that to the dentry hash.

Can you compare what you've done to the design that Paul and David talked
about a year ago?

http://lkml.org/lkml/2006/1/30/74

I'd love to see a generic implementation of RCU hashing that subsystems can
then take advantage of. It's long been on the fun side of my todo list. The
side I never get to :/.

- z
Re: [rfc][patch] dynamic resizing dentry hash using RCU
On Friday 23 February 2007 16:37, Nick Piggin wrote:
> The dentry hash, which uses up 8MB for 1 million entries on my 4GB system,
> is one of the biggest wasters of memory for me, because I rarely have more
> than one or two hundred thousand dentries. And that's with several kernel
> trees worth of entries. Most desktop systems and probably even many types
> of servers will only use a fraction of that.
> So I introduce a new method for resizing hash tables with RCU, and apply
> that to the dentry hash.
> The primitive heuristic is that the hash size is doubled when the number
> of entries reaches 150% of the hash size, and halved when the number is
> 50%. It should also be able to shrink under memory pressure, and scale up
> as large as we go.
> A pity it uses vmalloc memory for the moment.
> The implementation is not highly stress tested, but it is running now. It
> could do a bit more RCU stuff asynchronously rather than with
> synchronize_rcu, but who cares, for now.
> The hash is costing me about 256K now, which is a 32x reduction in memory.
> I don't know if it's worthwhile to do this, rather than move things to
> other data structures, but something just tempted me to have a go! I'd be
> interested to hear comments, and how many holes people can spot in my
> design ;)
> Thanks,
> Nick

Hi Nick

That's a really good idea!

The vmalloc() thing could be a problem, so:

Could you bring back support for the 'dhash_entries=262144' boot param, so
that an admin could set the initial size of the dhash table (and not shrink
it under this size even if the number of dentries is low)?

In case dhash_entries is set in boot params, we could try to use
alloc_large_system_hash() for the initial table (eventually using
Hugepages, not vmalloc), if we add a free_large_system_hash() function to
be able to free the initial table.

Or else, it is time to add the possibility for vmalloc() to use hugepages
itself...