Re: [RFC PATCH] greatly reduce SLOB external fragmentation
On Thu, 10 January 2008 11:49:25 -0600, Matt Mackall wrote:
> > b) grouping objects of the same -type- (not size) together should mean
> > they have similar lifetimes and thereby keep fragmentation low
>
> (b) is known to be false, you just have to look at our dcache and icache
> pinning.

(b) is half-true, actually. The grouping by lifetime makes a lot of sense. LogFS has a similar problem to slabs (only full segments are useful, a single object can pin the segment). And when I grouped my objects very roughly by their life expectancy, the impact was *HUGE*!

In both cases, you want slabs/segments that are either close to 100% full or close to 0% full. It matters a lot when you have to move objects around, and I would bet it matters even more when you cannot move objects and the slab just remains pinned. So just because the type alone is a relatively bad heuristic for life expectancy does not make the concept false. Bonwick was onto something. He just failed in picking a good heuristic. Quite likely spreading by type was even a bonus when slab was developed, because even such a crude heuristic is slightly better than completely randomized lifetimes.

I've been meaning to split the dentry cache into 2-3 separate ones for a while and kept spending my time elsewhere. But I remain convinced that this will make a measurable difference.

Jörn

--
Never argue with idiots - first they drag you down to their level, then they beat you with experience. -- unknown
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH] greatly reduce SLOB external fragmentation
On Thu, 10 Jan 2008, Andi Kleen wrote:
> I did essentially that for my GBpages hugetlbfs patchkit. GB pages are already
> beyond MAX_ORDER and increasing MAX_ORDER didn't seem attractive because
> it would require aligning the zones all to 1GB, which would be quite nasty.

I am very very interested in that work and I could not find it when I looked for it a couple of weeks back. Can you send me a copy?
Re: [RFC PATCH] greatly reduce SLOB external fragmentation
On Thu, 10 Jan 2008, Matt Mackall wrote:
> Well, I think we'd still have the same page size, in the sense that we'd
> have a struct page for every hardware page and we'd still have hardware
> page-sized pages in the page cache. We'd just change how we allocated
> them. Right now we've got a stack that looks like:

We would not change the hardware page. Cannot do that. But we would have preferential treatment for 64k and 2M pages in the page allocator?

> buddy / page allocator
> SL*B allocator
> kmalloc
>
> And we'd change that to:
>
> buddy allocator
> SL*B allocator
> page allocator / kmalloc
>
> So get_free_page() would still hand you back a hardware page, it would
> just do it through SL*B.

Hmm. Not sure what effect this would have. We already have the pcp's that have a similar effect.

> > It would decrease listlock effect drastically for SLUB.
>
> Not sure what you're referring to here.

Allocations in 64k chunks means 16 times fewer basic allocation blocks to manage for the slab allocators. So the metadata to be maintained by the allocators is reduced by that factor. SLUB only needs to touch the list_lock (in some situations, like a free to a non-cpu slab) if a block becomes completely empty or is going from fully allocated to partially allocated. The larger the block size, the more objects are in a block and the fewer of these actions that need a per-node lock are required.
Re: [RFC PATCH] greatly reduce SLOB external fragmentation
> - huge pages (superpages for those crazy db people)
>
>   Just a simple linked list of these things is fine, we'd never care
>   about coalescing large pages together anyway.

I did essentially that for my GBpages hugetlbfs patchkit. GB pages are already beyond MAX_ORDER and increasing MAX_ORDER didn't seem attractive because it would require aligning the zones all to 1GB, which would be quite nasty. So it just grabbed them out of bootmem early and managed them in a per-node list.

Not sure it's a good idea for 2MB pages though.

-Andi
Re: [RFC PATCH] greatly reduce SLOB external fragmentation
On Thu, 10 Jan 2008, Linus Torvalds wrote: > It's not even clear that a buddy allocator even for the high-order pages > is at all the right choice. Almost nobody actually wants >64kB blocks, and > the ones that *do* want bigger allocations tend to want *much* bigger > ones, so it's quite possible that it could be worth it to have something > like a three-level allocator: Excellent! I am definitely on board with this. > - huge pages (superpages for those crazy db people) > > Just a simple linked list of these things is fine, we'd never care > about coalescing large pages together anyway. > > - "large pages" (on the order of ~64kB) - with *perhaps* a buddy bitmap > setup to try to coalesce back into huge-pages, but more likely just > admitting that you'd need something like migration to ever get back a > hugepage that got split into large-pages. > > So maybe a simple bitmap allocator per huge-page for large pages. Say > you have a 4MB huge-page, and just a 64-bit free-bitmap per huge-page > when you split it into large pages. > > - slab/slub/slob for anything else, and "get_free_page()" ends up being > just a shorthand for saying "naturally aligned kmalloc of size > "PAGE_SIZE << order"" > and maybe it would all work out ok.

Hmmm... a 3 level allocator? Basically we would have BASE_PAGE, STANDARD_PAGE and HUGE_PAGE? We could simply extend the page allocator to have 3 pcp lists for these sizes and go from there? Thinking about the arches this would mean:

        BASE_PAGE  STANDARD_PAGE  HUGE_PAGE
x86_64  4k         64k            2M
i386    4k         16k            4M
ia64    16k        256k           1G ?
Re: [RFC PATCH] greatly reduce SLOB external fragmentation
On Thu, 2008-01-10 at 11:24 -0800, Christoph Lameter wrote: > On Thu, 10 Jan 2008, Matt Mackall wrote: > > > One idea I've been kicking around is pushing the boundary for the buddy > > allocator back a bit (to 64k, say) and using SL*B under that. The page > > allocators would call into buddy for larger than 64k (rare!) and SL*B > > otherwise. This would let us greatly improve our handling of things like > > task structs and skbs and possibly also things like 8k stacks and jumbo > > frames. As SL*B would never be competing with the page allocator for > > contiguous pages (the buddy allocator's granularity would be 64k), I > > don't think this would exacerbate the page-level fragmentation issues. > > This would create another large page size (and that would have my > enthusiastic support).

Well, I think we'd still have the same page size, in the sense that we'd have a struct page for every hardware page and we'd still have hardware page-sized pages in the page cache. We'd just change how we allocated them. Right now we've got a stack that looks like:

buddy / page allocator
SL*B allocator
kmalloc

And we'd change that to:

buddy allocator
SL*B allocator
page allocator / kmalloc

So get_free_page() would still hand you back a hardware page, it would just do it through SL*B.

> It would decrease listlock effect drastically for SLUB.

Not sure what you're referring to here.

> However, isn't this basically confessing that the page allocator is not
> efficient for 4k page allocations?

Well I wasn't thinking of doing this for any performance reasons. But there certainly could be some.

--
Mathematics is the supreme nostalgia of our time.
Re: [RFC PATCH] greatly reduce SLOB external fragmentation
On Thu, 10 Jan 2008, Matt Mackall wrote: > > One idea I've been kicking around is pushing the boundary for the buddy > allocator back a bit (to 64k, say) and using SL*B under that. The page > allocators would call into buddy for larger than 64k (rare!) and SL*B > otherwise. This would let us greatly improve our handling of things like > task structs and skbs and possibly also things like 8k stacks and jumbo > frames.

Yes, something like that may well be reasonable. It could possibly solve some of the issues for bigger page cache sizes too, but one issue is that many things actually end up having those power-of-two alignment constraints too - so an 8kB allocation would often still have to be naturally aligned, which then removes some of the freedom.

> Crazy?

It sounds like it might be worth trying out - there's just no way to know how well it would work. Buddy allocators sure as hell have problems too, no question about that. It's not like the page allocator is perfect.

It's not even clear that a buddy allocator even for the high-order pages is at all the right choice. Almost nobody actually wants >64kB blocks, and the ones that *do* want bigger allocations tend to want *much* bigger ones, so it's quite possible that it could be worth it to have something like a three-level allocator:

 - huge pages (superpages for those crazy db people)

   Just a simple linked list of these things is fine, we'd never care
   about coalescing large pages together anyway.

 - "large pages" (on the order of ~64kB) - with *perhaps* a buddy bitmap
   setup to try to coalesce back into huge-pages, but more likely just
   admitting that you'd need something like migration to ever get back a
   hugepage that got split into large-pages.

   So maybe a simple bitmap allocator per huge-page for large pages. Say
   you have a 4MB huge-page, and just a 64-bit free-bitmap per huge-page
   when you split it into large pages.

 - slab/slub/slob for anything else, and "get_free_page()" ends up being
   just a shorthand for saying "naturally aligned kmalloc of size
   "PAGE_SIZE << order""

and maybe it would all work out ok.
Re: [RFC PATCH] greatly reduce SLOB external fragmentation
On Thu, 10 Jan 2008, Matt Mackall wrote: > > I agree. Crap too. We removed the destructors. The constructors are needed > > so that objects in slab pages always have a definite state. That is f.e. > > necessary for slab defragmentation because it has to be able to inspect an > > object at an arbitrary time and either remove it or move it to another > > slab page. > > Are you saying that the state of -freed- objects matters for your active > defragmentation? That's odd.

The state of the object immediately after it is allocated matters for a defrag solution. A kmalloc leads to an object in an undetermined state if you have no constructor. Code will then initialize the object, but defrag f.e. must be able to inspect the object before. This means either that the freed object has a defined state or that kmalloc establishes that state before the object is marked as allocated.

> > Constructors also make sense because the initialization of a cache object > > may be expensive. Initializing list heads and spinlocks can take some code > > and that code can be omitted if objects have a definite state when they > > are free. We saw that when measuring the buffer_head constructors effect > > on performance. > > Hmm. SLOB proves that you don't need to segregate objects based on > constructors, so you could combine even slabs that have constructors and > just delay construction until allocation. I'm surprised constructors > have measurable advantage..

That does not work if you need to inspect allocated objects at any time for a defrag solution. All objects in a defragmentable slab need to have a consistent object state if allocated. If you have some without constructors then these objects have no defined state and may contain arbitrary bytes.
Re: [RFC PATCH] greatly reduce SLOB external fragmentation
On Thu, 10 Jan 2008, Matt Mackall wrote: > One idea I've been kicking around is pushing the boundary for the buddy > allocator back a bit (to 64k, say) and using SL*B under that. The page > allocators would call into buddy for larger than 64k (rare!) and SL*B > otherwise. This would let us greatly improve our handling of things like > task structs and skbs and possibly also things like 8k stacks and jumbo > frames. As SL*B would never be competing with the page allocator for > contiguous pages (the buddy allocator's granularity would be 64k), I > don't think this would exacerbate the page-level fragmentation issues.

This would create another large page size (and that would have my enthusiastic support). It would decrease the list_lock effect drastically for SLUB. Even the initial simplistic implementation of SLUB was superior on the database transaction tests (I think it was up ~1%) on IA64 from the get go. Likely due to the larger 16k page size there. The larger page size could also be used for the page cache (ducking and running). A 64k page size that could be allocated without zone locks would be very good for SLUB.

However, isn't this basically confessing that the page allocator is not efficient for 4k page allocations? I have seen some weaknesses there. The overhead in the allocation path in particular is bad and I was thinking about applying the same ideas used in SLUB also to the page allocator in order to bring the cycle count down from 500-1000 to 60 or so. Since both SLUB and SLOB use the page allocator for allocs >PAGE_SIZE, this would not only benefit the general kernel but also the slab allocations.
Re: [RFC PATCH] greatly reduce SLOB external fragmentation
On Thu, 2008-01-10 at 11:16 -0800, Christoph Lameter wrote: > On Thu, 10 Jan 2008, Matt Mackall wrote: > > > Here I'm going to differ with you. The premises of the SLAB concept > > (from the original paper) are: > > > > a) fragmentation of conventional allocators gets worse over time > > Even fragmentation of SLAB/SLUB gets worse over time. That is why we need > a defrag solution. > > > b) grouping objects of the same -type- (not size) together should mean > > they have similar lifetimes and thereby keep fragmentation low > > I agree that is crap. The lifetimes argument is mostly only exploitable in > benchmarks. That is why SLUB just groups them by size if possible. > > > d) constructors and destructors are cache-friendly > > I agree. Crap too. We removed the destructors. The constructors are needed > so that objects in slab pages always have a definite state. That is f.e. > necessary for slab defragmentation because it has to be able to inspect an > object at an arbitrary time and either remove it or move it to another > slab page. Are you saying that the state of -freed- objects matters for your active defragmentation? That's odd. > Constructors also make sense because the initialization of a cache object > may be expensive. Initializing list heads and spinlocks can take some code > and that code can be omitted if objects have a definite state when they > are free. We saw that when measuring the buffer_head constructors effect > on performance. Hmm. SLOB proves that you don't need to segregate objects based on constructors, so you could combine even slabs that have constructors and just delay construction until allocation. I'm surprised constructors have measurable advantage.. -- Mathematics is the supreme nostalgia of our time.
Re: [RFC PATCH] greatly reduce SLOB external fragmentation
On Thu, 10 Jan 2008, Matt Mackall wrote: > Here I'm going to differ with you. The premises of the SLAB concept > (from the original paper) are: > > a) fragmentation of conventional allocators gets worse over time

Even fragmentation of SLAB/SLUB gets worse over time. That is why we need a defrag solution.

> b) grouping objects of the same -type- (not size) together should mean > they have similar lifetimes and thereby keep fragmentation low

I agree that is crap. The lifetimes argument is mostly only exploitable in benchmarks. That is why SLUB just groups them by size if possible.

> d) constructors and destructors are cache-friendly

I agree. Crap too. We removed the destructors. The constructors are needed so that objects in slab pages always have a definite state. That is f.e. necessary for slab defragmentation because it has to be able to inspect an object at an arbitrary time and either remove it or move it to another slab page.

Constructors also make sense because the initialization of a cache object may be expensive. Initializing list heads and spinlocks can take some code and that code can be omitted if objects have a definite state when they are free. We saw that when measuring the buffer_head constructors effect on performance.
Re: [RFC PATCH] greatly reduce SLOB external fragmentation
On Thu, 2008-01-10 at 10:28 -0800, Linus Torvalds wrote: > > On Thu, 10 Jan 2008, Matt Mackall wrote: > > > > > > (I'm not a fan of slabs per se - I think all the constructor/destructor > > > crap is just that: total crap - but the size/type binning is a big deal, > > > and I think SLOB was naïve to think a pure first-fit makes any sense. Now > > > you guys are size-binning by just two or three bins, and it seems to make > > > a difference for some loads, but compared to SLUB/SLAB it's a total hack). > > > > Here I'm going to differ with you. The premises of the SLAB concept > > (from the original paper) are: > > I really don't think we differ. > > The advantage of slab was largely the binning by type. Everything else was > just a big crock. SLUB does the binning better, by really just making the > type binning be about what really matters - the *size* of the type. > > So my argument was that the type/size binning makes sense (size more so > than type), but the rest of the original Sun arguments for why slab was > such a great idea were basically just the crap. > > Hard type binning was a mistake (but needed by slab due to the idiotic > notion that constructors/destructors are "good for caches" - bleargh). I > suspect that hard size binning is a mistake too (ie there are probably > cases where you do want to split unused bigger size areas), but the fact > that all of our allocators are two-level (with the page allocator acting > as a size-agnostic free space) may help it somewhat. > > And yes, I do agree that any current allocator has problems with the big > sizes that don't fit well into a page or two (like task_struct). That > said, most of those don't have lots of allocations under many normal > circumstances (even if there are uses that will really blow them up). > > The *big* slab users at least for me tend to be ext3_inode_cache and > dentry. Everything else is orders of magnitude less. 
And of the two bad > ones, ext3_inode_cache is the bad one at 700+ bytes or whatever (resulting > in ~10% fragmentation just due to the page thing, regardless of whether > you use an order-0 or order-1 page allocation). > > Of course, dentries fit better in a page (due to being smaller), but then > the bigger number of dentries per page make it harder to actually free > pages, so then you get fragmentation from that. Oh well. You can't win. One idea I've been kicking around is pushing the boundary for the buddy allocator back a bit (to 64k, say) and using SL*B under that. The page allocators would call into buddy for larger than 64k (rare!) and SL*B otherwise. This would let us greatly improve our handling of things like task structs and skbs and possibly also things like 8k stacks and jumbo frames. As SL*B would never be competing with the page allocator for contiguous pages (the buddy allocator's granularity would be 64k), I don't think this would exacerbate the page-level fragmentation issues. Crazy? -- Mathematics is the supreme nostalgia of our time.
Re: [RFC PATCH] greatly reduce SLOB external fragmentation
On Thu, 10 Jan 2008, Matt Mackall wrote: > > > > (I'm not a fan of slabs per se - I think all the constructor/destructor > > crap is just that: total crap - but the size/type binning is a big deal, > > and I think SLOB was naïve to think a pure first-fit makes any sense. Now > > you guys are size-binning by just two or three bins, and it seems to make > > a difference for some loads, but compared to SLUB/SLAB it's a total hack). > > Here I'm going to differ with you. The premises of the SLAB concept > (from the original paper) are: I really don't think we differ. The advantage of slab was largely the binning by type. Everything else was just a big crock. SLUB does the binning better, by really just making the type binning be about what really matters - the *size* of the type. So my argument was that the type/size binning makes sense (size more so than type), but the rest of the original Sun arguments for why slab was such a great idea were basically just the crap. Hard type binning was a mistake (but needed by slab due to the idiotic notion that constructors/destructors are "good for caches" - bleargh). I suspect that hard size binning is a mistake too (ie there are probably cases where you do want to split unused bigger size areas), but the fact that all of our allocators are two-level (with the page allocator acting as a size-agnostic free space) may help it somewhat. And yes, I do agree that any current allocator has problems with the big sizes that don't fit well into a page or two (like task_struct). That said, most of those don't have lots of allocations under many normal circumstances (even if there are uses that will really blow them up). The *big* slab users at least for me tend to be ext3_inode_cache and dentry. Everything else is orders of magnitude less. 
And of the two bad ones, ext3_inode_cache is the bad one at 700+ bytes or whatever (resulting in ~10% fragmentation just due to the page thing, regardless of whether you use an order-0 or order-1 page allocation). Of course, dentries fit better in a page (due to being smaller), but then the bigger number of dentries per page make it harder to actually free pages, so then you get fragmentation from that. Oh well. You can't win. Linus
Re: [RFC PATCH] greatly reduce SLOB external fragmentation
> I would suggest that if you guys are really serious about memory use, try > to do a size-based heap thing, and do best-fit in that heap. Or just some iirc best fit usually also has some nasty long term fragmentation behaviour. That is why it is usually also not used. -Andi
Re: [RFC PATCH] greatly reduce SLOB external fragmentation
On Thu, 2008-01-10 at 08:13 -0800, Linus Torvalds wrote: > > On Thu, 10 Jan 2008, Pekka J Enberg wrote: > > > > We probably don't have the same version of GCC which perhaps affects > > memory layout (struct sizes) and thus allocation patterns? > > No, struct sizes will not change with compiler versions - that would > create binary incompatibilities for libraries etc. > > So apart from the kernel itself working around some old gcc bugs by making > spinlocks have different size depending on the compiler version, sizes of > structures should be the same (as long as the configuration is the same, > of course). > > However, a greedy first-fit allocator will be *very* sensitive to > allocation pattern differences, so timing will probably make a big > difference. In contrast, SLUB and SLAB both use fixed sizes per allocation > queue, which makes them almost totally impervious to any allocation > pattern from different allocation sizes (they still end up caring about > the pattern *within* one size, but those tend to be much stabler). > > There really is a reason why traditional heaps with first-fit are almost > never used for any real loads. > > (I'm not a fan of slabs per se - I think all the constructor/destructor > crap is just that: total crap - but the size/type binning is a big deal, > and I think SLOB was naïve to think a pure first-fit makes any sense. Now > you guys are size-binning by just two or three bins, and it seems to make > a difference for some loads, but compared to SLUB/SLAB it's a total hack). Here I'm going to differ with you. 
The premises of the SLAB concept (from the original paper) are:

a) fragmentation of conventional allocators gets worse over time

b) grouping objects of the same -type- (not size) together should mean they have similar lifetimes and thereby keep fragmentation low

c) slabs can be O(1)

d) constructors and destructors are cache-friendly

There's some truth to (a), but the problem has been quite overstated, pre-SLAB Linux kernels and countless other systems run for years. And (b) is known to be false, you just have to look at our dcache and icache pinning. (c) of course is a whole separate argument and often trumps the others. And (d) is pretty much crap now too. And as it happens, SLOB basically always beats SLAB on size. SLUB only wins when it starts merging caches of different -types- based on -size-, effectively throwing out the whole (b) concept. Which is good, because it's wrong. So SLUB starts to look a lot like a purely size-binned allocator.

> I would suggest that if you guys are really serious about memory use, try > to do a size-based heap thing, and do best-fit in that heap. Or just some > really simple size-based binning, eg > > if (size > 2*PAGE_SIZE) > goto page_allocator;

I think both SLOB and SLUB punt everything >= PAGE_SIZE off to the page allocator.

> bin = lookup_bin[(size+31) >> 5];
>
> or whatever. Because first-fit is *known* to be bad.

It is known to be crummy, but it is NOT known to actually be worse than the SLAB/SLUB approach when you consider both internal and external fragmentation. Power-of-two ala SLAB kmalloc basically guarantees your internal fragmentation will be 30% or so. In fact, first-fit is known to be pretty good if the ratio of object size to arena size is reasonable. I've already shown a system booting with 6% overhead compared to SLUB's 35% (and SLAB's nearly 70%). The fragmentation measurements for the small object list are in fact quite nice.
Not the best benchmark, but it certainly shows that there's substantial room for improvement. Where it hurts is larger objects (task structs, SKBs), which are also a problem for SLAB/SLUB and I don't think any variation on the search order is going to help there. If we threw 64k pages at it, those problems might very well disappear. Hell, it's pretty tempting to throw vmalloc at it, especially on small boxes..

> And try to change it to address-ordered first-fit or something (which is > much more complex than just plain LIFO, but hey, that's life).

Will think about that. Most of the literature here is of limited usefulness - even Knuth didn't look at 4k heaps, let alone collections of them.

> I haven't checked much, but I *think* SLOB is just basic first-fit > (perhaps the "next-fit" variation?) Next-fit is known to be EVEN WORSE > than the simple first-fit when it comes to fragmentation (so no, Knuth was > not always right - let's face it, much of Knuth is simply outdated).

The SLOB allocator is inherently two-level. I'm using a next-fit-like approach to decide which heap (page) to use and first-fit within that heap.

--
Mathematics is the supreme nostalgia of our time.
Re: [RFC PATCH] greatly reduce SLOB external fragmentation
On Thu, 10 Jan 2008, Pekka J Enberg wrote: > > We probably don't have the same version of GCC which perhaps affects > memory layout (struct sizes) and thus allocation patterns? No, struct sizes will not change with compiler versions - that would create binary incompatibilities for libraries etc. So apart from the kernel itself working around some old gcc bugs by making spinlocks have different size depending on the compiler version, sizes of structures should be the same (as long as the configuration is the same, of course). However, a greedy first-fit allocator will be *very* sensitive to allocation pattern differences, so timing will probably make a big difference. In contrast, SLUB and SLAB both use fixed sizes per allocation queue, which makes them almost totally impervious to any allocation pattern from different allocation sizes (they still end up caring about the pattern *within* one size, but those tend to be much stabler). There really is a reason why traditional heaps with first-fit are almost never used for any real loads. (I'm not a fan of slabs per se - I think all the constructor/destructor crap is just that: total crap - but the size/type binning is a big deal, and I think SLOB was naïve to think a pure first-fit makes any sense. Now you guys are size-binning by just two or three bins, and it seems to make a difference for some loads, but compared to SLUB/SLAB it's a total hack). I would suggest that if you guys are really serious about memory use, try to do a size-based heap thing, and do best-fit in that heap. Or just some really simple size-based binning, eg

if (size > 2*PAGE_SIZE)
        goto page_allocator;
bin = lookup_bin[(size+31) >> 5];

or whatever. Because first-fit is *known* to be bad. And try to change it to address-ordered first-fit or something (which is much more complex than just plain LIFO, but hey, that's life). I haven't checked much, but I *think* SLOB is just basic first-fit (perhaps the "next-fit" variation?)
Next-fit is known to be EVEN WORSE than the simple first-fit when it comes to fragmentation (so no, Knuth was not always right - let's face it, much of Knuth is simply outdated). Linus
Re: [RFC PATCH] greatly reduce SLOB external fragmentation
On Thu, 2008-01-10 at 12:54 +0200, Pekka J Enberg wrote:
> Hi Matt,
>
> On Thu, 10 Jan 2008, Pekka J Enberg wrote:
> > I'll double check the results for SLUB next but it seems obvious that your
> > patches are a net gain for SLOB and should be applied. One problem though
> > with SLOB seems to be that its memory efficiency is not so stable. Any
> > ideas why that is?

We're seeing different numbers in each allocator, indicating that the
ordering of allocations is not stable. When fragmentation occurs, it
magnifies the underlying instability. On my config, where the split list
combats fragmentation extremely effectively, the stability is quite good.

Perhaps I'll add a printk to allocs and generate some size histograms.
lguest + grep = relayfs for dummies.

> Ok, I did that. The numbers are stable and reproducible. In fact, the
> average for SLUB is within 1 KB of the previous numbers. So, we have the
> same .config, the same userspace, and the same hypervisor, so what's the
> difference here?

I've got:

gcc version 4.2.3 20080102 (prerelease) (Debian 4.2.2-5)
BusyBox v1.2.1 (2006.11.01-11:21+) Built-in shell (ash)

--
Mathematics is the supreme nostalgia of our time.
Re: [RFC PATCH] greatly reduce SLOB external fragmentation
Hi Matt,

On Thu, 10 Jan 2008, Pekka J Enberg wrote:
> I'll double check the results for SLUB next but it seems obvious that your
> patches are a net gain for SLOB and should be applied. One problem though
> with SLOB seems to be that its memory efficiency is not so stable. Any
> ideas why that is?

Ok, I did that. The numbers are stable and reproducible. In fact, the
average for SLUB is within 1 KB of the previous numbers. So, we have the
same .config, the same userspace, and the same hypervisor, so what's the
difference here? We probably don't have the same version of GCC which
perhaps affects memory layout (struct sizes) and thus allocation patterns?

I have included ver_linux from my laptop here:

Linux haji 2.6.24-rc6 #21 SMP Thu Jan 10 12:30:59 EET 2008 i686 GNU/Linux

Gnu C                  4.1.3
Gnu make               3.81
binutils               2.18
util-linux             2.13
mount                  2.13
module-init-tools      3.3-pre2
e2fsprogs              1.40.2
reiserfsprogs          3.6.19
pcmciautils            014
PPP                    2.4.4
Linux C Library        2.6.1
Dynamic linker (ldd)   2.6.1
Procps                 3.2.7
Net-tools              1.60
Console-tools          0.2.3
Sh-utils               5.97
udev                   113
wireless-tools         29

Modules Loaded: af_packet ipv6 binfmt_misc rfcomm l2cap uinput radeon drm
acpi_cpufreq cpufreq_userspace cpufreq_stats cpufreq_powersave
cpufreq_ondemand freq_table cpufreq_conservative dock container joydev
snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm_oss snd_mixer_oss snd_pcm
snd_seq_dummy snd_seq_oss snd_seq_midi snd_rawmidi snd_seq_midi_event
snd_seq snd_timer snd_seq_device pcmcia psmouse snd ipw2200 serio_raw
pcspkr hci_usb soundcore snd_page_alloc bluetooth ieee80211
ieee80211_crypt yenta_socket rsrc_nonstatic pcmcia_core iTCO_wdt
iTCO_vendor_support video output shpchp pci_hotplug intel_agp button
agpgart thinkpad_acpi nvram evdev sg sd_mod ata_piix floppy ata_generic
libata scsi_mod e1000 ehci_hcd uhci_hcd usbcore thermal processor fan fuse

Pekka
Re: [RFC PATCH] greatly reduce SLOB external fragmentation
On Wed, 9 Jan 2008, Matt Mackall wrote:
> > slob: split free list by size
[snip]

Reviewed-by: Pekka Enberg <[EMAIL PROTECTED]>
Re: [RFC PATCH] greatly reduce SLOB external fragmentation
Hi Matt,

On Wed, 9 Jan 2008, Matt Mackall wrote:
> Huh, that's a fairly negligible change on your system. Is that with or
> without the earlier patch? That doesn't appear to change much here.
> Guess I'll have to clean up my stats patch and send it to you.

Ok, if I apply both of the patches, I get better results for SLOB:

[ the minimum, maximum, and average are captured from 10 individual runs ]

                      Total    Free (kB)                 Used (kB)
                      (kB)     min     max     average   min    max    average
SLUB (no debug)       26536    23868   23892   23877.6   2644   2668   2658.4
SLOB (both patches)   26548    23612   23860   23766.4   2688   2936   2781.6
SLOB (two lists)      26548    23456   23708   23603.2   2840   3092   2944.8
SLOB (vanilla)        26548    23472   23640   23579.6   2908   3076   2968.4
SLAB (no debug)       26544    23316   23364   23343.2   3180   3228   3200.8
SLOB (merge fix)      26548    23260   23728   23385.2   2820   3288   3162.8
SLUB (with debug)     26484    23120   23136   23127.2   3348   3364   3356.8

I'll double check the results for SLUB next but it seems obvious that your
patches are a net gain for SLOB and should be applied. One problem though
with SLOB seems to be that its memory efficiency is not so stable. Any
ideas why that is?

Pekka
Re: [RFC PATCH] greatly reduce SLOB external fragmentation
On Thu, 2008-01-10 at 08:13 -0800, Linus Torvalds wrote:
> On Thu, 10 Jan 2008, Pekka J Enberg wrote:
> > We probably don't have the same version of GCC which perhaps affects
> > memory layout (struct sizes) and thus allocation patterns?
>
> No, struct sizes will not change with compiler versions - that would
> create binary incompatibilities for libraries etc. So apart from the
> kernel itself working around some old gcc bugs by making spinlocks have
> different sizes depending on the compiler version, sizes of structures
> should be the same (as long as the configuration is the same, of course).
>
> However, a greedy first-fit allocator will be *very* sensitive to
> allocation pattern differences, so timing will probably make a big
> difference. In contrast, SLUB and SLAB both use fixed sizes per
> allocation queue, which makes them almost totally impervious to any
> allocation pattern from different allocation sizes (they still end up
> caring about the pattern *within* one size, but those tend to be much
> more stable).
>
> There really is a reason why traditional heaps with first-fit are almost
> never used for any real loads.
>
> (I'm not a fan of slabs per se - I think all the constructor/destructor
> crap is just that: total crap - but the size/type binning is a big deal,
> and I think SLOB was naïve to think a pure first-fit makes any sense.
> Now you guys are size-binning by just two or three bins, and it seems to
> make a difference for some loads, but compared to SLUB/SLAB it's a total
> hack).

Here I'm going to differ with you. The premises of the SLAB concept
(from the original paper) are:

a) fragmentation of conventional allocators gets worse over time
b) grouping objects of the same -type- (not size) together should mean
   they have similar lifetimes and thereby keep fragmentation low
c) slabs can be O(1)
d) constructors and destructors are cache-friendly

There's some truth to (a), but the problem has been quite overstated;
pre-SLAB Linux kernels and countless other systems ran for years.

And (b) is known to be false, you just have to look at our dcache and
icache pinning. (c) of course is a whole separate argument and often
trumps the others. And (d) is pretty much crap now too.

And as it happens, SLOB basically always beats SLAB on size. SLUB only
wins when it starts merging caches of different -types- based on -size-,
effectively throwing out the whole (b) concept. Which is good, because
it's wrong. So SLUB starts to look a lot like a purely size-binned
allocator.

> I would suggest that if you guys are really serious about memory use,
> try to do a size-based heap thing, and do best-fit in that heap. Or just
> some really simple size-based binning, eg
>
> 	if (size > 2*PAGE_SIZE)
> 		goto page_allocator;

I think both SLOB and SLUB punt everything >= PAGE_SIZE off to the page
allocator.

> 	bin = lookup_bin[(size+31) >> 5];
>
> or whatever. Because first-fit is *known* to be bad.

It is known to be crummy, but it is NOT known to actually be worse than
the SLAB/SLUB approach when you consider both internal and external
fragmentation. Power-of-two ala SLAB kmalloc basically guarantees your
internal fragmentation will be 30% or so.

In fact, first-fit is known to be pretty good if the ratio of object size
to arena size is reasonable. I've already shown a system booting with 6%
overhead compared to SLUB's 35% (and SLAB's nearly 70%). The
fragmentation measurements for the small object list are in fact quite
nice. Not the best benchmark, but it certainly shows that there's
substantial room for improvement.

Where it hurts is larger objects (task structs, SKBs), which are also a
problem for SLAB/SLUB and I don't think any variation on the search order
is going to help there. If we threw 64k pages at it, those problems might
very well disappear. Hell, it's pretty tempting to throw vmalloc at it,
especially on small boxes..

> At least try to change it to address-ordered first-fit or something
> (which is much more complex than just plain LIFO, but hey, that's life).

Will think about that. Most of the literature here is of limited
usefulness - even Knuth didn't look at 4k heaps, let alone collections of
them.

> I haven't checked much, but I *think* SLOB is just basic first-fit
> (perhaps the "next-fit" variation?) Next-fit is known to be EVEN WORSE
> than the simple first-fit when it comes to fragmentation (so no, Knuth
> was not always right - let's face it, much of Knuth is simply outdated).

The SLOB allocator is inherently two-level. I'm using a next-fit-like
approach to decide which heap (page) to use and first-fit within that
heap.

--
Mathematics is the supreme nostalgia of our time.
Re: [RFC PATCH] greatly reduce SLOB external fragmentation
> I would suggest that if you guys are really serious about memory use,
> try to do a size-based heap thing, and do best-fit in that heap. Or just
> some

iirc best fit usually also has some nasty long term fragmentation
behaviour. That is why it is usually also not used.

-Andi
Re: [RFC PATCH] greatly reduce SLOB external fragmentation
On Thu, 10 Jan 2008, Matt Mackall wrote:
> > (I'm not a fan of slabs per se - I think all the constructor/destructor
> > crap is just that: total crap - but the size/type binning is a big
> > deal, and I think SLOB was naïve to think a pure first-fit makes any
> > sense. Now you guys are size-binning by just two or three bins, and it
> > seems to make a difference for some loads, but compared to SLUB/SLAB
> > it's a total hack).
>
> Here I'm going to differ with you. The premises of the SLAB concept
> (from the original paper) are:

I really don't think we differ.

The advantage of slab was largely the binning by type. Everything else
was just a big crock. SLUB does the binning better, by really just making
the type binning be about what really matters - the *size* of the type.

So my argument was that the type/size binning makes sense (size more so
than type), but the rest of the original Sun arguments for why slab was
such a great idea were basically just crap. Hard type binning was a
mistake (but needed by slab due to the idiotic notion that
constructors/destructors are good for caches - bleargh).

I suspect that hard size binning is a mistake too (ie there are probably
cases where you do want to split unused bigger size areas), but the fact
that all of our allocators are two-level (with the page allocator acting
as a size-agnostic free space) may help it somewhat.

And yes, I do agree that any current allocator has problems with the big
sizes that don't fit well into a page or two (like task_struct). That
said, most of those don't have lots of allocations under many normal
circumstances (even if there are uses that will really blow them up).

The *big* slab users at least for me tend to be ext3_inode_cache and
dentry. Everything else is orders of magnitude less. And of the two bad
ones, ext3_inode_cache is the bad one at 700+ bytes or whatever
(resulting in ~10% fragmentation just due to the page thing, regardless
of whether you use an order-0 or order-1 page allocation).

Of course, dentries fit better in a page (due to being smaller), but then
the bigger number of dentries per page makes it harder to actually free
pages, so then you get fragmentation from that. Oh well. You can't win.

		Linus
Re: [RFC PATCH] greatly reduce SLOB external fragmentation
On Thu, 2008-01-10 at 10:28 -0800, Linus Torvalds wrote:
> On Thu, 10 Jan 2008, Matt Mackall wrote:
> > > (I'm not a fan of slabs per se - I think all the
> > > constructor/destructor crap is just that: total crap - but the
> > > size/type binning is a big deal, and I think SLOB was naïve to think
> > > a pure first-fit makes any sense. Now you guys are size-binning by
> > > just two or three bins, and it seems to make a difference for some
> > > loads, but compared to SLUB/SLAB it's a total hack).
> >
> > Here I'm going to differ with you. The premises of the SLAB concept
> > (from the original paper) are:
>
> I really don't think we differ.
>
> The advantage of slab was largely the binning by type. Everything else
> was just a big crock. SLUB does the binning better, by really just
> making the type binning be about what really matters - the *size* of the
> type.
>
> So my argument was that the type/size binning makes sense (size more so
> than type), but the rest of the original Sun arguments for why slab was
> such a great idea were basically just crap. Hard type binning was a
> mistake (but needed by slab due to the idiotic notion that
> constructors/destructors are good for caches - bleargh).
>
> I suspect that hard size binning is a mistake too (ie there are probably
> cases where you do want to split unused bigger size areas), but the fact
> that all of our allocators are two-level (with the page allocator acting
> as a size-agnostic free space) may help it somewhat.
>
> And yes, I do agree that any current allocator has problems with the big
> sizes that don't fit well into a page or two (like task_struct). That
> said, most of those don't have lots of allocations under many normal
> circumstances (even if there are uses that will really blow them up).
>
> The *big* slab users at least for me tend to be ext3_inode_cache and
> dentry. Everything else is orders of magnitude less. And of the two bad
> ones, ext3_inode_cache is the bad one at 700+ bytes or whatever
> (resulting in ~10% fragmentation just due to the page thing, regardless
> of whether you use an order-0 or order-1 page allocation).
>
> Of course, dentries fit better in a page (due to being smaller), but
> then the bigger number of dentries per page makes it harder to actually
> free pages, so then you get fragmentation from that. Oh well. You can't
> win.

One idea I've been kicking around is pushing the boundary for the buddy
allocator back a bit (to 64k, say) and using SL*B under that. The page
allocators would call into buddy for larger than 64k (rare!) and SL*B
otherwise. This would let us greatly improve our handling of things like
task structs and skbs and possibly also things like 8k stacks and jumbo
frames.

As SL*B would never be competing with the page allocator for contiguous
pages (the buddy allocator's granularity would be 64k), I don't think
this would exacerbate the page-level fragmentation issues.

Crazy?

--
Mathematics is the supreme nostalgia of our time.
Re: [RFC PATCH] greatly reduce SLOB external fragmentation
On Thu, 10 Jan 2008, Matt Mackall wrote:
> Here I'm going to differ with you. The premises of the SLAB concept
> (from the original paper) are:
>
> a) fragmentation of conventional allocators gets worse over time

Even fragmentation of SLAB/SLUB gets worse over time. That is why we need
a defrag solution.

> b) grouping objects of the same -type- (not size) together should mean
>    they have similar lifetimes and thereby keep fragmentation low

I agree that is crap. The lifetimes argument is mostly only exploitable
in benchmarks. That is why SLUB just groups them by size if possible.

> d) constructors and destructors are cache-friendly

I agree. Crap too. We removed the destructors. The constructors are
needed so that objects in slab pages always have a definite state. That
is f.e. necessary for slab defragmentation because it has to be able to
inspect an object at an arbitrary time and either remove it or move it to
another slab page.

Constructors also make sense because the initialization of a cache object
may be expensive. Initializing list heads and spinlocks can take some
code and that code can be omitted if objects have a definite state when
they are free. We saw that when measuring the buffer_head constructor's
effect on performance.
Re: [RFC PATCH] greatly reduce SLOB external fragmentation
On Thu, 10 Jan 2008, Matt Mackall wrote:
> One idea I've been kicking around is pushing the boundary for the buddy
> allocator back a bit (to 64k, say) and using SL*B under that. The page
> allocators would call into buddy for larger than 64k (rare!) and SL*B
> otherwise. This would let us greatly improve our handling of things like
> task structs and skbs and possibly also things like 8k stacks and jumbo
> frames.
>
> As SL*B would never be competing with the page allocator for contiguous
> pages (the buddy allocator's granularity would be 64k), I don't think
> this would exacerbate the page-level fragmentation issues.

This would create another large page size (and that would have my
enthusiastic support). It would decrease the list_lock effect drastically
for SLUB. Even the initial simplistic implementation of SLUB was superior
on the database transaction tests (I think it was up ~1%) on IA64 from
the get go, likely due to the larger 16k page size there. The larger page
size could also be used for the page cache (ducking and running)?

A 64k page size that could be allocated without zone locks would be very
good for SLUB.

However, isn't this basically confessing that the page allocator is not
efficient for 4k page allocations? I have seen some weaknesses there. The
overhead in the allocation path in particular is bad and I was thinking
about applying the same ideas used in SLUB also to the page allocator in
order to bring the cycle count down from 500-1000 to 60 or so. Since both
SLUB and SLOB use the page allocator for allocs >= PAGE_SIZE, this would
not only benefit the general kernel but also the slab allocations.
Re: [RFC PATCH] greatly reduce SLOB external fragmentation
On Thu, 2008-01-10 at 11:16 -0800, Christoph Lameter wrote:
> On Thu, 10 Jan 2008, Matt Mackall wrote:
> > Here I'm going to differ with you. The premises of the SLAB concept
> > (from the original paper) are:
> >
> > a) fragmentation of conventional allocators gets worse over time
>
> Even fragmentation of SLAB/SLUB gets worse over time. That is why we
> need a defrag solution.
>
> > b) grouping objects of the same -type- (not size) together should mean
> >    they have similar lifetimes and thereby keep fragmentation low
>
> I agree that is crap. The lifetimes argument is mostly only exploitable
> in benchmarks. That is why SLUB just groups them by size if possible.
>
> > d) constructors and destructors are cache-friendly
>
> I agree. Crap too. We removed the destructors. The constructors are
> needed so that objects in slab pages always have a definite state. That
> is f.e. necessary for slab defragmentation because it has to be able to
> inspect an object at an arbitrary time and either remove it or move it
> to another slab page.

Are you saying that the state of -freed- objects matters for your active
defragmentation? That's odd.

> Constructors also make sense because the initialization of a cache
> object may be expensive. Initializing list heads and spinlocks can take
> some code and that code can be omitted if objects have a definite state
> when they are free. We saw that when measuring the buffer_head
> constructor's effect on performance.

Hmm. SLOB proves that you don't need to segregate objects based on
constructors, so you could combine even slabs that have constructors and
just delay construction until allocation. I'm surprised constructors have
a measurable advantage..

--
Mathematics is the supreme nostalgia of our time.
Re: [RFC PATCH] greatly reduce SLOB external fragmentation
On Thu, 10 Jan 2008, Matt Mackall wrote:
> > I agree. Crap too. We removed the destructors. The constructors are
> > needed so that objects in slab pages always have a definite state.
> > That is f.e. necessary for slab defragmentation because it has to be
> > able to inspect an object at an arbitrary time and either remove it or
> > move it to another slab page.
>
> Are you saying that the state of -freed- objects matters for your active
> defragmentation? That's odd.

The state of the object immediately after it is allocated matters for a
defrag solution. A kmalloc leads to an object in an undetermined state if
you have no constructor. Code will then initialize the object, but defrag
f.e. must be able to inspect the object before. This means either that
the freed object has a defined state or that kmalloc establishes that
state before the object is marked as allocated.

> > Constructors also make sense because the initialization of a cache
> > object may be expensive. Initializing list heads and spinlocks can
> > take some code and that code can be omitted if objects have a definite
> > state when they are free. We saw that when measuring the buffer_head
> > constructor's effect on performance.
>
> Hmm. SLOB proves that you don't need to segregate objects based on
> constructors, so you could combine even slabs that have constructors and
> just delay construction until allocation. I'm surprised constructors
> have a measurable advantage..

That is not working if you need to inspect allocated objects at any time
for a defrag solution. All objects in a defragmentable slab need to have
a consistent object state if allocated. If you have some without
constructors then these objects have no defined state and may contain
arbitrary bytes.
Re: [RFC PATCH] greatly reduce SLOB external fragmentation
On Thu, 10 Jan 2008, Matt Mackall wrote:
> Well, I think we'd still have the same page size, in the sense that we'd
> have a struct page for every hardware page and we'd still have
> hardware page-sized pages in the page cache. We'd just change how we
> allocated them. Right now we've got a stack that looks like:

We would not change the hardware page. Cannot do that. But we would have
preferential treatment for 64k and 2M pages in the page allocator?

> 	buddy / page allocator
> 	SL*B allocator
> 	kmalloc
>
> And we'd change that to:
>
> 	buddy allocator
> 	SL*B allocator
> 	page allocator / kmalloc
>
> So get_free_page() would still hand you back a hardware page, it would
> just do it through SL*B.

Hmm. Not sure what effect this would have. We already have the pcp's that
have a similar effect.

> > It would decrease the list_lock effect drastically for SLUB.
>
> Not sure what you're referring to here.

Allocations in 64k chunks mean 16 times fewer basic allocation blocks to
manage for the slab allocators, so the metadata to be maintained by the
allocators is reduced by that factor. SLUB only needs to touch the
list_lock (in some situations, like a free to a non-cpu slab) if a block
becomes completely empty or goes from fully allocated to partially
allocated. The larger the block size, the more objects are in a block and
the fewer of these actions that need a per-node lock are required.
Re: [RFC PATCH] greatly reduce SLOB external fragmentation
On Thu, 10 Jan 2008, Linus Torvalds wrote:
> It's not even clear that a buddy allocator even for the high-order pages
> is at all the right choice. Almost nobody actually wants 64kB blocks, and
> the ones that *do* want bigger allocations tend to want *much* bigger
> ones, so it's quite possible that it could be worth it to have something
> like a three-level allocator:

Excellent! I am definitely on board with this.

> - huge pages (superpages for those crazy db people)
>
>   Just a simple linked list of these things is fine, we'd never care
>   about coalescing large pages together anyway.
>
> - large pages (on the order of ~64kB) - with *perhaps* a buddy bitmap
>   setup to try to coalesce back into huge pages, but more likely just
>   admitting that you'd need something like migration to ever get back a
>   huge page that got split into large pages.
>
>   So maybe a simple bitmap allocator per huge page for large pages. Say
>   you have a 4MB huge page, and just a 64-bit free-bitmap per huge page
>   when you split it into large pages.
>
> - slab/slub/slob for anything else, and get_free_page() ends up being
>   just a shorthand for saying "naturally aligned kmalloc of size
>   PAGE_SIZE << order"
>
> and maybe it would all work out ok.

Hmmm... a three-level allocator? Basically we would have BASE_PAGE,
STANDARD_PAGE, and HUGE_PAGE? We could simply extend the page allocator to
have 3 pcp lists for these sizes and go from there?

Thinking about the arches, this would mean:

        BASE_PAGE  STANDARD_PAGE  HUGE_PAGE
x86_64  4k         64k            2M
i386    4k         16k            4M
ia64    16k        256k           1G ?
Re: [RFC PATCH] greatly reduce SLOB external fragmentation
> - huge pages (superpages for those crazy db people)
>
>   Just a simple linked list of these things is fine, we'd never care
>   about coalescing large pages together anyway.

I did essentially that for my GB-pages hugetlbfs patchkit. GB pages are
already beyond MAX_ORDER, and increasing MAX_ORDER didn't seem attractive
because it would require aligning the zones all to 1GB, which would be
quite nasty. So it just grabbed them out of bootmem early and managed them
in a per-node list.

Not sure it's a good idea for 2MB pages though.

-Andi
Re: [RFC PATCH] greatly reduce SLOB external fragmentation
On Thu, 2008-01-10 at 00:43 +0200, Pekka J Enberg wrote:
> Hi Matt,
>
> On Wed, 9 Jan 2008, Matt Mackall wrote:
> > I kicked this around for a while, slept on it, and then came up with
> > this little hack first thing this morning:
> >
> > slob: split free list by size
>
> [snip]
>
> > And the results are fairly miraculous, so please double-check them on
> > your setup. The resulting statistics change to this:
>
> [snip]
>
> > So the average jumped by 714k from before the patch, became much more
> > stable, and beat SLUB by 287k. There are also 7 perfectly filled pages
> > now, up from 1 before. And we can't get a whole lot better than this:
> > we're using 259 pages for 244 pages of actual data, so our total
> > overhead is only 6%! For comparison, SLUB's using about 70 pages more
> > for the same data, so its total overhead appears to be about 35%.
>
> Unfortunately I only see a slight improvement to SLOB (but it still gets
> beaten by SLUB):
>
> [ the minimum, maximum, and average are captured from 10 individual runs ]
>
>                               Free (kB)                Used (kB)
>                    Total (kB)  min    max    average    min   max   average
> SLUB (no debug)     26536     23868  23892  23877.6    2644  2668  2658.4
> SLOB (patched)      26548     23456  23708  23603.2    2840  3092  2944.8
> SLOB (vanilla)      26548     23472  23640  23579.6    2908  3076  2968.4
> SLAB (no debug)     26544     23316  23364  23343.2    3180  3228  3200.8
> SLUB (with debug)   26484     23120  23136  23127.2    3348  3364  3356.8

With your kernel config and my lguest+busybox setup, I get:

SLUB:
MemFree: 24208 kB
MemFree: 24212 kB
MemFree: 24212 kB
MemFree: 24212 kB
MemFree: 24216 kB
MemFree: 24216 kB
MemFree: 24220 kB
MemFree: 24220 kB
MemFree: 24224 kB
MemFree: 24232 kB
avg: 24217.2

SLOB with two lists:
MemFree: 24204 kB
MemFree: 24260 kB
MemFree: 24260 kB
MemFree: 24276 kB
MemFree: 24288 kB
MemFree: 24292 kB
MemFree: 24312 kB
MemFree: 24320 kB
MemFree: 24336 kB
MemFree: 24396 kB
avg: 24294.4

Not sure why this result is so different from yours.
Hacked this up to three lists to experiment, and we now have:

MemFree: 24348 kB
MemFree: 24372 kB
MemFree: 24372 kB
MemFree: 24372 kB
MemFree: 24372 kB
MemFree: 24380 kB
MemFree: 24384 kB
MemFree: 24404 kB
MemFree: 24404 kB
MemFree: 24408 kB
avg: 24381.6

Even the last version is still using about 250 pages of storage for 209
pages of data, so it's got about 20% overhead still.

-- 
Mathematics is the supreme nostalgia of our time.
Re: [RFC PATCH] greatly reduce SLOB external fragmentation
On Thu, 2008-01-10 at 00:43 +0200, Pekka J Enberg wrote:
> Hi Matt,
>
> On Wed, 9 Jan 2008, Matt Mackall wrote:
> > I kicked this around for a while, slept on it, and then came up with
> > this little hack first thing this morning:
> >
> > slob: split free list by size
>
> [snip]
>
> > And the results are fairly miraculous, so please double-check them on
> > your setup. The resulting statistics change to this:
>
> [snip]
>
> > So the average jumped by 714k from before the patch, became much more
> > stable, and beat SLUB by 287k. There are also 7 perfectly filled pages
> > now, up from 1 before. And we can't get a whole lot better than this:
> > we're using 259 pages for 244 pages of actual data, so our total
> > overhead is only 6%! For comparison, SLUB's using about 70 pages more
> > for the same data, so its total overhead appears to be about 35%.
>
> Unfortunately I only see a slight improvement to SLOB (but it still gets
> beaten by SLUB):
>
> [ the minimum, maximum, and average are captured from 10 individual runs ]
>
>                               Free (kB)                Used (kB)
>                    Total (kB)  min    max    average    min   max   average
> SLUB (no debug)     26536     23868  23892  23877.6    2644  2668  2658.4
> SLOB (patched)      26548     23456  23708  23603.2    2840  3092  2944.8
> SLOB (vanilla)      26548     23472  23640  23579.6    2908  3076  2968.4
> SLAB (no debug)     26544     23316  23364  23343.2    3180  3228  3200.8
> SLUB (with debug)   26484     23120  23136  23127.2    3348  3364  3356.8

Huh, that's a fairly negligible change on your system. Is that with or
without the earlier patch? That doesn't appear to change much here. Guess
I'll have to clean up my stats patch and send it to you.

> What .config did you use? What kind of user-space do you have? (I am
> still using the exact same configuration I described in the first mail.)

I'm using lguest with my Thinkpad config (I'll send that separately), with
busybox and init=doit like your setup. Was having trouble getting lguest
going with your config, but I may have found the problem.
-- 
Mathematics is the supreme nostalgia of our time.
Re: [RFC PATCH] greatly reduce SLOB external fragmentation
Hi Matt,

On Wed, 9 Jan 2008, Matt Mackall wrote:
> I kicked this around for a while, slept on it, and then came up with
> this little hack first thing this morning:
>
> slob: split free list by size

[snip]

> And the results are fairly miraculous, so please double-check them on
> your setup. The resulting statistics change to this:

[snip]

> So the average jumped by 714k from before the patch, became much more
> stable, and beat SLUB by 287k. There are also 7 perfectly filled pages
> now, up from 1 before. And we can't get a whole lot better than this:
> we're using 259 pages for 244 pages of actual data, so our total
> overhead is only 6%! For comparison, SLUB's using about 70 pages more
> for the same data, so its total overhead appears to be about 35%.

Unfortunately I only see a slight improvement to SLOB (but it still gets
beaten by SLUB):

[ the minimum, maximum, and average are captured from 10 individual runs ]

                              Free (kB)                Used (kB)
                   Total (kB)  min    max    average    min   max   average
SLUB (no debug)     26536     23868  23892  23877.6    2644  2668  2658.4
SLOB (patched)      26548     23456  23708  23603.2    2840  3092  2944.8
SLOB (vanilla)      26548     23472  23640  23579.6    2908  3076  2968.4
SLAB (no debug)     26544     23316  23364  23343.2    3180  3228  3200.8
SLUB (with debug)   26484     23120  23136  23127.2    3348  3364  3356.8

What .config did you use? What kind of user-space do you have? (I am still
using the exact same configuration I described in the first mail.)

			Pekka
[RFC PATCH] greatly reduce SLOB external fragmentation
On Mon, 2008-01-07 at 20:06 +0200, Pekka J Enberg wrote:
> Hi Matt,
>
> On Sun, 6 Jan 2008, Matt Mackall wrote:
> > I don't have any particular "terrible" workloads for SLUB. But my
> > attempts to simply boot with all three allocators to init=/bin/bash in,
> > say, lguest show a fair margin for SLOB.
>
> Sorry, I once again have bad news ;-). I did some testing with
>
>   lguest --block=rootfile 32 /boot/vmlinuz-2.6.24-rc6 root=/dev/vda init=doit
>
> where rootfile is
>
>   http://uml.nagafix.co.uk/BusyBox-1.5.0/BusyBox-1.5.0-x86-root_fs.bz2
>
> and the "doit" script in the guest passed as init= is just
>
>   #!/bin/sh
>   mount -t proc proc /proc
>   cat /proc/meminfo | grep MemTotal
>   cat /proc/meminfo | grep MemFree
>   cat /proc/meminfo | grep Slab
>
> and the results are:
>
> [ the minimum, maximum, and average are captured from 10 individual runs ]
>
>                               Free (kB)                Used (kB)
>                    Total (kB)  min    max    average    min   max   average
> SLUB (no debug)     26536     23868  23892  23877.6    2644  2668  2658.4
> SLOB                26548     23472  23640  23579.6    2908  3076  2968.4
> SLAB (no debug)     26544     23316  23364  23343.2    3180  3228  3200.8
> SLUB (with debug)   26484     23120  23136  23127.2    3348  3364  3356.8
>
> So it seems that on average SLUB uses 300 kilobytes *less memory* (!)
> (which is roughly 1% of total memory available) after boot than SLOB for
> my configuration.
>
> One possible explanation is that the high internal fragmentation (space
> allocated but not used) of SLUB kmalloc() only affects short-lived
> allocations and thus does not show up in the more permanent memory
> footprint. Likewise, it could be that SLOB has higher external
> fragmentation (small blocks that are unavailable for allocation) from
> which SLUB does not suffer. Dunno, haven't investigated as my results
> are contradictory to yours.

Yep, you were definitely onto something here.
Here's what I got with 10 runs of SLUB on my setup:

MemFree: 55780 kB
MemFree: 55780 kB
MemFree: 55780 kB
MemFree: 55784 kB
MemFree: 55788 kB
MemFree: 55788 kB
MemFree: 55788 kB
MemFree: 55792 kB
MemFree: 55796 kB
MemFree: 55800 kB
Avg: 55787.6

And with SLOB + my defrag fix:

MemFree: 55236 kB
MemFree: 55284 kB
MemFree: 55292 kB
MemFree: 55304 kB
MemFree: 55384 kB
MemFree: 55388 kB
MemFree: 55412 kB
MemFree: 55420 kB
MemFree: 55436 kB
MemFree: 55444 kB
Avg: 55360.0

Ouch! So I added a bunch of statistics gathering:

counted pages 409 unused 185242 biggest 1005 fragments 1416
slob pages 410 allocs 22528 frees 12109 active 10419 allocated 932647
 page scans 11249 block scans 40650
kmallocs 10247 active 5450 allocated 3484680 overhead 48836
bigpages 827 active 17
total 427 used 245

The first line tells us about SLOB's free list, which has 409 pages, 185k
unused, spread into 1416 fragments. The average fragment is 130 bytes.

The next tells us that we've got 410 total SLOB pages (1 is fully
allocated), we've done 22k allocs, 12k frees, have 10k allocations active,
and 932k total memory allocated (including kmallocs). That means our
average SLOB allocation is ~90 bytes.

The kmallocs line tells us we've done 10k allocs, and 5k of them are not
yet freed. Since boot, we requested 3.48MiB of kmalloc (without padding)
and added on 49k of padding. Thus the average kmalloc is 340 bytes and has
4.77 bytes of padding (1.4% overhead, quite good!).

Slab and kmalloc objects >= 4k are handed straight to the page allocator
(same as SLUB), of which there are 17 active pages.

So in total, SLOB is using 427 pages for what optimally could fit in 245
pages. In other words, external fragmentation is horrible.
I kicked this around for a while, slept on it, and then came up with this
little hack first thing this morning:

slob: split free list by size

diff -r 6901ca355181 mm/slob.c
--- a/mm/slob.c	Tue Jan 08 21:01:15 2008 -0600
+++ b/mm/slob.c	Wed Jan 09 12:31:59 2008 -0600
@@ -112,7 +112,9 @@ static inline void free_slob_page(struct
 /*
  * All (partially) free slob pages go on this list.
  */
-static LIST_HEAD(free_slob_pages);
+#define SLOB_BREAK_POINT 300
+static LIST_HEAD(free_slob_pages_big);
+static LIST_HEAD(free_slob_pages_small);
 
 /*
  * slob_page: True for all slob pages (false for bigblock pages)
@@ -140,9 +142,9 @@ static inline int slob_page_free(struct
 	return test_bit(PG_private, &sp->flags);
 }
 
-static inline void set_slob_page_free(struct slob_page *sp)
+static inline void set_slob_page_free(struct slob_page *sp, struct list_head *list)
 {
-	list_add(&sp->list, &free_slob_pages);
+	list_add(&sp->list, list);
 	__set_bit(PG_private, &sp->flags);
 }
 
@@ -294,12 +296,18 @@ static void *slob_alloc(size_t size, gfp
 {
 	struct slob_page *sp;