Re: [dm-devel] [PATCH RESEND] slab: introduce the flag SLAB_MINIMIZE_WASTE

2018-04-27 Thread Christopher Lameter
On Thu, 26 Apr 2018, Mikulas Patocka wrote: > > Hmmm... order 4 for these caches may cause some concern. These should stay > > under costly order I think. Otherwise allocations are no longer > > guaranteed. > > You said that slub has fallback to smaller order allocations. Yes it does... > The

Re: [dm-devel] [PATCH RESEND] slab: introduce the flag SLAB_MINIMIZE_WASTE

2018-04-26 Thread Christopher Lameter
On Wed, 25 Apr 2018, Mikulas Patocka wrote: > > > > Could you move that logic into slab_order()? It does something awfully > > similar. > > But slab_order (and its caller) limits the order to "max_order" and we > want more. > > Perhaps slab_order should be dropped and calculate_order totally >

Re: [dm-devel] [PATCH RESEND] slab: introduce the flag SLAB_MINIMIZE_WASTE

2018-04-26 Thread Christopher Lameter
On Wed, 25 Apr 2018, Mikulas Patocka wrote: > Do you want this? It deletes slab_order and replaces it with the > "minimize_waste" logic directly. Well yes that looks better. Now we need to make it easy to read and less complicated. Maybe try to keep as much as possible of the old code and also

Re: [dm-devel] [PATCH] SLUB: Do not fallback to mininum order if __GFP_NORETRY is set

2018-04-18 Thread Christopher Lameter
On Wed, 18 Apr 2018, Mikulas Patocka wrote: > No, this would hit NULL pointer dereference if page is NULL and > __GFP_NORETRY is set. You want this: You are right Acked-by: Christoph Lameter -- dm-devel mailing list dm-devel@redhat.com

[dm-devel] [PATCH] SLUB: Do not fallback to mininum order if __GFP_NORETRY is set

2018-04-18 Thread Christopher Lameter
Mikulas Patocka wants to ensure that no fallback to lower order happens. I think __GFP_NORETRY should work correctly in that case too and not fall back. Allocating at a smaller order is a retry operation and should not be attempted. If the caller does not want retries then respect that.

Re: [dm-devel] [PATCH RESEND] slab: introduce the flag SLAB_MINIMIZE_WASTE

2018-04-18 Thread Christopher Lameter
On Tue, 17 Apr 2018, Mikulas Patocka wrote: > I can make a slub-only patch with no extra flag (on a freshly booted > system it increases only the order of caches "TCPv6" and "sighand_cache" > by one - so it should not have unexpected effects): > > Doing a generic solution for slab would be more

Re: [dm-devel] slab: introduce the flag SLAB_MINIMIZE_WASTE

2018-04-17 Thread Christopher Lameter
On Mon, 16 Apr 2018, Vlastimil Babka wrote: > >> Its not a senseless increase. The more objects you fit into a slab page > >> the higher the performance of the allocator. > > It's not universally without a cost. It might increase internal > fragmentation of the slabs, if you end up with lots of

Re: [dm-devel] slab: introduce the flag SLAB_MINIMIZE_WASTE

2018-04-17 Thread Christopher Lameter
On Mon, 16 Apr 2018, Mikulas Patocka wrote: > dm-bufio deals gracefully with allocation failure, because it preallocates > some buffers with vmalloc, but other subsystems may not deal with it and > they could return ENOMEM randomly or misbehave in other ways. So, the > "SLAB_MINIMIZE_WASTE" flag

Re: [dm-devel] [PATCH RESEND] slab: introduce the flag SLAB_MINIMIZE_WASTE

2018-04-17 Thread Christopher Lameter
On Mon, 16 Apr 2018, Mikulas Patocka wrote: > This patch introduces a flag SLAB_MINIMIZE_WASTE for slab and slub. This > flag causes allocation of larger slab caches in order to minimize wasted > space. > > This is needed because we want to use dm-bufio for deduplication index and > there are

Re: [dm-devel] [PATCH RESEND] slab: introduce the flag SLAB_MINIMIZE_WASTE

2018-04-17 Thread Christopher Lameter
On Tue, 17 Apr 2018, Vlastimil Babka wrote: > On 04/17/2018 04:45 PM, Christopher Lameter wrote: > > But then higher order allocs are generally seen as problematic. > > I think in this case they are better than wasting/fragmenting 384kB for > 640kB object. Well typically
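
The 384kB figure quoted above follows from power-of-two rounding: a 640kB object that is rounded up to the next power of two occupies 1MB. A minimal Python sketch of that arithmetic, assuming nothing beyond the numbers in the message (the function name is illustrative and is not a kernel API):

```python
KIB = 1024


def next_power_of_two(n):
    """Return the smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


object_size = 640 * KIB
allocated = next_power_of_two(object_size)  # rounded-up allocation size
wasted = allocated - object_size            # bytes lost to rounding

print("allocated KiB:", allocated // KIB)
print("wasted KiB:", wasted // KIB)
```

A size that is already a power of two, such as 4096, is returned unchanged, which matches the point made elsewhere in the thread that power-of-two caches waste nothing to rounding.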

Re: [dm-devel] slab: introduce the flag SLAB_MINIMIZE_WASTE

2018-04-16 Thread Christopher Lameter
On Mon, 16 Apr 2018, Mikulas Patocka wrote: > > > > Or an increase in slab_max_order > > But that will increase it for all slabs (often senselessly - i.e. > kmalloc-4096 would have order 4MB). 4MB? Nope That is a power of two slab so no wasted space even with order 0. Its not a senseless

Re: [dm-devel] slab: introduce the flag SLAB_MINIMIZE_WASTE

2018-04-16 Thread Christopher Lameter
On Mon, 16 Apr 2018, Mikulas Patocka wrote: > > Please clarify further, thanks! > > Mike > > Yes, using a slab cache currently doesn't avoid this rounding (it needs the > SLAB_MINIMIZE_WASTE patch to do that). Or an increase in slab_max_order

Re: [dm-devel] [PATCH] SLUB: Do not fallback to mininum order if __GFP_NORETRY is set

2018-04-20 Thread Christopher Lameter
On Thu, 19 Apr 2018, Michal Hocko wrote: > Overriding __GFP_NORETRY is just a bad idea. It will make the semantic > of the flag just more confusing. Note there are users who use > __GFP_NORETRY as a way to suppress heavy memory pressure and/or the OOM > killer. You do not want to change the

Re: [dm-devel] [PATCH] SLUB: Do not fallback to mininum order if __GFP_NORETRY is set

2018-04-23 Thread Christopher Lameter
On Sat, 21 Apr 2018, Vlastimil Babka wrote: > > The problem is that SLUB does not honor GFP_NORETRY. The semantics of > > GFP_NORETRY are not followed. > > The caller might want SLUB to try hard to get that high-order page that > will minimize memory waste (e.g. 2MB page for 3 640k objects), and
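
The "2MB page for 3 640k objects" case can be checked with a short sketch. It assumes 4kB base pages and ignores any per-slab metadata, so it only approximates what SLUB would compute; the helper name `slab_fit` is hypothetical.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB base pages
KIB = 1024


def slab_fit(object_size, order):
    """Return (objects per slab, wasted bytes) for a slab of 2**order pages."""
    slab_bytes = PAGE_SIZE << order
    objects = slab_bytes // object_size
    waste = slab_bytes - objects * object_size
    return objects, waste


# Compare a 1 MiB slab (order 8) with a 2 MiB slab (order 9).
for order in (8, 9):
    objects, waste = slab_fit(640 * KIB, order)
    print("order", order, "objects", objects, "waste KiB", waste // KIB)
```

Under these assumptions the order-9 slab holds three objects and wastes 128kB in total, while the order-8 slab holds one object and wastes 384kB.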

Re: [dm-devel] slab: introduce the flag SLAB_MINIMIZE_WASTE

2018-04-17 Thread Christopher Lameter
On Mon, 16 Apr 2018, Mikulas Patocka wrote: > If you boot with slub_max_order=10, the kmalloc-8192 cache has 64 pages. > So yes, it increases the order of all slab caches (although not up to > 4MB). Hmmm... Ok. There is another setting slub_min_objects that controls how many objects to fit into

Re: [dm-devel] slab: introduce the flag SLAB_MINIMIZE_WASTE

2018-04-17 Thread Christopher Lameter
On Tue, 17 Apr 2018, Mikulas Patocka wrote: > > > The slub subsystem does actual fallback to low-order when the allocation > > > fails (it allows different order for each slab in the same cache), but > > > slab doesn't fallback and you get NULL if higher-order allocation fails. > > > So,

Re: [dm-devel] [PATCH] slab: introduce the flag SLAB_MINIMIZE_WASTE

2018-03-20 Thread Christopher Lameter
On Tue, 20 Mar 2018, Mikulas Patocka wrote: > > Maybe do the same thing for SLAB? > > Yes, but I need to change it for a specific cache, not for all caches. Why only some caches? > When the order is greater than 3 (PAGE_ALLOC_COSTLY_ORDER), the allocation > becomes unreliable, thus it is a bad

Re: [dm-devel] [PATCH] slab: introduce the flag SLAB_MINIMIZE_WASTE

2018-03-21 Thread Christopher Lameter
On Wed, 21 Mar 2018, Matthew Wilcox wrote: > > Have a look at include/linux/mempool.h. > > That's not what mempool is for. mempool is a cache of elements that were > allocated from slab in the first place. (OK, technically, you don't have > to use slab as the allocator, but since there is no

Re: [dm-devel] [PATCH] slab: introduce the flag SLAB_MINIMIZE_WASTE

2018-03-21 Thread Christopher Lameter
On Wed, 21 Mar 2018, Mikulas Patocka wrote: > For example, if someone creates a slab cache with the flag SLAB_CACHE_DMA, > and he allocates an object from this cache and this allocation races with > the user writing to /sys/kernel/slab/cache/order - then the allocator can > for a small period of

Re: [dm-devel] [PATCH] slab: introduce the flag SLAB_MINIMIZE_WASTE

2018-03-21 Thread Christopher Lameter
One other thought: If you want to improve the behavior for large scale objects allocated through kmalloc/kmemcache then we would certainly be glad to entertain those ideas. F.e. you could optimize the allocations > 2x PAGE_SIZE so that they do not allocate powers of two pages. It would be

Re: [dm-devel] [PATCH] slab: introduce the flag SLAB_MINIMIZE_WASTE

2018-03-21 Thread Christopher Lameter
On Wed, 21 Mar 2018, Matthew Wilcox wrote: > I don't know if that's a good idea. That will contribute to fragmentation > if the allocation is held onto for a short-to-medium length of time. > If the allocation is for a very long period of time then those pages > would have been unavailable

Re: [dm-devel] [PATCH] slab: introduce the flag SLAB_MINIMIZE_WASTE

2018-03-21 Thread Christopher Lameter
On Wed, 21 Mar 2018, Mikulas Patocka wrote: > > You should not be using the slab allocators for these. Allocate higher > > order pages or numbers of consecutive smaller pages from the page > > allocator. The slab allocators are written for objects smaller than page > > size. > > So, do you argue

Re: [dm-devel] [PATCH] slab: introduce the flag SLAB_MINIMIZE_WASTE

2018-03-21 Thread Christopher Lameter
On Wed, 21 Mar 2018, Mikulas Patocka wrote: > > > F.e. you could optimize the allocations > 2x PAGE_SIZE so that they do not > > > allocate powers of two pages. It would be relatively easy to make > > > kmalloc_large round the allocation to the next page size and then allocate > > > N consecutive

Re: [dm-devel] [PATCH] slab: introduce the flag SLAB_MINIMIZE_WASTE

2018-03-21 Thread Christopher Lameter
On Wed, 21 Mar 2018, Mikulas Patocka wrote: > So, what would you recommend for allocating 640KB objects while minimizing > wasted space? > * alloc_pages - rounds up to the next power of two > * kmalloc - rounds up to the next power of two > * alloc_pages_exact - O(n*log n) complexity; and causes

Re: [dm-devel] [PATCH] slab: introduce the flag SLAB_MINIMIZE_WASTE

2018-03-23 Thread Christopher Lameter
On Wed, 21 Mar 2018, Mikulas Patocka wrote: > > + s->allocflags = allocflags; > > I'd also use "WRITE_ONCE(s->allocflags, allocflags)" here and when writing > s->oo and s->min to avoid some possible compiler misoptimizations. It only matters that 0 etc is never written. > Another problem is

Re: [dm-devel] [PATCH] slab: introduce the flag SLAB_MINIMIZE_WASTE

2018-03-20 Thread Christopher Lameter
On Tue, 20 Mar 2018, Matthew Wilcox wrote: > On Tue, Mar 20, 2018 at 01:25:09PM -0400, Mikulas Patocka wrote: > > The reason why we need this is that we are going to merge code that does > > block device deduplication (it was developed separately and sold as a > > commercial product), and the

Re: [dm-devel] [PATCH] slab: introduce the flag SLAB_MINIMIZE_WASTE

2018-03-21 Thread Christopher Lameter
On Tue, 20 Mar 2018, Mikulas Patocka wrote: > > > Another problem with slub_max_order is that it would pad all caches to > > > slub_max_order, even those that already have a power-of-two size (in that > > > case, the padding is counterproductive). > > > > No it does not. Slub will calculate the

Re: [dm-devel] [PATCH] slab: introduce the flag SLAB_MINIMIZE_WASTE

2018-03-23 Thread Christopher Lameter
On Fri, 23 Mar 2018, Mikulas Patocka wrote: > This test isn't locked against anything, so it may race with concurrent > allocation. "any_slab_objects" may return false and a new object in the > slab cache may appear immediately after that. Ok the same reasoning applies to numerous other slab