Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-28 Thread Nick Piggin
On Thursday 20 September 2007 11:38, David Chinner wrote: On Wed, Sep 19, 2007 at 04:04:30PM +0200, Andrea Arcangeli wrote: Plus of course you don't like fsblock because it requires work to adapt a fs to it, I can't argue about that. No, I don't like fsblock because it is inherently a

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-24 Thread Kyle Moffett
On Sep 23, 2007, at 02:22:12, Goswin von Brederlow wrote: [EMAIL PROTECTED] (Mel Gorman) writes: On (16/09/07 23:58), Goswin von Brederlow didst pronounce: But when you already have say 10% of the ram in mixed groups then it is a sign the external fragmentation happens and some time should

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-24 Thread Andrea Arcangeli
On Sun, Sep 23, 2007 at 08:56:39AM +0200, Goswin von Brederlow wrote: As a user I know it because I didn't put a kernel source into /tmp. A program can't reasonably know that. Various apps require you (admin/user) to tune the size of their caches. Seems like you never tried to set up a

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-24 Thread Christoph Lameter
On Fri, 21 Sep 2007, Hugh Dickins wrote: I've found some fixes needed on top of your Large Blocksize Support patches: I'll send those to you in a moment. Looks like you didn't try much swapping! yup. Thanks for looking at it. I only managed to get ext2 working with larger blocksizes:

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-23 Thread Goswin von Brederlow
[EMAIL PROTECTED] (Mel Gorman) writes: On (17/09/07 00:38), Goswin von Brederlow didst pronounce: [EMAIL PROTECTED] (Mel Gorman) writes: On (15/09/07 02:31), Goswin von Brederlow didst pronounce: Mel Gorman [EMAIL PROTECTED] writes: Looking at my little test program evicting movable

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-23 Thread Goswin von Brederlow
Andrea Arcangeli [EMAIL PROTECTED] writes: On Mon, Sep 17, 2007 at 12:56:07AM +0200, Goswin von Brederlow wrote: When has free ever given any useful free number? I can perfectly fine allocate another gigabyte of memory despite free saying 25MB. But that is because I know that the

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-22 Thread Goswin von Brederlow
[EMAIL PROTECTED] (Mel Gorman) writes: On (16/09/07 23:31), Andrea Arcangeli didst pronounce: On Sun, Sep 16, 2007 at 09:54:18PM +0100, Mel Gorman wrote: Allocating ptes from slab is fairly simple but I think it would be better to allocate ptes in PAGE_SIZE (64k) chunks and preallocate the

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-21 Thread Hugh Dickins
On Thu, 20 Sep 2007, Christoph Lameter wrote: On Thu, 20 Sep 2007, David Chinner wrote: Disagree, the mmap side is not a little change. That's not in the filesystem, though. ;) And it's really only a minimal change for some function to loop over all 4k pages and elsewhere index the

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-20 Thread Andrea Arcangeli
On Thu, Sep 20, 2007 at 11:38:21AM +1000, David Chinner wrote: Sure, and that's what I meant when I said VPC + large pages was a means to the end, not the only solution to the problem. The whole point is that it's not an end, it's an end to your own fs centric view only (which is sure fair

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-19 Thread Alex Tomas
On 9/19/07, David Chinner [EMAIL PROTECTED] wrote: The problem is this: to alter the fundamental block size of the filesystem we also need to alter the data block size and that is exactly the piece that linux does not support right now. So while we have the capability to use large block sizes

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-19 Thread Nick Piggin
On Wednesday 19 September 2007 04:30, Linus Torvalds wrote: On Tue, 18 Sep 2007, Nick Piggin wrote: ROFL! Yeah of course, how could I have forgotten about our trusty OOM killer as the solution to the fragmentation problem? It would only have been funnier if you had said to reboot every so

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-19 Thread Andrea Arcangeli
On Wed, Sep 19, 2007 at 03:09:10PM +1000, David Chinner wrote: Ok, let's step back for a moment and look at a basic, fundamental constraint of disks - seek capacity. A decade ago, a terabyte of filesystem had 30 disks behind it - a seek capacity of about 6000 seeks/s. Nowadays, that's a single

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-19 Thread David Chinner
On Wed, Sep 19, 2007 at 04:04:30PM +0200, Andrea Arcangeli wrote: On Wed, Sep 19, 2007 at 03:09:10PM +1000, David Chinner wrote: Ok, let's step back for a moment and look at a basic, fundamental constraint of disks - seek capacity. A decade ago, a terabyte of filesystem had 30 disks behind

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-18 Thread Mel Gorman
On (17/09/07 15:00), Christoph Lameter didst pronounce: On Sun, 16 Sep 2007, Nick Piggin wrote: I don't know how it would prevent fragmentation from building up anyway. It's commonly the case that potentially unmovable objects are allowed to fill up all of ram (dentries, inodes, etc).

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-18 Thread Jörn Engel
On Tue, 18 September 2007 11:00:40 +0100, Mel Gorman wrote: We still lack data on what sort of workloads really benefit from large blocks Compressing filesystems like jffs2 and logfs gain better compression ratio with larger blocks. Going from 4KiB to 64KiB gave somewhere around 10% benefit

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-18 Thread David Chinner
On Tue, Sep 18, 2007 at 11:00:40AM +0100, Mel Gorman wrote: We still lack data on what sort of workloads really benefit from large blocks (assuming there are any that cannot also be solved by improving order-0). No we don't. All workloads benefit from larger block sizes when you've got a btree

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-18 Thread Nick Piggin
On Tuesday 18 September 2007 08:00, Christoph Lameter wrote: On Sun, 16 Sep 2007, Nick Piggin wrote: I don't know how it would prevent fragmentation from building up anyway. It's commonly the case that potentially unmovable objects are allowed to fill up all of ram (dentries, inodes, etc).

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-18 Thread Nick Piggin
On Tuesday 18 September 2007 08:21, Christoph Lameter wrote: On Sun, 16 Sep 2007, Nick Piggin wrote: So if you argue that vmap is a downside, then please tell me how you consider the -ENOMEM of your approach to be better? That is again pretty undifferentiated. Are we talking about

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-18 Thread Nick Piggin
On Tuesday 18 September 2007 08:05, Christoph Lameter wrote: On Sun, 16 Sep 2007, Nick Piggin wrote: fsblock doesn't need any of those hacks, of course. Nor does mine for the low orders that we are considering. For order MAX_ORDER this is unavoidable since the page allocator cannot

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-18 Thread Linus Torvalds
On Tue, 18 Sep 2007, Nick Piggin wrote: ROFL! Yeah of course, how could I have forgotten about our trusty OOM killer as the solution to the fragmentation problem? It would only have been funnier if you had said to reboot every so often when memory gets fragmented :) Can we please stop this

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-18 Thread Andrea Arcangeli
On Tue, Sep 18, 2007 at 11:30:17AM -0700, Linus Torvalds wrote: The fact is, *none* of those things are true. The VM doesn't guarantee anything, and is already very much about statistics in many places. You Many? I can't recall anything besides PF_MEMALLOC and the decision that the VM is oom.

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-18 Thread Andrea Arcangeli
On Mon, Sep 17, 2007 at 12:56:07AM +0200, Goswin von Brederlow wrote: When has free ever given any useful free number? I can perfectly fine allocate another gigabyte of memory despite free saying 25MB. But that is because I know that the buffer/cached are not locked in. Well, as you said you

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-18 Thread Linus Torvalds
On Tue, 18 Sep 2007, Andrea Arcangeli wrote: Many? I can't recall anything besides PF_MEMALLOC and the decision that the VM is oom. *All* of the buddy bitmaps, *all* of the GPF_ATOMIC, *all* of the zone watermarks, everything that we depend on every single day, is in the end just about

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-18 Thread Christoph Lameter
On Tue, 18 Sep 2007, Nick Piggin wrote: On Tuesday 18 September 2007 08:00, Christoph Lameter wrote: On Sun, 16 Sep 2007, Nick Piggin wrote: I don't know how it would prevent fragmentation from building up anyway. It's commonly the case that potentially unmovable objects are allowed

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-18 Thread Christoph Lameter
On Tue, 18 Sep 2007, Nick Piggin wrote: We can avoid all doubt in this patchset as well by adding support for fallback to a vmalloced compound page. How would you do a vmapped fallback in your patchset? How would you keep track of pages 2..N if they don't exist in the radix tree? Through

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-18 Thread Linus Torvalds
On Wed, 19 Sep 2007, Nathan Scott wrote: FWIW (and I hate to let reality get in the way of a good conspiracy) - all SGI systems have always defaulted to using 4K blocksize filesystems; Yes. And I've been told that: there's very few customers who would use larger .. who apparently would

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-18 Thread Nathan Scott
On Tue, 2007-09-18 at 12:44 -0700, Linus Torvalds wrote: This is not about performance. Never has been. It's about SGI wanting a way out of their current 16kB mess. Pass the crack pipe, Linus? The way to fix performance is to move to x86-64, and use 4kB pages and be happy. However, the SGI

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-18 Thread Nathan Scott
On Tue, 2007-09-18 at 18:06 -0700, Linus Torvalds wrote: There is *no* valid reason for 16kB blocksizes unless you have legacy issues. That's not correct. The performance issues have nothing to do with the block-size, and We must be thinking of different performance issues. should be

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-18 Thread Linus Torvalds
On Wed, 19 Sep 2007, Rene Herman wrote: Well, not so sure about that. What if one of your expected uses for example is video data storage -- lots of data, especially for multiple streams, and needs still relatively fast machinery. Why would you care for the overhead of _small_ blocks? ..

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-18 Thread Rene Herman
On 09/18/2007 09:44 PM, Linus Torvalds wrote: Nobody sane would *ever* argue for 16kB+ blocksizes in general. Well, not so sure about that. What if one of your expected uses for example is video data storage -- lots of data, especially for multiple streams, and needs still relatively fast

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-18 Thread Rene Herman
On 09/19/2007 05:50 AM, Linus Torvalds wrote: On Wed, 19 Sep 2007, Rene Herman wrote: Well, not so sure about that. What if one of your expected uses for example is video data storage -- lots of data, especially for multiple streams, and needs still relatively fast machinery. Why would you

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-18 Thread Linus Torvalds
On Wed, 19 Sep 2007, Rene Herman wrote: I do feel larger blocksizes continue to make sense in general though. Packet writing on CD/DVD is a problem already today since the hardware needs 32K or 64K blocks and I'd expect to see more of these and similar situations when flash gets (even)

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-18 Thread Rene Herman
On 09/19/2007 06:33 AM, Linus Torvalds wrote: On Wed, 19 Sep 2007, Rene Herman wrote: I do feel larger blocksizes continue to make sense in general though. Packet writing on CD/DVD is a problem already today since the hardware needs 32K or 64K blocks and I'd expect to see more of these and

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-18 Thread David Chinner
On Tue, Sep 18, 2007 at 06:06:52PM -0700, Linus Torvalds wrote: especially as the Linux kernel limitations in this area are well known. There's no 16K mess that SGI is trying to clean up here (and SGI have offered both IA64 and x86_64 systems

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-17 Thread Mel Gorman
On (17/09/07 00:38), Goswin von Brederlow didst pronounce: [EMAIL PROTECTED] (Mel Gorman) writes: On (15/09/07 02:31), Goswin von Brederlow didst pronounce: Mel Gorman [EMAIL PROTECTED] writes: On Fri, 2007-09-14 at 18:10 +0200, Goswin von Brederlow wrote: Nick Piggin [EMAIL

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-17 Thread Mel Gorman
On (17/09/07 00:48), Goswin von Brederlow didst pronounce: [EMAIL PROTECTED] (Mel Gorman) writes: On (16/09/07 17:08), Andrea Arcangeli didst pronounce: zooming in I see red pixels all over the squares mixed with green pixels in the same square. This is exactly what happens with the

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-17 Thread Mel Gorman
On (16/09/07 23:58), Goswin von Brederlow didst pronounce: [EMAIL PROTECTED] (Mel Gorman) writes: On (15/09/07 14:14), Goswin von Brederlow didst pronounce: Andrew Morton [EMAIL PROTECTED] writes: On Tue, 11 Sep 2007 14:12:26 +0200 Jörn Engel [EMAIL PROTECTED] wrote: While I

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-17 Thread Mel Gorman
On (16/09/07 23:31), Andrea Arcangeli didst pronounce: On Sun, Sep 16, 2007 at 09:54:18PM +0100, Mel Gorman wrote: The 16MB is the size of a hugepage, the size of interest as far as I am concerned. Your idea makes sense for large block support, but much less for huge pages because you are

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-17 Thread Bernd Schmidt
Christoph Lameter wrote: True. That is why we want to limit the number of unmovable allocations and that is why ZONE_MOVABLE exists to limit those. However, unmovable allocations are already rare today. The overwhelming majority of allocations are movable and reclaimable. You can see that f.e.

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-17 Thread Nick Piggin
On Saturday 15 September 2007 03:52, Christoph Lameter wrote: On Fri, 14 Sep 2007, Nick Piggin wrote: [*] ok, this isn't quite true because if you can actually put a hard limit on unmovable allocations then anti-frag will fundamentally help -- get back to me on that when you get

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-17 Thread Nick Piggin
On Saturday 15 September 2007 04:08, Christoph Lameter wrote: On Fri, 14 Sep 2007, Nick Piggin wrote: However fsblock can do everything that higher order pagecache can do in terms of avoiding vmap and giving contiguous memory to block devices by opportunistically allocating higher orders of

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-17 Thread Nick Piggin
On Monday 17 September 2007 04:13, Mel Gorman wrote: On (15/09/07 14:14), Goswin von Brederlow didst pronounce: I keep coming back to the fact that movable objects should be moved out of the way for unmovable ones. Anything else just allows fragmentation to build up. This is easily

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-17 Thread Nick Piggin
On Monday 17 September 2007 14:07, David Chinner wrote: On Fri, Sep 14, 2007 at 06:48:55AM +1000, Nick Piggin wrote: OK, the vunmap batching code wipes your TLB flushing and IPIs off the table. Diffstat below, but the TLB portions are here (besides that _everything_ is probably lower due

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-17 Thread Christoph Lameter
On Sun, 16 Sep 2007, Nick Piggin wrote: I don't know how it would prevent fragmentation from building up anyway. It's commonly the case that potentially unmovable objects are allowed to fill up all of ram (dentries, inodes, etc). Not in 2.6.23 with ZONE_MOVABLE. Unmovable objects are not

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-17 Thread Christoph Lameter
On Sun, 16 Sep 2007, Jörn Engel wrote: I bet! My (false) assumption was the same as Goswin's. If non-movable pages are clearly separated from movable ones and will evict movable ones before polluting further mixed superpages, Nick's scenario would be nearly infinitely impossible.

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-17 Thread Christoph Lameter
On Sun, 16 Sep 2007, Nick Piggin wrote: fsblock doesn't need any of those hacks, of course. Nor does mine for the low orders that we are considering. For order MAX_ORDER this is unavoidable since the page allocator cannot manage such large pages. It can be used for lower order if

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-17 Thread Christoph Lameter
On Mon, 17 Sep 2007, Bernd Schmidt wrote: Christoph Lameter wrote: True. That is why we want to limit the number of unmovable allocations and that is why ZONE_MOVABLE exists to limit those. However, unmovable allocations are already rare today. The overwhelming majority of allocations

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-17 Thread Christoph Lameter
On Sun, 16 Sep 2007, Nick Piggin wrote: So if you argue that vmap is a downside, then please tell me how you consider the -ENOMEM of your approach to be better? That is again pretty undifferentiated. Are we talking about low page In general. There is no -ENOMEM approach. Lower order

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-16 Thread Goswin von Brederlow
Andrea Arcangeli [EMAIL PROTECTED] writes: On Sat, Sep 15, 2007 at 10:14:44PM +0200, Goswin von Brederlow wrote: - Userspace allocates a lot of memory in those slabs. If with slabs you mean slab/slub, I can't follow, there has never been a single byte of userland memory allocated there since

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-16 Thread Andrea Arcangeli
On Sun, Sep 16, 2007 at 03:54:56PM +0200, Goswin von Brederlow wrote: Andrea Arcangeli [EMAIL PROTECTED] writes: On Sat, Sep 15, 2007 at 10:14:44PM +0200, Goswin von Brederlow wrote: - Userspace allocates a lot of memory in those slabs. If with slabs you mean slab/slub, I can't follow,

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-16 Thread Jörn Engel
On Sun, 16 September 2007 00:30:32 +0200, Andrea Arcangeli wrote: Movable? I rather assume all slab allocations aren't movable. Then slab defrag can try to tackle on users like dcache and inodes. Keep in mind that with the exception of updatedb, those inodes/dentries will be pinned and you

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-16 Thread Jörn Engel
On Sat, 15 September 2007 01:44:49 -0700, Andrew Morton wrote: On Tue, 11 Sep 2007 14:12:26 +0200 Jörn Engel [EMAIL PROTECTED] wrote: While I agree with your concern, those numbers are quite silly. The chances of 99.8% of pages being free and the remaining 0.2% being perfectly spread

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-16 Thread Mel Gorman
On (15/09/07 14:14), Goswin von Brederlow didst pronounce: Andrew Morton [EMAIL PROTECTED] writes: On Tue, 11 Sep 2007 14:12:26 +0200 Jörn Engel [EMAIL PROTECTED] wrote: While I agree with your concern, those numbers are quite silly. The chances of 99.8% of pages being free and the

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-16 Thread Mel Gorman
On (15/09/07 17:51), Andrea Arcangeli didst pronounce: On Sat, Sep 15, 2007 at 02:14:42PM +0200, Goswin von Brederlow wrote: I keep coming back to the fact that movable objects should be moved out of the way for unmovable ones. Anything else just allows That's incidentally exactly what the

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-16 Thread Linus Torvalds
On Sun, 16 Sep 2007, Jörn Engel wrote: I have been toying with the idea of having separate caches for pinned and movable dentries. Downside of such a patch would be the number of memcpy() operations when moving dentries from one cache to the other. Totally inappropriate. I bet 99% of all

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-16 Thread Jörn Engel
On Sun, 16 September 2007 11:15:36 -0700, Linus Torvalds wrote: On Sun, 16 Sep 2007, Jörn Engel wrote: I have been toying with the idea of having separate caches for pinned and movable dentries. Downside of such a patch would be the number of memcpy() operations when moving dentries

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-16 Thread Linus Torvalds
On Sun, 16 Sep 2007, Jörn Engel wrote: My approach is to have one for mount points and ramfs/tmpfs/sysfs/etc. which are pinned for their entire lifetime and another for regular files/inodes. One could take a three-way approach and have always-pinned, often-pinned and rarely-pinned. We

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-16 Thread Andrea Arcangeli
On Sun, Sep 16, 2007 at 07:15:04PM +0100, Mel Gorman wrote: Except now as I've repeatedly pointed out, you have internal fragmentation problems. If we went with the SLAB, we would need 16MB slabs on PowerPC for example to get the same sort of results and a lot of copying and moving when Well

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-16 Thread Mel Gorman
On (16/09/07 20:50), Andrea Arcangeli didst pronounce: On Sun, Sep 16, 2007 at 07:15:04PM +0100, Mel Gorman wrote: Except now as I've repeatedly pointed out, you have internal fragmentation problems. If we went with the SLAB, we would need 16MB slabs on PowerPC for example to get the same

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-16 Thread Mel Gorman
On (16/09/07 17:08), Andrea Arcangeli didst pronounce: On Sun, Sep 16, 2007 at 03:54:56PM +0200, Goswin von Brederlow wrote: Andrea Arcangeli [EMAIL PROTECTED] writes: On Sat, Sep 15, 2007 at 10:14:44PM +0200, Goswin von Brederlow wrote: - Userspace allocates a lot of memory in those

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-16 Thread Mel Gorman
On (15/09/07 02:31), Goswin von Brederlow didst pronounce: Mel Gorman [EMAIL PROTECTED] writes: On Fri, 2007-09-14 at 18:10 +0200, Goswin von Brederlow wrote: Nick Piggin [EMAIL PROTECTED] writes: In my attack, I cause the kernel to allocate lots of unmovable allocations and

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-16 Thread Andrea Arcangeli
On Sun, Sep 16, 2007 at 09:54:18PM +0100, Mel Gorman wrote: The 16MB is the size of a hugepage, the size of interest as far as I am concerned. Your idea makes sense for large block support, but much less for huge pages because you are incurring a cost in the general case for something that may

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-16 Thread Mel Gorman
On (16/09/07 19:53), Jörn Engel didst pronounce: On Sat, 15 September 2007 01:44:49 -0700, Andrew Morton wrote: On Tue, 11 Sep 2007 14:12:26 +0200 Jörn Engel [EMAIL PROTECTED] wrote: While I agree with your concern, those numbers are quite silly. The chances of 99.8% of pages being

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-16 Thread Goswin von Brederlow
[EMAIL PROTECTED] (Mel Gorman) writes: On (15/09/07 14:14), Goswin von Brederlow didst pronounce: Andrew Morton [EMAIL PROTECTED] writes: On Tue, 11 Sep 2007 14:12:26 +0200 Jörn Engel [EMAIL PROTECTED] wrote: While I agree with your concern, those numbers are quite silly. The chances

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-16 Thread Goswin von Brederlow
Jörn Engel [EMAIL PROTECTED] writes: On Sun, 16 September 2007 00:30:32 +0200, Andrea Arcangeli wrote: Movable? I rather assume all slab allocations aren't movable. Then slab defrag can try to tackle on users like dcache and inodes. Keep in mind that with the exception of updatedb, those

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-16 Thread Goswin von Brederlow
[EMAIL PROTECTED] (Mel Gorman) writes: On (15/09/07 02:31), Goswin von Brederlow didst pronounce: Mel Gorman [EMAIL PROTECTED] writes: On Fri, 2007-09-14 at 18:10 +0200, Goswin von Brederlow wrote: Nick Piggin [EMAIL PROTECTED] writes: In my attack, I cause the kernel to allocate

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-16 Thread Jörn Engel
On Mon, 17 September 2007 00:06:24 +0200, Goswin von Brederlow wrote: How probable is it that the dentry is needed again? If you copy it and it is not needed then you wasted time. If you throw it out and it is needed then you wasted time too. Depending on the probability one of the two is

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-16 Thread Goswin von Brederlow
Linus Torvalds [EMAIL PROTECTED] writes: On Sun, 16 Sep 2007, Jörn Engel wrote: My approach is to have one for mount points and ramfs/tmpfs/sysfs/etc. which are pinned for their entire lifetime and another for regular files/inodes. One could take a three-way approach and have

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-16 Thread Goswin von Brederlow
[EMAIL PROTECTED] (Mel Gorman) writes: On (16/09/07 17:08), Andrea Arcangeli didst pronounce: zooming in I see red pixels all over the squares mixed with green pixels in the same square. This is exactly what happens with the variable order page cache and that's why it provides zero guarantees

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-16 Thread Goswin von Brederlow
Andrea Arcangeli [EMAIL PROTECTED] writes: You ignore one other bit, when /usr/bin/free says 1G is free, with config-page-shift it's free no matter what, same goes for not mlocked cache. With variable order page cache, /usr/bin/free becomes mostly a lie as long as there's no 4k fallback (like

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-16 Thread David Chinner
On Fri, Sep 14, 2007 at 06:48:55AM +1000, Nick Piggin wrote: On Thursday 13 September 2007 12:01, Nick Piggin wrote: On Thursday 13 September 2007 23:03, David Chinner wrote: Then just do operations on directories with lots of files in them (tens of thousands). Every directory operation

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-15 Thread Andrew Morton
On Tue, 11 Sep 2007 14:12:26 +0200 Jörn Engel [EMAIL PROTECTED] wrote: While I agree with your concern, those numbers are quite silly. The chances of 99.8% of pages being free and the remaining 0.2% being perfectly spread across all 2MB large_pages are lower than those of SHA1 creating a

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-15 Thread Goswin von Brederlow
Andrew Morton [EMAIL PROTECTED] writes: On Tue, 11 Sep 2007 14:12:26 +0200 Jörn Engel [EMAIL PROTECTED] wrote: While I agree with your concern, those numbers are quite silly. The chances of 99.8% of pages being free and the remaining 0.2% being perfectly spread across all 2MB large_pages

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-15 Thread Andrea Arcangeli
On Sat, Sep 15, 2007 at 02:14:42PM +0200, Goswin von Brederlow wrote: I keep coming back to the fact that movable objects should be moved out of the way for unmovable ones. Anything else just allows That's incidentally exactly what the slab does, no need to reinvent the wheel for that, it's an

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-15 Thread Goswin von Brederlow
Andrea Arcangeli [EMAIL PROTECTED] writes: On Sat, Sep 15, 2007 at 02:14:42PM +0200, Goswin von Brederlow wrote: I keep coming back to the fact that movable objects should be moved out of the way for unmovable ones. Anything else just allows That's incidentally exactly what the slab does, no

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-15 Thread Andrea Arcangeli
On Sat, Sep 15, 2007 at 10:14:44PM +0200, Goswin von Brederlow wrote: How does that help? Will slabs move objects around to combine two 1. It helps providing a few guarantees: when you run /usr/bin/free you won't get a random number, but a strong _guarantee_. That ram will be available no matter

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-14 Thread Nick Piggin
On Thursday 13 September 2007 09:06, Christoph Lameter wrote: On Wed, 12 Sep 2007, Nick Piggin wrote: So lumpy reclaim does not change my formula nor significantly help against a fragmentation attack. AFAIKS. Lumpy reclaim improves the situation significantly because the overwhelming

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-14 Thread Nick Piggin
On Thursday 13 September 2007 12:01, Nick Piggin wrote: On Thursday 13 September 2007 23:03, David Chinner wrote: Then just do operations on directories with lots of files in them (tens of thousands). Every directory operation will require at least one vmap in this situation - e.g. a

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-14 Thread Nick Piggin
On Thursday 13 September 2007 09:17, Christoph Lameter wrote: On Wed, 12 Sep 2007, Nick Piggin wrote: I will still argue that my approach is the better technical solution for large block support than yours, I don't think we made progress on that. And I'm quite sure we agreed at the VM

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-14 Thread Goswin von Brederlow
Hi, Nick Piggin [EMAIL PROTECTED] writes: In my attack, I cause the kernel to allocate lots of unmovable allocations and deplete movable groups. I theoretically then only need to keep a small number (1/2^N) of these allocations around in order to DoS a page allocation of order N. I'm
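
A rough illustration of the 1/2^N figure quoted above, written as a Python sketch and not taken from the thread. It assumes a buddy-style allocator in which an order-N request needs a naturally aligned, fully free run of 2^N pages; the helper names `worst_case_pins` and `can_allocate` are hypothetical.

```python
def worst_case_pins(total_pages, order):
    """Pin exactly one page in every naturally aligned block of 2**order pages.

    Assumption for illustration: the attacker can choose which pages
    stay allocated, which is the worst case described in the snippet.
    """
    block = 1 << order
    return set(range(0, total_pages, block))


def can_allocate(pinned, total_pages, order):
    """Return True if some aligned block of 2**order pages holds no pinned page."""
    block = 1 << order
    for start in range(0, total_pages - block + 1, block):
        if not any(page in pinned for page in range(start, start + block)):
            return True
    return False


TOTAL_PAGES = 1024
ORDER = 4

pins = worst_case_pins(TOTAL_PAGES, ORDER)
fraction_pinned = len(pins) / TOTAL_PAGES      # 1 / 2**ORDER of memory
blocked = not can_allocate(pins, TOTAL_PAGES, ORDER)
lower_order_ok = can_allocate(pins, TOTAL_PAGES, ORDER - 1)
```

In this toy model, holding 64 of 1024 pages (1/16 of memory) blocks every order-4 allocation, while order-3 requests can still be satisfied from the unpinned half of each block.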

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-14 Thread Mel Gorman
On Fri, 2007-09-14 at 18:10 +0200, Goswin von Brederlow wrote: Nick Piggin [EMAIL PROTECTED] writes: In my attack, I cause the kernel to allocate lots of unmovable allocations and deplete movable groups. I theoretically then only need to keep a small number (1/2^N) of these allocations

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-14 Thread Christoph Lameter
On Fri, 14 Sep 2007, Nick Piggin wrote: [*] ok, this isn't quite true because if you can actually put a hard limit on unmovable allocations then anti-frag will fundamentally help -- get back to me on that when you get patches to move most of the obvious ones. We have this hard

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-14 Thread Christoph Lameter
On Fri, 14 Sep 2007, Nick Piggin wrote: However fsblock can do everything that higher order pagecache can do in terms of avoiding vmap and giving contiguous memory to block devices by opportunistically allocating higher orders of pages, and falling back to vmap if they cannot be satisfied.

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-14 Thread Christoph Lameter
On Fri, 14 Sep 2007, Christoph Lameter wrote: an -ENOMEM. Given the quantities of pages on today's machine--a 1 G machine s/1G/1T/ Sigh. has 256 million 4k pages--and the unmovable ratios we see today it 256k for 1G.
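
The corrected page counts in this message can be checked with a few lines of Python. This is an illustrative sketch, assuming binary units (1G = 2^30 bytes, 1T = 2^40 bytes) and a 4 KiB page; the message itself does not spell out either assumption, and `pages_in` is a hypothetical helper name.

```python
PAGE_SIZE = 4 * 1024   # 4 KiB page (assumed to be what "4k pages" means)
GIB = 1 << 30          # assumed meaning of "1G"
TIB = 1 << 40          # assumed meaning of "1T"


def pages_in(total_bytes, page_size=PAGE_SIZE):
    """Number of whole pages of page_size bytes that fit in total_bytes."""
    return total_bytes // page_size


pages_1g = pages_in(GIB)   # 262144 pages, i.e. 256k
pages_1t = pages_in(TIB)   # 268435456 pages, i.e. 256 million (binary)
```

Under these assumptions a 1G machine has 256k pages and a 1T machine has 256 million, which matches the s/1G/1T/ correction.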

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-14 Thread Goswin von Brederlow
Mel Gorman [EMAIL PROTECTED] writes: On Fri, 2007-09-14 at 18:10 +0200, Goswin von Brederlow wrote: Nick Piggin [EMAIL PROTECTED] writes: In my attack, I cause the kernel to allocate lots of unmovable allocations and deplete movable groups. I theoretically then only need to keep a small

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-14 Thread Goswin von Brederlow
Christoph Lameter [EMAIL PROTECTED] writes: On Fri, 14 Sep 2007, Christoph Lameter wrote: an -ENOMEM. Given the quantities of pages on today's machine--a 1 G machine s/1G/1T/ Sigh. has 256 million 4k pages--and the unmovable ratios we see today it 256k for 1G. 256k == 64 pages for 1GB

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-13 Thread Nick Piggin
On Thursday 13 September 2007 11:49, David Chinner wrote: On Wed, Sep 12, 2007 at 01:27:33AM +1000, Nick Piggin wrote: I just gave 4 things which combined might easily reduce xfs vmap overhead by several orders of magnitude, all without changing much code at all. Patches would be greatly

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-13 Thread Mel Gorman
On (12/09/07 16:17), Christoph Lameter didst pronounce: On Wed, 12 Sep 2007, Nick Piggin wrote: I will still argue that my approach is the better technical solution for large block support than yours, I don't think we made progress on that. And I'm quite sure we agreed at the VM summit

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-13 Thread David Chinner
On Thu, Sep 13, 2007 at 03:23:21AM +1000, Nick Piggin wrote: On Thursday 13 September 2007 11:49, David Chinner wrote: On Wed, Sep 12, 2007 at 01:27:33AM +1000, Nick Piggin wrote: I just gave 4 things which combined might easily reduce xfs vmap overhead by several orders of magnitude,

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-13 Thread Nick Piggin
On Thursday 13 September 2007 23:03, David Chinner wrote: On Thu, Sep 13, 2007 at 03:23:21AM +1000, Nick Piggin wrote: Well, it may not be easy to _fix_, but it's easy to try a few improvements ;) How do I make an image and run a workload that will coerce XFS into doing a significant

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-13 Thread Christoph Lameter
On Thu, 13 Sep 2007, Mel Gorman wrote: Surely, we'll be able to detect the situation where the memory is really contiguous as a fast path and have a slower path where fragmentation was a problem. Yes I have a draft here now of a virtual compound page solution that I am testing with SLUB.

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-12 Thread Nick Piggin
On Wednesday 12 September 2007 11:49, David Chinner wrote: On Tue, Sep 11, 2007 at 04:00:17PM +1000, Nick Piggin wrote: OTOH, I'm not sure how much buy-in there was from the filesystems guys. Particularly Christoph H and XFS (which is strange because they already do vmapping in

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-12 Thread Andrea Arcangeli
On Tue, Sep 11, 2007 at 05:04:41PM -0700, Christoph Lameter wrote: I would think that your approach would be slower since you always have to populate 1 << N ptes when mmapping a file? Plus there is a lot of wastage I don't have to populate them, I could just map one at a time. The only reason I

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-12 Thread Martin J. Bligh
Christoph Lameter wrote: On Tue, 11 Sep 2007, Nick Piggin wrote: But that's not my place to say, and I'm actually not arguing that high order pagecache does not have uses (especially as a practical, shorter-term solution which is unintrusive to filesystems). So no, I don't think I'm really

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-12 Thread Nick Piggin
On Wednesday 12 September 2007 11:49, David Chinner wrote: On Tue, Sep 11, 2007 at 04:00:17PM +1000, Nick Piggin wrote: OTOH, I'm not sure how much buy-in there was from the filesystems guys. Particularly Christoph H and XFS (which is strange because they already do vmapping in

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-12 Thread Nick Piggin
On Wednesday 12 September 2007 10:00, Christoph Lameter wrote: On Tue, 11 Sep 2007, Nick Piggin wrote: Yes. I think we differ on our interpretations of okay. In my interpretation, it is not OK to use this patch as a way to solve VM or FS or IO scalability issues, especially not while the

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-12 Thread Christoph Lameter
On Wed, 12 Sep 2007, Nick Piggin wrote: In my attack, I cause the kernel to allocate lots of unmovable allocations and deplete movable groups. I theoretically then only need to keep a small number (1/2^N) of these allocations around in order to DoS a page allocation of order N. True. That is
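
The 1/2^N figure quoted in this message can be illustrated with a toy model. The sketch below is not kernel code: it assumes memory is a flat array of pages, that an order-N allocation needs a fully free, 2^N-aligned block, and that pinned (unmovable) pages can never be relocated. The names `pin_one_per_block` and `free_blocks_of_order` are hypothetical helpers, and the model ignores reclaim, migrate types, and everything else the real allocator does.

```python
def pin_one_per_block(total_pages, order):
    """Pin the first page of every aligned 2**order block (toy attack pattern)."""
    block = 1 << order
    return set(range(0, total_pages, block))


def free_blocks_of_order(pinned, total_pages, order):
    """Count aligned 2**order-page blocks that contain no pinned page."""
    block = 1 << order
    return sum(
        1
        for start in range(0, total_pages - block + 1, block)
        if not any(page in pinned for page in range(start, start + block))
    )


total_pages = 1 << 12   # 4096 pages in the toy machine (assumption)
order = 4               # try to allocate 16 contiguous pages

pinned = pin_one_per_block(total_pages, order)

# Only 1/2**order of all pages are pinned ...
print(len(pinned) / total_pages)
# ... yet no aligned order-4 block is left free.
print(free_blocks_of_order(pinned, total_pages, order))
```

In this model the pinned fraction is exactly 1/2^N of memory, while the number of free order-N blocks drops to zero, which is the situation the quoted attack describes.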

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-12 Thread Christoph Lameter
On Wed, 12 Sep 2007, Nick Piggin wrote: I will still argue that my approach is the better technical solution for large block support than yours, I don't think we made progress on that. And I'm quite sure we agreed at the VM summit not to rely on your patches for VM or IO scalability. The

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-11 Thread Nick Piggin
On Tuesday 11 September 2007 16:03, Christoph Lameter wrote: 5. VM scalability Large block sizes mean less state keeping for the information being transferred. For a 1TB file one needs to handle 256 million page structs in the VM if one uses 4k page size. A 64k page size reduces
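
The page-count figure in this message is simple arithmetic and can be checked with a short sketch. It assumes binary units (1 TB taken as 2^40 bytes, 4k as 4096 bytes); `pages_needed` is a hypothetical helper, not anything from the patch set.

```python
KIB = 1024
GIB = 1024 ** 3
TIB = 1024 ** 4


def pages_needed(size_bytes, page_bytes):
    """Number of pages needed to cover size_bytes, rounding up."""
    return -(-size_bytes // page_bytes)


pages_4k = pages_needed(TIB, 4 * KIB)     # 2**28 pages, i.e. 256 Mi
pages_64k = pages_needed(TIB, 64 * KIB)   # 2**24 pages, i.e. 16 Mi

print(pages_4k, pages_64k, pages_4k // pages_64k)
```

Under these assumptions a 1 TB file maps to 268,435,456 pages at 4k, and a 64k page size cuts that by a factor of 16. The same helper gives 262,144 (256k) pages for 1 GB at 4k, which matches the correction made elsewhere in the thread.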

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-11 Thread Andrea Arcangeli
On Tue, Sep 11, 2007 at 04:52:19AM +1000, Nick Piggin wrote: The idea that there even _is_ a bug to fail when higher order pages cannot be allocated was also brushed aside by some people at the vm/fs summit. I don't know if those people had gone through the math about this, but it goes
