[patch 4/4] Btrfs: pass __GFP_WRITE for buffered write page allocations
Tell the page allocator that pages allocated for a buffered write are
expected to become dirty soon.

Signed-off-by: Johannes Weiner <jwei...@redhat.com>
---
 fs/btrfs/file.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index e7872e4..ea1b892 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1084,7 +1084,7 @@ static noinline int prepare_pages(struct btrfs_root *root, struct file *file,
 again:
 	for (i = 0; i < num_pages; i++) {
 		pages[i] = find_or_create_page(inode->i_mapping, index + i,
-					       GFP_NOFS);
+					       GFP_NOFS | __GFP_WRITE);
 		if (!pages[i]) {
 			faili = i - 1;
 			err = -ENOMEM;
--
1.7.6

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[patch 0/4] 50% faster writing to your USB drive!*
*if you use ntfs-3g to copy files larger than main memory

or: per-zone dirty limits

There have been several discussions and patches around the issue of dirty pages being written from page reclaim, that is, pages that reach the end of the LRU list before they are cleaned. Proposed reasons for this are the divergence of dirtying age from page cache age on one hand, and the unequal distribution of the globally limited dirty memory across the LRU lists of different zones on the other.

Mel's recent patches to reduce writes from reclaim, by simply skipping over dirty pages until a certain amount of memory pressure builds up, do help quite a bit. But they can only deal with a limited length of runs of dirty pages before kswapd goes to lower priority levels to balance the zone and begins writing.

The unequal distribution of dirty memory between zones is easily observable through the statistics in /proc/zoneinfo, but the test results varied between filesystems. To get an overview of where and how often different page cache pages are created and dirtied, I hacked together an object tracker that remembers the instantiator of a page cache page and associates with it the paths that dirty or activate the page, together with counters that indicate how often those operations occur.

Btrfs, for example, appears to be activating a significant amount of regularly written tree data with mark_page_accessed(), even with a purely linear, page-aligned write load.
So in addition to the already unbounded dirty memory on smaller zones, this is a divergence between page age and dirtying age, and it leads to a situation where the pages reclaimed next are not the ones that are also flushed next:

	pgactivate
	               min |   median |      max
	xfs:         5.000 |    6.500 |   20.000
	fuse-ntfs:   5.000 |   19.000 |  275.000
	ext4:        2.000 |   67.000 |  810.000
	btrfs:    2915.000 | 3316.500 | 5786.000

ext4's delalloc, on the other hand, refuses regular write attempts from kjournald, but the write index of the inode is still advanced for cyclic write ranges, and so the pages are not even immediately written when the inode is selected again.

I cc'd the filesystem people because it is at least conceivable that things could be improved on their side, but I do think the problem is mainly with the VM and needs fixing there.

This patch series implements per-zone dirty limits, derived from the configured global dirty limits and the individual zone size, that the page allocator uses to distribute pages allocated for writing across the allowable zones. Even with pages dirtied out of the inactive LRU order, this gives page reclaim a minimum number of clean pages on each LRU so that balancing a zone should no longer require writeback in the common case.

The previous version included code to wake the flushers and stall the allocation on NUMA setups where the load is bound to a node that is in itself not large enough to reach the global dirty limits, but I am still trying to get it to work reliably and dropped it for now; the series has merits even without it.
Test results

15M DMA + 3246M DMA32 + 504M Normal = 3765M memory
40% dirty ratio
16G USB thumb drive
10 runs of dd if=/dev/zero of=disk/zeroes bs=32k count=$((10 << 15))

	           seconds          nr_vmscan_write
	           (stddev)              min|      median|        max
	xfs
	vanilla:   549.747( 3.492)     0.000|       0.000|      0.000
	patched:   550.996( 3.802)     0.000|       0.000|      0.000

	fuse-ntfs
	vanilla:  1183.094(53.178) 54349.000|   59341.000|  65163.000
	patched:   558.049(17.914)     0.000|       0.000|     43.000

	btrfs
	vanilla:   573.679(14.015)156657.000|  460178.000| 606926.000
	patched:   563.365(11.368)     0.000|       0.000|   1362.000

	ext4
	vanilla:   561.197(15.782)     0.000| 2725438.000|4143837.000
	patched:   568.806(17.496)     0.000|       0.000|      0.000

Even though most filesystems already ignore the write request from reclaim, we were reluctant in the past to remove it, as it was still theoretically our only means to stay on top of the dirty pages on a per-zone basis. This patchset should get us closer to removing the dreaded writepage call from page reclaim altogether.

	Hannes
[patch 1/4] mm: exclude reserved pages from dirtyable memory
The amount of dirtyable pages should not include the total number of free pages: there is a number of reserved pages that the page allocator and kswapd always try to keep free.

The closer (reclaimable pages - dirty pages) is to the number of reserved pages, the more likely it becomes for reclaim to run into dirty pages:

	+----------+ ---
	|   anon   |  |
	+----------+  |
	|          |  |
	|          |  -- dirty limit new    -- flusher new
	|   file   |  |                     |
	|          |  |                     |
	|          |  -- dirty limit old    -- flusher old
	|          |  |                     |
	+----------+ --- reclaim
	| reserved |
	+----------+
	|  kernel  |
	+----------+

Not treating reserved pages as dirtyable on a global level is only a conceptual fix. In reality, dirty pages are not distributed equally across zones and reclaim runs into dirty pages on a regular basis. But it is important to get this right before tackling the problem on a per-zone level, where the distance between reclaim and the dirty pages is mostly much smaller in absolute numbers.

Signed-off-by: Johannes Weiner <jwei...@redhat.com>
---
 include/linux/mmzone.h |    1 +
 mm/page-writeback.c    |    8 +---
 mm/page_alloc.c        |    1 +
 3 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 1ed4116..e28f8e0 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -316,6 +316,7 @@ struct zone {
 	 * sysctl_lowmem_reserve_ratio sysctl changes.
 	 */
 	unsigned long lowmem_reserve[MAX_NR_ZONES];
+	unsigned long totalreserve_pages;
 
 #ifdef CONFIG_NUMA
 	int node;
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index da6d263..9f896db 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -169,8 +169,9 @@ static unsigned long highmem_dirtyable_memory(unsigned long total)
 		struct zone *z = &NODE_DATA(node)->node_zones[ZONE_HIGHMEM];
 
-		x += zone_page_state(z, NR_FREE_PAGES) +
-		     zone_reclaimable_pages(z);
+		x += zone_page_state(z, NR_FREE_PAGES) -
+		     z->totalreserve_pages;
+		x += zone_reclaimable_pages(z);
 	}
 	/*
 	 * Make sure that the number of highmem pages is never larger
@@ -194,7 +195,8 @@ static unsigned long determine_dirtyable_memory(void)
 {
 	unsigned long x;
 
-	x = global_page_state(NR_FREE_PAGES) + global_reclaimable_pages();
+	x = global_page_state(NR_FREE_PAGES) - totalreserve_pages;
+	x += global_reclaimable_pages();
 
 	if (!vm_highmem_is_dirtyable)
 		x -= highmem_dirtyable_memory(x);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1dba05e..7e8e2ee 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5075,6 +5075,7 @@ static void calculate_totalreserve_pages(void)
 			if (max > zone->present_pages)
 				max = zone->present_pages;
+			zone->totalreserve_pages = max;
 			reserve_pages += max;
 		}
 	}
--
1.7.6
[patch 2/4] mm: writeback: distribute write pages across allowable zones
This patch allows allocators to pass __GFP_WRITE when they know in advance that the allocated page will be written to and become dirty soon. The page allocator will then attempt to distribute those allocations across zones, such that no single zone will end up full of dirty, and thus more or less unreclaimable, pages.

The global dirty limits are put in proportion to the respective zone's amount of dirtyable memory, and allocations are diverted to other zones when the limit is reached.

For now, the problem remains for NUMA configurations where the zones allowed for allocation are in sum not big enough to trigger the global dirty limits, but a future approach to solve this can reuse the per-zone dirty limit infrastructure laid out in this patch to have dirty throttling and the flusher threads consider individual zones.

Signed-off-by: Johannes Weiner <jwei...@redhat.com>
---
 include/linux/gfp.h       |    4 ++-
 include/linux/writeback.h |    1 +
 mm/page-writeback.c       |   66 +---
 mm/page_alloc.c           |   22 ++-
 4 files changed, 80 insertions(+), 13 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 3a76faf..50efc7e 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -36,6 +36,7 @@ struct vm_area_struct;
 #endif
 #define ___GFP_NO_KSWAPD	0x40u
 #define ___GFP_OTHER_NODE	0x80u
+#define ___GFP_WRITE		0x100u
 
 /*
  * GFP bitmasks..
@@ -85,6 +86,7 @@ struct vm_area_struct;
 #define __GFP_NO_KSWAPD	((__force gfp_t)___GFP_NO_KSWAPD)
 #define __GFP_OTHER_NODE ((__force gfp_t)___GFP_OTHER_NODE) /* On behalf of other node */
+#define __GFP_WRITE	((__force gfp_t)___GFP_WRITE)	/* Allocator intends to dirty page */
 
 /*
  * This may seem redundant, but it's a way of annotating false positives vs.
@@ -92,7 +94,7 @@ struct vm_area_struct;
  */
 #define __GFP_NOTRACK_FALSE_POSITIVE (__GFP_NOTRACK)
 
-#define __GFP_BITS_SHIFT 24	/* Room for N __GFP_FOO bits */
+#define __GFP_BITS_SHIFT 25	/* Room for N __GFP_FOO bits */
 #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
 
 /* This equals 0, but use constants in case they ever change */
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index a5f495f..c96ee0c 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -104,6 +104,7 @@ void laptop_mode_timer_fn(unsigned long data);
 static inline void laptop_sync_completion(void) { }
 #endif
 void throttle_vm_writeout(gfp_t gfp_mask);
+bool zone_dirty_ok(struct zone *zone);
 
 extern unsigned long global_dirty_limit;
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 9f896db..1fc714c 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -142,6 +142,22 @@ unsigned long global_dirty_limit;
 static struct prop_descriptor vm_completions;
 static struct prop_descriptor vm_dirties;
 
+static unsigned long zone_dirtyable_memory(struct zone *zone)
+{
+	unsigned long x;
+	/*
+	 * To keep a reasonable ratio between dirty memory and lowmem,
+	 * highmem is not considered dirtyable on a global level.
+	 *
+	 * But we allow individual highmem zones to hold a potentially
+	 * bigger share of that global amount of dirty pages as long
+	 * as they have enough free or reclaimable pages around.
+	 */
+	x = zone_page_state(zone, NR_FREE_PAGES) - zone->totalreserve_pages;
+	x += zone_reclaimable_pages(zone);
+	return x;
+}
+
 /*
  * Work out the current dirty-memory clamping and background writeout
  * thresholds.
@@ -417,7 +433,7 @@ static unsigned long hard_dirty_limit(unsigned long thresh)
 }
 
 /*
- * global_dirty_limits - background-writeback and dirty-throttling thresholds
+ * dirty_limits - background-writeback and dirty-throttling thresholds
  *
  * Calculate the dirty thresholds based on sysctl parameters
  * - vm.dirty_background_ratio or vm.dirty_background_bytes
@@ -425,24 +441,35 @@ static unsigned long hard_dirty_limit(unsigned long thresh)
  * The dirty limits will be lifted by 1/4 for PF_LESS_THROTTLE (ie. nfsd) and
  * real-time tasks.
  */
-void global_dirty_limits(unsigned long *pbackground, unsigned long *pdirty)
+static void dirty_limits(struct zone *zone,
+			 unsigned long *pbackground,
+			 unsigned long *pdirty)
 {
+	unsigned long uninitialized_var(zone_memory);
+	unsigned long available_memory;
+	unsigned long global_memory;
 	unsigned long background;
-	unsigned long dirty;
-	unsigned long uninitialized_var(available_memory);
 	struct task_struct *tsk;
+	unsigned long dirty;
 
-	if (!vm_dirty_bytes || !dirty_background_bytes)
-		available_memory = determine_dirtyable_memory();
+	global_memory = determine_dirtyable_memory();
+	if (zone)
+		available_memory = zone_memory = zone_dirtyable_memory(zone);
+
Re: [patch 4/4] Btrfs: pass __GFP_WRITE for buffered write page allocations
On Tue, Sep 20, 2011 at 03:45:15PM +0200, Johannes Weiner wrote:
> Tell the page allocator that pages allocated for a buffered write are
> expected to become dirty soon.
>
> Signed-off-by: Johannes Weiner <jwei...@redhat.com>
> ---
>  fs/btrfs/file.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index e7872e4..ea1b892 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -1084,7 +1084,7 @@ static noinline int prepare_pages(struct btrfs_root *root, struct file *file,
>  again:
>  	for (i = 0; i < num_pages; i++) {
>  		pages[i] = find_or_create_page(inode->i_mapping, index + i,
> -					       GFP_NOFS);
> +					       GFP_NOFS | __GFP_WRITE);

Btw, and unrelated to this particular series, I think this should use grab_cache_page_write_begin() in the first place. Most grab_cache_page calls were replaced recently (a94733d "Btrfs: use find_or_create_page instead of grab_cache_page") to be able to pass GFP_NOFS, but the pages are now also no longer __GFP_HIGHMEM and __GFP_MOVABLE, which irks both x86_32 and memory hotplug.

It might be better to change grab_cache_page instead to take a flags argument that allows passing AOP_FLAG_NOFS, and revert the sites back to this helper?
Re: [patch 4/4] Btrfs: pass __GFP_WRITE for buffered write page allocations
On 09/20/2011 09:56 AM, Johannes Weiner wrote:
> On Tue, Sep 20, 2011 at 03:45:15PM +0200, Johannes Weiner wrote:
> > Tell the page allocator that pages allocated for a buffered write are
> > expected to become dirty soon.
> >
> > Signed-off-by: Johannes Weiner <jwei...@redhat.com>
> > ---
> >  fs/btrfs/file.c |    2 +-
> >  1 files changed, 1 insertions(+), 1 deletions(-)
> >
> > diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> > index e7872e4..ea1b892 100644
> > --- a/fs/btrfs/file.c
> > +++ b/fs/btrfs/file.c
> > @@ -1084,7 +1084,7 @@ static noinline int prepare_pages(struct btrfs_root *root, struct file *file,
> >  again:
> >  	for (i = 0; i < num_pages; i++) {
> >  		pages[i] = find_or_create_page(inode->i_mapping, index + i,
> > -					       GFP_NOFS);
> > +					       GFP_NOFS | __GFP_WRITE);
>
> Btw, and unrelated to this particular series, I think this should use grab_cache_page_write_begin() in the first place. Most grab_cache_page calls were replaced recently (a94733d "Btrfs: use find_or_create_page instead of grab_cache_page") to be able to pass GFP_NOFS, but the pages are now also no longer __GFP_HIGHMEM and __GFP_MOVABLE, which irks both x86_32 and memory hotplug.
>
> It might be better to change grab_cache_page instead to take a flags argument that allows passing AOP_FLAG_NOFS, and revert the sites back to this helper?

So I can do

	pages[i] = grab_cache_page_write_begin(inode->i_mapping, index + i,
					       AOP_FLAG_NOFS);

right? All we need is nofs, so I can just go through and change everybody to that. I'd rather not have to go through and change grab_cache_page() to take a flags argument and change all the callers, I have a bad habit of screwing stuff like that up :).

Thanks,

Josef
Re: [patch 4/4] Btrfs: pass __GFP_WRITE for buffered write page allocations
On Tue, Sep 20, 2011 at 10:09:38AM -0400, Josef Bacik wrote:
> On 09/20/2011 09:56 AM, Johannes Weiner wrote:
> > Btw, and unrelated to this particular series, I think this should use grab_cache_page_write_begin() in the first place. Most grab_cache_page calls were replaced recently (a94733d "Btrfs: use find_or_create_page instead of grab_cache_page") to be able to pass GFP_NOFS, but the pages are now also no longer __GFP_HIGHMEM and __GFP_MOVABLE, which irks both x86_32 and memory hotplug.
> >
> > It might be better to change grab_cache_page instead to take a flags argument that allows passing AOP_FLAG_NOFS, and revert the sites back to this helper?
>
> So I can do
>
> 	pages[i] = grab_cache_page_write_begin(inode->i_mapping, index + i,
> 					       AOP_FLAG_NOFS);
>
> right? All we need is nofs, so I can just go through and change everybody to that.

It does wait_on_page_writeback() in addition, so it may not be appropriate for every callsite; I haven't checked. But everything that grabs a page for writing should be fine if you do it like this.

> I'd rather not have to go through and change grab_cache_page() to take a flags argument and change all the callers, I have a bad habit of screwing stuff like that up :).

Yeah, there are quite a few. If we can get around it, all the better.

	Hannes
Re: [patch 3/4] mm: filemap: pass __GFP_WRITE from grab_cache_page_write_begin()
In addition to the regular write path, shouldn't __do_fault and do_wp_page also do this if they are called on file-backed mappings?
Re: Inefficient storing of ISO images with compress=lzo
On Mon, Sep 19, 2011 at 10:53:45AM +0800, Li Zefan wrote:
> With the compress option specified, btrfs will try to compress the file, at most 128K at one time, and if the compressed result is not smaller, the file will be marked as uncompressable.
>
> I just tried with Fedora-14-i386-DVD.iso, and the first 896K is compressed, with a compress ratio about 71.7%, and the remaining data is not compressed.

I'm curious how you obtained that number, and whether it's a rough estimate (ie. some rounding up to 4k or such) or the % comes from exact numbers. AFAIK there are two possibilities to read compressed sizes:

rough:
* traverse extents, look for compressed extents and sum up extent_map->block_len, or just extent_map->len for uncompressed extents
* block_len is rounded up to 4k
* the compressed inline size is not stored in any structure member, at most 4k

exact: as you know, the only place where the exact size of compressed data is stored is the first 4 bytes of every compressed extent; counting the exact size of a compressed extent means reading those bytes, naturally. Touching non-metadata just to read a compressed size does not look nice.

I did some research in that area and my conclusion is that there's a missing structure member compressed_length in extent_map (an in-memory structure, no problem to add it there) which would be filled from struct btrfs_file_extent_item (an on-disk structure, eg. holding the compression type) -- a disk format change :(

Other members cannot be used to calculate the compressed size, being either estimates by definition (ram_size) or containing a size that depends on other data (disk_num_bytes depends on checksum size). Although there are 2 bytes spare for other compression types, there are none to hold the actual compression/encryption/whatever-encoding length.

So until there's a format change, there are the two ways, rough or slow, to read the compressed size. (Unless I've missed something obvious etc.)
Looking forward to your input or patches :)

Thanks,
david
Re: [patch 1/4] mm: exclude reserved pages from dirtyable memory
On 09/20/2011 09:45 AM, Johannes Weiner wrote:
> The amount of dirtyable pages should not include the total number of free pages: there is a number of reserved pages that the page allocator and kswapd always try to keep free. The closer (reclaimable pages - dirty pages) is to the number of reserved pages, the more likely it becomes for reclaim to run into dirty pages:
>
> Signed-off-by: Johannes Weiner <jwei...@redhat.com>

Reviewed-by: Rik van Riel <r...@redhat.com>
[no subject]
Just wondering if/how one goes about getting the btrfs checksum of a given file. Is there a way?

Thanks!

-Ken
Re: your mail
On Tue, Sep 20, 2011 at 11:24:30AM -0400, Ken D'Ambrosio wrote:
> Just wondering if/how one goes about getting the btrfs checksum of a given file. Is there a way?

Checksums are computed on individual 4k blocks, not on the whole file. There's no explicit interface for retrieving checksums, but if you understand the data structures, you can get hold of the checksums for a file using the BTRFS_IOC_TREE_SEARCH ioctl.

Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- How deep will this sub go? Oh, she'll go all the way to ---
                  the bottom if we don't stop her.
Re: your mail
[Your Reply-to: header was screwed up, so I'm sending this again.
 From: Ken D'Ambrosio k...@jots.org
 Reply-to: File's...@jots.org, checksum?@jots.org ]

On Tue, Sep 20, 2011 at 04:35:40PM +0100, Hugo Mills wrote:
> On Tue, Sep 20, 2011 at 11:24:30AM -0400, Ken D'Ambrosio wrote:
> > Just wondering if/how one goes about getting the btrfs checksum of a given file. Is there a way?
>
> Checksums are computed on individual 4k blocks, not on the whole file. There's no explicit interface for retrieving checksums, but if you understand the data structures, you can get hold of the checksums for a file using the BTRFS_IOC_TREE_SEARCH ioctl.
>
> Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- How deep will this sub go? Oh, she'll go all the way to ---
                  the bottom if we don't stop her.
Re: [GIT PULL] Btrfs fixes
Hi Chris-

This pull misses the clone reservation fix again... :)

http://www.spinics.net/lists/linux-btrfs/msg11826.html

Thanks!
sage

On Mon, 19 Sep 2011, Chris Mason wrote:
> Hi everyone,
>
> The for-linus branch of the btrfs tree on github:
>
>   git://github.com/chrismason/linux.git for-linus
>
> Head commit: a66e7cc626f42de6c745963fe0d807518fa49d39
>
> Has the following fixes. for-linus is against rc6, since some of these are regression fixes for earlier 3.1 btrfs commits.
>
> The most important of the bunch is Josef's dentry fix, which avoids enoents if we race with multiple procs hitting on the same inode. This bug is btrfs-specific; it came in with his optimization to cache the inode location during readdir.
>
> Li Zefan (3) commits (+9/-5):
>     Btrfs: don't make a file partly checksummed through file clone (+5/-0)
>     Btrfs: don't change inode flag of the dest clone file (+0/-1)
>     Btrfs: fix pages truncation in btrfs_ioctl_clone() (+4/-4)
>
> Josef Bacik (1) commits (+11/-2):
>     Btrfs: only clear the need lookup flag after the dentry is setup
>
> Jeff Liu (1) commits (+7/-2):
>     BTRFS: Fix lseek return value for error
>
> Hidetoshi Seto (1) commits (+3/-2):
>     btrfs: fix d_off in the first dirent
>
> Total: (6) commits (+30/-11)
>
>  fs/btrfs/file.c  |    9 +++--
>  fs/btrfs/inode.c |   18 ++
>  fs/btrfs/ioctl.c |   14 +-
>  3 files changed, 30 insertions(+), 11 deletions(-)
Re: [patch 2/4] mm: writeback: distribute write pages across allowable zones
On 09/20/2011 09:45 AM, Johannes Weiner wrote:
> This patch allows allocators to pass __GFP_WRITE when they know in advance that the allocated page will be written to and become dirty soon. The page allocator will then attempt to distribute those allocations across zones, such that no single zone will end up full of dirty, and thus more or less unreclaimable, pages.
>
> The global dirty limits are put in proportion to the respective zone's amount of dirtyable memory, and allocations are diverted to other zones when the limit is reached.
>
> For now, the problem remains for NUMA configurations where the zones allowed for allocation are in sum not big enough to trigger the global dirty limits, but a future approach to solve this can reuse the per-zone dirty limit infrastructure laid out in this patch to have dirty throttling and the flusher threads consider individual zones.
>
> Signed-off-by: Johannes Weiner <jwei...@redhat.com>

Reviewed-by: Rik van Riel <r...@redhat.com>

The amount of work done in a __GFP_WRITE allocation looks a little daunting, but doing that a million times probably outweighs waiting on the disk even once, so...
Re: [patch 3/4] mm: filemap: pass __GFP_WRITE from grab_cache_page_write_begin()
On 09/20/2011 10:25 AM, Christoph Hellwig wrote:
> In addition to the regular write path, shouldn't __do_fault and do_wp_page also do this if they are called on file-backed mappings?

Probably not do_wp_page, since it always creates an anonymous page, which is not very relevant to the dirty page cache accounting.
Re: [patch 3/4] mm: filemap: pass __GFP_WRITE from grab_cache_page_write_begin()
On Tue, Sep 20, 2011 at 02:38:03PM -0400, Rik van Riel wrote:
> On 09/20/2011 10:25 AM, Christoph Hellwig wrote:
> > In addition to the regular write path, shouldn't __do_fault and do_wp_page also do this if they are called on file-backed mappings?
>
> Probably not do_wp_page, since it always creates an anonymous page, which is not very relevant to the dirty page cache accounting.

Well, it doesn't always - but for the case where it doesn't, we do not allocate a new page at all, so you're right in the end :)
Re: [patch 4/4] Btrfs: pass __GFP_WRITE for buffered write page allocations
On 09/20/2011 09:45 AM, Johannes Weiner wrote:
> Tell the page allocator that pages allocated for a buffered write are expected to become dirty soon.
>
> Signed-off-by: Johannes Weiner <jwei...@redhat.com>

Reviewed-by: Rik van Riel <r...@redhat.com>
[GIT PULL] missed btrfs fix
Sage mentioned I was missing a patch. So I've retested and updated the git tree. Since Linus did pull my tree yesterday, here's a new pull request with the single commit.

Linus, I have this in two flavors. One is merged on top of my for-linus branch, which was 3.1-rc6 + my last pull request:

  head: 0a7a0519d1789f3a222849421dbe91b6bddb88f5
  git://github.com/chrismason/linux.git for-linus

The second is just against the btrfs-3.0 tree. I have the two branches just so the N-1 world can update to the latest fixes without running the rest of the rc kernel. I know in the git universe these are all the same, but I'm assuming you'll want to skip my merge commit:

  head: b6f3409b2197e8fcedb43e6600e37b7cfbe0715b
  git://github.com/chrismason/linux.git btrfs-3.0

Sage Weil (1) commits (+6/-1):
    Btrfs: reserve sufficient space for ioctl clone

 fs/btrfs/ioctl.c |    7 ++-
 1 files changed, 6 insertions(+), 1 deletions(-)
Re: kernel BUG at fs/btrfs/inode.c:2299
On Mon, 2011-09-19 at 02:44 +0200, Maciej Marcin Piechotka wrote:
> On Tue, 2011-08-30 at 14:27 +0800, Miao Xie wrote:
> > > Unfortunately it results in a freeze of the system and I cannot give more details. Sometimes it happens not from fcron, but then it does not result in a freeze (???).
> >
> > Could you give me the method to reproduce it?
> >
> > Thanks
> > Miao

Sorry for spamming in this thread, but I'm trying to post my findings in the hope that somebody will understand what's going on. A recent crash gave some valuable information, IMHO:

1. I started the autocompletion of a path in zsh.
2. At some point the zsh hanged. In ps the process was listed as runnable.
3. Any access to the root volume (the one that zsh was trying to readdir) ended in a hang.
4. I was able to access the child volume (/home).
5. After some time the bug is hit. At this point strange things happen (screen freezes etc.). I guess there is some strange interaction between KMS, X and the now-hung composite manager.

The next time it happened (also while listing the root directory of volume 0) I observed the following: I can log out and unmount /home, but volume 0 remains busy and cannot be unmounted.

Things to consider:
- It is not enabled/disabled by any mount option.
- Is it triggered when the parent volume (say volume 0) and a child volume are both mounted? I cannot reproduce it when the parent volume is not mounted (snapshots are in a subvolume).
- Which case is it failing in? (I've tried to add a printk, but I cannot find the format specifier to print a u64.)
- Why does it happen only during the night?

Regards