[patch 4/4] Btrfs: pass __GFP_WRITE for buffered write page allocations

2011-09-20 Thread Johannes Weiner
Tell the page allocator that pages allocated for a buffered write are
expected to become dirty soon.

Signed-off-by: Johannes Weiner jwei...@redhat.com
---
 fs/btrfs/file.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index e7872e4..ea1b892 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1084,7 +1084,7 @@ static noinline int prepare_pages(struct btrfs_root *root, struct file *file,
 again:
	for (i = 0; i < num_pages; i++) {
		pages[i] = find_or_create_page(inode->i_mapping, index + i,
-					       GFP_NOFS);
+					       GFP_NOFS | __GFP_WRITE);
if (!pages[i]) {
faili = i - 1;
err = -ENOMEM;
-- 
1.7.6

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch 0/4] 50% faster writing to your USB drive!*

2011-09-20 Thread Johannes Weiner
*if you use ntfs-3g and copy files larger than main memory

or: per-zone dirty limits

There have been several discussions and patches around the issue of
dirty pages being written from page reclaim, that is, they reach the
end of the LRU list before they are cleaned.

Proposed reasons for this are the divergence of dirtying age from page
cache age, on one hand, and unequal distribution of the globally
limited dirty memory across the LRU lists of different zones.

Mel's recent patches to reduce writes from reclaim, by simply skipping
over dirty pages until a certain amount of memory pressure builds up,
do help quite a bit.  But they can only deal with a limited length of
runs of dirty pages before kswapd goes to lower priority levels to
balance the zone and begins writing.

The unequal distribution of dirty memory between zones is easily
observable through the statistics in /proc/zoneinfo, but the test
results varied between filesystems.  To get an overview of where and
how often different page cache pages are created and dirtied, I hacked
together an object tracker that remembers the instantiator of a page
cache page and associates with it the paths that dirty or activate the
page, together with counters that indicate how often those operations
occur.

Btrfs, for example, appears to be activating a significant amount of
regularly written tree data with mark_page_accessed(), even with a
purely linear, page-aligned write load.  So in addition to the already
unbounded dirty memory on smaller zones, this is a divergence between
page age and dirtying age and leads to a situation where the pages
reclaimed next are not the ones that are also flushed next:

pgactivate
 min|  median| max
  xfs: 5.000|   6.500|  20.000
fuse-ntfs: 5.000|  19.000| 275.000
 ext4: 2.000|  67.000| 810.000 
btrfs:  2915.000|3316.500|5786.000

ext4's delalloc, on the other hand, refuses regular write attempts from
kjournald, but the write index of the inode is still advanced for
cyclic write ranges and so the pages are not even immediately written
when the inode is selected again.

I cc'd the filesystem people because it is at least conceivable that
things could be improved on their side, but I do think the problem is
mainly with the VM and needs fixing there.

This patch series implements per-zone dirty limits, derived from the
configured global dirty limits and the individual zone size, that the
page allocator uses to distribute pages allocated for writing across
the allowable zones.  Even with pages dirtied out of the inactive LRU
order this gives page reclaim a minimum number of clean pages on each
LRU so that balancing a zone should no longer require writeback in the
common case.

The previous version included code to wake the flushers and stall the
allocation on NUMA setups where the load is bound to a node that is in
itself not large enough to reach the global dirty limits, but I am
still trying to get it to work reliably and dropped it for now, the
series has merits even without it.

Test results

15M DMA + 3246M DMA32 + 504M Normal = 3765M memory
40% dirty ratio
16G USB thumb drive
10 runs of dd if=/dev/zero of=disk/zeroes bs=32k count=$((10 << 15))

seconds nr_vmscan_write
(stddev)   min| median|max
xfs
vanilla: 549.747( 3.492) 0.000|  0.000|  0.000
patched: 550.996( 3.802) 0.000|  0.000|  0.000

fuse-ntfs
vanilla:1183.094(53.178) 54349.000|  59341.000|  65163.000
patched: 558.049(17.914) 0.000|  0.000| 43.000

btrfs
vanilla: 573.679(14.015)156657.000| 460178.000| 606926.000
patched: 563.365(11.368) 0.000|  0.000|   1362.000

ext4
vanilla: 561.197(15.782) 0.000|2725438.000|4143837.000
patched: 568.806(17.496) 0.000|  0.000|  0.000

Even though most filesystems already ignore the write request from
reclaim, we were reluctant in the past to remove it, as it was still
theoretically our only means to stay on top of the dirty pages on a
per-zone basis.  This patchset should get us closer to removing the
dreaded writepage call from page reclaim altogether.

Hannes


[patch 1/4] mm: exclude reserved pages from dirtyable memory

2011-09-20 Thread Johannes Weiner
The amount of dirtyable pages should not include the total number of
free pages: there is a number of reserved pages that the page
allocator and kswapd always try to keep free.

The closer (reclaimable pages - dirty pages) is to the number of
reserved pages, the more likely it becomes for reclaim to run into
dirty pages:

   +----------+ ---
   |   anon   |  |
   +----------+  |
   |          |  |
   |          |  -- dirty limit new    -- flusher new
   |   file   |  |                     |
   |          |  |                     |
   |          |  -- dirty limit old    -- flusher old
   |          |                        |
   +----------+       --- reclaim
   | reserved |
   +----------+
   |  kernel  |
   +----------+

Not treating reserved pages as dirtyable on a global level is only a
conceptual fix.  In reality, dirty pages are not distributed equally
across zones and reclaim runs into dirty pages on a regular basis.

But it is important to get this right before tackling the problem on a
per-zone level, where the distance between reclaim and the dirty pages
is mostly much smaller in absolute numbers.

Signed-off-by: Johannes Weiner jwei...@redhat.com
---
 include/linux/mmzone.h |1 +
 mm/page-writeback.c|8 +---
 mm/page_alloc.c|1 +
 3 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 1ed4116..e28f8e0 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -316,6 +316,7 @@ struct zone {
 * sysctl_lowmem_reserve_ratio sysctl changes.
 */
unsigned long   lowmem_reserve[MAX_NR_ZONES];
+   unsigned long   totalreserve_pages;
 
 #ifdef CONFIG_NUMA
int node;
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index da6d263..9f896db 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -169,8 +169,9 @@ static unsigned long highmem_dirtyable_memory(unsigned long total)
 		struct zone *z =
 			NODE_DATA(node)->node_zones[ZONE_HIGHMEM];
 
-		x += zone_page_state(z, NR_FREE_PAGES) +
-		     zone_reclaimable_pages(z);
+		x += zone_page_state(z, NR_FREE_PAGES) -
+		     zone->totalreserve_pages;
+		x += zone_reclaimable_pages(z);
}
/*
 * Make sure that the number of highmem pages is never larger
@@ -194,7 +195,8 @@ static unsigned long determine_dirtyable_memory(void)
 {
unsigned long x;
 
-   x = global_page_state(NR_FREE_PAGES) + global_reclaimable_pages();
+   x = global_page_state(NR_FREE_PAGES) - totalreserve_pages;
+   x += global_reclaimable_pages();
 
if (!vm_highmem_is_dirtyable)
x -= highmem_dirtyable_memory(x);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1dba05e..7e8e2ee 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5075,6 +5075,7 @@ static void calculate_totalreserve_pages(void)
 
 		if (max > zone->present_pages)
 			max = zone->present_pages;
+		zone->totalreserve_pages = max;
reserve_pages += max;
}
}
-- 
1.7.6



[patch 2/4] mm: writeback: distribute write pages across allowable zones

2011-09-20 Thread Johannes Weiner
This patch allows allocators to pass __GFP_WRITE when they know in
advance that the allocated page will be written to and become dirty
soon.  The page allocator will then attempt to distribute those
allocations across zones, such that no single zone will end up full of
dirty, and thus more or less, unreclaimable pages.

The global dirty limits are put in proportion to the respective zone's
amount of dirtyable memory and allocations diverted to other zones
when the limit is reached.

For now, the problem remains for NUMA configurations where the zones
allowed for allocation are in sum not big enough to trigger the global
dirty limits, but a future approach to solve this can reuse the
per-zone dirty limit infrastructure laid out in this patch to have
dirty throttling and the flusher threads consider individual zones.

Signed-off-by: Johannes Weiner jwei...@redhat.com
---
 include/linux/gfp.h   |4 ++-
 include/linux/writeback.h |1 +
 mm/page-writeback.c   |   66 +---
 mm/page_alloc.c   |   22 ++-
 4 files changed, 80 insertions(+), 13 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 3a76faf..50efc7e 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -36,6 +36,7 @@ struct vm_area_struct;
 #endif
 #define ___GFP_NO_KSWAPD   0x40u
 #define ___GFP_OTHER_NODE  0x80u
+#define ___GFP_WRITE   0x100u
 
 /*
  * GFP bitmasks..
@@ -85,6 +86,7 @@ struct vm_area_struct;
 
 #define __GFP_NO_KSWAPD((__force gfp_t)___GFP_NO_KSWAPD)
 #define __GFP_OTHER_NODE ((__force gfp_t)___GFP_OTHER_NODE) /* On behalf of other node */
+#define __GFP_WRITE	((__force gfp_t)___GFP_WRITE)	/* Allocator intends to dirty page */
 
 /*
  * This may seem redundant, but it's a way of annotating false positives vs.
@@ -92,7 +94,7 @@ struct vm_area_struct;
  */
 #define __GFP_NOTRACK_FALSE_POSITIVE (__GFP_NOTRACK)
 
-#define __GFP_BITS_SHIFT 24	/* Room for N __GFP_FOO bits */
+#define __GFP_BITS_SHIFT 25	/* Room for N __GFP_FOO bits */
 #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
 
 /* This equals 0, but use constants in case they ever change */
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index a5f495f..c96ee0c 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -104,6 +104,7 @@ void laptop_mode_timer_fn(unsigned long data);
 static inline void laptop_sync_completion(void) { }
 #endif
 void throttle_vm_writeout(gfp_t gfp_mask);
+bool zone_dirty_ok(struct zone *zone);
 
 extern unsigned long global_dirty_limit;
 
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 9f896db..1fc714c 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -142,6 +142,22 @@ unsigned long global_dirty_limit;
 static struct prop_descriptor vm_completions;
 static struct prop_descriptor vm_dirties;
 
+static unsigned long zone_dirtyable_memory(struct zone *zone)
+{
+   unsigned long x;
+   /*
+* To keep a reasonable ratio between dirty memory and lowmem,
+* highmem is not considered dirtyable on a global level.
+*
+* But we allow individual highmem zones to hold a potentially
+* bigger share of that global amount of dirty pages as long
+* as they have enough free or reclaimable pages around.
+*/
+	x = zone_page_state(zone, NR_FREE_PAGES) - zone->totalreserve_pages;
+   x += zone_reclaimable_pages(zone);
+   return x;
+}
+
 /*
  * Work out the current dirty-memory clamping and background writeout
  * thresholds.
@@ -417,7 +433,7 @@ static unsigned long hard_dirty_limit(unsigned long thresh)
 }
 
 /*
- * global_dirty_limits - background-writeback and dirty-throttling thresholds
+ * dirty_limits - background-writeback and dirty-throttling thresholds
  *
  * Calculate the dirty thresholds based on sysctl parameters
  * - vm.dirty_background_ratio  or  vm.dirty_background_bytes
@@ -425,24 +441,35 @@ static unsigned long hard_dirty_limit(unsigned long thresh)
  * The dirty limits will be lifted by 1/4 for PF_LESS_THROTTLE (ie. nfsd) and
  * real-time tasks.
  */
-void global_dirty_limits(unsigned long *pbackground, unsigned long *pdirty)
+static void dirty_limits(struct zone *zone,
+unsigned long *pbackground,
+unsigned long *pdirty)
 {
+   unsigned long uninitialized_var(zone_memory);
+   unsigned long available_memory;
+   unsigned long global_memory;
unsigned long background;
-   unsigned long dirty;
-   unsigned long uninitialized_var(available_memory);
struct task_struct *tsk;
+   unsigned long dirty;
 
-   if (!vm_dirty_bytes || !dirty_background_bytes)
-   available_memory = determine_dirtyable_memory();
+   global_memory = determine_dirtyable_memory();
+   if (zone)
+   available_memory = zone_memory = zone_dirtyable_memory(zone);
+ 

Re: [patch 4/4] Btrfs: pass __GFP_WRITE for buffered write page allocations

2011-09-20 Thread Johannes Weiner
On Tue, Sep 20, 2011 at 03:45:15PM +0200, Johannes Weiner wrote:
 Tell the page allocator that pages allocated for a buffered write are
 expected to become dirty soon.
 
 Signed-off-by: Johannes Weiner jwei...@redhat.com
 ---
  fs/btrfs/file.c |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)
 
 diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
 index e7872e4..ea1b892 100644
 --- a/fs/btrfs/file.c
 +++ b/fs/btrfs/file.c
 @@ -1084,7 +1084,7 @@ static noinline int prepare_pages(struct btrfs_root *root, struct file *file,
  again:
 	for (i = 0; i < num_pages; i++) {
 		pages[i] = find_or_create_page(inode->i_mapping, index + i,
 -					       GFP_NOFS);
 +					       GFP_NOFS | __GFP_WRITE);

Btw and unrelated to this particular series, I think this should use
grab_cache_page_write_begin() in the first place.

Most grab_cache_page calls were replaced recently (a94733d "Btrfs: use
find_or_create_page instead of grab_cache_page") to be able to pass
GFP_NOFS, but the pages are now also no longer __GFP_HIGHMEM and
__GFP_MOVABLE, which irks both x86_32 and memory hotplug.

It might be better to change grab_cache_page instead to take a flags
argument that allows passing AOP_FLAG_NOFS and revert the sites back
to this helper?


Re: [patch 4/4] Btrfs: pass __GFP_WRITE for buffered write page allocations

2011-09-20 Thread Josef Bacik
On 09/20/2011 09:56 AM, Johannes Weiner wrote:
 On Tue, Sep 20, 2011 at 03:45:15PM +0200, Johannes Weiner wrote:
 Tell the page allocator that pages allocated for a buffered write are
 expected to become dirty soon.

 Signed-off-by: Johannes Weiner jwei...@redhat.com
 ---
  fs/btrfs/file.c |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)

 diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
 index e7872e4..ea1b892 100644
 --- a/fs/btrfs/file.c
 +++ b/fs/btrfs/file.c
  @@ -1084,7 +1084,7 @@ static noinline int prepare_pages(struct btrfs_root *root, struct file *file,
   again:
  	for (i = 0; i < num_pages; i++) {
  		pages[i] = find_or_create_page(inode->i_mapping, index + i,
  -					       GFP_NOFS);
  +					       GFP_NOFS | __GFP_WRITE);
 
 Btw and unrelated to this particular series, I think this should use
 grab_cache_page_write_begin() in the first place.
 
 Most grab_cache_page calls were replaced recently (a94733d Btrfs: use
 find_or_create_page instead of grab_cache_page) to be able to pass
 GFP_NOFS, but the pages are now also no longer __GFP_HIGHMEM and
 __GFP_MOVABLE, which irks both x86_32 and memory hotplug.
 
 It might be better to change grab_cache_page instead to take a flags
 argument that allows passing AOP_FLAG_NOFS and revert the sites back
 to this helper?

So I can do

pages[i] = grab_cache_page_write_begin(inode->i_mapping, index + i,
				       AOP_FLAG_NOFS);

right?  All we need is nofs, so I can just go through and change
everybody to that.  I'd rather not have to go through and change
grab_cache_page() to take a flags argument and change all the callers, I
have a bad habit of screwing stuff like that up :).  Thanks,

Josef


Re: [patch 4/4] Btrfs: pass __GFP_WRITE for buffered write page allocations

2011-09-20 Thread Johannes Weiner
On Tue, Sep 20, 2011 at 10:09:38AM -0400, Josef Bacik wrote:
 On 09/20/2011 09:56 AM, Johannes Weiner wrote:
  On Tue, Sep 20, 2011 at 03:45:15PM +0200, Johannes Weiner wrote:
  Tell the page allocator that pages allocated for a buffered write are
  expected to become dirty soon.
 
  Signed-off-by: Johannes Weiner jwei...@redhat.com
  ---
   fs/btrfs/file.c |2 +-
   1 files changed, 1 insertions(+), 1 deletions(-)
 
  diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
  index e7872e4..ea1b892 100644
  --- a/fs/btrfs/file.c
  +++ b/fs/btrfs/file.c
   @@ -1084,7 +1084,7 @@ static noinline int prepare_pages(struct btrfs_root *root, struct file *file,
    again:
   	for (i = 0; i < num_pages; i++) {
   		pages[i] = find_or_create_page(inode->i_mapping, index + i,
   -					       GFP_NOFS);
   +					       GFP_NOFS | __GFP_WRITE);
  
  Btw and unrelated to this particular series, I think this should use
  grab_cache_page_write_begin() in the first place.
  
  Most grab_cache_page calls were replaced recently (a94733d Btrfs: use
  find_or_create_page instead of grab_cache_page) to be able to pass
  GFP_NOFS, but the pages are now also no longer __GFP_HIGHMEM and
  __GFP_MOVABLE, which irks both x86_32 and memory hotplug.
  
  It might be better to change grab_cache_page instead to take a flags
  argument that allows passing AOP_FLAG_NOFS and revert the sites back
  to this helper?
 
 So I can do
 
 pages[i] = grab_cache_page_write_begin(inode->i_mapping, index + i,
 					AOP_FLAG_NOFS);
 
 right?  All we need is nofs, so I can just go through and change
 everybody to that.

It does wait_on_page_writeback() in addition, so it may not be
appropriate for every callsite, I haven't checked.  But everything
that grabs a page for writing should be fine if you do it like this.

 I'd rather not have to go through and change grab_cache_page() to
 take a flags argument and change all the callers, I have a bad habit
 of screwing stuff like that up :).

Yeah, there are quite a few.  If we can get around it, all the better.

Hannes


Re: [patch 3/4] mm: filemap: pass __GFP_WRITE from grab_cache_page_write_begin()

2011-09-20 Thread Christoph Hellwig
In addition to regular write shouldn't __do_fault and do_wp_page also
calls this if they are called on file backed mappings?



Re: Inefficient storing of ISO images with compress=lzo

2011-09-20 Thread David Sterba
On Mon, Sep 19, 2011 at 10:53:45AM +0800, Li Zefan wrote:
 With compress option specified, btrfs will try to compress the file, at most
 128K at one time, and if the compressed result is not smaller, the file will
 be marked as uncompressable.
 
 I just tried with Fedora-14-i386-DVD.iso, and the first 896K is compressed,
 with a compress ratio about 71.7%, and the remaining data is not compressed.

I'm curious how you obtained that number, and whether it's a rough
estimate (ie. some rounding up to 4k or such) or the % comes from exact numbers.

AFAIK there are two possibilities to read compressed sizes:

rough:
* traverse extents, look for compressed extens and sum up
  extent_map-block_len, or just extent_map-len for uncompressed

* block_len is rounded up to 4k
* compressed inline size is not stored in any structure member, at most 4k


exact:
as you know, the only place where exact size of compressed data is
stored are first 4 bytes of every compressed extent, counting exact size
of compressed extent means to read those bytes, naturally.


Touching non-metadata just to read compressed size does not look nice. I
did some research in that area and my conclusion is that there's a
missing structure member compressed_length in extent_map (in-memory
structure, no problem to add it there) which is filled from
struct btrfs_file_extent_item (on-disk structure, eg. holding
compression type) -- disk format change :( Other members could not be
used to calculate the compressed size, being either estimates by
definition (ram_size) or containing sizes that depend on other data
(disk_num_bytes depends on checksum size).

Although there are 2 bytes spare for other compression types, there are
none to hold the actual compression or encryption or whatever-encoding
length.

So until there's going to be format change, there are the two ways,
rough or slow, to read compressed size.  (Unless I've missed something
obvious etc.)

Looking forward to your input or patches :)


Thanks,
david


Re: [patch 1/4] mm: exclude reserved pages from dirtyable memory

2011-09-20 Thread Rik van Riel

On 09/20/2011 09:45 AM, Johannes Weiner wrote:

The amount of dirtyable pages should not include the total number of
free pages: there is a number of reserved pages that the page
allocator and kswapd always try to keep free.

The closer (reclaimable pages - dirty pages) is to the number of
reserved pages, the more likely it becomes for reclaim to run into
dirty pages:



Signed-off-by: Johannes Weiner jwei...@redhat.com


Reviewed-by: Rik van Riel r...@redhat.com


[no subject]

2011-09-20 Thread Ken D'Ambrosio
Just wondering if/how one goes about getting the btrfs checksum of a given
file.  Is there a way?

Thanks!

-Ken







Re: your mail

2011-09-20 Thread Hugo Mills
On Tue, Sep 20, 2011 at 11:24:30AM -0400, Ken D'Ambrosio wrote:
 Just wondering if/how one goes about getting the btrfs checksum of a given
 file.  Is there a way?

   Checksums are computed on individual 4k blocks, not on the whole
file. There's no explicit interface for retrieving checksums, but if
you understand the data structures, you can get hold of the checksums
for a file using the BTRFS_IOC_TREE_SEARCH ioctl.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- How deep will this sub go? Oh,  she'll go all the way to ---   
the bottom if we don't stop her.




Re: your mail

2011-09-20 Thread Hugo Mills
   [Your Reply-to: header was screwed up, so I'm sending this again.

From: Ken D'Ambrosio k...@jots.org
Reply-to: File's...@jots.org, checksum?@jots.org

]

On Tue, Sep 20, 2011 at 04:35:40PM +0100, Hugo Mills wrote:
 On Tue, Sep 20, 2011 at 11:24:30AM -0400, Ken D'Ambrosio wrote:
  Just wondering if/how one goes about getting the btrfs checksum of a given
  file.  Is there a way?
 
Checksums are computed on individual 4k blocks, not on the whole
 file. There's no explicit interface for retrieving checksums, but if
 you understand the data structures, you can get hold of the checksums
 for a file using the BTRFS_IOC_TREE_SEARCH ioctl.
 
Hugo.
 

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- How deep will this sub go? Oh,  she'll go all the way to ---   
the bottom if we don't stop her.




Re: [GIT PULL] Btrfs fixes

2011-09-20 Thread Sage Weil
Hi Chris-

This pull misses the clone reservation fix again... :)

http://www.spinics.net/lists/linux-btrfs/msg11826.html

Thanks!
sage



On Mon, 19 Sep 2011, Chris Mason wrote:

 Hi everyone,
 
 The for-linus branch of the btrfs tree on github:
 
 Head commit: a66e7cc626f42de6c745963fe0d807518fa49d39
 git://github.com/chrismason/linux.git for-linus
 
 Has the following fixes.  for-linus is against rc6, since some of these
 are regression fixes for earlier 3.1 btrfs commits.  The most important
 of the bunch is Josef's dentry fix, which avoids enoents if we race with
 multiple procs hitting on the same inode.  This bug is btrfs-specific,
 it came in with his optimization to cache the inode location during
 readdir.
 
 Li Zefan (3) commits (+9/-5):
 Btrfs: don't make a file partly checksummed through file clone (+5/-0)
 Btrfs: don't change inode flag of the dest clone file (+0/-1)
 Btrfs: fix pages truncation in btrfs_ioctl_clone() (+4/-4)
 
 Josef Bacik (1) commits (+11/-2):
 Btrfs: only clear the need lookup flag after the dentry is setup
 
 Jeff Liu (1) commits (+7/-2):
 BTRFS: Fix lseek return value for error
 
 Hidetoshi Seto (1) commits (+3/-2):
 btrfs: fix d_off in the first dirent
 
 Total: (6) commits (+30/-11)
 
  fs/btrfs/file.c  |9 +++--
  fs/btrfs/inode.c |   18 ++
  fs/btrfs/ioctl.c |   14 +-
  3 files changed, 30 insertions(+), 11 deletions(-)


Re: [patch 2/4] mm: writeback: distribute write pages across allowable zones

2011-09-20 Thread Rik van Riel

On 09/20/2011 09:45 AM, Johannes Weiner wrote:

This patch allows allocators to pass __GFP_WRITE when they know in
advance that the allocated page will be written to and become dirty
soon.  The page allocator will then attempt to distribute those
allocations across zones, such that no single zone will end up full of
dirty, and thus more or less, unreclaimable pages.

The global dirty limits are put in proportion to the respective zone's
amount of dirtyable memory and allocations diverted to other zones
when the limit is reached.

For now, the problem remains for NUMA configurations where the zones
allowed for allocation are in sum not big enough to trigger the global
dirty limits, but a future approach to solve this can reuse the
per-zone dirty limit infrastructure laid out in this patch to have
dirty throttling and the flusher threads consider individual zones.

Signed-off-by: Johannes Weiner jwei...@redhat.com


Reviewed-by: Rik van Riel r...@redhat.com

The amount of work done in a __GFP_WRITE allocation looks
a little daunting, but doing that a million times probably
outweighs waiting on the disk even once, so...


Re: [patch 3/4] mm: filemap: pass __GFP_WRITE from grab_cache_page_write_begin()

2011-09-20 Thread Rik van Riel

On 09/20/2011 10:25 AM, Christoph Hellwig wrote:

In addition to regular write shouldn't __do_fault and do_wp_page also
calls this if they are called on file backed mappings?



Probably not do_wp_page since it always creates an
anonymous page, which are not very relevant to the
dirty page cache accounting.


Re: [patch 3/4] mm: filemap: pass __GFP_WRITE from grab_cache_page_write_begin()

2011-09-20 Thread Christoph Hellwig
On Tue, Sep 20, 2011 at 02:38:03PM -0400, Rik van Riel wrote:
 On 09/20/2011 10:25 AM, Christoph Hellwig wrote:
 In addition to regular write shouldn't __do_fault and do_wp_page also
 calls this if they are called on file backed mappings?
 
 
 Probably not do_wp_page since it always creates an
 anonymous page, which are not very relevant to the
 dirty page cache accounting.

Well, it doesn't always - but for the case where it doesn't we
do not allocate a new page at all so you're right in the end :)


Re: [patch 4/4] Btrfs: pass __GFP_WRITE for buffered write page allocations

2011-09-20 Thread Rik van Riel

On 09/20/2011 09:45 AM, Johannes Weiner wrote:

Tell the page allocator that pages allocated for a buffered write are
expected to become dirty soon.

Signed-off-by: Johannes Weiner jwei...@redhat.com


Reviewed-by: Rik van Riel r...@redhat.com


[GIT PULL] missed btrfs fix

2011-09-20 Thread Chris Mason
Sage mentioned I was missing a patch.  So I've retested and updated the
git tree.  Since Linus did pull my tree yesterday, here's a new pull
request with the single commit.

Linus I have this in two flavors.  One is merged on top of my for-linus
branch, which was 3.1-rc6 + my last pull request:

head: 0a7a0519d1789f3a222849421dbe91b6bddb88f5
git://github.com/chrismason/linux.git for-linus

Second is just against the btrfs-3.0 tree.  I have the two branches just
so the N-1 world can update to the latest fixes without running the rest
of the rc kernel.  I know in the git universe these are all the same,
but I'm assuming you'll want to skip my merge commit:

head: b6f3409b2197e8fcedb43e6600e37b7cfbe0715b
git://github.com/chrismason/linux.git btrfs-3.0

Sage Weil (1) commits (+6/-1):
Btrfs: reserve sufficient space for ioctl clone

 fs/btrfs/ioctl.c |7 ++-
 1 files changed, 6 insertions(+), 1 deletions(-)


Re: kernel BUG at fs/btrfs/inode.c:2299

2011-09-20 Thread Maciej Marcin Piechotka
On Mon, 2011-09-19 at 02:44 +0200, Maciej Marcin Piechotka wrote:
 On Tue, 2011-08-30 at 14:27 +0800, Miao Xie wrote:
   
   Unfortunately it results in freeze of system and I cannot give more
   details. Sometimes it happens not from fcron but then it does not result
   in freeze (???).
  
  Could you give me the method to reproduce it?
  
  Thanks
  Miao
 
 Sorry for spamming in this thread but I'm trying to post my findings in
 hope that somebody will understand what's going on.
 
 Recent crash gave some valuable information IMHO:
 
  1. I started the autocompletion of path in zsh
  2. At some point the zsh hanged. In ps the process was listed as
 runnable
  3. Any access to root volume (the one that zsh was trying to readdir)
 finished in hang.
  4. I was able to access the child volume (/home)
  5. After some time the bug is hit. At this time strange things happens
 (screen freeze etc.). I guess that there is some strange interaction
 between KMS, X and now-hanged composite manager
 
 Next time it happened (also during listing root directory of volume 0) I
 observed the following thing - I can log out and unmount home but the
 volume 0 remains busy and cannot be unmounted.
 
 Things to consider:
 
  - It is not enabled/disabled by any mount option
  - Is it triggered when the parent volume (say volume 0) and child
 volume are both mounted?

I cannot reproduce it when the parent volume is not mounted (snapshots
are to subvolume)

  - Which case is it failing (I've tried to add printk but I cannot find
 the option in printk to print u64)
  - Why it happens only during night?
 
 Regards

Regards

