Re: [PATCH] btrfs file write debugging patch

2011-03-07 Thread Maria Wikström
Mon 2011-03-07 at 00:07 -0600, Mitch Harder wrote:
 On Sun, Mar 6, 2011 at 6:58 PM, Chris Mason chris.ma...@oracle.com wrote:
  Excerpts from Chris Mason's message of 2011-03-06 13:00:27 -0500:
  Excerpts from Mitch Harder's message of 2011-03-05 11:50:14 -0500:
   I've constructed a test patch that is currently addressing all the
   issues on my system.
  
   The portion of Openmotif that was having issues with page faults works
   correctly with this patch, and gcc-4.4.5 builds without issue.
  
   I extracted only the portion of the first patch that corrects the
   handling of dirty_pages when copied==0, and incorporated the second
   patch that falls back to one-page-at-a-time if there are troubles with
   page faults.
 
  Just to make sure I understand, could you please post the full combined
  path that was giving you trouble with gcc?  We do need to make sure the
  pages are properly up to date if we fall back to partial writes.
 
  Ok, I was able to reproduce this easily with fsx.  The problem is that I
  wasn't making sure the last partial page in the write was up to date
  when it was also the first page in the write.
 
  Here is the updated patch, it has all the fixes we've found so far:
 
 
 This latest patch that Chris has sent out fixes the issues I've been
 encountering.
 
 I can build gcc-4.4.5 without problems.
 
 Also, the portion of Openmotif that was having issues with page faults
 is working correctly.
 
 Let me know if you still would like to see the path names for the
 portions of the gcc-4.4.5 build that were giving me issues.  I didn't
 save that information, but I can regenerate it.  But it sounds like
 it's irrelevant now.

With the patch I can compile libgcrypt without any problem, so it solves
my problems too.

// Maria



--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs file write debugging patch

2011-03-07 Thread Johannes Hirte
On Monday 07 March 2011 20:56:50 Maria Wikström wrote:
 Mon 2011-03-07 at 00:07 -0600, Mitch Harder wrote:
  On Sun, Mar 6, 2011 at 6:58 PM, Chris Mason chris.ma...@oracle.com 
wrote:
   Excerpts from Chris Mason's message of 2011-03-06 13:00:27 -0500:
   Excerpts from Mitch Harder's message of 2011-03-05 11:50:14 -0500:
I've constructed a test patch that is currently addressing all the
issues on my system.

The portion of Openmotif that was having issues with page faults
works correctly with this patch, and gcc-4.4.5 builds without
issue.

I extracted only the portion of the first patch that corrects the
handling of dirty_pages when copied==0, and incorporated the second
patch that falls back to one-page-at-a-time if there are troubles
with page faults.
   
   Just to make sure I understand, could you please post the full
   combined path that was giving you trouble with gcc?  We do need to
   make sure the pages are properly up to date if we fall back to
   partial writes.
   
   Ok, I was able to reproduce this easily with fsx.  The problem is that
   I wasn't making sure the last partial page in the write was up to date
   when it was also the first page in the write.
  
   Here is the updated patch, it has all the fixes we've found so far:
  This latest patch that Chris has sent out fixes the issues I've been
  encountering.
  
  I can build gcc-4.4.5 without problems.
  
  Also, the portion of Openmotif that was having issues with page faults
  is working correctly.
  
  Let me know if you still would like to see the path names for the
  portions of the gcc-4.4.5 build that were giving me issues.  I didn't
  save that information, but I can regenerate it.  But it sounds like
  it's irrelevant now.
 
 With the patch I can compile libgcrypt without any problem, so it solves
 my problems too.

I can confirm this. And the bug seems to be hardware-related: on my Pentium 4
system it was 100% reproducible, while on my Atom-based system I couldn't
trigger it.


RE: [PATCH] btrfs file write debugging patch

2011-03-07 Thread Zhong, Xin
In my experience, only the x86 32-bit kernel has this problem; the 64-bit
kernel does not.

However, Atom-based systems do not have it, even though they are installed
with a 32-bit kernel.

-Original Message-
From: Johannes Hirte [mailto:johannes.hi...@fem.tu-ilmenau.de] 
Sent: Tuesday, March 08, 2011 6:12 AM
To: Maria Wikström
Cc: Mitch Harder; Chris Mason; Xin Zhong; Zhong, Xin; linux-btrfs
Subject: Re: [PATCH] btrfs file write debugging patch

On Monday 07 March 2011 20:56:50 Maria Wikström wrote:
 Mon 2011-03-07 at 00:07 -0600, Mitch Harder wrote:
  On Sun, Mar 6, 2011 at 6:58 PM, Chris Mason chris.ma...@oracle.com 
wrote:
   Excerpts from Chris Mason's message of 2011-03-06 13:00:27 -0500:
   Excerpts from Mitch Harder's message of 2011-03-05 11:50:14 -0500:
I've constructed a test patch that is currently addressing all the
issues on my system.

The portion of Openmotif that was having issues with page faults
works correctly with this patch, and gcc-4.4.5 builds without
issue.

I extracted only the portion of the first patch that corrects the
handling of dirty_pages when copied==0, and incorporated the second
patch that falls back to one-page-at-a-time if there are troubles
with page faults.
   
   Just to make sure I understand, could you please post the full
   combined path that was giving you trouble with gcc?  We do need to
   make sure the pages are properly up to date if we fall back to
   partial writes.
   
   Ok, I was able to reproduce this easily with fsx.  The problem is that
   I wasn't making sure the last partial page in the write was up to date
   when it was also the first page in the write.
  
   Here is the updated patch, it has all the fixes we've found so far:
  This latest patch that Chris has sent out fixes the issues I've been
  encountering.
  
  I can build gcc-4.4.5 without problems.
  
  Also, the portion of Openmotif that was having issues with page faults
  is working correctly.
  
  Let me know if you still would like to see the path names for the
  portions of the gcc-4.4.5 build that were giving me issues.  I didn't
  save that information, but I can regenerate it.  But it sounds like
  it's irrelevant now.
 
 With the patch I can compile libgcrypt without any problem, so it solves
 my problems too.

I can confirm this. And the bug seems to be hardware-related: on my Pentium 4
system it was 100% reproducible, while on my Atom-based system I couldn't
trigger it.

Re: [PATCH] btrfs file write debugging patch

2011-03-06 Thread Chris Mason
Excerpts from Mitch Harder's message of 2011-03-05 11:50:14 -0500:
 I've constructed a test patch that is currently addressing all the
 issues on my system.
 
 The portion of Openmotif that was having issues with page faults works
 correctly with this patch, and gcc-4.4.5 builds without issue.
 
 I extracted only the portion of the first patch that corrects the
 handling of dirty_pages when copied==0, and incorporated the second
 patch that falls back to one-page-at-a-time if there are troubles with
 page faults.

Just to make sure I understand, could you please post the full combined
path that was giving you trouble with gcc?  We do need to make sure the
pages are properly up to date if we fall back to partial writes.

-chris
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs file write debugging patch

2011-03-06 Thread Chris Mason
Excerpts from Chris Mason's message of 2011-03-06 13:00:27 -0500:
 Excerpts from Mitch Harder's message of 2011-03-05 11:50:14 -0500:
  I've constructed a test patch that is currently addressing all the
  issues on my system.
  
  The portion of Openmotif that was having issues with page faults works
  correctly with this patch, and gcc-4.4.5 builds without issue.
  
  I extracted only the portion of the first patch that corrects the
  handling of dirty_pages when copied==0, and incorporated the second
  patch that falls back to one-page-at-a-time if there are troubles with
  page faults.
 
 Just to make sure I understand, could you please post the full combined
 path that was giving you trouble with gcc?  We do need to make sure the
 pages are properly up to date if we fall back to partial writes.

Ok, I was able to reproduce this easily with fsx.  The problem is that I
wasn't making sure the last partial page in the write was up to date
when it was also the first page in the write.

Here is the updated patch, it has all the fixes we've found so far:

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 7084140..5986ac7 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -763,6 +763,27 @@ out:
 }
 
 /*
+ * on error we return an unlocked page and the error value
+ * on success we return a locked page and 0
+ */
+static int prepare_uptodate_page(struct page *page, u64 pos)
+{
+   int ret = 0;
+
+   if ((pos & (PAGE_CACHE_SIZE - 1)) && !PageUptodate(page)) {
+   ret = btrfs_readpage(NULL, page);
+   if (ret)
+   return ret;
+   lock_page(page);
+   if (!PageUptodate(page)) {
+   unlock_page(page);
+   return -EIO;
+   }
+   }
+   return 0;
+}
+
+/*
  * this gets pages into the page cache and locks them down, it also properly
  * waits for data=ordered extents to finish before allowing the pages to be
  * modified.
@@ -777,6 +798,7 @@ static noinline int prepare_pages(struct btrfs_root *root, 
struct file *file,
unsigned long index = pos >> PAGE_CACHE_SHIFT;
struct inode *inode = fdentry(file)->d_inode;
int err = 0;
+   int faili = 0;
u64 start_pos;
u64 last_pos;
 
@@ -794,15 +816,24 @@ again:
for (i = 0; i < num_pages; i++) {
pages[i] = grab_cache_page(inode->i_mapping, index + i);
if (!pages[i]) {
-   int c;
-   for (c = i - 1; c >= 0; c--) {
-   unlock_page(pages[c]);
-   page_cache_release(pages[c]);
-   }
-   return -ENOMEM;
+   faili = i - 1;
+   err = -ENOMEM;
+   goto fail;
+   }
+
+   if (i == 0)
+   err = prepare_uptodate_page(pages[i], pos);
+   if (i == num_pages - 1)
+   err = prepare_uptodate_page(pages[i],
+   pos + write_bytes);
+   if (err) {
+   page_cache_release(pages[i]);
+   faili = i - 1;
+   goto fail;
}
wait_on_page_writeback(pages[i]);
}
+   err = 0;
if (start_pos < inode->i_size) {
struct btrfs_ordered_extent *ordered;
lock_extent_bits(&BTRFS_I(inode)->io_tree,
@@ -842,6 +873,14 @@ again:
WARN_ON(!PageLocked(pages[i]));
}
return 0;
+fail:
+   while (faili >= 0) {
+   unlock_page(pages[faili]);
+   page_cache_release(pages[faili]);
+   faili--;
+   }
+   return err;
+
 }
 
 static ssize_t btrfs_file_aio_write(struct kiocb *iocb,
@@ -851,7 +890,6 @@ static ssize_t btrfs_file_aio_write(struct kiocb *iocb,
struct file *file = iocb->ki_filp;
struct inode *inode = fdentry(file)->d_inode;
struct btrfs_root *root = BTRFS_I(inode)->root;
-   struct page *pinned[2];
struct page **pages = NULL;
struct iov_iter i;
loff_t *ppos = &iocb->ki_pos;
@@ -872,9 +910,6 @@ static ssize_t btrfs_file_aio_write(struct kiocb *iocb,
will_write = ((file->f_flags & O_DSYNC) || IS_SYNC(inode) ||
  (file->f_flags & O_DIRECT));
 
-   pinned[0] = NULL;
-   pinned[1] = NULL;
-
start_pos = pos;
 
vfs_check_frozen(inode->i_sb, SB_FREEZE_WRITE);
@@ -962,32 +997,6 @@ static ssize_t btrfs_file_aio_write(struct kiocb *iocb,
first_index = pos >> PAGE_CACHE_SHIFT;
last_index = (pos + iov_iter_count(&i)) >> PAGE_CACHE_SHIFT;
 
-   /*
-* there are lots of better ways to do this, but this code
-* makes sure the first and last page in the file range are
-* up to date and ready for cow
-*/
-   if ((pos & (PAGE_CACHE_SIZE - 1))) {
-   

Re: [PATCH] btrfs file write debugging patch

2011-03-06 Thread Mitch Harder
On Sun, Mar 6, 2011 at 6:58 PM, Chris Mason chris.ma...@oracle.com wrote:
 Excerpts from Chris Mason's message of 2011-03-06 13:00:27 -0500:
 Excerpts from Mitch Harder's message of 2011-03-05 11:50:14 -0500:
  I've constructed a test patch that is currently addressing all the
  issues on my system.
 
  The portion of Openmotif that was having issues with page faults works
  correctly with this patch, and gcc-4.4.5 builds without issue.
 
  I extracted only the portion of the first patch that corrects the
  handling of dirty_pages when copied==0, and incorporated the second
  patch that falls back to one-page-at-a-time if there are troubles with
  page faults.

 Just to make sure I understand, could you please post the full combined
 path that was giving you trouble with gcc?  We do need to make sure the
 pages are properly up to date if we fall back to partial writes.

 Ok, I was able to reproduce this easily with fsx.  The problem is that I
 wasn't making sure the last partial page in the write was up to date
 when it was also the first page in the write.

 Here is the updated patch, it has all the fixes we've found so far:


This latest patch that Chris has sent out fixes the issues I've been
encountering.

I can build gcc-4.4.5 without problems.

Also, the portion of Openmotif that was having issues with page faults
is working correctly.

Let me know if you still would like to see the path names for the
portions of the gcc-4.4.5 build that were giving me issues.  I didn't
save that information, but I can regenerate it.  But it sounds like
it's irrelevant now.


RE: [PATCH] btrfs file write debugging patch

2011-03-06 Thread Zhong, Xin
That's great! :-)

-Original Message-
From: Mitch Harder [mailto:mitch.har...@sabayonlinux.org] 
Sent: Monday, March 07, 2011 2:08 PM
To: Chris Mason
Cc: Xin Zhong; Zhong, Xin; linux-btrfs
Subject: Re: [PATCH] btrfs file write debugging patch

On Sun, Mar 6, 2011 at 6:58 PM, Chris Mason chris.ma...@oracle.com wrote:
 Excerpts from Chris Mason's message of 2011-03-06 13:00:27 -0500:
 Excerpts from Mitch Harder's message of 2011-03-05 11:50:14 -0500:
  I've constructed a test patch that is currently addressing all the
  issues on my system.
 
  The portion of Openmotif that was having issues with page faults works
  correctly with this patch, and gcc-4.4.5 builds without issue.
 
  I extracted only the portion of the first patch that corrects the
  handling of dirty_pages when copied==0, and incorporated the second
  patch that falls back to one-page-at-a-time if there are troubles with
  page faults.

 Just to make sure I understand, could you please post the full combined
 path that was giving you trouble with gcc?  We do need to make sure the
 pages are properly up to date if we fall back to partial writes.

 Ok, I was able to reproduce this easily with fsx.  The problem is that I
 wasn't making sure the last partial page in the write was up to date
 when it was also the first page in the write.

 Here is the updated patch, it has all the fixes we've found so far:


This latest patch that Chris has sent out fixes the issues I've been
encountering.

I can build gcc-4.4.5 without problems.

Also, the portion of Openmotif that was having issues with page faults
is working correctly.

Let me know if you still would like to see the path names for the
portions of the gcc-4.4.5 build that were giving me issues.  I didn't
save that information, but I can regenerate it.  But it sounds like
it's irrelevant now.


Re: [PATCH] btrfs file write debugging patch

2011-03-05 Thread Mitch Harder
2011/3/4 Xin Zhong thierryzh...@hotmail.com:

 So it works well now with the two patches from Chris on your system. Am I 
 right?


No.

I am getting errors building gcc-4.4.5 with the two patches from Chris.


Re: [PATCH] btrfs file write debugging patch

2011-03-05 Thread Mitch Harder
I've constructed a test patch that is currently addressing all the
issues on my system.

The portion of Openmotif that was having issues with page faults works
correctly with this patch, and gcc-4.4.5 builds without issue.

I extracted only the portion of the first patch that corrects the
handling of dirty_pages when copied==0, and incorporated the second
patch that falls back to one-page-at-a-time if there are troubles with
page faults.

--- fs/btrfs/file.c 2011-03-05 07:34:43.025131607 -0600
+++ /usr/src/linux/fs/btrfs/file.c  2011-03-05 07:41:45.001260294 -0600
@@ -1023,8 +1023,20 @@

copied = btrfs_copy_from_user(pos, num_pages,
   write_bytes, pages, &i);
-   dirty_pages = (copied + offset + PAGE_CACHE_SIZE - 1) >>
-   PAGE_CACHE_SHIFT;
+
+   /*
+* if we have trouble faulting in the pages, fall
+* back to one page at a time
+*/
+   if (copied < write_bytes)
+   nrptrs = 1;
+
+   if (copied == 0)
+   dirty_pages = 0;
+   else
+   dirty_pages = (copied + offset +
+  PAGE_CACHE_SIZE - 1) >>
+  PAGE_CACHE_SHIFT;

if (num_pages > dirty_pages) {
if (copied > 0)


Re: [PATCH] btrfs file write debugging patch

2011-03-05 Thread Mitch Harder
On Thu, Mar 3, 2011 at 8:41 PM, Josef Bacik jo...@redhat.com wrote:
 On Fri, Mar 04, 2011 at 10:42:46AM +0800, Zhong, Xin wrote:
 Where can I find your patch? Thanks!


 It's in my btrfs-work git tree, it's based on the latest git pull from linus 
 so
 you can just pull it onto a linus tree and you should be good to go.  The
 specific patch is

 Btrfs: simplify our write path

 Thanks,

 Josef


Josef:

I've been testing the kernel from your git tree (as of commit f192
Btrfs: add a comment explaining what btrfs_cont_expand does).

I am still running into issues when running the portion of Openmotif
that has been generating the problem with page faults when given an
unaligned address.


RE: [PATCH] btrfs file write debugging patch

2011-03-05 Thread Xin Zhong

I think Josef's patch only addresses the one-by-one page processing issue,
but does not address the issue that dirty_pages should be set to 0 when
copied is 0.


 Date: Sat, 5 Mar 2011 10:56:52 -0600
 Subject: Re: [PATCH] btrfs file write debugging patch
 From: mitch.har...@sabayonlinux.org
 To: jo...@redhat.com
 CC: xin.zh...@intel.com; chris.ma...@oracle.com; thierryzh...@hotmail.com; 
 linux-btrfs@vger.kernel.org

 On Thu, Mar 3, 2011 at 8:41 PM, Josef Bacik  wrote:
  On Fri, Mar 04, 2011 at 10:42:46AM +0800, Zhong, Xin wrote:
  Where can I find your patch? Thanks!
 
 
  It's in my btrfs-work git tree, it's based on the latest git pull from 
  linus so
  you can just pull it onto a linus tree and you should be good to go.  The
  specific patch is
 
  Btrfs: simplify our write path
 
  Thanks,
 
  Josef
 

 Josef:

 I've been testing the kernel from your git tree (as of commit f192
 Btrfs: add a comment explaining what btrfs_cont_expand does).

 I am still running into issues when running the portion of Openmotif
 that has been generating the problem with page faults when given an
 unaligned address.
  


RE: [PATCH] btrfs file write debugging patch

2011-03-04 Thread Zhong, Xin
Looks good to me. Thanks! 

Another mysterious thing is that this problem can only be recreated on x86
32-bit systems. I cannot recreate it on an x86_64 system using my test case.
Does anyone have any idea about it?

-Original Message-
From: Josef Bacik [mailto:jo...@redhat.com] 
Sent: Friday, March 04, 2011 10:41 AM
To: Zhong, Xin
Cc: Josef Bacik; Chris Mason; Mitch Harder; Xin Zhong; 
linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] btrfs file write debugging patch

On Fri, Mar 04, 2011 at 10:42:46AM +0800, Zhong, Xin wrote:
 Where can I find your patch? Thanks!


It's in my btrfs-work git tree, it's based on the latest git pull from linus so
you can just pull it onto a linus tree and you should be good to go.  The
specific patch is

Btrfs: simplify our write path

Thanks,

Josef


RE: [PATCH] btrfs file write debugging patch

2011-03-04 Thread Chris Mason
Excerpts from Chris Mason's message of 2011-03-03 20:51:55 -0500:
 Excerpts from Zhong, Xin's message of 2011-03-02 05:58:49 -0500:
  It seems that if we give an unaligned address to btrfs write and the buffer
  resides on more than 2 pages, it will trigger this bug.
  If we give an aligned address to btrfs write, it works well no matter how
  many pages are given.
  
  I used ftrace to observe it. It seems iov_iter_fault_in_readable does not
  trigger pagefault handling when the address is not aligned. I do not quite
  understand the reason behind it. But the solution should be to process the
  pages one by one, which is also what the generic file write routine does.
  
  Any suggestions are welcome. Thanks!
 
 Great job guys.  I'm using this on top of my debugging patch.  It passes
 the unaligned test but I'll give it a real run tonight and look for
 other problems.
 
 (This is almost entirely untested, please don't use it quite yet)

 
 -chris
 
 diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
 index 89a6a26..6a44add 100644
 --- a/fs/btrfs/file.c
 +++ b/fs/btrfs/file.c
 @@ -1039,6 +1038,14 @@ static ssize_t btrfs_file_aio_write(struct kiocb *iocb,
  
 copied = btrfs_copy_from_user(pos, num_pages,
write_bytes, pages, &i);
 +
 +/*
 + * if we have trouble faulting in the pages, fall
 + * back to one page at a time
 + */
 +if (copied < write_bytes)
 +nrptrs = 1;
 +
  if (copied == 0)
  dirty_pages = 0;
  else

Ok, this is working well for me.  Anyone see any problems with it?


RE: [PATCH] btrfs file write debugging patch

2011-03-04 Thread Xin Zhong

It works well for me too. 


 From: chris.ma...@oracle.com
 To: chris.ma...@oracle.com
 CC: xin.zh...@intel.com; mitch.har...@sabayonlinux.org; 
 thierryzh...@hotmail.com; linux-btrfs@vger.kernel.org
 Subject: RE: [PATCH] btrfs file write debugging patch
 Date: Fri, 4 Mar 2011 07:19:39 -0500

 Excerpts from Chris Mason's message of 2011-03-03 20:51:55 -0500:
  Excerpts from Zhong, Xin's message of 2011-03-02 05:58:49 -0500:
    It seems that if we give an unaligned address to btrfs write and the
    buffer resides on more than 2 pages, it will trigger this bug.
    If we give an aligned address to btrfs write, it works well no matter how
    many pages are given.
   
    I used ftrace to observe it. It seems iov_iter_fault_in_readable does not
    trigger pagefault handling when the address is not aligned. I do not
    quite understand the reason behind it. But the solution should be to
    process the pages one by one, which is also what the generic file write
    routine does.
   
    Any suggestions are welcome. Thanks!
 
  Great job guys. I'm using this on top of my debugging patch. It passes
  the unaligned test but I'll give it a real run tonight and look for
  other problems.
 
  (This is almost entirely untested, please don't use it quite yet)

 
  -chris
 
  diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
  index 89a6a26..6a44add 100644
  --- a/fs/btrfs/file.c
  +++ b/fs/btrfs/file.c
  @@ -1039,6 +1038,14 @@ static ssize_t btrfs_file_aio_write(struct kiocb 
  *iocb,
 
  copied = btrfs_copy_from_user(pos, num_pages,
  write_bytes, pages, &i);
  +
  + /*
  + * if we have trouble faulting in the pages, fall
  + * back to one page at a time
  + */
  + if (copied < write_bytes)
  + nrptrs = 1;
  +
  if (copied == 0)
  dirty_pages = 0;
  else

 Ok, this is working well for me. Anyone see any problems with it?


Re: [PATCH] btrfs file write debugging patch

2011-03-04 Thread Mitch Harder
2011/3/4 Xin Zhong thierryzh...@hotmail.com:

 It works well for me too.

 
 From: chris.ma...@oracle.com
 To: chris.ma...@oracle.com
 CC: xin.zh...@intel.com; mitch.har...@sabayonlinux.org; 
 thierryzh...@hotmail.com; linux-btrfs@vger.kernel.org
 Subject: RE: [PATCH] btrfs file write debugging patch
 Date: Fri, 4 Mar 2011 07:19:39 -0500

 Excerpts from Chris Mason's message of 2011-03-03 20:51:55 -0500:
  Excerpts from Zhong, Xin's message of 2011-03-02 05:58:49 -0500:
    It seems that if we give an unaligned address to btrfs write and the
    buffer resides on more than 2 pages, it will trigger this bug.
    If we give an aligned address to btrfs write, it works well no matter
    how many pages are given.
   
    I used ftrace to observe it. It seems iov_iter_fault_in_readable does not
    trigger pagefault handling when the address is not aligned. I do not
    quite understand the reason behind it. But the solution should be to
    process the pages one by one, which is also what the generic file write
    routine does.
   
    Any suggestions are welcome. Thanks!
 
  Great job guys. I'm using this on top of my debugging patch. It passes
  the unaligned test but I'll give it a real run tonight and look for
  other problems.
 
  (This is almost entirely untested, please don't use it quite yet)

 
  -chris
 
  diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
  index 89a6a26..6a44add 100644
  --- a/fs/btrfs/file.c
  +++ b/fs/btrfs/file.c
  @@ -1039,6 +1038,14 @@ static ssize_t btrfs_file_aio_write(struct kiocb 
  *iocb,
 
  copied = btrfs_copy_from_user(pos, num_pages,
  write_bytes, pages, &i);
  +
  + /*
  + * if we have trouble faulting in the pages, fall
  + * back to one page at a time
  + */
  + if (copied < write_bytes)
  + nrptrs = 1;
  +
  if (copied == 0)
  dirty_pages = 0;
  else

 Ok, this is working well for me. Anyone see any problems with it?



I've applied this patch on top of the debugging patch at the head of
the thread, and I'm having trouble building gcc now.

When building gcc-4.4.5, I get errors like the following:

Comparing stages 2 and 3
Bootstrap comparison failure!
./cp/call.o differs
./cp/decl.o differs
./cp/pt.o differs
./cp/class.o differs
./cp/decl2.o differs
snip.
./matrix-reorg.o differs
./tree-inline.o differs
./gcc.o differs
./gcc-options.o differs
make[2]: *** [compare] Error 1
make[1]: *** [stage3-bubble] Error 2
make: *** [bootstrap-lean] Error 2
emake failed

I went back and rebuilt my kernel without these two debugging
patches, and gcc-4.4.5 builds without error on that kernel.

I haven't yet tested building gcc-4.4.5 with just the debugging patch
at the head of the thread, so I'll test that, and report back.

But I was wondering if anybody else can replicate this issue.

BTW, I've been doing most of my testing on an x86 system.  My x86_64
systems haven't had as much trouble, but I haven't been robustly
checking my x86_64 systems for these issues.

I noticed that page fault handling is different by architecture.


Re: [PATCH] btrfs file write debugging patch

2011-03-04 Thread Mitch Harder
On Fri, Mar 4, 2011 at 9:33 AM, Mitch Harder
mitch.har...@sabayonlinux.org wrote:
 2011/3/4 Xin Zhong thierryzh...@hotmail.com:

 It works well for me too.

 
 From: chris.ma...@oracle.com
 To: chris.ma...@oracle.com
 CC: xin.zh...@intel.com; mitch.har...@sabayonlinux.org; 
 thierryzh...@hotmail.com; linux-btrfs@vger.kernel.org
 Subject: RE: [PATCH] btrfs file write debugging patch
 Date: Fri, 4 Mar 2011 07:19:39 -0500

 Excerpts from Chris Mason's message of 2011-03-03 20:51:55 -0500:
  Excerpts from Zhong, Xin's message of 2011-03-02 05:58:49 -0500:
    It seems that if we give an unaligned address to btrfs write and the
    buffer resides on more than 2 pages, it will trigger this bug.
    If we give an aligned address to btrfs write, it works well no matter
    how many pages are given.
   
    I used ftrace to observe it. It seems iov_iter_fault_in_readable does not
    trigger pagefault handling when the address is not aligned. I do not
    quite understand the reason behind it. But the solution should be to
    process the pages one by one, which is also what the generic file write
    routine does.
   
    Any suggestions are welcome. Thanks!
 
  Great job guys. I'm using this on top of my debugging patch. It passes
  the unaligned test but I'll give it a real run tonight and look for
  other problems.
 
  (This is almost entirely untested, please don't use it quite yet)

 
  -chris
 
  diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
  index 89a6a26..6a44add 100644
  --- a/fs/btrfs/file.c
  +++ b/fs/btrfs/file.c
  @@ -1039,6 +1038,14 @@ static ssize_t btrfs_file_aio_write(struct kiocb 
  *iocb,
 
  copied = btrfs_copy_from_user(pos, num_pages,
  write_bytes, pages, &i);
  +
  + /*
  + * if we have trouble faulting in the pages, fall
  + * back to one page at a time
  + */
  + if (copied < write_bytes)
  + nrptrs = 1;
  +
  if (copied == 0)
  dirty_pages = 0;
  else

 Ok, this is working well for me. Anyone see any problems with it?



 I've applied this patch on top of the debugging patch at the head of
 the thread, and I'm having trouble building gcc now.

 When building gcc-4.4.5, I get errors like the following:

 Comparing stages 2 and 3
 Bootstrap comparison failure!
 ./cp/call.o differs
 ./cp/decl.o differs
 ./cp/pt.o differs
 ./cp/class.o differs
 ./cp/decl2.o differs
 snip.
 ./matrix-reorg.o differs
 ./tree-inline.o differs
 ./gcc.o differs
 ./gcc-options.o differs
 make[2]: *** [compare] Error 1
 make[1]: *** [stage3-bubble] Error 2
 make: *** [bootstrap-lean] Error 2
 emake failed

 I went back and rebuilt my kernel without these two debugging
 patches, and gcc-4.4.5 builds without error on that kernel.

 I haven't yet tested building gcc-4.4.5 with just the debugging patch
 at the head of the thread, so I'll test that, and report back.

 But I was wondering if anybody else can replicate this issue.

 BTW, I've been doing most of my testing on an x86 system.  My x86_64
 systems haven't had as much trouble, but I haven't been robustly
 checking my x86_64 systems for these issues.

 I noticed that page fault handling is different by architecture.


Some followup...

 I'm encountering this issue with "Bootstrap comparison failure!" in a
gcc-4.4.5 build when only the patch at the head of the thread is
applied (leaving the recent patch to limit pages to one-by-one on page
fault out).

I just hadn't run across this issue until I started playing with
patches to limit the pages to one-by-one on page fault errors.

So it may not be associated with the last patch.


RE: [PATCH] btrfs file write debugging patch

2011-03-04 Thread Xin Zhong

So it works well now with the two patches from Chris on your system. Am I right?


 Date: Fri, 4 Mar 2011 11:21:36 -0600
 Subject: Re: [PATCH] btrfs file write debugging patch
 From: mitch.har...@sabayonlinux.org
 To: thierryzh...@hotmail.com
 CC: chris.ma...@oracle.com; xin.zh...@intel.com; linux-btrfs@vger.kernel.org

 On Fri, Mar 4, 2011 at 9:33 AM, Mitch Harder
  wrote:
  2011/3/4 Xin Zhong :
 
  It works well for me too.
 
  
  From: chris.ma...@oracle.com
  To: chris.ma...@oracle.com
  CC: xin.zh...@intel.com; mitch.har...@sabayonlinux.org; 
  thierryzh...@hotmail.com; linux-btrfs@vger.kernel.org
  Subject: RE: [PATCH] btrfs file write debugging patch
  Date: Fri, 4 Mar 2011 07:19:39 -0500
 
  Excerpts from Chris Mason's message of 2011-03-03 20:51:55 -0500:
   Excerpts from Zhong, Xin's message of 2011-03-02 05:58:49 -0500:
It seems that if we give an unaligned address to btrfs write and the
buffer resides on more than 2 pages, it will trigger this bug.
If we give an aligned address to btrfs write, it works well no matter
how many pages are given.
   
I use ftrace to observe it. It seems iov_iter_fault_in_readable does
not trigger pagefault handling when the address is not aligned. I do
not quite understand the reason behind it. But the solution should be
to process the pages one by one. And that's also what the generic file
write routine does.
   
Any suggestions are welcome. Thanks!
  
   Great job guys. I'm using this on top of my debugging patch. It passes
   the unaligned test but I'll give it a real run tonight and look for
   other problems.
  
   (This is almost entirely untested, please don't use it quite yet)
 
  
   -chris
  
   diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
   index 89a6a26..6a44add 100644
   --- a/fs/btrfs/file.c
   +++ b/fs/btrfs/file.c
   @@ -1039,6 +1038,14 @@ static ssize_t btrfs_file_aio_write(struct kiocb *iocb,
  
   copied = btrfs_copy_from_user(pos, num_pages,
   write_bytes, pages, i);
   +
   + /*
   + * if we have trouble faulting in the pages, fall
   + * back to one page at a time
   + */
    + if (copied < write_bytes)
   + nrptrs = 1;
   +
   if (copied == 0)
   dirty_pages = 0;
   else
 
  Ok, this is working well for me. Anyone see any problems with it?
 
 
 
  I've applied this patch on top of the debugging patch at the head of
  the thread, and I'm having trouble building gcc now.
 
  When building gcc-4.4.5, I get errors like the following:
 
  Comparing stages 2 and 3
  Bootstrap comparison failure!
  ./cp/call.o differs
  ./cp/decl.o differs
  ./cp/pt.o differs
  ./cp/class.o differs
  ./cp/decl2.o differs
  <snip>
  ./matrix-reorg.o differs
  ./tree-inline.o differs
  ./gcc.o differs
  ./gcc-options.o differs
  make[2]: *** [compare] Error 1
  make[1]: *** [stage3-bubble] Error 2
  make: *** [bootstrap-lean] Error 2
  emake failed
 
  I've gone back and rebuilt my kernel without these two debugging
  patches, and gcc-4.4.5 builds without error on that kernel.
 
  I haven't yet tested building gcc-4.4.5 with just the debugging patch
  at the head of the thread, so I'll test that, and report back.
 
  But I was wondering if anybody else can replicate this issue.
 
  BTW, I've been doing most of my testing on an x86 system.  My x86_64
  systems haven't had as much trouble, but I haven't been robustly
  checking my x86_64 systems for these issues.
 
  I noticed that page fault handling is different by architecture.
 

 Some followup...

  I'm encountering this issue with "Bootstrap comparison failure!" in a
 gcc-4.4.5 build when only the patch at the head of the thread is
 applied (leaving the recent patch to limit pages to one-by-one on page
 fault out).

 I just hadn't run across this issue until I started playing with
 patches to limit the pages to one-by-one on page fault errors.

 So it may not be associated with the last patch.


RE: [PATCH] btrfs file write debugging patch

2011-03-03 Thread Chris Mason
Excerpts from Zhong, Xin's message of 2011-03-02 05:58:49 -0500:
 I downloaded openmotif and run the command as Mitch mentioned and was able to 
 recreate the problem locally. And I managed to simplify the command into a 
 very simple program which can capture the problem easily. See below code:
 
 #include <sys/types.h>
 #include <sys/stat.h>
 #include <fcntl.h>
 #include <unistd.h>
 #include <stdlib.h>
 static char a[4096*3];
 int main()
 {
 int fd = open("out", O_WRONLY|O_CREAT|O_TRUNC, 0666);
 write(fd, a+1, 4096*2);
 exit(0);
 }
 
 It seems that if we give an unaligned address to btrfs write and the buffer
 resides on more than 2 pages, it will trigger this bug.
 If we give an aligned address to btrfs write, it works well no matter how
 many pages are given.
 
 I use ftrace to observe it. It seems iov_iter_fault_in_readable does not
 trigger pagefault handling when the address is not aligned. I do not quite
 understand the reason behind it. But the solution should be to process the
 pages one by one. And that's also what the generic file write routine does.
 
 Any suggestions are welcome. Thanks!

Great job guys.  I'm using this on top of my debugging patch.  It passes
the unaligned test but I'll give it a real run tonight and look for
other problems.

(This is almost entirely untested, please don't use it quite yet)

-chris

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 89a6a26..6a44add 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1039,6 +1038,14 @@ static ssize_t btrfs_file_aio_write(struct kiocb *iocb,
 
copied = btrfs_copy_from_user(pos, num_pages,
   write_bytes, pages, i);
+
+   /*
+* if we have trouble faulting in the pages, fall
+* back to one page at a time
+*/
+   if (copied < write_bytes)
+   nrptrs = 1;
+
if (copied == 0)
dirty_pages = 0;
else
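[Editorial note: the fallback this hunk adds can be modelled in user space. The sketch below is illustrative only — fake_copy stands in for a faulting btrfs_copy_from_user, and nrptrs/write_bytes merely mirror the names in the patch; this is not the kernel code. After a short copy, the window shrinks to one page, so the loop still makes forward progress instead of retrying the same multi-page window forever.]

```c
#include <string.h>

enum { PG = 4096 };  /* stand-in for PAGE_CACHE_SIZE */

/* fake copy_from_user: copies at most 'limit' bytes to simulate a fault */
static size_t fake_copy(char *dst, const char *src, size_t n, size_t limit)
{
    size_t done = n < limit ? n : limit;
    memcpy(dst, src, done);
    return done;
}

/* Model of the retry loop; returns the number of iterations needed.
 * 'fault_at' caps the first copy to simulate a mid-copy page fault. */
static int copy_loop(char *dst, const char *src, size_t total, size_t fault_at)
{
    size_t pos = 0;
    size_t nrptrs = 4;            /* start with a multi-page window */
    int iters = 0;

    while (pos < total) {
        size_t write_bytes = total - pos;
        size_t copied;

        if (write_bytes > nrptrs * PG)
            write_bytes = nrptrs * PG;

        copied = fake_copy(dst + pos, src + pos, write_bytes,
                           fault_at ? fault_at : write_bytes);
        fault_at = 0;             /* fault only once */

        if (copied < write_bytes)
            nrptrs = 1;           /* the patch's fallback */
        pos += copied;
        iters++;
        if (copied == 0)          /* don't model the pre-patch hang */
            break;
    }
    return iters;
}
```

With a 12288-byte buffer and one simulated fault after 100 bytes, the loop completes in four rounds (one short multi-page round, then page-sized rounds) and the data arrives intact.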


Re: [PATCH] btrfs file write debugging patch

2011-03-03 Thread Josef Bacik
On Thu, Mar 03, 2011 at 08:51:55PM -0500, Chris Mason wrote:
 Excerpts from Zhong, Xin's message of 2011-03-02 05:58:49 -0500:
  I downloaded openmotif and run the command as Mitch mentioned and was able 
  to recreate the problem locally. And I managed to simplify the command into 
  a very simple program which can capture the problem easily. See below code:
  
  #include <sys/types.h>
  #include <sys/stat.h>
  #include <fcntl.h>
  #include <unistd.h>
  #include <stdlib.h>
  static char a[4096*3];
  int main()
  {
  int fd = open("out", O_WRONLY|O_CREAT|O_TRUNC, 0666);
  write(fd, a+1, 4096*2);
  exit(0);
  }
  
  It seems that if we give an unaligned address to btrfs write and the buffer
  resides on more than 2 pages, it will trigger this bug.
  If we give an aligned address to btrfs write, it works well no matter how
  many pages are given.
  
  I use ftrace to observe it. It seems iov_iter_fault_in_readable does not
  trigger pagefault handling when the address is not aligned. I do not quite
  understand the reason behind it. But the solution should be to process the
  pages one by one. And that's also what the generic file write routine does.
  
  Any suggestions are welcome. Thanks!
 
 Great job guys.  I'm using this on top of my debugging patch.  It passes
 the unaligned test but I'll give it a real run tonight and look for
 other problems.
 
 (This is almost entirely untested, please don't use it quite yet)
 
 -chris
 
 diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
 index 89a6a26..6a44add 100644
 --- a/fs/btrfs/file.c
 +++ b/fs/btrfs/file.c
 @@ -1039,6 +1038,14 @@ static ssize_t btrfs_file_aio_write(struct kiocb *iocb,
  
   copied = btrfs_copy_from_user(pos, num_pages,
  write_bytes, pages, i);
 +
 + /*
 +  * if we have trouble faulting in the pages, fall
 +  * back to one page at a time
 +  */
 + if (copied < write_bytes)
 + nrptrs = 1;
 +
   if (copied == 0)
   dirty_pages = 0;
   else

Btw, this situation is taken care of in my write path rewrite patch: if copied ==
0 we switch to one segment at a time.  Thanks,

Josef


RE: [PATCH] btrfs file write debugging patch

2011-03-03 Thread Zhong, Xin
Where can I find your patch? Thanks!

-Original Message-
From: Josef Bacik [mailto:jo...@redhat.com] 
Sent: Friday, March 04, 2011 10:32 AM
To: Chris Mason
Cc: Zhong, Xin; Mitch Harder; Xin Zhong; linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] btrfs file write debugging patch

On Thu, Mar 03, 2011 at 08:51:55PM -0500, Chris Mason wrote:
 Excerpts from Zhong, Xin's message of 2011-03-02 05:58:49 -0500:
  I downloaded openmotif and run the command as Mitch mentioned and was able 
  to recreate the problem locally. And I managed to simplify the command into 
  a very simple program which can capture the problem easily. See below code:
  
  #include <sys/types.h>
  #include <sys/stat.h>
  #include <fcntl.h>
  #include <unistd.h>
  #include <stdlib.h>
  static char a[4096*3];
  int main()
  {
  int fd = open("out", O_WRONLY|O_CREAT|O_TRUNC, 0666);
  write(fd, a+1, 4096*2);
  exit(0);
  }
  
  It seems that if we give an unaligned address to btrfs write and the buffer
  resides on more than 2 pages, it will trigger this bug.
  If we give an aligned address to btrfs write, it works well no matter how
  many pages are given.
  
  I use ftrace to observe it. It seems iov_iter_fault_in_readable does not
  trigger pagefault handling when the address is not aligned. I do not quite
  understand the reason behind it. But the solution should be to process the
  pages one by one. And that's also what the generic file write routine does.
  
  Any suggestions are welcome. Thanks!
 
 Great job guys.  I'm using this on top of my debugging patch.  It passes
 the unaligned test but I'll give it a real run tonight and look for
 other problems.
 
 (This is almost entirely untested, please don't use it quite yet)
 
 -chris
 
 diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
 index 89a6a26..6a44add 100644
 --- a/fs/btrfs/file.c
 +++ b/fs/btrfs/file.c
 @@ -1039,6 +1038,14 @@ static ssize_t btrfs_file_aio_write(struct kiocb *iocb,
  
   copied = btrfs_copy_from_user(pos, num_pages,
  write_bytes, pages, i);
 +
 + /*
 +  * if we have trouble faulting in the pages, fall
 +  * back to one page at a time
 +  */
 + if (copied < write_bytes)
 + nrptrs = 1;
 +
   if (copied == 0)
   dirty_pages = 0;
   else

Btw, this situation is taken care of in my write path rewrite patch: if copied ==
0 we switch to one segment at a time.  Thanks,

Josef


Re: [PATCH] btrfs file write debugging patch

2011-03-03 Thread Josef Bacik
On Fri, Mar 04, 2011 at 10:42:46AM +0800, Zhong, Xin wrote:
 Where can I find your patch? Thanks!


It's in my btrfs-work git tree; it's based on the latest git pull from Linus, so
you can just pull it onto a Linus tree and you should be good to go.  The
specific patch is

Btrfs: simplify our write path

Thanks,

Josef


RE: [PATCH] btrfs file write debugging patch

2011-03-02 Thread Xin Zhong

Sorry, I forgot to mention that you need to undo the commit below in btrfs-unstable
to recreate the problem:
Btrfs: fix fiemap bugs with delalloc (+224/-42)
Otherwise, it will run into an enospc error. I am not sure if it's the same
problem.


 From: xin.zh...@intel.com
 To: mitch.har...@sabayonlinux.org; thierryzh...@hotmail.com
 CC: linux-btrfs@vger.kernel.org
 Date: Wed, 2 Mar 2011 18:58:49 +0800
 Subject: RE: [PATCH] btrfs file write debugging patch

 I downloaded openmotif and run the command as Mitch mentioned and was able to 
 recreate the problem locally. And I managed to simplify the command into a 
 very simple program which can capture the problem easily. See below code:

 #include <sys/types.h>
 #include <sys/stat.h>
 #include <fcntl.h>
 #include <unistd.h>
 #include <stdlib.h>
 static char a[4096*3];
 int main()
 {
 int fd = open("out", O_WRONLY|O_CREAT|O_TRUNC, 0666);
 write(fd, a+1, 4096*2);
 exit(0);
 }

 It seems that if we give an unaligned address to btrfs write and the buffer
 resides on more than 2 pages, it will trigger this bug.
 If we give an aligned address to btrfs write, it works well no matter how
 many pages are given.

 I use ftrace to observe it. It seems iov_iter_fault_in_readable does not
 trigger pagefault handling when the address is not aligned. I do not quite
 understand the reason behind it. But the solution should be to process the
 pages one by one. And that's also what the generic file write routine does.

 Any suggestions are welcome. Thanks!

 -Original Message-
 From: linux-btrfs-ow...@vger.kernel.org 
 [mailto:linux-btrfs-ow...@vger.kernel.org] On Behalf Of Mitch Harder
 Sent: Wednesday, March 02, 2011 5:09 AM
 To: Xin Zhong
 Cc: linux-btrfs@vger.kernel.org
 Subject: Re: [PATCH] btrfs file write debugging patch

 2011/3/1 Xin Zhong :
 
  Hi, Mitch
   I think you can config ftrace to just trace function calls of btrfs.ko
   which will save a lot of trace buffer space. See below command:
   #echo ':mod:btrfs' > /sys/kernel/debug/tracing/set_ftrace_filter
   And please send out the full ftrace log again.
 
  Another helpful information might be the strace log of the wmldbcreate 
  process. It will show us the io pattern of this command.
  Thanks a lot for your help!
 
 

 I manually ran an strace around the build command (wmldbcreate) that is 
 causing my problem, and I am attaching the strace for that.

 Please note that wmldbcreate does not seem to care when an error is returned, 
 and continues on. So the error is occurring somewhat silently in the middle, 
 and isn't the last item. The error is probably associated with one of the 
 12288 byte writes.

 I have re-run an ftrace following the conditions above, and have hosted that 
 file (~1.1MB compressed) on my local server at:

 http://dontpanic.dyndns.org/trace-openmotif-btrfs-v15.gz

 Please note I am still using some debugging modifications of my own to file.c.

 They serve the purpose of:
 (1) Avoiding an infinite loop by identifying when the problem is occurring,
 and exiting with error after 256 loops.
 (2) Stopping the trace after exiting to keep from flooding the ftrace buffer.
 (3) Provide debugging comments (all prefaced with TPK: in the trace).

 Let me know if you want me to change any of the conditions.


RE: [PATCH] btrfs file write debugging patch

2011-03-01 Thread Zhong, Xin
Is your system running out of memory or is there any other thread like 
flush-btrfs competing for the same page?

I can only see one process in your ftrace log. You may need to trace all 
btrfs.ko function calls instead of a single process. Thanks!

-Original Message-
From: linux-btrfs-ow...@vger.kernel.org 
[mailto:linux-btrfs-ow...@vger.kernel.org] On Behalf Of Mitch Harder
Sent: Tuesday, March 01, 2011 4:20 AM
To: Maria Wikström
Cc: Josef Bacik; Johannes Hirte; Chris Mason; Zhong, Xin; 
linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] btrfs file write debugging patch

2011/2/28 Mitch Harder mitch.har...@sabayonlinux.org:
 2011/2/28 Maria Wikström ma...@ponstudios.se:
 mån 2011-02-28 klockan 11:10 -0500 skrev Josef Bacik:
 On Mon, Feb 28, 2011 at 11:13:59AM +0100, Johannes Hirte wrote:
  On Monday 28 February 2011 02:46:05 Chris Mason wrote:
   Excerpts from Mitch Harder's message of 2011-02-25 13:43:37 -0500:
Some clarification on my previous message...
   
After looking at my ftrace log more closely, I can see where Btrfs is
trying to release the allocated pages.  However, the calculation for
the number of dirty_pages is equal to 1 when copied == 0.
   
So I'm seeing at least two problems:
(1)  It keeps looping when copied == 0.
(2)  One dirty page is not being released on every loop even though
copied == 0 (at least this problem keeps it from being an infinite
loop by eventually exhausting reserveable space on the disk).
  
   Hi everyone,
  
    There are actually two bugs here.  First the one that Mitch hit, and a
   second one that still results in bad file_write results with my
   debugging hunks (the first two hunks below) in place.
  
   My patch fixes Mitch's bug by checking for copied == 0 after
    btrfs_copy_from_user and doing the correct delalloc accounting.  This
   one looks solved, but you'll notice the patch is bigger.
  
    First, I add some random failures to btrfs_copy_from_user() by failing
    every once in a while.  This was much more reliable than trying to
    use memory pressure to make copy_from_user fail.
  
   If copy_from_user fails and we partially update a page, we end up with a
   page that may go away due to memory pressure.  But, btrfs_file_write
   assumes that only the first and last page may have good data that needs
   to be read off the disk.
  
   This patch ditches that code and puts it into prepare_pages instead.
   But I'm still having some errors during long stress.sh runs.  Ideas are
   more than welcome, hopefully some other timezones will kick in ideas
   while I sleep.
 
  At least it doesn't fix the emerge problem for me. The behavior is now
  the same as with 2.6.38-rc3. It needs an 'emerge --oneshot
  dev-libs/libgcrypt' with no further interaction to make the emerge
  process hang with an svn process consuming 100% CPU. I can cancel the
  emerge process with ctrl-c but the spawned svn process stays and it
  needs a reboot to get rid of it.

 Can you cat /proc/$pid/wchan a few times so we can get an idea of where it's
 looping?  Thanks,

 Josef

 It behaves the same way here with btrfs-unstable.
 The output of cat /proc/$pid/wchan is 0.

 // Maria






 I've applied the patch at the head of this thread (with the jiffies
 debugging commented out) and I'm attaching an ftrace using the
 function_graph tracer when I'm stuck in the loop.  I've just snipped
 out a couple of the loops (the full trace file is quite large, and
 mostly repetitious).

 I'm going to try to modify file.c with some trace_printk debugging to
 show the values of several of the relevant variables at various
 stages.

 I'm going to try to exit the loop after 256 tries with an EFAULT so I
 can stop the tracing at that point and capture a trace of the entry
 into the problem (the ftrace ring buffer fills up too fast for me to
 capture the entry point).


As promised, I've put together a modified file.c with many trace_printk
debugging statements to augment the ftrace.

The trace is ~128K compressed (about 31,600 lines or 2.6MB
uncompressed), so I'm putting it up on my local server instead of
attaching.  Let me know if it would be more appropriate to send to the
list as an attachment.

http://dontpanic.dyndns.org/ftrace-btrfs-file-write-debug-v2.gz

I preface all my trace_printk comments with TPK: to make skipping
through the trace easier.

The trace contains the trace of about 3 or 4 successful passes through
the btrfs_file_aio_write() function to show what a successful trace
looks like.

The pass through the btrfs_file_aio_write() that breaks begins on line 1088.

I let it loop through the while (iov_iter_count(i)  0) {} loop for
256 times when copied==0 (otherwise it would loop infinitely).  Then
exit out and stop the trace.

For reference

RE: [PATCH] btrfs file write debugging patch

2011-03-01 Thread Zhong, Xin
Hi Mitch,

I suspect there's a lock contention between flush-btrfs (lock_delalloc_pages) 
and btrfs_file_aio_write. However I can not recreate it locally. Could you 
please try below patch? Thanks!

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 65338a1..b9d0929 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1007,17 +1007,16 @@ static ssize_t btrfs_file_aio_write(struct kiocb *iocb,
goto out;
}
 
-   ret = btrfs_delalloc_reserve_space(inode,
-   num_pages << PAGE_CACHE_SHIFT);
-   if (ret)
-   goto out;
-
ret = prepare_pages(root, file, pages, num_pages,
pos, first_index, last_index,
write_bytes);
-   if (ret) {
-   btrfs_delalloc_release_space(inode,
+   if (ret)
+   goto out;
+   
+   ret = btrfs_delalloc_reserve_space(inode,
num_pages << PAGE_CACHE_SHIFT);
+   if (ret) {
+   btrfs_drop_pages(pages, num_pages);
goto out;
}
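[Editorial note: the reorder in this test patch can be modelled in user space. The sketch below is illustrative only — struct state and write_round are hypothetical names, not kernel code. It shows the unwind rule the patch relies on: prepare pages first, reserve space second, and on a reservation failure drop only the pages, the resource the successful step acquired.]

```c
#include <stdbool.h>

/* Counters standing in for held page references and reserved bytes. */
struct state { int pages_held; int bytes_reserved; };

/* One write round: acquire pages, then reservation; unwind on failure.
 * Returns true only if both acquisitions succeeded. */
static bool write_round(struct state *s, bool prepare_ok, bool reserve_ok)
{
    if (!prepare_ok)
        return false;             /* nothing acquired yet, nothing to undo */
    s->pages_held++;              /* prepare_pages() succeeded */
    if (!reserve_ok) {
        s->pages_held--;          /* btrfs_drop_pages() on the error path */
        return false;
    }
    s->bytes_reserved++;          /* btrfs_delalloc_reserve_space() */
    /* ... copy, dirty, and eventually release both on the success path ... */
    s->pages_held--;
    s->bytes_reserved--;
    return true;
}
```

Whichever branch is taken, both counters return to zero — i.e. no leaked reservation and no leaked page reference.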


-Original Message-
From: linux-btrfs-ow...@vger.kernel.org 
[mailto:linux-btrfs-ow...@vger.kernel.org] On Behalf Of Zhong, Xin
Sent: Tuesday, March 01, 2011 6:15 PM
To: Mitch Harder; Maria Wikström
Cc: Josef Bacik; Johannes Hirte; Chris Mason; linux-btrfs@vger.kernel.org
Subject: RE: [PATCH] btrfs file write debugging patch

Is your system running out of memory or is there any other thread like 
flush-btrfs competing for the same page?

I can only see one process in your ftrace log. You may need to trace all 
btrfs.ko function calls instead of a single process. Thanks!

-Original Message-
From: linux-btrfs-ow...@vger.kernel.org 
[mailto:linux-btrfs-ow...@vger.kernel.org] On Behalf Of Mitch Harder
Sent: Tuesday, March 01, 2011 4:20 AM
To: Maria Wikström
Cc: Josef Bacik; Johannes Hirte; Chris Mason; Zhong, Xin; 
linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] btrfs file write debugging patch

2011/2/28 Mitch Harder mitch.har...@sabayonlinux.org:
 2011/2/28 Maria Wikström ma...@ponstudios.se:
 mån 2011-02-28 klockan 11:10 -0500 skrev Josef Bacik:
 On Mon, Feb 28, 2011 at 11:13:59AM +0100, Johannes Hirte wrote:
  On Monday 28 February 2011 02:46:05 Chris Mason wrote:
   Excerpts from Mitch Harder's message of 2011-02-25 13:43:37 -0500:
Some clarification on my previous message...
   
After looking at my ftrace log more closely, I can see where Btrfs is
trying to release the allocated pages.  However, the calculation for
the number of dirty_pages is equal to 1 when copied == 0.
   
So I'm seeing at least two problems:
(1)  It keeps looping when copied == 0.
(2)  One dirty page is not being released on every loop even though
copied == 0 (at least this problem keeps it from being an infinite
loop by eventually exhausting reserveable space on the disk).
  
   Hi everyone,
  
    There are actually two bugs here.  First the one that Mitch hit, and a
   second one that still results in bad file_write results with my
   debugging hunks (the first two hunks below) in place.
  
   My patch fixes Mitch's bug by checking for copied == 0 after
    btrfs_copy_from_user and doing the correct delalloc accounting.  This
   one looks solved, but you'll notice the patch is bigger.
  
    First, I add some random failures to btrfs_copy_from_user() by failing
    every once in a while.  This was much more reliable than trying to
    use memory pressure to make copy_from_user fail.
  
   If copy_from_user fails and we partially update a page, we end up with a
   page that may go away due to memory pressure.  But, btrfs_file_write
   assumes that only the first and last page may have good data that needs
   to be read off the disk.
  
   This patch ditches that code and puts it into prepare_pages instead.
   But I'm still having some errors during long stress.sh runs.  Ideas are
   more than welcome, hopefully some other timezones will kick in ideas
   while I sleep.
 
  At least it doesn't fix the emerge problem for me. The behavior is now
  the same as with 2.6.38-rc3. It needs an 'emerge --oneshot
  dev-libs/libgcrypt' with no further interaction to make the emerge
  process hang with an svn process consuming 100% CPU. I can cancel the
  emerge process with ctrl-c but the spawned svn process stays and it
  needs a reboot to get rid of it.

 Can you cat /proc/$pid/wchan a few times so we can get an idea of where it's
 looping?  Thanks,

 Josef

 It behaves the same way here with btrfs-unstable.
 The output of cat /proc/$pid/wchan is 0.

 // Maria


Re: [PATCH] btrfs file write debugging patch

2011-03-01 Thread Mitch Harder
On Tue, Mar 1, 2011 at 4:14 AM, Zhong, Xin xin.zh...@intel.com wrote:
 Is your system running out of memory or is there any other thread like 
 flush-btrfs competing for the same page?


There's no sign of memory pressure.  Although I only have 1 GB in this
box, I'm still show ~1/2 GB RAM free during this build.  There's no
swap space allocated, and nothing in dmesg that indicates there's a
transient spike of RAM pressure.

 I can only see one process in your ftrace log. You may need to trace all 
 btrfs.ko function calls instead of a single process. Thanks!


That ftrace.log was run with ftrace defaults for a function trace.  It
should collect calls from the whole system.

For the sake of consistency, I am intentionally trying to ensure that
very few other things are going on at the same time as this build.
And I'm building with -j1 so things will happen the same way each
time.

Also, I supplied just the tail end of the trace log.  The full log
shows a few of the other build processes leading up to the problem,
but the ftrace ring buffer fills up surprisingly fast.  Even with a
50MB ring buffer for ftrace, I usually collect less than 1 second of
information when something busy like a build is going on.

Let me know if you'd like to see the full log.  It's bigger, but I can
find someplace to put it.

But I'm pretty sure that wmldbcreate is the only thing that is going
on when the breakage occurs.  Otherwise I wouldn't get such consistent
breakage in the same spot every time.


Re: [PATCH] btrfs file write debugging patch

2011-03-01 Thread Mitch Harder
On Tue, Mar 1, 2011 at 5:56 AM, Zhong, Xin xin.zh...@intel.com wrote:
 Hi Mitch,

 I suspect there's a lock contention between flush-btrfs 
 (lock_delalloc_pages) and btrfs_file_aio_write. However I can not recreate 
 it locally. Could you please try below patch? Thanks!

 diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
 index 65338a1..b9d0929 100644
 --- a/fs/btrfs/file.c
 +++ b/fs/btrfs/file.c
 @@ -1007,17 +1007,16 @@ static ssize_t btrfs_file_aio_write(struct kiocb *iocb,
                        goto out;
                }

 -               ret = btrfs_delalloc_reserve_space(inode,
 -                                       num_pages << PAGE_CACHE_SHIFT);
 -               if (ret)
 -                       goto out;
 -
                ret = prepare_pages(root, file, pages, num_pages,
                                    pos, first_index, last_index,
                                    write_bytes);
 -               if (ret) {
 -                       btrfs_delalloc_release_space(inode,
 +               if (ret)
 +                       goto out;
 +
 +               ret = btrfs_delalloc_reserve_space(inode,
                                        num_pages << PAGE_CACHE_SHIFT);
 +               if (ret) {
 +                       btrfs_drop_pages(pages, num_pages);
                        goto out;
                }



Thanks.

I've tested this patch, but the build is still failing at the same
point as before.


Re: [PATCH] btrfs file write debugging patch

2011-03-01 Thread Xin Zhong

Hi, Mitch
I think you can config ftrace to just trace function calls of btrfs.ko which 
will save a lot of trace buffer space. See below command:
#echo ':mod:btrfs' > /sys/kernel/debug/tracing/set_ftrace_filter
And please send out the full ftrace log again.

Another helpful information might be the strace log of the wmldbcreate process. 
It will show us the io pattern of this command.
Thanks a lot for your help!
  


Re: [PATCH] btrfs file write debugging patch

2011-03-01 Thread Mitch Harder
2011/3/1 Xin Zhong thierryzh...@hotmail.com:

 Hi, Mitch
 I think you can config ftrace to just trace function calls of btrfs.ko which 
 will save a lot of trace buffer space. See below command:
 #echo ':mod:btrfs' > /sys/kernel/debug/tracing/set_ftrace_filter
 And please send out the full ftrace log again.

 Another helpful information might be the strace log of the wmldbcreate 
 process. It will show us the io pattern of this command.
 Thanks a lot for your help!



I manually ran an strace around the build command (wmldbcreate) that
is causing my problem, and I am attaching the strace for that.

Please note that wmldbcreate does not seem to care when an error is
returned, and continues on.  So the error is occurring somewhat
silently in the middle, and isn't the last item.  The error is
probably associated with one of the 12288 byte writes.

I have re-run an ftrace following the conditions above, and have
hosted that file (~1.1MB compressed) on my local server at:

http://dontpanic.dyndns.org/trace-openmotif-btrfs-v15.gz

Please note I am still using some debugging modifications of my own to file.c.

They serve the purpose of:
(1) Avoiding an infinite loop by identifying when the problem is
occurring, and exiting with error after 256 loops.
(2) Stopping the trace after exiting to keep from flooding the ftrace buffer.
(3) Provide debugging comments (all prefaced with TPK: in the trace).

Let me know if you want me to change any of the conditions.


wmldbcreate-strace.gz
Description: GNU Zip compressed data


Re: [PATCH] btrfs file write debugging patch

2011-03-01 Thread Piotr Szymaniak
On Mon, Feb 28, 2011 at 02:20:22PM -0600, Mitch Harder wrote:
 As promised, I've put together a modified file.c with many trace_printk
 debugging statements to augment the ftrace.
 *snip*

Just my two cents. I've applied the patch from Chris Mason (Sun, 27 Feb
2011 20:46:05 -0500) and this one from Mitch (Mon, 28 Feb 2011 14:20:22
-0600) on top of vanilla 2.6.38-rc6 and it seems that it resolves my
issues with hanging `svn info' during libgcrypt emerge.

Piotr Szymaniak.
-- 
 - (...) I can't imagine what that mountain of meat could give you that
I couldn't offer.  Apart, of course, from fifty pounds of overgrown
muscle.
 - Maybe it's exactly the overgrown muscles that attract me. (...) After
all, many men are attracted to overgrown fatty breast tissue.
  -- Graham Masterton, The Wells of Hell


pgp0s4aN9vbmU.pgp
Description: PGP signature


RE: [PATCH] btrfs file write debugging patch

2011-02-28 Thread Zhong, Xin
One possible issue I can see is in random failure case #2, where
copy_from_user only processes half of the data.

For example, if it writes a 4k-aligned page and copy_from_user only
writes 2k, then it will not call btrfs_delalloc_release_space, since
num_pages and dirty_pages are both 1.
In the next round, it writes another 2k, and btrfs_delalloc_reserve_space
is called twice for the same page.

Is it a problem? Thanks!

-Original Message-
From: Chris Mason [mailto:chris.ma...@oracle.com] 
Sent: Monday, February 28, 2011 9:46 AM
To: Mitch Harder
Cc: Maria Wikström; Zhong, Xin; Johannes Hirte; linux-btrfs@vger.kernel.org
Subject: [PATCH] btrfs file write debugging patch

Excerpts from Mitch Harder's message of 2011-02-25 13:43:37 -0500:
 Some clarification on my previous message...
 
 After looking at my ftrace log more closely, I can see where Btrfs is
 trying to release the allocated pages.  However, the calculation for
 the number of dirty_pages is equal to 1 when copied == 0.
 
 So I'm seeing at least two problems:
 (1)  It keeps looping when copied == 0.
 (2)  One dirty page is not being released on every loop even though
 copied == 0 (at least this problem keeps it from being an infinite
 loop by eventually exhausting reserveable space on the disk).

Hi everyone,

There are actually two bugs here.  First the one that Mitch hit, and a
second one that still results in bad file_write results with my
debugging hunks (the first two hunks below) in place.

My patch fixes Mitch's bug by checking for copied == 0 after
btrfs_copy_from_user and doing the correct delalloc accounting.  This
one looks solved, but you'll notice the patch is bigger.

First, I add some random failures to btrfs_copy_from_user() by failing
every once in a while.  This was much more reliable than trying to
use memory pressure to make copy_from_user fail.

If copy_from_user fails and we partially update a page, we end up with a
page that may go away due to memory pressure.  But, btrfs_file_write
assumes that only the first and last page may have good data that needs
to be read off the disk.

This patch ditches that code and puts it into prepare_pages instead.
But I'm still having some errors during long stress.sh runs.  Ideas are
more than welcome, hopefully some other timezones will kick in ideas
while I sleep.

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 7084140..89a6a26 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -54,6 +54,13 @@ static noinline int btrfs_copy_from_user(loff_t pos, int num_pages,
int offset = pos & (PAGE_CACHE_SIZE - 1);
int total_copied = 0;
 
+   if ((jiffies % 10) == 0)
+   return 0;
+
+   if ((jiffies % 25) == 0) {
+   write_bytes /= 2;
+   }
+
while (write_bytes > 0) {
size_t count = min_t(size_t,
 PAGE_CACHE_SIZE - offset, write_bytes);
@@ -763,6 +770,26 @@ out:
 }
 
 /*
+ * on error we return an unlocked page and the error value
+ * on success we return a locked page and 0
+ */
+static int prepare_uptodate_page(struct page *page, u64 pos)
+{
+   int ret = 0;
+   if ((pos & (PAGE_CACHE_SIZE - 1)) && !PageUptodate(page)) {
+   ret = btrfs_readpage(NULL, page);
+   if (ret)
+   return ret;
+   lock_page(page);
+   if (!PageUptodate(page)) {
+   unlock_page(page);
+   return -EIO;
+   }
+   }
+   return 0;
+}
+
+/*
  * this gets pages into the page cache and locks them down, it also properly
  * waits for data=ordered extents to finish before allowing the pages to be
  * modified.
@@ -777,6 +804,7 @@ static noinline int prepare_pages(struct btrfs_root *root, struct file *file,
unsigned long index = pos >> PAGE_CACHE_SHIFT;
struct inode *inode = fdentry(file)->d_inode;
int err = 0;
+   int faili = 0;
u64 start_pos;
u64 last_pos;
 
@@ -794,15 +822,24 @@ again:
for (i = 0; i < num_pages; i++) {
pages[i] = grab_cache_page(inode->i_mapping, index + i);
if (!pages[i]) {
-   int c;
-   for (c = i - 1; c >= 0; c--) {
-   unlock_page(pages[c]);
-   page_cache_release(pages[c]);
-   }
-   return -ENOMEM;
+   faili = i - 1;
+   err = -ENOMEM;
+   goto fail;
+   }
+
+   if (i == 0)
+   err = prepare_uptodate_page(pages[i], pos);
+   else if (i == num_pages - 1)
+   err = prepare_uptodate_page(pages[i],
+   pos + write_bytes);
+   if (err) {
+   page_cache_release(pages[i]);
+   faili = i - 1;

Re: [PATCH] btrfs file write debugging patch

2011-02-28 Thread Johannes Hirte
On Monday 28 February 2011 02:46:05 Chris Mason wrote:
 Excerpts from Mitch Harder's message of 2011-02-25 13:43:37 -0500:
  Some clarification on my previous message...
  
  After looking at my ftrace log more closely, I can see where Btrfs is
  trying to release the allocated pages.  However, the calculation for
  the number of dirty_pages is equal to 1 when copied == 0.
  
  So I'm seeing at least two problems:
  (1)  It keeps looping when copied == 0.
  (2)  One dirty page is not being released on every loop even though
  copied == 0 (at least this problem keeps it from being an infinite
  loop by eventually exhausting reserveable space on the disk).
 
 Hi everyone,
 
 There are actually two bugs here.  First the one that Mitch hit, and a
 second one that still results in bad file_write results with my
 debugging hunks (the first two hunks below) in place.
 
 My patch fixes Mitch's bug by checking for copied == 0 after
 btrfs_copy_from_user and doing the correct delalloc accounting.  This
 one looks solved, but you'll notice the patch is bigger.
 
 First, I add some random failures to btrfs_copy_from_user() by failing
 every once in a while.  This was much more reliable than trying to
 use memory pressure to make copy_from_user fail.
 
 If copy_from_user fails and we partially update a page, we end up with a
 page that may go away due to memory pressure.  But, btrfs_file_write
 assumes that only the first and last page may have good data that needs
 to be read off the disk.
 
 This patch ditches that code and puts it into prepare_pages instead.
 But I'm still having some errors during long stress.sh runs.  Ideas are
 more than welcome, hopefully some other timezones will kick in ideas
 while I sleep.

At least it doesn't fix the emerge problem for me.  The behavior is now the same 
as with 2.6.38-rc3.  It only takes an 'emerge --oneshot dev-libs/libgcrypt' with no 
further interaction to make the emerge process hang, with a svn process 
consuming 100% CPU.  I can cancel the emerge process with ctrl-c, but the 
spawned svn process stays, and it needs a reboot to get rid of it.


Re: [PATCH] btrfs file write debugging patch

2011-02-28 Thread Chris Mason
Excerpts from Johannes Hirte's message of 2011-02-28 05:13:59 -0500:
 On Monday 28 February 2011 02:46:05 Chris Mason wrote:
  Excerpts from Mitch Harder's message of 2011-02-25 13:43:37 -0500:
   Some clarification on my previous message...
   
   After looking at my ftrace log more closely, I can see where Btrfs is
   trying to release the allocated pages.  However, the calculation for
   the number of dirty_pages is equal to 1 when copied == 0.
   
   So I'm seeing at least two problems:
   (1)  It keeps looping when copied == 0.
   (2)  One dirty page is not being released on every loop even though
   copied == 0 (at least this problem keeps it from being an infinite
   loop by eventually exhausting reserveable space on the disk).
  
  Hi everyone,
  
  There are actually two bugs here.  First the one that Mitch hit, and a
  second one that still results in bad file_write results with my
  debugging hunks (the first two hunks below) in place.
  
  My patch fixes Mitch's bug by checking for copied == 0 after
  btrfs_copy_from_user and doing the correct delalloc accounting.  This
  one looks solved, but you'll notice the patch is bigger.
  
  First, I add some random failures to btrfs_copy_from_user() by failing
  every once in a while.  This was much more reliable than trying to
  use memory pressure to make copy_from_user fail.
  
  If copy_from_user fails and we partially update a page, we end up with a
  page that may go away due to memory pressure.  But, btrfs_file_write
  assumes that only the first and last page may have good data that needs
  to be read off the disk.
  
  This patch ditches that code and puts it into prepare_pages instead.
  But I'm still having some errors during long stress.sh runs.  Ideas are
  more than welcome, hopefully some other timezones will kick in ideas
  while I sleep.
 
 At least it doesn't fix the emerge-problem for me. The behavior is now the 
 same 
 as with 2.6.38-rc3. It needs a 'emerge --oneshot dev-libs/libgcrypt' with no 
 further interaction to get the emerge-process hang with a svn-process 
 consuming 100% CPU. I can cancel the emerge-process with ctrl-c but the 
 spawned svn-process stays and it needs a reboot to get rid of it. 

I think your problem really is more ENOSPC-related.  Still working on
that as well.  But please don't try the patch without removing the
debugging hunk at the top (anything that mentions jiffies).

-chris


Re: [PATCH] btrfs file write debugging patch

2011-02-28 Thread Josef Bacik
On Mon, Feb 28, 2011 at 11:13:59AM +0100, Johannes Hirte wrote:
 On Monday 28 February 2011 02:46:05 Chris Mason wrote:
  Excerpts from Mitch Harder's message of 2011-02-25 13:43:37 -0500:
   Some clarification on my previous message...
   
   After looking at my ftrace log more closely, I can see where Btrfs is
   trying to release the allocated pages.  However, the calculation for
   the number of dirty_pages is equal to 1 when copied == 0.
   
   So I'm seeing at least two problems:
   (1)  It keeps looping when copied == 0.
   (2)  One dirty page is not being released on every loop even though
   copied == 0 (at least this problem keeps it from being an infinite
   loop by eventually exhausting reserveable space on the disk).
  
  Hi everyone,
  
  There are actually two bugs here.  First the one that Mitch hit, and a
  second one that still results in bad file_write results with my
  debugging hunks (the first two hunks below) in place.
  
  My patch fixes Mitch's bug by checking for copied == 0 after
  btrfs_copy_from_user and doing the correct delalloc accounting.  This
  one looks solved, but you'll notice the patch is bigger.
  
  First, I add some random failures to btrfs_copy_from_user() by failing
  every once in a while.  This was much more reliable than trying to
  use memory pressure to make copy_from_user fail.
  
  If copy_from_user fails and we partially update a page, we end up with a
  page that may go away due to memory pressure.  But, btrfs_file_write
  assumes that only the first and last page may have good data that needs
  to be read off the disk.
  
  This patch ditches that code and puts it into prepare_pages instead.
  But I'm still having some errors during long stress.sh runs.  Ideas are
  more than welcome, hopefully some other timezones will kick in ideas
  while I sleep.
 
 At least it doesn't fix the emerge-problem for me. The behavior is now the 
 same 
 as with 2.6.38-rc3. It needs a 'emerge --oneshot dev-libs/libgcrypt' with no 
 further interaction to get the emerge-process hang with a svn-process 
 consuming 100% CPU. I can cancel the emerge-process with ctrl-c but the 
 spawned svn-process stays and it needs a reboot to get rid of it. 

Can you cat /proc/$pid/wchan a few times so we can get an idea of where it's
looping?  Thanks,

Josef


Re: [PATCH] btrfs file write debugging patch

2011-02-28 Thread Mitch Harder
2011/2/28 Maria Wikström ma...@ponstudios.se:
 mån 2011-02-28 klockan 11:10 -0500 skrev Josef Bacik:
 On Mon, Feb 28, 2011 at 11:13:59AM +0100, Johannes Hirte wrote:
  On Monday 28 February 2011 02:46:05 Chris Mason wrote:
   Excerpts from Mitch Harder's message of 2011-02-25 13:43:37 -0500:
Some clarification on my previous message...
   
After looking at my ftrace log more closely, I can see where Btrfs is
trying to release the allocated pages.  However, the calculation for
the number of dirty_pages is equal to 1 when copied == 0.
   
So I'm seeing at least two problems:
(1)  It keeps looping when copied == 0.
(2)  One dirty page is not being released on every loop even though
copied == 0 (at least this problem keeps it from being an infinite
loop by eventually exhausting reserveable space on the disk).
  
   Hi everyone,
  
   There are actually two bugs here.  First the one that Mitch hit, and a
   second one that still results in bad file_write results with my
   debugging hunks (the first two hunks below) in place.
  
   My patch fixes Mitch's bug by checking for copied == 0 after
   btrfs_copy_from_user and doing the correct delalloc accounting.  This
   one looks solved, but you'll notice the patch is bigger.
  
   First, I add some random failures to btrfs_copy_from_user() by failing
   every once in a while.  This was much more reliable than trying to
   use memory pressure to make copy_from_user fail.
  
   If copy_from_user fails and we partially update a page, we end up with a
   page that may go away due to memory pressure.  But, btrfs_file_write
   assumes that only the first and last page may have good data that needs
   to be read off the disk.
  
   This patch ditches that code and puts it into prepare_pages instead.
   But I'm still having some errors during long stress.sh runs.  Ideas are
   more than welcome, hopefully some other timezones will kick in ideas
   while I sleep.
 
  At least it doesn't fix the emerge-problem for me. The behavior is now the 
  same
  as with 2.6.38-rc3. It needs a 'emerge --oneshot dev-libs/libgcrypt' with 
  no
  further interaction to get the emerge-process hang with a svn-process
  consuming 100% CPU. I can cancel the emerge-process with ctrl-c but the
  spawned svn-process stays and it needs a reboot to get rid of it.

 Can you cat /proc/$pid/wchan a few times so we can get an idea of where it's
 looping?  Thanks,

 Josef

 It behaves the same way here with btrfs-unstable.
 The output of cat /proc/$pid/wchan is 0.

 // Maria






I've applied the patch at the head of this thread (with the jiffies
debugging commented out) and I'm attaching an ftrace using the
function_graph tracer when I'm stuck in the loop.  I've just snipped
out a couple of the loops (the full trace file is quite large, and
mostly repetitious).

I'm going to try to modify file.c with some trace_printk debugging to
show the values of several of the relevant variables at various
stages.

I'm going to try to exit the loop after 256 tries with an EFAULT so I
can stop the tracing at that point and capture a trace of the entry
into the problem (the ftrace ring buffer fills up too fast for me to
capture the entry point).


ftrace-btrfs-file-write-debugging.gz
Description: GNU Zip compressed data


Re: [PATCH] btrfs file write debugging patch

2011-02-28 Thread Mitch Harder
2011/2/28 Mitch Harder mitch.har...@sabayonlinux.org:
 2011/2/28 Maria Wikström ma...@ponstudios.se:
 mån 2011-02-28 klockan 11:10 -0500 skrev Josef Bacik:
 On Mon, Feb 28, 2011 at 11:13:59AM +0100, Johannes Hirte wrote:
  On Monday 28 February 2011 02:46:05 Chris Mason wrote:
   Excerpts from Mitch Harder's message of 2011-02-25 13:43:37 -0500:
Some clarification on my previous message...
   
After looking at my ftrace log more closely, I can see where Btrfs is
trying to release the allocated pages.  However, the calculation for
the number of dirty_pages is equal to 1 when copied == 0.
   
So I'm seeing at least two problems:
(1)  It keeps looping when copied == 0.
(2)  One dirty page is not being released on every loop even though
copied == 0 (at least this problem keeps it from being an infinite
loop by eventually exhausting reserveable space on the disk).
  
   Hi everyone,
  
   There are actually two bugs here.  First the one that Mitch hit, and a
   second one that still results in bad file_write results with my
   debugging hunks (the first two hunks below) in place.
  
   My patch fixes Mitch's bug by checking for copied == 0 after
   btrfs_copy_from_user and doing the correct delalloc accounting.  This
   one looks solved, but you'll notice the patch is bigger.
  
   First, I add some random failures to btrfs_copy_from_user() by failing
   every once in a while.  This was much more reliable than trying to
   use memory pressure to make copy_from_user fail.
  
   If copy_from_user fails and we partially update a page, we end up with a
   page that may go away due to memory pressure.  But, btrfs_file_write
   assumes that only the first and last page may have good data that needs
   to be read off the disk.
  
   This patch ditches that code and puts it into prepare_pages instead.
   But I'm still having some errors during long stress.sh runs.  Ideas are
   more than welcome, hopefully some other timezones will kick in ideas
   while I sleep.
 
  At least it doesn't fix the emerge-problem for me. The behavior is now 
  the same
  as with 2.6.38-rc3. It needs a 'emerge --oneshot dev-libs/libgcrypt' with 
  no
  further interaction to get the emerge-process hang with a svn-process
  consuming 100% CPU. I can cancel the emerge-process with ctrl-c but the
  spawned svn-process stays and it needs a reboot to get rid of it.

 Can you cat /proc/$pid/wchan a few times so we can get an idea of where it's
 looping?  Thanks,

 Josef

 It behaves the same way here with btrfs-unstable.
 The output of cat /proc/$pid/wchan is 0.

 // Maria






 I've applied the patch at the head of this thread (with the jiffies
 debugging commented out) and I'm attaching an ftrace using the
 function_graph tracer when I'm stuck in the loop.  I've just snipped
 out a couple of the loops (the full trace file is quite large, and
 mostly repetitious).

 I'm going to try to modify file.c with some trace_printk debugging to
 show the values of several of the relevant variables at various
 stages.

 I'm going to try to exit the loop after 256 tries with an EFAULT so I
 can stop the tracing at that point and capture a trace of the entry
 into the problem (the ftrace ring buffer fills up too fast for me to
 capture the entry point).


As promised, I've put together a modified file.c with many trace_printk
debugging statements to augment the ftrace.

The trace is ~128K compressed (about 31,600 lines or 2.6MB
uncompressed), so I'm putting it up on my local server instead of
attaching.  Let me know if it would be more appropriate to send to the
list as an attachment.

http://dontpanic.dyndns.org/ftrace-btrfs-file-write-debug-v2.gz

I preface all my trace_printk comments with TPK: to make skipping
through the trace easier.

The trace contains about 3 or 4 successful passes through the
btrfs_file_aio_write() function to show what a successful pass
looks like.

The pass through the btrfs_file_aio_write() that breaks begins on line 1088.

I let it loop through the while (iov_iter_count(i) > 0) {} loop
256 times when copied == 0 (otherwise it would loop infinitely), then
exit out and stop the trace.

For reference, here's a diff file for the debugging statements I've
added to file.c:

Let me know if you would like me to re-run this trial with different
debugging lines.

 fs/btrfs/file.c /usr/src/linux/fs/btrfs/file.c
--- fs/btrfs/file.c 2011-02-28 10:13:45.0 -0600
+++ /usr/src/linux/fs/btrfs/file.c  2011-02-28 13:07:11.0 -0600
@@ -53,12 +53,14 @@
int offset = pos & (PAGE_CACHE_SIZE - 1);
int total_copied = 0;

+   /***
if ((jiffies % 10) == 0)
return 0;

if ((jiffies % 25) == 0) {

Re: [PATCH] btrfs file write debugging patch

2011-02-28 Thread Mitch Harder
2011/2/28 Mitch Harder mitch.har...@sabayonlinux.org:
 2011/2/28 Mitch Harder mitch.har...@sabayonlinux.org:
 2011/2/28 Maria Wikström ma...@ponstudios.se:
 mån 2011-02-28 klockan 11:10 -0500 skrev Josef Bacik:
 On Mon, Feb 28, 2011 at 11:13:59AM +0100, Johannes Hirte wrote:
  On Monday 28 February 2011 02:46:05 Chris Mason wrote:
   Excerpts from Mitch Harder's message of 2011-02-25 13:43:37 -0500:
Some clarification on my previous message...
   
After looking at my ftrace log more closely, I can see where Btrfs is
trying to release the allocated pages.  However, the calculation for
the number of dirty_pages is equal to 1 when copied == 0.
   
So I'm seeing at least two problems:
(1)  It keeps looping when copied == 0.
(2)  One dirty page is not being released on every loop even though
copied == 0 (at least this problem keeps it from being an infinite
loop by eventually exhausting reserveable space on the disk).
  
   Hi everyone,
  
   There are actually two bugs here.  First the one that Mitch hit, and a
   second one that still results in bad file_write results with my
   debugging hunks (the first two hunks below) in place.
  
   My patch fixes Mitch's bug by checking for copied == 0 after
   btrfs_copy_from_user and doing the correct delalloc accounting.  This
   one looks solved, but you'll notice the patch is bigger.
  
   First, I add some random failures to btrfs_copy_from_user() by failing
   every once in a while.  This was much more reliable than trying to
   use memory pressure to make copy_from_user fail.
  
   If copy_from_user fails and we partially update a page, we end up with 
   a
   page that may go away due to memory pressure.  But, btrfs_file_write
   assumes that only the first and last page may have good data that needs
   to be read off the disk.
  
   This patch ditches that code and puts it into prepare_pages instead.
   But I'm still having some errors during long stress.sh runs.  Ideas are
   more than welcome, hopefully some other timezones will kick in ideas
   while I sleep.
 
  At least it doesn't fix the emerge-problem for me. The behavior is now 
  the same
  as with 2.6.38-rc3. It needs a 'emerge --oneshot dev-libs/libgcrypt' 
  with no
  further interaction to get the emerge-process hang with a svn-process
  consuming 100% CPU. I can cancel the emerge-process with ctrl-c but the
  spawned svn-process stays and it needs a reboot to get rid of it.

 Can you cat /proc/$pid/wchan a few times so we can get an idea of where 
 it's
 looping?  Thanks,

 Josef

 It behaves the same way here with btrfs-unstable.
 The output of cat /proc/$pid/wchan is 0.

 // Maria






 I've applied the patch at the head of this thread (with the jiffies
 debugging commented out) and I'm attaching an ftrace using the
 function_graph tracer when I'm stuck in the loop.  I've just snipped
 out a couple of the loops (the full trace file is quite large, and
 mostly repetitious).

 I'm going to try to modify file.c with some trace_printk debugging to
 show the values of several of the relevant variables at various
 stages.

 I'm going to try to exit the loop after 256 tries with an EFAULT so I
 can stop the tracing at that point and capture a trace of the entry
 into the problem (the ftrace ring buffer fills up too fast for me to
 capture the entry point).


 As promised, I've put together a modified file.c with many trace_printk
 debugging statements to augment the ftrace.

 The trace is ~128K compressed (about 31,600 lines or 2.6MB
 uncompressed), so I'm putting it up on my local server instead of
 attaching.  Let me know if it would be more appropriate to send to the
 list as an attachment.

 http://dontpanic.dyndns.org/ftrace-btrfs-file-write-debug-v2.gz

 I preface all my trace_printk comments with TPK: to make skipping
 through the trace easier.

 The trace contains the trace of about 3 or 4 successful passes through
 the btrfs_file_aio_write() function to show what a successful trace
 looks like.

 The pass through the btrfs_file_aio_write() that breaks begins on line 1088.

 I let it loop through the while (iov_iter_count(i) > 0) {} loop
 256 times when copied == 0 (otherwise it would loop infinitely), then
 exit out and stop the trace.

 For reference, here's a diff file for the debugging statements I've
 added to file.c:

 Let me know if you would like me to re-run this trial with different
 debugging lines.

  fs/btrfs/file.c /usr/src/linux/fs/btrfs/file.c
 --- fs/btrfs/file.c     2011-02-28 10:13:45.0 -0600
 +++ /usr/src/linux/fs/btrfs/file.c      2011-02-28 13:07:11.0 -0600
 @@ -53,12 +53,14 @@
        int offset = pos & (PAGE_CACHE_SIZE - 1);
        int total_copied = 0;

 +       /***
        if ((jiffies % 

[PATCH] btrfs file write debugging patch

2011-02-27 Thread Chris Mason
Excerpts from Mitch Harder's message of 2011-02-25 13:43:37 -0500:
 Some clarification on my previous message...
 
 After looking at my ftrace log more closely, I can see where Btrfs is
 trying to release the allocated pages.  However, the calculation for
 the number of dirty_pages is equal to 1 when copied == 0.
 
 So I'm seeing at least two problems:
 (1)  It keeps looping when copied == 0.
 (2)  One dirty page is not being released on every loop even though
 copied == 0 (at least this problem keeps it from being an infinite
 loop by eventually exhausting reserveable space on the disk).

Hi everyone,

There are actually two bugs here.  First the one that Mitch hit, and a
second one that still results in bad file_write results with my
debugging hunks (the first two hunks below) in place.

My patch fixes Mitch's bug by checking for copied == 0 after
btrfs_copy_from_user and doing the correct delalloc accounting.  This
one looks solved, but you'll notice the patch is bigger.

First, I add some random failures to btrfs_copy_from_user() by failing
every once in a while.  This was much more reliable than trying to
use memory pressure to make copy_from_user fail.

If copy_from_user fails and we partially update a page, we end up with a
page that may go away due to memory pressure.  But, btrfs_file_write
assumes that only the first and last page may have good data that needs
to be read off the disk.

This patch ditches that code and puts it into prepare_pages instead.
But I'm still having some errors during long stress.sh runs.  Ideas are
more than welcome, hopefully some other timezones will kick in ideas
while I sleep.

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 7084140..89a6a26 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -54,6 +54,13 @@ static noinline int btrfs_copy_from_user(loff_t pos, int num_pages,
int offset = pos & (PAGE_CACHE_SIZE - 1);
int total_copied = 0;
 
+   if ((jiffies % 10) == 0)
+   return 0;
+
+   if ((jiffies % 25) == 0) {
+   write_bytes /= 2;
+   }
+
while (write_bytes > 0) {
size_t count = min_t(size_t,
 PAGE_CACHE_SIZE - offset, write_bytes);
@@ -763,6 +770,26 @@ out:
 }
 
 /*
+ * on error we return an unlocked page and the error value
+ * on success we return a locked page and 0
+ */
+static int prepare_uptodate_page(struct page *page, u64 pos)
+{
+   int ret = 0;
+   if ((pos & (PAGE_CACHE_SIZE - 1)) && !PageUptodate(page)) {
+   ret = btrfs_readpage(NULL, page);
+   if (ret)
+   return ret;
+   lock_page(page);
+   if (!PageUptodate(page)) {
+   unlock_page(page);
+   return -EIO;
+   }
+   }
+   return 0;
+}
+
+/*
  * this gets pages into the page cache and locks them down, it also properly
  * waits for data=ordered extents to finish before allowing the pages to be
  * modified.
@@ -777,6 +804,7 @@ static noinline int prepare_pages(struct btrfs_root *root, struct file *file,
unsigned long index = pos >> PAGE_CACHE_SHIFT;
struct inode *inode = fdentry(file)->d_inode;
int err = 0;
+   int faili = 0;
u64 start_pos;
u64 last_pos;
 
@@ -794,15 +822,24 @@ again:
for (i = 0; i < num_pages; i++) {
pages[i] = grab_cache_page(inode->i_mapping, index + i);
if (!pages[i]) {
-   int c;
-   for (c = i - 1; c >= 0; c--) {
-   unlock_page(pages[c]);
-   page_cache_release(pages[c]);
-   }
-   return -ENOMEM;
+   faili = i - 1;
+   err = -ENOMEM;
+   goto fail;
+   }
+
+   if (i == 0)
+   err = prepare_uptodate_page(pages[i], pos);
+   else if (i == num_pages - 1)
+   err = prepare_uptodate_page(pages[i],
+   pos + write_bytes);
+   if (err) {
+   page_cache_release(pages[i]);
+   faili = i - 1;
+   goto fail;
}
wait_on_page_writeback(pages[i]);
}
+   err = 0;
if (start_pos < inode->i_size) {
struct btrfs_ordered_extent *ordered;
lock_extent_bits(BTRFS_I(inode)->io_tree,
@@ -842,6 +879,14 @@ again:
WARN_ON(!PageLocked(pages[i]));
}
return 0;
+fail:
+   while (faili >= 0) {
+   unlock_page(pages[faili]);
+   page_cache_release(pages[faili]);
+   faili--;
+   }
+   return err;
+
 }
 
 static ssize_t btrfs_file_aio_write(struct kiocb *iocb,
@@ -851,7 +896,6 @@ static ssize_t btrfs_file_aio_write(struct kiocb