Re: [PATCH] btrfs file write debugging patch
On Mon, 2011-03-07 at 00:07 -0600, Mitch Harder wrote:

On Sun, Mar 6, 2011 at 6:58 PM, Chris Mason <chris.ma...@oracle.com> wrote:

Excerpts from Chris Mason's message of 2011-03-06 13:00:27 -0500:

Excerpts from Mitch Harder's message of 2011-03-05 11:50:14 -0500:

I've constructed a test patch that is currently addressing all the issues on my system. The portion of Openmotif that was having issues with page faults works correctly with this patch, and gcc-4.4.5 builds without issue. I extracted only the portion of the first patch that corrects the handling of dirty_pages when copied==0, and incorporated the second patch that falls back to one-page-at-a-time if there are troubles with page faults.

Just to make sure I understand, could you please post the full combined patch that was giving you trouble with gcc? We do need to make sure the pages are properly up to date if we fall back to partial writes.

Ok, I was able to reproduce this easily with fsx. The problem is that I wasn't making sure the last partial page in the write was up to date when it was also the first page in the write. Here is the updated patch; it has all the fixes we've found so far:

This latest patch that Chris has sent out fixes the issues I've been encountering. I can build gcc-4.4.5 without problems. Also, the portion of Openmotif that was having issues with page faults is working correctly.

Let me know if you still would like to see the path names for the portions of the gcc-4.4.5 build that were giving me issues. I didn't save that information, but I can regenerate it. But it sounds like it's irrelevant now.

With the patch I can compile libgcrypt without any problem, so it solves my problems too.

// Maria

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs file write debugging patch
On Monday 07 March 2011 20:56:50 Maria Wikström wrote:

On Mon, 2011-03-07 at 00:07 -0600, Mitch Harder wrote:

[snip quoted thread]

With the patch I can compile libgcrypt without any problem, so it solves my problems too.

Can confirm this. And the bug seems to be hardware-related. On my Pentium4 system it was 100% reproducible; on my Atom-based system I couldn't trigger it.
RE: [PATCH] btrfs file write debugging patch
From my experience, only the x86 32-bit kernel has this problem; the 64-bit kernel does not have it. However, an Atom-based system does not have it even though it's installed with a 32-bit kernel.

-----Original Message-----
From: Johannes Hirte [mailto:johannes.hi...@fem.tu-ilmenau.de]
Sent: Tuesday, March 08, 2011 6:12 AM
To: Maria Wikström
Cc: Mitch Harder; Chris Mason; Xin Zhong; Zhong, Xin; linux-btrfs
Subject: Re: [PATCH] btrfs file write debugging patch

On Monday 07 March 2011 20:56:50 Maria Wikström wrote:

[snip quoted thread]

Can confirm this. And the bug seems to be hardware-related. On my Pentium4 system it was 100% reproducible; on my Atom-based system I couldn't trigger it.
Re: [PATCH] btrfs file write debugging patch
Excerpts from Mitch Harder's message of 2011-03-05 11:50:14 -0500:

I've constructed a test patch that is currently addressing all the issues on my system. The portion of Openmotif that was having issues with page faults works correctly with this patch, and gcc-4.4.5 builds without issue. I extracted only the portion of the first patch that corrects the handling of dirty_pages when copied==0, and incorporated the second patch that falls back to one-page-at-a-time if there are troubles with page faults.

Just to make sure I understand, could you please post the full combined patch that was giving you trouble with gcc? We do need to make sure the pages are properly up to date if we fall back to partial writes.

-chris
Re: [PATCH] btrfs file write debugging patch
Excerpts from Chris Mason's message of 2011-03-06 13:00:27 -0500:

Excerpts from Mitch Harder's message of 2011-03-05 11:50:14 -0500:

I've constructed a test patch that is currently addressing all the issues on my system. The portion of Openmotif that was having issues with page faults works correctly with this patch, and gcc-4.4.5 builds without issue. I extracted only the portion of the first patch that corrects the handling of dirty_pages when copied==0, and incorporated the second patch that falls back to one-page-at-a-time if there are troubles with page faults.

Just to make sure I understand, could you please post the full combined patch that was giving you trouble with gcc? We do need to make sure the pages are properly up to date if we fall back to partial writes.

Ok, I was able to reproduce this easily with fsx. The problem is that I wasn't making sure the last partial page in the write was up to date when it was also the first page in the write. Here is the updated patch; it has all the fixes we've found so far:

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 7084140..5986ac7 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -763,6 +763,27 @@ out:
 }
 
 /*
+ * on error we return an unlocked page and the error value
+ * on success we return a locked page and 0
+ */
+static int prepare_uptodate_page(struct page *page, u64 pos)
+{
+	int ret = 0;
+
+	if ((pos & (PAGE_CACHE_SIZE - 1)) && !PageUptodate(page)) {
+		ret = btrfs_readpage(NULL, page);
+		if (ret)
+			return ret;
+		lock_page(page);
+		if (!PageUptodate(page)) {
+			unlock_page(page);
+			return -EIO;
+		}
+	}
+	return 0;
+}
+
+/*
  * this gets pages into the page cache and locks them down, it also properly
  * waits for data=ordered extents to finish before allowing the pages to be
  * modified.
@@ -777,6 +798,7 @@ static noinline int prepare_pages(struct btrfs_root *root, struct file *file,
 	unsigned long index = pos >> PAGE_CACHE_SHIFT;
 	struct inode *inode = fdentry(file)->d_inode;
 	int err = 0;
+	int faili = 0;
 	u64 start_pos;
 	u64 last_pos;
 
@@ -794,15 +816,24 @@ again:
 	for (i = 0; i < num_pages; i++) {
 		pages[i] = grab_cache_page(inode->i_mapping, index + i);
 		if (!pages[i]) {
-			int c;
-			for (c = i - 1; c >= 0; c--) {
-				unlock_page(pages[c]);
-				page_cache_release(pages[c]);
-			}
-			return -ENOMEM;
+			faili = i - 1;
+			err = -ENOMEM;
+			goto fail;
+		}
+
+		if (i == 0)
+			err = prepare_uptodate_page(pages[i], pos);
+		if (i == num_pages - 1)
+			err = prepare_uptodate_page(pages[i],
+						    pos + write_bytes);
+		if (err) {
+			page_cache_release(pages[i]);
+			faili = i - 1;
+			goto fail;
 		}
 		wait_on_page_writeback(pages[i]);
 	}
+	err = 0;
 	if (start_pos < inode->i_size) {
 		struct btrfs_ordered_extent *ordered;
 		lock_extent_bits(&BTRFS_I(inode)->io_tree,
@@ -842,6 +873,14 @@ again:
 		WARN_ON(!PageLocked(pages[i]));
 	}
 	return 0;
+fail:
+	while (faili >= 0) {
+		unlock_page(pages[faili]);
+		page_cache_release(pages[faili]);
+		faili--;
+	}
+	return err;
+
 }
 
 static ssize_t btrfs_file_aio_write(struct kiocb *iocb,
@@ -851,7 +890,6 @@ static ssize_t btrfs_file_aio_write(struct kiocb *iocb,
 	struct file *file = iocb->ki_filp;
 	struct inode *inode = fdentry(file)->d_inode;
 	struct btrfs_root *root = BTRFS_I(inode)->root;
-	struct page *pinned[2];
 	struct page **pages = NULL;
 	struct iov_iter i;
 	loff_t *ppos = &iocb->ki_pos;
@@ -872,9 +910,6 @@ static ssize_t btrfs_file_aio_write(struct kiocb *iocb,
 	will_write = ((file->f_flags & O_DSYNC) || IS_SYNC(inode) ||
 		      (file->f_flags & O_DIRECT));
 
-	pinned[0] = NULL;
-	pinned[1] = NULL;
-
 	start_pos = pos;
 
 	vfs_check_frozen(inode->i_sb, SB_FREEZE_WRITE);
@@ -962,32 +997,6 @@ static ssize_t btrfs_file_aio_write(struct kiocb *iocb,
 	first_index = pos >> PAGE_CACHE_SHIFT;
 	last_index = (pos + iov_iter_count(&i)) >> PAGE_CACHE_SHIFT;
 
-	/*
-	 * there are lots of better ways to do this, but this code
-	 * makes sure the first and last page in the file range are
-	 * up to date and ready for cow
-	 */
-	if ((pos & (PAGE_CACHE_SIZE - 1))) {
-
Re: [PATCH] btrfs file write debugging patch
On Sun, Mar 6, 2011 at 6:58 PM, Chris Mason <chris.ma...@oracle.com> wrote:

[snip quoted thread]

Ok, I was able to reproduce this easily with fsx. The problem is that I wasn't making sure the last partial page in the write was up to date when it was also the first page in the write. Here is the updated patch; it has all the fixes we've found so far:

This latest patch that Chris has sent out fixes the issues I've been encountering. I can build gcc-4.4.5 without problems. Also, the portion of Openmotif that was having issues with page faults is working correctly.

Let me know if you still would like to see the path names for the portions of the gcc-4.4.5 build that were giving me issues. I didn't save that information, but I can regenerate it. But it sounds like it's irrelevant now.
RE: [PATCH] btrfs file write debugging patch
That's great! :-)

-----Original Message-----
From: Mitch Harder [mailto:mitch.har...@sabayonlinux.org]
Sent: Monday, March 07, 2011 2:08 PM
To: Chris Mason
Cc: Xin Zhong; Zhong, Xin; linux-btrfs
Subject: Re: [PATCH] btrfs file write debugging patch

[snip quoted thread]
Re: [PATCH] btrfs file write debugging patch
2011/3/4 Xin Zhong <thierryzh...@hotmail.com>:

So it works well now with the two patches from Chris on your system. Am I right?

No. I am getting errors building gcc-4.4.5 with the two patches from Chris.
Re: [PATCH] btrfs file write debugging patch
I've constructed a test patch that is currently addressing all the issues on my system. The portion of Openmotif that was having issues with page faults works correctly with this patch, and gcc-4.4.5 builds without issue.

I extracted only the portion of the first patch that corrects the handling of dirty_pages when copied==0, and incorporated the second patch that falls back to one-page-at-a-time if there are troubles with page faults.

--- fs/btrfs/file.c	2011-03-05 07:34:43.025131607 -0600
+++ /usr/src/linux/fs/btrfs/file.c	2011-03-05 07:41:45.001260294 -0600
@@ -1023,8 +1023,20 @@
 		copied = btrfs_copy_from_user(pos, num_pages,
 					   write_bytes, pages, i);
-		dirty_pages = (copied + offset + PAGE_CACHE_SIZE - 1) >>
-				PAGE_CACHE_SHIFT;
+
+		/*
+		 * if we have trouble faulting in the pages, fall
+		 * back to one page at a time
+		 */
+		if (copied < write_bytes)
+			nrptrs = 1;
+
+		if (copied == 0)
+			dirty_pages = 0;
+		else
+			dirty_pages = (copied + offset +
+				       PAGE_CACHE_SIZE - 1) >>
+				       PAGE_CACHE_SHIFT;
 
 		if (num_pages > dirty_pages) {
 			if (copied > 0)
Re: [PATCH] btrfs file write debugging patch
On Thu, Mar 3, 2011 at 8:41 PM, Josef Bacik <jo...@redhat.com> wrote:

On Fri, Mar 04, 2011 at 10:42:46AM +0800, Zhong, Xin wrote:

Where can I find your patch? Thanks!

It's in my btrfs-work git tree; it's based on the latest git pull from Linus, so you can just pull it onto a Linus tree and you should be good to go. The specific patch is "Btrfs: simplify our write path".

Thanks,
Josef

Josef:

I've been testing the kernel from your git tree (as of commit f192 "Btrfs: add a comment explaining what btrfs_cont_expand does"). I am still running into issues when running the portion of Openmotif that has been generating the problem with page faults when given an unaligned address.
RE: [PATCH] btrfs file write debugging patch
I think Josef's patch only addresses the one-by-one page processing issue, but does not address the issue that dirty_pages should be set to 0 when copied is 0.

Date: Sat, 5 Mar 2011 10:56:52 -0600
Subject: Re: [PATCH] btrfs file write debugging patch
From: mitch.har...@sabayonlinux.org
To: jo...@redhat.com
CC: xin.zh...@intel.com; chris.ma...@oracle.com; thierryzh...@hotmail.com; linux-btrfs@vger.kernel.org

[snip quoted thread]
RE: [PATCH] btrfs file write debugging patch
Looks good to me. Thanks!

Another mysterious thing is that this problem can only be recreated on an x86 32-bit system. I can not recreate it on an x86_64 system using my test case. Anyone have any idea about it?

-----Original Message-----
From: Josef Bacik [mailto:jo...@redhat.com]
Sent: Friday, March 04, 2011 10:41 AM
To: Zhong, Xin
Cc: Josef Bacik; Chris Mason; Mitch Harder; Xin Zhong; linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] btrfs file write debugging patch

On Fri, Mar 04, 2011 at 10:42:46AM +0800, Zhong, Xin wrote:

Where can I find your patch? Thanks!

It's in my btrfs-work git tree; it's based on the latest git pull from Linus, so you can just pull it onto a Linus tree and you should be good to go. The specific patch is "Btrfs: simplify our write path".

Thanks,
Josef
RE: [PATCH] btrfs file write debugging patch
Excerpts from Chris Mason's message of 2011-03-03 20:51:55 -0500:

Excerpts from Zhong, Xin's message of 2011-03-02 05:58:49 -0500:

It seems that if we give an unaligned address to btrfs write and the buffer resides on more than 2 pages, it will trigger this bug. If we give an aligned address to btrfs write, it works well no matter how many pages are given. I used ftrace to observe it. It seems iov_iter_fault_in_readable does not trigger pagefault handling when the address is not aligned. I do not quite understand the reason behind it. But the solution should be to process the pages one by one, and that's also what the generic file write routine does. Any suggestions are welcome. Thanks!

Great job guys. I'm using this on top of my debugging patch. It passes the unaligned test but I'll give it a real run tonight and look for other problems.

(This is almost entirely untested, please don't use it quite yet)

-chris

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 89a6a26..6a44add 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1039,6 +1038,14 @@ static ssize_t btrfs_file_aio_write(struct kiocb *iocb,
 		copied = btrfs_copy_from_user(pos, num_pages,
 					   write_bytes, pages, i);
+
+		/*
+		 * if we have trouble faulting in the pages, fall
+		 * back to one page at a time
+		 */
+		if (copied < write_bytes)
+			nrptrs = 1;
+
 		if (copied == 0)
 			dirty_pages = 0;
 		else

Ok, this is working well for me. Anyone see any problems with it?
RE: [PATCH] btrfs file write debugging patch
It works well for me too.

From: chris.ma...@oracle.com
To: chris.ma...@oracle.com
CC: xin.zh...@intel.com; mitch.har...@sabayonlinux.org; thierryzh...@hotmail.com; linux-btrfs@vger.kernel.org
Subject: RE: [PATCH] btrfs file write debugging patch
Date: Fri, 4 Mar 2011 07:19:39 -0500

[snip quoted thread]
Re: [PATCH] btrfs file write debugging patch
2011/3/4 Xin Zhong <thierryzh...@hotmail.com>:

It works well for me too.

[snip quoted thread]

I've applied this patch on top of the debugging patch at the head of the thread, and I'm having trouble building gcc now. When building gcc-4.4.5, I get errors like the following:

Comparing stages 2 and 3
Bootstrap comparison failure!
./cp/call.o differs
./cp/decl.o differs
./cp/pt.o differs
./cp/class.o differs
./cp/decl2.o differs
[snip]
./matrix-reorg.o differs
./tree-inline.o differs
./gcc.o differs
./gcc-options.o differs
make[2]: *** [compare] Error 1
make[1]: *** [stage3-bubble] Error 2
make: *** [bootstrap-lean] Error 2
emake failed

I went back and rebuilt my kernel without these two debugging patches, and gcc-4.4.5 builds without error on that kernel. I haven't yet tested building gcc-4.4.5 with just the debugging patch at the head of the thread, so I'll test that and report back. But I was wondering if anybody else can replicate this issue.

BTW, I've been doing most of my testing on an x86 system. My x86_64 systems haven't had as much trouble, but I haven't been robustly checking my x86_64 systems for these issues. I noticed that page fault handling is different by architecture.
Re: [PATCH] btrfs file write debugging patch
On Fri, Mar 4, 2011 at 9:33 AM, Mitch Harder <mitch.har...@sabayonlinux.org> wrote:

[snip quoted thread]

Some followup...

I'm encountering this issue with "Bootstrap comparison failure!" in a gcc-4.4.5 build when only the patch at the head of the thread is applied (leaving the recent patch to limit pages to one-by-one on page fault out).

I just hadn't run across this issue until I started playing with patches to limit the pages to one-by-one on page fault errors. So it may not be associated with the last patch.
RE: [PATCH] btrfs file write debugging patch
So it works well now with the two patches from Chris on your system. Am I right? Date: Fri, 4 Mar 2011 11:21:36 -0600 Subject: Re: [PATCH] btrfs file write debugging patch From: mitch.har...@sabayonlinux.org To: thierryzh...@hotmail.com CC: chris.ma...@oracle.com; xin.zh...@intel.com; linux-btrfs@vger.kernel.org On Fri, Mar 4, 2011 at 9:33 AM, Mitch Harder wrote: 2011/3/4 Xin Zhong : It works well for me too. From: chris.ma...@oracle.com To: chris.ma...@oracle.com CC: xin.zh...@intel.com; mitch.har...@sabayonlinux.org; thierryzh...@hotmail.com; linux-btrfs@vger.kernel.org Subject: RE: [PATCH] btrfs file write debugging patch Date: Fri, 4 Mar 2011 07:19:39 -0500 Excerpts from Chris Mason's message of 2011-03-03 20:51:55 -0500: Excerpts from Zhong, Xin's message of 2011-03-02 05:58:49 -0500: It seems that if we give an unaligned address to btrfs write and the buffer reside on more than 2 pages. It will trigger this bug. If we give an aligned address to btrfs write, it works well no matter how many pages are given. I use ftrace to observe it. It seems iov_iter_fault_in_readable do not trigger pagefault handling when the address is not aligned. I do not quite understand the reason behind it. But the solution should be to process the page one by one. And that's also what generic file write routine does. Any suggestion are welcomed. Thanks! Great job guys. I'm using this on top of my debugging patch. It passes the unaligned test but I'll give it a real run tonight and look for other problems. 
(This is almost entirely untested, please don't use it quite yet) -chris

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 89a6a26..6a44add 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1039,6 +1038,14 @@ static ssize_t btrfs_file_aio_write(struct kiocb *iocb,
 		copied = btrfs_copy_from_user(pos, num_pages,
 					   write_bytes, pages, i);
+
+		/*
+		 * if we have trouble faulting in the pages, fall
+		 * back to one page at a time
+		 */
+		if (copied < write_bytes)
+			nrptrs = 1;
+
 		if (copied == 0)
 			dirty_pages = 0;
 		else

Ok, this is working well for me. Anyone see any problems with it?
RE: [PATCH] btrfs file write debugging patch
Excerpts from Zhong, Xin's message of 2011-03-02 05:58:49 -0500: I downloaded openmotif and ran the command as Mitch mentioned and was able to recreate the problem locally. And I managed to simplify the command into a very simple program which can capture the problem easily. See the code below:

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdlib.h>

static char a[4096*3];

int main()
{
	int fd = open("out", O_WRONLY|O_CREAT|O_TRUNC, 0666);
	write(fd, a+1, 4096*2);
	exit(0);
}

It seems that if we give an unaligned address to btrfs write and the buffer resides on more than 2 pages, it will trigger this bug. If we give an aligned address to btrfs write, it works well no matter how many pages are given. I used ftrace to observe it. It seems iov_iter_fault_in_readable does not trigger pagefault handling when the address is not aligned. I do not quite understand the reason behind it, but the solution should be to process the pages one by one, and that's also what the generic file write routine does. Any suggestions are welcome. Thanks! Great job guys. I'm using this on top of my debugging patch. It passes the unaligned test but I'll give it a real run tonight and look for other problems. (This is almost entirely untested, please don't use it quite yet) -chris

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 89a6a26..6a44add 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1039,6 +1038,14 @@ static ssize_t btrfs_file_aio_write(struct kiocb *iocb,
 		copied = btrfs_copy_from_user(pos, num_pages,
 					   write_bytes, pages, i);
+
+		/*
+		 * if we have trouble faulting in the pages, fall
+		 * back to one page at a time
+		 */
+		if (copied < write_bytes)
+			nrptrs = 1;
+
 		if (copied == 0)
 			dirty_pages = 0;
 		else
Re: [PATCH] btrfs file write debugging patch
On Thu, Mar 03, 2011 at 08:51:55PM -0500, Chris Mason wrote: Excerpts from Zhong, Xin's message of 2011-03-02 05:58:49 -0500: snip. Btw this situation is taken care of in my write path rewrite patch, if copied == 0 we switch to one segment at a time.
Thanks, Josef
RE: [PATCH] btrfs file write debugging patch
Where can I find your patch? Thanks! -----Original Message----- From: Josef Bacik [mailto:jo...@redhat.com] Sent: Friday, March 04, 2011 10:32 AM To: Chris Mason Cc: Zhong, Xin; Mitch Harder; Xin Zhong; linux-btrfs@vger.kernel.org Subject: Re: [PATCH] btrfs file write debugging patch On Thu, Mar 03, 2011 at 08:51:55PM -0500, Chris Mason wrote: snip. Btw this situation is taken care of in my write path rewrite patch, if copied == 0 we switch to one segment at a time. Thanks, Josef
Re: [PATCH] btrfs file write debugging patch
On Fri, Mar 04, 2011 at 10:42:46AM +0800, Zhong, Xin wrote: Where can I find your patch? Thanks! It's in my btrfs-work git tree; it's based on the latest git pull from Linus, so you can just pull it onto a Linus tree and you should be good to go. The specific patch is "Btrfs: simplify our write path". Thanks, Josef
RE: [PATCH] btrfs file write debugging patch
Sorry, I forgot to mention that you need to undo the commit below in btrfs-unstable to recreate the problem: Btrfs: fix fiemap bugs with delalloc (+224/-42). Otherwise, it will run into an enospc error. I am not sure if it's the same problem. From: xin.zh...@intel.com To: mitch.har...@sabayonlinux.org; thierryzh...@hotmail.com CC: linux-btrfs@vger.kernel.org Date: Wed, 2 Mar 2011 18:58:49 +0800 Subject: RE: [PATCH] btrfs file write debugging patch I downloaded openmotif and ran the command as Mitch mentioned and was able to recreate the problem locally. And I managed to simplify the command into a very simple program which can capture the problem easily. snip. It seems that if we give an unaligned address to btrfs write and the buffer resides on more than 2 pages, it will trigger this bug. If we give an aligned address to btrfs write, it works well no matter how many pages are given. I used ftrace to observe it. It seems iov_iter_fault_in_readable does not trigger pagefault handling when the address is not aligned. I do not quite understand the reason behind it, but the solution should be to process the pages one by one, and that's also what the generic file write routine does. Any suggestions are welcome. Thanks! -----Original Message----- From: linux-btrfs-ow...@vger.kernel.org [mailto:linux-btrfs-ow...@vger.kernel.org] On Behalf Of Mitch Harder Sent: Wednesday, March 02, 2011 5:09 AM To: Xin Zhong Cc: linux-btrfs@vger.kernel.org Subject: Re: [PATCH] btrfs file write debugging patch 2011/3/1 Xin Zhong : Hi, Mitch I think you can config ftrace to just trace the function calls of btrfs.ko, which will save a lot of trace buffer space. See the command below:

# echo ':mod:btrfs' > /sys/kernel/debug/tracing/set_ftrace_filter

And please send out the full ftrace log again. Another helpful piece of information might be the strace log of the wmldbcreate process.
It will show us the io pattern of this command. Thanks a lot for your help! I manually ran an strace around the build command (wmldbcreate) that is causing my problem, and I am attaching the strace for that. Please note that wmldbcreate does not seem to care when an error is returned, and continues on. So the error is occurring somewhat silently in the middle, and isn't the last item. The error is probably associated with one of the 12288-byte writes. I have re-run an ftrace following the conditions above, and have hosted that file (~1.1MB compressed) on my local server at: http://dontpanic.dyndns.org/trace-openmotif-btrfs-v15.gz Please note I am still using some debugging modifications of my own to file.c. They serve the purpose of: (1) Avoiding an infinite loop by identifying when the problem is occurring, and exiting with error after 256 loops. (2) Stopping the trace after exiting to keep from flooding the ftrace buffer. (3) Providing debugging comments (all prefaced with TPK: in the trace). Let me know if you want me to change any of the conditions.
RE: [PATCH] btrfs file write debugging patch
Is your system running out of memory, or is there any other thread like flush-btrfs competing for the same page? I can only see one process in your ftrace log. You may need to trace all btrfs.ko function calls instead of a single process. Thanks! -----Original Message----- From: linux-btrfs-ow...@vger.kernel.org [mailto:linux-btrfs-ow...@vger.kernel.org] On Behalf Of Mitch Harder Sent: Tuesday, March 01, 2011 4:20 AM To: Maria Wikström Cc: Josef Bacik; Johannes Hirte; Chris Mason; Zhong, Xin; linux-btrfs@vger.kernel.org Subject: Re: [PATCH] btrfs file write debugging patch 2011/2/28 Mitch Harder mitch.har...@sabayonlinux.org: 2011/2/28 Maria Wikström ma...@ponstudios.se: Mon 2011-02-28 at 11:10 -0500, Josef Bacik wrote: On Mon, Feb 28, 2011 at 11:13:59AM +0100, Johannes Hirte wrote: On Monday 28 February 2011 02:46:05 Chris Mason wrote: Excerpts from Mitch Harder's message of 2011-02-25 13:43:37 -0500: Some clarification on my previous message... After looking at my ftrace log more closely, I can see where Btrfs is trying to release the allocated pages. However, the calculation for the number of dirty_pages is equal to 1 when copied == 0. So I'm seeing at least two problems: (1) It keeps looping when copied == 0. (2) One dirty page is not being released on every loop even though copied == 0 (at least this problem keeps it from being an infinite loop, by eventually exhausting reservable space on the disk). Hi everyone, There are actually two bugs here. First the one that Mitch hit, and a second one that still results in bad file_write results with my debugging hunks (the first two hunks below) in place. My patch fixes Mitch's bug by checking for copied == 0 after btrfs_copy_from_user and doing the correct delalloc accounting. This one looks solved, but you'll notice the patch is bigger. First, I add some random failures to btrfs_copy_from_user() by failing every once in a while.
This was much more reliable than trying to use memory pressure to make copy_from_user fail. If copy_from_user fails and we partially update a page, we end up with a page that may go away due to memory pressure. But btrfs_file_write assumes that only the first and last page may have good data that needs to be read off the disk. This patch ditches that code and puts it into prepare_pages instead. But I'm still having some errors during long stress.sh runs. Ideas are more than welcome; hopefully some other timezones will kick in ideas while I sleep. At least it doesn't fix the emerge problem for me. The behavior is now the same as with 2.6.38-rc3. It needs an 'emerge --oneshot dev-libs/libgcrypt' with no further interaction to get the emerge process to hang with a svn process consuming 100% CPU. I can cancel the emerge process with ctrl-c, but the spawned svn process stays, and it needs a reboot to get rid of it. Can you cat /proc/$pid/wchan a few times so we can get an idea of where it's looping? Thanks, Josef It behaves the same way here with btrfs-unstable. The output of cat /proc/$pid/wchan is 0. // Maria I've applied the patch at the head of this thread (with the jiffies debugging commented out), and I'm attaching an ftrace using the function_graph tracer from when I'm stuck in the loop. I've just snipped out a couple of the loops (the full trace file is quite large, and mostly repetitious). I'm going to try to modify file.c with some trace_printk debugging to show the values of several of the relevant variables at various stages. I'm going to try to exit the loop after 256 tries with an EFAULT, so I can stop the tracing at that point and capture a trace of the entry into the problem (the ftrace ring buffer fills up too fast for me to capture the entry point).
As promised, I've put together a modified file.c with many trace_printk debugging statements to augment the ftrace. The trace is ~128K compressed (about 31,600 lines or 2.6MB uncompressed), so I'm putting it up on my local server instead of attaching it. Let me know if it would be more appropriate to send to the list as an attachment. http://dontpanic.dyndns.org/ftrace-btrfs-file-write-debug-v2.gz I preface all my trace_printk comments with TPK: to make skipping through the trace easier. The trace contains the trace of about 3 or 4 successful passes through the btrfs_file_aio_write() function to show what a successful trace looks like. The pass through btrfs_file_aio_write() that breaks begins on line 1088. I let it loop through the while (iov_iter_count(i) > 0) {} loop 256 times when copied==0 (otherwise it would loop infinitely), then exit out and stop the trace. For reference
RE: [PATCH] btrfs file write debugging patch
Hi Mitch, I suspect there's a lock contention between flush-btrfs (lock_delalloc_pages) and btrfs_file_aio_write. However, I can not recreate it locally. Could you please try the patch below? Thanks!

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 65338a1..b9d0929 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1007,17 +1007,16 @@ static ssize_t btrfs_file_aio_write(struct kiocb *iocb,
 		goto out;
 	}
 
-	ret = btrfs_delalloc_reserve_space(inode,
-					num_pages << PAGE_CACHE_SHIFT);
-	if (ret)
-		goto out;
-
 	ret = prepare_pages(root, file, pages, num_pages, pos,
 			    first_index, last_index, write_bytes);
-	if (ret) {
-		btrfs_delalloc_release_space(inode,
+	if (ret)
+		goto out;
+
+	ret = btrfs_delalloc_reserve_space(inode,
+					num_pages << PAGE_CACHE_SHIFT);
+	if (ret) {
+		btrfs_drop_pages(pages, num_pages);
 		goto out;
 	}

-----Original Message----- From: linux-btrfs-ow...@vger.kernel.org [mailto:linux-btrfs-ow...@vger.kernel.org] On Behalf Of Zhong, Xin Sent: Tuesday, March 01, 2011 6:15 PM To: Mitch Harder; Maria Wikström Cc: Josef Bacik; Johannes Hirte; Chris Mason; linux-btrfs@vger.kernel.org Subject: RE: [PATCH] btrfs file write debugging patch Is your system running out of memory, or is there any other thread like flush-btrfs competing for the same page? I can only see one process in your ftrace log. You may need to trace all btrfs.ko function calls instead of a single process. Thanks!
-----Original Message----- From: linux-btrfs-ow...@vger.kernel.org [mailto:linux-btrfs-ow...@vger.kernel.org] On Behalf Of Mitch Harder Sent: Tuesday, March 01, 2011 4:20 AM To: Maria Wikström Cc: Josef Bacik; Johannes Hirte; Chris Mason; Zhong, Xin; linux-btrfs@vger.kernel.org Subject: Re: [PATCH] btrfs file write debugging patch snip.
Re: [PATCH] btrfs file write debugging patch
On Tue, Mar 1, 2011 at 4:14 AM, Zhong, Xin xin.zh...@intel.com wrote: Is your system running out of memory or is there any other thread like flush-btrfs competing for the same page? There's no sign of memory pressure. Although I only have 1 GB in this box, I'm still showing ~1/2 GB RAM free during this build. There's no swap space allocated, and nothing in dmesg that indicates there's a transient spike of RAM pressure. I can only see one process in your ftrace log. You may need to trace all btrfs.ko function calls instead of a single process. Thanks! That ftrace.log was run with ftrace defaults for a function trace. It should collect calls from the whole system. For the sake of consistency, I am intentionally trying to ensure that very few other things are going on at the same time as this build. And I'm building with -j1 so things will happen the same way each time. Also, I supplied just the tail end of the trace log. The full log shows a few of the other build processes leading up to the problem, but the ftrace ring buffer fills up surprisingly fast. Even with a 50MB ring buffer for ftrace, I usually collect less than 1 second of information when something busy like a build is going on. Let me know if you'd like to see the full log. It's bigger, but I can find someplace to put it. But I'm pretty sure that wmldbcreate is the only thing that is going on when the breakage occurs. Otherwise I wouldn't get such consistent breakage in the same spot every time.
Re: [PATCH] btrfs file write debugging patch
On Tue, Mar 1, 2011 at 5:56 AM, Zhong, Xin xin.zh...@intel.com wrote: Hi Mitch, I suspect there's a lock contention between flush-btrfs (lock_delalloc_pages) and btrfs_file_aio_write. However, I can not recreate it locally. Could you please try the patch below? Thanks! snip. Thanks. I've tested this patch, but the build is still failing at the same point as before.
Re: [PATCH] btrfs file write debugging patch
Hi, Mitch I think you can config ftrace to just trace the function calls of btrfs.ko, which will save a lot of trace buffer space. See the command below:

# echo ':mod:btrfs' > /sys/kernel/debug/tracing/set_ftrace_filter

And please send out the full ftrace log again. Another helpful piece of information might be the strace log of the wmldbcreate process. It will show us the io pattern of this command. Thanks a lot for your help!
Re: [PATCH] btrfs file write debugging patch
2011/3/1 Xin Zhong thierryzh...@hotmail.com: Hi, Mitch I think you can config ftrace to just trace the function calls of btrfs.ko, which will save a lot of trace buffer space. See the command below:

# echo ':mod:btrfs' > /sys/kernel/debug/tracing/set_ftrace_filter

And please send out the full ftrace log again. Another helpful piece of information might be the strace log of the wmldbcreate process. It will show us the io pattern of this command. Thanks a lot for your help! I manually ran an strace around the build command (wmldbcreate) that is causing my problem, and I am attaching the strace for that. Please note that wmldbcreate does not seem to care when an error is returned, and continues on. So the error is occurring somewhat silently in the middle, and isn't the last item. The error is probably associated with one of the 12288-byte writes. I have re-run an ftrace following the conditions above, and have hosted that file (~1.1MB compressed) on my local server at: http://dontpanic.dyndns.org/trace-openmotif-btrfs-v15.gz Please note I am still using some debugging modifications of my own to file.c. They serve the purpose of: (1) Avoiding an infinite loop by identifying when the problem is occurring, and exiting with error after 256 loops. (2) Stopping the trace after exiting to keep from flooding the ftrace buffer. (3) Providing debugging comments (all prefaced with TPK: in the trace). Let me know if you want me to change any of the conditions. wmldbcreate-strace.gz Description: GNU Zip compressed data
Re: [PATCH] btrfs file write debugging patch
On Mon, Feb 28, 2011 at 02:20:22PM -0600, Mitch Harder wrote: As promised, I've put together a modified file.c with many trace_printk debugging statements to augment the ftrace. *snip* Just my two cents. I've applied the patch from Chris Mason (Sun, 27 Feb 2011 20:46:05 -0500) and this one from Mitch (Mon, 28 Feb 2011 14:20:22 -0600) on top of vanilla 2.6.38-rc6, and it seems to resolve my issues with the hanging `svn info' during the libgcrypt emerge. Piotr Szymaniak. pgp0s4aN9vbmU.pgp Description: PGP signature
RE: [PATCH] btrfs file write debugging patch
One possible issue I can see is in random failure case #2, where copy_from_user only processes half of the data. For example, if it writes a 4k-aligned page and copy_from_user only writes 2k, it will not call btrfs_delalloc_release_space, since num_pages and dirty_pages are both 1. In the next round it writes another 2k, and btrfs_delalloc_reserve_space is called twice for the same page. Is it a problem? Thanks! -----Original Message----- From: Chris Mason [mailto:chris.ma...@oracle.com] Sent: Monday, February 28, 2011 9:46 AM To: Mitch Harder Cc: Maria Wikström; Zhong, Xin; Johannes Hirte; linux-btrfs@vger.kernel.org Subject: [PATCH] btrfs file write debugging patch Excerpts from Mitch Harder's message of 2011-02-25 13:43:37 -0500: Some clarification on my previous message... After looking at my ftrace log more closely, I can see where Btrfs is trying to release the allocated pages. However, the calculation for the number of dirty_pages is equal to 1 when copied == 0. So I'm seeing at least two problems: (1) It keeps looping when copied == 0. (2) One dirty page is not being released on every loop even though copied == 0 (at least this problem keeps it from being an infinite loop, by eventually exhausting reservable space on the disk). Hi everyone, There are actually two bugs here. First the one that Mitch hit, and a second one that still results in bad file_write results with my debugging hunks (the first two hunks below) in place. My patch fixes Mitch's bug by checking for copied == 0 after btrfs_copy_from_user and doing the correct delalloc accounting. This one looks solved, but you'll notice the patch is bigger. First, I add some random failures to btrfs_copy_from_user() by failing every once in a while. This was much more reliable than trying to use memory pressure to make copy_from_user fail. If copy_from_user fails and we partially update a page, we end up with a page that may go away due to memory pressure.
But, btrfs_file_write assumes that only the first and last page may have good data that needs to be read off the disk. This patch ditches that code and puts it into prepare_pages instead. But I'm still having some errors during long stress.sh runs. Ideas are more than welcome; hopefully some other timezones will kick in ideas while I sleep.

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 7084140..89a6a26 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -54,6 +54,13 @@ static noinline int btrfs_copy_from_user(loff_t pos, int num_pages,
 	int offset = pos & (PAGE_CACHE_SIZE - 1);
 	int total_copied = 0;
 
+	if ((jiffies % 10) == 0)
+		return 0;
+
+	if ((jiffies % 25) == 0) {
+		write_bytes /= 2;
+	}
+
 	while (write_bytes > 0) {
 		size_t count = min_t(size_t,
 				     PAGE_CACHE_SIZE - offset, write_bytes);
@@ -763,6 +770,26 @@ out:
 }
 
 /*
+ * on error we return an unlocked page and the error value
+ * on success we return a locked page and 0
+ */
+static int prepare_uptodate_page(struct page *page, u64 pos)
+{
+	int ret = 0;
+	if ((pos & (PAGE_CACHE_SIZE - 1)) && !PageUptodate(page)) {
+		ret = btrfs_readpage(NULL, page);
+		if (ret)
+			return ret;
+		lock_page(page);
+		if (!PageUptodate(page)) {
+			unlock_page(page);
+			return -EIO;
+		}
+	}
+	return 0;
+}
+
+/*
  * this gets pages into the page cache and locks them down, it also properly
  * waits for data=ordered extents to finish before allowing the pages to be
  * modified.
@@ -777,6 +804,7 @@ static noinline int prepare_pages(struct btrfs_root *root, struct file *file,
 	unsigned long index = pos >> PAGE_CACHE_SHIFT;
 	struct inode *inode = fdentry(file)->d_inode;
 	int err = 0;
+	int faili = 0;
 	u64 start_pos;
 	u64 last_pos;
 
@@ -794,15 +822,24 @@ again:
 	for (i = 0; i < num_pages; i++) {
 		pages[i] = grab_cache_page(inode->i_mapping, index + i);
 		if (!pages[i]) {
-			int c;
-			for (c = i - 1; c >= 0; c--) {
-				unlock_page(pages[c]);
-				page_cache_release(pages[c]);
-			}
-			return -ENOMEM;
+			faili = i - 1;
+			err = -ENOMEM;
+			goto fail;
+		}
+
+		if (i == 0)
+			err = prepare_uptodate_page(pages[i], pos);
+		else if (i == num_pages - 1)
+			err = prepare_uptodate_page(pages[i],
+						    pos + write_bytes);
+		if (err) {
+			page_cache_release(pages[i]);
+			faili = i - 1
Re: [PATCH] btrfs file write debugging patch
On Monday 28 February 2011 02:46:05 Chris Mason wrote: [patch description snipped] At least it doesn't fix the emerge-problem for me. The behavior is now the same as with 2.6.38-rc3. It needs a 'emerge --oneshot dev-libs/libgcrypt' with no further interaction to make the emerge-process hang with an svn-process consuming 100% CPU.
I can cancel the emerge-process with ctrl-c but the spawned svn-process stays and it needs a reboot to get rid of it. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs file write debugging patch
Excerpts from Johannes Hirte's message of 2011-02-28 05:13:59 -0500: On Monday 28 February 2011 02:46:05 Chris Mason wrote: [patch description snipped] At least it doesn't fix the emerge-problem for me. The behavior is now the same as with 2.6.38-rc3. It needs a 'emerge --oneshot dev-libs/libgcrypt' with no further interaction to get the emerge-process hang with a svn-process consuming 100% CPU.
I can cancel the emerge-process with ctrl-c but the spawned svn-process stays and it needs a reboot to get rid of it. I think your problem really is more enospc related. Still working on that as well. But please don't try the patch without removing the debugging hunk at the top (anything that mentions jiffies). -chris
Re: [PATCH] btrfs file write debugging patch
On Mon, Feb 28, 2011 at 11:13:59AM +0100, Johannes Hirte wrote: On Monday 28 February 2011 02:46:05 Chris Mason wrote: [patch description snipped] At least it doesn't fix the emerge-problem for me. The behavior is now the same as with 2.6.38-rc3. It needs a 'emerge --oneshot dev-libs/libgcrypt' with no further interaction to get the emerge-process hang with a svn-process consuming 100% CPU.
I can cancel the emerge-process with ctrl-c but the spawned svn-process stays and it needs a reboot to get rid of it. Can you cat /proc/$pid/wchan a few times so we can get an idea of where it's looping? Thanks, Josef
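For anyone wanting to run the same check, sampling the wait channel a few times looks roughly like this (a sketch; the PID below is a placeholder for the stuck svn process):

```shell
# Sample /proc/$pid/wchan a few times to see where a task is blocked.
# A reading of "0" means the task is not sleeping in the kernel (i.e. it
# is runnable, e.g. spinning in a loop), which matches Maria's report.
pid=$$          # placeholder: substitute the PID of the stuck process
for i in 1 2 3; do
    cat "/proc/$pid/wchan"; echo
    sleep 0.2
done
```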
Re: [PATCH] btrfs file write debugging patch
2011/2/28 Maria Wikström ma...@ponstudios.se: Mon 2011-02-28 at 11:10 -0500, Josef Bacik wrote: On Mon, Feb 28, 2011 at 11:13:59AM +0100, Johannes Hirte wrote: On Monday 28 February 2011 02:46:05 Chris Mason wrote: [patch description snipped] At least it doesn't fix the emerge-problem for me. The behavior is now the same as with 2.6.38-rc3.
It needs a 'emerge --oneshot dev-libs/libgcrypt' with no further interaction to get the emerge-process hang with a svn-process consuming 100% CPU. I can cancel the emerge-process with ctrl-c but the spawned svn-process stays and it needs a reboot to get rid of it. Can you cat /proc/$pid/wchan a few times so we can get an idea of where it's looping? Thanks, Josef It behaves the same way here with btrfs-unstable. The output of cat /proc/$pid/wchan is 0. // Maria I've applied the patch at the head of this thread (with the jiffies debugging commented out) and I'm attaching an ftrace capture, taken with the function_graph tracer, from when I'm stuck in the loop. I've just snipped out a couple of the loops (the full trace file is quite large, and mostly repetitious). I'm going to try to modify file.c with some trace_printk debugging to show the values of several of the relevant variables at various stages. I'm going to try to exit the loop after 256 tries with an EFAULT so I can stop the tracing at that point and capture a trace of the entry into the problem (the ftrace ring buffer fills up too fast for me to capture the entry point). ftrace-btrfs-file-write-debugging.gz Description: GNU Zip compressed data
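A capture like the one Mitch attached can be taken with the usual tracefs knobs. A rough sketch (requires root and debugfs mounted at the usual path; filtering on btrfs_file_aio_write is an assumption about what was traced, not a quote from the thread):

```shell
# Sketch: capture a function_graph trace of the btrfs write path.
# Run as root; paths assume debugfs at /sys/kernel/debug.
cd /sys/kernel/debug/tracing
echo function_graph > current_tracer
echo btrfs_file_aio_write > set_graph_function   # trace only this call graph
echo 1 > tracing_on
# ... reproduce the looping write here ...
echo 0 > tracing_on
gzip -c trace > /tmp/ftrace-btrfs-file-write-debugging.gz
```

Limiting the graph with set_graph_function keeps the ring buffer from filling with unrelated functions, which is the problem Mitch mentions when trying to capture the entry into the loop.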
Re: [PATCH] btrfs file write debugging patch
2011/2/28 Mitch Harder mitch.har...@sabayonlinux.org: 2011/2/28 Maria Wikström ma...@ponstudios.se: Mon 2011-02-28 at 11:10 -0500, Josef Bacik wrote: On Mon, Feb 28, 2011 at 11:13:59AM +0100, Johannes Hirte wrote: On Monday 28 February 2011 02:46:05 Chris Mason wrote: [patch description snipped] At least it doesn't fix the emerge-problem for me. The behavior is now the same as with 2.6.38-rc3.
It needs a 'emerge --oneshot dev-libs/libgcrypt' with no further interaction to get the emerge-process hang with a svn-process consuming 100% CPU. I can cancel the emerge-process with ctrl-c but the spawned svn-process stays and it needs a reboot to get rid of it. Can you cat /proc/$pid/wchan a few times so we can get an idea of where it's looping? Thanks, Josef It behaves the same way here with btrfs-unstable. The output of cat /proc/$pid/wchan is 0. // Maria I've applied the patch at the head of this thread (with the jiffies debugging commented out) and I'm attaching an ftrace capture, taken with the function_graph tracer, from when I'm stuck in the loop. I've just snipped out a couple of the loops (the full trace file is quite large, and mostly repetitious). I'm going to try to modify file.c with some trace_printk debugging to show the values of several of the relevant variables at various stages. I'm going to try to exit the loop after 256 tries with an EFAULT so I can stop the tracing at that point and capture a trace of the entry into the problem (the ftrace ring buffer fills up too fast for me to capture the entry point). As promised, I've put together a modified file.c with many trace_printk debugging statements to augment the ftrace. The trace is ~128K compressed (about 31,600 lines or 2.6MB uncompressed), so I'm putting it up on my local server instead of attaching. Let me know if it would be more appropriate to send to the list as an attachment. http://dontpanic.dyndns.org/ftrace-btrfs-file-write-debug-v2.gz I preface all my trace_printk comments with TPK: to make skipping through the trace easier. The trace contains about 3 or 4 successful passes through the btrfs_file_aio_write() function to show what a successful trace looks like.
The pass through btrfs_file_aio_write() that breaks begins on line 1088. I let it loop through the while (iov_iter_count(i) > 0) {} loop for 256 times when copied==0 (otherwise it would loop infinitely), then exit out and stop the trace. For reference, here's a diff file for the debugging statements I've added to file.c. Let me know if you would like me to re-run this trial with different debugging lines.

fs/btrfs/file.c /usr/src/linux/fs/btrfs/file.c
--- fs/btrfs/file.c	2011-02-28 10:13:45.0 -0600
+++ /usr/src/linux/fs/btrfs/file.c	2011-02-28 13:07:11.0 -0600
@@ -53,12 +53,14 @@
 	int offset = pos & (PAGE_CACHE_SIZE - 1);
 	int total_copied = 0;
 
+	/***
 	if ((jiffies % 10) == 0)
 		return 0;
 
 	if ((jiffies % 25) == 0) {
Re: [PATCH] btrfs file write debugging patch
2011/2/28 Mitch Harder mitch.har...@sabayonlinux.org: 2011/2/28 Mitch Harder mitch.har...@sabayonlinux.org: 2011/2/28 Maria Wikström ma...@ponstudios.se: Mon 2011-02-28 at 11:10 -0500, Josef Bacik wrote: On Mon, Feb 28, 2011 at 11:13:59AM +0100, Johannes Hirte wrote: On Monday 28 February 2011 02:46:05 Chris Mason wrote: [patch description snipped]
At least it doesn't fix the emerge-problem for me. The behavior is now the same as with 2.6.38-rc3. It needs a 'emerge --oneshot dev-libs/libgcrypt' with no further interaction to get the emerge-process hang with a svn-process consuming 100% CPU. I can cancel the emerge-process with ctrl-c but the spawned svn-process stays and it needs a reboot to get rid of it. Can you cat /proc/$pid/wchan a few times so we can get an idea of where it's looping? Thanks, Josef It behaves the same way here with btrfs-unstable. The output of cat /proc/$pid/wchan is 0. // Maria I've applied the patch at the head of this thread (with the jiffies debugging commented out) and I'm attaching an ftrace capture, taken with the function_graph tracer, from when I'm stuck in the loop. I've just snipped out a couple of the loops (the full trace file is quite large, and mostly repetitious). I'm going to try to modify file.c with some trace_printk debugging to show the values of several of the relevant variables at various stages. I'm going to try to exit the loop after 256 tries with an EFAULT so I can stop the tracing at that point and capture a trace of the entry into the problem (the ftrace ring buffer fills up too fast for me to capture the entry point). As promised, I've put together a modified file.c with many trace_printk debugging statements to augment the ftrace. The trace is ~128K compressed (about 31,600 lines or 2.6MB uncompressed), so I'm putting it up on my local server instead of attaching. Let me know if it would be more appropriate to send to the list as an attachment. http://dontpanic.dyndns.org/ftrace-btrfs-file-write-debug-v2.gz I preface all my trace_printk comments with TPK: to make skipping through the trace easier.
The trace contains about 3 or 4 successful passes through the btrfs_file_aio_write() function to show what a successful trace looks like. The pass through btrfs_file_aio_write() that breaks begins on line 1088. I let it loop through the while (iov_iter_count(i) > 0) {} loop for 256 times when copied==0 (otherwise it would loop infinitely), then exit out and stop the trace. For reference, here's a diff file for the debugging statements I've added to file.c. Let me know if you would like me to re-run this trial with different debugging lines.

fs/btrfs/file.c /usr/src/linux/fs/btrfs/file.c
--- fs/btrfs/file.c	2011-02-28 10:13:45.0 -0600
+++ /usr/src/linux/fs/btrfs/file.c	2011-02-28 13:07:11.0 -0600
@@ -53,12 +53,14 @@
 	int offset = pos & (PAGE_CACHE_SIZE - 1);
 	int total_copied = 0;
 
+	/***
 	if ((jiffies %
[PATCH] btrfs file write debugging patch
Excerpts from Mitch Harder's message of 2011-02-25 13:43:37 -0500: Some clarification on my previous message... After looking at my ftrace log more closely, I can see where Btrfs is trying to release the allocated pages. However, the calculation for the number of dirty_pages is equal to 1 when copied == 0. So I'm seeing at least two problems: (1) It keeps looping when copied == 0. (2) One dirty page is not being released on every loop even though copied == 0 (at least this problem keeps it from being an infinite loop by eventually exhausting reservable space on the disk). Hi everyone, There are actually two bugs here. First the one that Mitch hit, and a second one that still results in bad file_write results with my debugging hunks (the first two hunks below) in place. My patch fixes Mitch's bug by checking for copied == 0 after btrfs_copy_from_user and doing the correct delalloc accounting. This one looks solved, but you'll notice the patch is bigger. First, I add some random failures to btrfs_copy_from_user() by failing it every once in a while. This was much more reliable than trying to use memory pressure to make copy_from_user fail. If copy_from_user fails and we partially update a page, we end up with a page that may go away due to memory pressure. But, btrfs_file_write assumes that only the first and last page may have good data that needs to be read off the disk. This patch ditches that code and puts it into prepare_pages instead. But I'm still having some errors during long stress.sh runs. Ideas are more than welcome, hopefully some other timezones will kick in ideas while I sleep.
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 7084140..89a6a26 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -54,6 +54,13 @@ static noinline int btrfs_copy_from_user(loff_t pos, int num_pages,
 	int offset = pos & (PAGE_CACHE_SIZE - 1);
 	int total_copied = 0;
 
+	if ((jiffies % 10) == 0)
+		return 0;
+
+	if ((jiffies % 25) == 0) {
+		write_bytes /= 2;
+	}
+
 	while (write_bytes > 0) {
 		size_t count = min_t(size_t, PAGE_CACHE_SIZE - offset, write_bytes);
@@ -763,6 +770,26 @@ out:
 }
 
 /*
+ * on error we return an unlocked page and the error value
+ * on success we return a locked page and 0
+ */
+static int prepare_uptodate_page(struct page *page, u64 pos)
+{
+	int ret = 0;
+	if ((pos & (PAGE_CACHE_SIZE - 1)) && !PageUptodate(page)) {
+		ret = btrfs_readpage(NULL, page);
+		if (ret)
+			return ret;
+		lock_page(page);
+		if (!PageUptodate(page)) {
+			unlock_page(page);
+			return -EIO;
+		}
+	}
+	return 0;
+}
+
+/*
  * this gets pages into the page cache and locks them down, it also properly
  * waits for data=ordered extents to finish before allowing the pages to be
  * modified.
@@ -777,6 +804,7 @@ static noinline int prepare_pages(struct btrfs_root *root, struct file *file,
 	unsigned long index = pos >> PAGE_CACHE_SHIFT;
 	struct inode *inode = fdentry(file)->d_inode;
 	int err = 0;
+	int faili = 0;
 	u64 start_pos;
 	u64 last_pos;
 
@@ -794,15 +822,24 @@ again:
 	for (i = 0; i < num_pages; i++) {
 		pages[i] = grab_cache_page(inode->i_mapping, index + i);
 		if (!pages[i]) {
-			int c;
-			for (c = i - 1; c >= 0; c--) {
-				unlock_page(pages[c]);
-				page_cache_release(pages[c]);
-			}
-			return -ENOMEM;
+			faili = i - 1;
+			err = -ENOMEM;
+			goto fail;
+		}
+
+		if (i == 0)
+			err = prepare_uptodate_page(pages[i], pos);
+		else if (i == num_pages - 1)
+			err = prepare_uptodate_page(pages[i],
+						    pos + write_bytes);
+		if (err) {
+			page_cache_release(pages[i]);
+			faili = i - 1;
+			goto fail;
 		}
 		wait_on_page_writeback(pages[i]);
 	}
+	err = 0;
 	if (start_pos < inode->i_size) {
 		struct btrfs_ordered_extent *ordered;
 		lock_extent_bits(BTRFS_I(inode)->io_tree,
@@ -842,6 +879,14 @@ again:
 		WARN_ON(!PageLocked(pages[i]));
 	}
 	return 0;
+fail:
+	while (faili >= 0) {
+		unlock_page(pages[faili]);
+		page_cache_release(pages[faili]);
+		faili--;
+	}
+	return err;
+
 }
 
 static ssize_t btrfs_file_aio_write(struct kiocb *iocb,
@@ -851,7 +896,6 @@ static ssize_t btrfs_file_aio_write(struct kiocb