Re: corrupted file size on inline extent conversion?

2013-02-04 Thread Sage Weil
On Wed, 30 Jan 2013, Josef Bacik wrote:
 On Wed, Jan 30, 2013 at 11:30:49AM -0700, Mike Lowe wrote:
  I've been running rsync against a rbd device backed by btrfs filesystems 
  that are about 11% full for about 45 minutes before I checked and noticed 
  the printk message.  That was the first go with the patch.  Seems like I 
  was able to get by without any problems until the btrfs filesystems got 
  some use and filled up a little bit.
  
 
 Ok since you are seeing the message I'll go ahead and post the patch and 
 get it moving along, let me know if you still see the problem.  Thanks,

Awesome.  Mike still hasn't seen a recurrence, so it's looking like the 
patch is good.

Thanks so much!
sage
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: corrupted file size on inline extent conversion?

2013-01-30 Thread Mike Lowe
Well I found this, so I think it's likely:

root@gwboss2:~# dmesg |grep bitten
[ 3196.193238] this would have bitten us in the ass
[ 3196.193784] this would have bitten us in the ass

On Jan 29, 2013, at 9:54 AM, Josef Bacik <jba...@fusionio.com> wrote:

 On Mon, Jan 28, 2013 at 05:12:12PM -0700, Sage Weil wrote:
 A ceph user observed an incorrect i_size on btrfs.  The pattern looks like 
 this:
 
 - some writes at low file offsets
 - a write to 4185600 len 8704 (i_size should be 4MB)
 - more writes to low offsets
 - a write to 4181504 len 4096 (abuts the write above)
 - a bit of time goes by...
 - stat returns 4186112 (4MB - 8192)
 - that's a few bytes to the right of the top write above.
 
 There are some logs showing the full read/write activity to the file at
 
  http://tracker.newdream.net/attachments/658/object_log.txt
 
 on issue
 
  http://tracker.newdream.net/issues/3810
 
 The kernel was 3.7.0-030700-generic (and probably also observed on 3.7.1).
 
 Is this a known bug?
 
 Not known but I took a long hard look at our ordered i size updating and I think
 I spotted the bug.  Could you run this patch and see if you get the printk?  If
 you do then that was the problem and you should be good to go.  It definitely
 needs to be fixed, hopefully it's also your bug.  Thanks,
 
 Josef
 
 
 diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
 index cbd4838..dbd4905 100644
 --- a/fs/btrfs/ordered-data.c
 +++ b/fs/btrfs/ordered-data.c
 @@ -895,8 +895,14 @@ int btrfs_ordered_update_i_size(struct inode *inode, u64 offset,
  	 * if the disk i_size is already at the inode->i_size, or
  	 * this ordered extent is inside the disk i_size, we're done
  	 */
 -	if (disk_i_size == i_size || offset <= disk_i_size) {
 +	if (disk_i_size == i_size)
  		goto out;
 +
 +	if (offset <= disk_i_size) {
 +		if (ordered && ordered->outstanding_isize > disk_i_size)
 +			printk(KERN_ERR "this would have bitten us in the ass\n");
 +		else
 +			goto out;
  	}
  
  	/*



Re: corrupted file size on inline extent conversion?

2013-01-30 Thread Josef Bacik
On Wed, Jan 30, 2013 at 11:17:25AM -0700, Mike Lowe wrote:
 Well I found this, so I think it's likely:
 
 root@gwboss2:~# dmesg |grep bitten
 [ 3196.193238] this would have bitten us in the ass
 [ 3196.193784] this would have bitten us in the ass
 

Well that makes me happy since I had almost talked myself out of this being a
possibility.  How long did it take you to hit this problem before and how long
have you been running with this patch?  Thanks,

Josef


Re: corrupted file size on inline extent conversion?

2013-01-30 Thread Mike Lowe
I've been running rsync against a rbd device backed by btrfs filesystems that 
are about 11% full for about 45 minutes before I checked and noticed the printk 
message.  That was the first go with the patch.  Seems like I was able to get 
by without any problems until the btrfs filesystems got some use and filled up 
a little bit.

On Jan 30, 2013, at 1:22 PM, Josef Bacik <jba...@fusionio.com> wrote:

 On Wed, Jan 30, 2013 at 11:17:25AM -0700, Mike Lowe wrote:
 Well I found this, so I think it's likely:
 
 root@gwboss2:~# dmesg |grep bitten
 [ 3196.193238] this would have bitten us in the ass
 [ 3196.193784] this would have bitten us in the ass
 
 
 Well that makes me happy since I had almost talked myself out of this being a
 possibility.  How long did it take you to hit this problem before and how long
 have you been running with this patch?  Thanks,
 
 Josef



Re: corrupted file size on inline extent conversion?

2013-01-30 Thread Josef Bacik
On Wed, Jan 30, 2013 at 11:30:49AM -0700, Mike Lowe wrote:
 I've been running rsync against a rbd device backed by btrfs filesystems that 
 are about 11% full for about 45 minutes before I checked and noticed the 
 printk message.  That was the first go with the patch.  Seems like I was able 
 to get by without any problems until the btrfs filesystems got some use and 
 filled up a little bit.
 

Ok since you are seeing the message I'll go ahead and post the patch and get it
moving along, let me know if you still see the problem.  Thanks,

Josef


Re: corrupted file size on inline extent conversion?

2013-01-29 Thread Josef Bacik
On Mon, Jan 28, 2013 at 05:12:12PM -0700, Sage Weil wrote:
 A ceph user observed an incorrect i_size on btrfs.  The pattern looks like 
 this:
 
 - some writes at low file offsets
 - a write to 4185600 len 8704 (i_size should be 4MB)
 - more writes to low offsets
 - a write to 4181504 len 4096 (abuts the write above)
 - a bit of time goes by...
 - stat returns 4186112 (4MB - 8192)
  - that's a few bytes to the right of the top write above.
 
 There are some logs showing the full read/write activity to the file at
 
   http://tracker.newdream.net/attachments/658/object_log.txt
 
 on issue
 
   http://tracker.newdream.net/issues/3810
 
 The kernel was 3.7.0-030700-generic (and probably also observed on 3.7.1).
 
 Is this a known bug?

Not known but I took a long hard look at our ordered i size updating and I think
I spotted the bug.  Could you run this patch and see if you get the printk?  If
you do then that was the problem and you should be good to go.  It definitely
needs to be fixed, hopefully it's also your bug.  Thanks,

Josef


diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index cbd4838..dbd4905 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -895,8 +895,14 @@ int btrfs_ordered_update_i_size(struct inode *inode, u64 offset,
 	 * if the disk i_size is already at the inode->i_size, or
 	 * this ordered extent is inside the disk i_size, we're done
 	 */
-	if (disk_i_size == i_size || offset <= disk_i_size) {
+	if (disk_i_size == i_size)
 		goto out;
+
+	if (offset <= disk_i_size) {
+		if (ordered && ordered->outstanding_isize > disk_i_size)
+			printk(KERN_ERR "this would have bitten us in the ass\n");
+		else
+			goto out;
 	}
 
 	/*
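
To make the change in the early-exit test easier to follow, here is a rough
userspace model of the condition before and after the patch.  This is only a
sketch, not kernel code: struct isize_state, skips_update_before() and
skips_update_after() are hypothetical names made up for illustration, and
has_ordered stands in for the kernel's "ordered != NULL" check.  Only the
comparisons themselves are taken from the diff.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical container for the values the patched test looks at.
 * The field names mirror the variables in the diff; the struct itself
 * does not exist in btrfs.
 */
struct isize_state {
	uint64_t disk_i_size;
	uint64_t i_size;
	uint64_t offset;
	bool has_ordered;		/* models "ordered != NULL" */
	uint64_t outstanding_isize;	/* only meaningful if has_ordered */
};

/* Old test: one combined condition jumps straight to "out". */
bool skips_update_before(const struct isize_state *s)
{
	return s->disk_i_size == s->i_size || s->offset <= s->disk_i_size;
}

/* New test: the two conditions are split and one sub-case falls through. */
bool skips_update_after(const struct isize_state *s)
{
	if (s->disk_i_size == s->i_size)
		return true;

	if (s->offset <= s->disk_i_size) {
		/* This is the branch where the patch prints its message. */
		if (s->has_ordered && s->outstanding_isize > s->disk_i_size)
			return false;
		return true;
	}

	return false;
}
```

With made-up values disk_i_size = 8192, i_size = 16384, offset = 4096 and an
ordered extent whose outstanding_isize is 16384, skips_update_before() returns
true while skips_update_after() returns false, which is the case the printk is
meant to flag.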


corrupted file size on inline extent conversion?

2013-01-28 Thread Sage Weil
A ceph user observed an incorrect i_size on btrfs.  The pattern looks like 
this:

- some writes at low file offsets
- a write to 4185600 len 8704 (i_size should be 4MB)
- more writes to low offsets
- a write to 4181504 len 4096 (abuts the write above)
- a bit of time goes by...
- stat returns 4186112 (4MB - 8192)
 - that's a few bytes to the right of the top write above.
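
The numbers in the list above can be checked with a small C sketch.  The
struct and function names here are hypothetical and come from neither btrfs
nor ceph, and the sketch assumes the expected i_size is simply the largest
offset + len over all writes, with the low-offset writes left out because
they cannot raise that maximum.

```c
#include <stddef.h>
#include <stdint.h>

/* One write from the reported pattern: byte offset and length. */
struct file_write {
	uint64_t offset;
	uint64_t len;
};

/*
 * Hypothetical helper: the file size implied by a set of writes,
 * taken as the largest end offset (offset + len) among them.
 */
uint64_t expected_i_size(const struct file_write *w, size_t n)
{
	uint64_t size = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t end = w[i].offset + w[i].len;

		if (end > size)
			size = end;
	}
	return size;
}

/* The two high-offset writes from the report, in the order issued. */
const struct file_write reported_writes[] = {
	{ 4185600, 8704 },	/* ends at 4194304, i.e. 4MB */
	{ 4181504, 4096 },	/* ends at 4185600, where the write above starts */
};

/* The size that stat returned in the report. */
const uint64_t observed_i_size = 4186112;
```

expected_i_size(reported_writes, 2) gives 4194304, so the observed value is
8192 bytes short of 4MB and sits 512 bytes past the start of the first
high-offset write (4186112 - 4185600).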

There are some logs showing the full read/write activity to the file at

http://tracker.newdream.net/attachments/658/object_log.txt

on issue

http://tracker.newdream.net/issues/3810

The kernel was 3.7.0-030700-generic (and probably also observed on 3.7.1).

Is this a known bug?

Thanks!
sage
