Re: [PATCH 12/12] Btrfs: Fix file clone when source offset is not 0

2012-02-01 Thread Li Zefan
Jan Schmidt wrote:
 On 30.01.2012 07:33, Li Zefan wrote:
 Jan Schmidt wrote:
 I was looking at the clone range ioctl and have some remarks:

 On 27.01.2011 09:46, Li Zefan wrote:
 diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
 index f87552a..1b61dab 100644
 --- a/fs/btrfs/ioctl.c
 +++ b/fs/btrfs/ioctl.c
 @@ -1788,7 +1788,10 @@ static noinline long btrfs_ioctl_clone(struct file 
 *file, unsigned long srcfd,
  
 	memcpy(&new_key, &key, sizeof(new_key));
 	new_key.objectid = inode->i_ino;
 -	new_key.offset = key.offset + destoff - off;
 +	if (off <= key.offset)
 +		new_key.offset = key.offset + destoff - off;
 +	else
 +		new_key.offset = destoff;
  ^^^
 1) This looks spurious to me. What if destoff isn't aligned? That's what
 the key.offset - off code above is for. Before the patch, the code
 didn't work at all, I agree. But this fix can only work for aligned
 requests.
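
The difference between the old and the patched computation of new_key.offset can be shown with a small userspace sketch. This is not the kernel code: the function names and the sample values in the note below are assumptions made for illustration, and only the two expressions are taken from the diff. It does not address the alignment question raised in remark 1.

```c
#include <stdint.h>

/*
 * Pre-patch expression from the diff. All operands are unsigned
 * 64-bit values, so the subtraction wraps around modulo 2^64 when
 * off is larger than key_offset + destoff.
 */
static uint64_t clone_key_offset_old(uint64_t key_offset, uint64_t off,
				     uint64_t destoff)
{
	return key_offset + destoff - off;
}

/*
 * Patched expression from the diff: an extent that starts before
 * the requested source offset is placed at destoff directly.
 */
static uint64_t clone_key_offset_new(uint64_t key_offset, uint64_t off,
				     uint64_t destoff)
{
	if (off <= key_offset)
		return key_offset + destoff - off;
	return destoff;
}
```

With the assumed values key_offset = 0, off = 4096 and destoff = 0 (an extent that begins before the requested source offset), the old expression wraps to 2^64 - 4096, while the patched one yields 0. For key_offset = 8192 with the same off and destoff, both return 4096.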

 2) The error in new_key also has propagated to the extent item's backref
 and wasn't fixed there. I did a range clone and ended up with an extent
 item like that:
 item 30 key (1318842368 EXTENT_ITEM 131072) itemoff 1047 itemsize 169
 extent refs 8 gen 11103 flags 1
 [...]
 extent data backref root 257 objectid 272 offset 18446744073709494272 count 1

 The last offset (equal to -14 * 4k) is obviously wrong. I didn't figure
 out how the variables are computed, but it looks like there's something
 wrong with the datao u64 to me.


 Unfortunately this is expected. The calculation is:

 extent_item.extent_data_ref.offset = file_pos - file_extent.extent_offset

 so you may get negative offset.
 
 I see where the negative offset comes from. But what can this offset be
 used for?
 
 The design idea was to reduce the number of extent backrefs, in that
 a data backref can point to different file extents in the same file
 (in this case the count field > 1). We didn't expect a negative
 offset until range clone was implemented.
 
 Reducing the number of backrefs is a good thing. In case of count > 1,
 it's clear that the offset cannot reference all of the extent data
 items. We have different design choices:
 
 a) Use the above computation and leave the filesystem with an unusable
 offset value for extent backrefs.
 
 b) Use either one of the extent data item offsets this backref references.
 
 c) Always use a predefined constant (like 0 or -1) when count > 1.
 
 d) Disallow count > 1 for those refs and turn them into shared refs as
 soon as count gets > 1.
 

I expressed the same doubt. See this thread:

http://marc.info/?t=13142591281r=1w=2
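
The wrapped value in the extent item dump quoted above can be reproduced from the backref offset formula. The following is a minimal userspace sketch, not kernel code: the helper names and the sample inputs in the note below are assumptions, chosen only so that the result matches the reported number. The formula itself is the one given in the thread.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper mirroring the formula from the thread:
 *   extent_data_ref.offset = file_pos - file_extent.extent_offset
 * Both operands are unsigned 64-bit, so a "negative" result wraps
 * around modulo 2^64 instead of going below zero.
 */
static uint64_t backref_offset(uint64_t file_pos, uint64_t extent_offset)
{
	return file_pos - extent_offset;
}

/*
 * Print the offset as stored (u64) and reinterpreted as signed.
 * The signed view assumes a two's-complement platform.
 */
static void show_backref_offset(uint64_t file_pos, uint64_t extent_offset)
{
	uint64_t off = backref_offset(file_pos, extent_offset);

	printf("u64: %" PRIu64 "  s64: %" PRId64 "\n", off, (int64_t)off);
}
```

Calling show_backref_offset(0, 14 * 4096), with the assumed inputs of file position 0 and an extent offset of 14 blocks of 4k, gives the u64 value 18446744073709494272 from the dump, which is -57344 when read as a signed number.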
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs on top of dmcrypt with SSD. No corruption iff write cache off?

2012-02-01 Thread Chris Mason
On Sun, Jan 29, 2012 at 04:37:54PM -0800, Marc MERLIN wrote:
 Howdy,
 
 I'm considering using btrfs for my new laptop install.
 
 Encryption is however a requirement, and ecryptfs doesn't quite cut it for
 me, so that leaves me with dmcrypt which is what I've been using with
 ext3/ext4 for years.
 
 https://btrfs.wiki.kernel.org/articles/g/o/t/Gotchas.html 
 still states that 
 'dm-crypt block devices require write-caching to be turned off on the
 underlying HDD'
 While the report was for 2.6.33, I'll assume it's still true.
 
 
 I was considering migrating to a recent 256GB SSD and 3.2.x kernel.
 
 First, I'd like to check if the 'turn off write cache' comment is still
 accurate and if it does apply to SSDs too.
 
 Second, I was wondering if anyone is running btrfs over dmcrypt on an SSD
 and what the performance is like with write cache turned off (I'm actually
 not too sure what the impact is for SSDs considering that writing to flash
 can actually be slower than writing to a hard drive).

Performance without the cache on is going to vary wildly from one SSD to
another.  Some really need it to give them nice fat writes while others
do better on smaller writes.  It's best to just test yours and see.

With a 3.2 kernel (it really must be 3.2 or higher), both btrfs and dm
are doing the right thing for barriers.

-chris



Re: btrfs bug

2012-02-01 Thread Mitch Harder
On Tue, Jan 31, 2012 at 11:20 PM, Thomas Weber
thomas.weber.li...@googlemail.com wrote:
 Hello Mitch,

 I have good news for you. I looked through all log files and found in the
 everything.log the following:

 Regards,
 Thomas


 Jan 31 05:12:24 localhost kernel: [87276.968049] btrfs memmove bogus
 src_offset 1870 move len 687876531 len 4096
 Jan 31 05:12:24 localhost kernel: [87276.968136] [ cut here
 ]
 Jan 31 05:12:24 localhost kernel: [87276.968222] kernel BUG at
 fs/btrfs/extent_io.c:4357!
 Jan 31 05:12:24 localhost kernel: [87276.968296] invalid opcode:  [#1]
 PREEMPT SMP

[...snip...]

This is coming from a BUG_ON(1) in the memcpy_extent_buffer() function
in extent_io.c

if (src_offset + len > dst->len) {
        printk(KERN_ERR "btrfs memmove bogus src_offset %lu move "
               "len %lu dst len %lu\n", src_offset, len, dst->len);
        BUG_ON(1);
}

So, since (1870 + 687876531) > 4096, the BUG_ON was triggered.
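
As a worked example, the same comparison can be run in userspace with the values from the log. This is only a sketch: copy_in_bounds() is a hypothetical name, and it returns a status where the kernel code hits BUG_ON(1).

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Userspace sketch of the bounds check discussed in this message.
 * copy_in_bounds() is a hypothetical helper: it reports the problem
 * and returns false where the kernel would call BUG_ON(1).
 */
static bool copy_in_bounds(unsigned long src_offset, unsigned long len,
			   unsigned long buf_len)
{
	if (src_offset + len > buf_len) {
		fprintf(stderr,
			"bogus src_offset %lu move len %lu dst len %lu\n",
			src_offset, len, buf_len);
		return false;
	}
	return true;
}
```

copy_in_bounds(1870, 687876531, 4096) returns false, because 1870 + 687876531 = 687878401 is far beyond the 4096-byte buffer, whereas a small move such as copy_in_bounds(1870, 100, 4096) passes.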

There are two calls to memcpy_extent_buffer() from
setup_items_for_insert (the next function back in the call trace shown
from the BUG_ON), so that part makes sense, at least.

I don't know if anybody else has anything to say on this, but my best
guess is that this btrfs volume has picked up some corruptions that
are feeding in some bad values.


Re: [3.2.1] BUG at fs/btrfs/inode.c:1588

2012-02-01 Thread Kai Krakow
Interestingly, the filesystem could not be unmounted; the system hung. After
a REISUB reboot and another check with btrfs scrub, no errors were reported,
and the rsync completed fine this time. This does not make sense to me.

In any case here's my backup script although I see nothing special with it:

#!/bin/bash
DATE=$(date +%Y%m%d-%H%M)
BASEDIR=/mnt/usb-backup
ionice -c3 -p$$
mount ${BASEDIR} && (
  [ -d ${BASEDIR}/snapshots/system-${DATE} ] || (
    time rsync -avAXH --delete --inplace --no-whole-file --stats \
      --exclude /proc/ \
      --exclude /dev/ \
      --exclude /sys/ \
      --exclude /boot/ \
      --exclude /media/ \
      --exclude /mnt/ \
      / ${BASEDIR}/current/
    btrfs subvolume snapshot \
      ${BASEDIR}/current \
      ${BASEDIR}/snapshots/system-${DATE}
    btrfs filesystem sync ${BASEDIR}
  )
  umount /mnt/usb-backup
)


Kai Krakow hurikhan77+bt...@gmail.com wrote:

 Just happened while writing a huge avi file to my usb3 backup disk:
 
 [356036.596292] [ cut here ]
 [356036.596300] kernel BUG at fs/btrfs/inode.c:1588!
 [356036.596304] invalid opcode:  [#1] SMP
 [356036.596307] CPU 2
 [356036.596309] Modules linked in: vmnet(O) vmblock(O) vsock(O) vmci(O)
 vmmon(O) af_packet snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss
 snd_mixer_oss nls_iso8859_15 nls_cp437 vfat fat btusb bluetooth zram(C)
 loop snd_usb_audio snd_hwdep snd_usbmidi_lib snd_rawmidi snd_seq_device
 gspca_sonixj gspca_main videodev v4l2_compat_ioctl32 pcspkr i2c_i801 evdev
 unix fuse xfs nfs nfs_acl auth_rpcgss lockd sunrpc reiserfs scsi_wait_scan
 hid_monterey hid_microsoft hid_logitech hid_ezkey hid_cypress hid_chicony
 hid_cherry hid_belkin hid_apple hid_a4tech usbhid usb_storage hid sr_mod
 cdrom sg pata_cmd64x [last unloaded: microcode]
 [356036.596346]
 [356036.596349] Pid: 28747, comm: btrfs-fixup-1 Tainted: G C O
 3.2.1-gentoo-r2 #1 To Be Filled By O.E.M. To Be Filled By O.E.M./Z68 Pro3
 [356036.596355] RIP: 0010:[811605fe]  [811605fe]
 btrfs_writepage_fixup_worker+0xde/0x121
 [356036.596363] RSP: :8801e2379de0  EFLAGS: 00010246
 [356036.596366] RAX:  RBX: ea00019b1a00 RCX:
 
 [356036.596370] RDX:  RSI:  RDI:
 88008a1bbb40
 [356036.596373] RBP: 033fd000 R08: 8801e2379d2c R09:
 000180240024
 [356036.596377] R10:  R11: bf80bf80 R12:
 88008a1bbc10
 [356036.596380] R13:  R14: 8801e2379df8 R15:
 033fdfff
 [356036.596384] FS:  () GS:88043fb0()
 knlGS:
 [356036.596387] CS:  0010 DS:  ES:  CR0: 8005003b
 [356036.596391] CR2: 7f5ef966 CR3: 0003253f2000 CR4:
 000406e0
 [356036.596394] DR0:  DR1:  DR2:
 
 [356036.596398] DR3:  DR6: 0ff0 DR7:
 0400
 [356036.596401] Process btrfs-fixup-1 (pid: 28747, threadinfo
 8801e2378000, task 8802d2160650)
 [356036.596405] Stack:
 [356036.596407]  88008a1bbab0 88026d847540 
 88003a7c1f50
 [356036.596412]   88019bf62d80 88019bf62dd0
 88019bf62d98
 [356036.596417]  88019bf62da8 8802d2160650 88019bf62d88
 8117f23f
 [356036.596421] Call Trace:
 [356036.596426]  [8117f23f] ? worker_loop+0x170/0x485
 [356036.596431]  [8117f0cf] ? btrfs_queue_worker+0x272/0x272
 [356036.596435]  [8117f0cf] ? btrfs_queue_worker+0x272/0x272
 [356036.596439]  [810489fb] ? kthread+0x7a/0x82
 [356036.596445]  [81444634] ? kernel_thread_helper+0x4/0x10
 [356036.596449]  [81048981] ? kthread_worker_fn+0x135/0x135
 [356036.596453]  [81444630] ? gs_change+0xb/0xb
 [356036.596456] Code: 00 00 4c 89 f1 48 8b 3c 24 e8 67 4f 01 00 48 89 df
 [e8
 b2 70 f2 ff ba 01 00 00 00 4c 89 ee 4c 89 e7 e8 bf 29 01 00 e9 4b ff ff ff
 0f 0b 41 b8 50 00 00 00 4c 89 f1 4c 89 fa 48 89 ee 48 8b 3c 24
 [356036.596478] RIP  [811605fe]
 btrfs_writepage_fixup_worker+0xde/0x121
 [356036.596483]  RSP 8801e2379de0
 [356036.653626] ---[ end trace 9fa19a7644192fb6 ]---
 
 btrfsck now finds many of these:
 
 jupiter ~ # btrfsck /dev/sde1
 root 256 inode 12746 errors 400
 root 256 inode 12747 errors 400
 root 256 inode 12748 errors 400
 root 256 inode 12749 errors 400
 root 256 inode 17141 errors 400
 root 256 inode 219966 errors 400
 root 256 inode 224243 errors 400
 root 256 inode 225245 errors 400
 root 256 inode 225354 errors 400
 root 256 inode 290639 errors 2000
 root 256 inode 291751 errors 2000
 
 This disk is used for nothing else than the following cycle:
 
 1. mount it (compress-force=gzip)
 2. run rsync to backup my system (using cow-friendly rsync flags)
 3. create a snapshot
 4. unmount it
 
 So that error must have been introduced simply by running rsync. It has
 plenty of free space (about 800 GB).
 

Re: btrfs on top of dmcrypt with SSD. No corruption iff write cache off?

2012-02-01 Thread Marc MERLIN
On Wed, Feb 01, 2012 at 12:56:24PM -0500, Chris Mason wrote:
  Second, I was wondering if anyone is running btrfs over dmcrypt on an SSD
  and what the performance is like with write cache turned off (I'm actually
  not too sure what the impact is for SSDs considering that writing to flash
  can actually be slower than writing to a hard drive).
 
 Performance without the cache on is going to vary wildly from one SSD to
 another.  Some really need it to give them nice fat writes while others
 do better on smaller writes.  It's best to just test yours and see.
 
 With a 3.2 kernel (it really must be 3.2 or higher), both btrfs and dm
 are doing the right thing for barriers.

Thanks for the answer.
Can you confirm that I still must disable write cache on the SSD to avoid
corruption with btrfs on top of dmcrypt, or is there a chance that it just
works now?

Thanks,
Marc
-- 
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  