[PATCH 1/1] Btrfs: Remove redundant NULL check before kfree

2015-06-22 Thread Maninder Singh
kfree() is a no-op when passed NULL, so the NULL checks before
calling it are redundant. Remove them.

Signed-off-by: Maninder Singh 
Reviewed-by: Akhilesh Kumar 
---
 fs/btrfs/free-space-cache.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 9dbe5b5..88f1e16 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -2101,8 +2101,7 @@ new_bitmap:
 
 out:
 	if (info) {
-		if (info->bitmap)
-			kfree(info->bitmap);
+		kfree(info->bitmap);
 		kmem_cache_free(btrfs_free_space_cachep, info);
 	}
 
@@ -3561,8 +3560,7 @@ again:
 
 	if (info)
 		kmem_cache_free(btrfs_free_space_cachep, info);
-	if (map)
-		kfree(map);
+	kfree(map);
 	return 0;
 }
 
-- 
1.7.9.5
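
The reasoning behind the patch can be tried out in userspace. This is only a sketch: it uses free() from the C library, which like kfree() does nothing when handed NULL, and the names space_info, make_info and release_info are hypothetical stand-ins, not kernel code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a free-space entry; not the kernel type. */
struct space_info {
    unsigned long *bitmap;   /* may legitimately be NULL */
};

/* Hypothetical helper: allocate an entry, with or without a bitmap. */
static struct space_info *make_info(int with_bitmap)
{
    struct space_info *info = calloc(1, sizeof(*info));

    if (info && with_bitmap)
        info->bitmap = calloc(4, sizeof(*info->bitmap));
    return info;
}

/*
 * Release an entry and its optional bitmap.
 * Returns 1 if an entry was released, 0 if info was NULL.
 */
static int release_info(struct space_info *info)
{
    if (!info)
        return 0;        /* still needed: info is dereferenced below */
    free(info->bitmap);  /* safe even when bitmap == NULL */
    free(info);
    return 1;
}
```

Note that the outer check on info stays, exactly as in the first hunk of the patch, because info itself is dereferenced; only the inner guard around the free call is redundant.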

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in


BTRFS balance fails with -dusage=100

2015-06-22 Thread Moby
OpenSuSE 13.2 system with single BTRFS / mounted on top of /dev/md1.  
/dev/md1 is md raid5 across 4 SATA disks.


System details are:

Linux suse132 4.0.5-4.g56152db-default #1 SMP Thu Jun 18 15:11:06 UTC 2015 (56152db) x86_64 x86_64 x86_64 GNU/Linux


btrfs-progs v4.1+20150622

Label: none  uuid: 33b98d97-606b-4968-a266-24a48a9fe50d
	Total devices 1 FS bytes used 884.21GiB
	devid    1 size 1.36TiB used 889.06GiB path /dev/md1


Data, single: total=885.00GiB, used=883.12GiB
System, DUP: total=32.00MiB, used=144.00KiB
Metadata, DUP: total=2.00GiB, used=1.09GiB
GlobalReserve, single: total=384.00MiB, used=0.00B


Relevant entries from log are:
2015-06-22T22:46:32.238011-05:00 suse132 kernel: [90193.446128] BTRFS: bdev /dev/md1 errs: wr 9977, rd 0, flush 0, corrupt 0, gen 0
2015-06-22T22:46:32.238050-05:00 suse132 kernel: [90193.446158] BTRFS: bdev /dev/md1 errs: wr 9978, rd 0, flush 0, corrupt 0, gen 0
2015-06-22T22:46:32.238054-05:00 suse132 kernel: [90193.446179] BTRFS: bdev /dev/md1 errs: wr 9979, rd 0, flush 0, corrupt 0, gen 0


The system was running fine, and apart from btrfs balance it still is. I
then did heavy I/O, copying and deleting massive amounts of data, which
brought the system into its present state. Once I was done with the
I/O, I kicked off btrfs balance start /.

That command failed. I then started running btrfs balance -dusage=XX /
This command succeeds with XX up to and including 99. It fails when I
set XX to 100. btrfs balance also fails if I omit the -dusage option.
The errors in the log make no sense to me, since the md raid device is
not reporting any errors at all. Running btrfs scrub also reports no
errors.


Any ideas on how to get btrfs balance to succeed without errors would be 
welcome.


Regards,

--Moby


Re: NULL pointer dereference during snapshot removal

2015-06-22 Thread Liu Bo
On Sat, Jun 20, 2015 at 04:53:24PM +0200, Christoph Biedl wrote:
> Hi there,
> 
> I'm having trouble with btrfs where removing a snapshot causes a
> kernel Oops at blk_get_backing_dev_info+0x10/0x1c (plus or minus a
> few bytes). Is this a known issue? Else I'll dig further. Stack
> traces below.

Can you use gdb to find the source line that corresponds to
blk_get_backing_dev_info+0x10/0x1c?

Although the stack trace comes from btrfs, btrfs doesn't touch the
inode's bdi.

Thanks,

-liubo
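
For readers unfamiliar with the symbol+offset/size notation in the oops: the first number is the byte offset of the faulting instruction inside the function, the second is the size of the function. The sketch below splits such a string into its parts; struct oops_loc and parse_oops_loc are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the parts of "symbol+0xOFF/0xSIZE". */
struct oops_loc {
    char symbol[64];
    unsigned long offset;  /* byte offset of the faulting instruction */
    unsigned long size;    /* total size of the function in bytes */
};

/* Returns 0 on success, -1 if the string does not match the format. */
static int parse_oops_loc(const char *s, struct oops_loc *out)
{
    /* %63[^+] reads the symbol name up to, but not including, the '+'. */
    if (sscanf(s, "%63[^+]+%lx/%lx", out->symbol,
               &out->offset, &out->size) != 3)
        return -1;
    if (out->offset >= out->size)
        return -1;  /* the offset must fall inside the function */
    return 0;
}
```

With a vmlinux that carries debug info, the symbol and offset can then be handed to gdb, for example as list *(blk_get_backing_dev_info+0x10); whether that resolves to a useful line depends on the build matching the running kernel.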
> 
> In general these snapshot operations work as expected. In a specific
> setup they fail every time. I can try to trim this down to a simple
> and public reproducer but I expect this will take some time. Basically
> this is a private Debian buildd using sbuild/schroot with btrfs
> snapshots. Building a certain package results in the trouble. That
> package is not public but does a lot of nasty things during the build,
> including probing block devices[1]. The build runs as expected, the
> cleanup however does not.
> 
> * btrfs-tools is v3.17
> * kernel is the latest 4.0.x stable series. Note even yesterday's 
>   4.0.6-rc1 is affected.
> * userland is both Debian wheezy and jessie
> * the build chroot is Debian jessie, Debian wheezy is not affected
> 
> Christoph
> 
> [1] Those who are familiar with sbuild: Build dependencies include
> dmsetup, lvm2, mdadm, and udev. Starting daemons is disabled
> by an according policy-rc.d snippet but I expect somebody isn't
> playing nice here. And still, this must not affect btrfs in such a
> way.
> 
> Unable to handle kernel NULL pointer dereference at virtual address 0204
> pgd = ec0b8000
> [0204] *pgd=6e22f831, *pte=, *ppte=
> Internal error: Oops: 17 [#1] SMP ARM
> Modules linked in: nfsd btrfs xor raid6_pq sunxi_sid
> CPU: 1 PID: 7351 Comm: btrfs Not tainted 4.0.6-rc1 #1
> Hardware name: Allwinner sun7i (A20) Family
> task: eca16040 ti: e1022000 task.ti: e1022000
> PC is at blk_get_backing_dev_info+0x10/0x1c
> LR is at inode_to_bdi+0x38/0x48
> pc : []lr : []psr: 20070013
> sp : e1023b60  ip : e1023b70  fp : e1023b6c
> r10: e16e51c8  r9 : 7fff  r8 : 
> r7 :   r6 :   r5 : edc03890  r4 : ee027000
> r3 :   r2 :   r1 : 7fff  r0 : edc03800
> Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
> Control: 10c5387d  Table: 6c0b806a  DAC: 0015
> Process btrfs (pid: 7351, stack limit = 0xe1022218)
> Stack: (0xe1023b60 to 0xe1024000)
> 3b60: e1023b84 e1023b70 c012b794 c02df058  edc03964 e1023bbc e1023b88
> 3b80: c00bd708 c012b768 7fff     7fff
> 3ba0: 0001   7fff e1023be4 e1023bc0 c00be5c0 c00bd6d0
> 3bc0:  7fff 0001 e58a2910 e16e51c8 7fff e1023c14 e1023be8
> 3be0: bf14d354 c00be5a8  7fff   fffe 
> 3c00:  e16e50b0 e1023c5c e1023c18 bf1530b8 bf14d334  7fff
> 3c20:  7fff     e16e51c8 
> 3c40:   e16e50b0 e16e50cc e1023ccc e1023c60 bf140e1c bf153028
> 3c60:   e1023cb4 e1023c78 c012ae1c c005e134 e16e5234 0007
> 3c80:   1000 ec5f7800 e1023c90 e1023c90 c09ca300 e16e51c8
> 3ca0: e16e5270 e16e51c8 e16e5270 c09ca300 bf1c28d4 015e  ec5f7800
> 3cc0: e1023cec e1023cd0 c011e338 bf140ba0 e16e51c8 ed4ba800 e16e5218 bf1c28d4
> 3ce0: e1023d0c e1023cf0 c011eed4 c011e294 e16e513c ec5f7b50 e16e51c8 
> 3d00: e1023d3c e1023d10 bf14132c c011ed5c 2dc0a000 ec942000 ec645000 ec5f7800
> 3d20: eb04fc38 eb0b9920 ec826dc0  e1023dcc e1023d40 bf173e88 bf14117c
> 3d40: 0139  ea52f388 0038 c0a15380 ec5f7800 eb04fc38 ec5f7b68
> 3d60: ede805d8 c00c3794 eb0b9990 ede6abd8 ec645000 0004  
> 3d80:   ed9f6600 00060006 00070001   
> 3da0: 00024800 ede6ab68 ec826dc0 ec645000 5000940f ede6ab68 bea3d7a8 ec826dc0
> 3dc0: e1023ef4 e1023dd0 bf177408 bf1738c8 c09cb880 ee02fe00 eea7adb4 ed81d778
> 3de0: eea7adb4 ed81d740 eea7adb4 0136c000 ed81d778 eea7adb4 e1023e1c e1023e08
> 3e00: 0103 ed5553f8 0136c000 ed81d778 e1023eb4 e1023e20 c00e11e0 c001d3b4
> 3e20: 0024 ec826dc0   ede6ab68 e1023e40 c0110680 ec826dc0
> 3e40: e1023ed0 e1023f5c ec0b8048  0040 05b0 016c 0009
> 3e60: c0112e54 c010e3e4 e1023e94 b6dd e1023f40 bea3d6b0 0079 e9dd1740
> 3e80: e1023fb0 ee02fe00 e1023eb4 e1023fb0 ed81d740 eca16040 0136c0e4 ed5553f8
> 3ea0: ed81d77c 0817 e1023f04 e1023eb8 c001c8f8 c0060268 e1023f4c e1023ec8
> 3ec0: c0113e88 c0112dc8 0043 ede6ab68 ec826dc0 bea3d7a8 5000940f 0003
> 3ee0: e1022000  e1023f7c e1023ef8 c011607c bf175fd8 e1023fac e1023f08
> 3f00: c0008588 c001c79c ede6ab68 4020 c09cbc34 ec942000 ec942000 ec826dc0
> 3f20: 4020 ede6ab68 e1023f4c e1023f38 c01134c4 c00f8348 eca16040 0003
> 3f40: e1023f94 e1023f50 e102

Re: [PATCH v2 0/5] Btrfs: RAID 5/6 missing device scrub+replace

2015-06-22 Thread wangyf

Hi,
I have tested your patch set v2, but something went wrong.

kernel: 4.1.0-rc7+ with your five patches
VirtualBox, Ubuntu 14.10 server + LVM

I built a new btrfs.ko with your patches, removed the original module
with rmmod, and loaded the new one with insmod.

With the RAID1/10 profiles, mkfs succeeds,
but when I mount the filesystem, dmesg shows:
trans: 18446612133975020584 running 5
btrfs transid mismatch buffer 29507584, found 18446612133975020584 running 5
btrfs transid mismatch buffer 29507584, found 18446612133975020584 running 5
btrfs transid mismatch buffer 29507584, found 18446612133975020584 running 5

... ...

With RAID5/6, I ran mkfs and then mount; the system hung at the
'mount -t btrfs /dev/mapper/server-dev1 /mnt' command.

That's all.
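
One way to look at the "found" value in the log above is to print it in hexadecimal. The sketch below does only that conversion; to_hex and top16 are hypothetical helper names. In hex the value is 0xffff880062fdd428, whose upper bits have the shape of an x86_64 kernel virtual address rather than a small counter like the running transaction 5. Whether the value really is a stray pointer, and whether the test machine is x86_64 at all, are assumptions that the report does not confirm.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format a 64-bit value as 0x-prefixed, zero-padded hex into buf. */
static void to_hex(uint64_t v, char *buf, size_t len)
{
    snprintf(buf, len, "0x%016" PRIx64, v);
}

/*
 * Upper 16 bits of the value. On x86_64, 0xffff here marks the upper
 * (kernel) half of the canonical address space.
 */
static unsigned top16(uint64_t v)
{
    return (unsigned)(v >> 48);
}
```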





On 2015-06-20 02:52, Omar Sandoval wrote:

Hi,

Here's version 2 of the missing device RAID 5/6 fixes. The original
problem was reported by a user on Bugzilla: the kernel crashed when
attempting to replace a missing device in a RAID 6 filesystem. This is
detailed and fixed in patch 4. After the initial posting, Zhao Lei
reported a similar issue when doing a scrub on a RAID 5 filesystem with
a missing device. This is fixed in the added patch 5.

My new-and-improved-and-overengineered reproducer as well as Zhao Lei's
reproducer can be found below.

Thanks!

v1: http://article.gmane.org/gmane.comp.file-systems.btrfs/45045
v1->v2:
- Add missing scrub_wr_submit() in scrub_missing_raid56_worker()
- Add clarifying comment in dev->missing case of scrub_stripe()
   (Zhaolei)
- Add fix for scrub with missing device (patch 5)

Omar Sandoval (5):
   Btrfs: remove misleading handling of missing device scrub
   Btrfs: count devices correctly in readahead during RAID 5/6 replace
   Btrfs: add RAID 5/6 BTRFS_RBIO_REBUILD_MISSING operation
   Btrfs: fix device replace of a missing RAID 5/6 device
   Btrfs: fix parity scrub of RAID 5/6 with missing device

  fs/btrfs/raid56.c |  87 ---
  fs/btrfs/raid56.h |  10 ++-
  fs/btrfs/reada.c  |   4 +-
  fs/btrfs/scrub.c  | 202 +-
  4 files changed, 259 insertions(+), 44 deletions(-)

Reproducer 1:


#!/bin/bash

usage () {
	USAGE_STRING="Usage: $0 [OPTION]...
Options:
  -m    failure mode; MODE is 'eio', 'missing', or 'corrupt' (defaults to
        'missing')
  -n    number of files to write, each twice as big as the last, the first
        being 1M in size (defaults to 4)
  -o    operation to perform; OP is 'replace' or 'scrub' (defaults to
        'replace')
  -r    RAID profile; RAID is 'raid0', 'raid1', 'raid10', 'raid5', or 'raid6'
        (defaults to 'raid5')

Miscellaneous:
  -h    display this help message and exit"

	case "$1" in
		out)
			echo "$USAGE_STRING"
			exit 0
			;;
		err)
			echo "$USAGE_STRING" >&2
			exit 1
			;;
	esac
}

MODE=missing
RAID=raid5
OP=replace
NUM_FILES=4

while getopts "m:n:o:r:h" OPT; do
	case "$OPT" in
		m)
			MODE="$OPTARG"
			;;
		r)
			RAID="$OPTARG"
			;;
		o)
			OP="$OPTARG"
			;;
		n)
			NUM_FILES="$OPTARG"
			if [[ ! "$NUM_FILES" =~ ^[0-9]+$ ]]; then
				usage "err"
			fi
			;;
		h)
			usage "out"
			;;
		*)
			usage "err"
			;;
	esac
done

case "$MODE" in
	eio|missing|corrupt)
		;;
	*)
		usage err
		;;
esac

case "$RAID" in
	raid[01])
		NUM_RAID_DISKS=2
		;;
	raid10)
		NUM_RAID_DISKS=4
		;;
	raid5)
		NUM_RAID_DISKS=3
		;;
	raid6)
		NUM_RAID_DISKS=4
		;;
	*)
		usage err
		;;
esac

case "$OP" in
	replace)
		NUM_DISKS=$((NUM_RAID_DISKS + 1))
		;;
	scrub)
		NUM_DISKS=$NUM_RAID_DISKS
		;;
	*)
		usage err
		;;
esac

echo "Running $OP on $RAID with $MODE"

SRC_DISK=$((NUM_RAID_DISKS - 1))
TARGET_DISK=$((NUM_DISKS - 1))
NUM_SECTORS=$((1024 * 1024))
LOOP_DEVICES=()
DM_DEVICES=()

cleanup () {
	echo "Done. Press enter to cleanup..."
	read
	if findmnt /mnt; then
		umount /mnt
	fi
	for DM in "${DM_DEVICES[@]}"; do
		dmsetup remove "$DM"
	done
	for LOOP in "${LOOP_DEVICES[@]}"; do
		losetup --detach "$LOOP"
	done
	for ((i = 

Re: qgroup limit clearing, was Re: Btrfs progs release 4.1

2015-06-22 Thread Qu Wenruo



Tsutomu Itoh wrote on 2015/06/23 08:55 +0900:

On 2015/06/23 3:18, Christian Robottom Reis wrote:

On Mon, Jun 22, 2015 at 05:00:23PM +0200, David Sterba wrote:

   - qgroup:
 - show: distinguish no limits and 0 limit value
 - limit: ability to clear the limit


I'm using kernel 4.1-rc7 as per:

 root@riff:/var/lib/lxc/juju-trusty-lxc-template/rootfs# uname -a
 Linux riff 4.1.0-040100rc7-generic #201506080035 SMP Mon Jun 8 04:36:20 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

But apart from still having major issues with qgroups (quota enforcement
triggers even when there seems to be plenty of free space) clearing
limits with btrfs-progs 4.1 doesn't revert back to 'none', instead
confusingly setting the quota to 16EiB. Using:

 root@riff:/var/lib/lxc/juju-trusty-lxc-template/rootfs# btrfs version
 btrfs-progs v4.1

I start from:

 qgroupid         rfer         excl     max_rfer     max_excl
 --------         ----         ----     --------     --------
 0/5           2.15GiB      1.95GiB         none         none
 0/261         1.42GiB      1.11GiB         none    100.00GiB
 0/265         1.09GiB    600.59MiB         none    100.00GiB
 0/271       793.32MiB    366.40MiB         none    100.00GiB
 0/274       514.96MiB    142.92MiB         none    100.00GiB

I then issue:

 root@riff# btrfs qgroup limit -e none 261 /var
 root@riff# btrfs qgroup limit none 261 /var

I end up with:

 qgroupid         rfer         excl     max_rfer     max_excl
 --------         ----         ----     --------     --------
 0/5           2.15GiB      1.95GiB         none         none
 0/261         1.42GiB      1.11GiB     16.00EiB     16.00EiB
 0/265         1.09GiB    600.59MiB         none    100.00GiB
 0/271       793.32MiB    366.40MiB         none    100.00GiB
 0/274       514.96MiB    142.92MiB         none    100.00GiB

Is that expected?



The following fix is necessary for the kernel to display it correctly.

  [PATCH] btrfs: qgroup: allow user to clear the limitation on qgroup
  http://marc.info/?l=linux-btrfs&m=143331495409594&w=2

Thanks,
Tsutomu

I'll send a new pull request containing this patch once we have finished
the full test.

The pull request will consist mainly of small cleanups and bug fixes, so
it should be quite safe, but I still want to make sure it is completely
safe.


Thanks,
Qu





Re: Corrupted btrfs partition (converted from ext4) after balance

2015-06-22 Thread Qu Wenruo



Vianney Stroebel wrote on 2015/06/19 01:55 +0200:

One of my btrfs partitions seems to have been corrupted.

Since I've tried to balance it, I can only mount it read-only. I have
been able to use it read-only without problem so far so the data seems
safe.

When I remove the "ro" option, the "mount" command hangs and some
programs do not function properly (iotop hangs too and Firefox cannot
load new web pages). Every few seconds a message is printed on syslog
(see attached file).

If I try to terminate the "mount" process with ctrl+c, my whole system
hangs.

This partition was converted from ext4 and I could use it fine after
that. It got corrupted when I tried to balance it a few days ago (even
though I think I had balanced it before, but I'm not sure about this).
The balance would seem to have started but "balance status" showed no
progress even after one hour.

This partition is on one hard disk (no raid). Mount options:
defaults,compress=lzo,noatime,nodiratime,noauto,ro. My system also runs
on btrfs on another disk (ssd) without any problems apart from quite
poor performance (but that's for another post).

The command "btrfs scrub start /_big -r" hangs my system.

The command "btrfs check /dev/sdb1" outputs:
"Checking filesystem on /dev/sdb1
UUID: 21873ba7-438a-4fbf-a051-ace28bffd264
checking extents"
and stops after a few minutes with no other output.

Maybe I'm too late to point this out,
but you may be too impatient with btrfsck.

Unlike fsck for ext4/xfs, btrfsck always reads the whole metadata to
check its consistency, so it may take a long time.

Without the full output of btrfsck, it is quite hard to treat this as a
clear bug report, if you want to save the data on the corrupted partition.

On the other hand, if you can provide the full output, there is a chance
that developers interested in btrfsck can help solve your problem.

BTW, it seems that you are also impatient about the reply speed on the
btrfs mailing list.

IMHO, the btrfs mailing list is currently much more of a developer mailing
list. Although there are talented sysadmins like Duncan or Marc here, most
of the developers are hardly interested in a bug report without a
reproducer or even btrfsck output.

Not to mention that they are also busy fixing bugs or developing new
features.

So just calm down and be patient with both btrfsck and the developers.

Thanks,
Qu




I did not try "btrfs check --repair".

"Btrfs-zero-log" doesn't seem to apply here.

I could copy the data to another freshly formatted disk and reformat
this one but I am wondering if btrfs is stable enough to be used on my
professional laptop (where I cannot afford such downtime) or if I should
go back to ext4.

So the goal of this message is not only to see if I can repair this
partition, but also to assess if btrfs corrupts partitions randomly and
irreversibly. If the root cause resides in a non-essential feature
(conversion or balancing for example), I would happily continue to use
it without this feature.

This is my first message on this mailing list. I've spent the last hours
trying to solve this.

More info:

uname -a
Linux viybel-pc 3.19.0-21-generic #21-Ubuntu SMP Sun Jun 14 18:31:11 UTC
2015 x86_64 x86_64 x86_64 GNU/Linux

btrfs --version
btrfs-progs v3.19.1

btrfs fi show
Label: none  uuid: 358f485d-690d-436d-ad35-3a1f47329ed7
	Total devices 1 FS bytes used 107.75GiB
	devid    1 size 111.79GiB used 111.79GiB path /dev/sda1

Label: none  uuid: 21873ba7-438a-4fbf-a051-ace28bffd264
	Total devices 1 FS bytes used 606.17GiB
	devid    1 size 698.63GiB used 660.03GiB path /dev/sdb1

btrfs fi df /_big
Data, single: total=431.00GiB, used=419.49GiB
System, single: total=32.00MiB, used=64.00KiB
Metadata, single: total=229.00GiB, used=186.67GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

dmesg > dmesg.log (attached)

Vianney




Re: qgroup limit clearing, was Re: Btrfs progs release 4.1

2015-06-22 Thread Tsutomu Itoh

On 2015/06/23 3:18, Christian Robottom Reis wrote:

On Mon, Jun 22, 2015 at 05:00:23PM +0200, David Sterba wrote:

   - qgroup:
 - show: distinguish no limits and 0 limit value
 - limit: ability to clear the limit


I'm using kernel 4.1-rc7 as per:

 root@riff:/var/lib/lxc/juju-trusty-lxc-template/rootfs# uname -a
 Linux riff 4.1.0-040100rc7-generic #201506080035 SMP Mon Jun 8 04:36:20 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

But apart from still having major issues with qgroups (quota enforcement
triggers even when there seems to be plenty of free space) clearing
limits with btrfs-progs 4.1 doesn't revert back to 'none', instead
confusingly setting the quota to 16EiB. Using:

 root@riff:/var/lib/lxc/juju-trusty-lxc-template/rootfs# btrfs version
 btrfs-progs v4.1

I start from:

 qgroupid         rfer         excl     max_rfer     max_excl
 --------         ----         ----     --------     --------
 0/5           2.15GiB      1.95GiB         none         none
 0/261         1.42GiB      1.11GiB         none    100.00GiB
 0/265         1.09GiB    600.59MiB         none    100.00GiB
 0/271       793.32MiB    366.40MiB         none    100.00GiB
 0/274       514.96MiB    142.92MiB         none    100.00GiB

I then issue:

 root@riff# btrfs qgroup limit -e none 261 /var
 root@riff# btrfs qgroup limit none 261 /var

I end up with:

 qgroupid         rfer         excl     max_rfer     max_excl
 --------         ----         ----     --------     --------
 0/5           2.15GiB      1.95GiB         none         none
 0/261         1.42GiB      1.11GiB     16.00EiB     16.00EiB
 0/265         1.09GiB    600.59MiB         none    100.00GiB
 0/271       793.32MiB    366.40MiB         none    100.00GiB
 0/274       514.96MiB    142.92MiB         none    100.00GiB

Is that expected?



The following fix is necessary for the kernel to display it correctly.

 [PATCH] btrfs: qgroup: allow user to clear the limitation on qgroup
 http://marc.info/?l=linux-btrfs&m=143331495409594&w=2

Thanks,
Tsutomu
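
The 16.00EiB figure in the report is what an all-ones 64-bit byte count looks like when formatted with binary units, since 2^64 bytes is 16 EiB. The sketch below shows just that arithmetic. pretty_size is a hypothetical helper, not the btrfs-progs formatter, and the idea that a cleared limit is stored as such a sentinel is an assumption made here for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a byte count with binary units (B, KiB ... EiB), two decimals.
 * Hypothetical helper for illustration only.
 */
static void pretty_size(uint64_t bytes, char *buf, size_t len)
{
    static const char *units[] = {
        "B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"
    };
    double v = (double)bytes;
    int u = 0;

    while (v >= 1024.0 && u < 6) {
        v /= 1024.0;
        u++;
    }
    snprintf(buf, len, "%.2f%s", v, units[u]);
}
```

Passing UINT64_MAX yields the same string as in the max_rfer and max_excl columns above, while 100 GiB formats as 100.00GiB. A display layer that does not special-case the sentinel would therefore print a huge limit instead of "none".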




Re: raid 1 to 10 conversion

2015-06-22 Thread Suman Chakravartula
I can confirm that convert now works with the 4.1 kernel and btrfs-progs.

Suman

On Tue, Jun 9, 2015 at 10:31 PM, Gareth Pye wrote:
> btrfs has a small bug at the moment where balance can't convert raid
> levels (it just does nothing), it is meant to be fixed with the next
> kernel release.
>
> On Wed, Jun 10, 2015 at 3:28 PM, Guilherme Gonçalves wrote:
>> Hello! I think I made a mistake.
>> I had two 3TB drives in a raid 1 setup, and I bought two additional 3TB
>> drives to make a raid 10 array.
>> I used these commands:
>>
>> btrfs -f device add /dev/sdc /mnt/nas/ (I used -f because I
>> formatted my new drives using GPT)
>> btrfs -f device add /dev/sdf /mnt/nas/
>>
>> finally:
>> btrfs balance start -dconvert=raid10 -mconvert=raid10 /mnt/nas/
>>
>> after a couple of hours i ran:
>>
>> btrfs filesystem  df /mnt/nas/
>>
>> Data, RAID1: total=963.00GiB, used=962.69GiB
>> System, RAID1: total=32.00MiB, used=176.00KiB
>> Metadata, RAID1: total=6.00GiB, used=4.59GiB
>> GlobalReserve, single: total=512.00MiB, used=0.00B
>>
>> should that not read raid 10 ?
>>
>> output for btrfs fi usage /mnt/nas
>>
>> Overall:
>> Device size:  10.92TiB
>> Device allocated:   1.89TiB
>> Device unallocated:   9.02TiB
>> Device missing: 0.00B
>> Used:   1.89TiB
>> Free (estimated):   4.51TiB (min: 4.51TiB)
>> Data ratio:  2.00
>> Metadata ratio:  2.00
>> Global reserve: 512.00MiB (used: 0.00B)
>>
>> Data,RAID1: Size:963.00GiB, Used:962.69GiB
>>/dev/sdc 481.00GiB
>>/dev/sdd1 482.00GiB
>>/dev/sde1 482.00GiB
>>/dev/sdf 481.00GiB
>>
>> Metadata,RAID1: Size:6.00GiB, Used:4.59GiB
>>/dev/sdc   4.00GiB
>>/dev/sdd1   2.00GiB
>>/dev/sde1   2.00GiB
>>/dev/sdf   4.00GiB
>>
>> System,RAID1: Size:32.00MiB, Used:176.00KiB
>>/dev/sdd1  32.00MiB
>>/dev/sde1  32.00MiB
>>
>> Unallocated:
>>/dev/sdc   2.25TiB
>>/dev/sdd1   2.26TiB
>>/dev/sde1   2.26TiB
>>/dev/sdf   2.25TiB
>>
>>
>> I think I made a mess here...  Why is system only on two drives? Why
>> is it not showing raid 10?
>> If I actually failed, how do I achieve this? I want all four drives in
>> a raid 10 setup.
>>
>> Thanks in advance
>
>
>
> --
> Gareth Pye
> Level 2 MTG Judge, Melbourne, Australia
> "Dear God, I would like to file a bug report"


[PATCH 3/5] btrfs: fix clone / extent-same deadlocks

2015-06-22 Thread Mark Fasheh
Clone and extent same lock their source and target inodes in opposite order.
In addition to this, the range locking in clone doesn't take ordering into
account. Fix this by having clone use the same locking helpers as
btrfs-extent-same.

In addition, I do a small cleanup of the locking helpers, removing a case
(both inodes being the same) which was poorly accounted for and never
actually used by the callers.

Signed-off-by: Mark Fasheh 
---
 fs/btrfs/ioctl.c | 34 --
 1 file changed, 8 insertions(+), 26 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index b899584..8d6887d 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2831,8 +2831,7 @@ static void btrfs_double_inode_lock(struct inode *inode1, struct inode *inode2)
 		swap(inode1, inode2);
 
 	mutex_lock_nested(&inode1->i_mutex, I_MUTEX_PARENT);
-	if (inode1 != inode2)
-		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_CHILD);
+	mutex_lock_nested(&inode2->i_mutex, I_MUTEX_CHILD);
 }
 
 static void btrfs_double_extent_unlock(struct inode *inode1, u64 loff1,
@@ -2850,8 +2849,7 @@ static void btrfs_double_extent_lock(struct inode *inode1, u64 loff1,
 		swap(loff1, loff2);
 	}
 	lock_extent_range(inode1, loff1, len);
-	if (inode1 != inode2)
-		lock_extent_range(inode2, loff2, len);
+	lock_extent_range(inode2, loff2, len);
 }
 
 struct cmp_pages {
@@ -3713,13 +3711,7 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
 		goto out_fput;
 
 	if (!same_inode) {
-		if (inode < src) {
-			mutex_lock_nested(&inode->i_mutex, I_MUTEX_PARENT);
-			mutex_lock_nested(&src->i_mutex, I_MUTEX_CHILD);
-		} else {
-			mutex_lock_nested(&src->i_mutex, I_MUTEX_PARENT);
-			mutex_lock_nested(&inode->i_mutex, I_MUTEX_CHILD);
-		}
+		btrfs_double_inode_lock(src, inode);
 	} else {
 		mutex_lock(&src->i_mutex);
 	}
@@ -3769,8 +3761,7 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
 
 		lock_extent_range(src, lock_start, lock_len);
 	} else {
-		lock_extent_range(src, off, len);
-		lock_extent_range(inode, destoff, len);
+		btrfs_double_extent_lock(src, off, inode, destoff, len);
 	}
 
 	ret = btrfs_clone(src, inode, off, olen, len, destoff);
@@ -3781,9 +3772,7 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
 
 		unlock_extent(&BTRFS_I(src)->io_tree, lock_start, lock_end);
 	} else {
-		unlock_extent(&BTRFS_I(src)->io_tree, off, off + len - 1);
-		unlock_extent(&BTRFS_I(inode)->io_tree, destoff,
-			      destoff + len - 1);
+		btrfs_double_extent_unlock(src, off, inode, destoff, len);
 	}
 	/*
 	 * Truncate page cache pages so that future reads will see the cloned
@@ -3792,17 +3781,10 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
 	truncate_inode_pages_range(&inode->i_data, destoff,
 				   PAGE_CACHE_ALIGN(destoff + len) - 1);
 out_unlock:
-	if (!same_inode) {
-		if (inode < src) {
-			mutex_unlock(&src->i_mutex);
-			mutex_unlock(&inode->i_mutex);
-		} else {
-			mutex_unlock(&inode->i_mutex);
-			mutex_unlock(&src->i_mutex);
-		}
-	} else {
+	if (!same_inode)
+		btrfs_double_inode_unlock(src, inode);
+	else
 		mutex_unlock(&src->i_mutex);
-	}
 out_fput:
 	fdput(src_file);
 out_drop_write:
-- 
2.1.2
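
The idea the patch relies on, namely that every path takes the two locks in one agreed order, can be sketched in userspace with pthread mutexes. This is an illustration under assumptions, not the kernel code: struct node, double_lock and double_unlock are hypothetical names, and the order is decided by comparing addresses as the swap() in the helper does.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for an inode and its i_mutex. */
struct node {
    pthread_mutex_t lock;
};

/*
 * Lock two distinct nodes in one global order (higher address first).
 * Returns the node that was locked first so the order can be observed.
 */
static struct node *double_lock(struct node *a, struct node *b)
{
    assert(a != b);  /* same-node callers take the single lock themselves */
    if ((uintptr_t)a < (uintptr_t)b) {
        struct node *tmp = a;

        a = b;
        b = tmp;
    }
    pthread_mutex_lock(&a->lock);
    pthread_mutex_lock(&b->lock);
    return a;
}

/* The unlock order does not matter for deadlock avoidance. */
static void double_unlock(struct node *a, struct node *b)
{
    pthread_mutex_unlock(&a->lock);
    pthread_mutex_unlock(&b->lock);
}
```

Because the order depends only on the two objects and not on which one the caller names first, a thread calling double_lock(x, y) and another calling double_lock(y, x) contend for the same first lock, so neither can end up holding one lock while waiting for the other.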



[PATCH 2/5] btrfs: fix deadlock with extent-same and readpage

2015-06-22 Thread Mark Fasheh
->readpage() does page_lock() before extent_lock(), whereas extent-same
does the opposite. We want to reverse the order in btrfs_extent_same(), but
that is not quite straightforward since the page locks are taken inside
btrfs_cmp_data().

So I split btrfs_cmp_data() into 3 parts with a small context structure that
is passed between them. The first, btrfs_cmp_data_prepare(), gathers up the
pages needed (taking page lock as required) and puts them on our context
structure. At this point, we are safe to lock the extent range. Afterwards,
we use btrfs_cmp_data() to do the data compare as usual and
btrfs_cmp_data_free() to clean up our context.
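
The three-phase split can be sketched in userspace as follows. This is a simplified model under stated assumptions: struct page, struct cmp_ctx and the cmp_* functions are hypothetical stand-ins, a "page" is a small buffer with a reference count, and the range lock itself is left to the caller.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SZ 16  /* tiny "page" so the example stays small */

/* Hypothetical stand-ins; not the kernel's struct page or cmp_pages. */
struct page {
    int refs;
    unsigned char data[PAGE_SZ];
};

struct cmp_ctx {
    int num_pages;
    struct page **src_pages;
    struct page **dst_pages;
};

/* Phase 1: take references on every page before any range lock is held. */
static int cmp_prepare(struct page *src, struct page *dst, int num_pages,
                       struct cmp_ctx *ctx)
{
    int i;

    ctx->num_pages = 0;
    ctx->src_pages = calloc(num_pages, sizeof(*ctx->src_pages));
    ctx->dst_pages = calloc(num_pages, sizeof(*ctx->dst_pages));
    if (!ctx->src_pages || !ctx->dst_pages)
        return -1;  /* caller still calls cmp_free() */
    for (i = 0; i < num_pages; i++) {
        src[i].refs++;
        dst[i].refs++;
        ctx->src_pages[i] = &src[i];
        ctx->dst_pages[i] = &dst[i];
        ctx->num_pages = i + 1;
    }
    return 0;
}

/* Phase 2: compare only; meant to run while the caller holds the range lock. */
static int cmp_data(const struct cmp_ctx *ctx)
{
    int i;

    for (i = 0; i < ctx->num_pages; i++)
        if (memcmp(ctx->src_pages[i]->data, ctx->dst_pages[i]->data, PAGE_SZ))
            return -1;
    return 0;
}

/* Phase 3: drop the references and free the arrays. */
static void cmp_free(struct cmp_ctx *ctx)
{
    int i;

    for (i = 0; i < ctx->num_pages; i++) {
        ctx->src_pages[i]->refs--;
        ctx->dst_pages[i]->refs--;
    }
    free(ctx->src_pages);
    free(ctx->dst_pages);
    ctx->src_pages = ctx->dst_pages = NULL;
    ctx->num_pages = 0;
}
```

A caller would run cmp_prepare(), then take its range lock, call cmp_data(), release the lock, and finally call cmp_free(). Keeping the page gathering outside the locked section is what lets the lock order match the page-then-extent order described above.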

Signed-off-by: Mark Fasheh 
Reviewed-by: David Sterba 
---
 fs/btrfs/ioctl.c | 148 +++
 1 file changed, 117 insertions(+), 31 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 2deea1f..b899584 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2755,14 +2755,11 @@ out:
return ret;
 }
 
-static struct page *extent_same_get_page(struct inode *inode, u64 off)
+static struct page *extent_same_get_page(struct inode *inode, pgoff_t index)
 {
struct page *page;
-   pgoff_t index;
struct extent_io_tree *tree = &BTRFS_I(inode)->io_tree;
 
-   index = off >> PAGE_CACHE_SHIFT;
-
page = grab_cache_page(inode->i_mapping, index);
if (!page)
return NULL;
@@ -2783,6 +2780,20 @@ static struct page *extent_same_get_page(struct inode *inode, u64 off)
return page;
 }
 
+static int gather_extent_pages(struct inode *inode, struct page **pages,
+  int num_pages, u64 off)
+{
+   int i;
+   pgoff_t index = off >> PAGE_CACHE_SHIFT;
+
+   for (i = 0; i < num_pages; i++) {
+   pages[i] = extent_same_get_page(inode, index + i);
+   if (!pages[i])
+   return -ENOMEM;
+   }
+   return 0;
+}
+
 static inline void lock_extent_range(struct inode *inode, u64 off, u64 len)
 {
/* do any pending delalloc/csum calc on src, one way or
@@ -2808,52 +2819,120 @@ static inline void lock_extent_range(struct inode *inode, u64 off, u64 len)
}
 }
 
-static void btrfs_double_unlock(struct inode *inode1, u64 loff1,
-   struct inode *inode2, u64 loff2, u64 len)
+static void btrfs_double_inode_unlock(struct inode *inode1, struct inode *inode2)
 {
-   unlock_extent(&BTRFS_I(inode1)->io_tree, loff1, loff1 + len - 1);
-   unlock_extent(&BTRFS_I(inode2)->io_tree, loff2, loff2 + len - 1);
-
mutex_unlock(&inode1->i_mutex);
mutex_unlock(&inode2->i_mutex);
 }
 
-static void btrfs_double_lock(struct inode *inode1, u64 loff1,
- struct inode *inode2, u64 loff2, u64 len)
+static void btrfs_double_inode_lock(struct inode *inode1, struct inode *inode2)
+{
+   if (inode1 < inode2)
+   swap(inode1, inode2);
+
+   mutex_lock_nested(&inode1->i_mutex, I_MUTEX_PARENT);
+   if (inode1 != inode2)
+   mutex_lock_nested(&inode2->i_mutex, I_MUTEX_CHILD);
+}
+
+static void btrfs_double_extent_unlock(struct inode *inode1, u64 loff1,
+ struct inode *inode2, u64 loff2, u64 len)
+{
+   unlock_extent(&BTRFS_I(inode1)->io_tree, loff1, loff1 + len - 1);
+   unlock_extent(&BTRFS_I(inode2)->io_tree, loff2, loff2 + len - 1);
+}
+
+static void btrfs_double_extent_lock(struct inode *inode1, u64 loff1,
+struct inode *inode2, u64 loff2, u64 len)
 {
if (inode1 < inode2) {
swap(inode1, inode2);
swap(loff1, loff2);
}
-
-   mutex_lock_nested(&inode1->i_mutex, I_MUTEX_PARENT);
lock_extent_range(inode1, loff1, len);
-   if (inode1 != inode2) {
-   mutex_lock_nested(&inode2->i_mutex, I_MUTEX_CHILD);
+   if (inode1 != inode2)
lock_extent_range(inode2, loff2, len);
+}
+
+struct cmp_pages {
+   int num_pages;
+   struct page **src_pages;
+   struct page **dst_pages;
+};
+
+static void btrfs_cmp_data_free(struct cmp_pages *cmp)
+{
+   int i;
+   struct page *pg;
+
+   for (i = 0; i < cmp->num_pages; i++) {
+   pg = cmp->src_pages[i];
+   if (pg)
+   page_cache_release(pg);
+   pg = cmp->dst_pages[i];
+   if (pg)
+   page_cache_release(pg);
+   }
+   kfree(cmp->src_pages);
+   kfree(cmp->dst_pages);
+}
+
+static int btrfs_cmp_data_prepare(struct inode *src, u64 loff,
+ struct inode *dst, u64 dst_loff,
+ u64 len, struct cmp_pages *cmp)
+{
+   int ret;
+   int num_pages = PAGE_CACHE_ALIGN(len) >> PAGE_CACHE_SHIFT;
+   struct page **src_pgarr, **dst_pgarr;
+
+   /*
+* We must gather up all the pages before we init

[PATCH 5/5] btrfs: add no_mtime flag to btrfs-extent-same

2015-06-22 Thread Mark Fasheh
One issue users have reported is that dedupe changes mtime on files,
resulting in tools like rsync thinking that their contents have changed when
in fact the data is exactly the same. Clone still wants an mtime change, so
we special case this in the code.

With this patch an application can pass the BTRFS_SAME_NO_MTIME flag to a
dedupe request and the kernel will honor it by only changing ctime.
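
The two pieces of behaviour described here, rejecting unknown flag bits and touching only ctime when the flag is set, can be sketched as below. The flag value, the mask, struct fake_inode and both function names are hypothetical; the actual definitions added to include/uapi/linux/btrfs.h are not visible in this excerpt.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical values, chosen only for this sketch. */
#define SAME_NO_MTIME  (1u << 0)
#define SAME_FLAGS     (SAME_NO_MTIME)  /* mask of all known flags */

/* Hypothetical stand-in for the two inode timestamps involved. */
struct fake_inode {
    time_t mtime;
    time_t ctime;
};

/* Reject any bit outside the known mask. Returns 0 if valid, -1 if not. */
static int check_flags(uint32_t flags)
{
    return (flags & ~SAME_FLAGS) ? -1 : 0;
}

/* ctime always changes; mtime changes only when no_mtime is not set. */
static void finish_update(struct fake_inode *inode, time_t now, int no_mtime)
{
    inode->ctime = now;
    if (!no_mtime)
        inode->mtime = now;
}
```

Validating against a mask of known bits, as the patch does with BTRFS_SAME_FLAGS, keeps unused bits available for later extensions, because callers cannot come to rely on garbage in them being ignored.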

I have an updated version of the btrfs-extent-same test program with a
switch to provide this flag at the 'no_time' branch of:

https://github.com/markfasheh/duperemove/

Signed-off-by: Mark Fasheh 
---
 fs/btrfs/ioctl.c   | 34 --
 include/uapi/linux/btrfs.h |  5 -
 2 files changed, 28 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 83f4679..8cfc65f 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -87,7 +87,8 @@ struct btrfs_ioctl_received_subvol_args_32 {
 
 
 static int btrfs_clone(struct inode *src, struct inode *inode,
-  u64 off, u64 olen, u64 olen_aligned, u64 destoff);
+  u64 off, u64 olen, u64 olen_aligned, u64 destoff,
+  int no_mtime);
 
 /* Mask out flags that are inappropriate for the given type of inode. */
 static inline __u32 btrfs_mask_flags(umode_t mode, __u32 flags)
@@ -2974,7 +2975,7 @@ static int extent_same_check_offsets(struct inode *inode, u64 off, u64 *plen,
 }
 
 static int btrfs_extent_same(struct inode *src, u64 loff, u64 olen,
-struct inode *dst, u64 dst_loff)
+struct inode *dst, u64 dst_loff, int no_mtime)
 {
int ret;
u64 len = olen;
@@ -3054,7 +3055,8 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 olen,
/* pass original length for comparison so we stay within i_size */
ret = btrfs_cmp_data(src, loff, dst, dst_loff, olen, &cmp);
if (ret == 0)
-   ret = btrfs_clone(src, dst, loff, olen, len, dst_loff);
+   ret = btrfs_clone(src, dst, loff, olen, len, dst_loff,
+ no_mtime);
 
if (same_inode)
unlock_extent(&BTRFS_I(src)->io_tree, same_lock_start,
@@ -3088,6 +3090,7 @@ static long btrfs_ioctl_file_extent_same(struct file *file,
u64 bs = BTRFS_I(src)->root->fs_info->sb->s_blocksize;
bool is_admin = capable(CAP_SYS_ADMIN);
u16 count;
+   int no_mtime = 0;
 
if (!(file->f_mode & FMODE_READ))
return -EINVAL;
@@ -3139,6 +3142,12 @@ static long btrfs_ioctl_file_extent_same(struct file *file,
if (!S_ISREG(src->i_mode))
goto out;
 
+   ret = -EINVAL;
+   if (same->flags & ~BTRFS_SAME_FLAGS)
+   goto out;
+   if (same->flags & BTRFS_SAME_NO_MTIME)
+   no_mtime = 1;
+
/* pre-format output fields to sane values */
for (i = 0; i < count; i++) {
same->info[i].bytes_deduped = 0ULL;
@@ -3164,7 +3173,8 @@ static long btrfs_ioctl_file_extent_same(struct file *file,
info->status = -EACCES;
} else {
info->status = btrfs_extent_same(src, off, len, dst,
-   info->logical_offset);
+info->logical_offset,
+no_mtime);
if (info->status == 0)
info->bytes_deduped += len;
}
@@ -3219,13 +3229,17 @@ static int clone_finish_inode_update(struct btrfs_trans_handle *trans,
 struct inode *inode,
 u64 endoff,
 const u64 destoff,
-const u64 olen)
+const u64 olen,
+int no_mtime)
 {
struct btrfs_root *root = BTRFS_I(inode)->root;
int ret;
 
inode_inc_iversion(inode);
-   inode->i_mtime = inode->i_ctime = CURRENT_TIME;
+   if (no_mtime)
+   inode->i_ctime = CURRENT_TIME;
+   else
+   inode->i_mtime = inode->i_ctime = CURRENT_TIME;
/*
 * We round up to the block size at eof when determining which
 * extents to clone above, but shouldn't round up the file size.
@@ -3316,7 +3330,7 @@ static void clone_update_extent_map(struct inode *inode,
  */
 static int btrfs_clone(struct inode *src, struct inode *inode,
   const u64 off, const u64 olen, const u64 olen_aligned,
-  const u64 destoff)
+  const u64 destoff, int no_mtime)
 {
struct btrfs_root *root = BTRFS_I(inode)->root;
struct btrfs_path *path = NULL;
@@ -3640,7 +3654,7 @@ process_slot:
  ro

[PATCH 4/5] btrfs: allow dedupe of same inode

2015-06-22 Thread Mark Fasheh
clone() supports cloning within an inode so extent-same can do
the same now. This patch fixes up the locking in extent-same to
know about the single-inode case. In addition to that, we add a
check for overlapping ranges, which clone does not allow.

Signed-off-by: Mark Fasheh 
---
 fs/btrfs/ioctl.c | 76 
 1 file changed, 60 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 8d6887d..83f4679 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2979,27 +2979,61 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 olen,
int ret;
u64 len = olen;
struct cmp_pages cmp;
+   int same_inode = 0;
+   u64 same_lock_start = 0;
+   u64 same_lock_len = 0;
 
-   /*
-* btrfs_clone() can't handle extents in the same file
-* yet. Once that works, we can drop this check and replace it
-* with a check for the same inode, but overlapping extents.
-*/
if (src == dst)
-   return -EINVAL;
+   same_inode = 1;
 
if (len == 0)
return 0;
 
-   btrfs_double_inode_lock(src, dst);
+   if (same_inode) {
+   mutex_lock(&src->i_mutex);
 
-   ret = extent_same_check_offsets(src, loff, &len, olen);
-   if (ret)
-   goto out_unlock;
+   ret = extent_same_check_offsets(src, loff, &len, olen);
+   if (ret)
+   goto out_unlock;
 
-   ret = extent_same_check_offsets(dst, dst_loff, &len, olen);
-   if (ret)
-   goto out_unlock;
+   /*
+* Single inode case wants the same checks, except we
+* don't want our length pushed out past i_size as
+* comparing that data range makes no sense.
+*
+* extent_same_check_offsets() will do this for an
+* unaligned length at i_size, so catch it here and
+* reject the request.
+*
+* This effectively means we require aligned extents
+* for the single-inode case, whereas the other cases
+* allow an unaligned length so long as it ends at
+* i_size.
+*/
+   if (len != olen) {
+   ret = -EINVAL;
+   goto out_unlock;
+   }
+
+   /* Check for overlapping ranges */
+   if (dst_loff + len > loff && dst_loff < loff + len) {
+   ret = -EINVAL;
+   goto out_unlock;
+   }
+
+   same_lock_start = min_t(u64, loff, dst_loff);
+   same_lock_len = max_t(u64, loff, dst_loff) + len - same_lock_start;
+   } else {
+   btrfs_double_inode_lock(src, dst);
+
+   ret = extent_same_check_offsets(src, loff, &len, olen);
+   if (ret)
+   goto out_unlock;
+
+   ret = extent_same_check_offsets(dst, dst_loff, &len, olen);
+   if (ret)
+   goto out_unlock;
+   }
 
/* don't make the dst file partly checksummed */
if ((BTRFS_I(src)->flags & BTRFS_INODE_NODATASUM) !=
@@ -3012,18 +3046,28 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 olen,
if (ret)
goto out_unlock;
 
-   btrfs_double_extent_lock(src, loff, dst, dst_loff, len);
+   if (same_inode)
+   lock_extent_range(src, same_lock_start, same_lock_len);
+   else
+   btrfs_double_extent_lock(src, loff, dst, dst_loff, len);
 
/* pass original length for comparison so we stay within i_size */
ret = btrfs_cmp_data(src, loff, dst, dst_loff, olen, &cmp);
if (ret == 0)
ret = btrfs_clone(src, dst, loff, olen, len, dst_loff);
 
-   btrfs_double_extent_unlock(src, loff, dst, dst_loff, len);
+   if (same_inode)
+   unlock_extent(&BTRFS_I(src)->io_tree, same_lock_start,
+ same_lock_start + same_lock_len - 1);
+   else
+   btrfs_double_extent_unlock(src, loff, dst, dst_loff, len);
 
btrfs_cmp_data_free(&cmp);
 out_unlock:
-   btrfs_double_inode_unlock(src, dst);
+   if (same_inode)
+   mutex_unlock(&src->i_mutex);
+   else
+   btrfs_double_inode_unlock(src, dst);
 
return ret;
 }
-- 
2.1.2
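
For readers following the single-inode case, the overlap test and the
combined lock span from the patch above can be exercised in userspace.
This is only a sketch: ranges_overlap() and lock_span() are illustrative
names that do not appear in the patch, and uint64_t stands in for the
kernel's u64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Two equal-length ranges [loff, loff + len) and [dst_loff, dst_loff + len)
 * overlap exactly when each one starts before the other one ends.
 */
static int ranges_overlap(uint64_t loff, uint64_t dst_loff, uint64_t len)
{
	return dst_loff + len > loff && dst_loff < loff + len;
}

/*
 * Compute one span that covers both ranges, so a single extent lock on the
 * inode can protect source and destination at once.
 */
static void lock_span(uint64_t loff, uint64_t dst_loff, uint64_t len,
		      uint64_t *start, uint64_t *span_len)
{
	uint64_t lo = loff < dst_loff ? loff : dst_loff;
	uint64_t hi = loff > dst_loff ? loff : dst_loff;

	assert(start != NULL && span_len != NULL);
	*start = lo;
	*span_len = hi + len - lo;
}
```

With loff = 8192, dst_loff = 0 and len = 4096 the ranges do not overlap
and the span to lock is [0, 12288), which covers both ranges and the gap
between them. With loff = 0, dst_loff = 4096 and len = 8192 the ranges
overlap, which is the case the patch rejects with -EINVAL.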



[PATCH 1/5] btrfs: pass unaligned length to btrfs_cmp_data()

2015-06-22 Thread Mark Fasheh
In the case that we dedupe the tail of a file, we might expand the dedupe
len out to the end of our last block. We don't want to compare data past
i_size however, so pass the original length to btrfs_cmp_data().

Signed-off-by: Mark Fasheh 
Reviewed-by: David Sterba 
---
 fs/btrfs/ioctl.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 2d24ff4..2deea1f 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2933,7 +2933,8 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 olen,
goto out_unlock;
}
 
-   ret = btrfs_cmp_data(src, loff, dst, dst_loff, len);
+   /* pass original length for comparison so we stay within i_size */
+   ret = btrfs_cmp_data(src, loff, dst, dst_loff, olen);
if (ret == 0)
ret = btrfs_clone(src, dst, loff, olen, len, dst_loff);
 
-- 
2.1.2
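
A small userspace sketch of the two lengths involved, assuming (as the
description above says) that the dedupe length is pushed out to the end
of the last block only when the request ends at i_size. The function
names and the 4096-byte block size used in the note below are
illustrative and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to a multiple of bs; bs must be a power of two. */
static uint64_t align_up(uint64_t x, uint64_t bs)
{
	assert(bs != 0 && (bs & (bs - 1)) == 0);
	return (x + bs - 1) & ~(bs - 1);
}

/*
 * Length handed to the clone step: the caller's length, extended to the
 * block boundary when the range ends exactly at i_size. The comparison
 * step keeps using olen so that it never looks at bytes past i_size.
 */
static uint64_t clone_len(uint64_t off, uint64_t olen, uint64_t i_size,
			  uint64_t bs)
{
	if (off + olen == i_size)
		return align_up(i_size, bs) - off;
	return olen;
}
```

For a 10000-byte file with 4096-byte blocks, a request with off = 8192
and olen = 1808 gives a clone length of 4096, while only the original
1808 bytes are meaningful to compare.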



[PATCH 0/5] btrfs: dedupe fixes, features V2

2015-06-22 Thread Mark Fasheh
Hi Chris,

   The following patches are based on top of my patch titled "btrfs:
Handle unaligned length in extent_same" which you have in your
'integration-4.2' branch:

https://git.kernel.org/cgit/linux/kernel/git/mason/linux-btrfs.git/commit/?id=e1d227a42ea2b4664f94212bd1106b9a3413ffb8


I sent out the first two patches from this series last week; the last three are
new to the list:

http://www.spinics.net/lists/linux-btrfs/msg44849.html


The first patch in the series fixes a bug where we were sometimes
passing the aligned length to our comparison function. We actually can
stop at the user passed length for this as we don't need to compare
data past i_size (and we only align if the extents go out to i_size).

The 2nd patch fixes a deadlock between btrfs readpage and
btrfs_extent_same. This was reported on the list some months ago -
basically we had the page and extent locking reversed. My patch fixes
up the locking to be in the right order.

The 3rd patch fixes a deadlock in clone() (wrt extent-same) which
David found while reviewing my fixes. I also found that clone doesn't
lock extent ranges in any particular order, which could obviously be a
problem, so that is fixed there too.

The last two patches add features which have been requested often by
users - the 4th adds the ability to dedupe within the same inode, and
the last patch adds a dedupe flag to avoid mtime updates (this helps
with backup software).

These patches have been tested with the 'btrfs-extent-same' tool that
can be found at:

https://github.com/markfasheh/duperemove/blob/nomtime/btrfs-extent-same.c

Thanks,
   --Mark
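
To illustrate the mtime flag from the last patch, here is a minimal
userspace sketch of the two pieces of logic: rejecting unknown flag bits
and then updating ctime but not mtime when the flag is set. The macro
values, struct fake_times and both function names are hypothetical; the
real flag definitions are in include/uapi/linux/btrfs.h and are not
reproduced here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical values, chosen only for this sketch. */
#define SAME_NO_MTIME	(1U << 0)
#define SAME_FLAGS	(SAME_NO_MTIME)

/* Reject unknown flag bits, then report whether mtime should be left alone. */
static int parse_same_flags(unsigned int flags, int *no_mtime)
{
	assert(no_mtime != NULL);
	if (flags & ~SAME_FLAGS)
		return -EINVAL;
	*no_mtime = (flags & SAME_NO_MTIME) != 0;
	return 0;
}

/* Stand-in for the two inode timestamps touched by a dedupe. */
struct fake_times {
	int64_t mtime;
	int64_t ctime;
};

/* ctime always moves; mtime only moves when the flag is not set. */
static void finish_update(struct fake_times *t, int64_t now, int no_mtime)
{
	t->ctime = now;
	if (!no_mtime)
		t->mtime = now;
}
```

Rejecting unknown bits up front keeps the remaining flag space free for
later extensions, since old kernels will fail such requests instead of
silently ignoring them.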


qgroup limit clearing, was Re: Btrfs progs release 4.1

2015-06-22 Thread Christian Robottom Reis
On Mon, Jun 22, 2015 at 05:00:23PM +0200, David Sterba wrote:
>   - qgroup:
>     - show: distinguish no limits and 0 limit value
>     - limit: ability to clear the limit

I'm using kernel 4.1-rc7 as per:

root@riff:/var/lib/lxc/juju-trusty-lxc-template/rootfs# uname -a
Linux riff 4.1.0-040100rc7-generic #201506080035 SMP Mon Jun 8 04:36:20 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

But apart from still having major issues with qgroups (quota enforcement
triggers even when there seems to be plenty of free space) clearing
limits with btrfs-progs 4.1 doesn't revert back to 'none', instead
confusingly setting the quota to 16EiB. Using:

root@riff:/var/lib/lxc/juju-trusty-lxc-template/rootfs# btrfs version
btrfs-progs v4.1

I start from:

qgroupid         rfer         excl     max_rfer     max_excl
--------         ----         ----     --------     --------
0/5           2.15GiB      1.95GiB         none         none
0/261         1.42GiB      1.11GiB         none    100.00GiB
0/265         1.09GiB    600.59MiB         none    100.00GiB
0/271       793.32MiB    366.40MiB         none    100.00GiB
0/274       514.96MiB    142.92MiB         none    100.00GiB

I then issue:

root@riff# btrfs qgroup limit -e none 261 /var
root@riff# btrfs qgroup limit none 261 /var

I end up with:

qgroupid         rfer         excl     max_rfer     max_excl
--------         ----         ----     --------     --------
0/5           2.15GiB      1.95GiB         none         none
0/261         1.42GiB      1.11GiB     16.00EiB     16.00EiB
0/265         1.09GiB    600.59MiB         none    100.00GiB
0/271       793.32MiB    366.40MiB         none    100.00GiB
0/274       514.96MiB    142.92MiB         none    100.00GiB

Is that expected?
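
A side note on the number itself, as arithmetic rather than diagnosis:
16 EiB is 2^64 bytes, so a limit stored as an all-ones 64-bit value would
be displayed as 16.00EiB by any formatter that prints binary units with
two decimals. Whether btrfs-progs 4.1 or this kernel actually stores such
a value when 'none' is given is an assumption, not something established
here. format_size() below is a hypothetical helper and not the
btrfs-progs code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Format a byte count with binary units and two decimals, e.g. "100.00GiB". */
static void format_size(uint64_t bytes, char *buf, size_t buflen)
{
	static const char *units[] = {
		"B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"
	};
	double v = (double)bytes;
	int i = 0;

	assert(buf != NULL && buflen > 0);
	/* Divide down until the value fits the largest unit we know. */
	while (v >= 1024.0 && i < 6) {
		v /= 1024.0;
		i++;
	}
	snprintf(buf, buflen, "%.2f%s", v, units[i]);
}
```

Passing UINT64_MAX produces the same 16.00EiB string as in the table
above, and 100 * 2^30 produces 100.00GiB.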
-- 
Christian Robottom Reis   | [+1] 612 888 4935| http://launchpad.net/~kiko
Canonical VP Hyperscale   | [+55 16] 9 9112 6430


Re: Btrfs progs release 4.1

2015-06-22 Thread Martin Steigerwald
Wow, nice collection of changes!

Am Montag, 22. Juni 2015, 17:00:23 schrieb David Sterba:
> * new
>   - rescue zero-log
>   - btrfstune:
>     - rewrite uuid on a filesystem image
>     - new option to turn on NO_HOLES incompat feature

Did you think about folding btrfstune into btrfs command as well?

Thanks,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7


Re: i_version vs iversion (Was: Re: [RFC PATCH v2 1/2] Btrfs: add noi_version option to disable MS_I_VERSION)

2015-06-22 Thread Dave Chinner
On Thu, Jun 18, 2015 at 04:38:56PM +0200, David Sterba wrote:
> Moving the discussion to fsdevel.
> 
> Summary: disabling MS_I_VERSION brings some speedups to btrfs, but the
> generic 'noiversion' option cannot be used to achieve that. It is
> processed before it reaches btrfs superblock callback, where
> MS_I_VERSION is forced.
> 
> The proposed fix is to add btrfs-specific i_version/noi_version to btrfs,
> to which I object.

The issue is that you can't override IS_I_VERSION(inode) because it
looks at the superblock flag, yes?

So perhaps IS_I_VERSION should become an inode flag, set by the
filesystem at inode instantiation time, and hence filesystems can
choose on a per-inode basis if they want I_VERSION behaviour or not.
At that point, the behaviour of MS_I_VERSION becomes irrelevant to
the discussion, doesn't it?
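
As a rough illustration of the difference between the two checks, here
is a userspace sketch. Every name in it (the sketch_ structs, the SKETCH_
macros and both helpers) is hypothetical, and the flag values are
arbitrary; it shows only the shape of the idea, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MS_I_VERSION	(1UL << 23)	/* mount-wide flag (illustrative) */
#define SKETCH_S_I_VERSION	(1U << 0)	/* hypothetical per-inode flag */

struct sketch_sb {
	unsigned long s_flags;
};

struct sketch_inode {
	struct sketch_sb *i_sb;
	unsigned int i_flags;
};

/* Superblock-based check: every inode on the mount gets the same answer. */
static bool is_i_version_sb(const struct sketch_inode *inode)
{
	assert(inode != NULL && inode->i_sb != NULL);
	return (inode->i_sb->s_flags & SKETCH_MS_I_VERSION) != 0;
}

/* Inode-based check: the filesystem decides per inode when it sets it up. */
static bool is_i_version_inode(const struct sketch_inode *inode)
{
	assert(inode != NULL);
	return (inode->i_flags & SKETCH_S_I_VERSION) != 0;
}
```

With the second form, two inodes on the same superblock can give
different answers, which is what makes the mount flag irrelevant.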

> xfs also forces I_VERSION if it detects the superblock version 5, so it
> could use the same fix that would work for btrfs.

XFS is a special snowflake - it updates the I_VERSION only when an
inode is otherwise modified in a transaction, so turning it off
saves nothing. (And yes, timestamp updates are transactional in
XFS). Hence XFS behaviour is irrelevant to the discussion, because
we aren't ever going to turn it off.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: Btrfs progs release 4.1

2015-06-22 Thread David Sterba
On Mon, Jun 22, 2015 at 06:18:35PM +0200, Goffredo Baroncelli wrote:
> Many thanks for your work.
> BTW, just out of curiosity: is it a coincidence that both Torvalds and you
> released kernel 4.1 and btrfs-progs 4.1 on the same day? I know that
> the versions are coupled, but on the same day too?

This time around I was ready to do the release on time so there was no
reason to delay it. Previous releases were delayed because of other work
or (my) insufficient confidence in the pending changes.


Re: RAID1: system stability

2015-06-22 Thread Chris Murphy
On Mon, Jun 22, 2015 at 10:36 AM, Timofey Titovets  wrote:
> 2015-06-22 19:03 GMT+03:00 Chris Murphy :
>> On Mon, Jun 22, 2015 at 5:35 AM, Timofey Titovets  
>> wrote:
>>> Okay, logs: I released disk /dev/sde1 and got:
>>> Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: Read(10): 28 00 11 1d 69
>>> 00 00 00 08 00
>>> Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: blk_update_request: I/O
>>> error, dev sde, sector 287140096
>>> Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: mptbase: ioc0:
>>> LogInfo(0x31010011): Originator={PL}, Code={Open Failure},
>>> SubCode(0x0011) cb_idx mptscsih_io_done
>>> Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: sd 0:0:5:0: [sde] FAILED
>>> Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
>>> Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: sd 0:0:5:0: [sde] CDB:
>>> Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: Read(10): 28 00 11 1d 69
>>> 00 00 00 08 00
>>> Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: blk_update_request: I/O
>>> error, dev sde, sector 287140096
>>
>> So what's up with this? This only happens after you try to (software)
>> remove /dev/sde1? Or is it happening also before that? Because this
>> looks like some kind of hardware problem when the drive is reporting
>> an error for a particular sector on read, as if it's a bad sector.
>
> Nope, I physically removed the device, and as you can see it produces
> errors at the block layer -.-
> And these disks have 100% 'health'.
>
> Because it's a hot-plug device, the kernel sees that the device is now
> missing and removes all kernel objects related to it.

OK I actually don't know what the intended block layer behavior is
when unplugging a device, if it is supposed to vanish, or change state
somehow so that things that depend on it can know it's "missing" or
what. So the question here is, is this working as intended? If the
layer Btrfs depends on isn't working as intended, then Btrfs is
probably going to do wild and crazy things. And I don't know that the
part of the block layer Btrfs depends on for this is the same (or
different) as what the md driver depends on.


>
>>
>>> Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: Buffer I/O error on dev sde1, 
>>> logical block 35892256, async page read
>>> Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: mptbase: ioc0: 
>>> LogInfo(0x31010011): Originator={PL}, Code={Open Failure},
>>> SubCode(0x0011) cb_idx mptscsih_io_done
>>> Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: mptbase: ioc0: 
>>> LogInfo(0x31010011): Originator={PL}, Code={Open Failure},
>>> SubCode(0x0011) cb_idx mptscsih_io_done
>>> Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: sd 0:0:5:0: [sde] FAILED 
>>> Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
>>> Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: sd 0:0:5:0: [sde] CDB:
>>> Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: Read(10): 28 00 11 1d 69 00 00 
>>> 00 08 00
>>> Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: blk_update_request: I/O error, 
>>> dev sde, sector 287140096
>>
>> Again same sector as before. This is not a Btrfs error message, it's
>> coming from the block layer.
>>
>>
>>> Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: Buffer I/O error on dev sde1, 
>>> logical block 35892256, async page read
>>
>> I'm not a dev so take it with a grain of salt but because this
>> references a logical block, this is the layer in between Btrfs and the
>> physical device. Btrfs works on logical blocks and those have to be
>> translated to device and physical sector. Maybe what's happening is
>> there's confusion somewhere about this device not actually being
>> unavailable so Btrfs or something else is trying to read this logical
>> block again, which causes a read attempt to happen instead of a flat
>> out "this device doesn't exist" type of error. So I don't know if this
>> is a problem strictly in Btrfs missing device error handling, or if
>> there's something else that's not really working correctly.
>>
>> You could test by physically removing the device, if you have hot plug
>> support (be certain all the hardware components support it), you can
>> see if you get different results. Or you could try to reproduce the
>> software delete of the device with mdraid or lvm raid with XFS and no
>> Btrfs at all, and see if you get different results.
>>
>> It's known that the btrfs multiple device failure use case is weak
>> right now. Data isn't lost, but the error handling, notification, all
>> that is almost non-existent compared to mdadm.
>
> So sad -.-
> I tested this case with md raid1, and the system continues to work
> without problems when I release one of the two md devices.

OK well then it's either a Btrfs bug or something it directly depends
on that md does not.


> You're right about USB devices: they don't produce an oops.
> Maybe it's because the kernel uses different modules for SAS/SATA disks
> and USB sticks.

They appear as sd devices on my system, so they're using libata and as
such they ultimately still depend on the SCSI block layer. But there
may be a very different kind of missing device error handl

Re: RAID1: system stability

2015-06-22 Thread Timofey Titovets
2015-06-22 19:03 GMT+03:00 Chris Murphy :
> On Mon, Jun 22, 2015 at 5:35 AM, Timofey Titovets  
> wrote:
>> Okay, logs: I released disk /dev/sde1 and got:
>> Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: Read(10): 28 00 11 1d 69
>> 00 00 00 08 00
>> Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: blk_update_request: I/O
>> error, dev sde, sector 287140096
>> Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: mptbase: ioc0:
>> LogInfo(0x31010011): Originator={PL}, Code={Open Failure},
>> SubCode(0x0011) cb_idx mptscsih_io_done
>> Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: sd 0:0:5:0: [sde] FAILED
>> Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
>> Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: sd 0:0:5:0: [sde] CDB:
>> Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: Read(10): 28 00 11 1d 69
>> 00 00 00 08 00
>> Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: blk_update_request: I/O
>> error, dev sde, sector 287140096
>
> So what's up with this? This only happens after you try to (software)
> remove /dev/sde1? Or is it happening also before that? Because this
> looks like some kind of hardware problem when the drive is reporting
> an error for a particular sector on read, as if it's a bad sector.

Nope, I physically removed the device, and as you can see it produces
errors at the block layer -.-
And these disks have 100% 'health'.

Because it's a hot-plug device, the kernel sees that the device is now
missing and removes all kernel objects related to it.

>
>> Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: Buffer I/O error on dev sde1, 
>> logical block 35892256, async page read
>> Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: mptbase: ioc0: 
>> LogInfo(0x31010011): Originator={PL}, Code={Open Failure},
>> SubCode(0x0011) cb_idx mptscsih_io_done
>> Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: mptbase: ioc0: 
>> LogInfo(0x31010011): Originator={PL}, Code={Open Failure},
>> SubCode(0x0011) cb_idx mptscsih_io_done
>> Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: sd 0:0:5:0: [sde] FAILED 
>> Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
>> Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: sd 0:0:5:0: [sde] CDB:
>> Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: Read(10): 28 00 11 1d 69 00 00 
>> 00 08 00
>> Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: blk_update_request: I/O error, 
>> dev sde, sector 287140096
>
> Again same sector as before. This is not a Btrfs error message, it's
> coming from the block layer.
>
>
>> Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: Buffer I/O error on dev sde1, 
>> logical block 35892256, async page read
>
> I'm not a dev so take it with a grain of salt but because this
> references a logical block, this is the layer in between Btrfs and the
> physical device. Btrfs works on logical blocks and those have to be
> translated to device and physical sector. Maybe what's happening is
> there's confusion somewhere about this device not actually being
> unavailable so Btrfs or something else is trying to read this logical
> block again, which causes a read attempt to happen instead of a flat
> out "this device doesn't exist" type of error. So I don't know if this
> is a problem strictly in Btrfs missing device error handling, or if
> there's something else that's not really working correctly.
>
> You could test by physically removing the device, if you have hot plug
> support (be certain all the hardware components support it), you can
> see if you get different results. Or you could try to reproduce the
> software delete of the device with mdraid or lvm raid with XFS and no
> Btrfs at all, and see if you get different results.
>
> It's known that the btrfs multiple device failure use case is weak
> right now. Data isn't lost, but the error handling, notification, all
> that is almost non-existent compared to mdadm.

So sad -.-
I tested this case with md raid1, and the system continues to work
without problems when I release one of the two md devices.

>
>> Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: mptbase: ioc0: 
>> LogInfo(0x31010011): Originator={PL}, Code={Open Failure},
>> SubCode(0x0011) cb_idx mptscsih_io_done
>> Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: mptbase: ioc0: 
>> LogInfo(0x31010011): Originator={PL}, Code={Open Failure},
>> SubCode(0x0011) cb_idx mptscsih_io_done
>> Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: sd 0:0:5:0: [sde] FAILED 
>> Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
>> Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: sd 0:0:5:0: [sde] CDB:
>> Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: Read(10): 28 00 11 1d 69 00 00 
>> 00 08 00
>> Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: blk_update_request: I/O error, 
>> dev sde, sector 287140096
>> Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: Buffer I/O error on dev sde1, 
>> logical block 35892256, async page read
>> Jun 22 14:28:41 srv-lab-ceph-node-01 kernel: mptbase: ioc0: 
>> LogInfo(0x31010011): Originator={PL}, Code={Open Failure},
>> SubCode(0x0011) cb_idx mptscsih_io_done
>> Jun 22 14:28:41 srv-lab-ceph-node-01 kernel:  end_device-0:0:6: 

Re: Btrfs progs release 4.1

2015-06-22 Thread Goffredo Baroncelli
On 2015-06-22 17:00, David Sterba wrote:
> Hi,

Many thanks for your work.
BTW, just out of curiosity: is it a coincidence that both Torvalds and you
released kernel 4.1 and btrfs-progs 4.1 on the same day? I know that the
versions are coupled, but on the same day too?

BR
G.Baronelli

> 
> btrfs-progs 4.1 have been released (in time with kernel 4.1). Unusual load of
> changes.
> 
> Fixed since rc1:
>   - uuid rewrite prints the correct original UUID
>   - map-logical updated
>   - fi show size units
>   - typos
> 
> * bugfixes
>   - fsck.btrfs: no bash-isms
>   - bugzilla 97171: invalid memory access (with tests)
>   - receive:
>     - cloning works with --chroot
>     - capabilities not lost
>   - mkfs: do not try to register bare file images
>   - option --help accepted by the standalone utilities
> 
> * enhancements
>   - corrupt block: ability to remove csums
>   - mkfs:
>     - warn if metadata redundancy is lower than for data
>     - options to make the output quiet (only errors)
>     - mixed case names of raid profiles accepted
>     - rework the output:
>       - more comprehensive, 'key: value' format
>   - subvol:
>     - show:
>       - print received uuid
>       - update the output
>       - new options to specify size units
>     - sync:
>       - grab all deleted ids and print them as they're removed,
>         previous implementation only checked if there are any
>         to be deleted - change in command semantics
>   - scrub: print timestamps in days HMS format
>   - receive:
>     - can specify mount point, do not rely on /proc
>     - can work inside subvolumes
>   - send:
>     - new option to send stream without data (NO_FILE_DATA)
>   - convert:
>     - specify incompat features on the new fs
>   - qgroup:
>     - show: distinguish no limits and 0 limit value
>     - limit: ability to clear the limit
>   - help for 'btrfs' is shorter, 1st level command overview
>   - debug tree: print key names according to their C name
> 
> * new
>   - rescue zero-log
>   - btrfstune:
>     - rewrite uuid on a filesystem image
>     - new option to turn on NO_HOLES incompat feature
> 
> * deprecated
>   - standalone btrfs-zero-log
> 
> * other
>   - testing framework updates
>     - uuid rewrite test
>     - btrfstune feature setting test
>     - zero-log tests
>     - more testing image formats
>   - manual page updates
>   - ioctl.h synced with current kernel uapi version
>   - convert: preparatory works for more filesystems (reiserfs pending)
>   - use static buffers for path handling where possible
>   - add new helpers for send utils that check memory allocations,
>     switch all users, deprecate old helpers
>   - Makefile: fix build dependency generation
>   - map-logical: make it work again
> 
> Tarballs: https://www.kernel.org/pub/linux/kernel/people/kdave/btrfs-progs/
> Git: git://git.kernel.org/pub/scm/linux/kernel/git/kdave/btrfs-progs.git
> 
> Shortlog:
> 
> Anand Jain (2):
>   btrfs-progs: add info about list-all to the help
>   btrfs-progs: use function is_block_device() instead
> 
> Dimitri John Ledkov (1):
>   btrfs-progs: fsck.btrfs: Fix bashism and bad getopts processing
> 
> Dongsheng Yang (4):
>   btrfs-progs: qgroup: show 'none' when we did not limit it on this qgroup
>   btrfs-progs: qgroup: allow user to clear some limitation on qgroup.
>   btrfs-progs: qgroup limit: error out if input value is negative
>   btrfs-progs: qgroup limit: add a check for invalid input of 'T/G/M/K'
> 
> Emil Karlson (1):
>   btrfs-progs: use openat for process_clone in receive
> 
> Goffredo Baroncelli (4):
>   btrfs-progs: add strdup in btrfs_add_to_fsid() to track the device path
>   btrfs-progs: return the fsid from make_btrfs()
>   btrfs-progs: mkfs: track sizes of created block groups
>   btrfs-progs: mkfs: print the summary
> 
> Jeff Mahoney (8):
>   btrfs-progs: convert: clean up blk_iterate_data handling wrt record_file_blocks
>   btrfs-progs: convert: remove unused fs argument from block_iterate_proc
>   btrfs-progs: convert: remove unused inode_key in copy_single_inode
>   btrfs-progs: convert: rename ext2_root to image_root
>   btrfs-progs: compat: define DIV_ROUND_UP if not already defined
>   btrfs-progs: convert: fix typo in btrfs_insert_dir_item call
>   btrfs-progs: convert: factor out adding dirent into convert_insert_dirent
>   btrfs-progs: convert: factor out block iteration callback
> 
> Josef Bacik (3):
>   Btrfs-progs: corrupt-block: add the ability to remove csums
>   btrfs-progs: specify mountpoint for recieve
>   btrfs-progs: make receive work inside of subvolumes
> 
> Qu Wenruo (13):
>   btrfs-progs: Enhance read_tree_block to avoid memory corruption
>   btrfs-progs: btrfstune: rework change_uuid
>   btrfs-progs: btrfstune: add ability to restore unfinished fsid change
>   btrfs-progs: btrfstune: add '-U' and '-u' option to change fsid
>   btrfs-progs: Docume

Re: RAID1: system stability

2015-06-22 Thread Chris Murphy
On Mon, Jun 22, 2015 at 5:35 AM, Timofey Titovets  wrote:
> Okay, logs: I released disk /dev/sde1 and got:
> Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: Read(10): 28 00 11 1d 69
> 00 00 00 08 00
> Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: blk_update_request: I/O
> error, dev sde, sector 287140096
> Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: mptbase: ioc0:
> LogInfo(0x31010011): Originator={PL}, Code={Open Failure},
> SubCode(0x0011) cb_idx mptscsih_io_done
> Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: sd 0:0:5:0: [sde] FAILED
> Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
> Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: sd 0:0:5:0: [sde] CDB:
> Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: Read(10): 28 00 11 1d 69
> 00 00 00 08 00
> Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: blk_update_request: I/O
> error, dev sde, sector 287140096

So what's up with this? This only happens after you try to (software)
remove /dev/sde1? Or is it happening also before that? Because this
looks like some kind of hardware problem when the drive is reporting
an error for a particular sector on read, as if it's a bad sector.




> Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: Buffer I/O error on dev
> sde1, logical block 35892256, async page read
> Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: mptbase: ioc0:
> LogInfo(0x31010011): Originator={PL}, Code={Open Failure},
> SubCode(0x0011) cb_idx mptscsih_io_done
> Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: mptbase: ioc0:
> LogInfo(0x31010011): Originator={PL}, Code={Open Failure},
> SubCode(0x0011) cb_idx mptscsih_io_done
> Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: sd 0:0:5:0: [sde] FAILED
> Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
> Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: sd 0:0:5:0: [sde] CDB:
> Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: Read(10): 28 00 11 1d 69
> 00 00 00 08 00
> Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: blk_update_request: I/O
> error, dev sde, sector 287140096

Again same sector as before. This is not a Btrfs error message, it's
coming from the block layer.


> Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: Buffer I/O error on dev
> sde1, logical block 35892256, async page read

I'm not a dev so take it with a grain of salt but because this
references a logical block, this is the layer in between Btrfs and the
physical device. Btrfs works on logical blocks and those have to be
translated to device and physical sector. Maybe what's happening is
there's confusion somewhere about this device not actually being
unavailable so Btrfs or something else is trying to read this logical
block again, which causes a read attempt to happen instead of a flat
out "this device doesn't exist" type of error. So I don't know if this
is a problem strictly in Btrfs missing device error handling, or if
there's something else that's not really working correctly.
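
The numbers in these log lines can be cross-checked with a little
arithmetic. A minimal sketch, assuming the standard READ(10) layout
(opcode 0x28, big-endian LBA in bytes 2..5, transfer length in bytes
7..8) and assuming, hypothetically, 4096-byte buffer blocks and a
partition that starts at sector 2048. The helper names are made up for
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Decode the logical block address from a 10-byte READ(10) CDB. */
static uint32_t read10_lba(const uint8_t cdb[10])
{
	assert(cdb[0] == 0x28);	/* READ(10) opcode */
	return ((uint32_t)cdb[2] << 24) | ((uint32_t)cdb[3] << 16) |
	       ((uint32_t)cdb[4] << 8) | (uint32_t)cdb[5];
}

/* Decode the transfer length (in device sectors) from the same CDB. */
static uint16_t read10_blocks(const uint8_t cdb[10])
{
	return (uint16_t)((cdb[7] << 8) | cdb[8]);
}

/*
 * Map a block number on a partition to a 512-byte sector on the whole
 * disk. part_start is the partition's first sector (an assumption here).
 */
static uint64_t block_to_disk_sector(uint64_t block, uint32_t block_size,
				     uint64_t part_start)
{
	assert(block_size >= 512 && block_size % 512 == 0);
	return part_start + block * (block_size / 512);
}
```

With the CDB bytes from the log (28 00 11 1d 69 00 00 00 08 00) the LBA
decodes to 287140096 and the transfer length to 8 sectors, matching the
blk_update_request line. Logical block 35892256 on sde1 maps to the same
sector if the block size is 4096 bytes and the partition starts at
sector 2048; both values are guesses chosen because they fit, not facts
from the log. If they hold, the "logical block" in the Buffer I/O error
line would be a block number on the partition device and not a Btrfs
logical address.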

You could test by physically removing the device, if you have hot plug
support (be certain all the hardware components support it), you can
see if you get different results. Or you could try to reproduce the
software delete of the device with mdraid or lvm raid with XFS and no
Btrfs at all, and see if you get different results.

It's known that the btrfs multiple device failure use case is weak
right now. Data isn't lost, but the error handling, notification, all
that is almost non-existent compared to mdadm.


> Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: mptbase: ioc0:
> LogInfo(0x31010011): Originator={PL}, Code={Open Failure},
> SubCode(0x0011) cb_idx mptscsih_io_done
> Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: mptbase: ioc0:
> LogInfo(0x31010011): Originator={PL}, Code={Open Failure},
> SubCode(0x0011) cb_idx mptscsih_io_done
> Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: sd 0:0:5:0: [sde] FAILED
> Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
> Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: sd 0:0:5:0: [sde] CDB:
> Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: Read(10): 28 00 11 1d 69
> 00 00 00 08 00
> Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: blk_update_request: I/O
> error, dev sde, sector 287140096
> Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: Buffer I/O error on dev
> sde1, logical block 35892256, async page read
> Jun 22 14:28:41 srv-lab-ceph-node-01 kernel: mptbase: ioc0:
> LogInfo(0x31010011): Originator={PL}, Code={Open Failure},
> SubCode(0x0011) cb_idx mptscsih_io_done
> Jun 22 14:28:41 srv-lab-ceph-node-01 kernel:  end_device-0:0:6:
> mptsas: ioc0: removing ssp device: fw_channel 0, fw_id 16, phy
> 5,sas_addr 0x5000cca00d0514bd
> Jun 22 14:28:41 srv-lab-ceph-node-01 kernel:  phy-0:0:9: mptsas: ioc0:
> delete phy 5, phy-obj (0x880449541400)
> Jun 22 14:28:41 srv-lab-ceph-node-01 kernel:  port-0:0:6: mptsas:
> ioc0: delete port 6, sas_addr (0x5000cca00d0514bd)
> Jun 22 14:28:41 srv-lab-ceph-node-01 kernel: scsi target0:0:5: mptsas:
> ioc0: delete device: fw_channel 0, fw_id 16, phy 5, sas_addr
> 0x5000cca00d0514bd

OK it looks like not until here does it

Btrfs progs release 4.1

2015-06-22 Thread David Sterba
Hi,

btrfs-progs 4.1 has been released (in time with kernel 4.1). An unusually
large load of changes.

Fixed since rc1:
  - uuid rewrite prints the correct original UUID
  - map-logical updated
  - fi show size units
  - typos

* bugfixes
  - fsck.btrfs: no bash-isms
  - bugzilla 97171: invalid memory access (with tests)
  - receive:
    - cloning works with --chroot
    - capabilities not lost
  - mkfs: do not try to register bare file images
  - option --help accepted by the standalone utilities

* enhancements
  - corrupt block: ability to remove csums
  - mkfs:
    - warn if metadata redundancy is lower than for data
    - options to make the output quiet (only errors)
    - mixed case names of raid profiles accepted
    - rework the output:
      - more comprehensive, 'key: value' format
  - subvol:
    - show:
      - print received uuid
      - update the output
      - new options to specify size units
    - sync:
      - grab all deleted ids and print them as they're removed,
        previous implementation only checked if there are any
        to be deleted - change in command semantics
  - scrub: print timestamps in days HMS format
  - receive:
    - can specify mount point, do not rely on /proc
    - can work inside subvolumes
  - send:
    - new option to send stream without data (NO_FILE_DATA)
  - convert:
    - specify incompat features on the new fs
  - qgroup:
    - show: distinguish no limits and 0 limit value
    - limit: ability to clear the limit
  - help for 'btrfs' is shorter, 1st level command overview
  - debug tree: print key names according to their C name

* new
  - rescue zero-log
  - btrfstune:
    - rewrite uuid on a filesystem image
    - new option to turn on NO_HOLES incompat feature

* deprecated
  - standalone btrfs-zero-log

* other
  - testing framework updates
    - uuid rewrite test
    - btrfstune feature setting test
    - zero-log tests
    - more testing image formats
  - manual page updates
  - ioctl.h synced with current kernel uapi version
  - convert: preparatory works for more filesystems (reiserfs pending)
  - use static buffers for path handling where possible
  - add new helpers for send utils that check memory allocations,
    switch all users, deprecate old helpers
  - Makefile: fix build dependency generation
  - map-logical: make it work again

Tarballs: https://www.kernel.org/pub/linux/kernel/people/kdave/btrfs-progs/
Git: git://git.kernel.org/pub/scm/linux/kernel/git/kdave/btrfs-progs.git

Shortlog:

Anand Jain (2):
  btrfs-progs: add info about list-all to the help
  btrfs-progs: use function is_block_device() instead

Dimitri John Ledkov (1):
  btrfs-progs: fsck.btrfs: Fix bashism and bad getopts processing

Dongsheng Yang (4):
  btrfs-progs: qgroup: show 'none' when we did not limit it on this qgroup
  btrfs-progs: qgroup: allow user to clear some limitation on qgroup.
  btrfs-progs: qgroup limit: error out if input value is negative
  btrfs-progs: qgroup limit: add a check for invalid input of 'T/G/M/K'

Emil Karlson (1):
  btrfs-progs: use openat for process_clone in receive

Goffredo Baroncelli (4):
  btrfs-progs: add strdup in btrfs_add_to_fsid() to track the device path
  btrfs-progs: return the fsid from make_btrfs()
  btrfs-progs: mkfs: track sizes of created block groups
  btrfs-progs: mkfs: print the summary

Jeff Mahoney (8):
  btrfs-progs: convert: clean up blk_iterate_data handling wrt record_file_blocks
  btrfs-progs: convert: remove unused fs argument from block_iterate_proc
  btrfs-progs: convert: remove unused inode_key in copy_single_inode
  btrfs-progs: convert: rename ext2_root to image_root
  btrfs-progs: compat: define DIV_ROUND_UP if not already defined
  btrfs-progs: convert: fix typo in btrfs_insert_dir_item call
  btrfs-progs: convert: factor out adding dirent into convert_insert_dirent
  btrfs-progs: convert: factor out block iteration callback

Josef Bacik (3):
  Btrfs-progs: corrupt-block: add the ability to remove csums
  btrfs-progs: specify mountpoint for recieve
  btrfs-progs: make receive work inside of subvolumes

Qu Wenruo (13):
  btrfs-progs: Enhance read_tree_block to avoid memory corruption
  btrfs-progs: btrfstune: rework change_uuid
  btrfs-progs: btrfstune: add ability to restore unfinished fsid change
  btrfs-progs: btrfstune: add '-U' and '-u' option to change fsid
  btrfs-progs: Documentation: uuid change
  btrfs-progs: btrfstune: fix a bug which makes unfinished fsid change unrecoverable
  btrfs-progs: export read_extent_data function
  btrfs-progs: map-logical: introduce map_one_extent function
  Btrfs-progs: map-logical: introduce print_mapping_info function
  Btrfs-progs: map-logical: introduce write_extent_content function
  btrfs-progs: map-logical: Rework map-logical logics
  btrfs-progs: Allow "filesystem show" command to handle different units
  btrfs-progs: do

Re: [PATCH 1/2] btrfs-progs: Allow "filesystem show" command to handle different units

2015-06-22 Thread David Sterba
On Thu, Jun 18, 2015 at 02:46:11PM +0800, Qu Wenruo wrote:
> Now "filesystem show" command can handle different units now.
> 
> This is handy for higher level programs to get accurate output from "fi
> show" command.
> 
> Signed-off-by: Qu Wenruo 

Thanks, both applied with minor fixups.


Re: [PATCH] Btrfs: Check if kobject is initialized before put

2015-06-22 Thread David Sterba
On Mon, Jun 22, 2015 at 06:18:32PM +0800, Anand Jain wrote:
> Signed-off-by: Anand Jain 

Tested-by: David Sterba 

Thanks, fixes the crash in the sysfs update patchset.


Re: Corrupted btrfs partition (converted from ext4) after balance

2015-06-22 Thread Vianney Stroebel

So, in the btrfs mailing list, nobody will help a user who has had a whole 
partition corrupted? I think my report was clear and complete.

In IRC, the only answer I got was: "format your partition, there's nothing you can 
do and there's nothing to understand from this" (from nice people I should say).

What I understood from this experience is that btrfs is far from production-ready. How 
many people every day around the world are losing a lot of time because the 
"unstable" warning was removed? And losing data too: only a perfect backup 
system could allow someone to avoid data loss after a crash in a production system. It 
would require instantaneous replication + instantaneous versioning (without btrfs 
obviously) + instantaneous restore, which afaik no backup system has.

Thank you Duncan for your reply about btrfs' stability. But frankly, we 
shouldn't have to speculate how stable btrfs is.

I don't get how people in this mailing list and in IRC find this situation 
acceptable. A file system is too critical to be treated this lightly.

I'm going back to ext4 for the moment and from now on I will only trust 
reputable third-party sources as to when btrfs is production-ready.

Sorry for the tone. I hope nobody found this message disrespectful.

Vianney

On 19/06/2015 09:53, Duncan wrote:

Vianney Stroebel posted on Fri, 19 Jun 2015 01:55:01 +0200 as excerpted:


I could copy the data on another freshly formatted disk and reformat
this one but I am wondering if btrfs is stable enough to be used on my
professional laptop (where I cannot afford such downtime) or if I should
go back to ext4.

As a btrfs-using admin and list regular, not a dev, I'll reply to just
the above more general question, letting others deal with the specific
technical issue...

Good question, on which there's apparently a bit of controversy.

My own opinion, TL;DR summary?  If you're asking the question and are
unlikely to be going ahead anyway, regardless of the answer you get, then
btrfs is unlikely to be what you'd call "stable enough", at this point.

The longer version...

The devs have applied patches that have removed most of the warnings, and
some distros are now using btrfs by default, generally for the system
partitions in order to take advantage of btrfs snapshotting to enable
rollback, so it's obviously "stable enough" for them.

But actual non-dev btrfs user and list regular opinion on this list seems
to be somewhere between "Are you kidding?  After I just got thru dealing
with bug , no way, Jose!", and "It's definitely stabilizing and
maturing, and is noticeably better than six months ago, which was
noticeably better than six months before that, but it's equally
definitely not something I'd characterize as fully stable and mature just
yet."

An arguably more practical way of stating the latter position, which
happens to be my own, is by reference to the sysadmin's rule of backups.
This rule says that if a particular set of files isn't backed up, then by
definition, you don't care about losing it, despite any claims, possibly
after said loss, to the contrary.  Additionally, a would-be backup that
hasn't passed restorability tests isn't yet complete, and therefore
cannot be called a backup for purposes of the above rule.  If it isn't
backed up, you don't care about losing it.  Full stop.  But, because
btrfs isn't yet fully stable and mature, that rule applies double.

I'd argue that for anyone that accepts that principle, including the
doubling, and is still willing to use btrfs, it's "stable enough".
Otherwise, better look somewhere else, as what you're looking for isn't
found here.

That's the sysadmin-speak test, and result.  But there's another way of
putting it that's more developer-speak.

As any good developer will tell you, premature optimization is bad, very
bad, in no small part because optimization is a LOT of work, and
premature optimization either severely limits post-optimization
flexibility in order to retain that work, or must be repeated over and
over again as the problem and solution space becomes more defined by
early trial and mid-stage implementations and better solutions become
known.

For reasonably good developers, then (and if you don't consider them good
developers, why are you trusting their filesystem work?), the developers' own
REAL opinion of the stability and maturity of a project is how much it
has been optimized, vs. where optimization remains on the TODO list.
Once developers are focusing on optimization, arguably they too believe
the general solution to be relatively stable and mature.  By contrast, if
major parts of the code remain unoptimized, particularly where the
current code works well enough but is known to be LESS than optimum,
developers self-evidently consider it still maturing and subject to
change that could possibly undo any current efforts at optimization.

Arguably, that's about as technically reasonable and unbiased as a
measure gets, so for those concerned about 

Re: RAID1: system stability

2015-06-22 Thread Timofey Titovets
And again, if I try:
echo 1 > /sys/block/sdf/device/delete

Jun 22 14:44:16 srv-lab-ceph-node-01 kernel: ------------[ cut here ]------------
Jun 22 14:44:16 srv-lab-ceph-node-01 kernel: kernel BUG at
/build/buildd/linux-3.19.0/fs/btrfs/extent_io.c:2056!
Jun 22 14:44:16 srv-lab-ceph-node-01 kernel: invalid opcode:  [#1] SMP
Jun 22 14:44:16 srv-lab-ceph-node-01 kernel: Modules linked in: 8021q
garp mrp stp llc binfmt_misc ipmi_ssif amdkfd amd_iommu_v2 gpio_ich
radeon ttm drm_kms_helper lpc_ich coretemp drm kvm_intel kvm
i5000_edac i2c_algo_bit edac_core i5k_amb shpchp ipmi_si serio_raw
8250_fintek ioatdma dca joydev mac_hid ipmi_msghandler bonding autofs4
btrfs ses enclosure raid10 raid456 async_raid6_recov async_memcpy
async_pq async_xor async_tx xor raid6_pq hid_generic raid1 e1000e
raid0 usbhid mptsas mptscsih multipath psmouse hid mptbase ptp
scsi_transport_sas pps_core linear
Jun 22 14:44:16 srv-lab-ceph-node-01 kernel: CPU: 0 PID: 1150 Comm:
kworker/u16:12 Not tainted 3.19.0-21-generic #21-Ubuntu
Jun 22 14:44:16 srv-lab-ceph-node-01 kernel: Hardware name: Intel
S5000VSA/S5000VSA, BIOS S5000.86B.15.00.0101.110920101604 11/09/2010
Jun 22 14:44:16 srv-lab-ceph-node-01 kernel: Workqueue: btrfs-endio
btrfs_endio_helper [btrfs]
Jun 22 14:44:16 srv-lab-ceph-node-01 kernel: task: 88044c603110
ti: 88044b4b8000 task.ti: 88044b4b8000
Jun 22 14:44:16 srv-lab-ceph-node-01 kernel: RIP:
0010:[]  []
repair_io_failure+0x1a0/0x220 [btrfs]
Jun 22 14:44:16 srv-lab-ceph-node-01 kernel: RSP:
0018:88044b4bbba8  EFLAGS: 00010202
Jun 22 14:44:16 srv-lab-ceph-node-01 kernel: RAX: 
RBX: 1000 RCX: 
Jun 22 14:44:16 srv-lab-ceph-node-01 kernel: RDX: 
RSI: 880449841b08 RDI: 880449841a80
Jun 22 14:44:16 srv-lab-ceph-node-01 kernel: RBP: 88044b4bbc08
R08: 00109000 R09: 880449841a80
Jun 22 14:44:16 srv-lab-ceph-node-01 kernel: R10: 9000
R11: 0002 R12: 8803fa878068
Jun 22 14:44:16 srv-lab-ceph-node-01 kernel: R13: 880448f5d000
R14: 88044cde8d28 R15: 000524f09000
Jun 22 14:44:16 srv-lab-ceph-node-01 kernel: FS:
() GS:88045fc0()
knlGS:
Jun 22 14:44:16 srv-lab-ceph-node-01 kernel: CS:  0010 DS:  ES:
 CR0: 8005003b
Jun 22 14:44:16 srv-lab-ceph-node-01 kernel: CR2: 7fdcef9cafb8
CR3: 01c13000 CR4: 000407f0
Jun 22 14:44:16 srv-lab-ceph-node-01 kernel: Stack:
Jun 22 14:44:16 srv-lab-ceph-node-01 kernel:  880448f5d100
1000 4b4bbbd8 ea000fb66d40
Jun 22 14:44:16 srv-lab-ceph-node-01 kernel:  7000
880449841a80 88044b4bbc08 880439a44b58
Jun 22 14:44:16 srv-lab-ceph-node-01 kernel:  1000
880448f5d000 88044cde8d28 88044cde8bf0
Jun 22 14:44:16 srv-lab-ceph-node-01 kernel: Call Trace:
Jun 22 14:44:16 srv-lab-ceph-node-01 kernel:  []
clean_io_failure+0x19c/0x1b0 [btrfs]
Jun 22 14:44:16 srv-lab-ceph-node-01 kernel:  []
end_bio_extent_readpage+0x310/0x5e0 [btrfs]
Jun 22 14:44:16 srv-lab-ceph-node-01 kernel:  [] ?
__slab_free+0xa5/0x320
Jun 22 14:44:16 srv-lab-ceph-node-01 kernel:  [] ?
native_sched_clock+0x2a/0x90
Jun 22 14:44:16 srv-lab-ceph-node-01 kernel:  []
bio_endio+0x6b/0xa0
Jun 22 14:44:16 srv-lab-ceph-node-01 kernel:  [] ?
kmem_cache_free+0x1be/0x200
Jun 22 14:44:16 srv-lab-ceph-node-01 kernel:  []
bio_endio_nodec+0x12/0x20
Jun 22 14:44:16 srv-lab-ceph-node-01 kernel:  []
end_workqueue_fn+0x3f/0x50 [btrfs]
Jun 22 14:44:16 srv-lab-ceph-node-01 kernel:  []
normal_work_helper+0xc2/0x2b0 [btrfs]
Jun 22 14:44:16 srv-lab-ceph-node-01 kernel:  []
btrfs_endio_helper+0x12/0x20 [btrfs]
Jun 22 14:44:16 srv-lab-ceph-node-01 kernel:  []
process_one_work+0x158/0x430
Jun 22 14:44:16 srv-lab-ceph-node-01 kernel:  []
worker_thread+0x5b/0x530
Jun 22 14:44:16 srv-lab-ceph-node-01 kernel:  [] ?
rescuer_thread+0x3a0/0x3a0
Jun 22 14:44:16 srv-lab-ceph-node-01 kernel:  []
kthread+0xc9/0xe0
Jun 22 14:44:16 srv-lab-ceph-node-01 kernel:  [] ?
kthread_create_on_node+0x1c0/0x1c0
Jun 22 14:44:16 srv-lab-ceph-node-01 kernel:  []
ret_from_fork+0x58/0x90
Jun 22 14:44:16 srv-lab-ceph-node-01 kernel:  [] ?
kthread_create_on_node+0x1c0/0x1c0
Jun 22 14:44:16 srv-lab-ceph-node-01 kernel: Code: f4 fe ff ff 0f 1f
80 00 00 00 00 0f 0b 66 0f 1f 44 00 00 4c 89 e7 e8 e0 e4 f3 c0 41 b9
fb ff ff ff e9 d2 fe ff ff 0f 1f 44 00 00 <0f> 0b 66 0f 1f 44 00 00 4c
89 e7 e8 c0 e4 f3 c0 31 f6 4c 89 ef
Jun 22 14:44:16 srv-lab-ceph-node-01 kernel: RIP  []
repair_io_failure+0x1a0/0x220 [btrfs]
Jun 22 14:44:16 srv-lab-ceph-node-01 kernel:  RSP 


Re: [PATCH v2 6/7] Btrfs: incremental send, don't send utimes for non-existing directory

2015-06-22 Thread Filipe David Manana
On Mon, Jun 22, 2015 at 10:08 AM, Robbie Ko  wrote:
> There's one case where we can't issue a utimes operation for a directory.

There's one case where we attempt to get utimes from a directory that
doesn't exist in the send snapshot.

> First, 261 can't move to d/item1 without the rename of inode 265. So as 262.
> Thus 261 and 262 need to wait for rename.
> Second, since 263 will be deleted and there are two waiting sub-directory
> 261 and 262, rmdir_ino of 261 will set to 263 and rmdir_ino of 262 is not set.
> If 262 is processed earlier than 261, utime of both 263 and 264 will be
> updated. However, 263 should not update since it will vanish.

You can't just start explaining an example, referring to inode numbers
etc, without showing the example first. What do the parent and send
snapshots look like? So move the example (parent and send snapshots
directory hierarchy) up, before explaining it. We read top down and
not bottom up.

>
> I've found that the following case is the main cause of such error
> and it's fs tree is shown via btrfs-debug-tress as below.
>
> file tree key (459 ROOT_ITEM 20487)
> node 132988928 level 1 items 3 free 490 generation 20487 owner 459
> fs uuid b451ae42-3b03-4003-b0a4-45dce324557f
> chunk uuid d8831db3-2e42-4b32-9a5c-3efdf50d36bc
> key (256 INODE_ITEM 0) block 132710400 (8100) gen 20486
> key (264 INODE_ITEM 0) block 130695168 (7977) gen 20480
> key (266 XATTR_ITEM 952319794) block 126042112 (7693) gen 20464
> leaf 132710400 items 166 free space 3639 generation 20486 owner 455
> fs uuid b451ae42-3b03-4003-b0a4-45dce324557f
> chunk uuid d8831db3-2e42-4b32-9a5c-3efdf50d36bc
> item 0 key (256 INODE_ITEM 0) itemoff 16123 itemsize 160
> inode generation 20425 transid 20442 size 32 block
> group 0 mode 40755 links 1 uid 0 gid 0 rdev 0 flags 0x0
> item 1 key (256 INODE_REF 256) itemoff 16111 itemsize 12
> inode ref index 0 namelen 2 name: ..
> ...
> item 165 key (262 XATTR_ITEM 1100961104) itemoff 7789 itemsize 39
> location key (0 UNKNOWN.0 0) type XATTR
> namelen 8 datalen 1 name: user.a78
> data a
> binary 61
> leaf 130695168 items 133 free space 7332 generation 20480 owner 455
> fs uuid b451ae42-3b03-4003-b0a4-45dce324557f
> chunk uuid d8831db3-2e42-4b32-9a5c-3efdf50d36bc
> item 0 key (264 INODE_ITEM 0) itemoff 16123 itemsize 160
> inode generation 20428 transid 20434 size 10 block
> group 0 mode 40755 links 1 uid 0 gid 0 rdev 0 flags 0x0
> item 1 key (264 INODE_REF 256) itemoff 16112 itemsize 11
> inode ref index 11 namelen 1 name: c
> ...
>
> We can see that inode 262 is right at the end of leaf. Then send_utime() will
> use btrfs_search_slot() to find a appropriate place to put 262 where is at the
> back of 262. However, that place is uninitialized on disk. Suppose we read
> atime tv_sec:576469548413222912, tv_nsec:1919251317 and then send it out.
> Receiving side will  got EINVAL since tv_nsec:1919251317 is greater
> than 999,999,999.

"...place to put 262..." -> it's actually inode 263, plus we aren't
attempting to put anything anywhere.
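
The tv_nsec value in the quoted report, 1919251317, is 0x72657375, and
its little-endian bytes spell "user". That is consistent with the
timestamp having been read from the xattr item at the end of the leaf
(name: user.a78) rather than from an inode item, although the thread
does not state this explicitly. A small sketch in plain C that checks
both the range and the byte decoding; the helper names are hypothetical
and this is not code from send.c:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A nanoseconds field is only valid in the range 0..999,999,999. */
static int nsec_is_valid(int64_t nsec)
{
	return nsec >= 0 && nsec <= 999999999LL;
}

/* Write the four little-endian bytes of v into out, NUL-terminated. */
static void le32_bytes(uint32_t v, char out[5])
{
	assert(out != NULL);
	out[0] = (char)(v & 0xff);
	out[1] = (char)((v >> 8) & 0xff);
	out[2] = (char)((v >> 16) & 0xff);
	out[3] = (char)((v >> 24) & 0xff);
	out[4] = '\0';
}
```

Calling le32_bytes(1919251317u, buf) leaves the string "user" in buf,
and nsec_is_valid(1919251317) is false, which matches the EINVAL seen
on the receiving side.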

So you can explain this by mentioning that we're trying to send utimes
for a directory/inode that doesn't exist in the send snapshot. That
send_utimes() will use part of a leaf beyond its boundaries or a wrong
slot (belonging to some other unrelated inode), because
btrfs_search_slot() returns 1 when we call it to find the inode item
to extract a utimes value from, and send_utimes() is not prepared to
deal with such case because it assumes no one calls it for an inode
that doesn't exist in the send root. And that you fix the problem in
the offending caller.
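
The return convention described above can be illustrated with a toy
model. This is a minimal sketch, assuming a leaf is just a sorted array
of inode numbers; struct toy_leaf, toy_search_slot() and toy_lookup()
are hypothetical names, not the btrfs API. Like btrfs_search_slot(),
the search returns 0 when the key is found and 1 when it is not, and in
the second case the slot may point past the last item:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a leaf: a sorted array of inode numbers. */
struct toy_leaf {
	const uint64_t *keys;
	size_t nritems;
};

/*
 * Return 0 if key is present, 1 if it is not. In both cases *slot is
 * the position where the key is, or would be inserted; for a missing
 * key larger than all items that position equals nritems.
 */
static int toy_search_slot(const struct toy_leaf *leaf, uint64_t key,
			   size_t *slot)
{
	size_t lo = 0, hi;

	assert(leaf != NULL && slot != NULL);
	hi = leaf->nritems;
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (leaf->keys[mid] < key)
			lo = mid + 1;
		else
			hi = mid;
	}
	*slot = lo;
	return (lo < leaf->nritems && leaf->keys[lo] == key) ? 0 : 1;
}

/* A caller that turns "not found" into -ENOENT instead of using the slot. */
static int toy_lookup(const struct toy_leaf *leaf, uint64_t key, size_t *slot)
{
	int ret = toy_search_slot(leaf, key, slot);

	if (ret > 0)
		return -ENOENT;
	return ret;
}
```

For a leaf holding inodes 256, 261 and 262, searching for 263 returns 1
with the slot equal to nritems, so any read from that slot is out of
bounds. A caller that checks for ret > 0, as toy_lookup() does, never
touches it.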


>
> So fix this by don't send utimes for non-existing directory for this case.
>
> Example:
>
> Parent snapshot:
> | a/ (ino 259)
> | c (ino 264)
> | b/ (ino 260)
> | d (ino 265)
> | del/ (ino 263)
> | item1/ (ino 261)
> | item2/ (ino 262)
>
> Send snapshot:
> | a/ (ino 259)
> | b/ (ino 260)
> | c/ (ino 264)
> | item2 (ino 262)
> | d/ (ino 265)
> | item1/ (ino 261)
>
> Signed-off-by: Robbie Ko 
> ---
>
> V2:don't send utimes for non-existing directory
>
>  fs/btrfs/send.c | 12 +++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
> index cd22f7d..579a4c8 100644
> --- a/fs/btrfs/send.c
> +++ b/fs/btrfs/send.c
> @@ -3243,8 +3243,18 @@ finish:
>  * and old parent(s).
>  */
> list_for_each_entry(cur, &pm->update_refs, list) {
> -   if (cur->dir == rmdir_ino)
> +   /*
> +* don't send utimes for non-existing directory
> +*/
> +   ret = get_inode_info(sctx->send_root, cur->dir, NULL,
> +NULL , NULL, NULL, NULL, NULL);
> +   if (ret == -ENOENT) {

Re: [PATCH v2 3/7] Btrfs: incremental send, avoid ancestor rename to descendant

2015-06-22 Thread Filipe David Manana
On Mon, Jun 22, 2015 at 10:08 AM, Robbie Ko  wrote:
> There's one more case where we can't issue a rename operation for a directory
> as soon as we process it. We move a directory from ancestor to descendant.
>
> | a
> | b
> | c
>  | d
> "Move a directory from ancestor to descendant" means moving dir. a into dir. c
>
> This case will happen after applying "[PATCH] Btrfs: incremental send,
> don't delay directory renames unnecessarily".
> Because, that patch changes behavior of wait_for_parent_move function.
>
> Example:
> Parent snapshot:
> | @tmp/ (ino 257)
> | pre/ (ino 260)
> | wait_dir (ino 261)
> | ance/ (ino 263)
> | wait_at_below_ance/ (ino 259)
> | desc/ (ino 262)
> | other_dir/ (ino 264)
>
> Send snapshot:
> | @tmp/ (ino 257)
> | other_dir/ (ino 264)
> | wait_at_below_ance/ (ino 259)
> | pre/ (ino 260)
> | wait_dir/ (ino 261)
> | desc/ (ino 262)
> | ance/ (ino 263)
>
> 1. 259 must move to @tmp/other_dir, so it is waiting on other_dir(264).
>
> 2. 260 is able to rename as ance/wait_at_below_ance/pre since
> wait_at_below_ance(259) is waiting and 260 is not the ancestor of 
> wait_at_below_ance(259).
>
> 3. 261 must move to @tmp/other_dir, so it is waiting on other_dir(264).
>
> 4. 262 is able to rename as ance/wait_at_below_ance/pre/wait_dir/desc since
> wait_dir(261) is waiting and 262 is not the ancestor of wait_dir(261).
>
> 5. 263 is rename as ance/wait_at_below_ance/pre/wait_dir/desc/ance since
> wait_dir(261) is waiting and 263 is not the ancestor of wait_dir(261).
>   At the same time, receiving side will encounter error.
>   If anyone calls get_cur_path() to any element in
> ance/wait_at_below_ance/pre/wait_dir/desc/ance like wait_dir(260),
>   there will cause path building loop like this : 261 -> 260 -> 259 ->
> 263 -> 262 -> 261
>
> So fix the problem by check path_loop for this case.
>
> Signed-off-by: Robbie Ko 
> ---
>
> V2: Always check path_loop, and check Allocation ret value.
>
>  fs/btrfs/send.c | 46 +++---
>  1 file changed, 43 insertions(+), 3 deletions(-)
>
> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
> index 44ad144..b946067 100644
> --- a/fs/btrfs/send.c
> +++ b/fs/btrfs/send.c
> @@ -3088,15 +3088,23 @@ static int path_loop(struct send_ctx *sctx, struct 
> fs_path *name,
>
> *ancestor_ino = 0;
> while (ino != BTRFS_FIRST_FREE_OBJECTID) {
> +   struct waiting_dir_move *wdm;
> fs_path_reset(name);
>
> if (is_waiting_for_rm(sctx, ino))
> break;
> -   if (is_waiting_for_move(sctx, ino)) {
> +
> +   wdm = get_waiting_dir_move(sctx, ino);
> +   if (wdm) {
> if (*ancestor_ino == 0)
> *ancestor_ino = ino;
> -   ret = get_first_ref(sctx->parent_root, ino,
> -   &parent_inode, &parent_gen, name);
> +   if (wdm->orphanized) {
> +   ret = gen_unique_name(sctx, ino, gen, name);
> +   break;
> +   } else {
> +   ret = get_first_ref(sctx->parent_root, ino,
> +   
> &parent_inode, &parent_gen, name);
> +   }
> } else {
> ret = __get_cur_name_and_parent(sctx, ino, gen,
> &parent_inode,
> @@ -3743,6 +3751,38 @@ verbose_printk("btrfs: process_recorded_refs %llu\n", 
> sctx->cur_ino);
> }
>
> /*
> +* if cur_ino is cur ancestor, can't move now,
> +* find descendant who is waiting, waiting it.
> +*/

If cur_ino is current ancestor of whom?
"find descendant" -> but below we're looking for an ancestor and
delaying the rename of the current inode (sctx->cur_ino) to happen
after the rename of that ancestor.
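
The path-building loop from step 5 of the quoted commit message
(261 -> 260 -> 259 -> 263 -> 262 -> 261) can be modelled as a walk over
parent links. A minimal sketch in plain C, assuming each directory
records the single parent used when building its path; struct toy_dir
and both helpers are hypothetical and are not the kernel's path_loop():

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_dir {
	uint64_t ino;
	uint64_t parent; /* parent inode used when building the path */
};

/* Return the recorded parent of ino, or 0 if ino is unknown. */
static uint64_t toy_parent(const struct toy_dir *dirs, size_t n, uint64_t ino)
{
	assert(dirs != NULL);
	for (size_t i = 0; i < n; i++)
		if (dirs[i].ino == ino)
			return dirs[i].parent;
	return 0;
}

/*
 * Walk from ino towards root. Return 1 if the walk comes back to ino or
 * does not finish within n steps, 0 if it reaches root or an inode with
 * no recorded parent.
 */
static int toy_path_loops(const struct toy_dir *dirs, size_t n,
			  uint64_t ino, uint64_t root)
{
	uint64_t cur = ino;

	for (size_t steps = 0; steps <= n; steps++) {
		if (cur == root)
			return 0;
		cur = toy_parent(dirs, n, cur);
		if (cur == 0)
			return 0;
		if (cur == ino)
			return 1;
	}
	return 1;
}
```

With the parent links 261 -> 260, 260 -> 259, 259 -> 263, 263 -> 262 and
262 -> 261, toy_path_loops() starting at 261 returns 1. Bounding the
walk by the number of directories keeps it finite even when the loop
does not pass through the starting inode.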

> +   if(can_rename) {

Again, please run checkpatch.pl against the files.
Kernel coding style, add a space between if and the opening
parenthesis:  if (...) {

> +   struct fs_path *name = NULL;
> +   u64 ancestor;
> +   u64 old_send_progress = sctx->send_progress;
> +
> +   name = fs_path_alloc();
> +   if (!valid_path) {

Wrong variable. Must be:

if (!name) {
   (...)


> +   ret = -ENOMEM;
> +   goto out;
> +   }
> +
> +   sctx->send_progress = sctx->cur_ino + 1;
> +   ret = path_loop(sctx, name, sctx->cur_ino, 
> sctx->cur_inode_gen, &ancestor);

Need 

Re: RAID1: system stability

2015-06-22 Thread Timofey Titovets
Okay, here are the logs. I released disk /dev/sde1 and got:
Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: Read(10): 28 00 11 1d 69
00 00 00 08 00
Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: blk_update_request: I/O
error, dev sde, sector 287140096
Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: mptbase: ioc0:
LogInfo(0x31010011): Originator={PL}, Code={Open Failure},
SubCode(0x0011) cb_idx mptscsih_io_done
Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: sd 0:0:5:0: [sde] FAILED
Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: sd 0:0:5:0: [sde] CDB:
Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: Read(10): 28 00 11 1d 69
00 00 00 08 00
Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: blk_update_request: I/O
error, dev sde, sector 287140096
Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: Buffer I/O error on dev
sde1, logical block 35892256, async page read
Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: mptbase: ioc0:
LogInfo(0x31010011): Originator={PL}, Code={Open Failure},
SubCode(0x0011) cb_idx mptscsih_io_done
Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: mptbase: ioc0:
LogInfo(0x31010011): Originator={PL}, Code={Open Failure},
SubCode(0x0011) cb_idx mptscsih_io_done
Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: sd 0:0:5:0: [sde] FAILED
Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: sd 0:0:5:0: [sde] CDB:
Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: Read(10): 28 00 11 1d 69
00 00 00 08 00
Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: blk_update_request: I/O
error, dev sde, sector 287140096
Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: Buffer I/O error on dev
sde1, logical block 35892256, async page read
Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: mptbase: ioc0:
LogInfo(0x31010011): Originator={PL}, Code={Open Failure},
SubCode(0x0011) cb_idx mptscsih_io_done
Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: mptbase: ioc0:
LogInfo(0x31010011): Originator={PL}, Code={Open Failure},
SubCode(0x0011) cb_idx mptscsih_io_done
Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: sd 0:0:5:0: [sde] FAILED
Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: sd 0:0:5:0: [sde] CDB:
Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: Read(10): 28 00 11 1d 69
00 00 00 08 00
Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: blk_update_request: I/O
error, dev sde, sector 287140096
Jun 22 14:28:40 srv-lab-ceph-node-01 kernel: Buffer I/O error on dev
sde1, logical block 35892256, async page read
Jun 22 14:28:41 srv-lab-ceph-node-01 kernel: mptbase: ioc0:
LogInfo(0x31010011): Originator={PL}, Code={Open Failure},
SubCode(0x0011) cb_idx mptscsih_io_done
Jun 22 14:28:41 srv-lab-ceph-node-01 kernel:  end_device-0:0:6:
mptsas: ioc0: removing ssp device: fw_channel 0, fw_id 16, phy
5,sas_addr 0x5000cca00d0514bd
Jun 22 14:28:41 srv-lab-ceph-node-01 kernel:  phy-0:0:9: mptsas: ioc0:
delete phy 5, phy-obj (0x880449541400)
Jun 22 14:28:41 srv-lab-ceph-node-01 kernel:  port-0:0:6: mptsas:
ioc0: delete port 6, sas_addr (0x5000cca00d0514bd)
Jun 22 14:28:41 srv-lab-ceph-node-01 kernel: scsi target0:0:5: mptsas:
ioc0: delete device: fw_channel 0, fw_id 16, phy 5, sas_addr
0x5000cca00d0514bd
Jun 22 14:28:44 srv-lab-ceph-node-01 kernel: BTRFS: lost page write
due to I/O error on /dev/sde1
Jun 22 14:28:44 srv-lab-ceph-node-01 kernel: BTRFS: bdev /dev/sde1
errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
Jun 22 14:28:44 srv-lab-ceph-node-01 kernel: BTRFS: lost page write
due to I/O error on /dev/sde1
Jun 22 14:28:44 srv-lab-ceph-node-01 kernel: BTRFS: bdev /dev/sde1
errs: wr 2, rd 0, flush 0, corrupt 0, gen 0
Jun 22 14:29:13 srv-lab-ceph-node-01 kernel: BTRFS: bdev /dev/sde1
errs: wr 4, rd 0, flush 0, corrupt 0, gen 0
Jun 22 14:29:13 srv-lab-ceph-node-01 kernel: BTRFS: bdev /dev/sde1
errs: wr 5, rd 0, flush 0, corrupt 0, gen 0
Jun 22 14:29:13 srv-lab-ceph-node-01 kernel: BTRFS: bdev /dev/sde1
errs: wr 6, rd 0, flush 0, corrupt 0, gen 0
Jun 22 14:29:13 srv-lab-ceph-node-01 kernel: BTRFS: bdev /dev/sde1
errs: wr 7, rd 0, flush 0, corrupt 0, gen 0
Jun 22 14:29:13 srv-lab-ceph-node-01 kernel: BTRFS: bdev /dev/sde1
errs: wr 8, rd 0, flush 0, corrupt 0, gen 0
Jun 22 14:29:13 srv-lab-ceph-node-01 kernel: BTRFS: bdev /dev/sde1
errs: wr 9, rd 0, flush 0, corrupt 0, gen 0
Jun 22 14:29:13 srv-lab-ceph-node-01 kernel: BTRFS: bdev /dev/sde1
errs: wr 10, rd 0, flush 0, corrupt 0, gen 0
Jun 22 14:29:13 srv-lab-ceph-node-01 kernel: BTRFS: bdev /dev/sde1
errs: wr 11, rd 0, flush 0, corrupt 0, gen 0
Jun 22 14:29:13 srv-lab-ceph-node-01 kernel: BTRFS: bdev /dev/sde1
errs: wr 12, rd 0, flush 0, corrupt 0, gen 0
Jun 22 14:29:13 srv-lab-ceph-node-01 kernel: BTRFS: bdev /dev/sde1
errs: wr 13, rd 0, flush 0, corrupt 0, gen 0
Jun 22 14:29:22 srv-lab-ceph-node-01 kernel: BTRFS info (device
md127): csum failed ino 1039 extent 390332416 csum 2059524288 wanted
343582415 mirror 0
Jun 22 14:29:22 srv-lab-ceph-node-01 kernel: BUG: unable to handle
kernel paging request at 87fa7ff534

Re: [PATCH v2 2/7] Btrfs: incremental send, avoid circular waiting and descendant overwrite ancestor need to update path

2015-06-22 Thread Filipe David Manana
On Mon, Jun 22, 2015 at 10:08 AM, Robbie Ko  wrote:
> Base on [PATCH] Btrfs: incremental send, check if orphanized dir inode needs 
> delayed rename

This is mentioned in the cover letter, so no need to repeat it in
the commit message of every patch in the series.

>
> Example1:
> There's one case where we can't issue a rename operation for a directory
> as soon as we process it. Used to delay directory renames if
> wait_parent_move or wait_for_dest_dir_move, maybe cause circular waiting.

This second sentence is confusing to say the least. What is "if
wait_parent_move"? There's nothing in send.c with that name. And the
"maybe" is equally confusing and redundant. You already explain below,
with an example, that the problem is a circular wait and what exactly
a circular wait is.

>
> Parent snapshot:
> | d/ (ino 257)
> | p1 (ino 258)
> | p1/ (ino 259)
>
> Send snapshot:
> | d/ (ino 257)
> | p1 (ino 259)
> | p1/ (ino 258)
>
> Here we can not rename 258 from d/p1 to p1/p1 without the rename of inode 259.
> p1 258 is put into wait_parent_move.

"... is put into wait_parent_move" -> what is wait_parent_move?
There's nothing in send.c with that name. Is it a function, is it a
data structure, or what? Even someone familiar with send's internals
will scratch their head trying to understand what this means.

A better alternative: "Inode 258 became a child of inode 259 and both
were renamed in the send snapshot. Therefore inode 258's rename
operation is delayed to happen after 259 is renamed."
Or something along those lines.

> 259 can't be rename to d/p1, so it is put into

It should be mentioned why 259 can't be renamed.

> circular waiting happens" -> so 259's rename is delayed to happen after 258's 
> rename,
> which creates a circular dependency (258 -> 259 -> 258).
>
> Example2:
> There's one case where we can't issue a rename operation for a directory
> immediately we process it.

We are repeating this sentence in every example. Just say at the very
top that there are several more cases where we can't do the renames
immediately.

> After moving 262 outside, path of 265 is stored in the name_cache_entry.

After renaming inode 262, the name inode 265 has in the parent
snapshot is stored in the name cache.

> When 263 try to overwrite 265, its ancestor, 265 is moved to orphanized. Path 
> of 263
> is still the original path, however. This causes error.

What error? It's important to mention what error it is.

You should explain that after orphanizing 265 we were leaving its old
name in the cache and how that causes a problem.

>
> Parent snapshot:
> | a/ (ino 259)
> | c (ino 266)
> | d/ (ino 260)
> | ance (ino 265)
> | e (ino 261)
> | f (ino 262)
> | ance (ino 263)
>
> Send snapshot:
> | a/ (ino 259)
> | c/ (ino 266)
> | ance (ino 265)
> | d/ (ino 260)
> | ance (ino 263)
> | f/ (ino 262)
> | e (ino 261)
>
> Example3:
> There is another case for 2nd scenario where is_ancestor() can't be used.
>
> Parent snapshot:
> | a/ (ino 261)
> | c (ino 267)
> | d/ (ino 259)
> | ance/ (ino 266)
> | waiting_dir/ (ino 262)
> | pre/ (ino 264)
> | ance/ (ino 265)
>
> Send snapshot:
> | a/ (ino 261)
> | ance/ (ino 266)
> | c (ino 267)
> | waiting_dir/ (ino 262)
> | pre/ (ino 264)
> | d/ (ino 259)
> | ance/ (ino 265)
>
> First, 262 can't move to c/waiting_dir without the rename of inode 267.
> Second, 264 can move into dir 262. Although 262 is waiting, 264 is not
> parent of 262 in the parent root.
> (The second behavior will happen after applying "[PATCH] Btrfs:
> incremental send, don't delay directory renames unnecessarily")
> Finally, 265 will overwrite 266 and path for 265 should be updated
> since 266 is not the ancestor of 265.
> Here we need to check the current state of tree rather than parent
> root which  is_ancestor function does.
>
> Signed-off-by: Robbie Ko 
> ---
>
> V2:when orphanized inode always get_cur_path again.
>
>  fs/btrfs/send.c | 38 --
>  1 file changed, 32 insertions(+), 6 deletions(-)
>
> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
> index 257753b..44ad144 100644
> --- a/fs/btrfs/send.c
> +++ b/fs/btrfs/send.c
> @@ -230,7 +230,6 @@ struct pending_dir_move {
> u64 parent_ino;
> u64 ino;
> u64 gen;
> -   bool is_orphan;
> struct list_head update_refs;
>  };
>
> @@ -1840,7 +1839,7 @@ static int will_overwrite_ref(struct send_ctx *sctx, 
> u64 dir, u64 dir_gen,
>  * was already unlinked/moved, so we can safely assume that we will 
> not
>  * overwrite anything at this point in time.
>  */
> -   if (other_inode > sctx->send_progress) {
> +   if (other_inode > sctx->send_progress || is_waiting_for_move(sctx, 
> other_inode)) {
> ret = get_inode_info(sctx->pa

Re: [PATCH v2 1/7] Revert "Btrfs: incremental send, remove dead code"

2015-06-22 Thread Filipe David Manana
On Mon, Jun 22, 2015 at 10:08 AM, Robbie Ko  wrote:
> This reverts commit 5f806c3ae2ff6263a10a6901f97abb74dac03d36.
>

So, this is a revert patch that alone by itself doesn't fix any problem. Fine.

However you are now pasting below the commit message from another
patch in the series (patch 3) that actually makes use of this patch
and fixes something. Just mention here that this is necessary for a
subsequent patch in the series...

Explaining here what some other patch fixes and how is confusing.

> Btrfs: incremental send, avoid ancestor rename to descendant
>
> There's one more case where we can't issue a rename operation for a directory
> as soon as we process it. We move a directory from ancestor to descendant.
>
> | a
> | b
> | c
>  | d
> "Move a directory from ancestor to descendant" means moving dir. a into dir. c
>
> This case will happen after applying "[PATCH] Btrfs: incremental send,
> don't delay directory renames unnecessarily".
> Because, that patch changes behavior of wait_for_parent_move function.
>
> Parent snapshot:
> | @tmp/ (ino 257)
> | pre/ (ino 259)
> | wait_dir (ino 260)
>   | finish_dir2/ (ino 261)
> | ance/ (ino 263)
> | finish_dir1/ (ino 258)
> | desc/ (ino 262)
> | other_dir/ (ino 264)
>
> Send snapshot:
> | @tmp/ (ino 257)
> | other_dir/ (ino 264)
> | wait_dir/ (ino 260)
> | finish_dir2/ (ino 261)
> | desc/ (ino 262)
> | ance/ (ino 263)
> | finish_dir1/ (ino 258)
> | pre/ (ino 259)
>
> 1. 259 can not move under 258 because 263 needs to move to 263 first.
> So 259 is waiting on ance(263).
>
> 2. 260 must move to @tmp/other_dir, so it is waiting on other_dir(264).
>
> 3. 262 is able to rename as pre/wait_dir/finish_dir2(261)/desc since
> wait_dir(260) is waiting and 262 is not the ancestor of wait_dir(260).
>
> 4.263 is able to rename as pre/wait_dir/finish_dir2(261)/ance since
> wait_dir(260) is waiting and 263 is not the ancestor of wait_dir(260).
>
> 5. After wait_dir(263) is finished, all pending dirs. start to run.
> /pre(259) in apply_dir_move() renames /pre as
> pre/wait_dir/finish_dir2/desc/ance/finish_dir1/pre
>   At the same time, receiving side will encounter error.
>   If anyone calls get_cur_path() to any element in
> pre/wait_dir/finish_dir2/desc/ance/finish_dir1/pre like wait_dir(260)
> ,
>   there will cause path building loop like this : 260 -> 259 -> 258 ->
> 263 -> 262 -> 261 -> 260
>
> So fix the problem by check path_loop for this case.
>
> Signed-off-by: Robbie Ko 
> ---
>  fs/btrfs/send.c | 59 
> +
>  1 file changed, 59 insertions(+)
>
> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
> index 1c1f161..257753b 100644
> --- a/fs/btrfs/send.c
> +++ b/fs/btrfs/send.c
> @@ -3080,6 +3080,48 @@ static struct pending_dir_move 
> *get_pending_dir_moves(struct send_ctx *sctx,
> return NULL;
>  }
>
> +static int path_loop(struct send_ctx *sctx, struct fs_path *name,
> +u64 ino, u64 gen, u64 *ancestor_ino)
> +{
> +   int ret = 0;
> +   u64 parent_inode = 0;
> +   u64 parent_gen = 0;
> +   u64 start_ino = ino;
> +
> +   *ancestor_ino = 0;
> +   while (ino != BTRFS_FIRST_FREE_OBJECTID) {
> +   fs_path_reset(name);
> +
> +   if (is_waiting_for_rm(sctx, ino))
> +   break;
> +   if (is_waiting_for_move(sctx, ino)) {
> +   if (*ancestor_ino == 0)
> +   *ancestor_ino = ino;
> +   ret = get_first_ref(sctx->parent_root, ino,
> +   &parent_inode, &parent_gen, name);
> +   } else {
> +   ret = __get_cur_name_and_parent(sctx, ino, gen,
> +   &parent_inode,
> +   &parent_gen, name);
> +   if (ret > 0) {
> +   ret = 0;
> +   break;
> +   }
> +   }
> +   if (ret < 0)
> +   break;
> +   if (parent_inode == start_ino) {
> +   ret = 1;
> +   if (*ancestor_ino == 0)
> +   *ancestor_ino = ino;
> +   break;
> +   }
> +   ino = parent_inode;
> +   gen = parent_gen;
> +   }
> +   return ret;
> +}
> +
>  static int apply_dir_move(struct send_ctx *sctx, struct pending_dir_move *pm)
>  {
> struct fs_path *from_path = NULL;
> @@ -3091,6 +3133,7 @@ static int apply_dir_move(struct send_ctx *sctx, struct 
> pending_dir_move *pm)
> struct waiting_dir_move *dm = NULL;
> 

[PATCH] Btrfs: Check if kobject is initialized before put

2015-06-22 Thread Anand Jain
Signed-off-by: Anand Jain 
---
 fs/btrfs/sysfs.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index ea81a05..603b0cc 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -523,9 +523,11 @@ static void __btrfs_sysfs_remove_fsid(struct 
btrfs_fs_devices *fs_devs)
fs_devs->device_dir_kobj = NULL;
}
 
-   kobject_del(&fs_devs->super_kobj);
-   kobject_put(&fs_devs->super_kobj);
-   wait_for_completion(&fs_devs->kobj_unregister);
+   if (fs_devs->super_kobj.state_initialized) {
+   kobject_del(&fs_devs->super_kobj);
+   kobject_put(&fs_devs->super_kobj);
+   wait_for_completion(&fs_devs->kobj_unregister);
+   }
 }
 
 /* when fs_devs is NULL it will remove all fsid kobject */
-- 
2.4.1



[PATCH v2 4/7] Btrfs: incremental send, fix orphan_dir_info leak

2015-06-22 Thread Robbie Ko
There's one case where we leak an orphan_dir_info structure.

Example:

Parent snapshot:
| a/ (ino 279)
| c (ino 282)
| del/ (ino 281)
| tmp/ (ino 280)
| long/ (ino 283)
| longlong/ (ino 284)

Send snapshot:
| a/ (ino 279)
| long (ino 283)
| longlong (ino 284)
| c/ (ino 282)
| tmp/ (ino 280)

Fix this by freeing the existing orphan_dir_info for a directory when we
realize we can't rmdir the directory because it has a descendant that wasn't
yet processed, and the orphan_dir_info was created because it had a
descendant that had its rename operation delayed.

Signed-off-by: Robbie Ko 
---

V2: modify comment

 fs/btrfs/send.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index b946067..bc9efbe 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -2913,6 +2913,11 @@ static int can_rmdir(struct send_ctx *sctx, u64 dir, u64 
dir_gen,
}
 
if (loc.objectid > send_progress) {
+   struct orphan_dir_info *odi;
+
+   odi = get_orphan_dir_info(sctx, dir);
+   if (odi)
+   free_orphan_dir_info(sctx, odi);
ret = 0;
goto out;
}
-- 
1.9.1



[PATCH v2 6/7] Btrfs: incremental send, don't send utimes for non-existing directory

2015-06-22 Thread Robbie Ko
There's one case where we can't issue a utimes operation for a directory.
First, 261 can't move to d/item1 without the rename of inode 265. The same
applies to 262. Thus 261 and 262 need to wait for the rename.
Second, since 263 will be deleted and there are two waiting sub-directories,
261 and 262, rmdir_ino of 261 will be set to 263 and rmdir_ino of 262 is not set.
If 262 is processed earlier than 261, the utimes of both 263 and 264 will be
updated. However, 263 should not be updated since it will vanish.

I've found that the following case is the main cause of this error;
its fs tree, as shown by btrfs-debug-tree, is below.

file tree key (459 ROOT_ITEM 20487)
node 132988928 level 1 items 3 free 490 generation 20487 owner 459
fs uuid b451ae42-3b03-4003-b0a4-45dce324557f
chunk uuid d8831db3-2e42-4b32-9a5c-3efdf50d36bc
key (256 INODE_ITEM 0) block 132710400 (8100) gen 20486
key (264 INODE_ITEM 0) block 130695168 (7977) gen 20480
key (266 XATTR_ITEM 952319794) block 126042112 (7693) gen 20464
leaf 132710400 items 166 free space 3639 generation 20486 owner 455
fs uuid b451ae42-3b03-4003-b0a4-45dce324557f
chunk uuid d8831db3-2e42-4b32-9a5c-3efdf50d36bc
item 0 key (256 INODE_ITEM 0) itemoff 16123 itemsize 160
inode generation 20425 transid 20442 size 32 block
group 0 mode 40755 links 1 uid 0 gid 0 rdev 0 flags 0x0
item 1 key (256 INODE_REF 256) itemoff 16111 itemsize 12
inode ref index 0 namelen 2 name: ..
...
item 165 key (262 XATTR_ITEM 1100961104) itemoff 7789 itemsize 39
location key (0 UNKNOWN.0 0) type XATTR
namelen 8 datalen 1 name: user.a78
data a
binary 61
leaf 130695168 items 133 free space 7332 generation 20480 owner 455
fs uuid b451ae42-3b03-4003-b0a4-45dce324557f
chunk uuid d8831db3-2e42-4b32-9a5c-3efdf50d36bc
item 0 key (264 INODE_ITEM 0) itemoff 16123 itemsize 160
inode generation 20428 transid 20434 size 10 block
group 0 mode 40755 links 1 uid 0 gid 0 rdev 0 flags 0x0
item 1 key (264 INODE_REF 256) itemoff 16112 itemsize 11
inode ref index 11 namelen 1 name: c
...

We can see that inode 262 is right at the end of the leaf. send_utimes() will
then use btrfs_search_slot() to find the appropriate slot, which lies right
after the items of 262. However, that slot is uninitialized on disk. Suppose we
read atime tv_sec:576469548413222912, tv_nsec:1919251317 and then send it out.
The receiving side will get EINVAL since tv_nsec:1919251317 is greater
than 999,999,999.

So fix this by not sending utimes for a non-existing directory in this case.
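
As a rough illustration of why such a value is rejected (a sketch only, not
code from send.c or btrfs-progs): assuming the receiver applies timestamps
with utimensat(2), that call fails with EINVAL when tv_nsec lies outside
0..999,999,999 and is not one of the special values UTIME_NOW or UTIME_OMIT.
The helper name timespec_nsec_is_valid() is hypothetical, and it ignores the
two special values for brevity.

```c
#include <stdbool.h>
#include <time.h>

/*
 * Hypothetical helper: true if tv_nsec is a plain nanosecond count in
 * the range 0..999,999,999. The special values UTIME_NOW and UTIME_OMIT
 * accepted by utimensat(2) are deliberately not handled here.
 */
bool timespec_nsec_is_valid(const struct timespec *ts)
{
	return ts->tv_nsec >= 0 && ts->tv_nsec <= 999999999L;
}
```

With the garbage value quoted above (tv_nsec:1919251317) the check fails,
which matches the EINVAL seen on the receiving side.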

Example:

Parent snapshot:
| a/ (ino 259)
| c (ino 264)
| b/ (ino 260)
| d (ino 265)
| del/ (ino 263)
| item1/ (ino 261)
| item2/ (ino 262)

Send snapshot:
| a/ (ino 259)
| b/ (ino 260)
| c/ (ino 264)
| item2 (ino 262)
| d/ (ino 265)
| item1/ (ino 261)

Signed-off-by: Robbie Ko 
---

V2:don't send utimes for non-existing directory

 fs/btrfs/send.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index cd22f7d..579a4c8 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -3243,8 +3243,18 @@ finish:
 * and old parent(s).
 */
list_for_each_entry(cur, &pm->update_refs, list) {
-   if (cur->dir == rmdir_ino)
+   /*
+* don't send utimes for non-existing directory
+*/
+   ret = get_inode_info(sctx->send_root, cur->dir, NULL,
+NULL , NULL, NULL, NULL, NULL);
+   if (ret == -ENOENT) {
+   ret = 0;
continue;
+   }
+   if (ret < 0)
+   goto out;
+
ret = send_utimes(sctx, cur->dir, cur->dir_gen);
if (ret < 0)
goto out;
-- 
1.9.1



[PATCH v2 7/7] Btrfs: incremental send, avoid the overhead of allocating an orphan_dir_info object unnecessarily

2015-06-22 Thread Robbie Ko
Avoid the overhead of allocating an orphan_dir_info object unnecessarily.
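
The idea can be sketched in userspace as follows. This is a simplified model
under stated assumptions, not the kernel code: it uses a plain unbalanced
binary search tree instead of the kernel rbtree, and the names dir_node,
add_dir_node() and nr_allocs are hypothetical. The point is only the ordering:
search first, allocate only when the search finds no existing entry.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for orphan_dir_info, keyed by inode number. */
struct dir_node {
	uint64_t ino;
	struct dir_node *left, *right;
};

/* Counts allocations so the effect of the ordering is observable. */
unsigned int nr_allocs;

/*
 * Return the node for @ino, inserting a new one only if the search
 * finds none. Returns NULL on allocation failure.
 */
struct dir_node *add_dir_node(struct dir_node **root, uint64_t ino)
{
	struct dir_node **p = root;
	struct dir_node *n;

	while (*p) {
		if (ino < (*p)->ino)
			p = &(*p)->left;
		else if (ino > (*p)->ino)
			p = &(*p)->right;
		else
			return *p;	/* already present: nothing allocated */
	}

	/* Not found: only now pay for the allocation. */
	n = malloc(sizeof(*n));
	if (!n)
		return NULL;
	nr_allocs++;
	n->ino = ino;
	n->left = NULL;
	n->right = NULL;
	*p = n;
	return n;
}
```

Calling add_dir_node() twice with the same inode number returns the same node
and allocates only once, so the allocate-then-free path for existing entries
disappears.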

Signed-off-by: Robbie Ko 
---
 fs/btrfs/send.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 579a4c8..9c60421 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -2785,12 +2785,6 @@ add_orphan_dir_info(struct send_ctx *sctx, u64 dir_ino)
struct rb_node *parent = NULL;
struct orphan_dir_info *entry, *odi;
 
-   odi = kmalloc(sizeof(*odi), GFP_NOFS);
-   if (!odi)
-   return ERR_PTR(-ENOMEM);
-   odi->ino = dir_ino;
-   odi->gen = 0;
-
while (*p) {
parent = *p;
entry = rb_entry(parent, struct orphan_dir_info, node);
@@ -2799,11 +2793,16 @@ add_orphan_dir_info(struct send_ctx *sctx, u64 dir_ino)
} else if (dir_ino > entry->ino) {
p = &(*p)->rb_right;
} else {
-   kfree(odi);
return entry;
}
}
 
+   odi = kmalloc(sizeof(*odi), GFP_NOFS);
+   if (!odi)
+   return ERR_PTR(-ENOMEM);
+   odi->ino = dir_ino;
+   odi->gen = 0;
+
rb_link_node(&odi->node, parent, p);
rb_insert_color(&odi->node, &sctx->orphan_dirs);
return odi;
-- 
1.9.1



[PATCH v2 1/7] Revert "Btrfs: incremental send, remove dead code"

2015-06-22 Thread Robbie Ko
This reverts commit 5f806c3ae2ff6263a10a6901f97abb74dac03d36.

Btrfs: incremental send, avoid ancestor rename to descendant

There's one more case where we can't issue a rename operation for a directory
as soon as we process it. We move a directory from ancestor to descendant.

| a
| b
| c
 | d
"Move a directory from ancestor to descendant" means moving dir. a into dir. c

This case will happen after applying "[PATCH] Btrfs: incremental send,
don't delay directory renames unnecessarily", because that patch changes
the behavior of the wait_for_parent_move function.

Parent snapshot:
| @tmp/ (ino 257)
| pre/ (ino 259)
| wait_dir (ino 260)
  | finish_dir2/ (ino 261)
| ance/ (ino 263)
| finish_dir1/ (ino 258)
| desc/ (ino 262)
| other_dir/ (ino 264)

Send snapshot:
| @tmp/ (ino 257)
| other_dir/ (ino 264)
| wait_dir/ (ino 260)
| finish_dir2/ (ino 261)
| desc/ (ino 262)
| ance/ (ino 263)
| finish_dir1/ (ino 258)
| pre/ (ino 259)

1. 259 can not move under 258 because 263 needs to move to 263 first.
So 259 is waiting on ance(263).

2. 260 must move to @tmp/other_dir, so it is waiting on other_dir(264).

3. 262 is able to rename as pre/wait_dir/finish_dir2(261)/desc since
wait_dir(260) is waiting and 262 is not the ancestor of wait_dir(260).

4.263 is able to rename as pre/wait_dir/finish_dir2(261)/ance since
wait_dir(260) is waiting and 263 is not the ancestor of wait_dir(260).

5. After wait_dir(263) is finished, all pending dirs. start to run.
/pre(259) in apply_dir_move() renames /pre as
pre/wait_dir/finish_dir2/desc/ance/finish_dir1/pre
  At the same time, the receiving side will encounter an error.
  If anyone calls get_cur_path() on any element in
pre/wait_dir/finish_dir2/desc/ance/finish_dir1/pre, like wait_dir(260),
  there will be a path building loop like this: 260 -> 259 -> 258 ->
263 -> 262 -> 261 -> 260

So fix the problem by calling path_loop() to check for this case.

Signed-off-by: Robbie Ko 
---
 fs/btrfs/send.c | 59 +
 1 file changed, 59 insertions(+)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 1c1f161..257753b 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -3080,6 +3080,48 @@ static struct pending_dir_move 
*get_pending_dir_moves(struct send_ctx *sctx,
return NULL;
 }
 
+static int path_loop(struct send_ctx *sctx, struct fs_path *name,
+u64 ino, u64 gen, u64 *ancestor_ino)
+{
+   int ret = 0;
+   u64 parent_inode = 0;
+   u64 parent_gen = 0;
+   u64 start_ino = ino;
+
+   *ancestor_ino = 0;
+   while (ino != BTRFS_FIRST_FREE_OBJECTID) {
+   fs_path_reset(name);
+
+   if (is_waiting_for_rm(sctx, ino))
+   break;
+   if (is_waiting_for_move(sctx, ino)) {
+   if (*ancestor_ino == 0)
+   *ancestor_ino = ino;
+   ret = get_first_ref(sctx->parent_root, ino,
+   &parent_inode, &parent_gen, name);
+   } else {
+   ret = __get_cur_name_and_parent(sctx, ino, gen,
+   &parent_inode,
+   &parent_gen, name);
+   if (ret > 0) {
+   ret = 0;
+   break;
+   }
+   }
+   if (ret < 0)
+   break;
+   if (parent_inode == start_ino) {
+   ret = 1;
+   if (*ancestor_ino == 0)
+   *ancestor_ino = ino;
+   break;
+   }
+   ino = parent_inode;
+   gen = parent_gen;
+   }
+   return ret;
+}
+
 static int apply_dir_move(struct send_ctx *sctx, struct pending_dir_move *pm)
 {
struct fs_path *from_path = NULL;
@@ -3091,6 +3133,7 @@ static int apply_dir_move(struct send_ctx *sctx, struct 
pending_dir_move *pm)
struct waiting_dir_move *dm = NULL;
u64 rmdir_ino = 0;
int ret;
+   u64 ancestor = 0;
 
name = fs_path_alloc();
from_path = fs_path_alloc();
@@ -3122,6 +3165,22 @@ static int apply_dir_move(struct send_ctx *sctx, struct 
pending_dir_move *pm)
goto out;
 
sctx->send_progress = sctx->cur_ino + 1;
+   ret = path_loop(sctx, name, pm->ino, pm->gen, &ancestor);
+   if (ret) {
+   LIST_HEAD(deleted_refs);
+   ASSERT(ancestor > BTRFS_FIRST_FREE_OBJECTID);
+   ret = add_pending_dir_move(sctx, pm->ino, pm->gen, ancestor,
+  &pm->update_refs, &deleted_refs,
+

[PATCH v2 2/7] Btrfs: incremental send, avoid circular waiting and descendant overwrite ancestor need to update path

2015-06-22 Thread Robbie Ko
Base on [PATCH] Btrfs: incremental send, check if orphanized dir inode needs 
delayed rename

Example1:
There's one case where we can't issue a rename operation for a directory
as soon as we process it. Used to delay directory renames if
wait_parent_move or wait_for_dest_dir_move, maybe cause circular waiting.

Parent snapshot:
| d/ (ino 257)
| p1 (ino 258)
| p1/ (ino 259)

Send snapshot:
| d/ (ino 257)
| p1 (ino 259)
| p1/ (ino 258)

Here we can not rename 258 from d/p1 to p1/p1 without the rename of inode 259.
p1 258 is put into wait_parent_move. 259 can't be renamed to d/p1, so 259's
rename is delayed to happen after 258's rename, which creates a circular
dependency (258 -> 259 -> 258).

Example2:
There's one case where we can't issue a rename operation for a directory
as soon as we process it.
After moving 262 outside, the path of 265 is stored in the name_cache_entry.
When 263 tries to overwrite 265, its ancestor, 265 is orphanized. However, the
path of 263 is still the original path. This causes an error.

Parent snapshot:
| a/ (ino 259)
| c (ino 266)
| d/ (ino 260)
| ance (ino 265)
| e (ino 261)
| f (ino 262)
| ance (ino 263)

Send snapshot:
| a/ (ino 259)
| c/ (ino 266)
| ance (ino 265)
| d/ (ino 260)
| ance (ino 263)
| f/ (ino 262)
| e (ino 261)

Example3:
There is another case for 2nd scenario where is_ancestor() can't be used.

Parent snapshot:
| a/ (ino 261)
| c (ino 267)
| d/ (ino 259)
| ance/ (ino 266)
| waiting_dir/ (ino 262)
| pre/ (ino 264)
| ance/ (ino 265)

Send snapshot:
| a/ (ino 261)
| ance/ (ino 266)
| c (ino 267)
| waiting_dir/ (ino 262)
| pre/ (ino 264)
| d/ (ino 259)
| ance/ (ino 265)

First, 262 can't move to c/waiting_dir without the rename of inode 267.
Second, 264 can move into dir 262. Although 262 is waiting, 264 is not
the parent of 262 in the parent root.
(The second behavior will happen after applying "[PATCH] Btrfs:
incremental send, don't delay directory renames unnecessarily")
Finally, 265 will overwrite 266 and the path for 265 should be updated
since 266 is not the ancestor of 265.
Here we need to check the current state of the tree rather than the parent
root, which is what the is_ancestor() function checks.

Signed-off-by: Robbie Ko 
---

V2:when orphanized inode always get_cur_path again.

 fs/btrfs/send.c | 38 --
 1 file changed, 32 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 257753b..44ad144 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -230,7 +230,6 @@ struct pending_dir_move {
u64 parent_ino;
u64 ino;
u64 gen;
-   bool is_orphan;
struct list_head update_refs;
 };
 
@@ -1840,7 +1839,7 @@ static int will_overwrite_ref(struct send_ctx *sctx, u64 
dir, u64 dir_gen,
 * was already unlinked/moved, so we can safely assume that we will not
 * overwrite anything at this point in time.
 */
-   if (other_inode > sctx->send_progress) {
+   if (other_inode > sctx->send_progress || is_waiting_for_move(sctx, 
other_inode)) {
ret = get_inode_info(sctx->parent_root, other_inode, NULL,
who_gen, NULL, NULL, NULL, NULL);
if (ret < 0)
@@ -3014,7 +3013,6 @@ static int add_pending_dir_move(struct send_ctx *sctx,
pm->parent_ino = parent_ino;
pm->ino = ino;
pm->gen = ino_gen;
-   pm->is_orphan = is_orphan;
INIT_LIST_HEAD(&pm->list);
INIT_LIST_HEAD(&pm->update_refs);
RB_CLEAR_NODE(&pm->node);
@@ -3134,6 +3132,7 @@ static int apply_dir_move(struct send_ctx *sctx, struct 
pending_dir_move *pm)
u64 rmdir_ino = 0;
int ret;
u64 ancestor = 0;
+   bool is_orphan;
 
name = fs_path_alloc();
from_path = fs_path_alloc();
@@ -3145,9 +3144,10 @@ static int apply_dir_move(struct send_ctx *sctx, struct 
pending_dir_move *pm)
dm = get_waiting_dir_move(sctx, pm->ino);
ASSERT(dm);
rmdir_ino = dm->rmdir_ino;
+   is_orphan = dm->orphanized;
free_waiting_dir_move(sctx, dm);
 
-   if (pm->is_orphan) {
+   if (is_orphan) {
ret = gen_unique_name(sctx, pm->ino,
  pm->gen, from_path);
} else {
@@ -3171,7 +3171,7 @@ static int apply_dir_move(struct send_ctx *sctx, struct 
pending_dir_move *pm)
ASSERT(ancestor > BTRFS_FIRST_FREE_OBJECTID);
ret = add_pending_dir_move(sctx, pm->ino, pm->gen, ancestor,
   &pm->update_refs, &deleted_refs,
-  pm->is_orphan);
+  is_orphan);
if (ret < 0)
  

[PATCH v2 5/7] Btrfs: incremental send, fix rmdir but dir have a unprocess item

2015-06-22 Thread Robbie Ko
There's one case where we attempt to rmdir a directory prematurely.

Example:

Parent snapshot:
| a/ (ino 279)
| c (ino 282)
| del/ (ino 281)
| tmp/ (ino 280)
| long/ (ino 283)

Send snapshot:
| a/ (ino 279)
| long (ino 283)
| c/ (ino 282)
| tmp/ (ino 280)

While processing inode 281, since inode 280 is waiting for inode 282,
rmdir_ino of struct waiting_dir_move for inode 280 will be set to 281
and an orphan_dir_info will be created for inode 281 in can_rmdir().

Then, when processing inode 282, we will do the following steps.
First, move inode 282 from a/c to c
Second, move inode 280 from del/tmp to c/tmp
Third, try to remove inode 281

In the third step, we pass 283 (sctx->cur_ino + 1) as the send_progress to the
can_rmdir() function and that makes it return true when it shouldn't,
because the inode 283 wasn't processed yet and it's still a child of
the directory with inode number 281, which makes the receiver run into
an ENOTEMPTY error when attempting to remove the directory.
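
The off-by-one can be illustrated with a small userspace model. This is only
a sketch under stated assumptions, not the kernel's can_rmdir(): the function
name can_rmdir_model() and the array of child inode numbers are hypothetical,
and the only rule modelled is that a directory cannot be removed while a
child has an inode number above send_progress.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified model: a directory may be removed only if none of its
 * remaining children has an inode number above send_progress, that is,
 * none of them is still unprocessed.
 */
bool can_rmdir_model(const uint64_t *children, size_t n,
		     uint64_t send_progress)
{
	for (size_t i = 0; i < n; i++) {
		if (children[i] > send_progress)
			return false;
	}
	return true;
}
```

In the example, inode 283 is the only remaining child of 281 while inode 282
is being processed. With send_progress = 282 + 1 the model allows the rmdir
although 283 is still unprocessed; with send_progress = 282 it refuses.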

Signed-off-by: Robbie Ko 
---

V2:modify comment

 fs/btrfs/send.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index bc9efbe..cd22f7d 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -3213,7 +3213,7 @@ static int apply_dir_move(struct send_ctx *sctx, struct 
pending_dir_move *pm)
/* already deleted */
goto finish;
}
-   ret = can_rmdir(sctx, rmdir_ino, odi->gen, sctx->cur_ino + 1);
+   ret = can_rmdir(sctx, rmdir_ino, odi->gen, sctx->cur_ino);
if (ret < 0)
goto out;
if (!ret)
-- 
1.9.1



[PATCH v2 3/7] Btrfs: incremental send, avoid ancestor rename to descendant

2015-06-22 Thread Robbie Ko
There's one more case where we can't issue a rename operation for a directory
as soon as we process it. We move a directory from ancestor to descendant.

| a
| b
| c
 | d
"Move a directory from ancestor to descendant" means moving dir. a into dir. c

This case will happen after applying "[PATCH] Btrfs: incremental send,
don't delay directory renames unnecessarily", because that patch changes
the behavior of the wait_for_parent_move function.

Example:
Parent snapshot:
| @tmp/ (ino 257)
| pre/ (ino 260)
| wait_dir (ino 261)
| ance/ (ino 263)
| wait_at_below_ance/ (ino 259)
| desc/ (ino 262)
| other_dir/ (ino 264)

Send snapshot:
| @tmp/ (ino 257)
| other_dir/ (ino 264)
| wait_at_below_ance/ (ino 259)
| pre/ (ino 260)
| wait_dir/ (ino 261)
| desc/ (ino 262)
| ance/ (ino 263)

1. 259 must move to @tmp/other_dir, so it is waiting on other_dir(264).

2. 260 is able to rename as ance/wait_at_below_ance/pre since
wait_at_below_ance(259) is waiting and 260 is not the ancestor of 
wait_at_below_ance(259).

3. 261 must move to @tmp/other_dir, so it is waiting on other_dir(264).

4. 262 is able to rename as ance/wait_at_below_ance/pre/wait_dir/desc since
wait_dir(261) is waiting and 262 is not the ancestor of wait_dir(261).

5. 263 is renamed as ance/wait_at_below_ance/pre/wait_dir/desc/ance since
wait_dir(261) is waiting and 263 is not the ancestor of wait_dir(261).
  At the same time, the receiving side will encounter an error.
  If anyone calls get_cur_path() on any element in
ance/wait_at_below_ance/pre/wait_dir/desc/ance, like wait_dir(261),
  there will be a path building loop like this: 261 -> 260 -> 259 ->
263 -> 262 -> 261

So fix the problem by calling path_loop() to check for this case.
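
The kind of check meant here can be sketched in userspace as below. This is a
simplified model, not the path_loop() from the patch: the parent relation is
a flat table instead of being resolved through the send context, and the
names parent_link, find_parent(), path_has_loop() and ROOT_INO are
hypothetical (ROOT_INO is assumed to play the role of
BTRFS_FIRST_FREE_OBJECTID). The walk follows parents from the starting inode
and reports a loop if it arrives back at the start.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ROOT_INO 256ULL	/* assumption: stands in for BTRFS_FIRST_FREE_OBJECTID */

/* One edge of the (current) directory tree: ino lives under parent. */
struct parent_link {
	uint64_t ino;
	uint64_t parent;
};

/* Look up the parent of @ino; returns false if @ino is not in the table. */
static bool find_parent(const struct parent_link *links, size_t n,
			uint64_t ino, uint64_t *parent)
{
	for (size_t i = 0; i < n; i++) {
		if (links[i].ino == ino) {
			*parent = links[i].parent;
			return true;
		}
	}
	return false;
}

/*
 * Walk from @start towards the root. Return true if the walk comes back
 * to @start. The step bound stops the walk if it runs into a cycle that
 * does not contain @start.
 */
bool path_has_loop(const struct parent_link *links, size_t n, uint64_t start)
{
	uint64_t ino = start;

	for (size_t steps = 0; steps <= n && ino != ROOT_INO; steps++) {
		uint64_t parent;

		if (!find_parent(links, n, ino, &parent))
			return false;
		if (parent == start)
			return true;
		ino = parent;
	}
	return false;
}
```

Feeding it the chain from the example (261 under 260, 260 under 259, 259
under 263, 263 under 262, 262 under 261) reports a loop when starting at 261.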

Signed-off-by: Robbie Ko 
---

V2: Always check path_loop, and check Allocation ret value.

 fs/btrfs/send.c | 46 +++---
 1 file changed, 43 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 44ad144..b946067 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -3088,15 +3088,23 @@ static int path_loop(struct send_ctx *sctx, struct 
fs_path *name,
 
*ancestor_ino = 0;
while (ino != BTRFS_FIRST_FREE_OBJECTID) {
+   struct waiting_dir_move *wdm;
fs_path_reset(name);
 
if (is_waiting_for_rm(sctx, ino))
break;
-   if (is_waiting_for_move(sctx, ino)) {
+
+   wdm = get_waiting_dir_move(sctx, ino);
+   if (wdm) {
if (*ancestor_ino == 0)
*ancestor_ino = ino;
-   ret = get_first_ref(sctx->parent_root, ino,
-   &parent_inode, &parent_gen, name);
+   if (wdm->orphanized) {
+   ret = gen_unique_name(sctx, ino, gen, name);
+   break;
+   } else {
+   ret = get_first_ref(sctx->parent_root, ino,
+   
&parent_inode, &parent_gen, name);
+   }
} else {
ret = __get_cur_name_and_parent(sctx, ino, gen,
&parent_inode,
@@ -3743,6 +3751,38 @@ verbose_printk("btrfs: process_recorded_refs %llu\n", 
sctx->cur_ino);
}
 
/*
+* if cur_ino is cur ancestor, can't move now,
+* find descendant who is waiting, waiting it.
+*/
+   if(can_rename) {
+   struct fs_path *name = NULL;
+   u64 ancestor;
+   u64 old_send_progress = sctx->send_progress;
+
+   name = fs_path_alloc();
+   if (!valid_path) {
+   ret = -ENOMEM;
+   goto out;
+   }
+
+   sctx->send_progress = sctx->cur_ino + 1;
+   ret = path_loop(sctx, name, sctx->cur_ino, 
sctx->cur_inode_gen, &ancestor);
+   if (ret) {
+   ret = add_pending_dir_move(sctx, sctx->cur_ino, 
sctx->cur_inode_gen,
+   ancestor, 
&sctx->new_refs, &sctx->deleted_refs, is_orphan);
+   if (ret < 0) {
+   sctx->send_progress = old_send_progress;
+   fs_path_free(name);
+   goto out;
+   }
+   can_rename = false;
+   *pending_move = 1;
+   }
+   

[PATCH v2 0/7] Btrfs incremental send fix serval case for rename and rm directory

2015-06-22 Thread Robbie Ko
This series fixes several btrfs send/receive cases. The patches are based on
v4.1 plus the following patches:
[PATCH] Btrfs: incremental send, don't delay directory renames unnecessarily
[PATCH] Btrfs: incremental send, check if orphanized dir inode needs delayed rename

Thanks.

Robbie Ko (7):
  Revert "Btrfs: incremental send, remove dead code"
  Btrfs: incremental send, avoid circular waiting and descendant
overwrite ancestor need to update path
  Btrfs: incremental send, avoid ancestor rename to descendant
  Btrfs: incremental send, fix orphan_dir_info leak
  Btrfs: incremental send, fix rmdir but dir have a unprocess item
  Btrfs: incremental send, don't send utimes for non-existing directory
  Btrfs: incremental send, avoid the overhead of allocating an
orphan_dir_info object unnecessarily

 fs/btrfs/send.c | 167 +++-
 1 file changed, 153 insertions(+), 14 deletions(-)

-- 
1.9.1
