date:20180417



On 2018年03月27日 15:06, Lu Fengqi wrote:
> The testcase checks the functionality of "btrfs rescue undelete-subvol",
> including recovering an intact subvolume, and handling correctly
> incomplete subvolume.
> 
> Signed-off-by: Lu Fengqi 
> ---
> v2: add shell quoting to test script.
> 
>  .../030-undelete-subvol/deleted_subvolume.img  | Bin 0 -> 4096 bytes
>  .../030-undelete-subvol/drop_progress.raw.xz   | Bin 0 -> 23452 bytes

030 is already taken.

And just a nitpick, the file name is not obvious for its content.
In fact for "drop_progress", it have 2 orphan roots while only one can
be recovered.
For "deleted_subvolume" it has 1 orphan root and it can be recovered.

Despite that, it looks good to me.

Thanks,
Qu

>  tests/misc-tests/030-undelete-subvol/test.sh   |  34 
> +
>  3 files changed, 34 insertions(+)
>  create mode 100644 tests/misc-tests/030-undelete-subvol/deleted_subvolume.img
>  create mode 100644 tests/misc-tests/030-undelete-subvol/drop_progress.raw.xz
>  create mode 100755 tests/misc-tests/030-undelete-subvol/test.sh
> 
> diff --git a/tests/misc-tests/030-undelete-subvol/deleted_subvolume.img 
> b/tests/misc-tests/030-undelete-subvol/deleted_subvolume.img
> new file mode 100644
> index 
> ..9168a8e33fbbb14f9ea78514721a52191b94d1a0
> GIT binary patch
> literal 4096
> zcmeH|do+|=AIB%x8p^0djl0}JQb@$O%TY$;lEx)vlF+ymV;WO45{D?%IqnhB$S}Fi
> zjN6EihKkC4NQhyKaT(V$2k-mO={@KF_pD{Fy}r-)`R-@^*504B_p@W+PlCQF!fF8j
> zZGs!m9b0|F@NPJOvGJw?_&4=#{gw~i3;@(EFS(@+TiUf5-~Erz<=ODYja#|MmTnGi
> zw`~I31pbc*g!Km3O1HeZD~KRV`{Pj~uzt3r2|UgTH}a
> zWDkSP-%;<0K6oq%LMk0=5)g}2I{;`k(RwtnGuI_hB01WRSAongy!#b`_p*d$38#+;
> zcYI;9gzxul-@Q0?=XZCRp1a_O(t&`~#!SFI<>VfKQp$%5D%q@qRO^IYiZQ11
> z*LJ*O@akHeln2DGHANbna7)texSTKS8w%9X2EBg2OP_*;ZbHjccM=V#lO%}
> zATc9#siZ*izH<@v{m4ksm!$X!TLEok)EiGD1DIiBhF{t3@g30ydV%CQ75thpDR_Sz
> zkNpn7>SYJJu-@rDqHlw-Wb-Wdq8i)|MhOAQWLvnU+THRaIvD@>=47X$XKjKl#3c>c
> z+gVL~u-1qB(5Zq5RlSf2!q|mGkUWzvayXPkcZ+}`@>>iAb(MsWDCeg9_g?z@
> z+UzN}@~(J8&7SGm+x+QKwvX*meP^nxg%)4Y+tC!c%#Li1uw)!=uIZLYL%h({>6
> zI?6W8VFdIKTctfwulDqceyAL)1|1Zhpij6oC^|RGxbla6`S|q7kxq3u8hQMPc;I2Y
> zCWZ`c;*ciOjl@3Y^DqdF9BWsR_*&1bIYVLtMwmWM2N+tNJ-7j(O%B-L35XizYGSd1
> zB~$y~k-c18T=w!oFaLH~OTx)ncn*m3#tpvpB({MHrvM%M=n2J!-i}LGsv{?
> z<~5bwAY@lJ)i%#ao zFaVHTbX7=>Op^!R$6CJL^-x?kT(roP@)*Q8XnwB2j3pd-#ynd#jp7WCx#4WSfc|{y
> z#5H*01m+|Z94ps6)UYtvKCre(ZF$dTAxE?$+j2>nxqYpZ_Rkl)A)IQAwvH~^(SzZi
> z0p9u3gpBilS%_h>xu4tFT+N#=O{4QmFL|iRrJJjnU7WSMuKUyHtOvks7;`zGw0BVH
> zJQ}jPdQ!xy(s!Us?OIYAl7nvL$7V~c{(G8ho_TID+~?_UeBFyGcNvvctBnsrrf
> zR>%@OeHtPSWh}(@*~6Mo%wR1Xc{kY<0UPWj-p+;|;pI=V<}0+sd36pSJ9HL1 zedyJJH`V;qs}eahO35L0A>GOmQHYWmh`5IF7C{XLOE6~i$Hfktz9xZp$KS@J7FBer
> zU6XsQh2^|0*VWg^M;!+bUUhGGt|)c?CClbV{)#jL~#x7@Hgi$!??u)-rbQ^;oY
> zY!$-MsH;KQ%fzKtwafY8t$(=9mvBoO{a>gHD^*bc*g7T=1Gzb*{j2yz>ok&
> z7M6gGH4Q*~983Vsv#8?bD~SiDobt+CdWEqjraBk8ESoH7vV6%oI3Ejb*$
> zuNVEyMF_j+Q>(9d(j#nI_W+HW?IqPn%^pKydSPdh>>Y3s5VIg;-RjrS*7CMI zy@k<|nP3VL@ndp87{u~HVW_{Y*$se=Z}$CDIdvvKrhnca(1M-26(9HuWE?Ohji
> zuBpmHt(QTgQ-Yq{m73~S!#xX-w!hB|`k?xhx6>;m!B6zHz=2>h)6=4D_a-K3kJ6t;
> zWMpX8X0U`fZB*msQ+^4vT8hol-6nS`h35;Ds!prmWNZ?5xhs3K9QX}xIY9op0XE<2
> zd-c9BbluK;C2_kTap|WlT=|N;-RvRtrenZQU
> zPiVGB&=zwO#labtRhCASg?G_h$wLNDK8)U6ui@n~0zl=OVap45dP71-m zMM_GKki$jYRwe^&@*Ja~QGj{(V%y{8|cwbYtaLwY@PFu7*0-Gi2}@`5%J{
> zXI9Rmc-)7felm=~CiMi=s78?Ut6J(^D8gq>(j?er4a{gMzE>X8BhF^HR&+yN
> z``L#iGzIqve?b}mDKJ3R?!Sj~hEkClMj*aQPHs-je`Hw)G=a^_UGlEFuf?-JGn
> zrt3Oz*O1f$_uqL2)~`vD^hS#j3mRtrVnrHk6fM)Az8Q5M9FVc;SjPt^dWinLHEw
> z(Yd%niJ8}8Uk@IFxs`OJKq)!4{LOe!xf}C
> zCCb1~=m9Lg$9ZHABR*mWv80B!dG3fdJeMV!XUBJIiz)c*3_^%D@4@C+zdY>o!34x-
> zf=~s)a%m<7<=N$mEk3g{K0aS)5Q~o%4 zYQBQoFFhvh0X-TkNY8+hCRNwn{k z20Fx*h7a@QQ97Te92F>(9R>QL7W2uspH*KopFWYyiKEB0WdA*o`T(uUGBdW_*E
> zzTw9jeaVG?gtI%=*AAzOt@~-PogS@pC ZeM(;pZJdD++rKZroo*A@Ch+ek@Fx_v;e7xA
> 
> literal 0
> HcmV?d1
> 
> diff --git a/tests/misc-tests/030-undelete-subvol/drop_progress.raw.xz 
> b/tests/misc-tests/030-undelete-subvol/drop_progress.raw.xz
> new file mode 100644
> index 
> ..3db2c80e61fa9bb99db374509dbb95910965d467
> GIT binary patch
> literal 23452
> zcmeHPXH=63mktC7z4zV}r8hyO7ZF71J@gJj=p6y+(xijbKnNWvN|D}+^dbU^sPx_m
> zAee9V?4CV)c4qg}`DSLFv-y{QZ{GXfd!PF}w_J((=^Fw77^^d{6wv`JX!HO8fOt%Q
> z7>Rs>(P<0-IA0-=um{T^q#LWS2+;ZNGGoXTU+*0WG?kjb!RY9Rd@zE!
> zprWN%HWIB*2GqTt(OQjR79XdXGC|eojjqRj@Ntje7V=+!*iK3
> ze=gkgvO#(Wb3wy1yH#b!yCt$!YIsJE^$1(}*0xh|G%dc{X8vwN4zHQ}bVmsR zo1FQ@#en!Cnsw{>RrWRKCJQwJZ4r;|fIzaNqWP$xtDSB^E`4R#+kNfF z-Pu^-_90d6FamaI%>#ZlImmLpgm-*(wek`j)@7COOkP^)r&;tzG%wF$uJ>jkzy^dj
> z*q1CnCUcRGf<3#$dF~&$F~=`u6%?zfjrjSgZ#=WTN%JruO@f4|;_Itq@hv``
>

Re: [PATCH v2 08/10] btrfs-progs: undelete-subvol: add undelete-subvol subcommand



On 2018年03月27日 15:06, Lu Fengqi wrote:
> Add the undelete-subvol subcommand for btrfs rescue. This subcommand is
> used to recover deleted subvolume left intact on the device.
> 
> Signed-off-by: Lu Fengqi 
> ---
> v2: add -s option to specify subvol_id.
> 
>  cmds-rescue.c | 70 
> +++
>  1 file changed, 70 insertions(+)
> 
> diff --git a/cmds-rescue.c b/cmds-rescue.c
> index c40088ad374e..c5132126e922 100644
> --- a/cmds-rescue.c
> +++ b/cmds-rescue.c
> @@ -25,6 +25,7 @@
>  #include "disk-io.h"
>  #include "commands.h"
>  #include "utils.h"
> +#include "undelete-subvol.h"
>  #include "help.h"
>  
>  static const char * const rescue_cmd_group_usage[] = {
> @@ -248,6 +249,73 @@ out:
>   return !!ret;
>  }
>  
> +static const char * const cmd_rescue_undelete_subvol_usage[] = {
> + "btrfs rescue undelete-subvol [-s ] ",
> + "Undelete deleted subvolume",
> + "All deleted subvolume that still left intact on the device will be",
> + "recovered. If -s  option is given, then just recover the",
> + "subvolume which specified by .",
> + "",
> + "-s   specify the subvolume which will be recovered.",
> + NULL
> +};
> +
> +static int cmd_rescue_undelete_subvol(int argc, char **argv)
> +{
> + struct btrfs_fs_info *fs_info;
> + char *devname;
> + u64 subvol_id = 0;
> + int ret;
> +
> + while (1) {
> + int c = getopt(argc, argv, "s:");
> +
> + if (c < 0)
> + break;
> + switch (c) {
> + case 's':
> + subvol_id = arg_strtou64(optarg);
> + if (!is_fstree(subvol_id)) {
> + error("%llu is not a valid subvolume id",
> + subvol_id);
> + ret = -EINVAL;
> + goto out;
> + }
> + break;
> + default:
> + usage(cmd_rescue_undelete_subvol_usage);
> + }
> + }
> +
> + if (check_argc_exact(argc - optind, 1))
> + usage(cmd_rescue_undelete_subvol_usage);
> +
> + devname = argv[optind];
> + ret = check_mounted(devname);
> + if (ret < 0) {
> + error("could not check mount status: %s", strerror(-ret));
> + goto out;
> + } else if (ret) {
> + error("%s is currently mounted", devname);
> + ret = -EBUSY;
> + goto out;
> + }
> +
> + fs_info = open_ctree_fs_info(devname, 0, 0, 0, OPEN_CTREE_WRITES |
> +  OPEN_CTREE_PARTIAL);

I'm not sure if using OPEN_CTREE_PARTIAL here is a good idea.

As the undelete-subvol looks like a tool to revert stupid user mistake,
and we expect a healthy fs to be provided.
And for corrupted fs, we may make the case worse.

Thanks,
Qu

> + if (!fs_info) {
> + error("could not open btrfs");
> + ret = -EIO;
> + goto out;
> + }
> +
> + ret = btrfs_undelete_intact_subvols(fs_info->tree_root, subvol_id);
> +
> + close_ctree(fs_info->tree_root);
> +out:
> + return ret;
> +}
> +
>  static const char rescue_cmd_group_info[] =
>  "toolbox for specific rescue operations";
>  
> @@ -260,6 +328,8 @@ const struct cmd_group rescue_cmd_group = {
>   { "zero-log", cmd_rescue_zero_log, cmd_rescue_zero_log_usage, 
> NULL, 0},
>   { "fix-device-size", cmd_rescue_fix_device_size,
>   cmd_rescue_fix_device_size_usage, NULL, 0},
> + { "undelete-subvol", cmd_rescue_undelete_subvol,
> + cmd_rescue_undelete_subvol_usage, NULL, 0},
>   NULL_CMD_STRUCT
>   }
>  };
> 



signature.asc
Description: OpenPGP digital signature

Re: [PATCH v2 07/10] btrfs-progs: undelete-subvol: introduce btrfs_undelete_intact_subvols



On 2018年03月27日 15:06, Lu Fengqi wrote:
> The function default will traverse the all orphan items on the tree root,
> and recover the all intact subvolumes. If subvol_id is specified, then only
> the corresponding subvolume will be recovered.
> 
> Signed-off-by: Lu Fengqi 
> ---
> v2: add subvol_id argumenta to specify subvol_id instead of recovering
> all subvolumes.
> 
>  undelete-subvol.c | 70 
> +++
>  undelete-subvol.h |  2 ++
>  2 files changed, 72 insertions(+)
> 
> diff --git a/undelete-subvol.c b/undelete-subvol.c
> index 9243e35545c5..5b494ca086ab 100644
> --- a/undelete-subvol.c
> +++ b/undelete-subvol.c
> @@ -15,6 +15,7 @@
>  #include "transaction.h"
>  #include "disk-io.h"
>  #include "messages.h"
> +#include "undelete-subvol.h"
>  
>  /*
>   * Determines whether the subvolume is intact, according to the drop_progress
> @@ -182,3 +183,72 @@ static int link_subvol_to_lostfound(struct btrfs_root 
> *root, u64 subvol_id)
>  out:
>   return ret;
>  }
> +
> +/*
> + * Traverse all orphan items on the root tree, restore them to the lost+found
> + * directory if the corresponding subvolumes are still intact left on the 
> disk.
> + *
> + * @root the root of the root tree.

Same comment here.

> + * @subvol_idif not set to 0, skip other subvolumes and only recover 
> the
> + *   subvolume specified by @subvol_id.
> + *
> + * Return 0 if no error occurred even if no subvolume was recovered.
> + */
> +int btrfs_undelete_intact_subvols(struct btrfs_root *root, u64 subvol_id)

I prefer to remove the word "_intact".

> +{
> + struct btrfs_key key;
> + struct btrfs_path path;
> + u64 found_count = 0;
> + u64 recovered_count = 0;
> + int ret = 0;
> +
> + key.objectid = BTRFS_ORPHAN_OBJECTID;
> + key.type = BTRFS_ORPHAN_ITEM_KEY;
> + key.offset = subvol_id ? subvol_id + 1 : (u64)-1;
> +
> + btrfs_init_path();

I would prefer to do btrfs_search_slot() here, and then use
btrfs_previous_item() to iterate, other than calling btrfs_search_slot()
several times.

> + while (subvol_id != key.offset) {
> + ret = btrfs_search_slot(NULL, root, , , 0, 0);
> + if (ret < 0) {
> + error("search ORPHAN_ITEM for %llu failed.\n",
> +   key.offset);
> + break;
> + }
> +
> + path.slots[0]--;

Btrfs_previous_item() is much better here.

Especially when above btrfs_search_slot() could return 0 (key found) and
path.slots[0] could be 0.
That's also the reason why I prefer btrfs_search_slot() then
btrfs_previous_item() in the loop.

Thanks,
Qu

> + btrfs_item_key_to_cpu(path.nodes[0], , path.slots[0]);
> +
> + btrfs_release_path();
> +
> + /* No more BTRFS_ORPHAN_ITEM, so we don't need to continue. */
> + if (key.type != BTRFS_ORPHAN_ITEM_KEY) {
> + ret = 0;
> + break;
> + }
> +
> + /* If subvol_id is non-zero, skip other deleted subvolume. */
> + if (subvol_id && subvol_id != key.offset) {
> + ret = -ENOENT;
> + break;
> + }
> +
> + if (!is_subvol_intact(root, key.offset))
> + continue;
> +
> + /* Here we can confirm there is an intact subvolume. */
> + found_count++;
> + ret = link_subvol_to_lostfound(root, key.offset);
> + if (ret == 0) {
> + recovered_count++;
> + printf(
> + "Recovered subvolume %llu to lost+found successfully.\n",
> + key.offset);
> + }
> +
> + }
> +
> + printf("Found %llu subvols left intact\n", found_count);
> + printf("Recovered %llu subvols\n", found_count);
> +
> + return ret;
> +}
> diff --git a/undelete-subvol.h b/undelete-subvol.h
> index 7cfd100cce37..f773210c46fe 100644
> --- a/undelete-subvol.h
> +++ b/undelete-subvol.h
> @@ -14,4 +14,6 @@
>  #ifndef __BTRFS_UNDELETE_SUBVOLUME_H__
>  #define __BTRFS_UNDELETE_SUBVOLUME_H__
>  
> +int btrfs_undelete_intact_subvols(struct btrfs_root *root, u64 subvol_id);
> +
>  #endif
> 



signature.asc
Description: OpenPGP digital signature

Re: [PATCH v2 06/10] btrfs-progs: undelete-subvol: introduce link_subvol_to_lostfound



On 2018年03月27日 15:06, Lu Fengqi wrote:
> The function will create lost+found directory, link the deleted
> subvolume specified by the subvol_id to the directory, update the
> information of root_item and cleanup the associated orphan item.
> 
> Signed-off-by: Lu Fengqi 
> ---
>  undelete-subvol.c | 76 
> +++
>  1 file changed, 76 insertions(+)
> 
> diff --git a/undelete-subvol.c b/undelete-subvol.c
> index 781057df2b84..9243e35545c5 100644
> --- a/undelete-subvol.c
> +++ b/undelete-subvol.c
> @@ -106,3 +106,79 @@ out:
>   btrfs_release_path();
>   return ret;
>  }
> +
> +/*
> + * Recover a subvolume specified by subvol_id, and link it to the lost+found
> + * directory.
> + *
> + * @root: the root of the root tree.

In that case, @fs_info would case less confusion.

Otherwise looks good.

Reviewed-by: Qu Wenruo 

Thanks,
Qu

> + * @subvol_id: specify the subvolume which will be linked, and also be the 
> part
> + * of the subvolume name.
> + *
> + * Return 0 if no error occurred.
> + */
> +static int link_subvol_to_lostfound(struct btrfs_root *root, u64 subvol_id)
> +{
> + struct btrfs_trans_handle *trans;
> + struct btrfs_fs_info *fs_info = root->fs_info;
> + struct btrfs_root *fs_root = fs_info->fs_root;
> + char buf[BTRFS_NAME_LEN + 1] = {0}; /* 1 for snprintf null */
> + char *dir_name = "lost+found";
> + u64 lost_found_ino = 0;
> + u32 mode = 0700;
> + int ret;
> +
> + /*
> +  * For link subvolume to lost+found,
> +  * 2 for parent(256)'s dir_index and dir_item
> +  * 2 for lost+found dir's inode_item and inode_ref
> +  * 2 for lost+found dir's dir_index and dir_item for the subvolume
> +  * 2 for the subvolume's root_ref and root_backref
> +  */
> + trans = btrfs_start_transaction(fs_root, 8);
> + if (IS_ERR(trans)) {
> + error("unable to start transaction");
> + ret = PTR_ERR(trans);
> + goto out;
> + }
> +
> + /* Create lost+found directory */
> + ret = btrfs_mkdir(trans, fs_root, dir_name, strlen(dir_name),
> +   BTRFS_FIRST_FREE_OBJECTID, _found_ino,
> +   mode);
> + if (ret < 0) {
> + error("failed to create '%s' dir: %d", dir_name, ret);
> + goto out;
> + }
> +
> + /* Link the subvolume to lost+found directory */
> + snprintf(buf, BTRFS_NAME_LEN + 1, "sub%llu", subvol_id);
> + ret = btrfs_link_subvol(trans, fs_root, buf, subvol_id, lost_found_ino,
> + false);
> + if (ret) {
> + error("failed to link the subvol %llu: %d", subvol_id, ret);
> + goto out;
> + }
> +
> + /* Clear root flags BTRFS_ROOT_SUBVOL_DEAD and increase root refs */
> + ret = recover_dead_root(trans, root, subvol_id);
> + if (ret)
> + goto out;
> +
> + /* Delete the orphan item after undeletion is completed. */
> + ret = btrfs_del_orphan_item(trans, root, subvol_id);
> + if (ret) {
> + error("failed to delete the orphan_item for %llu: %d",
> + subvol_id, ret);
> + goto out;
> + }
> +
> + ret = btrfs_commit_transaction(trans, fs_root);
> + if (ret) {
> + error("transaction commit failed: %d", ret);
> + goto out;
> + }
> +
> +out:
> + return ret;
> +}
> 



signature.asc
Description: OpenPGP digital signature

Re: [PATCH v2 05/10] btrfs-progs: undelete-subvol: introduce recover_dead_root



On 2018年03月27日 15:06, Lu Fengqi wrote:
> The function will find the root_item specified by the subvol_id,
> clear the BTRFS_ROOT_SUBVOL_DEAD flag and set root_refs to one.

Sorry I didn't point it out in btrfs_link_subvol() patch, but at least
to me, refs should only be modified by btrfs_link_subvol(), just like
btrfs_unlink().

Thanks,
Qu

> 
> Signed-off-by: Lu Fengqi 
> ---
>  ctree.h   |  1 +
>  undelete-subvol.c | 55 
> +++
>  2 files changed, 56 insertions(+)
> 
> diff --git a/ctree.h b/ctree.h
> index 4bff0b821472..7c0b8150bc4e 100644
> --- a/ctree.h
> +++ b/ctree.h
> @@ -184,6 +184,7 @@ static int btrfs_csum_sizes[] = { 4 };
>  #define BTRFS_FT_MAX 9
>  
>  #define BTRFS_ROOT_SUBVOL_RDONLY (1ULL << 0)
> +#define BTRFS_ROOT_SUBVOL_DEAD   (1ULL << 48)
>  
>  /*
>   * the key defines the order in the tree, and so it also defines (optimal)
> diff --git a/undelete-subvol.c b/undelete-subvol.c
> index 00fcc4895778..781057df2b84 100644
> --- a/undelete-subvol.c
> +++ b/undelete-subvol.c
> @@ -12,6 +12,9 @@
>   */
>  
>  #include "ctree.h"
> +#include "transaction.h"
> +#include "disk-io.h"
> +#include "messages.h"
>  
>  /*
>   * Determines whether the subvolume is intact, according to the drop_progress
> @@ -51,3 +54,55 @@ out:
>   btrfs_release_path();
>   return ret;
>  }
> +
> +/*
> + * Clear BTRFS_ROOT_SUBVOL_DEAD flag and set the root_refs to one.
> + *
> + * @root the root of root tree.
> + * @subvol_idspecify the root_item which will be modified.
> + *
> + * Return 0 if no error occurred.
> + */
> +static int recover_dead_root(struct btrfs_trans_handle *trans,
> +  struct btrfs_root *root, u64 subvol_id)
> +{
> + struct btrfs_key key;
> + struct btrfs_path path;
> + struct extent_buffer *leaf;
> + struct btrfs_root_item root_item;
> + u64 root_flags;
> + u64 offset;
> + int ret;
> +
> + key.objectid = subvol_id;
> + key.type = BTRFS_ROOT_ITEM_KEY;
> + key.offset = 0;
> +
> + btrfs_init_path();
> + ret = btrfs_search_slot(trans, root, , , 0, 0);
> + if (ret) {
> + error("couldn't find ROOT_ITEM for %llu failed: %d",
> + subvol_id, ret);
> + goto out;
> + }
> +
> + leaf = path.nodes[0];
> +
> + offset = btrfs_item_ptr_offset(leaf, path.slots[0]);
> + read_extent_buffer(leaf, _item, offset, sizeof(root_item));
> +
> + /* Clear BTRFS_ROOT_SUBVOL_DEAD */
> + root_flags = btrfs_root_flags(_item);
> + btrfs_set_root_flags(_item,
> +  root_flags & ~BTRFS_ROOT_SUBVOL_DEAD);
> +
> + /* Increase the refs */
> + btrfs_set_root_refs(_item, 1);
> +
> + write_extent_buffer(leaf, _item, offset, sizeof(root_item));
> + btrfs_mark_buffer_dirty(leaf);
> +
> +out:
> + btrfs_release_path();
> + return ret;
> +}
> 



signature.asc
Description: OpenPGP digital signature

Re: [PATCH v2 04/10] btrfs-progs: undelete-subvol: introduce is_subvol_intact



On 2018年03月27日 15:06, Lu Fengqi wrote:
> The function is used to determine whether the subvolume is intact.
> 
> Signed-off-by: Lu Fengqi 
> ---
>  Makefile  |  3 ++-
>  undelete-subvol.c | 53 +
>  undelete-subvol.h | 17 +
>  3 files changed, 72 insertions(+), 1 deletion(-)
>  create mode 100644 undelete-subvol.c
>  create mode 100644 undelete-subvol.h
> 
> diff --git a/Makefile b/Makefile
> index 6065522a615f..cb38984c7386 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -116,7 +116,8 @@ objects = ctree.o disk-io.o kernel-lib/radix-tree.o 
> extent-tree.o print-tree.o \
> qgroup.o free-space-cache.o kernel-lib/list_sort.o props.o \
> kernel-shared/ulist.o qgroup-verify.o backref.o string-table.o 
> task-utils.o \
> inode.o file.o find-root.o free-space-tree.o help.o send-dump.o \
> -   fsfeatures.o kernel-lib/tables.o kernel-lib/raid56.o transaction.o
> +   fsfeatures.o kernel-lib/tables.o kernel-lib/raid56.o transaction.o \
> +   undelete-subvol.o
>  cmds_objects = cmds-subvolume.o cmds-filesystem.o cmds-device.o cmds-scrub.o 
> \
>  cmds-inspect.o cmds-balance.o cmds-send.o cmds-receive.o \
>  cmds-quota.o cmds-qgroup.o cmds-replace.o check/main.o \
> diff --git a/undelete-subvol.c b/undelete-subvol.c
> new file mode 100644
> index ..00fcc4895778
> --- /dev/null
> +++ b/undelete-subvol.c
> @@ -0,0 +1,53 @@
> +/*
> + * Copyright (C) 2018 Fujitsu.  All rights reserved.

IIRC David will remove all such copy right line.
Is there some principle about this, David?

> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public
> + * License v2 as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * General Public License for more details.
> + */
> +
> +#include "ctree.h"
> +
> +/*
> + * Determines whether the subvolume is intact, according to the drop_progress
> + * of the root item specified by subvol_id.
> + *
> + * Return true if the subvolume is intact, otherwise return false.
> + */
> +static bool is_subvol_intact(struct btrfs_root *root, u64 subvol_id)

Here we have 2 parameters which can represent a root, so please explain
the meaning of each parameter.

And after looking into the code, @root should be tree root, and in that
case, I prefer passing @fs_info here to get rid of any confusion.

> +{
> + struct btrfs_path path;
> + struct btrfs_key key;
> + struct extent_buffer *leaf;
> + struct btrfs_root_item root_item;
> + u64 offset;
> + int ret;
> +
> + key.objectid = subvol_id;
> + key.type = BTRFS_ROOT_ITEM_KEY;
> + key.offset = 0;
> +
> + btrfs_init_path();
> + ret = btrfs_search_slot(NULL, root, , , 0, 0);

Instead of setting up keys and search manually, what about using
btrfs_read_fs_root()?

> + if (ret) {
> + ret = false;
> + goto out;
> + }
> +
> + leaf = path.nodes[0];
> + offset = btrfs_item_ptr_offset(leaf, path.slots[0]);
> +
> + read_extent_buffer(leaf, _item, offset, sizeof(root_item));
> +
> + /* the subvolume is intact if the objectid of drop_progress == 0 */
> + ret = btrfs_disk_key_objectid(_item.drop_progress) ? false : true;
> +
> +out:
> + btrfs_release_path();
> + return ret;
> +}
> diff --git a/undelete-subvol.h b/undelete-subvol.h
> new file mode 100644
> index ..7cfd100cce37
> --- /dev/null
> +++ b/undelete-subvol.h
> @@ -0,0 +1,17 @@
> +/*
> + * Copyright (C) 2018 Fujitsu.  All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public
> + * License v2 as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * General Public License for more details.
> + */
> +
> +#ifndef __BTRFS_UNDELETE_SUBVOLUME_H__
I think the empty header is not really needed here.
What about introducing it when we have something here?

Thanks,
Qu

> +#define __BTRFS_UNDELETE_SUBVOLUME_H__
> +
> +#endif
> 



signature.asc
Description: OpenPGP digital signature

Re: [PATCH v2 03/10] btrfs-progs: use btrfs_find_free_dir_index to find free inode index



On 2018年03月27日 15:06, Lu Fengqi wrote:
> Since we have an existing function to find free inode index, we can
> reuse it here.
> 
> Signed-off-by: Lu Fengqi 

Looks good.

Reviewed-by: Qu Wenruo 

Thanks,
Qu

> ---
>  inode.c | 27 +++
>  1 file changed, 7 insertions(+), 20 deletions(-)
> 
> diff --git a/inode.c b/inode.c
> index 478036562652..86905365dfd8 100644
> --- a/inode.c
> +++ b/inode.c
> @@ -628,7 +628,7 @@ int btrfs_link_subvol(struct btrfs_trans_handle *trans, 
> struct btrfs_root *root,
>   struct btrfs_inode_item *inode_item;
>   struct extent_buffer *leaf;
>   struct btrfs_key key;
> - u64 index = 2;
> + u64 index;
>   char buf[BTRFS_NAME_LEN + 1]; /* for snprintf null */
>   int len;
>   int i;
> @@ -638,28 +638,15 @@ int btrfs_link_subvol(struct btrfs_trans_handle *trans, 
> struct btrfs_root *root,
>   if (len == 0 || len > BTRFS_NAME_LEN)
>   return -EINVAL;
>  
> - /* find the free dir_index */
> - btrfs_init_path();
> - key.objectid = dirid;
> - key.type = BTRFS_DIR_INDEX_KEY;
> - key.offset = (u64)-1;
> -
> - ret = btrfs_search_slot(NULL, root, , , 0, 0);
> - if (ret <= 0) {
> - error("search for DIR_INDEX dirid %llu failed: %d",
> - (unsigned long long)dirid, ret);
> + /* add the dir_item/dir_index */
> + ret = btrfs_find_free_dir_index(root, dirid, );
> + if (ret < 0) {
> + error("unable to find free dir index dirid %llu failed: %d",
> +   (unsigned long long)dirid, ret);
>   goto fail;
>   }
>  
> - if (path.slots[0] > 0) {
> - path.slots[0]--;
> - btrfs_item_key_to_cpu(path.nodes[0], , path.slots[0]);
> - if (key.objectid == dirid && key.type == BTRFS_DIR_INDEX_KEY)
> - index = key.offset + 1;
> - }
> - btrfs_release_path();
> -
> - /* add the dir_item/dir_index */
> + btrfs_init_path();
>   key.objectid = dirid;
>   key.offset = 0;
>   key.type =  BTRFS_INODE_ITEM_KEY;
> 



signature.asc
Description: OpenPGP digital signature

Re: [PATCH v2 02/10] btrfs-progs: extract btrfs_link_subvol from btrfs_mksubvol



On 2018年03月27日 15:06, Lu Fengqi wrote:
> The original btrfs_mksubvol is too specific to specify the directory that
> the subvolume will link to. Furthermore, in this transaction, we don't only
> need to create root_ref/dir-item, but also update the refs or flags of
> root_item. Extract a generic btrfs_link_subvol that allow the caller pass a
> trans argument for later subvolume undelete.

The idea makes sense.

Although some small nitpicks inlined below.

> 
> No functional changes for the btrfs_mksubvol.
> 
> Signed-off-by: Lu Fengqi 
> ---
>  convert/main.c | 57 +
>  ctree.h|  5 +++--
>  inode.c| 46 +-
>  3 files changed, 81 insertions(+), 27 deletions(-)
> 
> diff --git a/convert/main.c b/convert/main.c
> index 6bdfab40d0b0..dd6b42a1f2b1 100644
> --- a/convert/main.c
> +++ b/convert/main.c
> @@ -1002,6 +1002,63 @@ err:
>   return ret;
>  }
>  
> +/*
> + * Link the subvolume specified by @root_objectid to the root_dir of @root.

However the function name is btrfs_mksubvol(), not btrfs_link_subvol().
After reading the code, it turns out that btrfs_mksubvol() is just an
less generic btrfs_link_subvol().

> + * @root the root of the file tree which the subvolume will
> + *   be linked to.

Here you're using btrfs_root for source subvolume.

> + * @subvol_name  the name of the subvolume which will be linked.
> + * @root_objectidspecify the subvolume which will be linked.

But here you're specifying subvolume id as destination.

It would be much better to unify both parameter to the same structure.
And personally I prefer both btrfs_root.

> + * @convert  the flag to determine whether to try to resolve
> + *   the name conflict.

This parameter is not used in this function, and the name "convert"
doesn't reflect the name conflict check.

Personally speaking, I would like the parameters to be something like
btrfs_mksubolv(struct btrfs_root *dst_root,
   struct btrfs_root *src_root)

> + *
> + * Return the root of the subvolume if success, otherwise return NULL.
> + */
> +static struct btrfs_root *btrfs_mksubvol(struct btrfs_root *root,
> +  const char *subvol_name,
> +  u64 root_objectid,
> +  bool convert)
> +{
> + struct btrfs_trans_handle *trans;
> + struct btrfs_root *subvol_root = NULL;
> + struct btrfs_key key;
> + u64 dirid = btrfs_root_dirid(>root_item);
> + int ret;
> +
> +
> + trans = btrfs_start_transaction(root, 1);
> + if (IS_ERR(trans)) {
> + error("unable to start transaction");
> + goto fail;
> + }
> +
> + ret = btrfs_link_subvol(trans, root, subvol_name, root_objectid, dirid,
> + true);
> + if (ret) {
> + error("unable to link subvolume %s", subvol_name);
> + goto fail;
> + }
> +
> + ret = btrfs_commit_transaction(trans, root);
> + if (ret) {
> + error("transaction commit failed: %d", ret);
> + goto fail;
> + }
> +
> + key.objectid = root_objectid;
> + key.offset = (u64)-1;
> + key.type = BTRFS_ROOT_ITEM_KEY;
> +
> + subvol_root = btrfs_read_fs_root(root->fs_info, );
> + if (!subvol_root) {
> + error("unable to link subvolume %s", subvol_name);
> + goto fail;
> + }
> +
> +fail:
> + return subvol_root;
> +}
> +
>  /*
>   * Migrate super block to its default position and zero 0 ~ 16k
>   */
> diff --git a/ctree.h b/ctree.h
> index 0fc31dd8d998..4bff0b821472 100644
> --- a/ctree.h
> +++ b/ctree.h
> @@ -2797,8 +2797,9 @@ int btrfs_del_orphan_item(struct btrfs_trans_handle 
> *trans,
> struct btrfs_root *root, u64 offset);
>  int btrfs_mkdir(struct btrfs_trans_handle *trans, struct btrfs_root *root,
>   char *name, int namelen, u64 parent_ino, u64 *ino, int mode);
> -struct btrfs_root *btrfs_mksubvol(struct btrfs_root *root, const char *base,
> -   u64 root_objectid, bool convert);
> +int btrfs_link_subvol(struct btrfs_trans_handle *trans, struct btrfs_root 
> *root,
> +   const char *base, u64 root_objectid, u64 dirid,
> +   bool convert);
>  
>  /* file.c */
>  int btrfs_get_extent(struct btrfs_trans_handle *trans,
> diff --git a/inode.c b/inode.c
> index 8d0812c7cf50..478036562652 100644
> --- a/inode.c
> +++ b/inode.c
> @@ -606,19 +606,28 @@ out:
>   return ret;
>  }
>  
> -struct btrfs_root *btrfs_mksubvol(struct btrfs_root *root,
> -   const char *base, u64 root_objectid,
> -   bool convert)
> +/*
> + * Link the subvolume specified by @root_objectid to the directory specified 
> by
> + * @dirid on the

Re: [PATCH v2 00/10] undelete subvolume offline version

2018-04-17 Thread Lu Fengqi

On Tue, Mar 27, 2018 at 03:06:48PM +0800, Lu Fengqi wrote:
>This patchset will add undelete-subvol subcommand for btrfs rescue. This
>enhancement allows undeleting a subvolume on an unmounted filesystem.

Gentle ping.

Any advice about this patchset will be welcomed.

-- 
Thanks,
Lu

>
>Patchset can be fetched from github:
>https://github.com/littleroad/btrfs-progs.git undelete
>
>v1->v2: add -s option to allow user specify the subvolume which will be
>recovered.
>
>The first six patches are not modified.
>7th-8th add the -s option to specify subvol_id instead of recovering all
>subvolumes.
>9th add shell quoting to test script.
>10th just add the -s option documentation.
>
>Lu Fengqi (10):
>  btrfs-progs: copy btrfs_del_orphan_item from kernel
>  btrfs-progs: extract btrfs_link_subvol from btrfs_mksubvol
>  btrfs-progs: use btrfs_find_free_dir_index to find free inode index
>  btrfs-progs: undelete-subvol: introduce is_subvol_intact
>  btrfs-progs: undelete-subvol: introduce recover_dead_root
>  btrfs-progs: undelete-subvol: introduce link_subvol_to_lostfound
>  btrfs-progs: undelete-subvol: introduce btrfs_undelete_intact_subvols
>  btrfs-progs: undelete-subvol: add undelete-subvol subcommand
>  btrfs-progs: tests: add testcase for undelete-subvol
>  btrfs-progs: undelete-subvol: update completion and documentation
>
> Documentation/btrfs-rescue.asciidoc|  12 +
> Makefile   |   3 +-
> btrfs-completion   |   2 +-
> cmds-rescue.c  |  70 ++
> convert/main.c |  57 +
> ctree.h|   8 +-
> inode.c|  99 
> .../030-undelete-subvol/deleted_subvolume.img  | Bin 0 -> 4096 bytes
> .../030-undelete-subvol/drop_progress.raw.xz   | Bin 0 -> 23452 bytes
> tests/misc-tests/030-undelete-subvol/test.sh   |  34 +++
> undelete-subvol.c  | 254 +
> undelete-subvol.h  |  19 ++
> 12 files changed, 511 insertions(+), 47 deletions(-)
> create mode 100644 tests/misc-tests/030-undelete-subvol/deleted_subvolume.img
> create mode 100644 tests/misc-tests/030-undelete-subvol/drop_progress.raw.xz
> create mode 100755 tests/misc-tests/030-undelete-subvol/test.sh
> create mode 100644 undelete-subvol.c
> create mode 100644 undelete-subvol.h
>
>-- 
>2.16.2
>
>
>
>--
>To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>the body of a message to majord...@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v5 3/3] btrfs: Allow rmdir(2) to delete an empty subvolume

Change the behavior of rmdir(2) and allow it to delete an empty subvolume
by using btrfs_delete_subvolume() which is used by
btrfs_ioctl_snap_destroy(). The required lock for @dir and inode of
@dentry is already acquired in vfs layer.

We need some check before deleting a subvolume. Permission check is done
in vfs layer, emptiness check is in btrfs_rmdir() and additional check
(i.e. neither the subvolume is a default subvolume nor send is in progress)
is in btrfs_delete_subvolume().

Note that in btrfs_ioctl_snap_destroy(), d_delete() is called after
btrfs_delete_subvolume(). For rmdir(2), d_delete() is called in vfs
layer later.

Tested-by: Goffredo Baroncelli 
Signed-off-by: Tomohiro Misono 
---
 fs/btrfs/inode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index b421aa8dd1a4..231e4c2ea1b4 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4522,7 +4522,7 @@ static int btrfs_rmdir(struct inode *dir, struct dentry 
*dentry)
if (inode->i_size > BTRFS_EMPTY_DIR_SIZE)
return -ENOTEMPTY;
if (btrfs_ino(BTRFS_I(inode)) == BTRFS_FIRST_FREE_OBJECTID)
-   return -EPERM;
+   return btrfs_delete_subvolume(dir, dentry);
 
trans = __unlink_start_trans(dir);
if (IS_ERR(trans))
-- 
2.14.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v5 2/3] btrfs: Factor out the main deletion process from btrfs_ioctl_snap_destroy()

Factor out the secand half of btrfs_ioctl_snap_destroy() as
btrfs_delete_subvolume(), which performes some subvolume specific checks
before deletion (1. send is not in progress. 2. the subvolume
is not the default subvolume. 3. the subvolume does not contain other
subvolumes) and actual deletion process. btrfs_delete_subvolume() requires
inode_lock for both @dir and inode of @dentry. The remaining part of
btrfs_ioctl_snap_destroy() is mainly permission checks.

Note that call of d_delete() is not included in btrfs_delete_subvolume()
as this function will also be used by btrfs_rmdir() to delete an empty
subvolume and in that case d_delete() is called in vfs layer.

As a result, btrfs_unlink_subvol() and may_destroy_subvol()
become static functions. No functional change happens.

Signed-off-by: Tomohiro Misono 
---
 fs/btrfs/ctree.h |   6 +--
 fs/btrfs/inode.c | 143 ++-
 fs/btrfs/ioctl.c | 131 +-
 3 files changed, 144 insertions(+), 136 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 64c119349f98..34c22026e905 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3166,11 +3166,7 @@ int btrfs_unlink_inode(struct btrfs_trans_handle *trans,
 int btrfs_add_link(struct btrfs_trans_handle *trans,
   struct btrfs_inode *parent_inode, struct btrfs_inode *inode,
   const char *name, int name_len, int add_backref, u64 index);
-int btrfs_unlink_subvol(struct btrfs_trans_handle *trans,
-   struct btrfs_root *root,
-   struct inode *dir, u64 objectid,
-   const char *name, int name_len);
-noinline int may_destroy_subvol(struct btrfs_root *root);
+int btrfs_delete_subvolume(struct inode *dir, struct dentry *dentry);
 int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
int front);
 int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 804a1913dabf..b421aa8dd1a4 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4237,7 +4237,7 @@ static int btrfs_unlink(struct inode *dir, struct dentry 
*dentry)
return ret;
 }
 
-int btrfs_unlink_subvol(struct btrfs_trans_handle *trans,
+static int btrfs_unlink_subvol(struct btrfs_trans_handle *trans,
struct btrfs_root *root,
struct inode *dir, u64 objectid,
const char *name, int name_len)
@@ -4321,7 +4321,7 @@ int btrfs_unlink_subvol(struct btrfs_trans_handle *trans,
 /*
  * helper to check if the subvolume references other subvolumes
  */
-noinline int may_destroy_subvol(struct btrfs_root *root)
+static noinline int may_destroy_subvol(struct btrfs_root *root)
 {
struct btrfs_fs_info *fs_info = root->fs_info;
struct btrfs_path *path;
@@ -4372,6 +4372,145 @@ noinline int may_destroy_subvol(struct btrfs_root *root)
return ret;
 }
 
+int btrfs_delete_subvolume(struct inode *dir, struct dentry *dentry)
+{
+   struct btrfs_fs_info *fs_info = btrfs_sb(dentry->d_sb);
+   struct btrfs_root *root = BTRFS_I(dir)->root;
+   struct inode *inode = d_inode(dentry);
+   struct btrfs_root *dest = BTRFS_I(inode)->root;
+   struct btrfs_trans_handle *trans;
+   struct btrfs_block_rsv block_rsv;
+   u64 root_flags;
+   u64 qgroup_reserved;
+   int ret;
+   int err;
+
+   /*
+* Don't allow to delete a subvolume with send in progress. This is
+* inside the i_mutex so the error handling that has to drop the bit
+* again is not run concurrently.
+*/
+   spin_lock(>root_item_lock);
+   root_flags = btrfs_root_flags(>root_item);
+   if (dest->send_in_progress == 0) {
+   btrfs_set_root_flags(>root_item,
+   root_flags | BTRFS_ROOT_SUBVOL_DEAD);
+   spin_unlock(>root_item_lock);
+   } else {
+   spin_unlock(>root_item_lock);
+   btrfs_warn(fs_info,
+  "Attempt to delete subvolume %llu during send",
+  dest->root_key.objectid);
+   err = -EPERM;
+   return err;
+   }
+
+   down_write(_info->subvol_sem);
+
+   err = may_destroy_subvol(dest);
+   if (err)
+   goto out_up_write;
+
+   btrfs_init_block_rsv(_rsv, BTRFS_BLOCK_RSV_TEMP);
+   /*
+* One for dir inode, two for dir entries, two for root
+* ref/backref.
+*/
+   err = btrfs_subvolume_reserve_metadata(root, _rsv,
+  5, _reserved, true);
+   if (err)
+   goto out_up_write;
+
+   trans = btrfs_start_transaction(root, 0);
+   if (IS_ERR(trans)) {
+   err = PTR_ERR(trans);
+   goto out_release;
+   }
+

[PATCH v5 1/3] btrfs: Move may_destroy_subvol() from ioctl.c to inode.c

This is a preparation work to refactor btrfs_ioctl_snap_destroy()
and to allow rmdir(2) to delete an empty subvolume.

Signed-off-by: Tomohiro Misono 
---
 fs/btrfs/ctree.h |  1 +
 fs/btrfs/inode.c | 54 ++
 fs/btrfs/ioctl.c | 54 --
 3 files changed, 55 insertions(+), 54 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 5474ef14d6e6..64c119349f98 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3170,6 +3170,7 @@ int btrfs_unlink_subvol(struct btrfs_trans_handle *trans,
struct btrfs_root *root,
struct inode *dir, u64 objectid,
const char *name, int name_len);
+noinline int may_destroy_subvol(struct btrfs_root *root);
 int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
int front);
 int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index e064c49c9a9a..804a1913dabf 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4318,6 +4318,60 @@ int btrfs_unlink_subvol(struct btrfs_trans_handle *trans,
return ret;
 }
 
+/*
+ * helper to check if the subvolume references other subvolumes
+ */
+noinline int may_destroy_subvol(struct btrfs_root *root)
+{
+   struct btrfs_fs_info *fs_info = root->fs_info;
+   struct btrfs_path *path;
+   struct btrfs_dir_item *di;
+   struct btrfs_key key;
+   u64 dir_id;
+   int ret;
+
+   path = btrfs_alloc_path();
+   if (!path)
+   return -ENOMEM;
+
+   /* Make sure this root isn't set as the default subvol */
+   dir_id = btrfs_super_root_dir(fs_info->super_copy);
+   di = btrfs_lookup_dir_item(NULL, fs_info->tree_root, path,
+  dir_id, "default", 7, 0);
+   if (di && !IS_ERR(di)) {
+   btrfs_dir_item_key_to_cpu(path->nodes[0], di, );
+   if (key.objectid == root->root_key.objectid) {
+   ret = -EPERM;
+   btrfs_err(fs_info,
+ "deleting default subvolume %llu is not 
allowed",
+ key.objectid);
+   goto out;
+   }
+   btrfs_release_path(path);
+   }
+
+   key.objectid = root->root_key.objectid;
+   key.type = BTRFS_ROOT_REF_KEY;
+   key.offset = (u64)-1;
+
+   ret = btrfs_search_slot(NULL, fs_info->tree_root, , path, 0, 0);
+   if (ret < 0)
+   goto out;
+   BUG_ON(ret == 0);
+
+   ret = 0;
+   if (path->slots[0] > 0) {
+   path->slots[0]--;
+   btrfs_item_key_to_cpu(path->nodes[0], , path->slots[0]);
+   if (key.objectid == root->root_key.objectid &&
+   key.type == BTRFS_ROOT_REF_KEY)
+   ret = -ENOTEMPTY;
+   }
+out:
+   btrfs_free_path(path);
+   return ret;
+}
+
 static int btrfs_rmdir(struct inode *dir, struct dentry *dentry)
 {
struct inode *inode = d_inode(dentry);
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 632e26d6f7ce..abd998247d64 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1832,60 +1832,6 @@ static noinline int btrfs_ioctl_subvol_setflags(struct 
file *file,
return ret;
 }
 
-/*
- * helper to check if the subvolume references other subvolumes
- */
-static noinline int may_destroy_subvol(struct btrfs_root *root)
-{
-   struct btrfs_fs_info *fs_info = root->fs_info;
-   struct btrfs_path *path;
-   struct btrfs_dir_item *di;
-   struct btrfs_key key;
-   u64 dir_id;
-   int ret;
-
-   path = btrfs_alloc_path();
-   if (!path)
-   return -ENOMEM;
-
-   /* Make sure this root isn't set as the default subvol */
-   dir_id = btrfs_super_root_dir(fs_info->super_copy);
-   di = btrfs_lookup_dir_item(NULL, fs_info->tree_root, path,
-  dir_id, "default", 7, 0);
-   if (di && !IS_ERR(di)) {
-   btrfs_dir_item_key_to_cpu(path->nodes[0], di, );
-   if (key.objectid == root->root_key.objectid) {
-   ret = -EPERM;
-   btrfs_err(fs_info,
- "deleting default subvolume %llu is not 
allowed",
- key.objectid);
-   goto out;
-   }
-   btrfs_release_path(path);
-   }
-
-   key.objectid = root->root_key.objectid;
-   key.type = BTRFS_ROOT_REF_KEY;
-   key.offset = (u64)-1;
-
-   ret = btrfs_search_slot(NULL, fs_info->tree_root, , path, 0, 0);
-   if (ret < 0)
-   goto out;
-   BUG_ON(ret == 0);
-
-   ret = 0;
-   if (path->slots[0] > 0) {
-   path->slots[0]--;
-

[PATCH v5 0/3] Allow rmdir(2) to delete a subvolume

cangelog:
  v4 -> v5 ... Merge 2nd and 3rd patches and update commit log
   No code change in total
  v3 -> v4 ... Reorganize patches and update commit log
   No code change in total
  v2 -> v3 ... Use if-else block instead of two if blocks and
   add Tested-by tag in 2nd patch
  v1 -> v2 ... Split the patch to hopefully make review easier

Note: I will send a xfstest if this series is merged.

This series changes the behavior of rmdir(2) and allow it to delete an
empty subvolume.

In order so that, 1st and 2nd patch refactor btrfs_ioctl_snap_destroy()
and extract the actual deletion process as btrfs_delete_subvolume()
(remaining part in btrfs_ioctl_snap_destroy() is mainly permission checks).

Then, 3rd patch changes btrfs_rmdir() to call this function. The
required permission check is already done in vfs layer.

Tomohiro Misono (3):
  btrfs: Move may_destroy_subvol() from ioctl.c to inode.c
  btrfs: Factor out the main deletion process from btrfs_ioctl_snap_destroy()
  btrfs: Allow rmdir(2) to delete an empty subvolume

 fs/btrfs/ctree.h |   5 +-
 fs/btrfs/inode.c | 197 ++-
 fs/btrfs/ioctl.c | 185 +--
 3 files changed, 198 insertions(+), 189 deletions(-)

-- 
2.14.3


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] btrfs: delayed-inode: Remove wrong qgroup meta reservation calls



On 2018年04月17日 23:18, David Sterba wrote:
> On Tue, Apr 17, 2018 at 04:52:45PM +0800, Qu Wenruo wrote:
>> Commit 4f5427ccce5d ("btrfs: delayed-inode: Use new qgroup meta rsv for
>> delayed inode and item") merged into mainline is not the updated version
>> submitted to the mail list in Dec 2017.
>>
>> Which lacks the following fixes:
>>
>> 1) Remove btrfs_qgroup_convert_reserved_meta() call in
>>btrfs_delayed_item_release_metadata()
>> 2) Remove btrfs_qgroup_reserve_meta_prealloc() call in
>>btrfs_delayed_inode_reserve_metadata()
>>
>> Those fixes will resolve unexpected EDQUOT problems.
>>
>> Fixes: 4f5427ccce5d ("btrfs: delayed-inode: Use new qgroup meta rsv for 
>> delayed inode and item")
>> Signed-off-by: Qu Wenruo 
> 
> Added to 4.17 queue, thanks.
> 
>> @@ -569,6 +569,12 @@ static int btrfs_delayed_item_reserve_metadata(struct 
>> btrfs_trans_handle *trans,
>>  dst_rsv = _info->delayed_block_rsv;
>>  
>>  num_bytes = btrfs_calc_trans_metadata_size(fs_info, 1);
>> +
>> +/*
>> + * Here we migrate space rsv from transaction rsv, since have
>> + * already reserved space when starting a transaction.
>> + * So no need to reserve qgroup space here.
>> + */
> 
> Please format the comments to the full line width.

Right, the already can go previous line without exceeding 80 chars.

But the "So no need to..." line is a new line so it will not take up any
space of previous line anyway.

> 
>> @@ -647,7 +657,9 @@ static int btrfs_delayed_inode_reserve_metadata(
>>"delayed_inode",
>>btrfs_ino(inode),
>>num_bytes, 1);
>> -}
>> +} else
>> +btrfs_qgroup_free_meta_prealloc(root,
>> +fs_info->nodesize);
> 
> Please don't diverge from the coding style, I'm fixing such issues but,
> you know.

For this, did you mean the bracket for else branch?

Thanks,
Qu



signature.asc
Description: OpenPGP digital signature

Re: Hard link not persisted on fsync

2018-04-17 Thread Jayashree Mohan

Hi,

A gentle reminder on the crash consistency bug that we found on btrfs:
Link count of a file is not persisted even after a fsync. We believe a
filesystem that ensures strictly ordered metadata behaviour should be
able to persist the hard link after a fsync on the original file.
Could you comment on why btrfs does not exhibit this behavior, and if
it's something you'd want to fix?

Thanks,
Jayashree Mohan



On Mon, Apr 16, 2018 at 9:35 AM, Jayashree Mohan
 wrote:
> Hi,
>
> The following seems to be a crash consistency bug on btrfs, where in
> the link count is not persisted even after a fsync on the original
> file.
>
> Consider the following workload :
> creat foo
> link (foo, A/bar)
> fsync(foo)
> ---Crash---
>
> Now, on recovery we expect the metadata of foo to be persisted i.e
> have a link count of 2. However in btrfs, the link count is 1 and file
> A/bar is not persisted. The expected behaviour would be to persist the
> dependencies of inode foo. That is to say, shouldn't fsync of foo
> persist A/bar and correctly update the link count?
> Note that ext4, xfs and f2fs recover to the correct link count of 2
> for the above workload.
>
> Let us know what you think about this behavior.
>
> Thanks,
> Jayashree Mohan
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] btrfs: add chattr support for send/receive



On 04/17/2018 04:39 PM, Howard McLauchlan wrote:
> Presently btrfs send/receive does not propagate inode attribute flags;
> all chattr operations are effectively discarded upon transmission.
>
> This patch adds kernel support for inode attribute flags. Userspace
> support can be found under the commit:
>
> btrfs-progs: add chattr support for send/receive
>
> An associated xfstest can be found at:
>
> btrfs: add verify chattr support for send/receive test
>
> A caveat is that a user with an updated kernel (aware of chattrs) and an
> older version of btrfs-progs (unaware of chattrs) will fail to receive
> if a chattr is included in the send stream.
>
> Signed-off-by: Howard McLauchlan 
> ---
> Based on 4.17-rc1
>
>  fs/btrfs/ctree.h |   2 +
>  fs/btrfs/ioctl.c |   2 +-
>  fs/btrfs/send.c  | 176 +++
>  fs/btrfs/send.h  |   2 +
>  4 files changed, 154 insertions(+), 28 deletions(-)
>
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 5474ef14d6e6..a0dc6a8a37eb 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -1436,6 +1436,8 @@ struct btrfs_map_token {
>   unsigned long offset;
>  };
>  
> +unsigned int btrfs_flags_to_ioctl(unsigned int flags);
> +
>  #define BTRFS_BYTES_TO_BLKS(fs_info, bytes) \
>   ((bytes) >> (fs_info)->sb->s_blocksize_bits)
>  
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 632e26d6f7ce..36ce1e589f9e 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -106,7 +106,7 @@ static unsigned int btrfs_mask_flags(umode_t mode, 
> unsigned int flags)
>  /*
>   * Export inode flags to the format expected by the FS_IOC_GETFLAGS ioctl.
>   */
> -static unsigned int btrfs_flags_to_ioctl(unsigned int flags)
> +unsigned int btrfs_flags_to_ioctl(unsigned int flags)
>  {
>   unsigned int iflags = 0;
>  
> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
> index 221e5cdb060b..da521a5a1843 100644
> --- a/fs/btrfs/send.c
> +++ b/fs/btrfs/send.c
> @@ -101,6 +101,13 @@ struct send_ctx {
>   u64 cur_inode_last_extent;
>   u64 cur_inode_next_write_offset;
>  
> + /*
> +  * state for chattr purposes
> +  */
> + u64 cur_inode_flip_flags;
> + u64 cur_inode_receive_flags;
> + int receive_flags_valid;
> +
>   u64 send_progress;
>  
>   struct list_head new_refs;
> @@ -798,7 +805,7 @@ static int send_rmdir(struct send_ctx *sctx, struct 
> fs_path *path)
>   */
>  static int __get_inode_info(struct btrfs_root *root, struct btrfs_path *path,
> u64 ino, u64 *size, u64 *gen, u64 *mode, u64 *uid,
> -   u64 *gid, u64 *rdev)
> +   u64 *gid, u64 *rdev, u64 *flags)
>  {
>   int ret;
>   struct btrfs_inode_item *ii;
> @@ -828,6 +835,8 @@ static int __get_inode_info(struct btrfs_root *root, 
> struct btrfs_path *path,
>   *gid = btrfs_inode_gid(path->nodes[0], ii);
>   if (rdev)
>   *rdev = btrfs_inode_rdev(path->nodes[0], ii);
> + if (flags)
> + *flags = btrfs_inode_flags(path->nodes[0], ii);
>  
>   return ret;
>  }
> @@ -835,7 +844,7 @@ static int __get_inode_info(struct btrfs_root *root, 
> struct btrfs_path *path,
>  static int get_inode_info(struct btrfs_root *root,
> u64 ino, u64 *size, u64 *gen,
> u64 *mode, u64 *uid, u64 *gid,
> -   u64 *rdev)
> +   u64 *rdev, u64 *flags)
>  {
>   struct btrfs_path *path;
>   int ret;
> @@ -844,7 +853,7 @@ static int get_inode_info(struct btrfs_root *root,
>   if (!path)
>   return -ENOMEM;
>   ret = __get_inode_info(root, path, ino, size, gen, mode, uid, gid,
> -rdev);
> +rdev, flags);
>   btrfs_free_path(path);
>   return ret;
>  }
> @@ -1233,7 +1242,7 @@ static int __iterate_backrefs(u64 ino, u64 offset, u64 
> root, void *ctx_)
>* accept clones from these extents.
>*/
>   ret = __get_inode_info(found->root, bctx->path, ino, _size, NULL, 
> NULL,
> -NULL, NULL, NULL);
> +NULL, NULL, NULL, NULL);
>   btrfs_release_path(bctx->path);
>   if (ret < 0)
>   return ret;
> @@ -1593,7 +1602,7 @@ static int get_cur_inode_state(struct send_ctx *sctx, 
> u64 ino, u64 gen)
>   u64 right_gen;
>  
>   ret = get_inode_info(sctx->send_root, ino, NULL, _gen, NULL, NULL,
> - NULL, NULL);
> + NULL, NULL, NULL);
>   if (ret < 0 && ret != -ENOENT)
>   goto out;
>   left_ret = ret;
> @@ -1602,7 +1611,7 @@ static int get_cur_inode_state(struct send_ctx *sctx, 
> u64 ino, u64 gen)
>   right_ret = -ENOENT;
>   } else {
>   ret = get_inode_info(sctx->parent_root, ino, NULL, _gen,
> - NULL, NULL, NULL, NULL);
> +

[PATCH] btrfs-progs: add chattr support for send/receive

Presently, btrfs send/receive does not propagate inode attribute flags;
all chattr operations are effectively discarded upon transmission.

This patch adds userspace support for inode attribute flags. Kernel
support can be found under the commit:

btrfs: add chattr support for send/receive

An associated xfstest can also be found at:

btrfs: verify chattr support for send/receive test

A caveat is that a user with an updated kernel (aware of chattrs) and an
older version of btrfs-progs (unaware of chattrs) will fail to receive
if a chattr is included in the send stream.

Signed-off-by: Howard McLauchlan 
---
 cmds-receive.c | 31 +++
 send-dump.c|  8 +++-
 send-stream.c  |  5 +
 send-stream.h  |  1 +
 send.h |  2 ++
 5 files changed, 46 insertions(+), 1 deletion(-)

diff --git a/cmds-receive.c b/cmds-receive.c
index 68123a31..883aee4e 100644
--- a/cmds-receive.c
+++ b/cmds-receive.c
@@ -38,6 +38,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "ctree.h"
 #include "ioctl.h"
@@ -1059,6 +1060,35 @@ static int process_update_extent(const char *path, u64 
offset, u64 len,
return 0;
 }
 
+static int process_chattr(const char *path, u64 flags, void *user)
+{
+   int ret = 0;
+   int fd = 0;
+   int _flags = flags;
+   struct btrfs_receive *rctx = user;
+   char full_path[PATH_MAX];
+
+   ret = path_cat_out(full_path, rctx->full_subvol_path, path);
+   if (ret < 0) {
+   error("chattr: path invalid: %s", path);
+   goto out;
+   }
+
+   if (g_verbose >= 2)
+   fprintf(stderr, "chattr %s - flags=0%o\n", path, (int)flags);
+
+   fd = open(full_path, O_RDONLY);
+   ret = ioctl(fd, FS_IOC_SETFLAGS, &_flags);
+
+   if (ret < 0) {
+   ret = -errno;
+   error("chattr %s failed: %s", path, strerror(-ret));
+   goto out;
+   }
+
+out:
+   return ret;
+}
 static struct btrfs_send_ops send_ops = {
.subvol = process_subvol,
.snapshot = process_snapshot,
@@ -1081,6 +,7 @@ static struct btrfs_send_ops send_ops = {
.chown = process_chown,
.utimes = process_utimes,
.update_extent = process_update_extent,
+   .chattr = process_chattr,
 };
 
 static int do_receive(struct btrfs_receive *rctx, const char *tomnt,
diff --git a/send-dump.c b/send-dump.c
index 1591e0cc..757ed5ec 100644
--- a/send-dump.c
+++ b/send-dump.c
@@ -316,6 +316,11 @@ static int print_update_extent(const char *path, u64 
offset, u64 len,
  offset, len);
 }
 
+static int print_chattr(const char *path, u64 flags, void *user)
+{
+   return PRINT_DUMP(user, path, "chattr", "flags=%llu", flags);
+}
+
 struct btrfs_send_ops btrfs_print_send_ops = {
.subvol = print_subvol,
.snapshot = print_snapshot,
@@ -337,5 +342,6 @@ struct btrfs_send_ops btrfs_print_send_ops = {
.chmod = print_chmod,
.chown = print_chown,
.utimes = print_utimes,
-   .update_extent = print_update_extent
+   .update_extent = print_update_extent,
+   .chattr = print_chattr,
 };
diff --git a/send-stream.c b/send-stream.c
index 78f2571a..2c927af2 100644
--- a/send-stream.c
+++ b/send-stream.c
@@ -453,6 +453,11 @@ static int read_and_process_cmd(struct btrfs_send_stream 
*sctx)
TLV_GET_U64(sctx, BTRFS_SEND_A_SIZE, );
ret = sctx->ops->update_extent(path, offset, tmp, sctx->user);
break;
+   case BTRFS_SEND_C_CHATTR:
+   TLV_GET_STRING(sctx, BTRFS_SEND_A_PATH, );
+   TLV_GET_U64(sctx, BTRFS_SEND_A_CHATTR, );
+   ret = sctx->ops->chattr(path, tmp, sctx->user);
+   break;
case BTRFS_SEND_C_END:
ret = 1;
break;
diff --git a/send-stream.h b/send-stream.h
index 39901f86..2446f22d 100644
--- a/send-stream.h
+++ b/send-stream.h
@@ -66,6 +66,7 @@ struct btrfs_send_ops {
  struct timespec *mt, struct timespec *ct,
  void *user);
int (*update_extent)(const char *path, u64 offset, u64 len, void *user);
+   int (*chattr)(const char *path, u64 flags, void *user);
 };
 
 int btrfs_read_and_process_send_stream(int fd,
diff --git a/send.h b/send.h
index fe613cbb..fd07495f 100644
--- a/send.h
+++ b/send.h
@@ -94,6 +94,7 @@ enum btrfs_send_cmd {
 
BTRFS_SEND_C_END,
BTRFS_SEND_C_UPDATE_EXTENT,
+   BTRFS_SEND_C_CHATTR,
__BTRFS_SEND_C_MAX,
 };
 #define BTRFS_SEND_C_MAX (__BTRFS_SEND_C_MAX - 1)
@@ -131,6 +132,7 @@ enum {
BTRFS_SEND_A_CLONE_PATH,
BTRFS_SEND_A_CLONE_OFFSET,
BTRFS_SEND_A_CLONE_LEN,
+   BTRFS_SEND_A_CHATTR,
 
__BTRFS_SEND_A_MAX,
 };
-- 
2.17.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at

[PATCH] btrfs: add chattr support for send/receive

Presently btrfs send/receive does not propagate inode attribute flags;
all chattr operations are effectively discarded upon transmission.

This patch adds kernel support for inode attribute flags. Userspace
support can be found under the commit:

btrfs-progs: add chattr support for send/receive

An associated xfstest can be found at:

btrfs: add verify chattr support for send/receive test

A caveat is that a user with an updated kernel (aware of chattrs) and an
older version of btrfs-progs (unaware of chattrs) will fail to receive
if a chattr is included in the send stream.

Signed-off-by: Howard McLauchlan 
---
Based on 4.17-rc1

 fs/btrfs/ctree.h |   2 +
 fs/btrfs/ioctl.c |   2 +-
 fs/btrfs/send.c  | 176 +++
 fs/btrfs/send.h  |   2 +
 4 files changed, 154 insertions(+), 28 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 5474ef14d6e6..a0dc6a8a37eb 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1436,6 +1436,8 @@ struct btrfs_map_token {
unsigned long offset;
 };
 
+unsigned int btrfs_flags_to_ioctl(unsigned int flags);
+
 #define BTRFS_BYTES_TO_BLKS(fs_info, bytes) \
((bytes) >> (fs_info)->sb->s_blocksize_bits)
 
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 632e26d6f7ce..36ce1e589f9e 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -106,7 +106,7 @@ static unsigned int btrfs_mask_flags(umode_t mode, unsigned 
int flags)
 /*
  * Export inode flags to the format expected by the FS_IOC_GETFLAGS ioctl.
  */
-static unsigned int btrfs_flags_to_ioctl(unsigned int flags)
+unsigned int btrfs_flags_to_ioctl(unsigned int flags)
 {
unsigned int iflags = 0;
 
diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 221e5cdb060b..da521a5a1843 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -101,6 +101,13 @@ struct send_ctx {
u64 cur_inode_last_extent;
u64 cur_inode_next_write_offset;
 
+   /*
+* state for chattr purposes
+*/
+   u64 cur_inode_flip_flags;
+   u64 cur_inode_receive_flags;
+   int receive_flags_valid;
+
u64 send_progress;
 
struct list_head new_refs;
@@ -798,7 +805,7 @@ static int send_rmdir(struct send_ctx *sctx, struct fs_path 
*path)
  */
 static int __get_inode_info(struct btrfs_root *root, struct btrfs_path *path,
  u64 ino, u64 *size, u64 *gen, u64 *mode, u64 *uid,
- u64 *gid, u64 *rdev)
+ u64 *gid, u64 *rdev, u64 *flags)
 {
int ret;
struct btrfs_inode_item *ii;
@@ -828,6 +835,8 @@ static int __get_inode_info(struct btrfs_root *root, struct 
btrfs_path *path,
*gid = btrfs_inode_gid(path->nodes[0], ii);
if (rdev)
*rdev = btrfs_inode_rdev(path->nodes[0], ii);
+   if (flags)
+   *flags = btrfs_inode_flags(path->nodes[0], ii);
 
return ret;
 }
@@ -835,7 +844,7 @@ static int __get_inode_info(struct btrfs_root *root, struct 
btrfs_path *path,
 static int get_inode_info(struct btrfs_root *root,
  u64 ino, u64 *size, u64 *gen,
  u64 *mode, u64 *uid, u64 *gid,
- u64 *rdev)
+ u64 *rdev, u64 *flags)
 {
struct btrfs_path *path;
int ret;
@@ -844,7 +853,7 @@ static int get_inode_info(struct btrfs_root *root,
if (!path)
return -ENOMEM;
ret = __get_inode_info(root, path, ino, size, gen, mode, uid, gid,
-  rdev);
+  rdev, flags);
btrfs_free_path(path);
return ret;
 }
@@ -1233,7 +1242,7 @@ static int __iterate_backrefs(u64 ino, u64 offset, u64 
root, void *ctx_)
 * accept clones from these extents.
 */
ret = __get_inode_info(found->root, bctx->path, ino, _size, NULL, 
NULL,
-  NULL, NULL, NULL);
+  NULL, NULL, NULL, NULL);
btrfs_release_path(bctx->path);
if (ret < 0)
return ret;
@@ -1593,7 +1602,7 @@ static int get_cur_inode_state(struct send_ctx *sctx, u64 
ino, u64 gen)
u64 right_gen;
 
ret = get_inode_info(sctx->send_root, ino, NULL, _gen, NULL, NULL,
-   NULL, NULL);
+   NULL, NULL, NULL);
if (ret < 0 && ret != -ENOENT)
goto out;
left_ret = ret;
@@ -1602,7 +1611,7 @@ static int get_cur_inode_state(struct send_ctx *sctx, u64 
ino, u64 gen)
right_ret = -ENOENT;
} else {
ret = get_inode_info(sctx->parent_root, ino, NULL, _gen,
-   NULL, NULL, NULL, NULL);
+   NULL, NULL, NULL, NULL, NULL);
if (ret < 0 && ret != -ENOENT)
goto out;
right_ret = ret;
@@ -1771,7 +1780,7 @@ static int

[PATCH] btrfs: optimize free space tree bitmap conversion

Presently, convert_free_space_to_extents() does a linear scan of the
bitmap. We can speed this up with find_next_{bit,zero_bit}_le().

This patch replaces the linear scan with find_next_{bit,zero_bit}_le().
Testing shows a 20-33% decrease in execution time for
convert_free_space_to_extents().

Suggested-by: Omar Sandoval 
Signed-off-by: Howard McLauchlan 
---
Based on 4.17-rc1

 fs/btrfs/extent_io.c   | 12 -
 fs/btrfs/extent_io.h   |  4 +--
 fs/btrfs/free-space-tree.c | 54 ++
 3 files changed, 27 insertions(+), 43 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index e99b329002cf..1c0e7ce49556 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5620,12 +5620,12 @@ void copy_extent_buffer(struct extent_buffer *dst, 
struct extent_buffer *src,
}
 }
 
-void le_bitmap_set(u8 *map, unsigned int start, int len)
+void le_bitmap_set(unsigned long *map, unsigned int start, int len)
 {
-   u8 *p = map + BIT_BYTE(start);
+   char *p = ((char *)map) + BIT_BYTE(start);
const unsigned int size = start + len;
int bits_to_set = BITS_PER_BYTE - (start % BITS_PER_BYTE);
-   u8 mask_to_set = BITMAP_FIRST_BYTE_MASK(start);
+   char mask_to_set = BITMAP_FIRST_BYTE_MASK(start);
 
while (len - bits_to_set >= 0) {
*p |= mask_to_set;
@@ -5640,12 +5640,12 @@ void le_bitmap_set(u8 *map, unsigned int start, int len)
}
 }
 
-void le_bitmap_clear(u8 *map, unsigned int start, int len)
+void le_bitmap_clear(unsigned long *map, unsigned int start, int len)
 {
-   u8 *p = map + BIT_BYTE(start);
+   char *p = ((char *)map) + BIT_BYTE(start);
const unsigned int size = start + len;
int bits_to_clear = BITS_PER_BYTE - (start % BITS_PER_BYTE);
-   u8 mask_to_clear = BITMAP_FIRST_BYTE_MASK(start);
+   char mask_to_clear = BITMAP_FIRST_BYTE_MASK(start);
 
while (len - bits_to_clear >= 0) {
*p &= ~mask_to_clear;
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index a53009694b16..a6db233f4a40 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -84,8 +84,8 @@ static inline int le_test_bit(int nr, const u8 *addr)
return 1U & (addr[BIT_BYTE(nr)] >> (nr & (BITS_PER_BYTE-1)));
 }
 
-void le_bitmap_set(u8 *map, unsigned int start, int len);
-void le_bitmap_clear(u8 *map, unsigned int start, int len);
+void le_bitmap_set(unsigned long *map, unsigned int start, int len);
+void le_bitmap_clear(unsigned long *map, unsigned int start, int len);
 
 struct extent_state;
 struct btrfs_root;
diff --git a/fs/btrfs/free-space-tree.c b/fs/btrfs/free-space-tree.c
index 32a0f6cb5594..0ddf96b01a3a 100644
--- a/fs/btrfs/free-space-tree.c
+++ b/fs/btrfs/free-space-tree.c
@@ -138,9 +138,9 @@ static inline u32 free_space_bitmap_size(u64 size, u32 
sectorsize)
return DIV_ROUND_UP((u32)div_u64(size, sectorsize), BITS_PER_BYTE);
 }
 
-static u8 *alloc_bitmap(u32 bitmap_size)
+static unsigned long *alloc_bitmap(u32 bitmap_size)
 {
-   u8 *ret;
+   unsigned long *ret;
unsigned int nofs_flag;
 
/*
@@ -166,7 +166,8 @@ int convert_free_space_to_bitmaps(struct btrfs_trans_handle 
*trans,
struct btrfs_free_space_info *info;
struct btrfs_key key, found_key;
struct extent_buffer *leaf;
-   u8 *bitmap, *bitmap_cursor;
+   unsigned long *bitmap;
+   char *bitmap_cursor;
u64 start, end;
u64 bitmap_range, i;
u32 bitmap_size, flags, expected_extent_count;
@@ -255,7 +256,7 @@ int convert_free_space_to_bitmaps(struct btrfs_trans_handle 
*trans,
goto out;
}
 
-   bitmap_cursor = bitmap;
+   bitmap_cursor = (char *)bitmap;
bitmap_range = fs_info->sectorsize * BTRFS_FREE_SPACE_BITMAP_BITS;
i = start;
while (i < end) {
@@ -304,13 +305,10 @@ int convert_free_space_to_extents(struct 
btrfs_trans_handle *trans,
struct btrfs_free_space_info *info;
struct btrfs_key key, found_key;
struct extent_buffer *leaf;
-   u8 *bitmap;
+   unsigned long *bitmap;
u64 start, end;
-   /* Initialize to silence GCC. */
-   u64 extent_start = 0;
-   u64 offset;
u32 bitmap_size, flags, expected_extent_count;
-   int prev_bit = 0, bit, bitnr;
+   unsigned long nrbits, start_bit, end_bit;
u32 extent_count = 0;
int done = 0, nr;
int ret;
@@ -348,7 +346,7 @@ int convert_free_space_to_extents(struct btrfs_trans_handle 
*trans,
break;
} else if (found_key.type == 
BTRFS_FREE_SPACE_BITMAP_KEY) {
unsigned long ptr;
-   u8 *bitmap_cursor;
+   char *bitmap_cursor;
u32 bitmap_pos, data_size;

[RFC] Add support for BTRFS raid5/6 to GRUB

2018-04-17 Thread Goffredo Baroncelli

Hi All,

Below you can find a patch to add support for accessing files from grub in a 
RAID5/6 btrfs filesystem. This is a RFC because it is missing the support for 
recovery (i.e. if some devices are missed). In the next days (weeks ?) I will 
extend this patch to support also this case.

Comments are welcome.

BR
G.Baroncelli


---

commit 8c80a1b7c913faf50f95c5c76b4666ed17685666
Author: Goffredo Baroncelli 
Date:   Tue Apr 17 21:40:31 2018 +0200

Add initial support for btrfs raid5/6 chunk

diff --git a/grub-core/fs/btrfs.c b/grub-core/fs/btrfs.c
index be195448d..4c5632acb 100644
--- a/grub-core/fs/btrfs.c
+++ b/grub-core/fs/btrfs.c
@@ -119,6 +119,8 @@ struct grub_btrfs_chunk_item
 #define GRUB_BTRFS_CHUNK_TYPE_RAID1 0x10
 #define GRUB_BTRFS_CHUNK_TYPE_DUPLICATED0x20
 #define GRUB_BTRFS_CHUNK_TYPE_RAID100x40
+#define GRUB_BTRFS_CHUNK_TYPE_RAID5 0x80
+#define GRUB_BTRFS_CHUNK_TYPE_RAID60x100
   grub_uint8_t dummy2[0xc];
   grub_uint16_t nstripes;
   grub_uint16_t nsubstripes;
@@ -764,6 +766,39 @@ grub_btrfs_read_logical (struct grub_btrfs_data *data, 
grub_disk_addr_t addr,
  stripe_offset = low + chunk_stripe_length
* high;
  csize = chunk_stripe_length - low;
+ break;
+   }
+ case GRUB_BTRFS_CHUNK_TYPE_RAID5:
+ case GRUB_BTRFS_CHUNK_TYPE_RAID6:
+   {
+ grub_uint64_t nparities;
+ grub_uint64_t parity_pos;
+ grub_uint64_t stripe_nr, high;
+ grub_uint64_t low;
+
+ redundancy = 1;   /* no redundancy for now */
+
+ if (grub_le_to_cpu64 (chunk->type) & GRUB_BTRFS_CHUNK_TYPE_RAID5)
+   {
+ grub_dprintf ("btrfs", "RAID5\n");
+ nparities = 1;
+   }
+ else
+   {
+ grub_dprintf ("btrfs", "RAID6\n");
+ nparities = 2;
+   }
+
+ stripe_nr = grub_divmod64 (off, chunk_stripe_length, );
+
+ high = grub_divmod64 (stripe_nr, nstripes - nparities, );
+ grub_divmod64 (high+nstripes-nparities, nstripes, _pos);
+ grub_divmod64 (parity_pos+nparities+stripen, nstripes, );
+
+ stripe_offset = low + chunk_stripe_length * high;
+ csize = chunk_stripe_length - low;
+
  break;
}
  default:


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli 
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: How to dump/find parity of RAID-5 file?

2018-04-17 Thread Goffredo Baroncelli

On 02/06/2017 09:40 PM, Goffredo Baroncelli wrote:
> On 2017-02-03 11:44, Lakshmipathi.G wrote:
>> Hi.
>>
>> Came across this thread 
>> https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg55161.html 
>> Exploring possibility of adding test-scripts around these area using 
>> dump-tree & corrupt-block.But
>> unable to figure-out how to get parity of file or find its location.  
>> dump-tree output gave,
>>
>> item 5 key (FIRST_CHUNK_TREE CHUNK_ITEM 145096704) itemoff 15557 
>> itemsize 144
>> length 134217728 owner 2 stripe_len 65536 type DATA|RAID5
>> io_align 65536 io_width 65536 sector_size 4096
>> num_stripes 3 sub_stripes 0
>> stripe 0 devid 3 offset 6368 # Is this 
>> parity?Seems empty?
>> dev_uuid f62df114-186c-4e48-8152-9ed15aa078b4
>> stripe 1 devid 2 offset 6368 # Contains file 
>> data-stripe-1
>> dev_uuid c0aeaab0-e57e-4f7a-9356-db1878876d9f
>> stripe 2 devid 1 offset 83034112 # Contains file 
>> data-stripe-2
>> dev_uuid 637b3666-9d8f-4ec4-9969-53b0b933b9b1
>> thanks.
> 
> IIRC, the parity is spread across the disk stripes of the chunk.
> 
> So first you have to find the logical-offset [LO] where the the file begins. 
> Then you have to map this offset to the chunk which holds the data. The chunk 
> has the following info:
> - chunk start [CS], chunk length [CL]
> - for each stripe:
>   where the stripe starts
> 
> If you subtract the chunk-start from the logical-offset [ CO == LO-CS], you 
> will find the offset where the data belongs in the chunk.
> 
> As stated above, the PARITY is spread across the chunk stripes. So (supposing 
> that the stripe size is 64K, the raid level is 5, the disks are three), 
> 
> - the first 64k of stripe 0, is data [0..64K)
> - the first 64k of stripe 1, is data [64..128K)
> - the first 64k of stripe 2 is parity, 
> 
> - the 2nd 64k of stripe 0 is parity, 
> - the 2nd 64k of stripe 1, is data [128..196K)
> - the 2nd 64k of stripe 2, is data [192..256K)
> 
> - the 3rd 64k of stripe 0, is data [256..320K)
> - the 3rd 64k of stripe 1 is parity, 
> - the 3rd 64k of stripe 2, is data [320..384K)
> and so on,

I was wrong !
- the 3rd 64k of stripe 0, is data [320..384K)
- the 3rd 64k of stripe 1 is parity, 
- the 3rd 64k of stripe 2, is data [256..320K)

Basically stripe 0 and 2 were swapped. The idea is that after parity there is 
stripe 1, then stripe 2, and so on 

DDD
III
SSS
KKK
123

01P 
P23 
5P4 
67P 

Where 
  P = parity
  0...7 are the slices of data

So 'P' parity move from left to right starting from the last disk; the data are 
in the increasing address from left to right *starting* from parity in a 
circular buffer.

I am trying to implement raid5/6 in grub, so I went deep in the raid5 layout

BR
G.Baroncelli




> 
> To find the data, You have to compare the CO to the data [...) range.
> 
> If you look to an my old patch (unfinished :-( ), you can find some example 
> to dump the different stripe
> 
> [BTRFS-PROGS][PATCH][V2] Add two new commands: 'btrfs insp physical-find' and 
> 'btrfs insp physical-dump'
> 
> 
> 
> 
> 
> 
>> Cheers.
>> Lakshmipathi.G
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
> 
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli 
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] btrfs: push relocation recovery into a helper thread

2018-04-17 Thread Jeff Mahoney

On a file system with many snapshots and qgroups enabled, an interrupted
balance can end up taking a long time to mount due to recovering the
relocations during mount.  It does this in the task performing the mount,
which can't be interrupted and may prevent mounting (and systems booting)
for a long time as well.  The thing is that as part of balance, this
runs in the background all the time.  This patch pushes the recovery
into a helper thread and allows the mount to continue normally.  We hold
off on resuming any paused balance operation until after the relocation
has completed, disallow any new balance operations if it's running, and
wait for it on umount and remounting read-only.

This doesn't do anything to address the relocation recovery operation
taking a long time but does allow the file system to mount.

Signed-off-by: Jeff Mahoney 
---
 fs/btrfs/ctree.h  |7 +++
 fs/btrfs/disk-io.c|7 ++-
 fs/btrfs/relocation.c |   92 +-
 fs/btrfs/super.c  |5 +-
 fs/btrfs/volumes.c|6 +++
 5 files changed, 97 insertions(+), 20 deletions(-)

--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1052,6 +1052,10 @@ struct btrfs_fs_info {
struct btrfs_work qgroup_rescan_work;
bool qgroup_rescan_running; /* protected by qgroup_rescan_lock */
 
+   /* relocation recovery items */
+   bool relocation_recovery_started;
+   struct completion relocation_recovery_completion;
+
/* filesystem state */
unsigned long fs_state;
 
@@ -3671,7 +3675,8 @@ int btrfs_init_reloc_root(struct btrfs_t
  struct btrfs_root *root);
 int btrfs_update_reloc_root(struct btrfs_trans_handle *trans,
struct btrfs_root *root);
-int btrfs_recover_relocation(struct btrfs_root *root);
+int btrfs_recover_relocation(struct btrfs_fs_info *fs_info);
+void btrfs_wait_for_relocation_completion(struct btrfs_fs_info *fs_info);
 int btrfs_reloc_clone_csums(struct inode *inode, u64 file_pos, u64 len);
 int btrfs_reloc_cow_block(struct btrfs_trans_handle *trans,
  struct btrfs_root *root, struct extent_buffer *buf,
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2999,7 +2999,7 @@ retry_root_backup:
goto fail_qgroup;
 
mutex_lock(_info->cleaner_mutex);
-   ret = btrfs_recover_relocation(tree_root);
+   ret = btrfs_recover_relocation(fs_info);
mutex_unlock(_info->cleaner_mutex);
if (ret < 0) {
btrfs_warn(fs_info, "failed to recover relocation: %d",
@@ -3017,7 +3017,8 @@ retry_root_backup:
if (IS_ERR(fs_info->fs_root)) {
err = PTR_ERR(fs_info->fs_root);
btrfs_warn(fs_info, "failed to read fs tree: %d", err);
-   goto fail_qgroup;
+   close_ctree(fs_info);
+   return err;
}
 
if (sb_rdonly(sb))
@@ -3778,6 +3779,8 @@ void close_ctree(struct btrfs_fs_info *f
/* wait for the qgroup rescan worker to stop */
btrfs_qgroup_wait_for_completion(fs_info, false);
 
+   btrfs_wait_for_relocation_completion(fs_info);
+
/* wait for the uuid_scan task to finish */
down(_info->uuid_tree_rescan_sem);
/* avoid complains from lockdep et al., set sem back to initial state */
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "ctree.h"
 #include "disk-io.h"
 #include "transaction.h"
@@ -4492,14 +4493,61 @@ static noinline_for_stack int mark_garba
 }
 
 /*
- * recover relocation interrupted by system crash.
- *
  * this function resumes merging reloc trees with corresponding fs trees.
  * this is important for keeping the sharing of tree blocks
  */
-int btrfs_recover_relocation(struct btrfs_root *root)
+static int
+btrfs_resume_relocation(void *data)
 {
-   struct btrfs_fs_info *fs_info = root->fs_info;
+   struct btrfs_fs_info *fs_info = data;
+   struct btrfs_trans_handle *trans;
+   struct reloc_control *rc = fs_info->reloc_ctl;
+   int err, ret;
+
+   btrfs_info(fs_info, "resuming relocation");
+
+   BUG_ON(!rc);
+
+   mutex_lock(_info->cleaner_mutex);
+
+   merge_reloc_roots(rc);
+
+   unset_reloc_control(rc);
+
+   trans = btrfs_join_transaction(rc->extent_root);
+   if (IS_ERR(trans))
+   err = PTR_ERR(trans);
+   else {
+   ret = btrfs_commit_transaction(trans);
+   if (ret < 0)
+   err = ret;
+   }
+
+   kfree(rc);
+
+   if (err == 0) {
+   struct btrfs_root *fs_root;
+
+   /* cleanup orphan inode in data relocation tree */
+   fs_root = read_fs_root(fs_info, BTRFS_DATA_RELOC_TREE_OBJECTID);
+   if (IS_ERR(fs_root))
+   err = PTR_ERR(fs_root);
+

Re: [PATCH 13/16] btrfs: remove redundant read-only check from btrfs_cancel_balance

On Mon, Apr 16, 2018 at 05:43:46PM +0800, Anand Jain wrote:
> On 04/04/2018 02:34 AM, David Sterba wrote:
> > Balance cannot be started on a read-only filesystem and will have to
> > finish/exit before eg. going to read-only via remount. Cancelling does
> > not need to check for that.
> > 
> > In case the filesystem is forcibly set to read-only after an error,
> > balance will finish anyway and if the cancel call is too fast it will
> > just wait for that to happen. Again does not have to check.
> 
>   What if there is a power recycle and mounted as readonly
>   after the reboot?

So the balance item would be stored on disk from previous run, mount
will set it up but balance will not be resumed (sb_rdonly check in
open_ctree happens earlier).

Calling btrfs_cancel_balance by ioctl would reset the in-memory state
and also would want to delete the balance item on disk, so yes, the
check is needed.

I'll add a comment explaining why the read-only check is there instead.
Thanks for catching it.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 08/16] btrfs: add sanity check when resuming balance after mount

On Mon, Apr 16, 2018 at 02:10:08PM +0800, Anand Jain wrote:
> 
> 
> On 04/04/2018 02:34 AM, David Sterba wrote:
> > Replace a WARN_ON with a proper check and message in case something goes
> > really wrong and resumed balance cannot set up its exclusive status.
> 
> > The check is a user friendly assertion, I don't expect to ever happen
> > under normal circumstances.
> > 
> > Also document that the paused balance starts here and owns the exclusive
> > op status.
> 
> 
> > Signed-off-by: David Sterba 
> > ---
> >   fs/btrfs/volumes.c | 16 ++--
> >   1 file changed, 14 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> > index eb78c1d0ce2b..843982a2cbdb 100644
> > --- a/fs/btrfs/volumes.c
> > +++ b/fs/btrfs/volumes.c
> > @@ -3982,6 +3982,20 @@ int btrfs_recover_balance(struct btrfs_fs_info 
> > *fs_info)
> > struct btrfs_key key;
> > int ret;
> >   
> > +   /*
> > +* This should never happen, as the paused balance state is recovered
> > +* during mount without any chance of other exclusive ops to collide.
> > +* Let it fail early and do not continue mount.
> > +*
> > +* Otherwise, this gives the exclusive op status to balance and keeps
> > +* in paused state until user intervention (cancel or umount).
> > +*/
> > +   if (test_and_set_bit(BTRFS_FS_EXCL_OP, _info->flags)) {
> > +   btrfs_err(fs_info,
> > +   "cannot set exclusive op status to resume balance");
> > +   return -EINVAL;
> > +   }
> >
> 
>   We need the test_and_set_bit(BTRFS_FS_EXCL_OP) only if we confirm that
>   there is a pending balance. Its better to test and set at the same
>   place as WARN_ON before.

Right.  And the patch is wrong, because we need to set up
fs_info::balance_ctl in all cases, otherwise unpausing balance would not
work as expected. The only change should be a better message and maybe
not even that, as it's just preparing the state but the exclusive op is
claimed later in btrfs_resume_balance_async.

I'll revisit how the error handling is done after resuming balance or
dev-replace, this is considered a hard failure (mount, rw remount) but I
don't think it's necessary.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Btrfs: clean up resources during umount after trans is aborted

2018-04-17 Thread Liu Bo

On Tue, Apr 17, 2018 at 03:34:04PM +0200, David Sterba wrote:
> On Sat, Mar 31, 2018 at 06:11:56AM +0800, Liu Bo wrote:
> > Currently if some fatal errors occur, like all IO get -EIO, resources
> > would be cleaned up when
> > a) transaction is being committed or
> > b) BTRFS_FS_STATE_ERROR is set
> > 
> > However, in some rare cases, resources may be left alone after transaction
> > gets aborted and umount may run into some ASSERT(), e.g.
> > ASSERT(list_empty(_group->dirty_list));
> > 
> > For case a), in btrfs_commit_transaciton(), there're several places at the
> > beginning where we just call btrfs_end_transaction() without cleaning up
> > resources.  For case b), it is possible that the trans handle doesn't have
> > any dirty stuff, then only trans hanlde is marked as aborted while
> > BTRFS_FS_STATE_ERROR is not set, so resources remain in memory.
> 
> I have some doubts here. The flag BTRFS_FS_STATE_TRANS_ABORTED was added
> to avoid duplicate warnings when the transaction is aborted for the
> first time. And nothing else.
> 
> BTRFS_FS_STATE_ERROR should track the overall state and is checked in
> several places if a post-error cleanup is necessary.
> 
> Calling abort should set BTRFS_FS_STATE_ERROR immediatelly so any caller
> up the callchaing can potentially cleanup.
> 
> > This makes btrfs also check BTRFS_FS_STATE_TRANS_ABORTED to make sure that
> > all resources won't stay in memory after umount.
> 
> The question is why BTRFS_FS_STATE_ERROR is not set but we still got
> here and the filesystem is in an error state.
> 
> You're not specific about the rare cases where the resources are left
> after abort and the umount is about to start. I think the rare cases
> need to be identified and understood otherwise this seems like papering
> over the problem.

Yes, that makes sense.

btrfs_commit_transaction()
   -> btrfs_run_delayed_refs() # the very 1st one
  -> __btrfs_run_delayed_refs() # bail out with errors e.g. -EIO, -ENOSPC
  -> btrfs_abort_transaction()
 -> # set BTRFS_FS_STATE_TRANS_ABORTED
 -> __btrfs_abort_transaction() # !trans->dirty and no new_bgs,
   then return without setting
   BTRFS_FS_STATA_ERROR

While trans handle is aborted, cur_trans is not aborted, so that other
threads would join this @cur_trans and it's likely that they would go
thru the same process.  Eventually umount is called,

close_ctree()
   -> btrfs_commit_super()
   -> join_transaction()
   -> btrfs_commit_transaction() # bail out due to the same reason
   -> ...  # btrfs_error_commit_super is not called due to STATE_ERROR not 
being set.


So as we can see, if btrfs_commit_transaction() enters COMMIT phrase,
then cleanup_transaction would handle the rest well, otherwise we'd
get this.

thanks,
-liubo
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] btrfs: add comment about BTRFS_FS_EXCL_OP

On Mon, Apr 16, 2018 at 06:28:31PM +0800, Anand Jain wrote:
> Adds comments about BTRFS_FS_EXCL_OP to existing comments
> about the device locks.

Thanks, a few comments below.

> Signed-off-by: Anand Jain 
> ---
>  fs/btrfs/volumes.c | 16 
>  1 file changed, 16 insertions(+)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 3a3912a9dec9..18306165ffaa 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -202,6 +202,22 @@ static int __btrfs_map_block(struct btrfs_fs_info 
> *fs_info,
>   *   device_list_mutex
>   * chunk_mutex
>   *   balance_mutex
> + *
> + * BTRFS_FS_EXCL_OP
> + * 
> + * Maintains the exclusivity of the following device operations.
> + *
> + * - Balance (*)
> + * - Add

device add

> + * - Remove

device remove

> + * - Replace (*)

device replace

> + * - Resize
> + *
> + * The device operation with (*) indicate it can continue after a CLI pause
> + * (only balance can be paused using CLI) or remount or unmount-mount 
> sequence
> + * or a power-recycle, as long as the FS is still writeable. And during the

this would be more readable if is' a bullet list like above

> + * course of the pause, the BTRFS_FS_EXCL_OP remains set. BTRFS_FS_EXCL_OP 
> flag
> + * is set and cleared using atomic functions.

Please rephrase that without the CLI and use 'ioctl' instead, as this
is the interface.

>   */
>  
>  DEFINE_MUTEX(uuid_mutex);
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v4 3/4] btrfs: Refactor btrfs_ioctl_snap_destroy() by using btrfs_delete_subvolume()

On Tue, Apr 17, 2018 at 09:21:16AM +0900, Misono Tomohiro wrote:
> On 2018/04/17 2:53, David Sterba wrote:
> > On Wed, Apr 11, 2018 at 02:20:49PM +0900, Misono Tomohiro wrote:
> >> Use btrfs_delete_subvolume() to refactor btrfs_ioctl_snap_destroy().
> >> The permission check is still done in btrfs_ioctl_snap_destroy(). Also,
> >> call of d_delete() is still required since btrfs_delete_subvolume()
> >> does not call it.
> >>
> >> As a result, btrfs_unlink_subvol() and may_destroy_subvol()
> >> become static functions. No functional change happens.
> >>
> >> Signed-off-by: Tomohiro Misono 
> > 
> > Why is this still split into two patches? Factoring out a function
> > should happen in one patch, ie 2 and 3 in one go. Do you have a reason
> > not to do it like that?
> 
> I thought this reduces code change in one patch and might be good, but
> I'm completely fine with merging 2nd and 3rd patches.

This makes the rewiew harder, if the code is moved the diff can be long
but we have both pieces to compare. Moving code is considered one
logical change and should not introduce any other changes, besides
renaming or formatting fixups.

> Should I send v5?

Yes please, this is not a trivial change and the changelogs need to be
adjusted. Thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[no subject]

2018-04-17 Thread Timo Nentwig


unsubscribe linux-kernel
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Filesystem full with unallocated space?

2018-04-17 Thread Matthew Lai


Hello!

I am getting ENOSPC on my root filesystem with plenty of unallocated 
space according to "fi usage". Any idea why that may be? This is a root 
partition for Debian Stable. Not doing anything unusual that I'm aware 
of. No snapshots.


Thanks!

Matthew

uname -a:
Linux bigfoot 4.9.0-4-amd64 #1 SMP Debian 4.9.65-3+deb9u1 (2017-12-23) 
x86_64 GNU/Linux


btrfs --version:
btrfs-progs v4.7.3

btrfs fi show:
Label: none  uuid: 2364c63f-e20c-410f-90b4-05f722ee1c77
    Total devices 1 FS bytes used 176.27GiB
    devid    1 size 246.33GiB used 185.01GiB path /dev/sda2

btrfs fi df /:
Data, single: total=183.00GiB, used=175.78GiB
System, single: total=4.00MiB, used=48.00KiB
Metadata, single: total=2.01GiB, used=504.81MiB
GlobalReserve, single: total=211.39MiB, used=0.00B

dmesg (I had tried balancing - "balance start -dusage=95 /" completed 
successfully and didn't help):

62411.020348] usb 2-1.2.4: USB disconnect, device number 12
[62656.848845] usb 2-1.2: new high-speed USB device number 14 using 
ehci-pci
[62656.958066] usb 2-1.2: New USB device found, idVendor=0451, 
idProduct=8044
[62656.958071] usb 2-1.2: New USB device strings: Mfr=0, Product=0, 
SerialNumber=1

[62656.958073] usb 2-1.2: SerialNumber: A80108717AA4
[62656.958692] hub 2-1.2:1.0: USB hub found
[62656.958828] hub 2-1.2:1.0: 4 ports detected
[62657.244860] usb 2-1.2.2: new high-speed USB device number 15 using 
ehci-pci
[62657.360966] usb 2-1.2.2: New USB device found, idVendor=2109, 
idProduct=2813
[62657.360970] usb 2-1.2.2: New USB device strings: Mfr=1, Product=2, 
SerialNumber=0

[62657.360972] usb 2-1.2.2: Product: USB2.0 Hub
[62657.360974] usb 2-1.2.2: Manufacturer: VIA Labs, Inc.
[62657.361955] hub 2-1.2.2:1.0: USB hub found
[62657.362734] hub 2-1.2.2:1.0: 4 ports detected
[62657.456890] usb 2-1.2.3: new high-speed USB device number 16 using 
ehci-pci
[62657.692872] usb 2-1.2.2.1: new full-speed USB device number 17 using 
ehci-pci
[62657.815204] usb 2-1.2.2.1: New USB device found, idVendor=1050, 
idProduct=0211
[62657.815209] usb 2-1.2.2.1: New USB device strings: Mfr=1, Product=2, 
SerialNumber=0

[62657.815227] usb 2-1.2.2.1: Product: Yubico WinUSB Gnubby (gnubby1)
[62657.815228] usb 2-1.2.2.1: Manufacturer: Yubico
[62657.908881] usb 2-1.2.2.2: new full-speed USB device number 18 using 
ehci-pci
[62658.032771] usb 2-1.2.2.2: New USB device found, idVendor=046d, 
idProduct=c52b
[62658.032775] usb 2-1.2.2.2: New USB device strings: Mfr=1, Product=2, 
SerialNumber=0

[62658.032778] usb 2-1.2.2.2: Product: USB Receiver
[62658.032780] usb 2-1.2.2.2: Manufacturer: Logitech
[62658.040715] logitech-djreceiver 0003:046D:C52B.000D: hiddev0,hidraw4: 
USB HID v1.11 Device [Logitech USB Receiver] on 
usb-:00:1d.0-1.2.2.2/input2
[62658.188708] input: Logitech M705 as 
/devices/pci:00/:00:1d.0/usb2/2-1/2-1.2/2-1.2.2/2-1.2.2.2/2-1.2.2.2:1.2/0003:046D:C52B.000D/0003:046D:101B.000E/input/input15
[62658.189329] logitech-hidpp-device 0003:046D:101B.000E: input,hidraw5: 
USB HID v1.11 Mouse [Logitech M705] on usb-:00:1d.0-1.2.2.2:2
[62658.248907] usb 2-1.2.2.3: new full-speed USB device number 19 using 
ehci-pci
[62658.370856] usb 2-1.2.2.3: New USB device found, idVendor=2a03, 
idProduct=0042
[62658.370859] usb 2-1.2.2.3: New USB device strings: Mfr=1, Product=2, 
SerialNumber=220

[62658.370861] usb 2-1.2.2.3: Product: Arduino Mega
[62658.370862] usb 2-1.2.2.3: Manufacturer: Arduino Srl
[62658.370863] usb 2-1.2.2.3: SerialNumber: 75533353437351C07182
[62658.371497] cdc_acm 2-1.2.2.3:1.0: ttyACM0: USB ACM device
[62659.062915] input: Logitech K270 as 
/devices/pci:00/:00:1d.0/usb2/2-1/2-1.2/2-1.2.2/2-1.2.2.2/2-1.2.2.2:1.2/0003:046D:C52B.000D/0003:046D:4003.000F/input/input16
[62659.063686] logitech-hidpp-device 0003:046D:4003.000F: input,hidraw6: 
USB HID v1.11 Keyboard [Logitech K270] on usb-:00:1d.0-1.2.2.2:3
[62659.807183] usb 2-1.2.3: New USB device found, idVendor=046d, 
idProduct=082d
[62659.807188] usb 2-1.2.3: New USB device strings: Mfr=0, Product=2, 
SerialNumber=1

[62659.807190] usb 2-1.2.3: Product: HD Pro Webcam C920
[62659.807192] usb 2-1.2.3: SerialNumber: C034F41F
[62659.807647] uvcvideo: Found UVC 1.00 device HD Pro Webcam C920 
(046d:082d)
[62659.808304] uvcvideo 2-1.2.3:1.0: Entity type for entity Processing 3 
was not initialized!
[62659.808308] uvcvideo 2-1.2.3:1.0: Entity type for entity Extension 6 
was not initialized!
[62659.808311] uvcvideo 2-1.2.3:1.0: Entity type for entity Extension 12 
was not initialized!
[62659.808314] uvcvideo 2-1.2.3:1.0: Entity type for entity Camera 1 was 
not initialized!
[62659.808316] uvcvideo 2-1.2.3:1.0: Entity type for entity Extension 8 
was not initialized!
[62659.808319] uvcvideo 2-1.2.3:1.0: Entity type for entity Extension 9 
was not initialized!
[62659.808321] uvcvideo 2-1.2.3:1.0: Entity type for entity Extension 10 
was not initialized!
[62659.808324] uvcvideo 2-1.2.3:1.0: Entity type for entity Extension 11 
was not initialized!
[62659.808551] input:

Re: [PATCH] btrfs: Fix wrong btrfs_delalloc_release_extents parameter

On Tue, Apr 17, 2018 at 06:43:58PM +0800, Qu Wenruo wrote:
> Commit 43b18595d660 ("btrfs: qgroup: Use separate meta reservation type
> for delalloc") merged into mainline is not the latest version submitted
> to mail list in Dec 2017.
> 
> It has a fatal wrong @qgroup_free parameter, which results increasing
> qgroup metadata pertrans reserved space, and causing a lot of early EDQUOT.
> 
> Fix it by applying the correct diff on top of current branch.
> 
> Fixes: 43b18595d660 ("btrfs: qgroup: Use separate meta reservation type for 
> delalloc")
> Signed-off-by: Qu Wenruo 

Added to 4.17 queue, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] btrfs: delayed-inode: Remove wrong qgroup meta reservation calls

On Tue, Apr 17, 2018 at 04:52:45PM +0800, Qu Wenruo wrote:
> Commit 4f5427ccce5d ("btrfs: delayed-inode: Use new qgroup meta rsv for
> delayed inode and item") merged into mainline is not the updated version
> submitted to the mail list in Dec 2017.
> 
> Which lacks the following fixes:
> 
> 1) Remove btrfs_qgroup_convert_reserved_meta() call in
>btrfs_delayed_item_release_metadata()
> 2) Remove btrfs_qgroup_reserve_meta_prealloc() call in
>btrfs_delayed_inode_reserve_metadata()
> 
> Those fixes will resolve unexpected EDQUOT problems.
> 
> Fixes: 4f5427ccce5d ("btrfs: delayed-inode: Use new qgroup meta rsv for 
> delayed inode and item")
> Signed-off-by: Qu Wenruo 

Added to 4.17 queue, thanks.

> @@ -569,6 +569,12 @@ static int btrfs_delayed_item_reserve_metadata(struct 
> btrfs_trans_handle *trans,
>   dst_rsv = _info->delayed_block_rsv;
>  
>   num_bytes = btrfs_calc_trans_metadata_size(fs_info, 1);
> +
> + /*
> +  * Here we migrate space rsv from transaction rsv, since have
> +  * already reserved space when starting a transaction.
> +  * So no need to reserve qgroup space here.
> +  */

Please format the comments to the full line width.

> @@ -647,7 +657,9 @@ static int btrfs_delayed_inode_reserve_metadata(
> "delayed_inode",
> btrfs_ino(inode),
> num_bytes, 1);
> - }
> + } else
> + btrfs_qgroup_free_meta_prealloc(root,
> + fs_info->nodesize);

Please don't diverge from the coding style, I'm fixing such issues but,
you know.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] fstests: btrfs/130 make it workable on small systems

This test case takes a long time to complete at the default
LOAD_FACTOR=1, so reduce the nr_extents to 256, so for larger
systems it can still use higher LOAD_FACTOR.

Signed-off-by: Anand Jain 
---
 tests/btrfs/130 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/btrfs/130 b/tests/btrfs/130
index 96c8f9d9d526..05f92c11a998 100755
--- a/tests/btrfs/130
+++ b/tests/btrfs/130
@@ -60,7 +60,7 @@ _require_scratch_reflink
 _scratch_mkfs > /dev/null 2>&1
 _scratch_mount
 
-nr_extents=$((4096 * $LOAD_FACTOR))
+nr_extents=$((256 * $LOAD_FACTOR))
 
 # Use 128K blocksize, the default value of both deduperemove or
 # inband dedupe
-- 
2.15.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/1] fstests: btrfs/130 fix Invalid argument

btrfs-progs patch[1] replaced read(2) write(2) with splice(2) and caused the
append-redirect to stop working.

Before:
 btrfs send /btrfs/ro_send > /dev/null
 At subvol /btrfs/ro_snap

 btrfs send /btrfs/ro_send >> /dev/null
 At subvol /btrfs/ro_snap

After:
 btrfs send /btrfs/ro_send > /dev/null
 At subvol /btrfs/ro_snap

 btrfs send /btrfs/ro_send >> /dev/null
 At subvol /btrfs/ro_snap
 ERROR: failed to read stream from kernel: Invalid argument

Further in the test case the line..
btrfs/130
  ::
 _run_btrfs_util_prog send $SCRATCH_MNT/ro_snap > /dev/null 2>&1

which intended to redirect send output to /dev/null, but ended up append
redirect to the $seqres.full file. And so this test case failed as
'Invalid argument' for sometime now.

Still as append of a btrfs send output doesn't make sense, so fix the fstests.

Also adds logs going into $seqres.full.

[1]
ba23855cdc8961bbaef1fcad4854d494cdb3afd3
 btrfs-progs: send: use splice syscall instead of read/write to transfer buffer

Signed-off-by: Anand Jain 
---
 tests/btrfs/130 | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/tests/btrfs/130 b/tests/btrfs/130
index edb7397a1ceb..96c8f9d9d526 100755
--- a/tests/btrfs/130
+++ b/tests/btrfs/130
@@ -67,12 +67,14 @@ nr_extents=$((4096 * $LOAD_FACTOR))
 blocksize=$((128 * 1024))
 file=$SCRATCH_MNT/foobar
 
+echo LOAD_FACTOR=$LOAD_FACTOR nr_extents=$nr_extents blocksize=$blocksize >> 
$seqres.full
+
 # create the initial file, whose file extents are all point to one extent
 _pwrite_byte 0xcdcdcdcd 0 $blocksize  $file | _filter_xfs_io
 
 for i in $(seq 1 $(($nr_extents - 1))); do
_reflink_range $file 0 $file $(($i * $blocksize)) $blocksize \
-   > /dev/null 2>&1
+   >> $seqres.full 2>&1
 done
 
 # create a RO snapshot, so we can send out the snapshot
@@ -82,7 +84,8 @@ _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT 
$SCRATCH_MNT/ro_snap
 # 1) OOM since memory is allocated inside a O(n^3) loop
 # 2) Softlock since time consuming backref walk is called without scheduling.
 # the send destination is not important, just send will cause the problem
-_run_btrfs_util_prog send $SCRATCH_MNT/ro_snap > /dev/null 2>&1
+echo "# $BTRFS_UTIL_PROG send $SCRATCH_MNT/ro_snap > /dev/null" >> $seqres.full
+$BTRFS_UTIL_PROG send $SCRATCH_MNT/ro_snap > /dev/null 2>>$seqres.full
 
 # success, all done
 status=0
-- 
2.15.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] btrfs: Do super block verification before writing it to disk



On 2018年04月17日 22:32, Anand Jain wrote:
> 
> 
> On 04/17/2018 05:58 PM, Qu Wenruo wrote:
>>
>>
>> On 2018年04月17日 17:05, Anand Jain wrote:
>>>
 v3:
     Update commit message to show the corruption in details.
     Modify the kernel error message to show corruption is detected
 before
     transaction commitment.
>>>   Nice. Thanks. more below.
>>>
 @@ -3310,6 +3311,27 @@ static int write_dev_supers(struct btrfs_device
 *device,
      btrfs_set_super_bytenr(sb, bytenr);
    +    /* check the validation of the primary sb before writing */
 +    if (i == 0) {
 +    ret = btrfs_check_super_valid(device->fs_info, sb);
 +    if (ret) {
 +    btrfs_err(device->fs_info,
 +"superblock corruption detected before transaction commitment for
 device %llu",
 +  device->devid);
 +    return -EUCLEAN;
 +    }
>>>
>>>   Why not move this entire check further below, after we have the ready
>>>   crc and use btrfs_check_super_csum(), instead of
>>>   btrfs_check_super_valid()? so that we verify only what is known to be
>>>   corrupted that is ..
>>
>> The problem is, we don't know the cause yet, so we must check the whole
>> superblock.
>>
>> For example, if the corruption is caused by some wild pointer of other
>> kernel module, and we're just unlucky that one day it corrupts nodesize,
>> then we can't detect it if we only check certain members.
> 
> Right I notice that.
> 
> But without btrfs_check_super_csum(), it leaves out checking one of the
> member (csum_type) which is know to be corrupted at the two instances,
> so it can also include btrfs_check_super_csum().
> 
> There were two cases, both of them corrupted the same offset, its not
> just a coincidence that both of these reported corrupted the same
> offset.

Yep, but since we're here to do extra verification, checking everything
is never a bad idea. By this we don't need to bother checking other
members when new corruption pops out.

> 
> Also the incompatible features flags (169) are still valid in both the
> cases. It looks as if we wrote u32 to a u64. I notice that we provide
> the options to write the incompatible flags through mount option, sysfs
> and ioctl.

While I don't think that's the cause of these reported corruption.
I'd prefer some under/over flow of memory which corrupted
fs_info->super_copy somehow. It may be btrfs or it may not.

It's pretty hard to determine with just 2 reports.
Especially for ben's report, he is using latest vega graphics IIRC, who
knows what could went wrong with latest amd drm codes.

> 
> 
>>> btrfs_super_block {
>>> ::
>>>  __le64 incompat_flags;
>>>  __le16 csum_type;
>>> ::
>>> }
>>>
>>>   And also can you dump contents of incompat_flags and csum_type at both
>>>     fs_info->super_copy
>>>   and
>>>     fs_info->super_for_commit
>>
>> Not really needed, as when corruption happens, it's super_copy
>> corrupted, not something went wrong after we called memcpy()
> 
> As shown below, we aren't memcpy()-ing in the btrfs_sync_log() thread,
> did you check if btrfs_sync_log() can not be the last person to write
> at umount?

I checked the dump super output, where log_tree output is all 0, means
no log tree, hence not btrfs_sync_log() caused the problem.

From Ben's:
--
chunk_root  5518540881920
chunk_root_level1
log_root0
log_root_transid0
log_root_level  0
--

And from Ken's
--
chunk_root21004288
chunk_root_level1
log_root0
log_root_transid0
log_root_level0
--

So at least for current only reports, it's not btrfs_sync_log() causing
the problem.

Thanks,
Qu

> 
> Thanks, Anand
> 
>> Thanks,
>> Qu
>>
>>>
>>>   Because at each commit transaction we
>>>
>>> btrfs_commit_transaction()
>>> {
>>>    ::
>>>  memcpy(fs_info->super_for_commit, fs_info->super_copy,
>>>     sizeof(*fs_info->super_copy));
>>>    ::
>>>  ret = write_all_supers(fs_info, 0);
>>>
>>> }
>>>
>>>   And also the sync log can write the
>>>
>>> btrfs_sync_log()
>>> {
>>> ::
>>>  ret = write_all_supers(fs_info, 1);
>>>
>>>
>>>   Finally locks between these two threads needs a review as well.
>>>
>>> Thanks, Anand
>>> -- 
>>> To unsubscribe from this list: send the line "unsubscribe
>>> linux-btrfs" in
>>> the body of a message to majord...@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>



signature.asc
Description: OpenPGP digital signature

Re: [PATCH] btrfs: Do super block verification before writing it to disk




On 04/17/2018 05:58 PM, Qu Wenruo wrote:



On 2018年04月17日 17:05, Anand Jain wrote:



v3:
    Update commit message to show the corruption in details.
    Modify the kernel error message to show corruption is detected before
    transaction commitment.

  Nice. Thanks. more below.


@@ -3310,6 +3311,27 @@ static int write_dev_supers(struct btrfs_device
*device,
     btrfs_set_super_bytenr(sb, bytenr);
   +    /* check the validation of the primary sb before writing */
+    if (i == 0) {
+    ret = btrfs_check_super_valid(device->fs_info, sb);
+    if (ret) {
+    btrfs_err(device->fs_info,
+"superblock corruption detected before transaction commitment for
device %llu",
+  device->devid);
+    return -EUCLEAN;
+    }


  Why not move this entire check further below, after we have the ready
  crc and use btrfs_check_super_csum(), instead of
  btrfs_check_super_valid()? so that we verify only what is known to be
  corrupted that is ..


The problem is, we don't know the cause yet, so we must check the whole
superblock.

For example, if the corruption is caused by some wild pointer of other
kernel module, and we're just unlucky that one day it corrupts nodesize,
then we can't detect it if we only check certain members.


Right I notice that.

But without btrfs_check_super_csum(), it leaves out checking one of the
member (csum_type) which is know to be corrupted at the two instances,
so it can also include btrfs_check_super_csum().

There were two cases, both of them corrupted the same offset, its not
just a coincidence that both of these reported corrupted the same
offset.

Also the incompatible features flags (169) are still valid in both the
cases. It looks as if we wrote u32 to a u64. I notice that we provide
the options to write the incompatible flags through mount option, sysfs
and ioctl.



btrfs_super_block {
::
     __le64 incompat_flags;
     __le16 csum_type;
::
}

  And also can you dump contents of incompat_flags and csum_type at both
    fs_info->super_copy
  and
    fs_info->super_for_commit


Not really needed, as when corruption happens, it's super_copy
corrupted, not something went wrong after we called memcpy()


As shown below, we aren't memcpy()-ing in the btrfs_sync_log() thread,
did you check if btrfs_sync_log() can not be the last person to write
at umount?

Thanks, Anand


Thanks,
Qu



  Because at each commit transaction we

btrfs_commit_transaction()
{
   ::
     memcpy(fs_info->super_for_commit, fs_info->super_copy,
    sizeof(*fs_info->super_copy));
   ::
     ret = write_all_supers(fs_info, 0);

}

  And also the sync log can write the

btrfs_sync_log()
{
::
     ret = write_all_supers(fs_info, 1);


  Finally locks between these two threads needs a review as well.

Thanks, Anand
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Btrfs: clean up resources during umount after trans is aborted

On Sat, Mar 31, 2018 at 06:11:56AM +0800, Liu Bo wrote:
> Currently if some fatal errors occur, like all IO get -EIO, resources
> would be cleaned up when
> a) transaction is being committed or
> b) BTRFS_FS_STATE_ERROR is set
> 
> However, in some rare cases, resources may be left alone after transaction
> gets aborted and umount may run into some ASSERT(), e.g.
> ASSERT(list_empty(_group->dirty_list));
> 
> For case a), in btrfs_commit_transaciton(), there're several places at the
> beginning where we just call btrfs_end_transaction() without cleaning up
> resources.  For case b), it is possible that the trans handle doesn't have
> any dirty stuff, then only trans hanlde is marked as aborted while
> BTRFS_FS_STATE_ERROR is not set, so resources remain in memory.

I have some doubts here. The flag BTRFS_FS_STATE_TRANS_ABORTED was added
to avoid duplicate warnings when the transaction is aborted for the
first time. And nothing else.

BTRFS_FS_STATE_ERROR should track the overall state and is checked in
several places if a post-error cleanup is necessary.

Calling abort should set BTRFS_FS_STATE_ERROR immediatelly so any caller
up the callchaing can potentially cleanup.

> This makes btrfs also check BTRFS_FS_STATE_TRANS_ABORTED to make sure that
> all resources won't stay in memory after umount.

The question is why BTRFS_FS_STATE_ERROR is not set but we still got
here and the filesystem is in an error state.

You're not specific about the rare cases where the resources are left
after abort and the umount is about to start. I think the rare cases
need to be identified and understood otherwise this seems like papering
over the problem.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Add udev-md-raid-safe-timeouts.rules

2018-04-17 Thread Austin S. Hemmelgarn

On 2018-04-16 13:10, Chris Murphy wrote:

Adding linux-usb@ and linux-scsi@
(This email does contain the thread initiating email, but some replies
are on the other lists.)

On Mon, Apr 16, 2018 at 5:43 AM, Austin S. Hemmelgarn
wrote:

On 2018-04-15 21:04, Chris Murphy wrote:

I just ran into this:

https://github.com/neilbrown/mdadm/pull/32/commits/af1ddca7d5311dfc9ed60a5eb6497db1296f1bec

This solution is inadequate, can it be made more generic? This isn't
an md specific problem, it affects Btrfs and LVM as well. And in fact
raid0, and even none raid setups.

There is no good reason to prevent deep recovery, which is what
happens with the default command timer of 30 seconds, with this class
of drive. Basically that value is going to cause data loss for the
single device and also raid0 case, where the reset happens before deep
recovery has a chance. And even if deep recovery fails to return user
data, what we need to see is the proper error message: read error UNC,
rather than a link reset message which just obfuscates the problem.

This has been discussed at least once here before (probably more times, hard
to be sure since it usually comes up as a side discussion in an only
marginally related thread). Last I knew, the consensus here was that it
needs to be changed upstream in the kernel, not by adding a udev rule
because while the value is technically system policy, the default policy is
brain-dead for anything but the original disks it was i9ntended for (30
seconds works perfectly fine for actual SCSI devices because they behave
sanely in the face of media errors, but it's horribly inadequate for ATA
devices).

To re-iterate what I've said before on the subject:

For ATA drives it should probably be 150 seconds. That's 30 seconds beyond
the typical amount of time most consumer drives will keep retrying a sector,
so even if it goes the full time to try and recover a sector this shouldn't
trigger. The only people this change should negatively impact are those who
have failing drives which support SCT ERC and have it enabled, but aren't
already adjusting this timeout.

For physical SCSI devices, it should continue to be 30 seconds. SCSI disks
are sensible here and don't waste your time trying to recover a sector. For
PV-SCSI devices, it should probably be adjusted too, but I don't know what a
reasonable value is.

For USB devices it should probably be higher than 30 seconds, but again I
have no idea what a reasonable value is.

I don't know how all of this is designed but it seems like there's
only one location for the command timer, and the SCSI driver owns it,
and then everyone else (ATA and USB and for all I know SAN) are on top
of that and lack any ability to have separate timeouts.
On the note of SAN, iSCSI is part of the SCSI subsystem, so it gets
applied directly there. I'm pretty sure NBD has it's own thing, and I
think the same is true of ATAoE.

As far as USB, UMS is essentially a stripped down version of SCSI with
it's own limitations, and UAS _is_ SCSI, with both of those having
pretty much always been routed through the SCSI subsystem.

The nice thing about the udev rule is that it tests for SCT ERC before
making a change. There certainly are enterprise and almost enterprise
"NAS" SATA drives that have short SCT ERC times enabled out of the box
- and the udev method makes them immune to the change.
The kernel could just as easily look for that too though. From what
I've seen however, other failure sources that wouldn't trigger SCT ERC
on SATA drives are really rare, usually it means a bad cable, bad drive
electronics, or a bad storage controller, so i don't think having it set
really high for SCT ERC enabled drives is likely to be much of an issue
most of the time.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: Add udev-md-raid-safe-timeouts.rules

2018-04-17 Thread Austin S. Hemmelgarn

On 2018-04-16 11:02, Wol's lists wrote:

On 16/04/18 12:43, Austin S. Hemmelgarn wrote:

On 2018-04-15 21:04, Chris Murphy wrote:

I just ran into this:
https://github.com/neilbrown/mdadm/pull/32/commits/af1ddca7d5311dfc9ed60a5eb6497db1296f1bec

This solution is inadequate, can it be made more generic? This isn't
an md specific problem, it affects Btrfs and LVM as well. And in fact
raid0, and even none raid setups.

This has been discussed at least once here before (probably more
times, hard to be sure since it usually comes up as a side discussion
in an only marginally related thread).

Sorry, but where is "here"? This message is cross-posted to about three
lists at least ...
Oops, didn't see the extra lists listed. In this case, discussed
previously on the BTRFS ML.

Last I knew, the consensus here was
that it needs to be changed upstream in the kernel, not by adding a
udev rule because while the value is technically system policy, the
default policy is brain-dead for anything but the original disks it
was i9ntended for (30 seconds works perfectly fine for actual SCSI
devices because they behave sanely in the face of media errors, but
it's horribly inadequate for ATA devices).

To re-iterate what I've said before on the subject:

imho (and it's probably going to be a pain to implement :-) there should
be a soft time-out and a hard time-out. The soft time-out should trigger
"drive is taking too long to respond" messages that end up in a log - so
that people who actually care can keep a track of this sort of thing.
The hard timeout should be the current set-up, where the kernel just
gives up.
Agreed, although as pointed out by Roger in his reply to this, it kind
of already works this way in some cases.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

[PATCH] btrfs: Fix wrong btrfs_delalloc_release_extents parameter

Commit 43b18595d660 ("btrfs: qgroup: Use separate meta reservation type
for delalloc") merged into mainline is not the latest version submitted
to mail list in Dec 2017.

It has a fatal wrong @qgroup_free parameter, which results increasing
qgroup metadata pertrans reserved space, and causing a lot of early EDQUOT.

Fix it by applying the correct diff on top of current branch.

Fixes: 43b18595d660 ("btrfs: qgroup: Use separate meta reservation type for 
delalloc")
Signed-off-by: Qu Wenruo 
---
 fs/btrfs/file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index dd1b79750d6a..9701aa7361f3 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1763,7 +1763,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb 
*iocb,
unlock_extent_cached(_I(inode)->io_tree,
 lockstart, lockend, _state);
btrfs_delalloc_release_extents(BTRFS_I(inode), reserve_bytes,
-  (ret != 0));
+  true);
if (ret) {
btrfs_drop_pages(pages, num_pages);
break;
-- 
2.17.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Kernel unaligned access at ... btrfs_real_readdir+0x51c/0x718 [btrfs]

2018-04-17 Thread René Rebe

Hi,

On 16 Apr 2018, at 21:15, David Sterba  wrote:

> On Mon, Apr 16, 2018 at 09:55:45PM +0200, René Rebe wrote:
>> Hi,
>> 
>> On 04/16/2018 06:48 PM, David Sterba wrote:
>>> The warnings are valid, there's unaligned access introduced by patch
>>> 
>>> 23b5ec74943f44378b68c0edd8e210a86318ea5e
>>> btrfs: fix readdir deadlock with pagefault
>>> 
>>> The directory entries (struct dir_entry) are copied to a temporary
>>> buffer as they fit, ie. no alignment, and the members accessed in
>>> several places.
>>> 
>>> The following patch adds the proper unaligned access, only compile-tested.
>>> Please test and let me know, thanks!
>> Would have loved to immediately give it a try, however, sorry,
>> I forgot to mention I'm on the latest stable release -4.16.2-
>> on a first glance this does not look like it does just apply.
>> 
>> I would re-base myself if I would not also have a glibc initialization 
>> bug to hunt and debug, too :-/
>> 
>> If you happen to also rebase it for current -stable, ... ;-)
> 
> Sure, attached a 4.16.2 version.
> <0001-test-readdir-unaligned-access.patch>

Cool, thanks, built and so far it works, the warnings are gone.

https://www.youtube.com/watch?v=XYsKct4T2xk

Greetings form Berlin,
René

-- 
 ExactCODE GmbH, Lietzenburger Str. 42, DE-10789 Berlin
 http://exactcode.com | http://exactscan.com | http://ocrkit.com | 
http://t2-project.org | http://rene.rebe.de

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] btrfs: Do super block verification before writing it to disk



On 2018年04月17日 17:05, Anand Jain wrote:
> 
>> v3:
>>    Update commit message to show the corruption in details.
>>    Modify the kernel error message to show corruption is detected before
>>    transaction commitment.
>  Nice. Thanks. more below.
> 
>> @@ -3310,6 +3311,27 @@ static int write_dev_supers(struct btrfs_device
>> *device,
>>     btrfs_set_super_bytenr(sb, bytenr);
>>   +    /* check the validation of the primary sb before writing */
>> +    if (i == 0) {
>> +    ret = btrfs_check_super_valid(device->fs_info, sb);
>> +    if (ret) {
>> +    btrfs_err(device->fs_info,
>> +"superblock corruption detected before transaction commitment for
>> device %llu",
>> +  device->devid);
>> +    return -EUCLEAN;
>> +    }
> 
>  Why not move this entire check further below, after we have the ready
>  crc and use btrfs_check_super_csum(), instead of
>  btrfs_check_super_valid()? so that we verify only what is known to be
>  corrupted that is ..

The problem is, we don't know the cause yet, so we must check the whole
superblock.

For example, if the corruption is caused by some wild pointer of other
kernel module, and we're just unlucky that one day it corrupts nodesize,
then we can't detect it if we only check certain members.

> 
> btrfs_super_block {
> ::
>     __le64 incompat_flags;
>     __le16 csum_type;
> ::
> }
> 
>  And also can you dump contents of incompat_flags and csum_type at both
>    fs_info->super_copy
>  and
>    fs_info->super_for_commit

Not really needed, as when corruption happens, it's super_copy
corrupted, not something went wrong after we called memcpy()

Thanks,
Qu

> 
>  Because at each commit transaction we
> 
> btrfs_commit_transaction()
> {
>   ::
>     memcpy(fs_info->super_for_commit, fs_info->super_copy,
>    sizeof(*fs_info->super_copy));
>   ::
>     ret = write_all_supers(fs_info, 0);
> 
> }
> 
>  And also the sync log can write the
> 
> btrfs_sync_log()
> {
> ::
>     ret = write_all_supers(fs_info, 1);
> 
> 
>  Finally locks between these two threads needs a review as well.
> 
> Thanks, Anand
> -- 
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



signature.asc
Description: OpenPGP digital signature

Re: [PATCH] btrfs: Do super block verification before writing it to disk




v3:
   Update commit message to show the corruption in details.
   Modify the kernel error message to show corruption is detected before
   transaction commitment.

 Nice. Thanks. more below.


@@ -3310,6 +3311,27 @@ static int write_dev_supers(struct btrfs_device *device,
  
  		btrfs_set_super_bytenr(sb, bytenr);
  
+		/* check the validation of the primary sb before writing */

+   if (i == 0) {
+   ret = btrfs_check_super_valid(device->fs_info, sb);
+   if (ret) {
+   btrfs_err(device->fs_info,
+"superblock corruption detected before transaction commitment for device %llu",
+ device->devid);
+   return -EUCLEAN;
+   }


 Why not move this entire check further below, after we have the ready
 crc and use btrfs_check_super_csum(), instead of
 btrfs_check_super_valid()? so that we verify only what is known to be
 corrupted that is ..

btrfs_super_block {
::
__le64 incompat_flags;
__le16 csum_type;
::
}

 And also can you dump contents of incompat_flags and csum_type at both
   fs_info->super_copy
 and
   fs_info->super_for_commit

 Because at each commit transaction we

btrfs_commit_transaction()
{
  ::
memcpy(fs_info->super_for_commit, fs_info->super_copy,
   sizeof(*fs_info->super_copy));
  ::
ret = write_all_supers(fs_info, 0);

}

 And also the sync log can write the

btrfs_sync_log()
{
::
ret = write_all_supers(fs_info, 1);


 Finally locks between these two threads needs a review as well.

Thanks, Anand
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] btrfs: delayed-inode: Remove wrong qgroup meta reservation calls

Commit 4f5427ccce5d ("btrfs: delayed-inode: Use new qgroup meta rsv for
delayed inode and item") merged into mainline is not the updated version
submitted to the mail list in Dec 2017.

Which lacks the following fixes:

1) Remove btrfs_qgroup_convert_reserved_meta() call in
   btrfs_delayed_item_release_metadata()
2) Remove btrfs_qgroup_reserve_meta_prealloc() call in
   btrfs_delayed_inode_reserve_metadata()

Those fixes will resolve unexpected EDQUOT problems.

Fixes: 4f5427ccce5d ("btrfs: delayed-inode: Use new qgroup meta rsv for delayed 
inode and item")
Signed-off-by: Qu Wenruo 
---
 fs/btrfs/delayed-inode.c | 22 +-
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index 86ec2edc05e8..ac4fa7218c57 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -569,6 +569,12 @@ static int btrfs_delayed_item_reserve_metadata(struct 
btrfs_trans_handle *trans,
dst_rsv = _info->delayed_block_rsv;
 
num_bytes = btrfs_calc_trans_metadata_size(fs_info, 1);
+
+   /*
+* Here we migrate space rsv from transaction rsv, since have
+* already reserved space when starting a transaction.
+* So no need to reserve qgroup space here.
+*/
ret = btrfs_block_rsv_migrate(src_rsv, dst_rsv, num_bytes, 1);
if (!ret) {
trace_btrfs_space_reservation(fs_info, "delayed_item",
@@ -590,7 +596,10 @@ static void btrfs_delayed_item_release_metadata(struct 
btrfs_root *root,
return;
 
rsv = _info->delayed_block_rsv;
-   btrfs_qgroup_convert_reserved_meta(root, item->bytes_reserved);
+   /*
+* Check btrfs_delayed_item_reserve_metadata() to see why we don't need
+* to release/reserve qgroup space.
+*/
trace_btrfs_space_reservation(fs_info, "delayed_item",
  item->key.objectid, item->bytes_reserved,
  0);
@@ -615,9 +624,6 @@ static int btrfs_delayed_inode_reserve_metadata(
 
num_bytes = btrfs_calc_trans_metadata_size(fs_info, 1);
 
-   ret = btrfs_qgroup_reserve_meta_prealloc(root, num_bytes, true);
-   if (ret < 0)
-   return ret;
/*
 * btrfs_dirty_inode will update the inode under btrfs_join_transaction
 * which doesn't reserve space for speed.  This is a problem since we
@@ -629,6 +635,10 @@ static int btrfs_delayed_inode_reserve_metadata(
 */
if (!src_rsv || (!trans->bytes_reserved &&
 src_rsv->type != BTRFS_BLOCK_RSV_DELALLOC)) {
+   ret = btrfs_qgroup_reserve_meta_prealloc(root,
+   fs_info->nodesize, true);
+   if (ret < 0)
+   return ret;
ret = btrfs_block_rsv_add(root, dst_rsv, num_bytes,
  BTRFS_RESERVE_NO_FLUSH);
/*
@@ -647,7 +657,9 @@ static int btrfs_delayed_inode_reserve_metadata(
  "delayed_inode",
  btrfs_ino(inode),
  num_bytes, 1);
-   }
+   } else
+   btrfs_qgroup_free_meta_prealloc(root,
+   fs_info->nodesize);
return ret;
}
 
-- 
2.17.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] btrfs-progs: libbtrfsutil: python: use top=0 as default value in SubvolumeIterator